crash with std::bad_alloc

Hi!

I am running my own Bro policy script (Bro 1.4, Debian lenny), together with a set of signatures that should be matched. After a few hours of runtime, Bro reproducibly crashes with the following error message:

terminate called after throwing an instance of 'std::bad_alloc'
   what(): std::bad_alloc

Bit by bit I stripped parts from my script in order to find the critical
part, and I ended up with a script as trivial as:

@load conn
@load notice
@load notice-action-filters

redef use_connection_compressor = F;
redef capture_filters = {["ALL"] = ""};
redef dpd_match_only_beginning = F;
redef local_nets[...];

redef signature_files += "./my_signatures.sig";

The critical part seems to be the signature matching. When I include my
signatures, the error occurs. When I comment out the last redef line, it
works without crashing (at least for much longer, until I terminate it
deliberately; I cannot be totally sure that it would not have crashed
later). Note that I don't even handle the signature
matches anymore, still the error occurs. My signature file is approx
100Kb, contains more than 600 signatures, and all of them look like:

signature xxx {
         dst-ip == local_nets
         event "xxx"
         payload /xxx/
}

I'd be very happy about learning what exactly causes the error, and of
course how to avoid it.

Regards,
Peter.

> terminate called after throwing an instance of 'std::bad_alloc'
>    what(): std::bad_alloc

That sounds like Bro is running out of memory. What's the process'
size just before it crashes and how much memory does the machine
have?

> matches anymore, still the error occurs. My signature file is approx
> 100Kb, contains more than 600 signatures, and all of them look like:

If it's indeed memory exhaustion, then it looks like either a memory
leak in the signature engine or a general problem of handling the
many regexps. Generally, the engine can consume quite a bit of
memory due to the DFAs it builds incrementally. What do your regexps
look like? Do they contain many unanchored subparts (e.g.,
"foo.*bar")?

Robin

Robin writes:

> That sounds like Bro is running out of memory. What's the process'
> size just before it crashes and how much memory does the machine
> have?

Note, you can track resource consumption over time by loading
either stats.bro (lightweight) or profiling.bro (more info but
larger performance hit).

    Vern

Since this is a common problem people encounter, these policies are
explained (to some extent) at:
http://www.bro-ids.org/wiki/index.php/Development_HOWTOs#How_to_understand_memory_consumption

Robin Sommer wrote:

>> terminate called after throwing an instance of 'std::bad_alloc'
>>    what(): std::bad_alloc
>
> That sounds like Bro is running out of memory. What's the process'
> size just before it crashes and how much memory does the machine
> have?

I reran my code using profiling.bro. The memory consumption continuously increased, and the last lines before crashing were:

Memory: total=3126520K total_adj=3116888K malloced: 2878549K
Run-time: user+sys=2861.5 user=2528.8 sys=332.7 real=3390.1
Conns: total=915256 current=19795/19795 ext=0 mem=0K avg=0.0 table=0K connvals=0K
ConnCompressor: pending=0 pending_in_mem=0 full_conns=0 pending+real=0 mem=0K avg=nan/nan
Conns: tcp=7431/8335 udp=11845/20278 icmp=519/783
TCP-States: Inact. Syn. SA Part. Est. Fin. Rst.
TCP-States:Inact. 16 159 2 3
TCP-States:Syn. 106 243 147 14
TCP-States:SA 1 185 64 1
TCP-States:Part. 38 1436 101 175 9
TCP-States:Est. 2052 1023 22
TCP-States:Fin. 3 258 1014 261 6
TCP-States:Rst. 8 12 63 9
Connections expired due to inactivity: 697053
Total reassembler data: 134947K
Timers: current=30215 max=34256 mem=1652K lag=0.00s
          ConnectionDeleteTimer = 180
          ConnectionInactivityTimer = 19688
          NetworkTimer = 1
          ScheduleTimer = 241
          TableValTimer = 2
          TCPConnectionAttemptTimer = 238
          TCPConnectionExpireTimer = 9859
          TCPConnectionResetTimer = 6
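
To follow the growth over time rather than reading only the last entry, the Memory lines can be pulled out of the profiling output with a few lines of Python. This is only a sketch: it assumes the line format shown above, and the name parse_memory_line is made up for illustration.

```python
import re

# Matches lines such as:
#   Memory: total=3126520K total_adj=3116888K malloced: 2878549K
MEMORY_LINE = re.compile(
    r"Memory: total=(\d+)K total_adj=(\d+)K malloced: (\d+)K"
)


def parse_memory_line(line):
    """Return the three memory figures in KB, or None if the line does not match."""
    match = MEMORY_LINE.search(line)
    if match is None:
        return None
    total, total_adj, malloced = (int(group) for group in match.groups())
    return {"total_kb": total, "total_adj_kb": total_adj, "malloced_kb": malloced}


sample = "Memory: total=3126520K total_adj=3116888K malloced: 2878549K"
stats = parse_memory_line(sample)
print(round(stats["total_kb"] / 1024 / 1024, 2), "GiB")  # prints: 2.98 GiB
```

Applying the function to every Memory line in the log gives a series that shows whether the process grows steadily or in jumps.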

>> matches anymore, still the error occurs. My signature file is approx
>> 100Kb, contains more than 600 signatures, and all of them look like:
>
> If it's indeed memory exhaustion, then it looks like either a memory
> leak in the signature engine or a general problem of handling the
> many regexps. Generally, the engine can consume quite a bit of
> memory due to the DFAs it builds incrementally. What do your regexps
> look like? Do they contain many unanchored subparts (e.g.,
> "foo.*bar")?

Yes, '.*' is used heavily. In fact, it is the only regexp feature that is used. The patterns generally look like: ".*byte_seq1.*byte_seq2.*byte_seq3.*"

Peter.

Try running Bro for a limited amount of time and load print-globals.bro,
as pointed out at:
http://www.bro-ids.org/wiki/index.php/Development_HOWTOs#print-globals.bro

Upon termination, Bro will tell you the amount of memory your global
variables have accumulated. If you have a state-keeping problem in one
of those variables, you'll spot the issue there.

Hi Peter,
or another idea: load heavy-analysis.bro for reducing *_timeout.
Regards
Rmkml
Crusoe-Researches.com

> or another idea: load heavy-analysis.bro for reducing *_timeout.

heavy-analysis does the opposite - raises timeouts and increases resource
consumption. (It won't affect memory consumption due to massive .*
regular expressions either way.)

    Vern

> Memory: total=3126520K total_adj=3116888K malloced: 2878549K

Yeah, that's a lot ...

> ".*byte_seq1.*byte_seq2.*byte_seq3.*"

I'm guessing that these are indeed the problem, assuming there's no
leak somewhere. Having lots of such patterns is essentially the
worst case for a DFA-based pattern matcher (recall that Bro
internally combines many of these into *one* regexp, which
makes the number of states explode).
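
The blow-up can be reproduced in miniature. The Python sketch below is not Bro's matcher; it is a textbook subset construction over a simplified model in which each pattern has the form .*t1.*t2.*t3.* and every t is a single character (real byte sequences are longer, which only adds states). The name count_dfa_states and the sample patterns are assumptions made for illustration.

```python
from collections import deque


def count_dfa_states(patterns):
    """Count reachable DFA states for the union of patterns.

    Each pattern is a string such as "abc", standing for the regexp
    .*a.*b.*c.* (single-character tokens only, a deliberate simplification).
    An NFA state is a pair (pattern index, number of tokens matched so far).
    """
    # "\0" stands in for any byte that appears in none of the patterns.
    alphabet = {ch for pattern in patterns for ch in pattern} | {"\0"}
    start = frozenset((index, 0) for index in range(len(patterns)))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            # Because of ".*", every NFA state loops to itself on any byte.
            successor = set(current)
            for index, pos in current:
                pattern = patterns[index]
                if pos < len(pattern) and pattern[pos] == ch:
                    successor.add((index, pos + 1))
            successor = frozenset(successor)
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return len(seen)


samples = ["abc", "def", "ghi", "jkl"]
for n in range(1, len(samples) + 1):
    print(n, count_dfa_states(samples[:n]))
# prints: 1 4, 2 16, 3 64, 4 256 (one pair per line)
```

In this model each additional pattern multiplies the state count by four, because the combined automaton has to remember how far every pattern has progressed independently of the others.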

Three things you could try:

(a) there is a tuning option for the signature engine which tells
Bro how many regexps to combine internally into Big Ones. It's
called sig_max_group_size and the default is 50. It might help to
reduce this quite a bit (e.g., 10 or 20).

(b) you could split each signature into several, one for each
component of the regexp (byte_seq1, byte_seq2, ...), and then either
chain these signatures with requires-signature conditions, or
raise an event for each one individually and correlate the matches
on the script-level to find out when all have matched. Both
approaches have the disadvantage that they don't consider the order
in which the subpatterns appear.

(c) this one is kind of scary. :-) There's a configure option
--expire-dfa-states which enables some internal code to limit the
size of the DFAs Bro builds (by expiring less frequently used states
and recalculating them later if necessary). Enabling this has quite
a performance impact on the matching process, but even worse is
the fact that this option has most likely not been used by anybody
for >5 years ... I'd almost bet it's broken in some way but you can
still give it a try ...
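
The correlation half of option (b) can be sketched independently of Bro. The Python below only illustrates the bookkeeping; in practice this logic would live in a policy script's handler for signature matches. The class name, the sub-signature ids ("xxx-1" and so on) and the connection ids are all hypothetical.

```python
from collections import defaultdict


class SubpatternCorrelator:
    """Track which sub-signatures have matched on each connection."""

    def __init__(self, components):
        # components maps a full signature name to the ids of its parts.
        self.components = {
            name: frozenset(parts) for name, parts in components.items()
        }
        # (connection id, signature name) -> set of parts seen so far
        self.seen = defaultdict(set)

    def on_match(self, conn_id, part_id):
        """Record one sub-signature match; return signatures that just completed."""
        completed = []
        for name, parts in self.components.items():
            if part_id not in parts:
                continue
            matched = self.seen[(conn_id, name)]
            was_complete = matched == parts
            matched.add(part_id)
            if not was_complete and matched == parts:
                completed.append(name)
        return completed

    def forget(self, conn_id):
        """Drop all state for a finished connection so memory stays bounded."""
        for key in [key for key in self.seen if key[0] == conn_id]:
            del self.seen[key]


correlator = SubpatternCorrelator({"xxx": {"xxx-1", "xxx-2", "xxx-3"}})
print(correlator.on_match("conn-A", "xxx-2"))  # prints: []
print(correlator.on_match("conn-A", "xxx-1"))  # prints: []
print(correlator.on_match("conn-A", "xxx-3"))  # prints: ['xxx']
```

The usage example feeds the parts out of order and the signature still completes, which is exactly the loss of ordering mentioned under (b).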

Robin

Hi!

I have applied suggestion (a) and set sig_max_group_size = 10. That
greatly improved the situation. I have been running Bro with my
signature set for a week now without a problem concerning memory
(before, it used to crash after hours), and from the prof.log it looks
like there is still a lot of headroom.
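
A back-of-the-envelope model suggests why the smaller group size helps so much. It assumes (as a simplification, not as a description of Bro's internals) that every pattern of the form .*seq1.*seq2.*seq3.* contributes 4 progress stages and that a combined DFA may, in the worst case, need one state per combination of stages within a group. The function name and the round figure of 600 signatures are illustrative.

```python
def worst_case_states(num_signatures, group_size, stages_per_pattern=4):
    """Return (number of groups, worst-case total DFA states) under the model.

    Assumption: a group of k patterns can need up to stages_per_pattern ** k
    states. A lazily built DFA only creates the states that traffic actually
    reaches, so real numbers are far below this bound.
    """
    full_groups, remainder = divmod(num_signatures, group_size)
    sizes = [group_size] * full_groups + ([remainder] if remainder else [])
    return len(sizes), sum(stages_per_pattern ** size for size in sizes)


for size in (50, 10):
    groups, bound = worst_case_states(600, size)
    # bit_length() - 1 is the floor of log2, so this prints an order of magnitude.
    print(f"group size {size}: {groups} groups, bound near 2^{bound.bit_length() - 1}")
```

The bound grows exponentially with the group size but only linearly with the number of groups, so trading a few large groups for many small ones shrinks it drastically, at the cost of running more automata per packet.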

Thanks a bunch!

Peter.

Robin Sommer wrote:

Great to hear, thanks for the update. Do you see any significant
change in CPU usage after the change? I'm wondering whether it might
make sense to change the default value.

Robin