Bro2 Random Crashes

When I initially set up Bro2, it ran for a few days with no problems.

Recently I noticed that Bro2 has started to crash randomly.
Typically it runs for 5 or 6 hours before a crash, but sometimes
it crashes immediately.

I installed Bro2 on Ubuntu 10.04 LTS. The Ubuntu machine is a Xen VM
on a Citrix Xen server. The throughput on the network is < 400 Mb/s.

After taking a look at one of my crash reports, Seth suggested that
I'm running out of memory.

My VM started out with 4GB of RAM. I bumped it up to 8GB, only to get
the same results. The time to crash did not change, and I see no
pattern in when or why Bro2 crashes.

The only thing I've been able to key in on is that over time Bro2
eventually causes all free memory in Ubuntu to change to cached
memory. From the Citrix Xen Console, that cached memory shows up as
used memory. So, maybe Xen interprets cached memory as being used?
Also -- when Xen senses that most of the memory is "used" (but,
really, it's cached inside Ubuntu), the percent utilization in one of
the CPUs in the VM spikes. After Bro2 crashes, CPU utilization
returns to normal, but memory is never freed -- it remains cached
forever.

I can dump the cache using

echo 1 > /proc/sys/vm/drop_caches

which converts all the memory counted as cached back to free. Any
chance that this is related?
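
To put numbers on the free versus cached split from inside the guest,
something like the following could be run before and after dropping
the cache. This is only a rough sketch: mem_summary is a made-up
helper name, and adding Buffers and Cached to MemFree is an
approximation of what the kernel can reclaim, not an exact figure.

```shell
#!/bin/sh
# Sketch: summarize a /proc/meminfo-style file (all values are in kB).
# mem_summary is a hypothetical helper, not part of Bro or Xen.
mem_summary() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            # Page cache and buffers can usually be reclaimed on demand,
            # so count them together with free memory (an approximation).
            reclaimable = free + buffers + cached
            printf "total_kb=%d free_kb=%d cached_kb=%d reclaimable_kb=%d\n",
                   total, free, cached, reclaimable
        }
    ' "${1:-/proc/meminfo}"
}

# Print the summary for the running system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

If reclaimable_kb stays large right up to a crash, the guest itself
had memory to hand out, which would point away from the page cache
as the cause.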

Anyone have ideas on what is causing Bro2 to crash?

-Chris

I have included some additional information below.

Here are the steps I used to install Bro2:

sudo aptitude -y install swig libmagic-dev libgeoip-dev cmake \
    build-essential flex bison libpcap-dev libssl-dev python-dev gawk
cd /tmp
wget http://www.bro-ids.org/downloads/release/bro-2.0.tar.gz
tar xvzf bro-2.0.tar.gz
cd bro-2.0
./configure
make
sudo make install
# sudo does not apply to a shell redirection, so append through tee instead
printf '\nexport PATH=/usr/local/bro/bin:$PATH\n' | sudo tee -a /etc/bash.bashrc > /dev/null

# Add to /usr/local/bro/etc/networks.cfg:
[...]

broctl install
broctl start

### End Installation ###

Here is a recent crash report:

core
[New Thread 5101]
Core was generated by `/usr/local/bro/bin/bro -i eth1 -U .status -p
broctl -p broctl-live -p standalon'.
Program terminated with signal 6, Aborted.
#0 0x00007f72aefcaa75 in raise () from /lib/libc.so.6

==== reporter.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path reporter
#fields ts level message location
#types time enum string string
1330462476.662662 Reporter::ERROR bro wasn't compiled with IPv6
support (empty)

==== stderr.log
listening on eth1, capture length 8192 bytes

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/usr/local/bro/share/broctl/scripts/run-bro: line 60: 5101 Aborted
            (core dumped) nohup $mybro $@

==== stdout.log
unlimited
unlimited
unlimited

==== .cmdline
-i eth1 -U .status -p broctl -p broctl-live -p standalone -p local -p
bro local broctl broctl/standalone broctl/auto

==== .env_vars
PATH=/usr/local/bro/bin:/usr/local/bro/share/broctl/scripts:/usr/local/bro/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
BROPATH=/logs/bro/spool/policy/site::/logs/bro/spool/policy/auto:/usr/local/bro/share/bro:/usr/local/bro/share/bro/policy:/usr/local/bro/share/bro/site
CLUSTER_NODE=

==== .status
RUNNING [net_run]

==== No prof.log

==== packet_filter.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path packet_filter
#fields ts node filter init success
#types time string string bool bool
1330462468.693819 - not ip6 T T

==== loaded_scripts.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path loaded_scripts
#fields name
#types string
/usr/local/bro/share/bro/base/init-bare.bro
/usr/local/bro/share/bro/base/const.bif.bro
/usr/local/bro/share/bro/base/types.bif.bro
/usr/local/bro/share/bro/base/strings.bif.bro
/usr/local/bro/share/bro/base/bro.bif.bro
/usr/local/bro/share/bro/base/reporter.bif.bro
/usr/local/bro/share/bro/base/event.bif.bro
/usr/local/bro/share/bro/base/frameworks/logging/__load__.bro
/usr/local/bro/share/bro/base/frameworks/logging/./main.bro
/usr/local/bro/share/bro/base/logging.bif.bro
/usr/local/bro/share/bro/base/frameworks/logging/./postprocessors/__load__.bro
/usr/local/bro/share/bro/base/frameworks/logging/./postprocessors/./scp.bro
/usr/local/bro/share/bro/base/frameworks/logging/./postprocessors/./sftp.bro
/usr/local/bro/share/bro/base/frameworks/logging/./writers/ascii.bro
/usr/local/bro/share/bro/base/init-default.bro
/usr/local/bro/share/bro/base/utils/site.bro
/usr/local/bro/share/bro/base/utils/./patterns.bro
/usr/local/bro/share/bro/base/utils/addrs.bro
/usr/local/bro/share/bro/base/utils/conn-ids.bro
/usr/local/bro/share/bro/base/utils/directions-and-hosts.bro
/usr/local/bro/share/bro/base/utils/files.bro
/usr/local/bro/share/bro/base/utils/numbers.bro
/usr/local/bro/share/bro/base/utils/paths.bro
/usr/local/bro/share/bro/base/utils/strings.bro
/usr/local/bro/share/bro/base/utils/thresholds.bro
/usr/local/bro/share/bro/base/frameworks/notice/__load__.bro
/usr/local/bro/share/bro/base/frameworks/notice/./main.bro
/usr/local/bro/share/bro/base/frameworks/notice/./weird.bro
/usr/local/bro/share/bro/base/frameworks/notice/./actions/drop.bro
/usr/local/bro/share/bro/base/frameworks/notice/./actions/email_admin.bro
/usr/local/bro/share/bro/base/frameworks/notice/./actions/page.bro
/usr/local/bro/share/bro/base/frameworks/notice/./actions/add-geodata.bro
/usr/local/bro/share/bro/base/frameworks/notice/./extend-email/hostnames.bro
/usr/local/bro/share/bro/base/frameworks/cluster/__load__.bro
/usr/local/bro/share/bro/base/frameworks/cluster/./main.bro
/usr/local/bro/share/bro/base/frameworks/control/__load__.bro
/usr/local/bro/share/bro/base/frameworks/control/./main.bro
/usr/local/bro/share/bro/base/frameworks/notice/./actions/pp-alarms.bro
/usr/local/bro/share/bro/base/frameworks/dpd/__load__.bro
/usr/local/bro/share/bro/base/frameworks/dpd/./main.bro
/usr/local/bro/share/bro/base/frameworks/signatures/__load__.bro
/usr/local/bro/share/bro/base/frameworks/signatures/./main.bro
/usr/local/bro/share/bro/base/frameworks/packet-filter/__load__.bro
/usr/local/bro/share/bro/base/frameworks/packet-filter/./main.bro
/usr/local/bro/share/bro/base/frameworks/packet-filter/./netstats.bro
/usr/local/bro/share/bro/base/frameworks/software/__load__.bro
/usr/local/bro/share/bro/base/frameworks/software/./main.bro
/usr/local/bro/share/bro/base/frameworks/communication/__load__.bro
/usr/local/bro/share/bro/base/frameworks/communication/./main.bro
/usr/local/bro/share/bro/base/frameworks/metrics/__load__.bro
/usr/local/bro/share/bro/base/frameworks/metrics/./main.bro
/usr/local/bro/share/bro/base/frameworks/metrics/./non-cluster.bro
/usr/local/bro/share/bro/base/frameworks/intel/__load__.bro
/usr/local/bro/share/bro/base/frameworks/intel/./main.bro
/usr/local/bro/share/bro/base/frameworks/reporter/__load__.bro
/usr/local/bro/share/bro/base/frameworks/reporter/./main.bro
/usr/local/bro/share/bro/base/protocols/conn/__load__.bro
/usr/local/bro/share/bro/base/protocols/conn/./main.bro
/usr/local/bro/share/bro/base/protocols/conn/./contents.bro
/usr/local/bro/share/bro/base/protocols/conn/./inactivity.bro
/usr/local/bro/share/bro/base/protocols/dns/__load__.bro
/usr/local/bro/share/bro/base/protocols/dns/./consts.bro
/usr/local/bro/share/bro/base/protocols/dns/./main.bro
/usr/local/bro/share/bro/base/protocols/ftp/__load__.bro
/usr/local/bro/share/bro/base/protocols/ftp/./utils-commands.bro
/usr/local/bro/share/bro/base/protocols/ftp/./main.bro
/usr/local/bro/share/bro/base/protocols/ftp/./file-extract.bro
/usr/local/bro/share/bro/base/protocols/http/__load__.bro
/usr/local/bro/share/bro/base/protocols/http/./main.bro
/usr/local/bro/share/bro/base/protocols/http/./utils.bro
/usr/local/bro/share/bro/base/protocols/http/./file-ident.bro
/usr/local/bro/share/bro/base/protocols/http/./file-hash.bro
/usr/local/bro/share/bro/base/protocols/http/./file-extract.bro
/usr/local/bro/share/bro/base/protocols/irc/__load__.bro
/usr/local/bro/share/bro/base/protocols/irc/./main.bro
/usr/local/bro/share/bro/base/protocols/irc/./dcc-send.bro
/usr/local/bro/share/bro/base/protocols/smtp/__load__.bro
/usr/local/bro/share/bro/base/protocols/smtp/./main.bro
/usr/local/bro/share/bro/base/protocols/smtp/./entities.bro
/usr/local/bro/share/bro/base/protocols/smtp/./entities-excerpt.bro
/usr/local/bro/share/bro/base/protocols/ssh/__load__.bro
/usr/local/bro/share/bro/base/protocols/ssh/./main.bro
/usr/local/bro/share/bro/base/protocols/ssl/__load__.bro
/usr/local/bro/share/bro/base/protocols/ssl/./consts.bro
/usr/local/bro/share/bro/base/protocols/ssl/./main.bro
/usr/local/bro/share/bro/base/protocols/ssl/./mozilla-ca-list.bro
/usr/local/bro/share/bro/base/protocols/syslog/__load__.bro
/usr/local/bro/share/bro/base/protocols/syslog/./consts.bro
/usr/local/bro/share/bro/base/protocols/syslog/./main.bro
/logs/bro/spool/policy/site/local.bro
/usr/local/bro/share/bro/policy/misc/loaded-scripts.bro
/usr/local/bro/share/bro/policy/tuning/defaults/__load__.bro
/usr/local/bro/share/bro/policy/tuning/defaults/./packet-fragments.bro
/usr/local/bro/share/bro/policy/tuning/defaults/./warnings.bro
/usr/local/bro/share/bro/policy/frameworks/software/vulnerable.bro
/usr/local/bro/share/bro/policy/frameworks/software/version-changes.bro
/usr/local/bro/share/bro/policy/protocols/ftp/software.bro
/usr/local/bro/share/bro/policy/protocols/smtp/software.bro
/usr/local/bro/share/bro/policy/protocols/ssh/software.bro
/usr/local/bro/share/bro/policy/protocols/http/software.bro
/usr/local/bro/share/bro/policy/protocols/dns/detect-external-names.bro
/usr/local/bro/share/bro/policy/protocols/ftp/detect.bro
/usr/local/bro/share/bro/policy/protocols/conn/known-hosts.bro
/usr/local/bro/share/bro/policy/protocols/conn/known-services.bro
/usr/local/bro/share/bro/policy/protocols/ssl/known-certs.bro
/usr/local/bro/share/bro/policy/protocols/ssl/cert-hash.bro
/usr/local/bro/share/bro/policy/protocols/ssl/validate-certs.bro
/usr/local/bro/share/bro/policy/protocols/ssh/geo-data.bro
/usr/local/bro/share/bro/policy/protocols/ssh/detect-bruteforcing.bro
/usr/local/bro/share/bro/policy/protocols/ssh/interesting-hostnames.bro
/usr/local/bro/share/bro/policy/protocols/http/detect-MHR.bro
/usr/local/bro/share/bro/policy/protocols/http/detect-sqli.bro
/usr/local/bro/share/bro/broctl/__load__.bro
/usr/local/bro/share/bro/broctl/./main.bro
/usr/local/bro/share/bro/policy/frameworks/control/controllee.bro
/usr/local/bro/share/bro/policy/frameworks/communication/listen.bro
/usr/local/bro/share/bro/broctl/standalone.bro
/logs/bro/spool/policy/auto/standalone-layout.bro
/usr/local/bro/share/bro/policy/misc/trim-trace-file.bro
/usr/local/bro/share/bro/broctl/auto.bro
/logs/bro/spool/policy/auto/local-networks.bro
/logs/bro/spool/policy/auto/broctl-config.bro

After taking a look at one of my crash reports, Seth suggested that
I'm running out of memory.

…snip...

==== stderr.log
listening on eth1, capture length 8192 bytes

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

For a bit of extra context, this was the line that made me think he was running out of memory. I used to see this far more often than I would have liked when I ran an extremely memory-starved cluster (and before many of the memory leaks were fixed for 2.0).
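
One way to check whether the bro process itself keeps growing until the bad_alloc is to sample its resident set size at intervals. The following is a minimal sketch, assuming a Linux guest with /proc mounted; rss_from_status and log_rss are hypothetical helper names, not part of Bro or broctl.

```shell
#!/bin/sh
# Sketch: sample the resident set size (RSS) of a process at intervals.
# rss_from_status and log_rss are hypothetical helper names.

# rss_from_status FILE: print the VmRSS value (kB) from a
# /proc/PID/status-style file; prints nothing if the file is missing.
rss_from_status() {
    awk '$1 == "VmRSS:" { print $2 }' "$1" 2>/dev/null
}

# log_rss PID [INTERVAL_SECONDS] [COUNT]: print "epoch_seconds rss_kb"
# lines until the process exits, or after COUNT samples when COUNT > 0.
log_rss() {
    pid=$1
    interval=${2:-60}
    count=${3:-0}
    taken=0
    while rss=$(rss_from_status "/proc/$pid/status") && [ -n "$rss" ]; do
        echo "$(date +%s) $rss"
        taken=$((taken + 1))
        if [ "$count" -gt 0 ] && [ "$taken" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, assuming a single bro process is running, log_rss "$(pgrep -x bro)" 60 >> /tmp/bro-rss.log would record one sample per minute. A steadily rising RSS before each crash would support the out-of-memory theory.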

  .Seth

I am using broctl to manage a single, local instance of bro. Would it
make a difference if I just ran bro by hand, without broctl?

-Chris

Cached memory IS memory used by the OS. Modern operating systems, Linux variants especially, are very aggressive about using all otherwise-unused "physical" memory as a disk cache. But at the same time, the OS should also aggressively evict those entries when it thinks physical memory is running short. [1]

So naturally, a virtual machine should see the disk cache as being "used" memory, since it is being used by the guest OS. But at the same time, this should not necessarily be a problem, since the guest OS in the virtual machine is managing this memory and will evict it for other, higher priority uses on demand.

But it does bring up a question: within the virtual machine, how much physical memory does the VM think it has? How does that compare to the actual allocation to the VM environment?
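
A quick way to answer that from inside the guest is to compare MemTotal with the amount configured for the VM. This is a sketch only: the helper names are made up, and the allocation figure has to be supplied by hand from the Xen console. MemTotal is normally somewhat below the configured amount because the kernel reserves some memory for itself, so a small difference is expected.

```shell
#!/bin/sh
# Sketch: compare the memory the guest kernel reports with the amount
# allocated to the VM. Both function names are hypothetical.

# mem_total_mb [FILE]: print MemTotal, in whole MB, from a
# /proc/meminfo-style file (defaults to /proc/meminfo).
mem_total_mb() {
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# compare_allocation ALLOCATED_MB [FILE]: report the gap between the
# hypervisor-side allocation (typed in by the user) and what the guest sees.
compare_allocation() {
    allocated=$1
    seen=$(mem_total_mb "${2:-}")
    echo "guest sees ${seen} MB; ${allocated} MB allocated; difference $((allocated - seen)) MB"
}
```

For the 8GB VM described earlier in the thread, compare_allocation 8192 would show how far the guest's view is from the configured size.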

That's a good point. I wonder if Xen is telling the guest OS that it still has memory available but then it eventually rejects a request for more memory?

  .Seth

Indeed. I have found in several other cases that the Linux caching strategy can be really annoying. I regularly have performance problems with high-throughput workloads....

cu
Gregor

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Specifically:

dirty_background_ratio: Primary tunable to adjust, probably downward. If your goal is to reduce the amount of data Linux keeps cached in memory, so that it writes it more consistently to the disk rather than in a batch, lowering dirty_background_ratio is the most effective way to do that. It is more likely the default is too large in situations where the system has large amounts of memory and/or slow physical I/O.
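
To see what the writeback tunables are currently set to, a sketch like the one below could be used. The function name is made up, and the value 5 in the trailing comment is purely illustrative, not a recommendation. Note that these settings govern dirty (not yet written) pages, so they affect write-heavy workloads such as logging rather than clean cached data.

```shell
#!/bin/sh
# Sketch: print the dirty-page writeback tunables.
# show_vm_tunables is a hypothetical helper name.

# show_vm_tunables [DIR]: print the tunables found under DIR
# (defaults to /proc/sys/vm).
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_background_ratio dirty_ratio; do
        if [ -r "$dir/$name" ]; then
            echo "vm.$name = $(cat "$dir/$name")"
        fi
    done
}

show_vm_tunables

# To lower the background threshold until the next reboot (needs root;
# the value 5 is only an illustrative choice):
#   sysctl -w vm.dirty_background_ratio=5
```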

You might also take a look at:

http://www.mjmwired.net/kernel/Documentation/cgroups/memory.txt

HTH

--Gilbert

Thanks for the suggestions. I ended up using tcpdump to do the
network capture. I wrote a cron job to regularly run bro over the
files generated by tcpdump. Definitely not as good as having bro run
continuously, but it gets the job done reliably without crashing.
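
For anyone wanting to try the same workaround, here is a rough sketch
of how such a setup could look. It is not the actual script: the
directory layout, the hourly rotation, the function names and the
.done suffix are all assumptions made for illustration. It also
assumes an absolute capture directory and file names without
whitespace.

```shell
#!/bin/sh
# Sketch of a capture-then-analyze workaround. Paths, function names and
# the hourly rotation are assumptions, not the setup used in this thread.
#
# Capture side (long-running, needs root): start a new file every hour.
#   tcpdump -i eth1 -s 0 -G 3600 -w '/data/pcap/eth1-%Y%m%d-%H%M%S.pcap'

# finished_pcaps DIR: list every *.pcap in DIR except the newest one,
# which tcpdump may still be writing.
finished_pcaps() {
    ls -1t "$1"/*.pcap 2>/dev/null | tail -n +2
}

# process_pcaps DIR OUTDIR: run bro over each finished capture, writing
# its logs to a per-capture directory under OUTDIR. DIR must be an
# absolute path because bro is started from inside the output directory.
process_pcaps() {
    for pcap in $(finished_pcaps "$1"); do
        out="$2/$(basename "$pcap" .pcap)"
        mkdir -p "$out"
        # Mark the capture as done only if bro exited successfully.
        ( cd "$out" && bro -r "$pcap" local ) && mv "$pcap" "$pcap.done"
    done
}
```

A cron entry would then call process_pcaps from a small wrapper
script, for example every 15 minutes.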

This might be an interesting issue for Bro in the future, though.
Running bro continuously from a Xen VM would be incredibly useful.
I'd be curious if anybody can reproduce the crash conditions, or if it
is just something weird about my setup or environment.

-Chris