Bro Memory Consumption

Bro Gurus,

I am having an issue with Bro and memory exhaustion. Currently I’m using Click on a system with 8 CPU cores to split a network tap into three virtual interfaces (tap0, tap1 and tap2). I’m then running my Bro cluster on the same machine, with three workers operating on different CPU cores and virtual interfaces. The system has 16GB of physical RAM. After running for about 24 hours, all of the physical RAM is exhausted and Bro begins to go after swap. I increased swap to 8GB, but this is a never-ending battle, as Bro will eventually eat everything it can find and crash the system.

How do I go about diagnosing which scripts/policies are causing this, or whether it is an internal memory leak somewhere? I have seen references to reduce-memory.bro and profile.bro in some of the wiki and mailing list searches, but these don’t appear to be in the current 1.5.1 release.

I am running a large number of scripts from Seth Hall’s script repository in addition to the ones that are enabled by default. Below are the policies I’m loading in local.bro:

@load alarm
@load notice
@load weird
@load dpd
@load detect-protocols
@load detect-protocols-http
@load dyn-disable
@load inactivity
@load dns
@load dns-lookup
@load finger
@load frag
@load ftp
@load icmp
@load hot
@load http-request
#@load http-reply
@load ident
@load irc
@load irc-bot
@load login
@load ntp
@load pop3
@load portmapper
@load scan
@load smtp
@load ssh
@load ssl
@load synflood
@load tcp
@load tftp
@load udp
@load worm

Seth Hall Scripts

@load dns-passive-replication
@load http-identified-files
redef HTTP::ignored_urls = /^http:\/\/(www\.download\.windowsupdate\.com)|(download\.windowsupdate\.com)|(au\.download\.windowsupdate\.com)|(download\.microsoft\.com)|(office\.microsoft\.com)\//;
@load known-hosts
@load known-services
@load logging.dns-ext
@load logging.ftp-ext
@load logging.http-ext
@load logging.smtp-ext
@load logging.ssh-ext
@load smtp-ext-count-rejects
@load ssh-ext
@load ssl-ext
redef SSH::authentication_data_size = 4000;

Thanks,

Scott Powell
Unix Systems Engineer / Information Security Analyst
Office of the CIO - Information Systems (OCIO-IS)
Medical University of South Carolina
powellsm@musc.edu
(843) 792-6651

> Bro Gurus,
>
> I am having an issue with Bro and memory exhaustion. Currently I'm using
> Click on a system with 8 CPU cores to split a network tap into three
> virtual interfaces (tap0, tap1 and tap2). I'm then running my Bro cluster on
> the same machine, with three workers operating on different CPU cores and
> virtual interfaces. The system has 16GB of physical RAM. After running for
> about 24 hours, all of the physical RAM is exhausted and Bro begins to go
> after swap. I increased swap to 8GB, but this is a never-ending battle, as Bro
> will eventually eat everything it can find and crash the system.

I'm having a similar problem, but it usually takes about 4 days to get that bad
here. I've been considering just going back to restarting bro every day in the
middle of the night like I used to. I used to do that before I installed
broctl, as it was the easiest way to rotate the logs every day.

> redef HTTP::ignored_urls = /^http:\/\/(www\.download\.windowsupdate\.com)|(download\.windowsupdate\.com)|(au\.download\.windowsupdate\.com)|(download\.microsoft\.com)|(office\.microsoft\.com)\//;

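FYI, I don't think that regex matches what you think it does. The way the |'s are positioned, each alternative stands on its own, so it matches any one of:

    /^http:\/\/(www\.download\.windowsupdate\.com)
  | (download\.windowsupdate\.com)
  | (au\.download\.windowsupdate\.com)
  | (download\.microsoft\.com)
  | (office\.microsoft\.com)\//;

Only the first alternative is anchored to the start of the URL, and only the last one requires the trailing slash.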

Basically you have

   /^http:\/\/(site)|(site)|(site)|(site)\//;

You want something like this:

   /^http:\/\/(site|site|site|site)\//;
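
To make the difference concrete, here is a minimal sketch using Python's re module rather than Bro's own pattern engine. The host names are hypothetical stand-ins for the real list, and the assumption is only that "|" binds more loosely than concatenation in both engines, so the grouping behaves the same way.

```python
import re

# Hypothetical host names standing in for the real list.
# Ungrouped: "|" splits the whole pattern into independent alternatives.
ungrouped = re.compile(r"^http://(www\.example\.com)|(dl\.example\.org)/")
# Grouped: the prefix and the trailing slash apply to every host.
grouped = re.compile(r"^http://(www\.example\.com|dl\.example\.org)/")


def matches(pattern, url):
    """Return True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


# The last ungrouped alternative is just "dl.example.org/" with no anchor,
# so it matches even when the host name only appears in a query string.
print(matches(ungrouped, "http://evil.test/?u=dl.example.org/"))
print(matches(grouped, "http://evil.test/?u=dl.example.org/"))

# The first ungrouped alternative has no trailing slash, so a longer
# host name that merely starts with the expected one also matches.
print(matches(ungrouped, "http://www.example.comXYZ"))
print(matches(grouped, "http://www.example.comXYZ"))

# The grouped form still matches the URLs it is meant to match.
print(matches(grouped, "http://dl.example.org/file.cab"))
```

The ungrouped pattern accepts both of the unintended URLs, while the grouped one rejects them and still accepts the intended download URL.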

> Bro Gurus,
>
> I am having an issue with Bro and memory exhaustion.
>
> ..
>
> I am running a large number of scripts from Seth Hall's script repository in
> addition to the ones that are enabled by default. Below are the policies I'm
> loading in local.bro:

Hi all, following up on this.

4 days ago I merged my bro policy with the latest updates from Seth, and
since then the memory usage on my bro machine has flatlined (it used to look
like a sawtooth wave from being restarted all the time). I'm not sure what the
cause was, but my guess is that it has something to do with the http file
identification. The latest version of the script uses bro signatures instead
of libmagic for file identification. I wonder if the libmagic code has a
memory leak in it somewhere?

If you are still having memory problems it would be really interesting to
see if updating fixes things for you as well.

Doh! It's certainly possible (I wrote the BiF for that). I'll write a test for it today.

   .Seth

I synced my scripts up with the latest and greatest from Seth's repository but am still seeing Bro consume all 16GB of memory after only an hour or two. When time permits I will try to debug further to see if I can narrow it down to a particular script/policy.

I forgot to mention, the name of the policy for the file detection changed.
Are you still loading http-identified-files or are you loading
http-ext-identified-files?

I am loading the new one (http-ext-identified-files). I completely removed the old script as well as its @load statement.

Here are the scripts of Seth's that I'm currently running:

@load dns-passive-replication
@load http-ext-identified-files
@load http-hash
@load known-hosts
@load known-services
@load logging.ftp-ext
@load logging.http-ext
@load logging.smtp-ext
@load logging.ssh-ext
@load smtp-ext-count-rejects
@load software-ext
@load ssh-ext
@load ssl-ext

-Scott

If it's using that much memory that quickly, my guess would be that there is a state table growing out of control. Load the "profiling" script; it will print out the sizes of globals every 20 minutes or so in a file named prof.log, and then you'll be able to see which variable(s) are so huge.

If you are able to find out what variable is causing the memory consumption issue, please reply and let us know. It may be an issue that needs to be resolved or at least addressed in some way.

   .Seth
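
A small helper can make it easier to spot the largest globals in prof.log. This is only a sketch: it assumes the "name = <size>K" line format seen in the prof.log excerpt posted later in this thread, and the function name largest_globals is made up for illustration.

```python
import re

# Matches lines such as:
#   Mar 19 13:42:20 ssl_sessionIDs = 117981K (27276/27276 entries)
# Timer lines ("ConnectionDeleteTimer = 590") have no "K" suffix and are skipped.
GLOBAL_LINE = re.compile(r"\s(?P<name>[\w:]+) = (?P<kb>\d+)K(?:\s|$)")


def largest_globals(lines, top=5):
    """Return the `top` largest (name, size_in_KB) pairs found in the lines."""
    sizes = {}
    for line in lines:
        m = GLOBAL_LINE.search(line)
        if m:
            sizes[m.group("name")] = int(m.group("kb"))
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]


# A few lines copied from the prof.log excerpt later in this thread.
SAMPLE = [
    "Mar 19 13:42:20 ConnectionDeleteTimer = 590",
    "Mar 19 13:42:20 Login::output_trouble = 399K",
    "Mar 19 13:42:20 HTTP::http_sessions = 9018K (1697/1697 entries)",
    "Mar 19 13:42:20 HTTP::known_user_agents = 10475K (8027/29020 entries)",
    "Mar 19 13:42:20 ssl_sessionIDs = 117981K (27276/27276 entries)",
    "Mar 19 13:42:20 Scan::pre_distinct_peers = 31560K (35230/72640 entries)",
    "Mar 19 13:42:20 Software::host_software = 9502K (5079/10272 entries)",
]

for name, kb in largest_globals(SAMPLE, top=3):
    print(f"{name}: {kb}K")
```

To run it against a real file, pass the open file object instead of SAMPLE, for example largest_globals(open("prof.log")).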

Instead of loading each of the "logging." scripts, you could just load enable-ext-logging at the top.

Oh! I think I just noticed your problem (and it's my fault!). Remove dns-passive-replication.bro from your list of scripts and I think your memory problems will go away. The two dns scripts need work still. I may merge the two together at some point, but they don't clean up after themselves very well yet and they *do* cause bad memory consumption problems. Sorry about that! I really need to get all of the documentation written for my scripts. :)

   .Seth

I just moved both of the dns scripts into the testing/ directory to clear up any confusion about their stability. :) When I get time and make them better with memory I'll move them back to the main directory.

   .Seth

Seth,

Thanks. I'm now running without the DNS scripts and have profiling enabled. I will see how it goes. Right now Bro is using about 4.5GB between the manager, proxy and my three workers (all running on the same system w/click splitting up the tap). I was restarting each day at 1am but I have commented out the cron. I'll check it in the morning and see if things are cleaning up after themselves.

Thanks,
Scott

Seth and all,

My memory consumption is better but still growing and not shrinking. I've been examining the globals in the prof.log files for each of the various components (workers, manager, etc.) but am not sure what is causing so much memory to be allocated. Below is an example from one of my workers. There is ~3.6GB of memory allocated, total, but the globals are only 214MB. This is replicated across my three workers... plus the memory being used by the manager and proxy... so grand total I'm now up to ~12GB of allocated memory and it continues to grow.

Mar 19 13:42:20 ------------------------
Mar 19 13:42:20 Memory: total=3821576K total_adj=3765668K malloced: 3814486K
Mar 19 13:42:20 Run-time: user+sys=55905.4 user=53574.3 sys=2331.1 real=99872.4
Mar 19 13:42:20 Conns: total=12370755 current=4998/859 ext=0 mem=3372528K avg=3926.1 table=3430K connvals=2328K
Mar 19 13:42:20 ConnCompressor: pending=36 pending_in_mem=582 full_conns=-4895 pending+real=4175 mem=48K avg=1368.7/84.7
Mar 19 13:42:20 Conns: tcp=0/0 udp=844/1984 icmp=15/50
Mar 19 13:42:20 TCP-States: Inact. Syn. SA Part. Est. Fin. Rst.
Mar 19 13:42:20 TCP-States:Inact. 76 2 4
Mar 19 13:42:20 TCP-States:Syn.
Mar 19 13:42:20 TCP-States:SA
Mar 19 13:42:20 TCP-States:Part. 12 755 1 26
Mar 19 13:42:20 TCP-States:Est. 2412 98 7
Mar 19 13:42:20 TCP-States:Fin. 8 5 62 416 3
Mar 19 13:42:20 TCP-States:Rst. 95 12 90 54 1
Mar 19 13:42:20 Connections expired due to inactivity: 2012770
Mar 19 13:42:20 Total reassembler data: 236K
Mar 19 13:42:20 RuleMatcher: matchers=2 dfa_states=599 ncomputed=9765 mem=1309K avg_nfa_states=19
Mar 19 13:42:20 Timers: current=12852 max=19240 mem=1004K lag=0.00s
Mar 19 13:42:20 ConnectionDeleteTimer = 590
Mar 19 13:42:20 ConnectionInactivityTimer = 6874
Mar 19 13:42:20 DNSExpireTimer = 385
Mar 19 13:42:20 NetworkTimer = 1
Mar 19 13:42:20 NTPExpireTimer = 60
Mar 19 13:42:20 RotateTimer = 35
Mar 19 13:42:20 ScheduleTimer = 840
Mar 19 13:42:20 TableValTimer = 79
Mar 19 13:42:20 TCPConnectionAttemptTimer = 255
Mar 19 13:42:20 TCPConnectionExpireTimer = 3733
Mar 19 13:42:20 Global_sizes > 100k: 0K
Mar 19 13:42:20 SSH::did_ssh_version = 24K (109/109 entries)
Mar 19 13:42:20 Login::login_sessions = 122K (140/140 entries)
Mar 19 13:42:20 SMTP::smtp_sessions = 973K (17/17 entries)
Mar 19 13:42:20 KnownServices::established_conns = 191K (386/386 entries)
Mar 19 13:42:20 ssl_cipher_desc = 30K (106/106 entries)
Mar 19 13:42:20 dpd_analyzer_ports = 128K (35/700 entries)
Mar 19 13:42:20 Scan::rops_idx = 39K (171/171 entries)
Mar 19 13:42:20 notice_tags = 262K (690/690 entries)
Mar 19 13:42:20 KnownHosts::known_hosts = 1861K (14160/14160 entries)
Mar 19 13:42:20 Login::output_trouble = 399K
Mar 19 13:42:20 DNS::distinct_PTR_requests = 481K (648/648 entries)
Mar 19 13:42:20 Scan::distinct_ports = 5880K (5376/20084 entries)
Mar 19 13:42:20 HTTP::http_sessions = 9018K (1697/1697 entries)
Mar 19 13:42:20 ssl_connections = 2436K (905/905 entries)
Mar 19 13:42:20 ftp_cmd_reply_code = 40K (273/273 entries)
Mar 19 13:42:20 Weird::weird_ignore = 99K (94/188 entries)
Mar 19 13:42:20 DNS::distinct_answered_PTR_requests = 45K (145/145 entries)
Mar 19 13:42:20 SMTP::reject_counter = 5115K (9475/9475 entries)
Mar 19 13:42:20 Scan::distinct_backscatter_peers = 269K (126/724 entries)
Mar 19 13:42:20 DetectProtocolHTTP::conns = 438K (470/940 entries)
Mar 19 13:42:20 HTTP::sql_injection_regex = 603K
Mar 19 13:42:20 Scan::accounts_tried = 94K (96/222 entries)
Mar 19 13:42:20 Portmapper::rpc_programs = 35K (129/129 entries)
Mar 19 13:42:20 HTTP::known_user_agents = 10475K (8027/29020 entries)
Mar 19 13:42:20 Scan::possible_scan_sources = 14K (106/106 entries)
Mar 19 13:42:20 IRC::active_channels = 334K (47/47 entries)
Mar 19 13:42:20 ssl_sessionIDs = 117981K (27276/27276 entries)
Mar 19 13:42:20 FTP::hot_files = 112K
Mar 19 13:42:20 Scan::pre_distinct_peers = 31560K (35230/72640 entries)
Mar 19 13:42:20 HTTP::sensitive_URIs = 519K
Mar 19 13:42:20 DetectProtocolHTTP::protocols = 278K (7/7 entries)
Mar 19 13:42:20 Scan::distinct_low_ports = 89K (98/196 entries)
Mar 19 13:42:20 IRC::active_users = 525K (96/96 entries)
Mar 19 13:42:20 Scan::scan_triples = 7386K (106/17547 entries)
Mar 19 13:42:20 Software::host_software = 9502K (5079/10272 entries)
Mar 19 13:42:20 DNS::dns_sessions = 1011K (629/629 entries)
Mar 19 13:42:20 Scan::distinct_peers = 4584K (571/30458 entries)
Mar 19 13:42:20 HTTP::suspicious_http_posts = 733K
Mar 19 13:42:20 KnownServices::known_services = 42K (261/261 entries)
Mar 19 13:42:20 Login::input_trouble = 108K
Mar 19 13:42:20 Weird::weird_action = 39K (170/170 entries)
Mar 19 13:42:20 HTTP::conn_info = 3007K (759/759 entries)
Mar 19 13:42:20 Global_sizes total: 219225K
Mar 19 13:42:20 Total number of table entries: 115411/243104
Mar 19 13:42:35 ------------------------

PID 5562 (manager): 74112K 72.375M
PID 5932 (manager): 201492K 196.77M
PID 5950 (manager): 96240K 93.9844M
PID 5962 (proxy-1): 74112K 72.375M
PID 5974 (proxy-1): 144396K 141.012M
PID 5975 (proxy-1): 97824K 95.5312M
PID 6002 (worker-1): 74112K 72.375M
PID 6038 (worker-1): 3608784K 3524.2M
PID 6042 (worker-1): 94884K 92.6602M
PID 5999 (worker-2): 74112K 72.375M
PID 6036 (worker-2): 3966504K 3873.54M
PID 6040 (worker-2): 94888K 92.6641M
PID 6001 (worker-3): 74112K 72.375M
PID 6037 (worker-3): 3930396K 3838.28M
PID 6041 (worker-3): 93916K 91.7148M
Total: 12.1116G

Any ideas?

Thanks,
Scott

I think your memory usage is not too bad. Here is mine based on the output of top:

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
19954 root  16  0  575m 512m 3264 R 36.1  6.4   2786:58 bro
19959 root  16  0  608m 546m 3208 R 26.3  6.8   2836:23 bro
19957 root  16  0 1876m 213m 3044 R 25.3  2.7 798:22.93 bro
19960 root  15  0 2269m 1.3g 2972 S 13.8 16.2 583:47.36 bro
19955 root  15  0 2242m 1.5g 2940 R 13.5 19.7 598:52.38 bro
19956 root  15  0 4381m 1.4g 3116 R 13.5 17.7   3207:07 bro
19958 root  15  0 2240m  94m 2892 R  5.9  1.2   1483:13 bro
20261 root  21  5 81400 3184  464 R  5.3  0.0  47:04.14 bro
20254 root  20  5 81400 3832  464 R  3.9  0.0  32:29.51 bro
20257 root  21  5 81400 2996  464 R  3.6  0.0  30:25.10 bro
20258 root  20  5 82584 4156  464 R  3.0  0.1  32:05.68 bro
20259 root  20  5 81664 3784  464 R  3.0  0.0  32:17.27 bro
20260 root  20  5 82584 4240  464 R  3.0  0.1  33:45.70 bro
20256 root  20  5 82664 3292  464 R  2.0  0.0  30:29.67 bro
14161 root  15  0 12740 1096  820 R  0.3  0.0   0:00.21 top

Your issue has been nagging me the past few days because I couldn't explain why your memory use is so high. Today I finally realized what it could be. Did you provide the '--enable-brov6' flag when you built Bro? Even more worthwhile, could you provide the full configure line you used when you built Bro? (it's in the config.log file in the directory extracted from the tar.gz)

   .Seth

Seth,

Yes, I did include '--enable-brov6' because we are getting ready to roll out IPv6 in our perimeter, and I was also seeing messages from Bro that it was not compiled with IPv6 support (via "broctl diag").

Here are all the parameters passed to configure:

$ ./configure --prefix=/var/local/bro-1.5.1 --enable-cluster --enable-int64 --enable-brov6 --no-create --no-recursion

-Scott

Rebuild Bro without brov6 and int64 for now. Currently, when you enable IPv6, all IP addresses consume 128 bits of memory (even IPv4 addresses!). You can see that this is what's happening by looking at the line in your prof.log that starts with "Conns:". It indicates that the memory consumed just by connection state is over 3G (mem=3372528K).

There has been talk about changing things around so that IPv4 addresses still only take up 32 bits of memory even when IPv6 is enabled, but I don't know where those discussions ended and I don't know how difficult of a change that would be to make. Maybe Robin or Vern will comment on that? :)

The IPv6 code has not been tested all that well either, so it's also possible that there are some memory leaks or other bugs lurking that could lead to high memory use.

   .Seth
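
As a rough check of that reading, the share of a worker's memory attributed to connection state can be computed from the "Memory:" and "Conns:" lines of the prof.log excerpt posted earlier in this thread. This is only arithmetic on the reported numbers; the helper name kb_field is made up for illustration, and nothing here is taken from Bro's own tooling.

```python
import re


def kb_field(line, key):
    """Extract the integer value of a 'key=<number>K' field, or None if absent."""
    m = re.search(r"\b" + re.escape(key) + r"=(\d+)K", line)
    return int(m.group(1)) if m else None


# Lines copied from the prof.log excerpt in this thread.
memory_line = ("Mar 19 13:42:20 Memory: total=3821576K "
               "total_adj=3765668K malloced: 3814486K")
conns_line = ("Mar 19 13:42:20 Conns: total=12370755 current=4998/859 ext=0 "
              "mem=3372528K avg=3926.1 table=3430K connvals=2328K")

total_kb = kb_field(memory_line, "total")  # total memory reported for the worker
conn_kb = kb_field(conns_line, "mem")      # memory attributed to connection state
share = conn_kb / total_kb

print(f"connection state: {conn_kb}K of {total_kb}K ({share:.0%})")
```

On these numbers, connection state accounts for roughly 88% of the worker's reported total, which is why the script-level globals (about 214MB) explain so little of the footprint.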

Yeah, that's quite tricky. There's a ticket summarizing an earlier
discussion:
        
    http://tracker.icir.org/bro/ticket/68

Robin

I recompiled without IPv6 and int64 today and so far my memory footprint is considerably lower, as expected. I will keep an eye on it over the next few days (I have disabled my nightly restart cron) and see how it behaves.

We have just brought IPv6 to our border router and will soon be testing it in the perimeter. Hopefully by the time we get anywhere close to widespread usage Bro will have better support for it. Wishful thinking, huh? :)

Seth, I wanted to circle back around on this. This was definitely the issue, as my memory usage has now flatlined. I have not restarted Bro in 4 days and my total memory usage is < 3GB for all workers, proxy and manager combined.

Thanks for the help.

-Scott

> This was definitely the issue, as my memory usage has now flatlined. I have not restarted Bro in 4 days and my total memory usage is < 3GB for all workers, proxy and manager combined.

Awesome!

> Thanks for the help.

No problem, I'm glad that helped. I'm taking a look at some of the IPv6 code now to see if there is anything I can do to help reduce memory usage because I'd also like to be able to run Bro with IPv6 enabled.

   .Seth