I’ve got some custom log names happening, and it’s causing a memory leak. Bro never closes the file descriptors or releases the objects, and over time this causes the manager to crash.
I’m running my cluster with broctl, and rotation is turned off because I’m naming files with a timestamp to begin with.
Any suggestions on how to perform a periodic “clean up”?
function datepath(id: Log::ID, path: string, rec: any) : string
	{
	# Build an hourly path, e.g. "conn" + "2018-02-0114", so each hour gets its own file.
	local filter = Log::get_filter(id, "default");
	return string_cat(filter$path, strftime("%F%H", current_time()));
	}
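Roughly, this gets attached to each stream’s default filter at startup. A minimal sketch of the wiring (the loop here is illustrative):

event bro_init()
	{
	for ( id in Log::active_streams )
		{
		local filter = Log::get_filter(id, "default");
		filter$path_func = datepath;
		Log::add_filter(id, filter);
		}
	}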
I now have the diag output for the crash. I think I will be using a custom routine to identify and “close” files on a regular basis.
[BroControl] > diag manager
[manager]
No core file found. You may need to change your system settings to
allow core files.
Bro 2.5.2
Linux 3.10.0-693.17.1.el7.x86_64
Bro plugins: (none found)
==== No reporter.log
==== stderr.log
/usr/local/bro/share/broctl/scripts/run-bro: line 61: ulimit: core file size: cannot modify limit: Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
/usr/local/bro/share/broctl/scripts/run-bro: line 110: 144420 Aborted nohup "$mybro" "$@"
==== stdout.log
max memory size (kbytes, -m) unlimited
data seg size (kbytes, -d) unlimited
virtual memory (kbytes, -v) unlimited
core file size (blocks, -c) 0
I may have solved the problem. I don’t believe this was actually a memory leak; it appears to be a problem with the max user processes limit instead. I upped my ulimits for Bro and it works now.
“ulimit -u” was set to 4096. I upped it to 65536, and that seems to have resolved the problem.
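For reference, making that stick meant raising the nproc limit for the user Bro runs as. A sketch, assuming Bro runs as a user named "bro":

# /etc/security/limits.conf (the "bro" username is an assumption; use whatever user runs Bro)
bro soft nproc 65536
bro hard nproc 65536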
It was a little challenging to narrow down, because I didn’t have debug on, and “Resource temporarily unavailable” wasn’t telling me WHICH resource it was trying to allocate, just that it couldn’t. If I have problems in the future, or upgrade, I’ll definitely be enabling debug so I can get better information for problems like this.
It didn’t solve the problem; it just removed the roadblock. After doing a full “restart” on the cluster, lsof reports 2K+ open files, whereas before the restart it reported 1M+. So I still need to figure out a way to clean up those leftover file descriptors.
Pretty sure it is. I don't think path_func is intended to be used the way you're using it, and I don't think anything garbage collects writers that haven't been used in a while.
It's trivial to verify this: just wait a few hours and run lsof.
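For example, something like this gives a rough descriptor count (the process match is illustrative; adjust for how your processes are named):

# Count open file descriptors for the manager's bro process.
lsof -p $(pgrep -x bro | head -1) | wc -l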
Justin got your problem right. If you turn off file rotation, then Bro is never closing any of these hourly logs. You have to be really careful with how you use $path_func because you can easily get yourself into hot water.
Alternatively, you can define a rotation interval and a postprocessor. Something like this…
function my_log_post_processor(info: Log::RotationInfo): bool
	{
	# Strip the leading "path-YY-MM-DD_HH.MM.SS." portion to recover the extension.
	local ext = sub(info$fname, /^[^\-]+-[0-9]+-[0-9]+-[0-9]+_[0-9]+\.[0-9]+\.[0-9]+\./, "");

	# Move file to a name including both opening and closing time.
	local dst = fmt("%s_%s_%s-%s.%s", info$path,
	                strftime("%Y%m%d", info$open),
	                strftime("%H:%M:%S", info$open),
	                strftime("%H:%M:%S%z", info$close),
	                ext);

	local cmd = fmt("/bin/mv %s %s/%s", info$fname, "/data/logs", dst);
	system(cmd);
	return T;
	}
event bro_init()
	{
	# Apply an hourly rotation interval and the custom postprocessor
	# to every active stream's default filter.
	for ( id in Log::active_streams )
		{
		local filter = Log::get_filter(id, "default");
		filter$interv = 1hr;
		filter$postprocessor = my_log_post_processor;
		Log::add_filter(id, filter);
		}
	}
Something like that will enable you to turn off log rotation in broctl (but you’ll lose some broctl niceties as well).
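On the broctl side, rotation is governed by LogRotationInterval in broctl.cfg, and setting it to 0 disables broctl-driven rotation so the script-level interval above takes over. A sketch, assuming a default install layout:

# /usr/local/bro/etc/broctl.cfg
# Disable broctl's own log rotation; the script-level interval handles it instead.
LogRotationInterval = 0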
The whole problem I’m trying to solve is streaming data into Splunk. Splunk forwarders don’t like it when filenames change, and the artificial delay created by rotating logs adds too much latency. The solution that was proposed was “don’t rotate logs” and leave them in place long enough for the forwarders to finish.
At this point I’ve got to step back and ask, “Am I doing it wrong?” This problem has to have been solved by others. I’m certain there is a way to stream my data to splunk that is better than this.
The file rotation and renaming functions give me enough to play with to solve the problem in Bro script.
Ah! I'm trying to solve a similar problem with my json-streaming-logs package. I'm planning on doing some testing and getting that fixed soon. I think it's still a little broken right now, but I can definitely sympathize with your trouble. Hopefully there'll be some guidance on this from me (or you!?) soon.
We're streaming JSON versions of Bro logs into Splunk without an issue. Some pointers that may help:
1. Set initCrcLength to something like 2048 in your monitor stanza in your inputs.conf for Bro logs (a stanza sketch follows this list). The default is 256 bytes, which can be too small to extend past the headers at the beginning of a Bro log for some log types. If you don't do something like this, Splunk will get confused when logs rotate because it will find a log with a different name having the same CRC. This could be why you're having issues with file renames on log rotation.
2. If you rotate your logs off to some other server for long-term storage, keep a day or three local and have Splunk monitor those directories as well. If you have initCrcLength set, Splunk is smart enough to recognize that conn.log and conn-datestamp.log are the same thing and won't reindex the rotated log. On the other hand, if Splunk was down or had a log queued for batch processing and didn't get it before it was rotated, it'll pick it up from the archive directory.
We accomplish this by rotating to an archive directory on the same partition on the Bro manager. That makes the rotate time almost nothing, since the move is essentially a rename rather than a copy of all those bytes of logs. We then use a cron job with rsync to copy the files to long-term storage, and another cron job removes files that are too old (a cron sketch follows this list).
3. If you're moving a massive amount of Bro logs and regularly falling behind, try a heavy forwarder rather than a universal forwarder, and bump up the number of parallelIngestionPipelines in server.conf on your Bro node (also shown in the sketch below).
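To make points 1 and 3 concrete, here's a minimal sketch of the relevant stanzas; the monitor path, sourcetype, and pipeline count are illustrative, so adapt them to your deployment:

# inputs.conf on the forwarder (path assumes a default Bro install)
[monitor:///usr/local/bro/logs/current/*.log]
initCrcLength = 2048
sourcetype = bro

# server.conf on a heavy forwarder (value is illustrative; tune to your hardware)
[general]
parallelIngestionPipelines = 2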
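And for point 2, the sync and cleanup can be a couple of cron entries along these lines (paths, schedule, and retention are all illustrative):

# Sync the local archive to long-term storage every hour.
15 * * * * rsync -a /data/logs/archive/ storage:/bro-archive/
# Remove local archived logs older than three days.
30 3 * * * find /data/logs/archive -type f -mtime +3 -delete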