I’ve got some custom log names happening, and it’s causing a memory leak. Bro never closes the file descriptors or releases the objects, and over time this causes the manager to crash.
I’m running my cluster with broctl, and rotation is turned off because I’m naming files with a timestamp to begin with.
Any suggestions on how to perform a periodic “clean up”?
function datepath(id: Log::ID, path: string, rec: any) : string
	{
	# Build an hourly path, e.g. "conn" + "2018-02-0114", so each hour gets its own file.
	local filter = Log::get_filter(id, "default");
	return string_cat(filter$path, strftime("%F%H", current_time()));
	}
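Roughly, this gets attached to each stream’s default filter at startup. A minimal sketch of the wiring (the loop here is illustrative):

event bro_init()
	{
	for ( id in Log::active_streams )
		{
		local filter = Log::get_filter(id, "default");
		filter$path_func = datepath;
		Log::add_filter(id, filter);
		}
	}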
I now have the diag output for the crash. I think I will be using a custom routine to identify and “close” files on a regular basis.
[BroControl] > diag manager
[manager]
No core file found. You may need to change your system settings to
allow core files.
Bro 2.5.2
Linux 3.10.0-693.17.1.el7.x86_64
Bro plugins: (none found)
==== No reporter.log
==== stderr.log
/usr/local/bro/share/broctl/scripts/run-bro: line 61: ulimit: core file size: cannot modify limit: Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
/usr/local/bro/share/broctl/scripts/run-bro: line 110: 144420 Aborted nohup "$mybro" "$@"
==== stdout.log
max memory size (kbytes, -m) unlimited
data seg size (kbytes, -d) unlimited
virtual memory (kbytes, -v) unlimited
core file size (blocks, -c) 0
I may have solved the problem. I don’t believe this was actually a memory leak; it appears to be a problem with the max user processes limit instead. I upped my ulimits for Bro and it works now.
“ulimit -u” was set to 4096. I upped it to 65536, and that seems to have resolved the problem.
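For reference, making that stick meant raising the nproc limit for the user Bro runs as. A sketch, assuming Bro runs as a user named "bro":

# /etc/security/limits.conf (the "bro" username is an assumption; use whatever user runs Bro)
bro soft nproc 65536
bro hard nproc 65536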
It was a little challenging to narrow down, because I didn’t have debug on, and “Resource temporarily unavailable” wasn’t telling me WHICH resource it was trying to allocate, just that it couldn’t. If I have problems in the future, or upgrade, I’ll definitely be enabling debug so I can get better information for problems like this.
It didn’t solve the problem; it just removed the roadblock. After doing a full “restart” on the cluster, lsof reports 2K+ open files, whereas before the restart it reported 1M+. So I still need to figure out a way to clean up those leftover file descriptors.
Pretty sure it is. I don't think path_func is intended to be used the way you're using it, and I don't think anything garbage collects writers that haven't been used in a while.
It's trivial to verify this: just wait a few hours and run lsof.
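For example, something like this gives a rough descriptor count (the process match is illustrative; adjust for how your processes are named):

# Count open file descriptors for the manager's bro process.
lsof -p $(pgrep -x bro | head -1) | wc -l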
Justin got your problem right. If you turn off file rotation, then Bro is never closing any of these hourly logs. You have to be really careful with how you use $path_func because you can easily get yourself into hot water.
Alternatively, you can define a rotation interval and a postprocessor. Something like this…
function my_log_post_processor(info: Log::RotationInfo): bool
	{
	# Strip the leading "path-YY-MM-DD_HH.MM.SS." portion to recover the extension.
	local ext = sub(info$fname, /^[^\-]+-[0-9]+-[0-9]+-[0-9]+_[0-9]+\.[0-9]+\.[0-9]+\./, "");

	# Move file to a name including both opening and closing time.
	local dst = fmt("%s_%s_%s-%s.%s", info$path,
	                strftime("%Y%m%d", info$open),
	                strftime("%H:%M:%S", info$open),
	                strftime("%H:%M:%S%z", info$close),
	                ext);

	local cmd = fmt("/bin/mv %s %s/%s", info$fname, "/data/logs", dst);
	system(cmd);
	return T;
	}
event bro_init()
	{
	# Apply an hourly rotation interval and the custom postprocessor
	# to every active stream's default filter.
	for ( id in Log::active_streams )
		{
		local filter = Log::get_filter(id, "default");
		filter$interv = 1hr;
		filter$postprocessor = my_log_post_processor;
		Log::add_filter(id, filter);
		}
	}
Something like that will enable you to turn off log rotation in broctl (but you’ll lose some broctl niceties as well).
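On the broctl side, rotation is governed by LogRotationInterval in broctl.cfg, and setting it to 0 disables broctl-driven rotation so the script-level interval above takes over. A sketch, assuming a default install layout:

# /usr/local/bro/etc/broctl.cfg
# Disable broctl's own log rotation; the script-level interval handles it instead.
LogRotationInterval = 0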
The whole problem I’m trying to solve is streaming data into Splunk. Splunk forwarders don’t like it when filenames change, and the artificial delay created by rotating logs adds too much latency. The solution that was proposed was “don’t rotate logs” and leave them in place long enough for the forwarders to finish.
At this point I’ve got to step back and ask, “Am I doing it wrong?” This problem has to have been solved by others. I’m certain there is a way to stream my data to splunk that is better than this.
The file rotation and renaming functions give me enough to play with to solve the problem in Bro script.
Ah! I'm trying to solve a similar problem with my json-streaming-logs package. I'm planning on doing some testing and getting that fixed soon. I think it's still a little broken right now, but I can definitely sympathize with your trouble. Hopefully there'll be some guidance on this from me (or you!?) soon.
We're streaming JSON versions of Bro logs into Splunk without an issue. Some pointers that may help:
1. Set initCrcLength to something like 2048 in your monitor stanza in your inputs.conf for Bro logs (a stanza sketch follows this list). The default is 256 bytes, which can be too small to extend past the headers at the beginning of a Bro log for some log types. If you don't do something like this, Splunk will get confused when logs rotate because it will find a log with a different name having the same CRC. This could be why you're having issues with file renames on log rotation.
2. If you rotate your logs off to some other server for long-term storage, keep a day or three local and have Splunk monitor those directories as well. If you have initCrcLength set, Splunk is smart enough to recognize that conn.log and conn-datestamp.log are the same thing and won't reindex the rotated log. On the other hand, if Splunk was down or had a log queued for batch processing and didn't get it before it was rotated, it'll pick it up from the archive directory.
We accomplish this by rotating to an archive directory on the same partition on the Bro manager. That makes the rotate time almost nothing, since the move is essentially a rename rather than a copy of all those bytes of logs. We then use a cron job with rsync to copy the files to long-term storage, and another cron job removes files that are too old (a cron sketch follows this list).
3. If you're moving a massive amount of Bro logs and regularly falling behind, try a heavy forwarder rather than a universal forwarder, and bump up the number of parallelIngestionPipelines in server.conf on your Bro node (also shown in the sketch below).
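To make points 1 and 3 concrete, here's a minimal sketch of the relevant stanzas; the monitor path, sourcetype, and pipeline count are illustrative, so adapt them to your deployment:

# inputs.conf on the forwarder (path assumes a default Bro install)
[monitor:///usr/local/bro/logs/current/*.log]
initCrcLength = 2048
sourcetype = bro

# server.conf on a heavy forwarder (value is illustrative; tune to your hardware)
[general]
parallelIngestionPipelines = 2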
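And for point 2, the sync and cleanup can be a couple of cron entries along these lines (paths, schedule, and retention are all illustrative):

# Sync the local archive to long-term storage every hour.
15 * * * * rsync -a /data/logs/archive/ storage:/bro-archive/
# Remove local archived logs older than three days.
30 3 * * * find /data/logs/archive -type f -mtime +3 -delete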