Bro and Splunk forwarder

Hi Joseph,

Just wanted to get some clarity: are you running the Splunk forwarder on the manager of your Bro cluster?
If so, how are you monitoring the log files Bro generates in the current directory (i.e., what do you have in the inputs.conf of your Splunk forwarder)?

I believe Splunk monitoring should work just fine on the Bro log files on the manager.
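If it helps, a typical monitor stanza for the live Bro logs on the manager would look something like the sketch below (the path, index, and sourcetype are just examples; adjust them for your install):

— inputs.conf —
# tail the live Bro logs in the current directory on the manager
# (path and index are examples; adjust for your install)
[monitor:///opt/bro/logs/current]
index = bro
sourcetype = bro
whitelist = \.log$
disabled = false
— end inputs.conf —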

Fatema.

I’ve run into something like this. It may be related to a known issue in the Splunk forwarder (SPL-99316). The forwarder appeared to lose track of files, and usually picked up the data on a delay after the log was rotated. It seemed to be volume-related, with files that grow quickly more likely to trigger it. The workaround in the docs mostly works - I’ve still seen it happen after pushing the workaround, but it’s extremely rare.

From the forwarder known issues in the release notes:

2015-04-07 SPL-99316 Universal Forwarders stop sending data repeatedly throughout the day

Workaround:
In limits.conf, try changing file_tracking_db_threshold_mb in the [inputproc] stanza to a lower value.

Otherwise, if splunkd has a CPU core pegged, you may need to do additional tuning to enable additional parsing pipelines. Also, splunkd has a default output limit of 256KB/s to the indexers and will rate-limit itself; it may fall far enough behind that it appears to have stopped. For our busiest forwarders, I push these tuning values to the forwarder in a simple app:

— limits.conf —
[thruput]
# unlimited output; default is 256 (KB/s)
maxKBps = 0

[inputproc]
# default is 100
max_fd = 256

# workaround for SPL-99316
# default is 500; the note in “known issues” on SPL-99316
# recommends setting it to a lower value
file_tracking_db_threshold_mb = 400
— end limits.conf —

— server.conf —
[general]
# parse and read multiple files at once; significantly increases CPU usage
parallelIngestionPipelines = 4

[queue]
maxSize = 128MB

[queue=parsingQueue]
maxSize = 32MB
— end server.conf —

One note about those configs - we’re load balancing the forwarder between a couple dozen clustered indexers. If you’re using a standalone indexer, I’d be careful about parallelIngestionPipelines being too high. We went overkill on memory, so 256MB just for parsing queues isn’t an issue, and the bro masters have plenty of available CPU. If you’re stretched for resources on the box, you probably don’t want to allow Splunk to push that hard.
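For reference, the load balancing itself is nothing special - just a standard auto-LB output group in outputs.conf, roughly like the sketch below (hostnames and ports are placeholders):

— outputs.conf —
[tcpout]
defaultGroup = bro_indexers

[tcpout:bro_indexers]
# placeholder hostnames; list every clustered indexer here
server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
# how often (in seconds) the forwarder switches indexers; default is 30
autoLBFrequency = 30
— end outputs.conf —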

There’s a lot more tuning that can be done - we switched to JSON output for the Bro logs, and the amount of processing needed on the Splunk forwarder went down quite a bit (along with saving quite a bit of disk space on the indexers), at the cost of more Splunk license usage. JSON fields are extracted at search time, while the default tab-delimited logs have all of their fields extracted as the file is ingested - a smaller _raw, but more disk usage since all of the extracted fields are stored in the indexes. Performance is actually a little better with JSON as well.
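If you want to try the JSON route, on Bro 2.x it’s a one-liner in local.bro - a sketch below; on the Splunk side you’d typically pair it with KV_MODE = json in props.conf for the sourcetype:

— local.bro —
# switch the ASCII writer to JSON output for all logs
@load tuning/json-logs
# equivalent to: redef LogAscii::use_json = T;
— end local.bro —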

Hopefully that’s helpful.

- J

Thanks everyone. I’m passing this along to our Splunk person to see what we can do.

Just to clarify, I’m running the manager, the logger, some of the workers, and a Splunk forwarder on the main node. The remaining nodes just run workers. This is modeled after the 100Gb Bro cluster paper from Berkeley, except I don’t believe they had the Splunk forwarder. However, we provided more hardware to accommodate this configuration, which is the experimental part of this setup.
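For the curious, the layout is roughly the node.cfg sketch below (hostnames, interfaces, and worker counts are placeholders, not our actual values):

— node.cfg —
# main node runs the manager, logger, a proxy, and some workers;
# the Splunk forwarder also lives here, outside of BroControl
[manager]
type=manager
host=bro-main

[logger]
type=logger
host=bro-main

[proxy-1]
type=proxy
host=bro-main

[worker-1]
type=worker
host=bro-main
interface=eth0
lb_method=pf_ring
lb_procs=8

[worker-2]
type=worker
host=bro-node2
interface=eth0
lb_method=pf_ring
lb_procs=8
— end node.cfg —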

Thanks,
Joseph

Hi there,

I have encountered this issue; it seems to be related to inputs.conf.

If the Splunk UF is set to batch:// the spool/manager directory, it will delete the files it sees after processing them.

That means you can get a race condition between the Splunk UF and Bro, since Bro will try to move and gzip those files after a certain interval.

You should use monitor:// instead of batch:// in inputs.conf.
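To illustrate, the two stanzas behave very differently (the path is just an example; adjust for your install):

— inputs.conf —
# don't: batch requires move_policy = sinkhole, so the UF deletes each
# file after indexing it, racing Bro's own rotation/gzip
[batch:///usr/local/bro/spool/manager]
move_policy = sinkhole

# do: monitor tails the files in place and leaves rotation to Bro
[monitor:///usr/local/bro/spool/manager]
index = bro
— end inputs.conf —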

B