I’ve run into something like this. It may be related to a known issue in the Splunk forwarder (SPL-99316). The forwarder appeared to lose track of files, and usually picked up the data on a delay after the log was rotated. It seemed to be volume-related: files that grow quickly were more likely to trigger it. The workaround in the docs mostly works; I’ve still seen the issue happen after pushing the workaround, but it’s extremely rare.
From the forwarder known issues in the release notes:
2015-04-07 SPL-99316 Universal Forwarders stop sending data repeatedly throughout the day
Workaround:
In limits.conf, try changing file_tracking_db_threshold_mb in the [inputproc] stanza to a lower value.
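If you want to confirm which value the forwarder actually ends up running with after the change, btool will print the effective setting (path assumes a default install):

$SPLUNK_HOME/bin/splunk btool limits list inputproc --debug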
Otherwise, if splunkd has a CPU core pegged, you may need to do additional tuning to enable additional parsing pipelines. Also, splunkd has a default output limit of 256 KB/s to the indexers and will rate-limit itself; it can fall far enough behind that it looks like it has stopped (there’s a quick check for that after the configs below). For our busiest forwarders, I push these tuning values to the forwarder in a simple app:
— limits.conf —
[thruput]
# unlimited output; the default is 256 KB/s
maxKBps = 0

[inputproc]
# default is 100
max_fd = 256

# workaround for SPL-99316
# default is 500; the note in "known issues" on SPL-99316
# recommends setting it to a lower value
file_tracking_db_threshold_mb = 400
— end limits.conf —
— server.conf —
[general]
# parse and read multiple files at once; significantly increases CPU usage
parallelIngestionPipelines = 4

[queue]
maxSize = 128MB

[queue=parsingQueue]
maxSize = 32MB
— end server.conf —
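One quick way to confirm the rate-limiting I mentioned: when a forwarder hits the maxKBps ceiling it logs a warning from ThruputProcessor in splunkd.log, so you can either grep it on the box or search _internal on the indexers (exact wording may vary a bit by version):

grep "has reached maxKBps" $SPLUNK_HOME/var/log/splunk/splunkd.log

index=_internal source=*splunkd.log* host=<your_forwarder> "has reached maxKBps"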
One note about those configs - we’re load balancing the forwarder across a couple dozen clustered indexers. If you’re using a standalone indexer, I’d be careful about setting parallelIngestionPipelines that high. We went overkill on memory, so 256MB just for parsing queues isn’t an issue, and the Bro masters have plenty of available CPU. If you’re stretched for resources on the box, you probably don’t want to let Splunk push that hard.
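For reference, the "simple app" is nothing fancy - just those two files dropped into an app directory on the forwarder, followed by a splunkd restart. The app name below is made up; use whatever fits your naming scheme and push it with a deployment server or whatever config management you already use:

$SPLUNK_HOME/etc/apps/uf_tuning/local/limits.conf
$SPLUNK_HOME/etc/apps/uf_tuning/local/server.conf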
There’s a lot more tuning that can be done - we switched to JSON output for the Bro logs, and the amount of processing needed on the Splunk forwarder went down quite a bit (along with saving quite a bit of disk space on the indexers), at the cost of more Splunk license usage. With JSON the fields are extracted at search time, while the default tab-delimited logs have all of their fields extracted as the file is ingested: _raw is smaller, but disk usage is higher because every extracted field is stored in the indexes. Performance is actually a little better with JSON as well.
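If you want to try the JSON route, on the Bro side it’s typically just loading the tuning/json-logs policy script (or redef LogAscii::use_json = T) in local.bro - check that against your Bro version. On the Splunk side you want search-time JSON extraction for the sourcetype; the sourcetype name here is just an example:

— props.conf —
[bro_json]
# extract the JSON fields at search time
KV_MODE = json
— end props.conf —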
Hopefully that’s helpful.