input framework and tuning options

I have been using the input framework with great success as a tool to read and parse structured text logs. Unfortunately, I have reached a performance impasse and am looking for a little advice.

The data source is a log file that grows at ~7-9k records/sec and consists of small text lines of < 512 bytes, newline delimited.

The primary symptom is a steadily growing memory footprint, even though the back-end analyzer seems to be processing the events in near real time - i.e. there is clearly some buffering going on, but the data is being consumed. Script-side variables are not to blame, as their footprint is always << 1% of the total.

I tried tuning Raw::block_size to better fit the line size, but that made things worse. Increasing it to 16k seemed to be the sweet spot, but the problem is still there.
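
For reference, the stream setup looks roughly like the sketch below (file path, stream name, record type, and event name are placeholders, and I'm assuming block_size is exposed as a redef-able script-level constant, which is how I've been changing it):

    type LineVal: record {
        s: string;
    };

    # Assumption: the raw reader's block size is a redef-able constant.
    redef Raw::block_size = 16384;

    # READER_RAW delivers one event per newline-delimited line.
    event got_line(description: Input::EventDescription, tpe: Input::Event, s: string)
        {
        # Hand the line off to the back-end parsing/analysis logic here.
        }

    event bro_init()
        {
        Input::add_event([$source="/path/to/growing.log",
                          $reader=Input::READER_RAW,
                          $mode=Input::STREAM,
                          $name="growing_log",
                          $fields=LineVal,
                          $ev=got_line]);
        }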

Any thoughts on what might help here (besides lowering the data rate)?


The main categories of problems that come to mind to check for:

(a) Rate of production exceeds rate of consumption
(b) Unbounded script state accumulation
(c) Unbounded core state accumulation
(d) Memory leak

It sounds like you've ruled out (a) and (b). For the others, using a
heap profiler/checker is going to help. There's a brief guide at [1]
on finding memory leaks in Bro that you can try. Otherwise, if you can
provide a simple test case that reproduces the behavior, filing a
bug/ticket with that info would be the best way to get someone to help
look into it with you.
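
In the meantime, if it helps to double-check (a) and (b), a minimal sketch along these lines (hypothetical event and table names; it assumes you bump lines_consumed in your per-line input event handler) can periodically report the consumption rate and the size of any long-lived script tables:

    global lines_consumed: count = 0;

    # Example long-lived table; an expiration attribute keeps script state bounded.
    global seen: table[string] of count &create_expire=1hr;

    event report_input_stats()
        {
        print fmt("consumed %d lines in the last 10 secs; |seen| = %d",
                  lines_consumed, |seen|);
        lines_consumed = 0;
        schedule 10secs { report_input_stats() };
        }

    event bro_init()
        {
        # Kick off periodic reporting; increment lines_consumed in the
        # input stream's per-line event handler.
        schedule 10secs { report_input_stats() };
        }

If the reported rate keeps up with the producer and the table sizes stay flat while the process footprint keeps climbing, that points at (c) or (d).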

- Jon