I have been using the input framework with great success to read and parse structured text logs. Unfortunately I have hit a performance impasse and am looking for a little advice.
The data source is a log file that grows at ~7-9k records/sec and consists of small, newline-delimited text lines of < 512 bytes each.
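For reference, the setup is essentially the stock raw reader in streaming mode, along these lines (simplified - the path, field, and event names below are placeholders rather than my exact script):

    type Val: record {
        s: string;
    };

    event got_line(description: Input::EventDescription, tpe: Input::Event, s: string)
        {
        # hand the raw line off to the parser / back end analyzer
        }

    event bro_init()
        {
        Input::add_event([$source="/path/to/growing.log",
                          $name="growing_log",
                          $reader=Input::READER_RAW,
                          $mode=Input::STREAM,
                          $fields=Val,
                          $ev=got_line,
                          $want_record=F]);
        }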
The primary symptom is a steadily growing memory footprint even though the back-end analyzer seems to be processing the events in near real time - i.e. there is clearly some buffering going on, but the data is being consumed. Script-side variables are not to blame; their footprint is always << 1% of the total.
I tried modifying Raw::block_size to better fit the line size, but that made things worse; increasing it to 16k seemed to be the sweet spot, though the memory growth is still there.
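For concreteness, the 16k setting amounts to something like:

    # assuming block_size is exposed as a redef-able constant;
    # otherwise the equivalent change in the reader itself
    redef Raw::block_size = 16384;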
Any thoughts on what might help here (besides lower data rates)?
thanks!
scott