This sounds like an interesting experiment. The main difference that I see
is that when writing raw PCAP, you're writing to mostly-contiguous blocks.
With the text logs, you're writing to 20 files, scattered who-knows-how,
and at least 3 or 4 of those files are heavily used. The SATA layer
*should* try to reorder the writes to minimize seeks, but who knows how
efficient that is.
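If you want to see how the disk is actually coping, watching iostat while Bro is under load should make that fairly obvious; high await/%util with only modest throughput usually means the drive is seek-bound rather than bandwidth-bound. Something like this (the flags and interval are just one way to do it):

    # extended per-device stats every 5 seconds; watch await and %util
    iostat -x 5

    # per-process IO, if iotop is installed (needs root)
    iotop -o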
If this actually were the issue, it'd be nice if Bro could detect that
$sourceRate > $sinkRate and log a message to the reporter saying that
there's a bottleneck in the log writer. That's a pretty daunting task, however.
It ought to be easy enough to tell whether disk logging IO really is
the bottleneck. I would think that simply disabling all logs,
symlinking all logs to /dev/null, or even putting the logging
directory on /dev/shm for a while would each be a way of seeing
whether the raw IO is the problem.
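For example, something roughly like this (the paths are placeholders, so adjust them to wherever your logs actually land, and the broctl calls assume you're running under BroControl):

    broctl stop

    # option 1: point every existing log file at /dev/null
    cd /path/to/bro/logs                  # wherever your logs are written
    for f in *.log; do ln -sf /dev/null "$f"; done

    # option 2: put the whole logging directory on tmpfs for a while
    mount -t tmpfs -o size=2g tmpfs /path/to/bro/logs

    broctl start

If the problem goes away with either of those, it's the log IO; if it doesn't, the bottleneck is elsewhere.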
How does it look when you enable displaying individual threads in top?
Some threads will be pretty much idle most of the time, while others
should be quite busy. If it really seems to hang somewhere, attaching
gdb should show exactly where the threads stall (in particular the
main one). But given the high aggregate CPU load, I'm not sure that's
what's happening.
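Concretely, something along these lines (the process selection via pgrep is just illustrative; pick the actual worker or manager PID you care about):

    # per-thread view of a single bro process ('H' also toggles threads inside top)
    top -H -p $(pgrep -o bro)

    # attach gdb and dump a backtrace for every thread to see where each one sits
    gdb -p $(pgrep -o bro) -batch -ex 'thread apply all bt'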
I discovered that last night; displaying by threads showed exactly what you’ve described.
I'm not an expert with threads, but I do know they are challenging to program, not to mention debug. I'm assuming I/O is not the problem here, as the disk latency is at most 0.2 ms.
When running 5 clusters instead of one, I still have the problem of the manager consuming all memory on each server; it just takes longer to happen. The last configuration I tried was 16 workers on each cluster. I wasn't able to tell via top and GDB what was going on, so I guess it's time to litter the code with debugging blocks and figure it out.
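A crude way to confirm whether the growth is a steady climb rather than a sudden spike is just to sample the manager's memory periodically, e.g. (the PID below is a placeholder):

    # sample the manager's RSS/VSZ once a minute; a leak shows up as a steady climb
    while true; do
        echo "$(date '+%F %T') $(ps -o rss=,vsz= -p <manager-pid>)" >> manager-mem.log
        sleep 60
    done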