This sounds like an interesting experiment. The main difference that I see
is that when writing raw PCAP, you're writing to mostly-contiguous blocks.
With the text logs, you're writing to 20 files, scattered who-knows-how,
and at least 3 or 4 of those files are heavily used. The SATA layer
*should* try to reorder the writes to minimize seeks, but who knows how
efficient that is.
If this actually was the issue, it'd be nice if Bro would detect that
$sourceRate > $sinkRate, and log something to reporter, that there's a
bottleneck in the log writer. That's a pretty daunting task, however.
It ought to be easy enough to tell if it is indeed disk logging IO
that's the bottleneck. I would think simply disabling all logs or
sending all logs to /dev/null via symlink or even putting the logging
directory on /dev/shm for a bit would all be ways of seeing if the raw
IO is the problem.
I was going to try capturing from an interface with little to no traffic.
The process is constantly in a uwait status which I believe indicates a problem with threading.
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
95626 bro 24 20 0 7473M 7354M uwait 36 8:46 638.82% bro
95628 bro 1 103 5 994M 772M CPU12 0 1:25 85.35% bro
61097 mysql 24 20 0 929M 12556K uwait 39 184:16 0.00% mysqld
Using ktrace I see some errors regarding umtx_op (userland mutex):
95626 157938 bro 0.003929 RET _umtx_op -1 errno 60 Operation timed out
How does it look when you enable displaying individual threads in top?
Some threads will be pretty much idle most of them time, while others
should be quite busy. If it really seems to hang somewhere, attaching
a gdb should show where exactly the threads stall (in particular the
main one). But seeing the high aggregate CPU load, I'm not sure that's
I discovered that last night; displaying by threads showed exactly what you’ve described.
I’m not an expert with threads but I do know they are challenging to program, not to mention debug. I’m assuming I/O is not the problem here as the disk latency is 0.2ms at max.
When running 5 clusters instead of one I still have the problem with the manager consuming all memory on each server, it just takes longer for it to happen. The last configuration I tried was 16 workers on each cluster. I wasn’t able to tell via top and GDB what was going on, so I guess it’s time to litter the code with debugging blocks and figure it out.
We may be converging on the source of this problem. We'll be sure to update again.