Hey All,
Running Bro 2.5 on a single server with 20 cores and some 240 GB of memory.
node.cfg specifies 14 workers, 2 proxies, 1 manager, and 1 logger process.
We are running a custom build of Bro with tcmalloc and PF_RING enabled.
I’m working to get my Bro cluster stable. As it stands, the logger process often crashes, causing us to lose a period of log files. Judging by the output of broctl top, the system is likely killing the Bro logger process because of the amount of memory it is consuming.
==== stderr.log
listening on p5p2
1484325490.230681 received termination signal
broctl top
Name    Type    Host       Pid    Proc    VSize  Rss  Cpu  Cmd
logger  logger  localhost  47880  parent  4G     3G   82%  bro
logger  logger  localhost  47902  child   38G    37G  13%  bro
…
As I’ve been writing this email I have watched the logger process’s memory utilization slowly climb from 16% to 17% (broctl top is now indicating 41G of memory in use by the logger child).
I’ve been investigating whether the bottleneck goes back to our storage solution, which is just a bunch of disks. Based on the utilization shown by iostat and iotop, the Bro logger process is writing around 4 MB/s to our disks, which seems reasonable and does not indicate a bottleneck to me.
Aside: there is a tangential problem in that we are currently seeing a very high drop rate reported by netstats:
broctl netstats
worker-1-1: 1484334410.387744 recvd=1087209145 dropped=3525526435 link=317784575
worker-1-10: 1484334410.711691 recvd=2916765681 dropped=1696150851 link=317965517
…
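To put a rough number on those drops, here is a minimal Python sketch that parses the two netstats lines pasted above and computes a per-worker drop fraction. It assumes the fraction is dropped / (recvd + dropped); I am not certain that is the right denominator for a PF_RING build, and the link counter is smaller than recvd in this output, so I have not used it as the total. The helper names parse_netstats and drop_fraction are just mine for illustration.

```python
import re

# Sample lines copied from the broctl netstats output above.
NETSTATS = """\
worker-1-1: 1484334410.387744 recvd=1087209145 dropped=3525526435 link=317784575
worker-1-10: 1484334410.711691 recvd=2916765681 dropped=1696150851 link=317965517
"""

# One netstats line: "<worker>: <timestamp> recvd=N dropped=N link=N"
LINE_RE = re.compile(
    r"^(?P<worker>\S+):\s+(?P<ts>\d+\.\d+)\s+"
    r"recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)"
)


def parse_netstats(text):
    """Return {worker_name: {"recvd": int, "dropped": int, "link": int}}."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match["worker"]] = {
                key: int(match[key]) for key in ("recvd", "dropped", "link")
            }
    return stats


def drop_fraction(counters):
    """Assumed definition: dropped / (recvd + dropped)."""
    total = counters["recvd"] + counters["dropped"]
    return counters["dropped"] / total if total else 0.0


if __name__ == "__main__":
    for worker, counters in parse_netstats(NETSTATS).items():
        print(f"{worker}: {drop_fraction(counters):.1%} dropped")
```

Under that assumption, worker-1-1 is dropping roughly three quarters of its packets and worker-1-10 a bit over a third.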
Thanks for any insights or suggestions!
-Ryan