I’ve successfully run smaller Bro clusters, but now that I’m scaling out I’m seeing the manager and logger threads crash immediately when I deploy the configuration.
What I’m trying to run:
- 1 manager, 1 logger on 1 host
- 8 proxies and 32 workers on 8 hosts
I’m using Bro 2.5.1. Each worker host has 2 Myricom 10G NICs with 2 ports each, using the 3.0.10 Myricom SNF driver. I’m attempting to run 9 processes (lb_procs) per worker node, each pinned to its own CPU core.
What I’m finding is that any time the number of worker processes exceeds roughly 160 (not a hard threshold; the exact value varies, but it is around that figure based on observation), the manager and logger threads crash. If I keep the number of worker processes at or below ~160 (by reducing processes per node, nodes per host, or hosts in the cluster), it runs successfully. Ideally, the cluster would have 288 worker processes.
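For reference, here is a quick sketch of how the process counts work out. The variable names are only illustrative, and the 160 figure is my approximate observation, not an exact limit.

```python
# Topology figures taken from the description above.
# All variable names are hypothetical, chosen only for this illustration.
worker_hosts = 8       # hosts running workers
worker_nodes = 32      # worker nodes spread across those hosts
lb_procs = 9           # processes per worker node
observed_limit = 160   # approximate point where manager/logger start crashing

nodes_per_host = worker_nodes // worker_hosts   # 4 (one per NIC port)
procs_per_host = nodes_per_host * lb_procs      # 36
total_procs = worker_nodes * lb_procs           # 288

# Largest lb_procs value that keeps the total at or below the observed limit,
# assuming every worker node uses the same lb_procs.
max_lb_procs = observed_limit // worker_nodes   # 5

print("worker nodes per host:", nodes_per_host)
print("worker processes per host:", procs_per_host)
print("total worker processes:", total_procs)
print("lb_procs that stays under the observed limit:", max_lb_procs)
```

So with the node count unchanged, I have to drop to about 5 processes per worker node to stay under the point where the crashes start.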
This does not seem to be related to packet volume, as the manager and logger threads crash even if I am not sending any traffic to the worker nodes.
Any troubleshooting or optimization suggestions are appreciated.