Hi.
I’m having issues with a sensor. I’m running Zeek 4.2.1 configured with PF_RING 8.0. The pf_ring.ko module is loaded. Workers are pinned to specific cores. I’ve set the number of RSS queues to match the number of workers.
I’m processing ~5.2 Gbps of traffic. About 30% of my workers are maxing out their CPU core at 100% (and dropping a ton of packets), while the rest are chilling at a cozy ~40-50%. Since it might be relevant, I also have a capture filter that filters out some high-volume flows.
Things I’ve tried that seem to have no effect:
- disabling reassembly offload (as suggested in “Bro-2.5.2 and PF_RING 6.7 not load balancing properly”). Doesn’t seem to affect anything.
- setting PFRINGClusterType = 5-tuple in zeekctl.cfg.
- changing the number of workers (tried 12, 14)
- setting PFRINGClusterType to round-robin. I expected that Zeek might stop working properly, but that its behaviour would at least change. Instead, the results are exactly the same.
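As a rough illustration of why those two cluster types would be expected to behave differently, here is a toy Python model. It is not PF_RING's actual hashing code: the CRC32-based hash, the function names `five_tuple_worker` and `simulate`, and the randomly generated flows are all assumptions made for illustration.

```python
import random
import zlib
from collections import Counter


def five_tuple_worker(flow, n_workers):
    """Toy symmetric 5-tuple hash (NOT PF_RING's real algorithm).

    The two endpoints are sorted so both directions of a flow
    land on the same worker.
    """
    src, sport, dst, dport, proto = flow
    a, b = sorted([(src, sport), (dst, dport)])
    key = f"{a}|{b}|{proto}".encode()
    return zlib.crc32(key) % n_workers


def simulate(n_flows=2000, pkts_per_flow=10, n_workers=14, seed=1):
    """Count packets per worker for flow hashing vs. per-packet round-robin."""
    rng = random.Random(seed)
    # Hypothetical flows: random addresses and source ports, TCP only.
    flows = [
        (rng.getrandbits(32), rng.randrange(1024, 65536),
         rng.getrandbits(32), rng.choice([80, 443, 53]), 6)
        for _ in range(n_flows)
    ]
    hashed, round_robin = Counter(), Counter()
    packet_index = 0
    for flow in flows:
        for _ in range(pkts_per_flow):
            hashed[five_tuple_worker(flow, n_workers)] += 1
            round_robin[packet_index % n_workers] += 1
            packet_index += 1
    return hashed, round_robin


if __name__ == "__main__":
    hashed, round_robin = simulate()
    print("5-tuple   :", sorted(hashed.values()))
    print("round-robin:", sorted(round_robin.values()))
```

In this model round-robin spreads packets almost perfectly evenly (while splitting flows across workers), so a per-worker distribution that does not move at all when switching between the two settings would suggest the setting is not being applied.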
Zeek’s stats show that the affected workers are seeing about twice as many packets (in .pkts_link), but also twice as many connections. This is what leads me to believe it’s a clustering issue, and not some CPU-heavy flows hogging a few workers.
root@S40-0002:/opt/zeek# cat logs/current/stats.log |grep worker |jq -c '[.peer, .pkts_proc, .pkts_dropped, .pkts_link, .bytes_recv, .tcp_conns, .udp_conns]|@tsv' -r |sort -n
worker-1 11926937 0 11926937 11821666034 9705 7568
worker-10 20304874 0 20304874 25494365339 9887 6571
worker-11 15982274 0 15982274 17382237822 12721 7560
worker-12 5569983 22038010 27607993 3998967861 28960 6724
worker-13 9880313 0 9880313 10404686108 12841 10090
worker-14 6431003 13202965 19633968 4980727202 23737 5635
worker-2 12780457 0 12780457 13877151339 12552 8607
worker-3 8981430 0 8981430 8152736373 10535 6884
worker-4 22568900 0 22568900 26497928709 9129 6921
worker-5 5240202 25481354 30721556 3621222996 20892 5015
worker-6 25090072 38963 25129035 29919849386 8952 6595
worker-7 11789623 0 11789623 12828034403 9187 6960
worker-8 5547602 27071557 32619159 3821877698 20696 5237
worker-9 17891859 0 17891859 21401727729 11683 8174
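To put numbers on the imbalance, here is a minimal Python sketch that computes each worker's drop rate from rows like the ones above. It assumes the column order of the jq command (peer, pkts_proc, pkts_dropped, pkts_link, bytes_recv, tcp_conns, udp_conns). The function name `summarize` and the hard-coded `SAMPLE` subset are just for illustration.

```python
# Columns follow the jq command above:
# peer, pkts_proc, pkts_dropped, pkts_link, bytes_recv, tcp_conns, udp_conns
SAMPLE = """\
worker-1 11926937 0 11926937 11821666034 9705 7568
worker-12 5569983 22038010 27607993 3998967861 28960 6724
worker-5 5240202 25481354 30721556 3621222996 20892 5015
worker-6 25090072 38963 25129035 29919849386 8952 6595
"""


def summarize(text):
    """Return {peer: (drop_rate, tcp_conns)} for tab/space-separated rows."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        peer = fields[0]
        _proc, dropped, link, _bytes, tcp, _udp = map(int, fields[1:])
        # Drop rate relative to what the worker saw on the link.
        drop_rate = dropped / link if link else 0.0
        result[peer] = (drop_rate, tcp)
    return result


if __name__ == "__main__":
    rows = summarize(SAMPLE)
    for peer, (rate, tcp) in sorted(rows.items(), key=lambda kv: -kv[1][0]):
        print(f"{peer}\tdrop={rate:.1%}\ttcp_conns={tcp}")
```

On the sample rows this works out to roughly 80% of link packets dropped on worker-12 and about 83% on worker-5, against essentially none on worker-1 and worker-6.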
Here is my node.cfg:
[manager]
type=manager
host=127.0.0.1
[proxy-1]
type=proxy
host=127.0.0.1
[logger-1]
type=logger
host=127.0.0.1
[worker]
type=worker
host=127.0.0.1
interface=ens2f0,ens2f1
lb_method=pf_ring
lb_procs=14
pin_cpus=0,2,4,6,8,10,12,14,15,13,11,9,7,5
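As a quick sanity check on the worker section, this small Python sketch verifies that the number of entries in pin_cpus matches lb_procs and that no core is listed twice. It assumes node.cfg parses as a plain ini file. The helper name `check_pinning` is hypothetical and not part of zeekctl.

```python
import configparser

# Worker section copied from the node.cfg above.
NODE_CFG = """\
[worker]
type=worker
host=127.0.0.1
interface=ens2f0,ens2f1
lb_method=pf_ring
lb_procs=14
pin_cpus=0,2,4,6,8,10,12,14,15,13,11,9,7,5
"""


def check_pinning(cfg_text, section="worker"):
    """Compare lb_procs with the pin_cpus list for one node section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    node = parser[section]
    procs = node.getint("lb_procs")
    cpus = [int(c) for c in node.get("pin_cpus").split(",")]
    return {
        "lb_procs": procs,
        "pinned": len(cpus),
        # Any core that appears more than once in pin_cpus.
        "duplicates": sorted({c for c in cpus if cpus.count(c) > 1}),
        "matches": procs == len(cpus),
    }


if __name__ == "__main__":
    print(check_pinning(NODE_CFG))
```

For the config shown, the 14 pinned cores line up with lb_procs=14 and there are no duplicates, so a simple pinning mismatch does not look like the cause.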