High capture loss for some workers

Mark_Gardner · April 23, 2019, 8:41pm

We are setting up a Zeek cluster consisting of a manager/logger and five sensors. Each node uses the same hardware:

2.4 GHz AMD Epyc 7351P (16 cores, 32 threads)
256 GB DDR3 ECC RAM
Intel X520-T2 10 Gbps to Arista with 0.5m DAC

Configuration:
Arista 7150S hashing on 5-tuple
Gigamon sends to Arista via 4x10 Gbps
Zeek v2.6-167 with AF_Packet
16 workers per sensor (total: 5x16=80 workers)

The capture loss was 50-70% until I remembered to turn off offloading. Now it averages about 0.8%, but in any given 1-hour summary, anywhere from 0 to 4 cores spike to 60-70% capture loss. There doesn't appear to be a pattern in which cores suffer the high loss. I have searched for ways to identify and fix the cause of such large losses, but found nothing that helps debug the problem. Suggestions?

Mark

JustinAzoff · April 23, 2019, 10:55pm

Once you have a high capture loss value, you need to stop focusing
on it and look at the missed_bytes column in the conn.log instead.
The capture loss value is like a check engine light. It only tells
you that something is wrong; the conn.log tells you what is wrong.

Look for entries in the conn.log where missed_bytes is nonzero, or
start by looking only for records where it is > 100000. You may
find that you simply have a few connections that are completely broken
and are skewing the capture loss towards that 60% value.
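
As a rough illustration of that search, here is a minimal Python sketch that lists the connections with the largest missed_bytes values. It assumes the conn.log is in Zeek's default tab-separated format with a `#fields` header line. The function names and the choice of columns to print are hypothetical, not part of Zeek or bro-doctor.

```python
def parse_zeek_tsv(lines):
    """Yield one dict per record, using the '#fields' header for column names.

    Assumes the default tab-separated Zeek log format (not JSON).
    """
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header looks like: #fields<TAB>ts<TAB>uid<TAB>...
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields is not None:
            yield dict(zip(fields, line.split("\t")))


def lossy_connections(lines, threshold=100000):
    """Return (missed_bytes, uid, orig_h, resp_h) tuples, largest loss first.

    Only records with missed_bytes strictly above `threshold` are kept.
    """
    rows = []
    for rec in parse_zeek_tsv(lines):
        missed = rec.get("missed_bytes", "-")
        if missed.isdigit() and int(missed) > threshold:
            rows.append((int(missed),
                         rec.get("uid", "-"),
                         rec.get("id.orig_h", "-"),
                         rec.get("id.resp_h", "-")))
    return sorted(rows, reverse=True)
```

To try it, open a conn.log and pass the file object directly, for example `lossy_connections(open("conn.log"))`. If a handful of host pairs dominate the output, those few connections are probably what is skewing the capture loss figure.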

A much better metric that I like to use is 'percent of connections
with loss'. It's a completely different problem if you have 40%
overall capture loss but only 0.01% of connections with loss, compared
to 40% overall capture loss with loss on 20% of connections.
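
One way to compute such a metric is sketched below in Python. It assumes that 'connections with loss' means conn.log records whose missed_bytes is greater than zero, and that the log is in Zeek's default tab-separated format. Both are assumptions made for illustration; bro-doctor may calculate its figure differently, and the function name is hypothetical.

```python
def percent_connections_with_loss(lines):
    """Return the percentage of conn.log records with missed_bytes > 0.

    Records whose missed_bytes field is absent or non-numeric are skipped.
    Returns 0.0 when no usable records are found.
    """
    fields = None
    total = 0
    lossy = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Column names follow the '#fields' token, tab-separated.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or fields is None:
            continue
        rec = dict(zip(fields, line.split("\t")))
        missed = rec.get("missed_bytes", "")
        if not missed.isdigit():
            continue
        total += 1
        if int(missed) > 0:
            lossy += 1
    return 100.0 * lossy / total if total else 0.0
```

Comparing this number with the overall capture loss percentage helps separate the two cases described above: a tiny share of connections with very large gaps, versus loss spread across a large share of connections.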

If you install bro-doctor from bro-pkg that will do a lot of analysis
like this for you.

I'd also run one fewer worker on each of those boxes. With 16 workers
and 16 cores, you're not leaving any spare cores to dedicate to cron
jobs and other background tasks.
