Capture packet loss discrepancy

Hello,
I’m using Zeek 3.0.1 and I’m seeing very high zeek capture loss even if the system load is very low (I’m analyzing 50-100mbps of traffic with a Xeon 8C-16T and 32GiB of ram, the system load is barely rearching 1).

This is what https://docs.zeek.org/en/stable/scripts/policy/misc/capture-loss.zeek.html is reporting:

{"_path":"capture_loss","_write_ts":"2020-03-06T10:00:28.080623Z","ts":"2020-03-06T10:00:28.080623Z","ts_delta":900.0000138282776,"peer":"worker-1-1","gaps":980763,"acks":3451717,"percent_lost":28.4137720444
6367}
{"_path":"capture_loss","_write_ts":"2020-03-06T09:45:28.080609Z","ts":"2020-03-06T09:45:28.080609Z","ts_delta":900.0000011920929,"peer":"worker-1-1","gaps":775832,"acks":3944662,"percent_lost":19.6678955002
98886}

But in Zeek stats logs I cannot see any drops:

{"_path":"stats","_write_ts":"2020-03-06T10:00:25.184307Z","ts":"2020-03-06T10:00:25.184307Z","peer":"manager","mem":99,"pkts_proc":0,"bytes_recv":0,"events_proc":2171,"events_queued":2175,"active_tcp_conns"
:0,"active_udp_conns":0,"active_icmp_conns":0,"tcp_conns":0,"udp_conns":0,"icmp_conns":0,"timers":1097,"active_timers":77,"files":0,"active_files":0,"dns_requests":0,"active_dns_requests":0,"reassem_tcp_size
":0,"reassem_file_size":0,"reassem_frag_size":0,"reassem_unknown_size":0}
{"_path":"stats","_write_ts":"2020-03-06T10:00:26.676726Z","ts":"2020-03-06T10:00:26.676726Z","peer":"proxy-1","mem":95,"pkts_proc":0,"bytes_recv":0,"events_proc":1579,"events_queued":1579,"active_tcp_conns"
:0,"active_udp_conns":0,"active_icmp_conns":0,"tcp_conns":0,"udp_conns":0,"icmp_conns":0,"timers":939,"active_timers":39,"files":0,"active_files":0,"dns_requests":0,"active_dns_requests":0,"reassem_tcp_size"
:0,"reassem_file_size":0,"reassem_frag_size":0,"reassem_unknown_size":0}
{"_path":"stats","_write_ts":"2020-03-06T10:00:28.087807Z","ts":"2020-03-06T10:00:28.087807Z","peer":"worker-1-1","mem":494,"pkts_proc":4342283,"bytes_recv":3395272841,"pkts_dropped":0,"pkts_link":4347989,"p
kt_lag":0.010818004608154297,"events_proc":894955,"events_queued":894955,"active_tcp_conns":2944,"active_udp_conns":473,"active_icmp_conns":70,"tcp_conns":9036,"udp_conns":7376,"icmp_conns":383,"timers":2168
06,"active_timers":11468,"files":10693,"active_files":10,"dns_requests":2,"active_dns_requests":0,"reassem_tcp_size":500680,"reassem_file_size":451064,"reassem_frag_size":0,"reassem_unknown_size":0}
{"_path":"stats","_write_ts":"2020-03-06T10:05:25.185129Z","ts":"2020-03-06T10:05:25.185129Z","peer":"manager","mem":99,"pkts_proc":0,"bytes_recv":0,"events_proc":3573,"events_queued":3569,"active_tcp_conns"
:0,"active_udp_conns":0,"active_icmp_conns":0,"tcp_conns":0,"udp_conns":0,"icmp_conns":0,"timers":1063,"active_timers":77,"files":0,"active_files":0,"dns_requests":0,"active_dns_requests":0,"reassem_tcp_size
":0,"reassem_file_size":0,"reassem_frag_size":0,"reassem_unknown_size":0}
{"_path":"stats","_write_ts":"2020-03-06T10:05:26.677296Z","ts":"2020-03-06T10:05:26.677296Z","peer":"proxy-1","mem":95,"pkts_proc":0,"bytes_recv":0,"events_proc":2101,"events_queued":2101,"active_tcp_conns"
:0,"active_udp_conns":0,"active_icmp_conns":0,"tcp_conns":0,"udp_conns":0,"icmp_conns":0,"timers":933,"active_timers":39,"files":0,"active_files":0,"dns_requests":0,"active_dns_requests":0,"reassem_tcp_size"
:0,"reassem_file_size":0,"reassem_frag_size":0,"reassem_unknown_size":0}
{"_path":"stats","_write_ts":"2020-03-06T10:05:28.087831Z","ts":"2020-03-06T10:05:28.087831Z","peer":"worker-1-1","mem":494,"pkts_proc":4024429,"bytes_recv":3075761558,"pkts_dropped":0,"pkts_link":4029762,"p
kt_lag":0.012360095977783203,"events_proc":860917,"events_queued":860919,"active_tcp_conns":3043,"active_udp_conns":505,"active_icmp_conns":149,"tcp_conns":10006,"udp_conns":7298,"icmp_conns":483,"timers":23
2394,"active_timers":12492,"files":11763,"active_files":13,"dns_requests":0,"active_dns_requests":0,"reassem_tcp_size":60920,"reassem_file_size":955592,"reassem_frag_size":0,"reassem_unknown_size":0}

My network interface reports 0 drops:

NIC statistics:
     rx_packets: 57185146109
     tx_packets: 118236
     rx_bytes: 51106679706383
     tx_bytes: 12060072
     rx_broadcast: 28116614
     tx_broadcast: 0
     rx_multicast: 430062675
     tx_multicast: 0
     multicast: 430062675
     collisions: 0
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 51106679706383
     tx_dma_out_of_sync: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_hwtstamp_timeouts: 0
     rx_hwtstamp_cleared: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_queue_0_packets: 0
tx_queue_0_restart: 0
     tx_queue_1_packets: 118236
     tx_queue_1_bytes: 11587128
     tx_queue_1_restart: 0
     tx_queue_2_packets: 0
     tx_queue_2_bytes: 0
     tx_queue_2_restart: 0
     tx_queue_3_packets: 0
     tx_queue_3_bytes: 0
     tx_queue_3_restart: 0
     tx_queue_4_packets: 0
     tx_queue_4_bytes: 0
     tx_queue_4_restart: 0
     tx_queue_5_packets: 0
     tx_queue_5_bytes: 0
     tx_queue_5_restart: 0
     tx_queue_6_packets: 0
     tx_queue_6_bytes: 0
     tx_queue_6_restart: 0
     tx_queue_7_packets: 0
     tx_queue_7_bytes: 0
     tx_queue_7_restart: 0
     rx_queue_0_packets: 7309311690
     rx_queue_0_bytes: 6672057827542
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 7067404359
     rx_queue_1_bytes: 5978548722708
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 0
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 6936589456
     rx_queue_2_bytes: 5816850955623
     rx_queue_2_drops: 0
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 7560820836
     rx_queue_3_bytes: 7177372551363
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0
     rx_queue_4_packets: 6665690657
     rx_queue_4_bytes: 5815406197188
     rx_queue_4_drops: 0
     rx_queue_4_csum_err: 0
     rx_queue_4_alloc_failed: 0
     rx_queue_5_packets: 7245905157
     rx_queue_5_bytes: 6640952714842
     rx_queue_5_drops: 0
     rx_queue_5_csum_err: 0
tx_queue_5_restart: 0
     tx_queue_6_packets: 0
     tx_queue_6_bytes: 0
     tx_queue_6_restart: 0
     tx_queue_7_packets: 0
     tx_queue_7_bytes: 0
     tx_queue_7_restart: 0
     rx_queue_0_packets: 7309311690
     rx_queue_0_bytes: 6672057827542
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 7067404359
     rx_queue_1_bytes: 5978548722708
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 0
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 6936589456
     rx_queue_2_bytes: 5816850955623
     rx_queue_2_drops: 0
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 7560820836
     rx_queue_3_bytes: 7177372551363
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0
     rx_queue_4_packets: 6665690657
     rx_queue_4_bytes: 5815406197188
     rx_queue_4_drops: 0
     rx_queue_4_csum_err: 0
     rx_queue_4_alloc_failed: 0
     rx_queue_5_packets: 7245905157
     rx_queue_5_bytes: 6640952714842
     rx_queue_5_drops: 0
     rx_queue_5_csum_err: 0
     rx_queue_5_alloc_failed: 0
     rx_queue_6_packets: 7693400503
     rx_queue_6_bytes: 6803443308105
     rx_queue_6_drops: 0
     rx_queue_6_csum_err: 0
     rx_queue_6_alloc_failed: 0
     rx_queue_7_packets: 6706023451
     rx_queue_7_bytes: 5768995611822
     rx_queue_7_drops: 0
     rx_queue_7_csum_err: 0
     rx_queue_7_alloc_failed: 0

Is there something am I missing? Is it a way to further analyze the problem? By looking in zeek logs everything looks fine.

You either drop data before it reaches Zeek sensors or you have an offloading missconfiguration on the card or you have a load-balancing problem. That might be a good news.

The capture.log file is based on heuristics “I have seen some ACKs without matching data packets” - but won’t tell you where the problem is. Justin calls it a “check engine light”.

You don’t seem to have drops in stats.log and in ethtool stats. That’s great. Stats.log meaning is different for each capture method and it’s never enough without some ethtool stats.

Can you tell us more about your setup?

  • how are you feeding your sensor (taps, span ports, etc)
  • what’s your capture mechanism?
  • can you share all commands for interface’s configuration?
  • can you share the node.cfg
  • output of

ethtool -c
ethtool -l
ethtool -x
ethtool -a
ethtool -g

Hello,
thank you for your help.
I’m configured a span port on the switch.

In Zeek I’ve configured the AF_Packet plugin.

This is my node.cfg:

[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=bond0
lb_method=custom
lb_procs=1
af_packet_fanout_id=21
af_packet_fanout_mode=AF_Packet::FANOUT_HASH
af_packet_buffer_size=128*1024*1024

This is the ethtool output:

ethtool -c eno2
Coalesce parameters for eno2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

ethtool -l eno2
Channel parameters for eno2:
Pre-set maximums:
RX:             0
TX:             0
Other:          1
Combined:       8
Current hardware settings:
RX:             0
TX:             0
Other:          1
Combined:       8

ethtool -x eno2
RX flow hash indirection table for eno2 with 8 RX ring(s):
    0:      0     0     0     0     0     0     0     0
    8:      0     0     0     0     0     0     0     0
   16:      1     1     1     1     1     1     1     1
   24:      1     1     1     1     1     1     1     1
   32:      2     2     2     2     2     2     2     2
   40:      2     2     2     2     2     2     2     2
   48:      3     3     3     3     3     3     3     3
   56:      3     3     3     3     3     3     3     3
   64:      4     4     4     4     4     4     4     4
   72:      4     4     4     4     4     4     4     4
   80:      5     5     5     5     5     5     5     5
   88:      5     5     5     5     5     5     5     5
   96:      6     6     6     6     6     6     6     6
  104:      6     6     6     6     6     6     6     6
  112:      7     7     7     7     7     7     7     7
  120:      7     7     7     7     7     7     7     7
RSS hash key:
Operation not supported

ethtool -a eno2
Pause parameters for eno2:
Autonegotiate:  on
RX:             on
TX:             off

ethtool -g eno2
Ring parameters for eno2:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             256

I’m also adding my /etc/network/interfaces:

auto eno2
iface eno2 inet manual
  up ip link set $IFACE promisc on arp off up
  down ip link set $IFACE promisc off down
  post-up ethtool -G $IFACE rx 4096; for i in rx tx sg tso ufo gso gro lro; do ethtool -K $IFACE $i off; done
  post-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
  bond-master bond0

auto bond0
iface bond0 inet manual
  bond-mode balance-rr
  bond-slaves eno2
  mtu 1500
  up ip link set bond0 promisc on arp off up
  down ip link set bond0 promisc off down
  post-up ethtool -G bond0 rx 4096; for i in rx tx sg tso ufo gso gro lro; do ethtool -K bond0 $i off; done
  post-up echo 1 > /proc/sys/net/ipv6/conf/bond0/disable_ipv6

I’ve configured a bond interface because I’m planning to add another span port. I’ve copied this configuration from SecurityOnion github: https://github.com/Security-Onion-Solutions/securityonion-saltstack/blob/d3826bc605a22e090bc7cf07309bce76a2f12623/setup/so-functions#L298

So it does not look like that’s Zeek dropping traffic here. Something else must be going on.

What’s your data rate? What’s the MTU?

Something that worries me -

     rx_long_byte_count: 51106679706383

Not sure what these are, as it pretty much always requires reading network card’s driver source (these fields are by no means standarized and even if they happen to have the same name accross vendors they can mean something else).

Can you share statistics from the switch? Something like show int stats or similar.