BPF Segmentation Fault

Hi guys,

I wonder if anyone can offer any advice in relation to an issue we have using Zeek (LTS 4.0.3), and a Myricom 10G-PCIE2-8C2-2S. The Myricom card is currently on a SPAN port from a Juniper QFX, albeit we’re planning to move to a Profitap fibre TAP soon.

We’ve compiled Zeek from source in order to accommodate the snf driver (e.g. ./configure --with-pcap=/opt/snf/), and it works well using the following node.cfg configuration -

[worker-1]
type=worker
host=localhost
#pin_cpus=1,3,5,7
interface=snf0
lb_method=myricom
lb_procs=8

Our issue is that when we try to filter traffic, either using ZeekArgs or redef PacketFilter::default_capture_filter, workers crash within a few minutes of starting.

We’re trying to use a simple capture filter like -

ZeekArgs = -f "not dst host 10.100.48.5 and not dst host 10.100.40.78"

Or

redef PacketFilter::default_capture_filter = "not host 10.100.48.5";

The output of the crash diag is attached, but in short, we experience -

Program terminated with signal SIGSEGV, Segmentation fault.
#0 zeek::packet_analysis::Ethernet::EthernetAnalyzer::AnalyzePacket (this=0x5560e04be680, len=808, data=0x41d853675b0719a8 <error: Cannot access memory at address 0x41d853675b0719a8>, packet=0x5560e171b9c8) at /root/zeek-4.0.3/src/packet_analysis/protocol/ethernet/Ethernet.cc:33
33 if ( data[12] == 0x89 && data[13] == 0x03 )
[Current thread is 1 (Thread 0x7f90ea7172c0 (LWP 4830))]

If we remove the BPF or capture filter, the processes stay online consistently.

Any advice on how to diagnose this would be greatly appreciated.

Best regards
Andy

crash-diag.txt (23.3 KB)

The snf libpcap is buggy and returns garbage packets to Zeek. When no packet is available, pcap_next_ex is supposed to signal a timeout (a return value of 0), but the snf libpcap just returns a previous packet, which is often corrupted.

I first ran into this probably 5 years ago and reported it to them and it was never fixed.

Try this patch to zeek, I bet it fixes the problem for you:

diff --git a/src/iosource/pcap/Source.cc b/src/iosource/pcap/Source.cc
index b61e1ce91..fb9da5c2c 100644
--- a/src/iosource/pcap/Source.cc
+++ b/src/iosource/pcap/Source.cc
@@ -245,6 +245,8 @@ bool PcapSource::ExtractNextPacket(Packet* pkt)
 	++stats.received;
 	stats.bytes_received += header->len;

+	header->len=0;
+	header->caplen=0;
 	return true;
 	}

Using the native Zeek Myricom plugin avoids this issue, or if you have a support contact with Myricom you could try to get them to fix this bug.
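To illustrate why zeroing the header helps, here is a small self-contained simulation. It is only a sketch and does not use libpcap or Zeek code: FakeHeader, StaleSource and count_accepted are made-up names, and it assumes (as I believe is the case in ExtractNextPacket) that a header with len or caplen equal to 0 is rejected before the packet is handed on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for libpcap's pcap_pkthdr (length fields only).
struct FakeHeader {
    uint32_t caplen;
    uint32_t len;
};

// Hypothetical capture source mimicking the reported behaviour: when nothing
// is pending it still reports success and hands back the header it returned
// last time, instead of signalling a timeout.
class StaleSource {
public:
    explicit StaleSource(std::vector<uint32_t> lengths)
        : pending_(std::move(lengths)) {}

    // Always returns 1, like a successful pcap_next_ex() call.
    int next(FakeHeader** out) {
        if (pos_ < pending_.size()) {
            hdr_.caplen = hdr_.len = pending_[pos_++];
        }
        // Otherwise hdr_ is left exactly as the caller last saw it.
        *out = &hdr_;
        return 1;
    }

private:
    std::vector<uint32_t> pending_;
    std::size_t pos_ = 0;
    FakeHeader hdr_{0, 0};
};

// Polls the source `polls` times and counts results accepted as packets.
// With zero_after_use set, the header is cleared after each accepted packet,
// so a re-delivered (stale) header fails the empty-header check.
int count_accepted(StaleSource& src, int polls, bool zero_after_use) {
    int accepted = 0;
    for (int i = 0; i < polls; ++i) {
        FakeHeader* h = nullptr;
        if (src.next(&h) != 1) continue;
        if (h->len == 0 || h->caplen == 0) continue;  // empty header: skip
        ++accepted;
        if (zero_after_use) {
            h->len = 0;
            h->caplen = 0;
        }
    }
    return accepted;
}
```

With two real packets and five polls, the unpatched loop accepts the stale header three extra times, while the patched loop accepts only the two real packets. The trick relies on the library reusing the same header storage between calls, which is what the snf behaviour described above suggests.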

Hi Justin,

Thank you very much for your reply. I’ve patched the 4.0.3 code, and now have 8 workers which have been running for several hours:

root@hornet01:/var/log/bro/current# zeekctl status
Name         Type     Host        Status    Pid    Started
logger-1     logger   localhost   running   26373  24 Sep 16:23:29
manager      manager  localhost   running   26423  24 Sep 16:23:31
proxy-1      proxy    localhost   running   26472  24 Sep 16:23:32
worker-1-1   worker   localhost   running   26602  24 Sep 16:23:33
worker-1-2   worker   localhost   running   26603  24 Sep 16:23:33
worker-1-3   worker   localhost   running   26607  24 Sep 16:23:33
worker-1-4   worker   localhost   running   26618  24 Sep 16:23:33
worker-1-5   worker   localhost   running   26620  24 Sep 16:23:33
worker-1-6   worker   localhost   running   26619  24 Sep 16:23:33
worker-1-7   worker   localhost   running   26625  24 Sep 16:23:33
worker-1-8   worker   localhost   running   26623  24 Sep 16:23:33

We’re running this with the following filter, which captures only traffic where both source and destination are RFC1918 addresses -

ZeekArgs = -f "(src net 10.0.0.0/8 or src net 192.168.0.0/16 or src net 172.16.0.0/12) and (dst net 10.0.0.0/8 or dst net 192.168.0.0/16 or dst net 172.16.0.0/12)"
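For anyone wanting to sanity-check which address pairs a filter like this lets through, here is a rough sketch of the same logic in plain C++. It does not call libpcap and only covers IPv4 addresses; ipv4, in_net, is_rfc1918 and filter_accepts are made-up helper names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Packs four octets into a host-order IPv4 address (hypothetical helper).
inline uint32_t ipv4(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// True if addr lies inside net/prefix_len. prefix_len must be in 1..32.
inline bool in_net(uint32_t addr, uint32_t net, int prefix_len) {
    const uint32_t mask = 0xFFFFFFFFu << (32 - prefix_len);
    return (addr & mask) == (net & mask);
}

// The three RFC1918 private ranges used in the filter.
inline bool is_rfc1918(uint32_t addr) {
    return in_net(addr, ipv4(10, 0, 0, 0), 8) ||
           in_net(addr, ipv4(172, 16, 0, 0), 12) ||
           in_net(addr, ipv4(192, 168, 0, 0), 16);
}

// Mirrors "(src net ...) and (dst net ...)": both ends must be private.
inline bool filter_accepts(uint32_t src, uint32_t dst) {
    return is_rfc1918(src) && is_rfc1918(dst);
}
```

For example, filter_accepts(ipv4(10, 100, 48, 5), ipv4(192, 168, 1, 1)) is true, while a flow from 10.1.1.1 to 8.8.8.8 is rejected because only one side is private.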

We have another Zeek instance doing north/south traffic, which I’ll patch over the weekend.

Thanks so far for your awesome assistance!

Best regards
Andy

Just as a follow-up, I opened https://github.com/zeek/zeek/pull/1804 this morning to add this fix.

Tim