Bro 2.3.2-419 segfaults when using PF_RING 6.0.3 libpcap 1.6.2 and pfdnacluster_master on RHEL 6.6

Hello,

I’m having trouble getting Bro to run with PF_RING after updating from RHEL 6.5 to RHEL 6.6. The PF_RING-aware drivers (DNA/ZC etc.) in the “stable” 6.0.2 branch of PF_RING don’t appear to compile correctly on RHEL 6.6, which necessitated a move to the latest 6.0.3 development branch (rev. 9009). This version compiles fine and I have it working with both Suricata and nprobe, but I can’t get it working with Bro.

Bro doesn’t seem to be able to open the dnacluster:21@0 etc. interfaces with the new version. Specifically, bro segfaults when calling into the PF_RING version of libpcap.so.1.6.2, which is the new version of libpcap in 6.0.3; previously libpcap was 1.1.1. I have also tried to compile PF_RING 6.0.2 stable on RHEL 6.6 with the newer drivers, but the version of pfdnacluster_master that ships with PF_RING 6.0.2 stable (which uses the older libpcap) silently crashes on RHEL 6.6.

I’ve attached the output of a broctl diag to this email. In the past, when bro couldn’t listen on a dnacluster interface, it was typically because the interface was already in use, bro couldn’t find pfring, or bro wasn’t compiled against the correct libpcap. I’ve verified to the best of my ability that none of these is the case (no other libpcap on the system, fresh DNA driver load and fresh instance of pfdnacluster_master, pfring in $PATH etc.). I’ve also verified that I can see packets on the dnacluster interfaces by testing with pfcount.

It looks like bro perhaps doesn’t like the new version of libpcap. I have tried compiling and running bro with debugging enabled, but bro seems to crash on the workers without generating anything in the various debug.log files. Any thoughts?

Here are example error messages from /var/log/messages:

kernel: bro[1653]: segfault at 1371670 ip 00007f5a9e7f0660 sp 00007fff8714b300 error 4 in libpcap.so.1.6.2[7f5a9e7d9000+90000]
kernel: bro[1643]: segfault at 1371670 ip 00007ff16d19b660 sp 00007fff81eea9a0 error 4 in libpcap.so.1.6.2[7ff16d184000+90000]
kernel: bro[1656]: segfault at 1371670 ip 00007fcf3c6cf660 sp 00007fff3e1789b0 error 4 in libpcap.so.1.6.2[7fcf3c6b8000+90000]
kernel: bro[1644]: segfault at 1 ip 00007f5932268506 sp 00007fffcd3ea0b0 error 4 in libpcap.so.1.6.2[7f5932251000+90000]
kernel: bro[1642]: segfault at 1 ip 00007ff3d1c83506 sp 00007fff468f4930 error 4 in libpcap.so.1.6.2[7ff3d1c6c000+90000]
kernel: bro[1658]: segfault at 1371670 ip 00007f53584f2660 sp 00007ffff89515f0 error 4 in libpcap.so.1.6.2[7f53584db000+90000]
kernel: bro[1652]: segfault at 1371670 ip 00007f158fbc7660 sp 00007fff14aa7e20 error 4 in libpcap.so.1.6.2[7f158fbb0000+90000]
kernel: bro[1660]: segfault at 1371670 ip 00007f2fee8e7660 sp 00007ffff9dacaf0 error 4 in libpcap.so.1.6.2[7f2fee8d0000+90000]
kernel: bro[1641]: segfault at 1 ip 00007f32fbc48506 sp 00007fff7d9b2a00 error 4 in libpcap.so.1.6.2[7f32fbc31000+90000]
kernel: bro[1662]: segfault at b836210 ip 00007f5c9d669660 sp 00007fff71636fb0 error 4 in libpcap.so.1.6.2[7f5c9d652000+90000]
kernel: bro[4220]: segfault at 1371670 ip 00007f6d35299660 sp 00007fff4d896940 error 4 in libpcap.so.1.6.2[7f6d35282000+90000]
kernel: bro[4465]: segfault at 1371670 ip 00007f202ff75660 sp 00007fff04fff8c0 error 4 in libpcap.so.1.6.2[7f202ff5e000+90000]
kernel: bro[4710]: segfault at 1371670 ip 00007fd8bc794660 sp 00007fff33041db0 error 4 in libpcap.so.1.6.2[7fd8bc77d000+90000]
kernel: bro[7873]: segfault at 1371670 ip 00007ffc910f2660 sp 00007fff1b5ba1b0 error 4 in libpcap.so.1.6.2[7ffc910db000+90000]
kernel: bro[8065]: segfault at 1371670 ip 00007ffaa5c8f660 sp 00007fff3cdde390 error 4 in libpcap.so.1.6.2[7ffaa5c78000+90000]
kernel: bro[8257]: segfault at 63745e0 ip 00007ff913224660 sp 00007fff297ca2f0 error 4 in libpcap.so.1.6.2[7ff91320d000+90000]
kernel: bro[8446]: segfault at 1371670 ip 00007f0a1c567660 sp 00007fffdf059910 error 4 in libpcap.so.1.6.2[7f0a1c550000+90000]
kernel: bro[8638]: segfault at 1371670 ip 00007f50982af660 sp 00007fff703caa30 error 4 in libpcap.so.1.6.2[7f5098298000+90000]
kernel: bro[8835]: segfault at 1371670 ip 00007f1b4acd2660 sp 00007fffacc16630 error 4 in libpcap.so.1.6.2[7f1b4acbb000+90000]
kernel: bro[9036]: segfault at 1 ip 00007f10df91b506 sp 00007fff5ac3e320 error 4 in libpcap.so.1.6.2[7f10df904000+90000]

Regards,
Gary

bro-diag-pfring-shorter-23FEB2015.txt (33.6 KB)
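
A note for anyone triaging similar logs: the kernel lines in the message above can be reduced to a per-library crash offset by subtracting the mapping base (the value in square brackets) from the instruction pointer. The following is a minimal Python sketch of that arithmetic, not part of Bro or PF_RING; the function name `crash_offsets` is made up for illustration, and treating the result as an offset into libpcap assumes the reported mapping starts at the beginning of the library file.

```python
import re
from collections import Counter

# Matches kernel segfault lines of the form seen in /var/log/messages:
#   bro[PID]: segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def crash_offsets(lines):
    """Count crashes per (library, offset), where offset = ip - mapping base."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match is None:
            continue  # not a segfault line; ignore it
        offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
        counts[(match.group("lib"), hex(offset))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the log excerpt above, one per faulting address pattern.
    sample = [
        "kernel: bro[1653]: segfault at 1371670 ip 00007f5a9e7f0660 "
        "sp 00007fff8714b300 error 4 in libpcap.so.1.6.2[7f5a9e7d9000+90000]",
        "kernel: bro[1644]: segfault at 1 ip 00007f5932268506 "
        "sp 00007fffcd3ea0b0 error 4 in libpcap.so.1.6.2[7f5932251000+90000]",
    ]
    for (lib, offset), count in crash_offsets(sample).items():
        print(lib, offset, count)
```

Run over the full excerpt, this should group the crashes into just two sites: the "segfault at 1" lines share one offset and all the other lines share a second. "error 4" means a user-mode read of an unmapped page, so both sites look like reads through a bad pointer inside libpcap rather than random corruption.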

A couple folks have suggested I run this with gdb and get a backtrace to post here. Here is a quick gdb session with a backtrace of when I run bro -i dnacluster:21@0: 

# gdb /nsm/bro/bin/bro
GNU gdb (GDB) SLES Expanded Support platform (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /nsm/bro/bin/bro...done.
(gdb) run -i dnacluster:21@0
Starting program: /nsm/bro/bin/bro -i dnacluster:21@0
[Thread debugging using libthread_db enabled]
listening on dnacluster:21@0, capture length 8192 bytes

[New Thread 0x7fff20fd0700 (LWP 36513)]
[New Thread 0x7fff1bfff700 (LWP 36514)]
[New Thread 0x7fff1b5fe700 (LWP 36515)]
[New Thread 0x7fff1abfd700 (LWP 36516)]
[New Thread 0x7fff1a1fc700 (LWP 36517)]
[New Thread 0x7fff197fb700 (LWP 36518)]
[New Thread 0x7fff18dfa700 (LWP 36519)]
[New Thread 0x7fff03fff700 (LWP 36520)]
[New Thread 0x7fff035fe700 (LWP 36521)]
[New Thread 0x7fff02bfd700 (LWP 36522)]
[New Thread 0x7fff021fc700 (LWP 36523)]
[New Thread 0x7fff017fb700 (LWP 36524)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7959506 in pcap_read_packet (handle=0x2631640, callback=0x7ffff795d720 <pcap_oneshot>, userdata=0x7fffffffda20 "p\025c\002") at ./pcap-linux.c:1807
1807    ./pcap-linux.c: No such file or directory.
        in ./pcap-linux.c
Missing separate debuginfos, use: debuginfo-install
GeoIP-1.5.1-5.el6.x86_64 glibc-2.12-1.149.el6_6.5.x86_64
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64
libcom_err-1.41.12-21.el6.x86_64 libgcc-4.4.7-11.el6.x86_64
libselinux-2.0.94-5.8.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64
numactl-2.0.9-2.el6.x86_64 openssl-1.0.1e-30.el6_6.5.x86_64
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00007ffff7959506 in pcap_read_packet (handle=0x2631640, callback=0x7ffff795d720 <pcap_oneshot>, userdata=0x7fffffffda20 "p\025c\002") at ./pcap-linux.c:1807
#1  0x00007ffff795d79b in pcap_next (p=<value optimized out>, h=<value optimized out>) at ./pcap.c:218
#2  0x0000000000a4a490 in iosource::pcap::PcapSource::ExtractNextPacket (this=0x2631430, pkt=0x2631468) at /nsm/bro/git/bro2.3-419/bro/src/iosource/pcap/Source.cc:151
#3  0x0000000000a7580c in iosource::PktSrc::ExtractNextPacketInternal (this=0x2631430) at /nsm/bro/git/bro2.3-419/bro/src/iosource/PktSrc.cc:432
#4  0x0000000000a7511b in iosource::PktSrc::NextTimestamp (this=0x2631430, local_network_time=0x7fffffffdcb8) at /nsm/bro/git/bro2.3-419/bro/src/iosource/PktSrc.cc:241
#5  0x0000000000a71193 in iosource::Manager::FindSoonest (this=0xf29bc0, ts=0x7fffffffddc8) at /nsm/bro/git/bro2.3-419/bro/src/iosource/Manager.cc:82
#6  0x00000000007895d1 in net_run () at /nsm/bro/git/bro2.3-419/bro/src/Net.cc:301
#7  0x00000000006d8ed7 in main (argc=3, argv=0x7fffffffe498) at /nsm/bro/git/bro2.3-419/bro/src/main.cc:1200

All,

A few other folks reported similar segfault issues to the PF_RING team, both with standard PF_RING and with DNA/ZC. After some troubleshooting and debugging they were able to issue a patch (in SVN build 9021) that, at least in initial testing, seems to have resolved the segfault issue. Bro now appears to run segfault-free using PF_RING (6.0.3 build 9021), both without DNA/ZC and with DNA using RSS. I'm still seeing a separate issue that I'm following up on with them: I can't map more than 10 app instances when using libzero's pfdnacluster_master script for on-host load balancing.

Regards,
Gary

Confirmed: with pf_ring from SVN (the libpcap 1.6.x line) and Bro 2.3.2,
there have been no segfaults for over 10 hours now.

Everything works in the AWS VM (so no DNA/ZC, but multiple workers). I
had issues with the release version of pf_ring (6.0.2), but 6.0.3 build
9021 is OK.

As a follow-up on the second issue I was seeing: Alfredo (from NTOP) suggested that pfdnacluster_master was not allowing me to listen with more than 10 app instances possibly because the name is too long in the long format, dnacluster:21@10, as opposed to dnacl:21@10. Bro used to work fine with the long name, so perhaps this is some change in PF_RING. Using the short name allowed Bro to bind workers to dnacl:21@10 and greater, but this seems to cause broctl to not call capstats properly (for a dnacluster).

When using dnacluster:21, 'broctl capstats' had some logic that would trigger broctl to call a single instance of capstats for each worker node as app2 on the dnacluster, where app2 had a single queue with a full copy of the traffic, so it wouldn't get a 'no such device' message. This doesn't seem to work with the short name 'dnacl'. Instead, broctl calls capstats for each individual worker, and since those interfaces are already bound to the app1 queues, there is nothing for capstats to listen on.

Regards,
Gary

Thanks for reporting this issue. I've made a small change
to broctl so that it now recognizes the shorter name "dnacl".

-Daniel
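
For readers wondering why the interface prefix matters to 'broctl capstats', here is a rough Python sketch of the kind of check described above. It is not broctl's actual code: the function names, the prefix tuple, and the rule of stripping the "@queue" suffix are all assumptions made for illustration, based only on the behavior Gary describes (one capstats instance per worker host on the cluster rather than one per worker queue).

```python
# Hypothetical prefix list; the thread only confirms that "dnacluster" was
# recognized before the change and that "dnacl" is recognized after it.
CLUSTER_PREFIXES = ("dnacluster:", "dnacl:")


def capstats_target(interface):
    """Return the interface name capstats would open for one worker.

    Assumption: for a DNA cluster interface such as "dnacl:21@10", the
    per-worker queue suffix ("@10") is dropped so that capstats attaches to
    the cluster itself instead of a queue a Bro worker already holds.
    """
    if interface.startswith(CLUSTER_PREFIXES) and interface.count("@") == 1:
        return interface.split("@", 1)[0]
    return interface


def capstats_targets(workers):
    """Collapse (host, interface) pairs so each cluster is sampled once per host."""
    targets = []
    for host, interface in workers:
        target = (host, capstats_target(interface))
        if target not in targets:
            targets.append(target)
    return targets


if __name__ == "__main__":
    workers = [("worker-1", "dnacl:21@0"), ("worker-1", "dnacl:21@10")]
    print(capstats_targets(workers))
```

With only "dnacluster:" in the prefix list, a name like dnacl:21@10 would fall through unchanged, and capstats would be pointed at a queue that a worker has already bound, which matches the symptom reported earlier in the thread.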

Thanks, Daniel! I can confirm that the patch fixed 'broctl capstats' for me when using the short name dnacl for an interface.

Regards,
Gary