Linux Kernel dropping a lot of packets

I'm wondering why Bro is so quiet. So I tried tcpdump with Bro's filter ...

[root@rhyolite rreitz]# /usr/sbin/tcpdump -i eth2 '((((((((((((((((((((((port 111) or (port smtp)) or (port ftp)) or (port smtp)) or (icmp)) or (tcp[2:2] > 32770 and tcp[2:2] < 32901 and tcp[0:2] != 80 and tcp[0:2] != 22 and tcp[0:2] != 139)) or ((ip[6:2] & 0x3fff != 0) and tcp)) or (port 111)) or (tcp dst port 80 or tcp dst port 8080 or tcp dst port 8000)) or (tcp src port 80 or tcp src port 8080 or tcp src port 8000)) or (port 6666)) or (port 512 or port 513 or port 515)) or (tcp port 80 or tcp port 8080 or tcp port 8000 or tcp port 8001)) or (port telnet or tcp port 513)) or (port telnet)) or (port 53)) or ((src net 131.225.0.0/16 or src net 198.124.212.0/24 or src net 198.124.213.0/24) and (dst port 135 or dst port 137 or dst port 139 or dst port 445))) or (tcp[13] & 7 != 0)) or (port ftp)) or (port 6667)) or (port 143)) or (udp port 69)) or (port 161 or port 162)'
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
....

Killing after ~10 seconds ...

166 packets captured
249098 packets received by filter
248721 packets dropped by kernel

Do I read this as the kernel dropped the packets since they failed the Bro filter, or is the kernel just dropping packets because it's Tuesday?

I see in the email list advice like ...

"That all boils down to this certainly looking like a problem with the
packet filter itself rather than Bro."

I'm using ...

[root@rhyolite rreitz]# cat /etc/redhat-release
Scientific Linux Fermi LTS release 4.4 (Wilson)

This is Fermilab's release of Red Hat Enterprise Linux 4, update 4.

[root@rhyolite rreitz]# rpm -qa | egrep pcap
libpcap-0.8.3-10.RHEL4

Bro is 1.2.1 - not using the built-in libpcap.

[root@rhyolite bro-1.2.1]# tail /etc/sysctl.conf
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
net.core.rmem_max = 16777216

[root@rhyolite bro-1.2.1]# /sbin/sysctl net.core.rmem_max
net.core.rmem_max = 131071
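
Note the mismatch: /etc/sysctl.conf asks for net.core.rmem_max = 16777216, but
the running value is still 131071, so the file apparently hasn't been applied.
Assuming that's all it is, reloading it would look like:

/sbin/sysctl -p /etc/sysctl.conf
/sbin/sysctl net.core.rmem_max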

At this point, I remembered Jason Lee's advice to tune the Linux kernel. He suggested this link
http://www.net.t-labs.tu-berlin.de/research/bpcs/

So I did ...

[root@rhyolite bro-1.2.1]# cat /proc/sys/net/core/rmem_default
110592
[root@rhyolite bro-1.2.1]# echo 33554432 > /proc/sys/net/core/rmem_default
[root@rhyolite bro-1.2.1]# echo 33554432 > /proc/sys/net/core/rmem_max
[root@rhyolite bro-1.2.1]# echo 10000 > /proc/sys/net/core/netdev_max_backlog
[root@rhyolite bro-1.2.1]# /sbin/sysctl net.core.rmem_max
net.core.rmem_max = 33554432
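
(These /proc writes won't survive a reboot; the persistent equivalent, assuming
/etc/sysctl.conf is applied at boot or via sysctl -p, would be these lines:)

net.core.rmem_default = 33554432
net.core.rmem_max = 33554432
net.core.netdev_max_backlog = 10000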

OK, this looks like progress. I tried the same tcpdump as above. Now I see ...

121 packets captured
149216 packets received by filter
121673 packets dropped by kernel

Before the 'tune', the kernel was dropping 99.8%. After the tune, it's dropping 81.5%. Not much better. No fair to suggest I drop Linux for FreeBSD!
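
For the record, those percentages are just "dropped by kernel" divided by
"received by filter" from tcpdump's exit summary (100*248721/249098 = 99.8;
100*121673/149216 = 81.5). A quick sketch to compute it, assuming the
three-line summary format shown above (filter expression elided):

/usr/sbin/tcpdump -i eth2 '...' 2> stats.txt    # Ctrl-C after ~10 seconds
awk '/received by filter/ {r=$1} /dropped by kernel/ {d=$1}
     END {if (r) printf("%.1f%% dropped\n", 100*d/r)}' stats.txt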

Suggestions?

Thanks,
Randy Reitz
Computer Security Team

<snip>

  If you are up for adventure you should look at the pf-ring code from
www.ntop.org. While fairly exciting to get in (it replaces the native pcap
capture path in the kernel), once you do, it appears to work fairly well. On an
earlier version of pf-ring we managed to keep up with a 995 megabit jumbo frame
netperf run with argus (jumbos, however, are the best-case traffic scenario). I
have the latest version running on an IBM P510 under OpenSUSE 10.2 and a 2.6.18
kernel (I think) but haven't yet managed to get it onto a busy gig link (the
original link has gone 10 gig in the interim and is thus no longer available
:-)). Small packets are its most likely weakness.
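
Roughly, getting it in looks like this (module name and tree layout from the
ntop.org tarball as best I recall; details shift between versions, so treat it
as a sketch, not a recipe):

# from the top of the PF_RING source tree
cd kernel && make && insmod ./pf_ring.ko   # adds the PF_RING socket family
ls /proc/net/pf_ring                       # should exist once the module is in
# then build the PF_RING-patched libpcap from the tarball and link
# bro/argus against it instead of the system libpcap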

Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada

Peter Van Epp wrote:

<snip>

I tested this recently, and while a great improvement, it was
still considerably less than out-of-the-box FreeBSD performance.

Mark

Hmmm, perhaps I should test again. At that point, on a dual Athlon,
FreeBSD (which is my default platform for running argus on) lost 50% of the
traffic on that gig link; the same hardware with Linux and pf-ring lost nothing.
I did see that the FreeBSD 6 series was supposed to improve networking, but
unless they also made radical changes in bpf, the kernel-to-user copy eats
memory bandwidth (which pf-ring, I believe, avoids by doing ugly things directly
to the page tables, skipping the memory-to-memory copy). I recall the pf-ring
author also saying the same trick wouldn't work on FreeBSD and that he felt the
code would be hard to port.

Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada

Peter Van Epp wrote:

<snip>

I should add that I did not attempt a comprehensive comparison, and
that the performance probably varies significantly as a function of
variables such as traffic profile. My tests used synthetic traffic
with a single packet-size mix (simulating our actual environment).
This was on FreeBSD 6.1, btw.

Mark

  A late thought on this subject: were you running a stock Linux kernel
(i.e. with just the pf-ring patches)? I use the config from our HPC folks,
with TCP stack tweaks, which can (and daily does) do 995 megabits per second
across a 40 msec latency lightpath on a grid cluster here (the 200 terabyte
file store is here; the compute engines are at several other sites across
town and several thousand miles away). The stock kernel (which we used by
accident during a test one day) gets 35 megabits per second on that same gig
link, and I suspect that may impact capture performance too.
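
For context, the arithmetic: 995 megabits/sec across 40 msec is roughly 5
megabytes in flight, so TCP windows have to be at least that big to fill the
pipe. The tweaks are the usual high-BDP sysctls, something like this
(illustrative values, not our actual HPC config):

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# min / default / max buffer sizes for TCP autotuning
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_window_scaling = 1
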
  When our HPC guys started up 5 or 6 years ago I suggested FreeBSD, but
testing indicated that a properly tuned Linux kernel was just as fast (at
least in TCP :-)) as FreeBSD and was more common in Beowulf clusters, so they
went Linux.

Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada