Issue with Bro reporting dropped packets

Hi,

I’m trying to troubleshoot a Bro IDS that is experiencing capture loss and dropped packets. The machine is a 16-core Intel Xeon with 96 GB of RAM and an Intel NIC. I’m running 3 Bro workers with CPU affinity enabled, using the pf_ring module on CentOS, with no custom Bro scripts loaded. All of my processors are running at 99% utilization.

According to the operating system, I’m dropping about 8000 packets over the course of a day on a 300-400 Mbps network. According to Bro capstats, I am dropping about the same number of packets as I’m receiving, and sometimes more than I receive. My capture_loss.log shows my workers losing about 30-50% of packets, and my manager and proxy 70-90%. I can provide any configurations or screenshots if necessary.
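For reference, those numbers came from roughly the following checks (eth0 is just a placeholder for my capture interface):

broctl netstats                  # per-process recvd/dropped/link counters from each Bro worker
capstats -i eth0 -I 10           # standalone capstats shipped with Bro: packets/bytes seen in 10-second intervals
ethtool -S eth0 | grep -i drop   # NIC/driver-level drop counters from the OS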

I’m trying to narrow down where the issue lies. I initially installed Bro with all the recommended packages (tcmalloc, etc.) and the pf_ring module, and I can see that Bro is using it. At this point everything I see points to an application issue. I’m running Bro 2.5, and I had the same issue with Bro 2.4 as well.

Short of tweaking OS kernel and NIC settings, I’m not sure where else to look to reduce my packet drop count in Bro. Any recommendations?

Thanks,

First, I think the recommended number of workers is something like the number of real cores (not counting hyperthreading) minus 2, so for 8 real cores you would use 6 workers; if you have 16 real cores you probably want closer to 14 workers if this is a dedicated Bro box. Maybe try bumping up your number of workers and enabling CPU pinning if you haven’t done so; a rough node.cfg sketch is below.
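Something along these lines in node.cfg is what I mean (the interface name and CPU numbers are just placeholders for your box):

[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=eth0
lb_method=pf_ring
lb_procs=14
# pin the load-balanced worker processes to specific cores, leaving a couple free for the manager/proxy
pin_cpus=2,3,4,5,6,7,8,9,10,11,12,13,14,15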

Have you reviewed everything located here?

https://www.bro.org/documentation/faq.html#how-can-i-reduce-the-amount-of-captureloss-or-dropped-packets-notices

Specifically, a few things come to mind. I know you mentioned NIC settings, but are you sure you disabled all of the NIC offloading features using ethtool? There is more detail on that at this link, and a quick example follows it:

http://securityonion.blogspot.com/2011/10/when-is-full-packet-capture-not-full.html
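As a rough example of the sort of thing I mean (eth0 is a placeholder, and the exact feature names your driver supports may vary):

# show the current offload settings
ethtool -k eth0

# disable the common offload features so Bro sees packets as they appeared on the wire
ethtool -K eth0 rx off tx off sg off tso off gso off gro off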

Also, it wouldn’t hurt to double check that the pf_ring kernel module is loaded and staying loaded. If you patch the server and the kernel gets updated, then unless you have something automated to rebuild/reinstall the pf_ring module for the new kernel, you will probably need to reload it yourself…
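A quick way to check (just a sketch):

lsmod | grep pf_ring          # is the module loaded right now?
modinfo pf_ring               # is there a module built for the running kernel, and which version?
cat /proc/net/pf_ring/info    # PF_RING's own status, including the ring slot count, if the module is loaded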

Also, did you configure the number of ring slots for PF_RING?

Check to be sure that /etc/modprobe.d/pf_ring.conf exists for your PF_RING installation. This is where you configure the number of ring slots for PF_RING. The default is 4096, I believe, but on busy networks this needs to be increased as appropriate (in increments of 4096); the max value is 65534. I would try that if you’ve tried everything else at the first link above to no avail.
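For example, something along these lines in /etc/modprobe.d/pf_ring.conf (32768 is just an illustration, pick a value that fits your traffic):

# min_num_slots sets the number of slots per PF_RING ring
options pf_ring min_num_slots=32768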

This is also a great resource re: PF_RING and number of ring slots:

https://groups.google.com/forum/#!topic/security-onion/zu7U7U9pBT8

Hope this helps,

-Drew

Hi Drew,

I definitely did. I tried asking earlier if there was a difference between adding more Bro workers via [worker-1], [worker-2], [worker-3], etc. versus lb_procs=N, but didn’t receive a response. I tried both methods in the node.cfg file with little to no noticeable performance impact. I’m definitely using CPU affinity.

Yep, I tried disabling as many NIC offloading features as I could. I’d also like to mention that I found a more comprehensive list of NIC offload settings to disable, written for Suricata, which may apply to Bro as well: http://pevma.blogspot.com/2014/03/suricata-prepearing-10gbps-network.html

Right now I’m working with ntop on an issue with PF_RING, because their repo and rpm packages are not correctly loading the pf_ring module into the kernel, and I get errors when attempting to run the following command to validate my PF_RING install after a successful Bro installation:

bro -N Bro::PF_RING
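When it works, I believe it’s supposed to print a plugin line roughly like this (the version will vary) instead of an error:

Bro::PF_RING - Packet acquisition via PF_RING (dynamic, version 1.0)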

I also have an issue where the pf_ring repo packages are interfering with a Bro reinstall because Bro no longer recognizes the libpcap library. Still working that one out with ntop.
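In the meantime, a check I’ve been using to see which libpcap Bro is actually linked against (the path is just the default install prefix; adjust for your build):

ldd /usr/local/bro/bin/bro | grep -i pcap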

There is no difference. The whole lb_procs thing arose because we used to have people with huge node.cfg files from running a lot of workers. Adding the lb_procs mechanism gave them a shorthand so they don’t have to configure each and every worker.
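Just to illustrate, the shorthand form looks like this in node.cfg (interface and count are only examples):

# One stanza that spins up three load-balanced worker processes on eth0,
# instead of writing out three nearly identical [worker-N] stanzas by hand.
[worker-1]
type=worker
host=localhost
interface=eth0
lb_method=pf_ring
lb_procs=3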

  .Seth

Hi Seth,

That makes more sense. Thank you for the background. I was able to find and resolve the issue I was experiencing with ntop. Thank you both.

Sincerely,