I’m trying to troubleshoot a Bro IDS that is experiencing capture loss with dropped packets. The machine I’m using has a 16-core Intel Xeon processor, 96 GB of RAM, and an Intel NIC. I have 3 Bro workers with CPU affinity enabled, and I’m using the pf_ring module on CentOS with no custom Bro scripts running. All of my processors are running at 99% utilization.
According to my operating system, I’m dropping about 8000 packets over the course of a day on a 300-400Mbps network. According to Bro capstats, I am dropping about the same number of packets I’m receiving, sometimes more than I receive. My capture_loss.log shows my workers lose about 30-50% packets and my manager and proxy, 70-90%. I can provide any configurations or screenshots if necessary.
I’m trying to troubleshoot where the issue lies. I initially installed Bro with all the recommended packages (tcmalloc, etc…) and the pf_ring module and I can see that Bro is using it. At this point, everything I see is pointing to an application issue and I’m running Bro version 2.5. I had the same issue with Bro v.2.4 as well.
Short of tweaking OS kernel and NIC settings, I’m not sure what else I could try to reduce my packet drop count in Bro. Any recommendations?
First, I think the recommended number of workers is something like the number of real cores (not counting hyperthreading) minus 2. So for 8 real cores you would use 6 workers, and if you have 16 real cores you probably want closer to 14 workers, assuming this is a dedicated Bro box. Maybe try bumping up your number of workers and enabling CPU pinning if you haven’t done so.
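To make that rule of thumb concrete, here is a minimal sketch. The function name `suggested_workers` is just something I made up for illustration, and "cores minus 2" is the rule of thumb from this post, not an official Bro formula.

```shell
# Rule of thumb: workers = physical cores - 2 (never fewer than 1).
# One way to count physical cores on Linux (hyperthreads collapse to one line):
#   lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
suggested_workers() {
    local cores=$1
    local workers=$((cores - 2))
    if [ "$workers" -lt 1 ]; then
        workers=1
    fi
    echo "$workers"
}

suggested_workers 16   # the 16-core box from the question; prints 14
```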
Specifically, a few things come to mind. I know you mentioned NIC settings, but are you sure you disabled all the NIC offloading features using ethtool? More detail on that at this link:
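As a rough sketch of what that looks like: the helper below only prints the ethtool commands so you can review them before running anything. The interface name `eth0` and the exact feature list are assumptions on my part (it is the set I usually see recommended for sniffing interfaces); `ethtool -k eth0` shows what your NIC currently has enabled, and some NICs will reject features they do not support.

```shell
# Print the ethtool commands that turn off common offload features
# on a capture interface. Nothing is changed until you run the output.
offload_off_commands() {
    local iface=$1
    local feature
    for feature in rx tx sg tso gso gro lro; do
        echo "ethtool -K $iface $feature off"
    done
}

# Review the output first, then run it as root, e.g.:
#   offload_off_commands eth0 | sudo sh
offload_off_commands eth0
```

These settings do not survive a reboot on their own, so they usually need to go into whatever your distro uses for interface setup.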
Also, it wouldn’t hurt to double check that the pf_ring kernel module is loaded and staying loaded. If you patch the server and the kernel gets updated, then unless you have something automated to reload/reinstall the pf_ring module, you will probably need to rebuild and reload it for the new kernel…
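A quick way to check this after a reboot or kernel update might look like the sketch below. The function name `pf_ring_loaded` and its optional file argument are just for illustration; it reads /proc/modules, which is the same list `lsmod` shows.

```shell
# Return success if the pf_ring module appears in the loaded-module list.
# The optional argument lets you point it at a saved copy of /proc/modules.
pf_ring_loaded() {
    grep -q '^pf_ring ' "${1:-/proc/modules}" 2>/dev/null
}

if pf_ring_loaded; then
    echo "pf_ring is loaded for kernel $(uname -r)"
    # The module publishes its version and settings here while it is loaded
    cat /proc/net/pf_ring/info
else
    echo "pf_ring is NOT loaded for kernel $(uname -r); try: modprobe pf_ring"
fi
```

If `modprobe pf_ring` fails right after a kernel update, the module most likely has to be rebuilt against the new kernel first.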
Also, did you configure the number of ring slots for PF_RING ?
Check that /etc/modprobe.d/pf_ring.conf exists for your PF_RING installation; this is where you configure the number of ring slots for PF_RING. The default is 4096, I believe, but on busy networks this needs to be increased as appropriate (in increments of 4096); the max value is 65534. I would try that if you’ve tried everything else at the first link above to no avail…
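For reference, a sketch of what goes into that file. As far as I know the module parameter is called `min_num_slots`, but please confirm with `modinfo pf_ring` on your install. The helper name `pf_ring_options_line` and the value 32768 are only examples, and the 65534 ceiling is the figure quoted above.

```shell
# Print the modprobe options line for a given number of ring slots.
# Refuses values above the 65534 maximum mentioned in this thread.
pf_ring_options_line() {
    local slots=$1
    if [ "$slots" -gt 65534 ]; then
        echo "slot count $slots is above the 65534 maximum" >&2
        return 1
    fi
    echo "options pf_ring min_num_slots=$slots"
}

pf_ring_options_line 32768

# To apply it (as root, with Bro stopped so the module can be unloaded):
#   pf_ring_options_line 32768 > /etc/modprobe.d/pf_ring.conf
#   rmmod pf_ring && modprobe pf_ring
```

After reloading, /proc/net/pf_ring/info should let you confirm the module picked up the new value.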
This is also a great resource re: PF_RING and number of ring slots:
I definitely did. I tried asking earlier if there was a difference between adding more Bro workers via [worker-1],[worker-2],[worker-3],etc… vs lb_procs ‘N’ but didn’t receive a response. I tried both methods in the node.cfg file with little to no noticeable performance impact. I’m definitely using CPU affinity.
Right now I’m working with ntop on an issue with PF_RING: their repo and rpm packages are not correctly loading the pf_ring module into the kernel, and I get errors when I run the following command to validate my PF_RING install after a successful Bro installation:
bro -N Bro::PF_RING
I also have an issue where the pf_ring repo packages are interfering with a Bro reinstall because Bro no longer recognizes the libpcap library. Still working that one out with ntop.
There is no difference. The whole lb_procs thing arose because we used to have people with huge node.cfg files because they were running a lot of workers. Adding the lb_procs mechanism gave them a shorthand so they would not have to configure each and every worker.
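To illustrate the shorthand, here is a small sketch that prints a single node.cfg worker stanza using lb_procs together with pin_cpus. The helper name `lb_worker_stanza`, the interface `eth0`, the host `localhost`, and the choice to start pinning at CPU 2 are all assumptions for the example; check your own core numbering with lscpu before copying the CPU list.

```shell
# Print one node.cfg worker stanza that fans out into <procs> load-balanced
# workers, pinned to consecutive CPU IDs starting at <first_cpu>.
lb_worker_stanza() {
    local iface=$1
    local procs=$2
    local first_cpu=$3
    local last_cpu=$((first_cpu + procs - 1))
    local cpu=$first_cpu
    local cpus=""
    while [ "$cpu" -le "$last_cpu" ]; do
        cpus="${cpus:+$cpus,}$cpu"   # build a comma-separated CPU list
        cpu=$((cpu + 1))
    done
    cat <<EOF
[worker-1]
type=worker
host=localhost
interface=$iface
lb_method=pf_ring
lb_procs=$procs
pin_cpus=$cpus
EOF
}

# 14 workers on a 16-core box, leaving CPUs 0 and 1 for everything else
lb_worker_stanza eth0 14 2
```

One stanza like this stands in for fourteen nearly identical [worker-N] sections.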