Broctl pf_ring_DNA support / Bro at 100G

Hello,

We recently lit up a 100G link and are working on migrating our IDS and monitoring infrastructure from 10G to 100G. We have an existing set of servers that we are using to evaluate Snort, Suricata, and Bro, with a 100G Gigamon upstream. For a Bro proof of concept, I have two of the following Dell 720s to start with:

Dell 720XD
64 GB RAM (1600 MHz RDIMMs)
30 TB (usable) RAID 6, 7.2K RPM SAS 6Gbps
2x 146 GB 15K RPM SAS 6Gbps
2x Intel Xeon E5-2670 2.60GHz, 20M Cache, 8.0GT/s QPI, Turbo, 8C
3x Intel X520 DP 10Gb DA/SFP+

I’m starting from build 2.2-beta-114 and plan to use it with PF_RING and the DNA drivers for the Intel cards for now, since some of the other popular capture cards are “complicated” for us to get approved for purchase. I haven’t found much information on running Bro this way other than issue ID 845, and even that only suggests that a BroControl plugin for this is in the works but may not be fully tested yet. Has anyone tried the plugin yet, or does anyone have experience configuring Bro and PF_RING/DNA to work together?

Regards,

If you want to test the PF_RING/DNA plugin, then you'll need to use
the BroControl in the branch "topic/dnthayer/ticket845" (in the broctl
git repo), but I'm not sure if anyone has successfully used it yet.

First off, I'll admit I'm new to both PF_RING and Bro cluster setup, so I may well have made some rookie mistakes, but I've been reading documentation, source comments, and mailing lists to fill in the gaps as best I can, with a full helping of trial and error. I also understand that I'm testing features that are still in development and not necessarily ready for prime time.

To start, I've been experimenting with the DNA-enabled broctl (topic/dnthayer/ticket845) on a single node. I've tested it with various RSS settings (0, 1, and 4) and with transparent modes 0 and 2 by tweaking the load_dna_driver.sh script that ships with PF_RING, but I could be badly misconfiguring something somewhere.

Based on the output of diag in an interactive broctl session (and I may be misinterpreting it), every worker process tries to listen on the same cluster ID (21). pfdnacluster_master appears to start and then crash, after which the workers seem to start in non-DNA mode. At that point, running capstats from within broctl usually returns an error saying that cluster ID 21 does not exist, and running stop typically either leaves one or more worker processes hung (so they have to be killed) or crashes broctl in some way. I thought I had come across an earlier issue for vanilla PF_RING, under a different bug ID, about needing to spawn each process with a different cluster ID, but I can't recall the details. Perhaps two different branches address different issues related to what I'm trying to do.

Here is my node.cfg (xx.xx.xx.xx is currently the same IP for the manager, proxy, and worker):

[manager]
type=manager
host=xx.xx.xx.xx

[proxy-1]
type=proxy
host=xx.xx.xx.xx

[worker-1]
type=worker
host=xx.xx.xx.xx
interface=dna0
lb_procs=4
lb_method=pf_ring_dna

Typically I see something like the following in /proc/net/pf_ring/, where each processid-none.xx entry matches a Bro worker process:

30194-dna0.12 30319-none.13 30320-none.14 30321-none.16 30322-none.15

and then after some time has passed:

30319-none.13 30320-none.14 30321-none.16 30322-none.15

The output from each looks as follows:

# cat 30194-dna0.12
Bound Device(s) :
Active : 1
Breed : DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX only
Appl. Name : pfdnacluster_master-cluster-21-
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 128
Num Poll Calls : 0
Channel Id : 0
Num RX Slots : 8192
Num TX Slots : 8192
Tot Memory : 25952256 bytes
Cluster: Tot Recvd : 2217888
Cluster: Tot Sent : 0

# cat 30319-none.13
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 600262

# cat 30320-none.14
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 706408

# cat 30321-none.16
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 775591

# cat 30322-none.15
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 886131

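To check which worker sockets actually came up in DNA mode, the Breed field of each /proc/net/pf_ring/ entry can be summarized per process. This is a minimal Python sketch, assuming the entry names follow the `<pid>-<device>.<index>` pattern seen in the listing above and that each file uses the `Key : value` layout shown in the dumps. The helper names (parse_entry_name, parse_ring_info, summarize) are hypothetical and are not part of PF_RING or BroControl.

```python
import os
import re

PROC_DIR = "/proc/net/pf_ring"

# Assumed entry-name pattern: "<pid>-<device>.<index>", e.g. "30194-dna0.12".
# The greedy device group backtracks to the last ".<digits>" suffix.
ENTRY_RE = re.compile(r"^(\d+)-(.+)\.(\d+)$")


def parse_entry_name(name):
    """Return (pid, device, index) for a socket entry, or None for other files."""
    m = ENTRY_RE.match(name)
    if m is None:
        return None
    pid, device, index = m.groups()
    return int(pid), device, int(index)


def parse_ring_info(text):
    """Parse 'Key : value' lines into a dict.

    rpartition() splits on the LAST colon so keys such as
    'Cluster: Tot Recvd' stay intact; empty values (e.g. 'Bound Device(s) :')
    are kept as ''.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def summarize(proc_dir=PROC_DIR):
    """Collect (pid, device, index, breed, app_name) for every socket entry."""
    rows = []
    for name in sorted(os.listdir(proc_dir)):
        parsed = parse_entry_name(name)
        if parsed is None:
            continue  # not a per-socket entry (e.g. a global info file)
        pid, device, index = parsed
        try:
            with open(os.path.join(proc_dir, name)) as f:
                info = parse_ring_info(f.read())
        except OSError:
            continue  # socket went away between listdir() and open()
        rows.append((pid, device, index,
                     info.get("Breed", "?"), info.get("Appl. Name", "?")))
    return rows


if __name__ == "__main__" and os.path.isdir(PROC_DIR):
    for pid, device, index, breed, app in summarize():
        flag = "" if breed == "DNA" else "  <-- not DNA"
        print(f"pid={pid} dev={device} idx={index} breed={breed} app={app}{flag}")
```

Run while the broctl workers are up: in a working DNA setup one would expect every worker PID to report a DNA breed, whereas a worker flagged "not DNA" corresponds to entries like the `Breed : Non-DNA` sockets shown above.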
Any thoughts? Does anything here help pinpoint where I'm going wrong, or whether Bro simply can't do what I'm trying to get it to do?

Regards,

Gary Faulkner
UW Madison
Office of Campus Information Security
608-262-8591

It looks like this behavior may be caused by not having a libzero license. I had licensed the DNA drivers but hadn't realized I also needed a license for libzero. I'll try again once I have the proper licensing. Thanks to Scott Campbell for pointing me in the right direction.

Regards,
