First off, I'll admit I'm new to both pf_ring and bro cluster setup, so quite possibly I've made some rookie mistakes. I've been reading documentation, source comments, and list archives to fill in the gaps as best I can, with a full helping of trial and error. I also understand that I'm testing some features that are still in development and not necessarily ready for prime time.
To start, I've been experimenting on a single node with the broctl branch that adds DNA support (topic/dnthayer/ticket845). I have tested this with various RSS settings (0, 1, and 4) as well as transparent modes 0 and 2 by tweaking the load_dna_driver.sh shell script that comes with pf_ring, but I could be horribly misconfiguring something somewhere.
Based on the output of running diag within an interactive broctl session (and I may be misinterpreting things), every worker process seems to try to listen on the same cluster ID (21). pfdnacluster_master appears to run and then crash, after which the workers seem to start in a non-DNA mode. At that point, running capstats from within broctl usually returns an error that cluster ID 21 does not exist, and running the stop command typically either leaves one or more worker processes hung so that they have to be killed, or crashes broctl in some way.
I thought I had run across a previous issue for vanilla pf_ring, with a separate bug ID, about needing to spawn each process with a different cluster ID, but I can't recall the details. Maybe there are two different branches addressing different issues related to what I'm trying to do.
Here is what my node.cfg looks like (where xx.xx.xx.xx is currently the same IP for manager/proxy/worker):
[manager]
type=manager
host=xx.xx.xx.xx
[proxy-1]
type=proxy
host=xx.xx.xx.xx
[worker-1]
type=worker
host=xx.xx.xx.xx
interface=dna0
lb_procs=4
lb_method=pf_ring_dna
Typically what I end up seeing in /proc/net/pf_ring/ is something like this, where each processid-none.xx entry matches a bro worker process:
30194-dna0.12 30319-none.13 30320-none.14 30321-none.16 30322-none.15
and then after some time has passed:
30319-none.13 30320-none.14 30321-none.16 30322-none.15
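In case it helps anyone compare against their own box, here is a minimal Python sketch of how I am reading those entry names. It assumes each name has the form pid-device.number, which is only my interpretation of the listings above; RingEntry, parse_entry, and unbound_entries are names I made up for illustration and are not part of pf_ring or broctl.

```python
import re
from typing import List, NamedTuple, Optional


class RingEntry(NamedTuple):
    pid: int      # process that owns the socket
    device: str   # bound device, or "none" if nothing is bound
    number: int   # trailing number in the entry name (meaning assumed)


# Assumed layout: <pid>-<device>.<number>, e.g. "30194-dna0.12".
# The device part is matched greedily so a name containing a dot still parses.
ENTRY_RE = re.compile(r"^(\d+)-(.+)\.(\d+)$")


def parse_entry(name: str) -> Optional[RingEntry]:
    """Split one /proc/net/pf_ring entry name; return None if it does not match."""
    match = ENTRY_RE.match(name)
    if match is None:
        return None
    return RingEntry(int(match.group(1)), match.group(2), int(match.group(3)))


def unbound_entries(names: List[str]) -> List[RingEntry]:
    """Return the entries whose device field is "none"."""
    parsed = (parse_entry(name) for name in names)
    return [entry for entry in parsed if entry is not None and entry.device == "none"]


# Entry names copied from the first listing above.
sample = ["30194-dna0.12", "30319-none.13", "30320-none.14",
          "30321-none.16", "30322-none.15"]
for entry in unbound_entries(sample):
    print(f"pid {entry.pid} is not bound to any device")
```

Run against the first listing, this flags the four worker PIDs and leaves only the pfdnacluster_master socket (30194) attached to dna0.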
The output from each looks like this:
# cat 30194-dna0.12
Bound Device(s) :
Active : 1
Breed : DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX only
Appl. Name : pfdnacluster_master-cluster-21-
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 128
Num Poll Calls : 0
Channel Id : 0
Num RX Slots : 8192
Num TX Slots : 8192
Tot Memory : 25952256 bytes
Cluster: Tot Recvd : 2217888
Cluster: Tot Sent : 0
# cat 30319-none.13
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 600262
# cat 30320-none.14
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 706408
# cat 30321-none.16
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 775591
# cat 30322-none.15
Bound Device(s) :
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 886131
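To avoid eyeballing each file, the same check can be scripted. This is a rough Python sketch that assumes every line is a "Key : Value" pair split at the last colon, as in the output pasted above; parse_ring_stats and is_dna are hypothetical helper names, not anything shipped with pf_ring.

```python
from typing import Dict


def parse_ring_stats(text: str) -> Dict[str, str]:
    """Turn the contents of one /proc/net/pf_ring socket file into a dict.

    Assumption: the value never contains a colon, so splitting at the LAST
    colon keeps keys such as "Cluster: Tot Recvd" intact.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # skip blank lines and anything without a colon
        stats[key.strip()] = value.strip()
    return stats


def is_dna(stats: Dict[str, str]) -> bool:
    """True if the socket reports the DNA breed."""
    return stats.get("Breed") == "DNA"


# A few lines copied from the worker output above.
sample = """Bound Device(s) :
Active : 1
Breed : Non-DNA
Appl. Name : <unknown>
Num Poll Calls : 600262
"""
stats = parse_ring_stats(sample)
print(stats["Breed"], stats["Appl. Name"], is_dna(stats))
```

Feeding it each worker's file would make it quick to confirm whether any of them ever come up with Breed set to DNA.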
Any thoughts? Is anything I've said at all useful in seeing where I may be failing or where bro might not do what it is I'm trying to get it to do?
Regards,
Gary Faulkner
UW Madison
Office of Campus Information Security
608-262-8591