A few questions

Good afternoon. I am still relatively new to Bro and working on building a cluster here at MUSC. In the process of setting up and configuring the IDS I have run into some issues and would like to ask the list a few questions.

  1. Is Linux even a reliable platform to think about using for Bro? Based on my experience the logs seem to be missing traffic. I have been making connections in and out of our network that pass through our network TAP and Bro does not always log them. Upon further investigation it appears that packets are being dropped (based on broctl netstats worker-1). I attempted to use pf_ring and compile Bro with libpcap-1.0.0-ring. This seemed to help some but not a lot.

  2. In regards to question #1, am I interpreting the output of broctl netstats correctly? Specifically if my dropped number is higher than my recvd number then that means Bro is processing < 50% of my network traffic?

  3. In the "diag" output I see that the workers are reporting "pcap bufsize = 8192". Is this tunable on Linux? Are there any other suggestions for Linux tuning to decrease the amount of dropped packets?

  4. Is anyone else running a reliable, stable Bro cluster on Linux?

We are using Red Hat Enterprise Linux 5.4, 64-bit.

Thanks,

Scott Powell

Unix Systems Engineer / Information Security Analyst

Office of the CIO - Information Systems (OCIO-IS)

Medical University of South Carolina

powellsm@musc.edu

(843) 792-6651

Good afternoon. I am still relatively new to Bro and working on building a
cluster here at MUSC. In the process of setting up and configuring the IDS I
have run into some issues and would like to ask the list a few questions.

1) Is Linux even a reliable platform to think about using for Bro? Based
on my experience the logs seem to be missing traffic. I have been making
connections in and out of our network that pass through our network TAP and
Bro does not always log them. Upon further investigation it appears that
packets are being dropped (based on broctl netstats worker-1). I attempted to
use pf_ring and compile Bro with libpcap-1.0.0-ring. This seemed to help some
but not a lot.

Try the following in /etc/sysctl.conf

net.core.rmem_max = 33554432
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 33554432
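
(Those take effect after a reboot or after reloading the settings; on most
Linux systems running

# sysctl -p

rereads /etc/sysctl.conf and applies them immediately.)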

What output do you get from capstats?
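
(capstats ships with Bro and reads straight off the interface; a sketch of
an invocation, with the interface name as a placeholder:

# capstats -i eth1 -I 5

which should report packet and byte rates every 5 seconds. Check the usage
output for the flags your version supports.)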

How much CPU is your bro process using? As long as it isn't maxing out a CPU
core, it shouldn't be dropping packets. If it is maxing out the CPU, then the
problem isn't with capturing, it is with doing too much analysis. If you have
an ethernet card that uses the igb driver you can try the PF_RING TNAPI stuff:

http://www.ntop.org/TNAPI.html

You can use it to run a single-node Bro cluster with each worker capturing from
one of eth0@0, eth0@1, eth0@2, eth0@3.
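
A sketch of what the node.cfg might look like for that (host and interface
names are illustrative, and your broctl version may want slightly different
sections):

--- Example node.cfg (single box, 4 queues) ---
[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=eth0@0

[worker-2]
type=worker
host=localhost
interface=eth0@1

(and likewise worker-3 and worker-4 on eth0@2 and eth0@3)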

2) In regards to question #1, am I interpreting the output of broctl
netstats correctly? Specifically if my dropped number is higher than my recvd
number then that means Bro is processing < 50% of my network traffic?

What version of bro are you running? In 1.4.x the pcap stats for dropped
packets were recorded incorrectly on Linux. I see some amount of dropped
packets, but usually less than 1 percent.

3) In the "diag" output I see that the workers are reporting "pcap
bufsize = 8192". Is this tunable on Linux? Are there any other suggestions
for Linux tuning to decrease the amount of dropped packets?

4) Is anyone else running a reliable, stable Bro cluster on Linux?

I've been running bro on linux for years now...

We are using Red Hat Enterprise Linux 5.4, 64-bit.

Debian 64-bit :-)

Justin,

Thanks for the reply. After some further investigation the issue appears to be CPU related. My bro process on worker-1 (which has my external Internet TAP connected to eth1) was using 100% of a CPU core. I turned off http-request and http-reply analysis and I'm now seeing CPU percentage between 60% and 90% with upwards of a 90% packet received rate.
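
(For anyone following along: in the 1.x-era policies this is typically a
matter of not @load-ing the corresponding scripts in your site policy, e.g.
commenting them out as below; exact file names depend on your setup.

# @load http-request
# @load http-reply
)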

My concern is these machines have 2 x AMD Opteron Quad Core 2.1 GHz processors and yet Bro cannot keep up with the out of the box policy configuration. Also, it seems all of my analysis is being done on one core of the worker with the TAP. Why isn't the analysis being spread across the other workers? They seem to be sitting idle.

Thanks for the other tuning suggestions. I have implemented those as well.

-Scott

I'm not sure I have fully understood how you set things up, but you
need some external way of distributing the traffic across the
workers. If the workers are running on separate PCs, that's
typically some form of load-balancing frontend device. If they all
run on the same box (in order to leverage multiple cores), you can
try some BPF tricks.

Robin

Robin,

I wondered if I needed some sort of distributor/load balancer external to the workers but wasn't sure based on the documentation.

Currently our network TAPs (external, DMZ, internal, etc.) go to single NICs on different machines. We have been using these for years to capture Netflow data with Argus as well as running Snort on some of them. We do not distribute a single TAP across different interfaces or servers.

Given our current setup, how would I go about these BPF tricks to leverage multiple cores on a single machine? It is starting to sound like I would want to run standalone Bro installations on the TAPs I am interested in monitoring, but the amount of traffic is too high to turn on all of the out-of-the-box analyzers unless I can take advantage of multiple cores.

Thanks,
Scott

Robin et al,

OK, a little more info. It appears that the analyzers that are killing the CPU are the HTTP ones. I do not want to disable these because they log very useful information. However, I cannot seem to keep up on one core. I either need a way to process the analysis on multiple cores or I need a frontend to distribute the load to multiple nodes. I do not have a hardware frontend solution, so I would be interested in software solutions such as Click. I saw it mentioned on the Wiki and in Workshop slides but are there example configs somewhere?

Thanks,
Scott

Attached is a click config that splits up traffic into 3 queues.
I have it using pcap since I ignore a few hosts on campus that do a ton of bulk
traffic that is not interesting from within Bro.

Like I mentioned in my other reply, if you have a newer intel card you can do
this without click. I run the usermode click and it uses about 60% of one
core(I have 8) to split up the traffic. If it took any more I would just get
the better intel GigE card and do the traffic splitting in hardware.

load-balance.click (401 Bytes)

This doesn't work right. Your config is based on the change you made to the hashswitch element in your build isn't it?

   .Seth

No.. we originally had this:

my_switch :: HashSwitch(0, 6);

but that was assuming the ethernet header had already been stripped by another
element. The 26 skips the ethernet header and then everything works right.

I've been running with that config for over a month now and as far as I can
tell it is working properly :-)

Yeah, the 8 is what I was referring to though. The two directions of traffic could go to different outputs because it would be hashing the bytes of both IP addresses and would be two different values for the two directions.

   .Seth

Yeah.. I thought that was the problem originally because the traffic was going
to different outputs, but it was just the offset that needed fixing..

The HashSwitch implementation in click adds each byte to generate the hash, so
A->B hashes to the same thing as B->A

It is a pretty dumb hash but it works well enough:

worker-1: 1265210113.069432 recvd=27566997
worker-2: 1265210113.080038 recvd=26377039
worker-3: 1265210113.013432 recvd=23995748
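
For anyone who can't pull the attachment, here is a minimal sketch of a
config along those lines (the interface and output device names are made up,
and the real load-balance.click may differ):

--- load-balance.click (sketch) ---
// capture from the monitored interface; the 26 skips the 14-byte
// ethernet header plus the first 12 bytes of the IP header, so
// HashSwitch hashes the 8 bytes holding the source and destination
// IPv4 addresses
FromDevice(eth1, PROMISC true)
  -> sw :: HashSwitch(26, 8);

// one output queue per Bro worker
sw[0] -> Queue -> ToDevice(tap0);
sw[1] -> Queue -> ToDevice(tap1);
sw[2] -> Queue -> ToDevice(tap2);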

The click setup already mentioned is probably the better solution,
but when using BPF, you would give each worker a different BPF
filter ignoring all but its slice of the traffic. One can express
the hash "(src+dst) mod n" in BPF (let me know if you want the exact
filter).

Robin

I know that the Intels are able to do that but I'm curious how to
actually set it up. Have you played with that? We're just getting
some of Intel's 10G models ...

Robin

Ah! Perfect. I didn't realize that hashswitch was implemented that way.

   .Seth

I haven't played with it myself.. I thought the last box I got was going to
come with the newer igb cards, but it has an older e1000 based one.

There is a lot of information on it here:

http://www.ntop.org/TNAPI.html

I'm not sure if his patches are required to make it work, or just to make it work faster..

the key bit seems to be:

# insmod ./<driver name>.ko IntMode=3 (IntMode=3 enables MSI-X)

"It's now time to start your multiqueue PF_RING application. Suppose you use
ethX with Y RX queues: you can either capture from ethX (aggregated traffic
from all RX queues) or from the single queues ethX@0 ... ethX@Y-1. This means
that if you capture from the ethX device you capture from all queues (PF_RING
merges traffic from all incoming queues). Instead for maximum performance you
can create a multithreaded application which captures from the single queues."

Robin,

Yes, I went with the click setup as provided by Justin and so far so good. I'm not dropping any packets yet.

Justin - thanks again for the config.

-Scott

Very helpful, I hadn't seen that before. Thanks!

Robin

P.S.: Now I'm looking for a FreeBSD solution as well ...

Let me know if you find one!!!

   .Seth

For the sake of future reference (and my own curiosity) can I request
the filter be sent to the list?

Thanks,
Tim

Tim,

I figured that I would chime in since I'm running with the BPF filter,
and performance has been decent. I am attempting to
process 1 Gbps on a two-processor, quad-core box with 24 GB of RAM.
Right now, it is dropping about 40-50% of the traffic. I think Seth is
running with a hardware frontend that distributes the traffic to eight
worker computers with dual or quad cores. That gives 16 or 32 CPU cores
to process traffic. Since I don't have a hardware frontend, I am
limited to processing traffic on a single computer, which means up to
eight workers (one per core).

Here is the formula that Robin gave me:
((ip[12:4]+ip[16:4])-((ip[12:4]+ip[16:4])/N)*N) = J

N is the total number of processes you have (ideally a prime), and J
is a different value out of {0...N-1} for each process.

The mod operation broken down:
((A)-((A)/N)*N) = J
A=ip[12:4]+ip[16:4]
ip[12:4] = source IP
ip[16:4] = destination IP
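
As a concrete instance (worker count picked just for illustration), with N=3
the three workers would use:

((ip[12:4]+ip[16:4])-((ip[12:4]+ip[16:4])/3)*3) == 0
((ip[12:4]+ip[16:4])-((ip[12:4]+ip[16:4])/3)*3) == 1
((ip[12:4]+ip[16:4])-((ip[12:4]+ip[16:4])/3)*3) == 2

Since the sum is the same in both directions, A->B and B->A land on the
same worker, just like the HashSwitch approach.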

It will probably work as-is, but the offsets of 12 and 16 may vary
depending on how much of the ethernet header is on the packet. You can
use dumpcap/Wireshark to grab packets and figure out where the header is
for your interface. Wireshark shows my source IP offset at 26, which is
what Justin had in his Click configuration, but 12 and 16 seem to be
working for me. The ethernet header appears to be stripped off by the
time it gets processed by Bro.

--- Example node.cfg ----
[worker-1]
type=worker
host=_dns_name_of_worker_
interface=igb1
aux_scripts=worker1.bro

--- Example worker1.bro ---
redef restrict_filters += {
    ["mod source and dest pairs over multiple procs"] =
        "((ip[14:2]+ip[18:2])-((ip[14:2]+ip[18:2])/8)*8) == 0"
};

I was using 14:2 and 18:2 to look at the last two octets of the src and
dst IP since we have a class B subnet. I don't know that it is
computationally less expensive, but it seems to give the same results.

I have all eight workers listening to the same ethernet interface igb1, so
all eight workers have to filter the full 1 Gbps of traffic down to 1/8th
the amount. Click should have better performance because it splits the
1 Gbps eight ways first, then sends 1/8th to each worker.

It looks like Click is available for FreeBSD. I'd like to test that to
see if I can gain some performance. I am having difficulty compiling it
on FreeBSD 7.1 amd64 however. If anyone has Click/FreeBSD working,
please let me know.

The Bro wiki mentions that Click is limited to 2Gbps in tests. I
wonder if that is still true? I was thinking about the possibility of
installing a 10 Gig card in the current server as well as some
additional 1 Gig ports. Then using Click to split the traffic to some
workers on this box and send the rest out the additional 1 Gig ports to
some additional workers. That way I could use this server as a frontend
plus workers, but expand the cluster to additional computers.

Tyler