Bro performance issues

Tomer_Teller · October 30, 2011, 9:46am

Hey all,

I am testing Bro’s performance using tcpreplay for a project of mine.

I am replaying a packet capture of 680,000 packets at different rates to check for packet loss.

tcpreplay -i eth0 --mbps=X 680000.pcap (where X = 1000,500,100,10)

I registered a handler for the new_packet event in order to count packets, like so:

global ctr = 0;
event new_packet (c: connection,p: pkt_hdr)
{
ctr = ctr + 1;
…
}

I write to the log only when ctr % 100000 == 0, to avoid unnecessary disk I/O.

On the sender side I see that all the packets were transmitted successfully, and tcpdump on the receiver side sees all of them too, so it is not a libpcap issue.

Bro, on the other hand, doesn’t see all 680,000; it sees around 540,000.

I also tried smaller packet captures (10/30/100 packets), and again Bro does not see all the packets.
Note: the packet captures contain valid HTTP connections (with correct checksums) that I recorded for testing.

I tried removing some analyzers using broctl, as well as modifying local.bro.
I also followed the Bro performance tuning guide.

Nothing helps; Bro does not see all the packets.

Any ideas what the problem is?

Adayadil_Thomas · October 31, 2011, 1:45am

The Bro policy you are running must be setting some BPF (libpcap) filter
so that Bro analyzes only the traffic it wants to see.

Seth_Hall3 · October 31, 2011, 4:27am

If I remember correctly, the new_packet event is only fired for IPv4 packets. Internally it can't deal with IPv6 packets, and it doesn't work with non-IP packets either. Do the numbers you're getting match the number of IPv4 packets in your trace file?

.Seth

Tomer_Teller · October 31, 2011, 10:45am

All the packets are valid IPv4. I just noticed that my CPU goes up to 92%, so I am probably suffering drops due to load.

I decided to set up a cluster to make use of my machine’s 4 cores:

1 for the manager, 1 for the proxy and 2 for workers.

To avoid installing the Click router to rewrite packets, I want to load worker-1 and worker-2 with different policies so they won’t handle the same traffic twice.

worker1-policy.bro:
redef restrict_filters += { ["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 0" };

worker2-policy.bro:
redef restrict_filters += { ["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 1" };
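To see why this pair of filters can split the traffic without separating the two directions of a connection, here is a minimal Python sketch that evaluates the same expression on a raw IPv4 header. It assumes, as in standard pcap filter syntax, that ip[12:4] and ip[16:4] load the source and destination addresses as big-endian 32-bit words; the helper names are hypothetical.

```python
import ipaddress
import struct


def make_ipv4_header(src: str, dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header; only the address fields matter here."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, IHL 5 (20-byte header)
    header[12:16] = ipaddress.IPv4Address(src).packed
    header[16:20] = ipaddress.IPv4Address(dst).packed
    return bytes(header)


def bpf_parity(ip_header: bytes) -> int:
    """Evaluate '(ip[12:4] + ip[16:4]) & 1' on an IPv4 header.

    ip[off:4] loads 4 bytes in network (big-endian) order, and BPF
    arithmetic is unsigned 32-bit, so the sum is masked to 32 bits.
    """
    src, dst = struct.unpack_from("!II", ip_header, 12)
    return ((src + dst) & 0xFFFFFFFF) & 1


def worker_for(ip_header: bytes) -> str:
    # 0 matches worker-1's filter ("== 0"), 1 matches worker-2's ("== 1").
    return "worker-1" if bpf_parity(ip_header) == 0 else "worker-2"


request = make_ipv4_header("10.0.0.1", "10.0.0.2")
reply = make_ipv4_header("10.0.0.2", "10.0.0.1")
print(worker_for(request), worker_for(reply))  # worker-2 worker-2
```

Because addition is commutative, a request and its reply always get the same parity, so each worker sees whole connections, provided the two filters are the only difference between the workers.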

Is this possible and recommended? (I’m just trying to pump up performance.)

How do I load worker-1 with 'worker1-policy.bro' and worker-2 with 'worker2-policy.bro'? The documentation only talks about 'local-worker.bro', which is loaded by all the workers.

Thanks

Seth_Hall3 · October 31, 2011, 12:35pm

> All the packets are valid IPv4, I just noticed that my CPU goes to 92% so I am probably suffering drops due to load.

Very likely. I usually try not to send more than 80-100Mbps of traffic to a single core.
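Taking that 80-100 Mbps per core as a rough rule of thumb rather than a measured limit, a quick calculation shows how far the replay rates from the first post exceed a single core. A minimal Python sketch; workers_needed is a hypothetical helper, and it ignores the manager and proxy as well as uneven flow sizes:

```python
import math


def workers_needed(rate_mbps: float, per_core_mbps: float) -> int:
    """Smallest number of cores that keeps each one at or under per_core_mbps."""
    return math.ceil(rate_mbps / per_core_mbps)


# The --mbps values used with tcpreplay in the first post.
for rate in (1000, 500, 100, 10):
    low = workers_needed(rate, 100)   # optimistic: 100 Mbps per core
    high = workers_needed(rate, 80)   # conservative: 80 Mbps per core
    print(f"{rate:>5} Mbps: {low} to {high} cores")
```

At --mbps=1000 that comes to 10 to 13 cores, far more than the two workers planned above.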

> How do I load worker-1 with 'worker1-policy.bro' and worker-2 with 'worker2-policy.bro' ? The documentation only talks about 'local-worker.bro' that is being loaded by all the workers.

What version are you using? 1.5.x or the 2.0 beta we just released on Friday? The answers to all of your questions depend on it.

.Seth

Tomer_Teller · October 31, 2011, 12:52pm

I am using version 1.5.3.

It's running on 2 x Intel Xeon 2.33GHz with 4GB of FB-DIMM memory and 8 cores.

For now I just want to test with 2 cores.

This is my node.cfg:

[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=bg0

[worker-2]
type=worker
host=localhost
interface=bg0

I want to load-balance my traffic between 2 cores using the restrict filter mentioned above. (Because of NAT, it may be wiser to filter by source port: even → worker-1, odd → worker-2.)
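One caveat with a pure source-port split: in a TCP connection the reply's source port is the request's destination port, so the two directions of one connection can land on different workers, and neither worker sees the whole connection. A minimal Python sketch, with hypothetical names and an arbitrary example ephemeral port, compares that split with a symmetric one that adds both ports (the same idea as the address-sum filter):

```python
from typing import NamedTuple


class Ports(NamedTuple):
    src_port: int
    dst_port: int


def by_src_port(pkt: Ports) -> str:
    # The proposal: even source port -> worker-1, odd -> worker-2.
    return "worker-1" if pkt.src_port % 2 == 0 else "worker-2"


def by_port_sum(pkt: Ports) -> str:
    # Symmetric alternative: swapping source and destination keeps the sum.
    return "worker-1" if (pkt.src_port + pkt.dst_port) % 2 == 0 else "worker-2"


request = Ports(src_port=49153, dst_port=80)  # client -> HTTP server
reply = Ports(src_port=80, dst_port=49153)    # server -> client

print(by_src_port(request), by_src_port(reply))  # worker-2 worker-1
print(by_port_sum(request), by_port_sum(reply))  # worker-2 worker-2
```

Ports only exist for TCP and UDP, so ICMP and non-first IP fragments would still need a separate rule.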

Seth_Hall3 · October 31, 2011, 1:05pm

Use this:

event bro_init()
  {
  if ( peer_description == "worker-1" )
    restrict_filters += { ["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 0" };
  if ( peer_description == "worker-2" )
    restrict_filters += { ["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 1" };
  }

.Seth

Azoff_Justin · October 31, 2011, 1:22pm

> Hey all,
>
> I am testing Bro's performance using tcpreplay for some project of mine.
>
> I am using a packet capture of 680000 packets using different rates to
> check for packet loss.
>
> tcpreplay -i eth0 --mbps=X 680000.pcap (where X = 1000,500,100,10)
>
> ...
>
> Bro on the other hand, doesn't see all 680000, he sees around 540,000.

As a sanity check, what does bro report if you run it with something
like this:

'bro -f ip -C -r 680000.pcap your_counter_policy.bro'

Seth_Hall3 · October 31, 2011, 2:35pm

Sorry about that...

event bro_init()
  {
  if ( peer_description == "worker-1" )
    restrict_filters += table(["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 0");
  if ( peer_description == "worker-2" )
    restrict_filters += table(["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 1");
  }

.Seth

Tomer_Teller · October 31, 2011, 4:39pm

event bro_init()
  {
  if ( peer_description == "worker-1" )
    restrict_filters += table(["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 0");
  if ( peer_description == "worker-2" )
    restrict_filters += table(["capture even src/dest pairs only"] = "(ip[12:4] + ip[16:4]) & 1 == 1");
  }

This code causes the following error:

line 58 (restrict_filters += table(capture even src/dest pairs only = (ip[12:4] + ip[16:4]) & 1 == 0)): error, requires two arithmetic or two string operands

Martin_Holste · October 31, 2011, 4:49pm

Is there a reason you can't use PF_RING for this? It sure makes
things like this easier.

Tomer_Teller · October 31, 2011, 5:08pm

Do you mean PF_RING with a front-end solution such as the Click router?
Is it possible to run everything on a single machine?

Seth_Hall3 · October 31, 2011, 5:33pm

Martin is referring to clustering in PF_RING. It splits your traffic into bidirectional flows within the kernel, and it is easy to configure with Bro 2.0-beta (I wouldn't try it with 1.5; it would be a bit of a mess). If you're running with broctl, it will mostly just work with PF_RING out of the box, including clustering. You just need to make sure you're building against the correct libpcap (PF_RING's libpcap wrapper); then all of the workers you configure in broctl's node.cfg file should sniff the same interface.

.Seth

Tomer_Teller · November 1, 2011, 8:26am

I installed Bro 2.0-beta on my machine.
I have to say that it was quick, easy and without any problems.

I removed libpcap0.8 before the installation, then installed PF_RING along with libpcap-1.1.1-ring, which Bro is now using:

libpcap.so.1 => /usr/local/lib/libpcap.so.1
libpfring.so => /usr/local/lib/libpfring.so

I configured the node.cfg and added:
1 manager
1 proxy
2 workers - sniffing the same interface

All the nodes run on localhost.

I’m replaying a big pcap file with 680,000 packets and expect to see some load-balancing between the 2 workers (which run on different cores).

I am using the 'netstats' command in broctl and expect to see half (or at least some) of the traffic go to worker-1 and the rest to worker-2 (i.e. the packets received by both workers should add up to roughly 680,000).

I see that worker-1 took everything.
worker-1: 1320163523.794836 recvd=638311 dropped=31948 link=670259

I’m assuming that worker-2 also got everything (i.e. a duplicate copy).

How do I load-balance between the two workers on the same machine?

I also noticed some minor bugs:

[BroControl] > netstats
worker-3: <error: cannot connect to 127.0.1.1:47764>

[BroControl] > scripts
proxy-1 is ok.
cat: loaded_scripts*: No such file or directory
worker-1 is ok.
cat: loaded_scripts*: No such file or directory
worker-3 is ok.
cat: loaded_scripts*: No such file or directory

Gregor_Maier · November 1, 2011, 12:57pm

In terms of performance, please note that handling the new_packet() event generates a lot of overhead, so the performance you see is
going to be significantly worse than in "normal" operation.

cu
gregor

Martin_Holste · November 1, 2011, 2:02pm

Looks like only one worker is even alive. There should be no tweaking
necessary to get the load-balancing to occur, so there's a fundamental
problem if it's not happening. It sounds like you've already got the
installation done, but I have a quick howto here:
ossectools.blogspot.com/2011/09/bro-quickstart-cluster-edition.html.
I would suggest trying a clean install to a different directory and
copying the config files over if you continue to have issues.

Seth_Hall3 · November 1, 2011, 2:05pm

Oh, good point. I should add something to the warnings file that prints out if you are handling that event to make sure people understand how badly it can impact performance.

.Seth

Seth_Hall3 · November 1, 2011, 2:08pm

Could you send the contents of your node.cfg file? I noticed that the broctl session you sent refers to a worker-3, which isn't in your description.

Thanks,
.Seth

Tomer_Teller · November 2, 2011, 3:45pm

This is my node.cfg config file:

[manager]
type=manager
host=localhost

[proxy-1]
type=proxy
host=localhost

[worker-1]
type=worker
host=localhost
interface=em0

[worker-2]
type=worker
host=localhost
interface=em0

I am running on Bro 2.0 Beta.

I am replaying a 680,000-packet pcap file to the machine.

When I run broctl's netstats, this is what I see:

worker-1: 1320276618.514073 recvd=669576 dropped=0 link=669576
worker-2: 1320276618.714115 recvd=669576 dropped=0 link=669576

I expect to see load-balance between worker-1 and worker-2 but they are getting the same traffic.
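The netstats numbers make the duplication easy to check. The minimal Python sketch below sums recvd across the workers and computes a drop rate for the earlier worker-1 line. parse_netstats and drop_rate are hypothetical helpers, and the regular expression only assumes the line format shown in this thread. In both outputs quoted here, link equals recvd + dropped, so the sketch uses dropped / link as the drop rate.

```python
import re

# Matches lines such as "worker-1: 1320276618.514073 recvd=669576 dropped=0 link=669576".
NETSTATS_RE = re.compile(
    r"^(?P<node>[\w-]+): (?P<ts>[\d.]+) "
    r"recvd=(?P<recvd>\d+) dropped=(?P<dropped>\d+) link=(?P<link>\d+)$"
)


def parse_netstats(line):
    m = NETSTATS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized netstats line: {line!r}")
    stats = {key: int(m[key]) for key in ("recvd", "dropped", "link")}
    stats["node"] = m["node"]
    return stats


def drop_rate(stats):
    """Fraction of the packets seen on the link that the worker dropped."""
    return stats["dropped"] / stats["link"] if stats["link"] else 0.0


lines = [
    "worker-1: 1320276618.514073 recvd=669576 dropped=0 link=669576",
    "worker-2: 1320276618.714115 recvd=669576 dropped=0 link=669576",
]
total = sum(parse_netstats(line)["recvd"] for line in lines)
print(total)  # 1339152 = 2 x 669576, consistent with both workers seeing every packet

earlier = parse_netstats("worker-1: 1320163523.794836 recvd=638311 dropped=31948 link=670259")
print(f"{drop_rate(earlier):.2%}")  # 4.77%
```

With a working split, the total should be close to the number of packets replayed, not twice that.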

I decided to patch Bro in order to "support" that load-balancing.

I edited PktSrc::Process() (PktSrc.cc) and added my own code to distinguish between the processes (i.e. worker-1 and worker-2).

Then I looked at the data variable, extracted the IP source and destination addresses, and checked whether (ipSRC ^ ipDST) % 2 == 0.

Worker-1 gets all the even results; worker-2 gets all the odd results.

I also had to adjust (++stats.received) to reflect the new changes.

This small patch dramatically improved my performance.
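The patch itself isn't posted in the thread, so what follows is not it. It is a minimal Python sketch of the same idea under stated assumptions: frames are untagged Ethernet II (14-byte header, no VLAN tag), worker_index 0 plays worker-1 and 1 plays worker-2, and non-IPv4 frames go to worker-1 so they are not silently lost. All names are hypothetical. The XOR test has the same parity as the address sum used in the BPF filters earlier, so it is also symmetric.

```python
import ipaddress
import struct
from typing import Optional

ETH_HEADER_LEN = 14       # assumption: untagged Ethernet II frames
ETHERTYPE_IPV4 = 0x0800


def xor_parity(frame: bytes) -> Optional[int]:
    """(ipSRC ^ ipDST) % 2 for an IPv4-over-Ethernet frame, or None otherwise."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    src, dst = struct.unpack_from("!II", frame, ETH_HEADER_LEN + 12)
    return (src ^ dst) % 2


class WorkerFilter:
    """Hypothetical per-process check: index 0 keeps even parity, 1 keeps odd."""

    def __init__(self, worker_index: int):
        self.worker_index = worker_index
        self.received = 0  # counts only the packets this worker keeps

    def accept(self, frame: bytes) -> bool:
        parity = xor_parity(frame)
        if parity is None:
            parity = 0  # hypothetical policy: non-IPv4 frames go to worker-1
        if parity != self.worker_index:
            return False
        self.received += 1
        return True


def make_frame(src: str, dst: str) -> bytes:
    """Build a minimal Ethernet + IPv4 header for testing (payload omitted)."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20)
    ip[0] = 0x45  # version 4, 20-byte header
    ip[12:16] = ipaddress.IPv4Address(src).packed
    ip[16:20] = ipaddress.IPv4Address(dst).packed
    return eth + bytes(ip)


workers = [WorkerFilter(0), WorkerFilter(1)]
frames = [
    make_frame("10.0.0.1", "10.0.0.2"),  # request
    make_frame("10.0.0.2", "10.0.0.1"),  # its reply
    make_frame("10.0.0.1", "10.0.0.3"),
]
for frame in frames:
    for worker in workers:
        worker.accept(frame)
print([w.received for w in workers])  # [1, 2]
```

With this approach each worker process still reads every packet from libpcap and only skips the ones that belong to the other worker, whereas PF_RING clustering splits the flows in the kernel.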

Seth_Hall3 · November 2, 2011, 3:48pm

Can you send the output of:

broctl config | grep -i pfring

and:

ldd <prefix>/bin/bro

Thanks,
.Seth
