Bro workers die

Hi, me again.

This is a recent Bro build from SVN, but I had the same results with 2.1 stable.

broctl start
starting manager ...
starting proxy ...
starting nsm1-eth4-1 ...
starting nsm1-eth5-1 ...
starting nsm1-eth5-10 ...
starting nsm1-eth5-11 ...
starting nsm1-eth5-12 ...
starting nsm1-eth5-2 ...
starting nsm1-eth5-3 ...
starting nsm1-eth5-4 ...
starting nsm1-eth5-5 ...
starting nsm1-eth5-6 ...
starting nsm1-eth5-7 ...
starting nsm1-eth5-8 ...
starting nsm1-eth5-9 ...
(nsm1-eth5-12 still initializing)
(nsm1-eth5-9 still initializing)
(nsm1-eth5-10 still initializing)
(nsm1-eth5-11 still initializing)
(nsm1-eth4-1 still initializing)

And after a while

Name          Type     Host  Status   Pid    Peers  Started
nsm1-eth4-1   worker   <ip>  crashed
nsm1-eth5-10  worker   <ip>  crashed
nsm1-eth5-11  worker   <ip>  crashed
nsm1-eth5-12  worker   <ip>  crashed
nsm1-eth5-9   worker   <ip>  crashed
manager       manager  <ip>  running  44798  9      22 Apr 19:27:37
proxy         proxy    <ip>  running  44845  9      22 Apr 19:27:39
nsm1-eth5-1   worker   <ip>  running  45048  2      22 Apr 19:27:41
nsm1-eth5-2   worker   <ip>  running  45057  2      22 Apr 19:27:41
nsm1-eth5-3   worker   <ip>  running  45060  2      22 Apr 19:27:41
nsm1-eth5-4   worker   <ip>  running  45063  2      22 Apr 19:27:41
nsm1-eth5-5   worker   <ip>  running  45066  2      22 Apr 19:27:41
nsm1-eth5-6   worker   <ip>  running  45067  2      22 Apr 19:27:41
nsm1-eth5-7   worker   <ip>  running  45068  2      22 Apr 19:27:41
nsm1-eth5-8   worker   <ip>  running  45069  2      22 Apr 19:27:41

Two more questions:
1. Does Bro use pf_ring by default with a configuration like this?
2. How can I change the load balancing method? I need to spread things more evenly.

cat /opt/bro/etc/node.cfg
[manager]
type=manager
host=<ip>

[proxy]
type=proxy
host=<ip>

[nsm1-eth4]
type=worker
host=<ip>
interface=eth4
lb_method=pf_ring
lb_procs=1

[nsm1-eth5]
type=worker
host=<ip>
interface=eth5
lb_method=pf_ring
lb_procs=12
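
For reference, here is a minimal Python sketch of how this node.cfg maps to the worker names in the broctl output above. It assumes the naming visible there: a worker section with lb_method set becomes <section>-1 through <section>-<lb_procs>. The helper expand_workers is hypothetical and is not part of broctl; it only illustrates the expansion.

```python
import configparser


def expand_workers(node_cfg_text):
    """Return the worker process names implied by a node.cfg text.

    Assumption (taken from the broctl output in this thread): a worker
    section with lb_method set is expanded to <section>-1 .. <section>-N,
    where N is lb_procs. A worker without lb_method keeps its section name.
    """
    parser = configparser.ConfigParser()
    parser.read_string(node_cfg_text)
    names = []
    for section in parser.sections():
        node = parser[section]
        if node.get("type") != "worker":
            continue  # manager and proxy are not expanded
        if "lb_method" in node:
            procs = node.getint("lb_procs", fallback=1)
            names.extend(f"{section}-{i}" for i in range(1, procs + 1))
        else:
            names.append(section)
    return names


# The node.cfg from this post, with the host values left as placeholders.
NODE_CFG = """
[manager]
type=manager
host=<ip>

[proxy]
type=proxy
host=<ip>

[nsm1-eth4]
type=worker
host=<ip>
interface=eth4
lb_method=pf_ring
lb_procs=1

[nsm1-eth5]
type=worker
host=<ip>
interface=eth5
lb_method=pf_ring
lb_procs=12
"""

if __name__ == "__main__":
    for name in expand_workers(NODE_CFG):
        print(name)
```

With this file the sketch lists 13 workers: nsm1-eth4-1 plus nsm1-eth5-1 through nsm1-eth5-12.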

1. Does Bro use pf_ring by default with a configuration like this?

Yes, it's the lb_method=pf_ring that enables it.

2. How can I change the load balancing method? I need to spread things more evenly.

What do you want to change it to? I think it's doing 4-tuple or 5-tuple by default right now.

One problem you will encounter is an issue with pf_ring cluster_id choice. You will be running two pf_ring clusters on the same host (I'm assuming that nsm1 is the same physical host) and pf_ring doesn't like that. It does something weird like trying to stick packets from both NICs into the same queue. We have it fixed for our next release (that did get merged into master, right Daniel?) but it's a problem right now.
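
To illustrate the idea only: the problem goes away if every interface on a host gets its own cluster ID. The Python sketch below shows one way to hand out such IDs. It is not how broctl or pf_ring does it; the function name assign_cluster_ids and the base_id value are made up for illustration.

```python
def assign_cluster_ids(workers, base_id=1):
    """Give each interface on a host its own cluster ID.

    `workers` is a list of (host, interface) pairs, one per worker process.
    IDs only need to be unique per host, so each host counts up from
    base_id independently. base_id is an arbitrary illustrative value.
    """
    ids = {}
    next_id = {}  # next free ID for each host
    for host, interface in workers:
        key = (host, interface)
        if key in ids:
            continue  # all processes on one interface share one cluster
        ids[key] = next_id.get(host, base_id)
        next_id[host] = ids[key] + 1
    return ids


if __name__ == "__main__":
    # One eth4 process and twelve eth5 processes, all on the host nsm1.
    procs = [("nsm1", "eth4")] + [("nsm1", "eth5")] * 12
    print(assign_cluster_ids(procs))
```

The twelve eth5 processes share one ID so that pf_ring balances across them, while eth4 gets a different ID and its own queue.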

You aren't sending us enough information to determine why you're seeing crashes, though. Could you send the output from broctl diag nsm1-eth5-1 (assuming that's a worker that is currently crashed)?

Thanks,

  .Seth

1. Does Bro use pf_ring by default with a configuration like this?

Yes, it's the lb_method=pf_ring that enables it.

2. How can I change the load balancing method? I need to spread things more evenly.

What do you want to change it to? I think it's doing 4-tuple or 5-tuple by default right now.

OK, I might be wrong on that; it has helped in a big way for Snort.


One problem you will encounter is an issue with pf_ring cluster_id choice. You will be running two pf_ring clusters on the same host (I'm assuming that nsm1 is the same physical host) and pf_ring doesn't like that. It does something weird like trying to stick packets from both NICs into the same queue. We have it fixed for our next release (that did get merged into master, right Daniel?) but it's a problem right now.

I'm running the SVN code, so you think it does not choose a unique cluster id for eth4 and another for eth5? How can I fix it?

You aren't sending us enough information to determine why you're seeing crashes, though. Could you send the output from broctl diag nsm1-eth5-1 (assuming that's a worker that is currently crashed)?

broctl diag nsm1-eth5-1
[nsm1-eth5-1]

Bro 2.1-386

==== No reporter.log

==== stderr.log
listening on eth5, capture length 8192 bytes

1366658863.663940 processing suspended
1366658863.664006 processing continued
1366658869.682828 Failed to open GeoIP database: /usr/share/GeoIP/GeoIPCity.dat
1366658869.682828 Fell back to GeoIP Country database
1366658869.682828 Failed to open GeoIP database: /usr/share/GeoIP/GeoIPCityv6.dat

==== stdout.log
unlimited

==== .cmdline
-i eth5 -U .status -p broctl -p broctl-live -p local -p nsm1-eth5-1 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto

==== .env_vars
PATH=/opt/bro/bin:/opt/bro/share/broctl/scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bro/bin
BROPATH=/nsm/bro/spool/installed-scripts-do-not-touch/site::/nsm/bro/spool/installed-scripts-do-not-touch/auto:/opt/bro/share/bro:/opt/bro/share/bro/policy:/opt/bro/share/bro/site
CLUSTER_NODE=nsm1-eth5-1

==== .status
RUNNING [net_run]

==== No prof.log

==== No packet_filter.log

==== No loaded_scripts.log

I'm running the SVN code, so you think it does not choose a unique cluster id for eth4 and another for eth5? How can I fix it?

I don't know if that's fixed in master yet (I'm assuming you're running git master).

broctl diag nsm1-eth5-1

That shows the process is running fine. You need to do that for a worker that has crashed.
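
If it helps to pick the right workers, here is a small Python sketch that pulls the crashed ones out of a saved copy of the broctl status output. It assumes the column order shown earlier in this thread (Name, Type, Host, Status, ...); crashed_workers is a hypothetical helper, not a broctl feature.

```python
def crashed_workers(status_text):
    """Return the names of workers whose Status column reads 'crashed'.

    Assumes whitespace-separated columns in the order
    Name, Type, Host, Status, as in the status output in this thread.
    """
    crashed = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "worker" and fields[3] == "crashed":
            crashed.append(fields[0])
    return crashed


# A shortened copy of the status output posted above.
STATUS = """\
Name Type Host Status Pid Peers Started
nsm1-eth4-1 worker <ip> crashed
nsm1-eth5-10 worker <ip> crashed
nsm1-eth5-11 worker <ip> crashed
nsm1-eth5-12 worker <ip> crashed
nsm1-eth5-9 worker <ip> crashed
manager manager <ip> running 44798 9 22 Apr 19:27:37
proxy proxy <ip> running 44845 9 22 Apr 19:27:39
nsm1-eth5-1 worker <ip> running 45048 2 22 Apr 19:27:41
nsm1-eth5-2 worker <ip> running 45057 2 22 Apr 19:27:41
"""

if __name__ == "__main__":
    for name in crashed_workers(STATUS):
        print(name)
```

Any name it prints is a candidate for broctl diag.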

  .Seth

Right, just noticed the stack traces.

root@nsm1:/nsm/bro/logs/current# broctl diag nsm1-eth5-9
[nsm1-eth5-9]

Bro 2.1-386

core
[New LWP 54717]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/bro/bin/bro -i eth5 -U .status -p broctl -p broctl-live -p local -p nsm1-e'.
Program terminated with signal 11, Segmentation fault.
#0 AsBool (this=0x0) at scan.l:1074

Thread 1 (Thread 0x7f3af913d740 (LWP 54717)):
#0 AsBool (this=0x0) at scan.l:1074
#1 do_atif (expr=<optimized out>) at scan.l:686
#2 0x000000000051c95e in yyparse () at parse.y:1203
#3 0x00000000004c615e in main (argc=18, argv=<optimized out>) at /home/michal/bro/src/main.cc:801

==== No reporter.log

==== stderr.log
error in /opt/bro/share/bro/base/frameworks/cluster/./main.bro, line 136: no such index (Cluster::nodes[Cluster::node])
warning in /opt/bro/share/bro/base/frameworks/notice/./cluster.bro, line 23: non-void function returns without a value: Cluster::local_node_type
/opt/bro/share/broctl/scripts/run-bro: line 60: 54717 Segmentation fault (core dumped) nohup $mybro $@

==== stdout.log
unlimited

==== .cmdline
-i eth5 -U .status -p broctl -p broctl-live -p local -p nsm1-eth5-9 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto

==== .env_vars
PATH=/opt/bro/bin:/opt/bro/share/broctl/scripts:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
BROPATH=/nsm/bro/spool/installed-scripts-do-not-touch/site::/nsm/bro/spool/installed-scripts-do-not-touch/auto:/opt/bro/share/bro:/opt/bro/share/bro/policy:/opt/bro/share/bro/site
CLUSTER_NODE=nsm1-eth5-9

==== .status
INITIALIZING [main]

==== No prof.log

==== No packet_filter.log

==== No loaded_scripts.log
root@nsm1:/nsm/bro/logs/current# broctl diag nsm1-eth4-1
[nsm1-eth4-1]

Bro 2.1-386

core
[New LWP 54008]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/bro/bin/bro -i eth4 -U .status -p broctl -p broctl-live -p local -p nsm1-e'.
Program terminated with signal 11, Segmentation fault.
#0 AsBool (this=0x0) at scan.l:1074

Thread 1 (Thread 0x7ff6c1d0f740 (LWP 54008)):
#0 AsBool (this=0x0) at scan.l:1074
#1 do_atif (expr=<optimized out>) at scan.l:686
#2 0x000000000051c95e in yyparse () at parse.y:1203
#3 0x00000000004c615e in main (argc=18, argv=<optimized out>) at /home/michal/bro/src/main.cc:801

==== No reporter.log

==== stderr.log
error in /opt/bro/share/bro/base/frameworks/cluster/./main.bro, line 136: no such index (Cluster::nodes[Cluster::node])
warning in /opt/bro/share/bro/base/frameworks/notice/./cluster.bro, line 23: non-void function returns without a value: Cluster::local_node_type
/opt/bro/share/broctl/scripts/run-bro: line 60: 54008 Segmentation fault (core dumped) nohup $mybro $@

==== stdout.log
unlimited

==== .cmdline
-i eth4 -U .status -p broctl -p broctl-live -p local -p nsm1-eth4-1 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto

==== .env_vars
PATH=/opt/bro/bin:/opt/bro/share/broctl/scripts:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
BROPATH=/nsm/bro/spool/installed-scripts-do-not-touch/site::/nsm/bro/spool/installed-scripts-do-not-touch/auto:/opt/bro/share/bro:/opt/bro/share/bro/policy:/opt/bro/share/bro/site
CLUSTER_NODE=nsm1-eth4-1

==== .status
INITIALIZING [main]

==== No prof.log

==== No packet_filter.log

==== No loaded_scripts.log
root@nsm1:/nsm/bro/logs/current#

Have you run "broctl install" since you last changed your node.cfg file?

  .Seth

root@nsm1:~# broctl status 2>&1 | grep nsm1 | grep worker | wc -l
13
root@nsm1:~# broctl status 2>&1 | grep nsm1 | grep running | wc -l
13

Awesome! Thank you, I didn't know I was supposed to :)

Now on to the traffic filtering (which is being ignored), but I've separated that into another post.

Seth,

The only times I see dropped packets are during attempts to use TACC to amplify DoS attacks and during very aggressive port scans.

In both cases the Bro workers are overloaded by 500k to 1000k incoming packets. It looks like a single worker can only handle 30K packets/sec before it reaches 100 percent CPU usage. Is there any effort going into Bro development to handle these cases?
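
To put rough numbers on that: if one worker tops out near 30K packets/sec, the worker count needed for a given packet rate is a ceiling division. The Python sketch below assumes the incoming figures are roughly 500k to 1000k packets per second and that load is spread perfectly evenly across workers; the message states neither. workers_needed is a hypothetical helper, not part of Bro.

```python
import math

# Per-worker ceiling reported in this message (packets per second).
PER_WORKER_PPS = 30_000


def workers_needed(total_pps, per_worker_pps=PER_WORKER_PPS):
    """Smallest number of workers that keeps each one at or below its limit,
    assuming the load balancer splits packets perfectly evenly."""
    return math.ceil(total_pps / per_worker_pps)


if __name__ == "__main__":
    for rate in (500_000, 1_000_000):
        print(rate, "pps ->", workers_needed(rate), "workers")
```

Under those assumptions the two rates come out to 17 and 34 workers. Uneven hashing would push the real number higher.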

My only workaround for now is to block access to common ports at the border router and open them only to vetted hosts.

Bill Jones