Finally getting back to this.
What’s the best way to disable Bro in a systematic way to isolate crashes?
Sending us the diag output from broctl is best, since it will include a backtrace.
==== No reporter.log
==== stderr.log
listening on eth5, capture length 8192 bytes
/usr/local/3rd-party/bro/share/broctl/scripts/run-bro: line 60: 15452 Segmentation fault nohup mybro @
==== stdout.log
unlimited
unlimited
unlimited
==== .cmdline
-i eth5 -U .status -p broctl -p broctl-live -p local -p worker-5-9 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto
==== .env_vars
PATH=/usr/local/3rd-party/bro/bin:/usr/local/3rd-party/bro/share/broctl/scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
BROPATH=/usr/local/3rd-party/bro/spool/installed-scripts-do-not-touch/site::/usr/local/3rd-party/bro/spool/installed-scripts-do-not-touch/auto:/usr/local/3rd-party/bro/share/bro:/usr/local/3rd-party/bro/share/bro/policy:/usr/local/3rd-party/bro/share/bro/site
CLUSTER_NODE=worker-5-9
==== .status
RUNNING [net_run]
==== No prof.log
==== No packet_filter.log
==== No loaded_scripts.log
-- [Automatically generated.]
After ~12 hours I returned to find many of the worker nodes had crashed. I forgot to look at the diag for the crashed workers before stopping the cluster.
Do you have the cron command set up correctly? The workers should have been restarted automatically after they crashed, and a diagnostic email should have been sent to you.
Mentioned in this section: http://bro-ids.org/documentation/quickstart.html#a-minimal-starting-configuration
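For reference, the cron setup amounts to one crontab line that runs "broctl cron" periodically. The snippet below is only a sketch: it assumes the install prefix /usr/local/3rd-party/bro seen in the diag output above and a five-minute interval, and broctl_cron_line is a hypothetical helper name used here for illustration.

```shell
# Hypothetical helper: print a crontab line that runs "broctl cron"
# every five minutes for a Bro install under the given prefix.
broctl_cron_line() {
    prefix="$1"
    printf '0-59/5 * * * * %s/bin/broctl cron\n' "$prefix"
}

# Assumed prefix, taken from the PATH shown in the diag output.
broctl_cron_line /usr/local/3rd-party/bro
```

The printed line goes into the crontab of the user that runs broctl (for example via "crontab -e").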
I did not; it’s working properly now.
(…)
Total rings : 10
How many CPU cores do you have?
48 per server.
-rw-r--r-- 1 bro bro 10323 Aug 30 21:15 reporter.log
-rw-r--r-- 1 bro bro 52846117 Aug 30 21:27 weird.log
I’m curious about what’s in reporter.log; normally that shouldn’t have too much in it. Also, that’s an astonishingly large weird.log. Is there anything that stands out in those two?
reporter.log -- looks like I need to set up the GeoIPv6 database: /usr/share/GeoIP/GeoIPCityv6.dat (empty)
50 Reporter::INFO processing continued (empty) <cut…>
50 Reporter::INFO Failed to open GeoIP database: <cut…>
29 Reporter::INFO processing suspended (empty) <cut…>
weird.log --
bro@bc : [12:33am] : 2012-08-30 : ls -l weird.* | tail -5
-rw-r--r-- 1 bro bro 16757363 Aug 30 21:00 weird.20:00:00-21:00:00.log.gz
-rw-r--r-- 1 bro bro 304697 Aug 30 21:02 weird.21:00:00-21:02:10.log.gz
-rw-r--r-- 1 bro bro 39351508 Aug 30 22:00 weird.21:12:53-22:00:00.log.gz
-rw-r--r-- 1 bro bro 55141105 Aug 30 23:00 weird.22:00:00-23:00:00.log.gz
-rw-r--r-- 1 bro bro 38190282 Aug 31 00:00 weird.23:00:00-00:00:00.log.gz
bro@bc : [12:33am] : 2012-08-30 : gzcat weird.23:00:00-00:00:00.log.gz | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
614589 data_before_established
585445 possible_split_routing
260703 window_recision
190652 SYN_seq_jump
100211 inappropriate_FIN
64533 above_hole_data_without_any_acks
37882 connection_originator_SYN_ack
33611 data_after_reset
19106 Teredo_bubble_with_payload
11510 SYN_after_reset
bro@bc : [12:34am] : current : awk '{print $7}' weird.log | sort | uniq -c | sort -rn | head -10
51561 window_recision
49218 possible_split_routing
47776 data_before_established
24526 Teredo_bubble_with_payload
19894 connection_originator_SYN_ack
11718 SYN_seq_jump
8938 inappropriate_FIN
8701 data_after_reset
7523 above_hole_data_without_any_acks
5765 inner_IP_payload_length_mismatch
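The two pipelines above can be folded into one small function that handles both the rotated .gz files and the live weird.log. This is a rough sketch, not a broctl feature: top_weirds is a hypothetical name, and it assumes the default tab-separated log format in which the weird name is the 7th column. Unlike the awk version, it also skips the "#" header lines.

```shell
# Print the most frequent weird names in one Bro weird log.
# Usage: top_weirds FILE [LIMIT]   (LIMIT defaults to 10)
top_weirds() {
    log="$1"
    limit="${2:-10}"
    # Decompress rotated logs on the fly; read the live log directly.
    case "$log" in
        *.gz) gzip -dc "$log" ;;
        *)    cat "$log" ;;
    esac |
        grep -v '^#' |   # drop the "#fields"-style header lines
        cut -f7 |        # keep only the weird name column (assumed 7th)
        sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, "top_weirds weird.23:00:00-00:00:00.log.gz" or "top_weirds weird.log 5".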
Could you show me your node.cfg configuration too?
bro@bc : [12:41am] : bro : cat etc/node.cfg
[manager]
type=manager
host=z.z.z.M
[proxy-1]
type=proxy
host=z.z.z.M
[worker-1]
type=worker
host=z.z.z.A
interface=eth5
lb_procs=10
lb_method=pf_ring
[worker-2]
type=worker
host=z.z.z.B
interface=eth5
lb_procs=10
lb_method=pf_ring
[worker-3]
type=worker
host=z.z.z.C
interface=eth5
lb_procs=10
lb_method=pf_ring
[worker-4]
type=worker
host=z.z.z.D
interface=eth5
lb_procs=10
lb_method=pf_ring
[worker-5]
type=worker
host=z.z.z.E
interface=eth5
lb_procs=10
lb_method=pf_ring
Oh, and one last thing: have you made sure to disable all of the special NIC features?
http://securityonion.blogspot.com/2011/10/when-is-full-packet-capture-not-full.html
Yeah, I’ve used those recommendations from the start, with one exception: the Intel X520-DA2 cards I’m using do not support disabling “ufo” (UDP fragmentation offload).
Adjust interface features