Hello,
I’ve been using Bro with PF_RING:ZC, but the workers keep crashing. Most recently, worker-1 crashed ~9 hours after deploying.
zbalance_ipc is run as:
/zbalance_ipc -i zc:em2 -l /var/log/zbalance.log -c 97 -n 6 -m 1 -a -r 0:dummy0 -r 1:dummy1 -r 2:dummy2 -r 3:dummy3 -r 4:dummy4 -r 5:dummy5
node.cfg:
[manager]
type=manager
host=localhost
pin_cpus = 2
[logger]
type=logger
host=localhost
pin_cpus = 3
[proxy-1]
type=proxy
host=localhost
pin_cpus = 4
[worker-1]
type=worker
host=localhost
interface=dummy0
pin_cpus = 5
[worker-2]
type=worker
host=localhost
interface=dummy1
pin_cpus = 6
[worker-3]
type=worker
host=localhost
interface=dummy2
pin_cpus = 7
[worker-4]
type=worker
host=localhost
interface=dummy3
pin_cpus = 8
[worker-5]
type=worker
host=localhost
interface=dummy4
pin_cpus = 9
[worker-6]
type=worker
host=localhost
interface=dummy5
pin_cpus = 10
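To make the relationship between the zbalance_ipc options and node.cfg explicit: as I understand it, each `-r N:dummyN` sends one egress queue to one dummy interface, and each worker reads one of those interfaces. Below is a minimal sketch that regenerates the six worker stanzas above. The function name `worker_stanzas` and its parameters are made up for illustration; nothing here is part of broctl.

```python
def worker_stanzas(n_workers, first_cpu=5, host="localhost"):
    """Return node.cfg text for workers reading dummy0 .. dummy(n_workers - 1).

    Hypothetical helper: worker-(i+1) reads dummy<i> and is pinned to
    CPU first_cpu + i, matching the layout in the node.cfg above.
    """
    stanzas = []
    for i in range(n_workers):
        stanzas.append(
            f"[worker-{i + 1}]\n"
            f"type=worker\n"
            f"host={host}\n"
            f"interface=dummy{i}\n"
            f"pin_cpus = {first_cpu + i}"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # 6 matches the "-n 6" passed to zbalance_ipc
    print(worker_stanzas(6))
```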
broctl diag output for worker-1:
[worker-1]
No core file found.
Bro 2.5
Linux 3.10.0-693.17.1.el7.x86_64
Bro plugins:
Bro::AF_Packet - Packet acquisition via AF_Packet (dynamic, version 1.3)
Bro::PF_RING - Packet acquisition via PF_RING (dynamic, version 1.0)
==== No reporter.log
==== stderr.log
listening on dummy0
1529351356.071495 processing suspended
1529351356.071495 processing continued
1529382027.863765 received termination signal
1529382027.863765 469839129 packets received on interface dummy0, 1095837 dropped
/mnt/localraid/bro/share/broctl/scripts/run-bro: line 107: 110538 Killed nohup ${pin_command} $pin_cpu "$mybro" "$@"
==== stdout.log
max memory size (kbytes, -m) unlimited
data seg size (kbytes, -d) unlimited
virtual memory (kbytes, -v) unlimited
core file size (blocks, -c) unlimited
==== .cmdline
-i dummy0 -U .status -p broctl -p broctl-live -p local -p worker-1 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto
==== .env_vars
PATH=/mnt/localraid/bro/bin:/mnt/localraid/bro/share/broctl/scripts:/sbin:/bin:/usr/sbin:/usr/bin
BROPATH=/mnt/localraid/bro/spool/installed-scripts-do-not-touch/site::/mnt/localraid/bro/spool/installed-scripts-do-not-touch/auto:/mnt/localraid/bro/share/bro:/mnt/localraid/bro/share/bro/policy:/mnt/localraid/bro/share/bro/site
CLUSTER_NODE=worker-1
==== .status
TERMINATING [net_finish]
==== No prof.log
==== No packet_filter.log
==== No loaded_scripts.log
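To put numbers on the stderr.log lines above, here is a small sketch that works out how long the worker ran and the rough drop ratio. The four values are copied from the log. The function name `summarize` is just for illustration, and I am not sure whether the "received" counter already includes dropped packets, so the ratio is only approximate.

```python
from datetime import datetime, timezone

# Values copied from stderr.log above
START_TS = 1529351356.071495  # "processing continued"
END_TS = 1529382027.863765    # "received termination signal"
RECEIVED = 469839129          # packets received on dummy0
DROPPED = 1095837             # packets dropped


def summarize(start_ts, end_ts, received, dropped):
    """Return (uptime in hours, dropped as a percentage of received)."""
    uptime_hours = (end_ts - start_ts) / 3600
    # Assumption: a simple dropped/received ratio is good enough here;
    # whether "received" includes dropped packets is not clear to me.
    drop_pct = 100 * dropped / received
    return uptime_hours, drop_pct


if __name__ == "__main__":
    hours, pct = summarize(START_TS, END_TS, RECEIVED, DROPPED)
    start = datetime.fromtimestamp(START_TS, tz=timezone.utc)
    end = datetime.fromtimestamp(END_TS, tz=timezone.utc)
    print(f"started {start:%Y-%m-%d %H:%M:%S} UTC")
    print(f"killed  {end:%Y-%m-%d %H:%M:%S} UTC")
    print(f"uptime  {hours:.2f} hours, drop ratio {pct:.3f}%")
```

By this calculation the worker ran for about 8.5 hours and dropped roughly 0.23% of packets, so packet loss by itself does not look like the problem.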
Is there any way to get more detailed logs to help understand why the worker crashed? I restarted Bro with misc/profiling turned on, but that seems to produce resource consumption statistics rather than anything that would help debug a crash.
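One thing I can still rule out myself: the "Killed" in the run-bro line means the process received SIGKILL, and with no core file, the kernel OOM killer is one possible sender. I have not confirmed that this is what happened. Below is a rough sketch that scans kernel log lines for OOM-kill messages. It assumes the message contains "Killed process <pid> (<name>)", which is my understanding of the wording on this kernel and may differ on others, and that `dmesg` is readable and still holds the event. The name `find_oom_kills` is made up for illustration.

```python
import re
import subprocess

# Assumed wording of the kernel's OOM-kill message:
#   "Killed process <pid> (<name>) total-vm:..."
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name) for each matching log line."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    # Assumption: the kernel ring buffer has not wrapped since the crash;
    # otherwise the same function can be fed lines from a saved kernel log.
    output = subprocess.run(
        ["dmesg"], capture_output=True, text=True
    ).stdout
    for pid, name in find_oom_kills(output.splitlines()):
        print(f"OOM killer terminated pid {pid} ({name})")
```

If PID 110538 from the run-bro line shows up here, the misc/profiling memory statistics would be relevant after all, since they would show which worker's memory was growing before the kill.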
Thanks