Why does my logger keep crashing - bro version 2.6.3

Hello,

Why does my logger keep crashing? Can someone please help me with this? I have provided some system information below:

I am running Bro version 2.6.3.

Hi,

I would start by monitoring the CPU and memory usage of the logger instance.
Try running "broctl top"; my guess is that you will see very high CPU usage on the logger process.

I know there is an option to run multiple loggers, but I am not sure how to set it up.

Are you writing to a file?

B

Thanks for your response. The CPU usage for the logger is at 311% (see below).

broctl top

Name    Type    Host       Pid    VSize  Rss  Cpu   Cmd
logger  logger  localhost  22867  12G    9G   311%  bro

I wasn’t aware that you could set up multiple loggers; I checked the docs but couldn't find how to do it. Does anyone know how to set this up?

The logger is threaded, so seeing CPU > 100% is not necessarily a problem.

Have you tried running "broctl diag logger" to see why the logger is
crashing? Do you have any messages in your system logs about processes
being killed for out of memory (OOM)?
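
For example, something like this should surface OOM-killer events (the exact log file varies by distro):

# Kernel ring buffer, with human-readable timestamps
dmesg -T | grep -i "out of memory"

# Persistent syslog on RHEL/CentOS systems
grep -i "killed process" /var/log/messages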

  --Vlad

Thanks for your response.

I do see the following OOM messages in my system logs for the logger process ID:
Sep 23 18:48:00 kernel: Out of memory: Kill process 10439 (bro) score 787 or sacrifice child
Sep 23 18:48:00 kernel: Killed process 10439 (bro), UID 0, total-vm:301983900kB, anon-rss:195261772kB, file-rss:2592kB, shmem-rss:0kB

I wonder why it's taking so much memory; this server has 251G of RAM and 99G of swap:
       total  used  free  shared  buff/cache  available
Mem:    251G   66G  185G    4.2M        488M       184G
Swap:    99G  1.1G   98G

Below is the output of "broctl diag logger", run after the logger crashed.

[logger]

No core file found.

Bro 2.6.3
Linux 3.10.0-1062.1.1.el7.x86_64

Bro plugins:
Bro::AF_Packet - Packet acquisition via AF_Packet (dynamic, version 1.4)

==== No reporter.log

==== stderr.log
/usr/local/bro/share/broctl/scripts/run-bro: line 110: 10439 Killed nohup "$mybro" "$@"

==== stdout.log
max memory size (kbytes, -m) unlimited
data seg size (kbytes, -d) unlimited
virtual memory (kbytes, -v) unlimited
core file size (blocks, -c) unlimited

==== .cmdline
-U .status -p broctl -p broctl-live -p local -p logger local.bro broctl base/frameworks/cluster broctl/auto

==== .env_vars
PATH=/usr/local/bro/bin:/usr/local/bro/share/broctl/scripts:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bro/bin
BROPATH=/logs/bro/spool/installed-scripts-do-not-touch/site::/logs/bro/spool/installed-scripts-do-not-touch/auto:/usr/local/bro/share/bro:/usr/local/bro/share/bro/policy:/usr/local/bro/share/bro/site
CLUSTER_NODE=logger

==== .status
RUNNING [net_run]

==== No prof.log

==== No packet_filter.log

==== No loaded_scripts.log

Thoughts? Any suggestions?

Hi,

Try using the None writer instead of the ASCII one.
In local.bro, add:
redef Log::default_writer=Log::WRITER_NONE;

If the logger instance still crashes, then the issue is not related to an I/O bottleneck.

B

Setting up two loggers resolved the logger crashes, but the dropped packets are pretty high on my workers. Can someone assist me with reducing the packet drops?
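
For reference, the node.cfg change was roughly this (a sketch; the node names are just what I used, and running more than one logger requires a broctl version that accepts multiple logger nodes):

[logger-1]
type=logger
host=localhost

[logger-2]
type=logger
host=localhost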

cat capture_loss.log

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path capture_loss
#open 2019-09-27-12-05-05
#fields ts ts_delta peer gaps acks percent_lost
#types time interval string count count double
1569600304.774215  900.000013  worker-1-1   126463  3246542  3.895314
1569600304.783703  900.000064  worker-1-3   106904  4465333  2.394088
1569600304.785983  900.000212  worker-1-11  123729  3768503  3.28324
1569600304.802244  900.000098  worker-1-14  144154  3584013  4.022139
1569600304.823378  900.000095  worker-1-18  137507  3503583  3.924754
1569600304.892559  900.000470  worker-1-13  148904  3448544  4.31788
1569600305.010986  900.000030  worker-1-8   174213  3409819  5.109157
1569600305.938686  901.043465  worker-1-15  509268  1072199  47.497526
1569600304.806850  900.000047  worker-1-22  591232  1234893  47.877185
1569601204.762382  900.000786  worker-1-16  120086  4491072  2.673883
1569601204.774220  900.000005  worker-1-1   127257  3461349  3.676515
1569601204.802447  900.000203  worker-1-14  125481  3171663  3.956316
1569601204.884438  900.000029  worker-1-19  125037  3566663  3.505714
1569601204.891746  900.000015  worker-1-23  120553  3078889  3.915471
1569601205.110098  900.000139  worker-1-10  108016  3442813  3.137434
1569601205.938906  900.000220  worker-1-15  565536  1156759  48.8897
1569601218.120290  900.000047  worker-1-6   456312  753749   60.538986
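
To make the lossy workers stand out, you can sort the log by percent_lost with bro-cut (it ships with Bro):

cat capture_loss.log | bro-cut peer percent_lost | sort -k2 -rn | head

worker-1-15, worker-1-22, and worker-1-6 are losing close to half their traffic; the rest sit around 3-5%.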

Below are some of my settings:

I have 23 workers defined, and I pinned their CPUs.

[worker-1]
type=worker
host=localhost
interface=af_packet::ens2f0
lb_method=custom
#lb_method=pf_ring
lb_procs=23
pin_cpus=5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
af_packet_fanout_id=25
af_packet_fanout_mode=AF_Packet::FANOUT_HASH

Can someone assist me with this?

Thanks.

Hi,

Can you please share your entire node.cfg file?

It looks like you’ve added 3 more workers. I would check whether the CPUs you are pinning have a direct PCI lane to the NIC you are listening on.
Check which NUMA node the NIC is attached to and make sure you are pinning CPUs on that node first.
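
For example (standard Linux sysfs paths; substitute your capture interface for ens2f0):

# NUMA node the NIC is attached to (-1 means no affinity reported)
cat /sys/class/net/ens2f0/device/numa_node

# CPUs that belong to each NUMA node
lscpu | grep -i numa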

B