Bro cluster requirements and manager logging backlog bug

Hello all,

We are still having a problem with our Bro cluster and logging. During peak times the manager will slowly consume all available memory while the logs sent to disk are delayed by an hour or more.

Does anyone know the official bug ID for this within bro-tracker.atlassian.net?

I’ve tracked this problem for a while now and tried all variations of the proposed fixes: the flare patch, the no-flare patch, a segmented cluster with one manager per box, and an architecture change from Linux+PF_RING to FreeBSD+Myricom. Currently we are using a standard build of bro-2.5-beta in a cluster configuration with one dedicated manager and three dedicated sensors, each using both ports of a Myricom card with 22 workers attached to each port (1 manager, 1 logger, 12 proxies, 6 worker nodes with 22 procs each, 132 workers total).

Restarting the cluster on a regular basis is much easier without PF_RING, but that’s only partially treating the symptom. In that regard the last proposed solution is the most expensive: using faster CPUs, which will reduce the worker count. But will that really solve the problem? I’m more interested in defining what the problem actually is.

FWIW, there’s some text below to illustrate; the dates are somewhat old but it’s still a representative example.

21:05 UTC

  • Manager node is nearly out of memory… 2800 MB left
  • Workers have moderate CPU usage, 60%
  • Logs on manager node are 25 minutes behind…
  • 21:05 vs 20:40
  • Initiated cluster restart at 21:06, completed at 21:11.

21:26 UTC

  • Workers have moderate CPU usage.

  • Logs are 16 minutes behind

Earlier the logs were roughly two hours behind.

[bro@mgr /opt/bro]$ date -r 1471373408 (most recent conn.log timestamp)
Tue Aug 16 18:50:08 UTC 2016

[bro@mgr /opt/bro]$ date
Tue Aug 16 20:43:45 UTC 2016

Bro manager process is using 70G of memory and the system is swapping:

last pid: 96557; load averages: 46.37, 53.09, 54.88 up 0+18:06:24 21:25:17
55 processes: 8 running, 47 sleeping
CPU: 7.7% user, 2.1% nice, 68.1% system, 0.2% interrupt, 21.9% idle
Mem: 103G Active, 2412M Inact, 19G Wired, 549M Cache, 331M Free
ARC: 15G Total, 89M MFU, 15G MRU, 29M Anon, 68M Header, 211M Other
Swap: 12G Total, 12G Used, 85M Free, 99% Inuse, 9248K In

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
7305 bro 34 20 0 40121M 39498M uwait 10 31.7H 280.27% bro
7337 bro 1 96 5 70653M 61577M CPU36 36 868:45 59.96% bro

Currently in this state the logs are over two hours behind the current time.

bro@mgr:~ % date -r 1471374952 (most recent conn.log timestamp)
Tue Aug 16 19:15:52 UTC 2016

bro@mgr:~ % date
Tue Aug 16 21:27:04 UTC 2016

Memory usage over the past week:

memory-week.png

You're saying "the manager" but do you mean "the manager node" or "the manager process"?

With the added logger process the manager process does not have anything to do with logs.

The last time you mentioned these issues the logger node capability did not exist yet. A lot has changed since then but the logs you show are from 4 months ago.

We need to see what this command outputs when your cluster is having log issues:

broctl top manager logger

Also, you've never mentioned the actual rate of logs you are seeing at these peak times.

Running this in your log directory would help:

du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l

>
> Hello all,
>
>
> We are still having a problem with our Bro cluster and logging. During
> peak times the manager will slowly consume all available memory while the
> logs sent to disk are delayed by an hour or more.

You're saying "the manager" but do you mean "the manager node" or "the
manager process"?

The manager node.

With the added logger process the manager process does not have anything
to do with logs.

The last time you mentioned these issues the logger node capability did not
exist yet. A lot has changed since then but the logs you show are from 4
months ago.

The email was a saved draft accidentally sent before finishing the edit.
I've been using the logger process since October. As of yesterday I'm
using the latest Bro-2.5 with a default local.bro file.

We need to see what this command outputs when your cluster is having log
issues:

broctl top manager logger

Ok. Today I find this:

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 52852 parent 109G 100G 0% bro
logger logger 169.232.234.36 52867 child 837M 498M 0% bro
manager manager 169.232.234.36 52935 child 485M 17M 0% bro
manager manager 169.232.234.36 52892 parent 2G 557M 0% bro

In this condition all the workers are at 100% CPU and the worker nodes have
all 128GB RAM used. The manager node had to be rebooted as "killall -9
bro" had no effect. This is what happens if Bro isn't restarted every 30
minutes.

Also, you've never mentioned the actual rate of logs you are seeing at
these peak times

Running this in your log directory would help:

du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du
-ms;cat *|wc -l
56 .
  789695
220 .
2801719

Bro was started @ Tue Dec 20 18:39:26 UTC 2016, the command below was run a
minute later.

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 289M 139M 194% bro
logger logger 169.232.234.36 18874 child 485M 78M 36% bro
manager manager 169.232.234.36 18905 parent 530M 384M 72% bro
manager manager 169.232.234.36 18947 child 510M 231M 51% bro

@ Tue Dec 20 18:46:48 UTC 2016 the logger already has 5G of memory:

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 5G 5G 192% bro
logger logger 169.232.234.36 18874 child 1G 1G 58% bro
manager manager 169.232.234.36 18947 child 510M 255M 55% bro
manager manager 169.232.234.36 18905 parent 11G 1G 25% bro

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du
-ms;cat *|wc -l
593 .
7117478
809 .
9573974

@ Tue Dec 20 18:51:05 UTC 2016 the logger has 10G of memory and the manager
has increased by 5G as well.

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 10G 10G 222% bro
logger logger 169.232.234.36 18874 child 3G 3G 64% bro
manager manager 169.232.234.36 18947 child 510M 255M 65% bro
manager manager 169.232.234.36 18905 parent 16G 1G 23% bro

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du
-ms;cat *|wc -l
1346 .
15357570
1623 .
18708418

Back from lunch and the cluster is essentially crashed.

@ Tue Dec 20 19:03:48 UTC 2016

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 23G 22G 171% bro
logger logger 169.232.234.36 18874 child 11G 10G 58% bro
manager manager 169.232.234.36 18947 child 510M 255M 54% bro
manager manager 169.232.234.36 18905 parent 23G 2G 19% bro

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
1096 .
17483890
1393 .
20633538

It’s roughly 55K EPS, but probably more since logs are buffering in memory.

@ Tue Dec 20 20:12:47 UTC 2016

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 67G 20G 0% bro
logger logger 169.232.234.36 18874 child 44G 24G 0% bro
manager manager 169.232.234.36 18947 child 510M 249M 99% bro
manager manager 169.232.234.36 18905 parent 18G 798M 0% bro

last pid: 42312; load averages: 4.50, 5.40, 20.50 up 0+02:27:46 20:12:58
63 processes: 5 running, 53 sleeping, 1 zombie, 4 waiting
CPU: 2.3% user, 0.2% nice, 13.7% system, 0.0% interrupt, 83.8% idle
Mem: 104G Active, 3328M Inact, 17G Wired, 340M Cache, 180M Free
ARC: 15G Total, 7820M MFU, 7921M MRU, 16K Anon, 46M Header, 44M Other
Swap: 12G Total, 12G Used, K Free, 100% Inuse

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
18947 bro 1 108 5 510M 249M CPU40 40 55:54 100.00% bro
18832 bro 36 20 0 69454M 21145M uwait 44 129:19 0.00% bro
18874 bro 1 52 5 45265M 25543M pfault 1 42:47 0.00% bro
18905 bro 7 20 0 19346M 798M uwait 13 16:27 0.00% bro

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 52852 parent 109G 100G 0% bro
logger logger 169.232.234.36 52867 child 837M 498M 0% bro
manager manager 169.232.234.36 52935 child 485M 17M 0% bro
manager manager 169.232.234.36 52892 parent 2G 557M 0% bro

In this condition all the workers are at 100% CPU and the worker nodes have all 128GB RAM used. The manager node had to be rebooted as "killall -9 bro" had no effect. This is what happens if Bro isn't restarted every 30 minutes.

This output with the cpu at 0 is kind of odd, unless it was already swapping or something.

Also, you've never mentioned the actual rate of logs you are seeing at these peak times

Running this in your log directory would help:

du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
56 .
  789695
220 .
2801719

So this shows only 33k logs/sec and 3 MB/sec.
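Spelled out, that estimate is just the difference between the two samples divided by the 60-second interval; a quick sketch of the arithmetic (numbers copied from the du/wc output above):

```python
def rates(lines_before: int, lines_after: int,
          mb_before: int, mb_after: int, interval_s: int = 60):
    """Turn two du/wc samples, interval_s apart, into (lines/sec, MB/sec)."""
    return ((lines_after - lines_before) / interval_s,
            (mb_after - mb_before) / interval_s)

# Samples from the one-liner above: 789695 -> 2801719 lines, 56 -> 220 MB.
lps, mbps = rates(789695, 2801719, 56, 220)
print(round(lps))      # -> 33534, i.e. ~33k lines/sec
print(round(mbps, 1))  # -> 2.7, i.e. ~3 MB/sec
```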

@ Tue Dec 20 18:46:48 UTC 2016 already the logger has 5G memory:

[bro@mgr /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.232.234.36 18832 parent 5G 5G 192% bro
logger logger 169.232.234.36 18874 child 1G 1G 58% bro
manager manager 169.232.234.36 18947 child 510M 255M 55% bro
manager manager 169.232.234.36 18905 parent 11G 1G 25% bro

This shows that your logger process seems to just have issues keeping up with the volume...

[bro@mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
593 .
7117478
809 .
9573974

but based on this you are only doing 40k logs/sec and 4 MB/sec and shouldn't really be having issues. We have users doing over 200k/sec.

Can you check the following:

after bro has been running for a bit:

    wc -l *.log | sort -n

to show which log files are the largest

the output of this command:

    top -b -n 1 -H -o TIME |grep bro:|head -n 20

or just run top and press H. That should show all the bro logging threads (it works on Linux at least). They may show up truncated, but it's enough to tell them apart.

What model/count CPU does your manager have?

Are you writing out logs as the default ascii or using json?

> In this condition all the workers are at 100% CPU and the worker nodes
> have all 128GB RAM used. The manager node had to be rebooted as "killall
> -9 bro" had no effect. This is what happens if Bro isn't restarted every
> 30 minutes.

This output with the cpu at 0 is kind of odd, unless it was already
swapping or something.

Yes, it was already swapping.

Can you check the following:

after bro has been running for a bit:

    wc -l *.log | sort -n

to show which log files are the largest

[bro@mgr /opt/bro_data/logs/current]$ wc -l *.log | sort -n
       1 stderr.log
       4 stdout.log
       8 packet_filter.log
       9 stats.log
      24 intel.log
      37 pe.log
      38 kerberos.log
      50 irc.log
      53 dce_rpc.log
      78 radius.log
     104 ftp.log
     332 mysql.log
     355 rfb.log
     416 snmp.log
     548 reporter.log
     841 notice.log
    1202 dpd.log
    2090 tunnel.log
    5515 known_certs.log
    7808 rdp.log
   15737 communication.log
   16393 smtp.log
   19713 known_services.log
   26794 ssh.log
   38487 sip.log
  101981 known_hosts.log
  106543 software.log
  492146 x509.log
  714576 ssl.log
  795818 dns.log
  886360 http.log
  985519 weird.log
 1936147 files.log
 5121874 conn.log
11277601 total

the output of this command:

    top -b -n 1 -H -o TIME |grep bro:|head -n 20

or just run top and press H. That should show all the bro logging threads
(it works on linux at least) They may show up truncated but it's enough to
tell them apart.

[bro@mgr /opt/bro_data/logs/current]$ top -n -H -o time | grep bro
5672 bro 100 10 21076K 2796K RUN 19 6:35 62.79% gzip
5858 bro 95 5 510M 257M RUN 8 3:26 61.38% bro
5785 bro 95 5 2373M 2058M CPU11 11 3:20 59.08% bro
5743 bro 87 0 7897M 7743M RUN 14 3:19 52.78% bro{bro}
5743 bro 88 0 7897M 7743M CPU0 0 3:18 55.18% bro{bro}
5816 bro 40 0 5298M 1158M nanslp 23 1:59 23.29% bro{bro}
5743 bro 37 0 7897M 7743M uwait 4 1:32 23.58% bro{bro}

What model/count CPU does your manager have?

Four of these with 32 GB per NUMA node.

Processor Information
        Socket Designation: CPU1
        Type: Central Processor
        Family: Opteron 6200
        Manufacturer: AMD
        ID: 12 0F 60 00 FF FB 8B 17
        Signature: Family 21, Model 1, Stepping 2
        Flags:
                 (...)
        Version: AMD Opteron(TM) Processor 6238
        Voltage: 1.2 V
        External Clock: 3200 MHz
        Max Speed: 3600 MHz
        Current Speed: 2600 MHz
        Status: Populated, Enabled
        Upgrade: Socket AM3
        L1 Cache Handle: 0x0700
        L2 Cache Handle: 0x0701
        L3 Cache Handle: 0x0702
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 12
        Core Enabled: 12
        Thread Count: 12

Are you writing out logs as the default ascii or using json?

The default ASCII.

[bro@mgr /opt/bro_data/logs/current]$ top -n -H -o time | grep bro
5672 bro 100 10 21076K 2796K RUN 19 6:35 62.79% gzip
5858 bro 95 5 510M 257M RUN 8 3:26 61.38% bro
5785 bro 95 5 2373M 2058M CPU11 11 3:20 59.08% bro
5743 bro 87 0 7897M 7743M RUN 14 3:19 52.78% bro{bro}
5743 bro 88 0 7897M 7743M CPU0 0 3:18 55.18% bro{bro}
5816 bro 40 0 5298M 1158M nanslp 23 1:59 23.29% bro{bro}
5743 bro 37 0 7897M 7743M uwait 4 1:32 23.58% bro{bro}

Here are the associated mappings for the PIDs:

[bro@f01 /opt/bro]$ bin/broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger 169.231.234.36 5743 parent 9G 9G 194% bro
logger logger 169.231.234.36 5785 child 3G 2G 60% bro
manager manager 169.231.234.36 5858 child 510M 257M 63% bro
manager manager 169.231.234.36 5816 parent 5G 1G 18% bro

Can you check the following:

after bro has been running for a bit:

    wc -l *.log | sort -n

to show which log files are the largest

[bro@mgr /opt/bro_data/logs/current]$ wc -l *.log | sort -n
  101981 known_hosts.log
  106543 software.log
  492146 x509.log
  714576 ssl.log
  795818 dns.log
  886360 http.log
  985519 weird.log
 1936147 files.log
 5121874 conn.log
11277601 total

Your weird.log is 1/5 the size of your conn.log and larger than your http log. You should look into which weird name is showing up so much; you may have a serious problem with your tap configuration. That is not normal at all.

the output of this command:

    top -b -n 1 -H -o TIME |grep bro:|head -n 20

or just run top and press H. That should show all the bro logging threads (it works on linux at least) They may show up truncated but it's enough to tell them apart.

[bro@mgr /opt/bro_data/logs/current]$ top -n -H -o time | grep bro
5672 bro 100 10 21076K 2796K RUN 19 6:35 62.79% gzip
5858 bro 95 5 510M 257M RUN 8 3:26 61.38% bro
5785 bro 95 5 2373M 2058M CPU11 11 3:20 59.08% bro
5743 bro 87 0 7897M 7743M RUN 14 3:19 52.78% bro{bro}
5743 bro 88 0 7897M 7743M CPU0 0 3:18 55.18% bro{bro}
5816 bro 40 0 5298M 1158M nanslp 23 1:59 23.29% bro{bro}
5743 bro 37 0 7897M 7743M uwait 4 1:32 23.58% bro{bro}

Ah, I guess that doesn't work on FreeBSD. It would have output thread names like:

bro: conn/Log
bro: dns/Log

It looks like the bulk of your logging load is coming from files+conn and weird though, so what you can do is cut down the volume of those logs to get your CPU to be happy.
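For weird in particular, a log filter predicate can drop the noisiest names before they ever reach a writer thread. A sketch (untested; the names in the set are just the top entries from your output, replace them with whatever dominates your weird.log):

```bro
# Sketch: drop the noisiest weird names at the logging layer.
# Assumes Weird::Info exposes the weird name as rec$name.
const noisy_weirds: set[string] = {
    "dns_unmatched_msg",
    "SYN_seq_jump",
    "bad_HTTP_request",
} &redef;

event bro_init()
{
    Log::remove_default_filter(Weird::LOG);
    Log::add_filter(Weird::LOG, [
        $name = "weird-filtered",
        $pred(rec: Weird::Info) = {
            # Only log entries whose name is not in the noisy set.
            return rec$name !in noisy_weirds;
        }
    ]);
}
```

Unlike the path_func splits below, this discards the lines entirely, so only use it for names you truly don't care about.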

What model/count CPU does your manager have?

Four of these with 32 GB per NUMA node.

Processor Information
        Socket Designation: CPU1
        Type: Central Processor
        Family: Opteron 6200

Ah!!! This is part of your problem. Every site we have worked with in the past year or so that was having serious manager performance issues was using the crazy high core count AMD systems. While they perform well when you have a heavily threaded task (and I bet they do, we have an entire supercomputer filled with 40,000 of them) the bro logger only has a few heavyweight threads and just does not work well on these processors.

That said, you can probably get this working acceptably. There are two options for this:

* filter some noisy log lines, which will cause them to not be logged at all.
* split heavy streams into multiple log files, which will let the logger process dedicate a logging thread to each part.

I would start by trying some of these config fragments that split log files apart:

# Split files log into files and files_certs log
event bro_init()
{
    Log::remove_default_filter(Files::LOG);
    Log::add_filter(Files::LOG, [
        $name = "files-split",
        $path_func(id: Log::ID, path: string, rec: Files::Info) = {
            if (rec?$mime_type && rec$mime_type == "application/pkix-cert")
                return "files_certs";
            return "files";
        }
    ]);
}

(you can probably just ignore those lines completely since the x509 log is more useful)

# Split conn into conn and conn_dns
event bro_init()
{
    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "conn-split",
        $path_func(id: Log::ID, path: string, rec: Conn::Info) = {
            if (rec?$service && "dns" in rec$service)
                return "conn_dns";

            return "conn";
        }
    ]);
}

# Split http.log into directions
event bro_init()
{
    Log::remove_default_filter(HTTP::LOG);
    Log::add_filter(HTTP::LOG, [
        $name = "http-directions",
        $path_func(id: Log::ID, path: string, rec: HTTP::Info) = {
            local l = Site::is_local_addr(rec$id$orig_h);
            local r = Site::is_local_addr(rec$id$resp_h);

            if (l && r)
                return "http_internal";
            if (l)
                return "http_outbound";
            else
                return "http_inbound";
        }
    ]);
}

You could also do the directions thing for the conn.log as well.

If your network is anything like ours, your conn.log is 90% scan attempts to TCP port 23 from IoT devices; splitting that out to a separate log file or filtering it entirely would probably help more than anything.
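Along those lines, a path_func can shunt connections that look like scan noise into their own file. A sketch in the same style as the splits above (the port-23 / S0 test is only a guess at what your scan traffic looks like; adjust it to your network):

```bro
# Sketch: send likely telnet-scan noise to conn_scan.log.
event bro_init()
{
    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "conn-scan-split",
        $path_func(id: Log::ID, path: string, rec: Conn::Info) = {
            # S0 means a connection attempt was seen with no reply,
            # which is typical of scans.
            if (rec$id$resp_p == 23/tcp &&
                rec?$conn_state && rec$conn_state == "S0")
                return "conn_scan";

            return "conn";
        }
    ]);
}
```

This keeps the lines (in a file the logger can dedicate a thread to) rather than dropping them; swap the path_func for a $pred returning F on the same condition if you'd rather not log them at all.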

You can also take a look at the filter* scripts that are at https://github.com/michalpurzynski/bro-gramming

Your weird.log is 1/5 the size of your conn.log and larger than your http
log.. You should look into which weird name is showing up so much, you may
have a serious problem with your tap configuration. That is not normal at
all.

I'm going to guess that the messages relate to packet loss on the workers
which is currently 3%-10% per worker process. Adding another worker node
should fix it.

AFAIK the tap is working fine. In my experience the only weird message
that corresponded to tap issues was "possible_split_routing".

[bro@mgr /opt/bro_data/logs/current]$ awk '{print $7}' weird.log | sort |
uniq -c | sort -rn | head -25
243077 dns_unmatched_msg
205444 SYN_seq_jump
183857 bad_HTTP_request
54153 window_recision
42960 bad_UDP_checksum
20681 dns_unmatched_reply
14280 data_before_established
13048 possible_split_routing
9717 DNS_RR_unknown_type
4310 line_terminated_with_single_CR
4121 active_connection_reuse
1676 TCP_ack_underflow_or_misorder
1490 connection_originator_SYN_ack
1226 SYN_with_data
925 TCP_seq_underflow_or_misorder
862 above_hole_data_without_any_acks
727 Teredo_bubble_with_payload
706 HTTP_version_mismatch
685 data_after_reset
677 unknown_HTTP_method
626 unexpected_multiple_HTTP_requests
584 inappropriate_FIN
566 empty_http_request
511 bad_TCP_checksum
373 inflate_failed

It looks like the bulk of your logging load is coming from files+conn and
weird though, so what you can do is cut down the volume of those logs to
get your cpu to be happy.

>
> What model/count CPU does your manager have?
>
>
> Four of these with 32 GB per NUMA node.
>
> Processor Information
> Socket Designation: CPU1
> Type: Central Processor
> Family: Opteron 6200

Ah!!! This is part of your problem. Every site we have worked with in the
past year or so that was having serious manager performance issues was
using the crazy high core count AMD systems. While they perform well when
you have a heavily threaded task (And I bet they do, we have an entire
supercomputer filled with 40,000 of them) the bro logger only has few
heavyweight threads and just does not work well on these processors.

That said, you can probably get this working acceptably though. There are
two options for this:

* filter some noisy log lines, which will cause them to not be logged at
all.
* split heavy streams into multiple log files, which will let the logger
process dedicate a logging thread to each part.

I would start by trying some of these config fragments that split log
files apart:

Thanks for the help, I'm going to give your suggestions a try.

Thanks for the help, I'm going to give your suggestions a try.

Unfortunately I wasn't able to stabilize the cluster. I tried splitting
the conn log into six different types, inbound{dns,http,other} and
outbound{dns,http,other}, in addition to the http inbound/outbound split, but
the logger process continues to buffer about 1G of memory per minute.

Short of a rewrite of the logging process, is the only option to upgrade
CPUs? I tried running more than one logger but that doesn't seem to work.

Maybe streaming logs via Kafka and disabling writing to disk has a chance.

Thanks for the help, I'm going to give your suggestions a try.

Unfortunately I wasn't able to stabilize the cluster. I tried splitting the conn log into six different types, inbound{dns,http,other} and outbound{dns,http,other}, in addition to the http inbound/outbound split, but the logger process continues to buffer about 1G of memory per minute.

Short of a rewrite of the logging process, is the only option to upgrade CPUs? I tried running more than one logger but that doesn't seem to work.

There may be some inefficiencies in the thread queuing code the logger uses, but the only people that seem to have these major issues have the slow AMD CPUs.

Multiple loggers is something we hope to add once broker is integrated. There are a few places I hope to be able to do some sort of consistent ring hashing to scale out different tasks. Many tasks in bro are easily partitioned, like logging and sumstats.
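The partitioning idea is the usual consistent-hash ring; here's a toy sketch of the concept (this is not Bro code, just an illustration of how workers could map onto multiple loggers, with a ring that only remaps a fraction of workers when a logger is added):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # Stable 64-bit hash (md5-based so it's identical across runs).
    return int(hashlib.md5(s.encode()).hexdigest()[:16], 16)

class Ring:
    """Minimal consistent-hash ring: each logger owns several points on
    the ring; a worker maps to the first logger point at or after the
    worker's own hash, wrapping around at the end."""
    def __init__(self, loggers, points_per_node=64):
        self._ring = sorted((_h("%s#%d" % (lg, i)), lg)
                            for lg in loggers
                            for i in range(points_per_node))
        self._keys = [k for k, _ in self._ring]

    def lookup(self, worker: str) -> str:
        i = bisect.bisect(self._keys, _h(worker)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["logger-1", "logger-2"])
# Every worker gets a stable logger assignment:
assignments = {w: ring.lookup(w) for w in ("worker-1", "worker-2", "worker-3")}
```

The useful property for scaling out: adding a logger only moves the workers whose ring successor is now one of the new logger's points; everyone else keeps their assignment.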

Maybe streaming logs via Kafka and disabling writing to disk has a chance.

Ah! if that is your end goal, you could try looking into having your workers write directly to kafka and bypass the manager entirely.

There may be some inefficiencies in the thread queuing code the logger
uses, but the only people that seem to have these major issues have the
slow AMD cpus.

Multiple loggers is something we hope to add once broker is integrated.
There's a few places I hope to be able to do some sort of consistent ring
hashing to scale out different tasks. Many tasks in bro are easily
partitioned, like logging and sumstats.

I wasn't implying poor code, just code not optimized for our deployment.
Maybe the multiple logger approach would do it, but in the meanwhile I'm
looking for a quick fix.

> Maybe streaming logs via Kafka and disabling writing to disk has a
> chance.

Ah! if that is your end goal, you could try looking into having your
workers write directly to kafka and bypass the manager entirely.

I thought there was some degree of normalization that occurred at the
manager node? Would having workers write directly to Kafka limit any
features of Bro?

What you are saying sounds like using Kafka on the manager isn't going to
fix anything as it will encounter the same resource bottleneck.

Here's the config I was going to use:

# Kafka output
#@load logs-to-kafka.bro
#redef Kafka::logs_to_send = set(Conn::LOG, HTTP::LOG etc...);
#redef Kafka::kafka_conf = table(
# ["metadata.broker.list"] = "10.1.1.1:9092"
#);

There may be some inefficiencies in the thread queuing code the logger uses, but the only people that seem to have these major issues have the slow AMD cpus.

Multiple loggers is something we hope to add once broker is integrated. There's a few places I hope to be able to do some sort of consistent ring hashing to scale out different tasks. Many tasks in bro are easily partitioned, like logging and sumstats.

I wasn't implying poor code, just code not optimized for our deployment. Maybe the multiple logger approach would do it, but in the meanwhile I'm looking for a quick fix.

Yeah, mostly it's just not designed or optimized for systems that have 48 slow cores. It's possible that broker may end up performing a bit better (since it uses threads for more things) but at some point we would still need to scale out to multiple machines anyway.

> Maybe streaming logs via Kafka and disabling writing to disk has a chance.

Ah! if that is your end goal, you could try looking into having your workers write directly to kafka and bypass the manager entirely.

I thought there was some degree of normalization that occurred at the manager node? Would having workers write directly to Kafka limit any features of Bro?

Not really, the main thing it does is log aggregation and rotation, which you don't really need if you are sending to kafka.

What you are saying sounds like using Kafka on the manager isn't going to fix anything as it will encounter the same resource bottleneck.

Correct, it would still go through the same single receiving process.

It's probably not that much work (for kafka use cases at least) to be able to run two (or four) logger processes.
It's really just a matter of running more of them on different ports and updating the cluster layout so half the workers have one port and half the workers have the other port (broctl already does this kind of thing when you configure multiple proxy nodes).

It's a little more complicated for non-kafka use cases because you would end up with logger-1/conn.log and logger-2/conn.log, and log rotation wouldn't work right.

I think getting it to work for the kafka use cases may only require a few lines of code to be changed (basically do for logger nodes what broctl already does for proxy nodes).

I think getting it to work for the kafka use cases may only require a few
lines of code to be changed (basically do for logger nodes what broctl
already does for proxy nodes)

I don't know how to do that.

I think I'm confused between what I can do at the script layer vs modifying
something in the base. Are you referring to
base/frameworks/cluster/setup-connections.bro?

Are you suggesting I override the single logger limit and have each logger
export via Kafka? I guess one logger process for each worker node, used by
all workers local to that node.

Do I need to build a custom cluster-layout to do this?

I can see a number of knobs that could make it happen but I don't know how
to go about scripting it. I think it would:

- disable logging to manager (done automatically by having a logger node)
- bypass the single logger limit
- configure each logger to have a writer::kafka
- disable other writers if necessary
- check if the local worker is part of the same node for the local logger
(based on IP address I guess) and use that as a filter for the
worker2logger events

Starting from bro_init() I don't know how to do this or if it can be done
in conjunction with node.cfg or a custom-layout.bro.

Or via local-logger.bro?

Do I need to build a custom cluster-layout to do this?

For a proof of concept this is really the only thing that needs to be done. I think broctl has one check to prevent you from configuring two loggers. If that were removed and the code that generates the cluster layout were amended to use more than one (like it does for proxies), I think it would mostly just work.

Ok. Do you know offhand what file I would look into to make that change? Also, after creating multiple loggers how would I make each one disable local logging and instead use a kafka export?

Someone that does this now using the kafka plugin could answer better, but I think it's a matter of using a small script. The kafka plugin comes with a script like this:

event bro_init() &priority=-5
{
    for (stream_id in Log::active_streams)
    {
        if (stream_id in Kafka::logs_to_send)
        {
            local filter: Log::Filter = [
                $name = fmt("kafka-%s", stream_id),
                $writer = Log::WRITER_KAFKAWRITER,
                $config = table(["stream_id"] = fmt("%s", stream_id))
            ];

            Log::add_filter(stream_id, filter);
        }
    }
}

I think you would change it to be something like

event bro_init() &priority=-5
{
    for (stream_id in Log::active_streams)
    {
        local filter: Log::Filter = [
            $name = fmt("kafka-%s", stream_id),
            $writer = Log::WRITER_KAFKAWRITER,
            $config = table(["stream_id"] = fmt("%s", stream_id))
        ];

        Log::remove_default_filter(stream_id);
        Log::add_filter(stream_id, filter);
    }
}