One thing I notice is that the traffic to/from the manager box and cpu has increased quite a bit.
Ignoring the large CPU spikes from building the new branch just before the switch at 15:30, the overall cpu load on the manager box is about 3x higher.
The bandwidth isn't terribly excessive, but it increased from about 16mbit to 40mbit (stupid graph is in mebibytes).
Based on some capstats runs, it's all going to the logger port 47761. I have another graph that shows the total log volume being written to disk and that hasn't changed.
So it looks like this branch uses 2x the cpu and bandwidth to write the same volume of logs.
Interesting, I would have thought maybe the manager could be utilized
more, though not the logger. Will have to think about that.
Yeah.. it's definitely worker -> logger:
logger=47761, manager=47762, proxy=47763
[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47761' -n 5
1520553009.488934 pkts=1173 kpps=1.1 kbytes=2837 mbps=22.6 nic_pkts=1175 nic_drops=0 u=0 t=1173 i=0 o=0 nonip=0
1520553010.489053 pkts=1280 kpps=1.3 kbytes=3218 mbps=26.4 nic_pkts=2455 nic_drops=0 u=0 t=1280 i=0 o=0 nonip=0
1520553011.489177 pkts=1086 kpps=1.1 kbytes=2912 mbps=23.9 nic_pkts=3541 nic_drops=0 u=0 t=1086 i=0 o=0 nonip=0
1520553012.489310 pkts=1382 kpps=1.4 kbytes=3757 mbps=30.8 nic_pkts=4923 nic_drops=0 u=0 t=1382 i=0 o=0 nonip=0
1520553013.489471 pkts=1563 kpps=1.6 kbytes=3816 mbps=31.3 nic_pkts=6486 nic_drops=0 u=0 t=1563 i=0 o=0 nonip=0
[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47762' -n 5
1520553018.874154 pkts=346 kpps=0.3 kbytes=92 mbps=0.7 nic_pkts=348 nic_drops=0 u=0 t=346 i=0 o=0 nonip=0
1520553019.875111 pkts=1044 kpps=1.0 kbytes=283 mbps=2.3 nic_pkts=1391 nic_drops=0 u=0 t=1044 i=0 o=0 nonip=0
1520553020.875448 pkts=759 kpps=0.8 kbytes=199 mbps=1.6 nic_pkts=2150 nic_drops=0 u=0 t=759 i=0 o=0 nonip=0
1520553021.875575 pkts=654 kpps=0.7 kbytes=196 mbps=1.6 nic_pkts=2804 nic_drops=0 u=0 t=654 i=0 o=0 nonip=0
1520553022.875920 pkts=470 kpps=0.5 kbytes=123 mbps=1.0 nic_pkts=3274 nic_drops=0 u=0 t=470 i=0 o=0 nonip=0
[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47763' -n 5
1520553025.700919 pkts=26 kpps=0.0 kbytes=8 mbps=0.1 nic_pkts=31 nic_drops=0 u=0 t=26 i=0 o=0 nonip=0
1520553026.701047 pkts=27 kpps=0.0 kbytes=9 mbps=0.1 nic_pkts=58 nic_drops=0 u=0 t=27 i=0 o=0 nonip=0
1520553027.701185 pkts=24 kpps=0.0 kbytes=8 mbps=0.1 nic_pkts=82 nic_drops=0 u=0 t=24 i=0 o=0 nonip=0
1520553028.701306 pkts=55 kpps=0.1 kbytes=18 mbps=0.2 nic_pkts=137 nic_drops=0 u=0 t=55 i=0 o=0 nonip=0
1520553029.701434 pkts=9 kpps=0.0 kbytes=3 mbps=0.0 nic_pkts=146 nic_drops=0 u=0 t=9 i=0 o=0 nonip=0
[root@bro-test ~]#
On another cluster seeing the same traffic with 56 workers running 2.5.3ish shows this for the logger port at the same time:
1520553725.875230 pkts=681 kpps=0.7 kbytes=1946 mbps=15.5 nic_pkts=768 nic_drops=0 u=0 t=681 i=0 o=0 nonip=0
1520553726.875374 pkts=508 kpps=0.5 kbytes=1605 mbps=13.1 nic_pkts=1392 nic_drops=0 u=0 t=508 i=0 o=0 nonip=0
1520553727.875520 pkts=554 kpps=0.6 kbytes=1681 mbps=13.8 nic_pkts=2247 nic_drops=0 u=0 t=554 i=0 o=0 nonip=0
1520553728.875713 pkts=536 kpps=0.5 kbytes=1708 mbps=14.0 nic_pkts=2902 nic_drops=0 u=0 t=536 i=0 o=0 nonip=0
1520553729.875845 pkts=518 kpps=0.5 kbytes=1669 mbps=13.7 nic_pkts=3641 nic_drops=0 u=0 t=518 i=0 o=0 nonip=0
[root@bro-test ~]# broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger bro-test 2016 parent 938M 283M 81% bro
manager manager bro-test 2107 parent 619M 244M 25% bro
[root@bro-test ~]#
[root@bro-dev ~]# broctl top manager logger
Name Type Host Pid Proc VSize Rss Cpu Cmd
logger logger bro-dev 10704 child 492M 150M 6% bro
logger logger bro-dev 10571 parent 1G 221M 0% bro
manager manager bro-dev 10734 parent 909M 497M 0% bro
manager manager bro-dev 10782 child 482M 110M 0% bro
cpus are about the same, test has X5650 @ 2.67GHz and dev has E5-2420 @ 1.90GHz
according to passmark, the older X5650 has a slightly faster single core performance.
An interesting experiment you could try is switching to an alternate
implementation of the Known scripts. E.g. stick this in local.bro:
redef Known::use_host_store = F;
redef Known::use_cert_store = F;
redef Known::use_device_store = F;
redef Known::use_service_store = F;
I tried doing that for an unrelated reason.. but I'm not sure if it works right.
You can't redef Known::use_service_store before the script is loaded, because otherwise you get
Can't document redef of Known::use_service_store, identifier lookup failed
but if you load the script first, the
@if ( Known::use_service_store )
block is already evaluated before you change the value.
[jazoff@bro-test ~]$ cat foo.bro
@load protocols/conn/known-services
redef Known::use_service_store=F;
event bro_init()
{
print Known::services;
}
[jazoff@bro-test ~]$ bro foo.bro
error in ./foo.bro, line 6: unknown identifier Known::services, at or near "Known::services"
[jazoff@bro-test ~]$ cat foo.bro
redef Known::use_service_store=F;
@load protocols/conn/known-services
event bro_init()
{
print Known::services;
}
[jazoff@bro-test ~]$ bro foo.bro
error in ./foo.bro, line 1: "redef" used but not previously defined (Known::use_service_store)
internal warning in ./foo.bro, line 1: Can't document redef of Known::use_service_store, identifier lookup failed
error in ./foo.bro, line 1 and /srv/bro/share/bro/policy/protocols/conn/known-services.bro, line 34: already defined (Known::use_service_store)
error in /srv/bro/share/bro/policy/protocols/conn/known-services.bro, line 40: value used but not set (Known::use_service_store)
error in /srv/bro/share/bro/policy/protocols/conn/known-services.bro, line 40: invalid expression in @if (Known::use_service_store)
error in /srv/bro/share/bro/policy/protocols/conn/known-services.bro, line 93: value used but not set (Known::use_service_store)
error in /srv/bro/share/bro/policy/protocols/conn/known-services.bro, line 93: invalid expression in @if (Known::use_service_store)