[JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

Hi,

I'm glad to hear that you're testing broctl on FreeBSD (I always
test on Linux). Here are my initial ideas:

How many hosts are in your cluster? (you mentioned "28 physical nodes",
does that mean 28 computers?!)

It is 28 computers, each running 3 Bro worker processes, plus 2 more
physical machines running the master and proxies.

Are you running the git master version of broctl?

It is not quite master; it is currently running 5e2defe, i.e. the state as
of March 13th.

Is every broctl command slow, or just status and top?

All the ones that I tried are slow. I can upgrade to master and test again;
I just wanted to ask whether there is some way to debug what is going on
before restarting the cluster, since the problem took a few days to
manifest itself, so I probably will not be able to reproduce it
directly :)

The broctl status command usually spends most of its time
waiting for broccoli. I've added a new option that you
can set in your etc/broctl.cfg file that will skip
the broccoli code so that broctl status runs much faster.
To enable this feature, make sure this line is in your
broctl.cfg file:
StatusCmdShowAll = 0
(after you add this, broctl will say that you have to run
either "install" or "deploy", but you don't actually
need to for this particular broctl option).

I added this (without running install/deploy), and it is now faster,
but it still takes a while. I looked through spool/debug.log, and it seems
that a significant amount of time (roughly a minute in the excerpt below)
is spent getting the process status. The timeline currently looks like this:

23 Mar 11:53:05 [broctl] status
23 Mar 11:53:05 [broctl] Getting process status ...
23 Mar 11:53:05 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/check-pid 2513
[...] (many lines like this and many exit code lines)
23 Mar 11:54:07 [execute] blade15: exit code 0
23 Mar 11:54:07 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/cat-file /xa/bro/master/spool/worker-26-0/.startup
[...]
23 Mar 11:54:09 [execute] blade15: exit code 0
23 Mar 11:54:09 [events] broccoli: Control::peer_status_request() to node worker-26-0
[...]
23 Mar 11:54:29 [events] broccoli: Control::peer_status_response(1427136868.812806 [...]
-> status output
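When the full spool/debug.log is long, the time spent in each phase can be read off the timestamps mechanically instead of by eye. The following is a minimal sketch, not part of broctl: it assumes every relevant line has the "DD Mon HH:MM:SS [component] message" shape seen in the excerpt, and that a phase starts at the first line containing a chosen marker string and ends where the next marker first appears. The marker strings and the helper names `parse_line` and `phase_durations` are hypothetical choices for illustration.

```python
import re
from datetime import datetime

# Matches debug.log lines such as
# "23 Mar 11:53:05 [broctl] Getting process status ..."
# (format assumed from the excerpt above).
LINE_RE = re.compile(r"^(\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def parse_line(line):
    """Return (timestamp, component, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The log omits the year; prefix a placeholder leap year so parsing is
    # unambiguous. Only differences between timestamps are used.
    ts = datetime.strptime("2000 " + m.group(1), "%Y %d %b %H:%M:%S")
    return ts, m.group(2), m.group(3)


def phase_durations(lines, markers):
    """Seconds spent in each phase, keyed by marker.

    Hypothetical heuristic: a phase starts at the first line whose message
    contains its marker and ends where the next found marker starts; the last
    phase ends at the last timestamped line. Markers not found are skipped.
    """
    entries = [e for e in (parse_line(l) for l in lines) if e is not None]
    if not entries:
        return {}
    starts = []
    for marker in markers:
        for ts, _component, message in entries:
            if marker in message:
                starts.append((marker, ts))
                break
    end = entries[-1][0]
    result = {}
    for i, (marker, start) in enumerate(starts):
        stop = starts[i + 1][1] if i + 1 < len(starts) else end
        result[marker] = (stop - start).total_seconds()
    return result


# Hypothetical markers, taken from the messages in the excerpt above.
MARKERS = ["Getting process status", "cat-file",
           "peer_status_request", "peer_status_response"]

if __name__ == "__main__":
    with open("spool/debug.log") as f:
        for marker, secs in phase_durations(f, MARKERS).items():
            print(f"{marker}: {secs:.0f} s")
```

Applied to the lines quoted above, this attributes about 62 seconds to the process-status phase and about 20 seconds to the broccoli request/response, which matches the reading of the log by hand. Timestamps only have one-second resolution, so short phases show up as 0 or 1 seconds.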

Johanna

When you do a broctl status, does it show a status line for every Bro
node in your cluster?

Yes, it does. At least I think so; the number is quite large :)

How are you running broctl status:
1) just by typing "broctl status", or
2) by running "broctl", then type the "status" command at the BroControl
prompt.

I run broctl first and then type status.

When you run "broctl status", it must establish an ssh session to
every remote machine, which could take a while when there are 28
machines. However, when you run just "broctl" and then type "status"
at the BroControl prompt, it keeps the ssh sessions open, so the second
time you type "status" should be faster than the first (because
the second time it doesn't need to set up the ssh connections).

There does not seem to be a big speed difference between the first time
and the second time status is run.

Johanna