Hi,
I'm glad to hear that you're testing broctl on FreeBSD (I always
test on Linux). Here are my initial ideas:
How many hosts are in your cluster? (you mentioned "28 physical nodes",
does that mean 28 computers?!)
It is 28 computers, each running 3 bro worker processes, with 2 more
physical machines running the master and proxies.
Are you running the git master version of broctl?
It is not quite master: it is currently running 5e2defe, i.e. the state as
of March 13th.
Is every broctl command slow, or just status and top?
All the ones that I tried are slow. I can upgrade to master and test again,
but I wanted to ask first whether there is some way to debug what is going
on before restarting the cluster, since the problem took a few days to
manifest itself. Hence I probably will not be able to reproduce it
directly.
The broctl status command usually spends most of its time
waiting for broccoli. I've added a new option that you
can set in your etc/broctl.cfg file that will skip
the broccoli code so that broctl status runs much faster.
To enable this feature, make sure this line is in your
broctl.cfg file:
StatusCmdShowAll = 0
(after you add this, broctl will say that you have to run
either "install" or "deploy", but you don't actually
need to for this particular broctl option).
I added this (without running install / deploy) and it is now faster,
but still takes a while. I examined spool/debug.log a bit, and it actually
seems that a significant amount of time is spent getting the process status.
The timeline currently looks like this:
23 Mar 11:53:05 [broctl] status
23 Mar 11:53:05 [broctl] Getting process status ...
23 Mar 11:53:05 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/check-pid 2513
[...] (many lines like this and many exit code lines)
23 Mar 11:54:07 [execute] blade15: exit code 0
23 Mar 11:54:07 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/cat-file /xa/bro/master/spool/worker-26-0/.startup
[...]
23 Mar 11:54:09 [execute] blade15: exit code 0
23 Mar 11:54:09 [events] broccoli: Control::peer_status_request() to node worker-26-0
[...]
23 Mar 11:54:29 [events] broccoli: Control::peer_status_response(1427136868.812806 [...]
-> status output
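
To narrow down where the time goes without restarting anything, the gaps
between consecutive debug.log lines can be ranked. The following is only a
rough sketch, assuming every line in spool/debug.log has the same shape as
the excerpt above ("DD Mon HH:MM:SS [tag] message"). The names parse_line,
largest_gaps and report are made up for illustration and are not part of
broctl, and the year is a placeholder because the log lines do not carry one.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt: "23 Mar 11:53:05 [execute] text"
LINE_RE = re.compile(r"^(\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def parse_line(line, year=2015):
    """Return (timestamp, tag, message), or None if the line does not match.

    The year is a placeholder: the log format shown has no year field.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %d %b %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def largest_gaps(lines, top=5):
    """Rank the pauses between consecutive log entries, longest first.

    Each result is (seconds, tag, message), where tag and message belong to
    the entry that was logged right before the pause.
    """
    entries = [e for e in (parse_line(l) for l in lines) if e is not None]
    gaps = []
    for (t0, tag0, msg0), (t1, _, _) in zip(entries, entries[1:]):
        gaps.append(((t1 - t0).total_seconds(), tag0, msg0))
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


def report(path, top=5):
    """Print the longest pauses found in the log file at the given path."""
    with open(path) as f:
        for seconds, tag, msg in largest_gaps(f, top):
            print(f"{seconds:6.0f}s after [{tag}] {msg}")
```

Calling report("spool/debug.log") would then list the entries that were
followed by the longest pauses. If one host shows up repeatedly right before
a long pause, that host (or the connection to it) would be the first thing I
would look at.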
Johanna