We've set up a Zeek cluster (version 4.1.1) with 8 worker nodes and a manager node, which also acts as the logger and the proxy. All nodes are in the same physical rack and on the same subnet. Our issue is that the zeekctl cron job intermittently reports that one or a few hosts are down. Within 5 minutes, when the cron job runs again, we get a mail saying that the hosts are back up. There doesn't seem to be any obvious reason for this behavior. We've checked everything from the firewall rules to the connection timeout, which we have increased. CPU and memory usage seem fine too. Whenever 'zeekctl status' is run manually, the output shows all nodes as running, and the logs are indeed being generated.
The exact same hardware and network architecture had been running Bro (version 2.5.4) for 2+ years without any issues. We used to see such alert emails about once a month; we now see them as often as 5 times a day. It would be great if someone could help us diagnose this issue.
Thanks and regards