Hi all,
I have Zeek v5.1.1 running in a Debian container. The configuration is (almost) identical to a standalone setup running an older Zeek version, and that standalone setup works just fine. However, within the container, logging from the workers suddenly stops and I cannot figure out why.
In my current directory there are only these files: stats.log, stderr.log, stdout.log and telemetry.log, and no more logging from the workers. The workers are still running; zeekctl diag and zeekctl status show nothing extraordinary, all workers are running.
The only thing I found in the stderr logs was:
/usr/local/zeek/share/zeekctl/scripts/run-zeek: line 110: 430000 Segmentation fault nohup ${pin_command} $pin_cpu "$myzeek" "$@"
listening on pcap0
[broker/ERROR] 2023-01-16T16:35:14.213 unable to find a master for zeek/known/certs
[broker/ERROR] 2023-01-16T16:35:14.213 unable to find a master for zeek/known/hosts
[broker/ERROR] 2023-01-16T16:35:14.214 unable to find a master for zeek/known/services
[broker/ERROR] 2023-01-16T16:35:14.214 proxy 17 received an unexpected message: message(caf::sec::broken_promise)
(There are 28 of these last messages, which is exactly the number of workers running on the NUMA node/CPU that pcap0 is attached to.)
Does anyone have any clues or hints on how to solve or debug this?
Regards, John
Does logging initially work and then stop after some time, or are there no traffic-related logs produced whatsoever? Which node names do you find in telemetry.log?
More questions: do the worker processes still consume CPU after logging has stopped? Do you see packets for the mirrored traffic when running tcpdump -n -i pcap0 -c 1024 within the container?
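In case it helps, something along these lines (run inside the container, from the directory with the current logs) should answer those questions. This is just a sketch: it assumes the default TSV log format and that the node name column in telemetry.log is called "peer", so check the #fields header line if that doesn't match:

    # Which nodes still show up in telemetry.log?
    zeek-cut peer < telemetry.log | sort | uniq -c

    # Are the worker processes still consuming CPU after logging stopped?
    top -b -n 1 | grep -i zeek

    # Is mirrored traffic still arriving on the capture interface inside the container?
    tcpdump -n -i pcap0 -c 1024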
Do you know what’s segfaulting here?
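If the kernel log is visible from inside the container, it usually names the binary and PID of whatever crashed, so a rough sketch along these lines might already tell you:

    # Kernel log entries for recent segfaults (only works if the container can see the kernel log)
    dmesg -T | grep -i segfault

    # On a systemd host with coredumpctl available, list recent crashes and inspect one
    coredumpctl list
    coredumpctl info 430000    # the PID from the stderr line above

    # zeekctl also collects crash reports per node, so this may already name the crashed process
    zeekctl diag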
We've had the following ticket for tracking logging issues with Zeek 5.1 / 5.0: #2389. That should've been fixed with Zeek 5.1.1, though (the fix went in via a PR in Broker).
If you're building from source: could you double-check that the auxil/broker subtree is at the v2.4.1 tag? If not, you might have missed a git submodule update --init --recursive somewhere to pick up the fix. Given you're using/building with Docker, that seems unlikely, though.
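For completeness, verifying the subtree from the source checkout could look roughly like this (the checkout path below is just a placeholder):

    cd /path/to/zeek-source                    # wherever the Zeek 5.1.1 sources live
    git submodule status auxil/broker          # shows the pinned commit and nearest tag
    git -C auxil/broker describe --tags        # should report v2.4.1
    git submodule update --init --recursive    # re-sync the submodules if it doesn't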
Thanks,
Arne
Hi Arne,
To answer your questions:
- Initially, logging comes in for several hours and then stops; that is, the various per-protocol logs and conn.log are no longer updated. Only stats.log and telemetry.log are still written to.
- In the telemetry.log only “logger-1” is present after all other logging has stopped.
- Yes, when there is no more logging the workers still consume CPU time, so they are doing something…
- Data is still coming in on the NICs. As soon as I restart Zeek, per-protocol logging starts immediately.
- Unfortunately I don’t know which process crashed; still trying to figure that out.
- As you already mentioned/expected, I’m running auxil/broker version 2.4.1.
However, I found another interesting thing: there is a giant memory leak somewhere in one of our plugins. You can see that on Sunday afternoon the OOM killer kicked in. Please note that logging stopped well before the kernel started killing processes; in this case it stopped on Friday evening at 17:23, because that is when I see the last worker entries in telemetry.log and no more per-protocol log entries.
On Monday morning I restarted with the same configuration, and memory usage again grew quickly.
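Something like the following simple loop (just a sketch; the interval and output file name are arbitrary) can record per-node memory over time via zeekctl top, to see which process is actually growing:

    # Snapshot per-node CPU/memory every 5 minutes so the leaking process can be identified later
    while true; do
        date
        zeekctl top
        sleep 300
    done >> zeek-memory.log 2>&1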
Yesterday (Tuesday) I restarted Zeek without any plugins enabled (except for afpacket and community-id). It is now much more stable and memory usage increases only a little over time. Besides this, logging is still OK.
So, we will keep this running for another day or so and will then restart with half of the plugins enabled to see if the problem reoccurs. If not, we will try the other half, and so on, to find the one or ones that cause the trouble (a rough bisection sketch follows the plugin list below).
We had the following plugins enabled:
zeek/salesforce/ja3
zeek/corelight/cve-2021-44228
zeek/hosom/file-extraction
zeek/zeek/spicy-analyzers
zeek/corelight/zeek-spicy-ipsec
zeek/corelight/zeek-spicy-openvpn
zeek/corelight/zeek-spicy-wireguard
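Assuming all of these were installed via zkg under the names above, the bisection itself can be as simple as unloading half of them and redeploying (zkg load/unload only toggles whether a package is loaded, it does not uninstall anything), roughly:

    # Disable one half of the packages and redeploy, then watch memory and logging for a day
    zkg unload zeek/corelight/zeek-spicy-ipsec
    zkg unload zeek/corelight/zeek-spicy-openvpn
    zkg unload zeek/corelight/zeek-spicy-wireguard
    zkg unload zeek/zeek/spicy-analyzers
    zeekctl deploy

    # If the problem disappears, re-enable these with "zkg load" and disable the other half,
    # repeating until the culprit is found.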
To be continued…
Regards, John