Sure!
Zeek internally keeps state for each connection, and quite possibly more in whatever data structures your scripts create. If you have lots of events and each of them adds data to those structures, they can eventually eat all your RAM.
To give you an example: a while ago, an intern of mine designed a brilliant script:

    event http_reply(c: connection, version: string, code: count,
                     reason: string) &priority=3
    {
        # (…)
        if (code >= 400) {
            # Tag the transaction and record a "bad hit" for both endpoints.
            add c$http$tags[HTTP_ERROR];
            SumStats::observe("http.excessive_errors.attacker_badhits", [$host=c$id$orig_h],
                              [$str=fmt("%d %s%s", code, c$http$host, c$http$uri)]);
            SumStats::observe("http.excessive_errors.victim_badhits", [$host=c$id$resp_h],
                              [$str=fmt("%d %s%s", code, c$http$host, c$http$uri)]);
        }
        else if (code < 400) {
            # Record a "good hit" for both endpoints.
            SumStats::observe("http.excessive_errors.attacker_goodhits", [$host=c$id$orig_h], []);
            SumStats::observe("http.excessive_errors.victim_goodhits", [$host=c$id$resp_h], []);
        }
    }

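For every HTTP reply, this script records a good or a bad hit against both the client and the server address, so you end up with a ratio of good vs bad HTTP transactions for each of them. That can be used to profile the behaviour of a client. A scanner would easily have mostly “bad hits”, with return codes of 400 or above.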
This all worked well for 15 minutes, and then the cluster (11 nodes with 64 GB of RAM each) ran out of memory. See, this cluster monitored (among other things) the Firefox update system, with 500,000,000 clients talking to it.
Why did it crash? Because the SumStats framework was adding data to its internal data structures for every connection.
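To make it concrete where that state lives, here is a minimal sketch of a SumStats definition for one of the streams above. This is not the intern's script (its reducers are not shown here); the name "http-bad-hits-sketch", the 1 minute epoch and the choice of SUM are my assumptions for illustration. As far as I understand the framework, it holds a result entry for every distinct key observed until the epoch ends, so the number of distinct hosts per epoch is what drives memory.

```zeek
@load base/frameworks/sumstats

event zeek_init()
    {
    # Assumption: only SUM is applied, so each key carries a running
    # number rather than a collection of observed strings.
    local r1 = SumStats::Reducer($stream="http.excessive_errors.attacker_badhits",
                                 $apply=set(SumStats::SUM));

    SumStats::create([$name="http-bad-hits-sketch",   # hypothetical name
                      $epoch=1min,                     # illustrative epoch
                      $reducers=set(r1),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          # Called once per key at the end of each epoch.
                          local r = result["http.excessive_errors.attacker_badhits"];
                          print fmt("%s had %.0f bad hits", key$host, r$sum);
                          }]);
    }

event http_reply(c: connection, version: string, code: count, reason: string)
    {
    # One numeric observation per failed reply, keyed by the client address.
    if ( code >= 400 )
        SumStats::observe("http.excessive_errors.attacker_badhits",
                          [$host=c$id$orig_h], [$num=1]);
    }
```

Even in this slimmed-down form there is still one entry per client per epoch, so with hundreds of millions of clients it may not fit either. It only shows which knobs (key, epoch, reducer) decide how much is kept.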
Notice how there is technically no memory leak. Given 2 TB of RAM instead of the roughly 700 GB (11 x 64 GB) the cluster had, this might have worked.
Basically, if you have tons of connections (not just bandwidth or packets), the amount of memory needed to keep state for all of them may simply be too much.
Now, this data expires (unless you have a script that never expires it), but the state may grow faster than expiration can free memory.
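As a rough sketch of what expiration looks like in a script of your own, here is a table whose entries are dropped a fixed time after insertion. The table name and the 5 minute interval are made up for the example.

```zeek
# Hypothetical per-client counter. &create_expire removes an entry
# about 5 minutes after it was inserted into the table.
global hits_per_client: table[addr] of count &create_expire=5min;

event connection_established(c: connection)
    {
    local orig = c$id$orig_h;

    # First time we see this client: create its entry.
    if ( orig !in hits_per_client )
        hits_per_client[orig] = 0;

    ++hits_per_client[orig];
    }
```

The table settles at roughly (new keys per second) x (expire interval) entries. With an illustrative 10,000 new clients per second and a 5 minute expiry, that is about 3,000,000 live entries at any time. If that is already too much memory, the expiry is not quick enough.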
My first suspect would be the old scan script (scan.bro / scan.zeek) that comes bundled with Zeek. If you have it enabled, disable it and see if you’re still crashing.
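A sketch of what disabling it can look like, assuming the script is loaded from your site policy file the way older default installs did it. Check your own local.zeek / local.bro for the actual line, since your setup may load it from somewhere else.

```zeek
# site/local.zeek (site/local.bro on older Bro installs)
#
# Assumption: the bundled scan detector is pulled in by a line like the
# one below. Commenting it out stops the script from being loaded.
# @load misc/scan
```

After the change you need to redeploy the configuration (for example with zeekctl deploy, if you manage the cluster with zeekctl) for it to take effect.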
You can then take a look at your scripts and see if there is some data structure that grows per connection over time, and check how quickly you purge data from it.
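One way to check this on live traffic is to print the size of a suspect table at regular intervals. The sketch below is self-contained and uses a made-up table (tracked_hosts) and event name (report_state_size); in practice you would report on the tables your own scripts define.

```zeek
# Hypothetical state that grows with the number of distinct clients.
global tracked_hosts: table[addr] of count &create_expire=10min;

event connection_established(c: connection)
    {
    local orig = c$id$orig_h;

    if ( orig !in tracked_hosts )
        tracked_hosts[orig] = 0;

    ++tracked_hosts[orig];
    }

event report_state_size()
    {
    # |tracked_hosts| is the current number of entries in the table.
    print fmt("%s tracked hosts: %d", network_time(), |tracked_hosts|);

    # Re-arm the timer so the report repeats every minute.
    schedule 1min { report_state_size() };
    }

event zeek_init()
    {
    schedule 1min { report_state_size() };
    }
```

If the number keeps climbing from one report to the next instead of levelling off, that table is a good candidate for a shorter expiry or a smaller key space.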