We use Zeek to monitor our internal company network. Traffic is mirrored at the firewall between the user segment and the services segment (a 1 Gbps link, normally around 40% utilized). Today Zeek's reported packet drop percentage suddenly shot from 5% to 60% for a brief period. Looking at flow records, most of the traffic was between a repository server and multiple client machines. Further analysis of the Zeek logs indicated that multiple users began downloading a newly hosted ISO (around 2 GB) from the server at the same time. This pushed link utilization to 85%, at which point Zeek reported dropping 60% of traffic.
Normally I would handle this by writing a capture filter that excludes traffic to the repo server entirely. But we still need to know what kinds of accesses happen to that server, so a capture filter can't be used.
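For context, this is what I mean by excluding the server: a restrict filter ANDed into Zeek's capture filter via the packet filter framework (a minimal sketch; `192.168.10.5` is a hypothetical repo server address):

```zeek
# Entries in restrict_filters are ANDed into the capture filter,
# so this would drop all repo-server traffic before analysis.
# 192.168.10.5 is a placeholder for our repo server's address.
redef restrict_filters += { ["exclude-repo"] = "not host 192.168.10.5" };
```

As noted, this loses all visibility into the server, which is why it isn't an option for us.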
We also face this problem when users copy large amounts of data over SSH or SMB.
Is there a way to configure Zeek so that it logs basic connection/protocol information (from the initial packets or headers) and then ignores the rest of the connection? Or is there some other way to manage packet drops caused by such spikes in particular connections? I am already running 7 workers on the node and have found that adding workers does not usually help in these scenarios.
This isn’t a direct answer, but are you sure Zeek dropped the traffic, and not whatever was feeding Zeek? How do you get traffic to Zeek?
Thanks for looking at this. We use port mirroring and feed the packets directly to the sensor. It is possible that the port mirror is misconfigured to mirror more ports than required (we only need a single port mirrored), and that at high loads traffic gets dropped at the source.
I now understand that Zeek estimates capture loss from gaps in TCP streams: content that the peer acknowledged but that Zeek never saw counts as lost. That may explain why it reports so much capture loss when other tools do not.
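That estimation comes from the bundled capture-loss policy script, which can be loaded and tuned. A minimal sketch, with illustrative (not default) values for the interval and notice threshold:

```zeek
# Enables periodic capture-loss estimation based on ACKed-but-unseen data.
@load policy/misc/capture-loss

# How often to compute the loss estimate (illustrative value).
redef CaptureLoss::watch_interval = 5min;

# Raise a notice when estimated loss exceeds this fraction (illustrative).
redef CaptureLoss::too_much_loss = 0.05;
```

The resulting capture_loss.log gives a per-worker time series, which makes transient spikes like the ISO event easier to correlate with link utilization.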
zeekctl netstats shows highest loss from a Zeek worker is around 0.5% of “on link” packets.
If you don’t want to investigate the packet loss further, you could still explore options for shedding load when your repo server gets hit with large flows. This gets a bit adventurous but is doable:
- You could try to disable analyzers dynamically for affected connections, via the `disable_analyzer()` built-in function.
- Our packet filter framework comes with optional shunting support. I’m afraid I cannot point at documentation for this (it’s on our list to add), but the code is relatively accessible (if dated).
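To make the shunting option concrete, here is a rough sketch that combines the connection-threshold API with the packet filter framework's shunt support, as found in recent Zeek versions. The server address and the 1 MB threshold are hypothetical; adjust both to your environment:

```zeek
@load base/frameworks/packet-filter/shunt

# Hypothetical repo server address; replace with your own.
const repo_server: addr = 192.168.10.5 &redef;

event connection_established(c: connection)
    {
    # Only watch flows going to the repo server.
    if ( c$id$resp_h == repo_server )
        # Ask to be notified once the responder has sent ~1 MB.
        ConnThreshold::set_bytes_threshold(c, 1048576, F);
    }

event ConnThreshold::bytes_threshold_crossed(c: connection, threshold: count, is_orig: bool)
    {
    # Stop capturing further packets for this flow. The conn.log
    # entry is still written when the connection state is removed,
    # so you keep the basic access record while skipping the bulk data.
    PacketFilter::shunt_conn(c$id);
    }
```

Note that the number of simultaneous shunts is bounded (BPF filters have a size limit), so this works best when only a handful of elephant flows are active at once, as in the ISO-download scenario described above.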
Thanks for the links Christian. Both approaches look promising!