First post here, so I hope I am posting in the right category.
I am developing an analysis tool that uses Zeek conn.log files. For some slightly convoluted processing, I need to know whether each entry in the log (each connection) is completely independent of the others, or whether a previous connection can in any way influence a later one.
Could anyone point me to some reference documentation regarding this? I would be very grateful.
Can you clarify just what sort of independence you’re looking for? For sure, later connections are (very) often dependent on earlier ones in terms of what occurs in the later connection. I’m guessing you want to do some parallelized processing and need to know if it’s safe to partition each log line irrespective of its particulars and what came before it - whether that’s indeed safe depends on the nature of the processing you want to do.
Thank you for your reply. You are right, my question wasn’t fully specified.
The application(s) running on the host(s) generating the connection(s) will obviously dictate the successive connection(s).
What I am interested in understanding is whether Zeek itself uses information from previous connections to populate the fields of a later entry in the conn.log file, or whether each entry is populated only with observations about the current connection.
To give a little more context, I am running Zeek to extract the conn.log files from a set of pcap captures, so it’s not live traffic.
Hope this makes the question clearer. Thanks again!
Got it. For the most part, yes, the lines in the conn log are independent in the way you describe. It’s not entirely the case, however. For a few protocols (FTP in particular, and also RPC at least back in the day) Zeek learns what ports on a given system are being used for particular services. The most prominent example of this is Zeek learning to expect upcoming “FTP Data” connections on specific ports, and populating their $service field as such when they appear. There might be other such examples these days, I don’t know off-hand.
Unfortunately, I don’t know of any documented discussion of this. Looking through the base scripts, it looks like the other such instance for $service is irc-dcc-data, and possibly gridftp-data.
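If it helps with the partitioning, here is a minimal sketch of how the entries whose $service label depends on an earlier connection could be set aside before processing the rest independently. It assumes conn.log is in Zeek's default tab-separated format with a #fields header line, and that $service is written as a comma-separated list with "-" when unset. The function name split_conn_log and the STATEFUL_SERVICES set are my own illustrative names, and the set only covers the three labels mentioned above, so it may well be incomplete.

```python
# Service labels mentioned in this thread as being assigned based on
# state learned from an earlier connection. Possibly incomplete.
STATEFUL_SERVICES = {"ftp-data", "irc-dcc-data", "gridftp-data"}


def split_conn_log(lines):
    """Split conn.log rows into (independent, dependent) lists of dicts.

    Assumes Zeek's default TSV layout: header lines start with '#',
    and the '#fields' line names the tab-separated columns.
    """
    fields = None
    independent, dependent = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#fields"):
                # First token is the '#fields' tag itself; the rest are column names.
                fields = line.split("\t")[1:]
            continue
        if fields is None:
            raise ValueError("no #fields header seen before data rows")
        row = dict(zip(fields, line.split("\t")))
        # Treat the service column as a comma-separated set; '-' means unset.
        services = set(row.get("service", "-").split(","))
        if services & STATEFUL_SERVICES:
            dependent.append(row)
        else:
            independent.append(row)
    return independent, dependent
```

You would call it with something like split_conn_log(open("conn.log")) and then handle the two lists separately. Rows in the second list are the ones where Zeek may have filled in $service from what it saw on a previous connection.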