As a follow-up to my question about "delayed bro operation" I would
like to propose a new feature for bro. I call it batch mode, and it
helps to run bro over a large number of pcap files.
Until now I used a modified version of tcpslice to send multiple pcaps
to bro through a pipe. This setup works but is a bit complicated. It
also has the downside that bro blocks while reading from the pipe,
which breaks the usual event loop.
What I hacked together is a simple change to PcapSource: When passed a
directory instead of a file it looks for files with the suffix pcap and
processes them. When EOF is reached, it closes the file and renames it.
But instead of also closing the IOSource it looks for the next file.
When there are no files to process, it behaves just like live capture
mode when there are no packets available.
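To make the behaviour described above concrete, here is a minimal Python
sketch of the directory handling. It is not the actual C++ change to
PcapSource, only an illustration. The names next_pcap, mark_done and the
".done" suffix are made up for this example (the rename target is not
specified above), and picking files in sorted name order is an
assumption; the patch just grabs the first pcap it finds.

```python
import os
from typing import Optional

# Hypothetical suffix appended to a file once it has been fully read.
DONE_SUFFIX = ".done"


def next_pcap(directory: str) -> Optional[str]:
    """Return the path of one unprocessed *.pcap file, or None if none exist."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".pcap") and os.path.isfile(path):
            return path
    # Nothing to do right now: the caller treats this like a live
    # interface with no packets available and simply tries again later.
    return None


def mark_done(path: str) -> str:
    """Rename a finished file so it no longer matches the .pcap suffix."""
    done = path + DONE_SUFFIX
    os.rename(path, done)
    return done
```

The source would call next_pcap() whenever it has no open file, read
packets until EOF, call mark_done(), and then ask for the next file
instead of closing itself.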
My patch is quite basic right now: just grab the first pcap you can
find and work with it, but one could think of extended features:
1) Read the list of files to work on from a text file: This would come
in handy when the source files are distributed across the file system,
e.g. sorted by date or just to avoid too many files in one directory.
Compared to passing multiple file names to bro on the command line, this
also works around the "argument list too long" problem.
2) Sort mode: check the timestamps of all available files in the
directory and process them in the right order. This mode would have to
be smart enough not to open all files at the same time and run out of
file descriptors (as mergecap does). So check timestamps first and open only the
files needed. This could be more than one when there are separate pcaps
3) For each flow save the name/path of the first and last file it
was read from. So when detailed analysis is necessary, you know exactly
which files to open.
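Ideas 1) and 2) could be sketched roughly as follows, again in Python
for illustration only and not as part of the patch. The function names
are hypothetical, and I assume a list file with one path per line and
input in the classic pcap format (not pcapng). Each file is opened only
long enough to read the timestamp of its first packet and is closed
again, so only one file descriptor is in use at a time.

```python
import struct
from typing import List, Optional

# Magic numbers of the classic pcap file format.
MAGIC_USEC = 0xA1B2C3D4  # timestamps with microsecond resolution
MAGIC_NSEC = 0xA1B23C4D  # timestamps with nanosecond resolution


def read_file_list(list_path: str) -> List[str]:
    """Read pcap paths from a text file, one per line, skipping blank lines."""
    with open(list_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def first_timestamp(path: str) -> Optional[float]:
    """Return the timestamp of the first packet, or None if unreadable."""
    with open(path, "rb") as f:
        # 24-byte global header plus the two timestamp fields of record 1.
        data = f.read(32)
    if len(data) < 32:
        return None  # empty, truncated, or no packets
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            sec, frac = struct.unpack(endian + "II", data[24:32])
            divisor = 1e6 if magic == MAGIC_USEC else 1e9
            return sec + frac / divisor
    return None  # not a classic pcap file


def sort_by_first_packet(paths: List[str]) -> List[str]:
    """Order files by the timestamp of their first packet, dropping unreadable ones."""
    stamped = []
    for p in paths:
        ts = first_timestamp(p)
        if ts is not None:
            stamped.append((ts, p))
    stamped.sort()
    return [p for _, p in stamped]
```

Calling sort_by_first_packet(read_file_list("files.txt")) would give
the processing order. This only looks at the first packet of each file,
so it does not deal with files whose time ranges overlap.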
I would like to know if others would be interested in this kind of
feature. Better ideas on how to solve this "the bro way" are also welcome.