File extraction and archive files

I’ve been tasked with finding files that have a specific “signature” in their file header, where those files will be inside an archive. This needs to be agnostic of the protocol that transported the archive file.

I’m thinking the way to do this is to use the new File Analysis framework. Does Bro provide a mechanism to “automagically” extract the contents of an archive when the file being extracted from a protocol is an archive, or is this something I’m going to have to script myself? How can I know that a file has been fully received so that I can begin my analysis?

Thanks - Jon


I'm thinking the way to do this is to use the new File Analysis framework. Does Bro provide a mechanism to "automagically" extract the contents of an archive when it is an archive file that is being extracted from a protocol,

No, it does not currently recurse into the contents of archive files “on the fly”.

or is this something I'm going to have to script myself?

One way I can think of to do this purely in Bro scripts would be to extract the full archive to disk using the File Analysis Framework, then use Bro’s Exec module to expand the archive, and finally use the Input framework to feed the extracted contents back into the File Analysis Framework.
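To illustrate the expand-and-inspect step of that pipeline, here is a minimal Python sketch of the kind of external helper an Exec call might run. It assumes ZIP or tar input and a signature that must appear at offset 0 of each member; it opens the archive from memory and reports which members start with the signature. The function name `members_with_signature` is hypothetical, nested archives are not recursed into, and it checks headers directly instead of handing members back to the File Analysis Framework through the Input framework.

```python
import io
import tarfile
import zipfile


def members_with_signature(archive_bytes: bytes, signature: bytes) -> list[str]:
    """Hypothetical helper: names of archive members whose first bytes equal `signature`."""
    matches = []
    buf = io.BytesIO(archive_bytes)

    # ZIP archives are detected by their end-of-central-directory record.
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                with zf.open(info) as member:
                    # Only the header bytes are read, not the whole member.
                    if member.read(len(signature)) == signature:
                        matches.append(info.filename)
        return matches

    # Otherwise try tar, letting tarfile auto-detect gzip/bzip2/xz compression.
    buf.seek(0)
    try:
        with tarfile.open(fileobj=buf, mode="r:*") as tf:
            for info in tf.getmembers():
                if not info.isfile():
                    continue
                member = tf.extractfile(info)
                if member is not None and member.read(len(signature)) == signature:
                    matches.append(info.name)
    except tarfile.ReadError:
        pass  # Not an archive format this sketch understands.
    return matches
```

Handling nested archives would mean calling the function again on any member that is itself a ZIP or tar file.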

How can I know that a file has been fully received such that I can begin my analysis?

The “file_state_remove” event is raised when you’ve got as much of the file’s data as you’re going to get. Whether that is actually the full file is another matter: sometimes you can’t tell, and the best you can do is check that missing_bytes is zero (i.e., no packets appear to have been missed), but other times you may be able to check that seen_bytes == total_bytes.
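The decision above can be sketched in Python as the check one might perform inside a file_state_remove handler. The field names mirror the byte counters on Bro’s fa_file record, and total_bytes is modeled as possibly unset because not every protocol announces a size; the `FileState` class and `completeness` function are hypothetical stand-ins, not Bro APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileState:
    """Hypothetical stand-in for the byte counters on Bro's fa_file record."""
    seen_bytes: int
    missing_bytes: int
    total_bytes: Optional[int] = None  # None when the protocol gave no size


def completeness(f: FileState) -> str:
    """Classify a file once no more data will arrive for it."""
    if f.missing_bytes > 0:
        return "incomplete"         # Gaps were detected, e.g. dropped packets.
    if f.total_bytes is None:
        return "probably-complete"  # No gaps seen, but nothing to compare against.
    if f.seen_bytes == f.total_bytes:
        return "complete"           # Every announced byte was seen.
    return "incomplete"             # Data stopped short of the announced size.
```

Only the "complete" case is certain; "probably-complete" reflects the situation where all you can do is trust that missing_bytes stayed at zero.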

- Jon