Extract files based on magic number using Bro 2.2

Hi all!

I'm just wondering: is it possible to extract files based solely on their magic number using Bro 2.2?
In Bro 2.1, it was possible to extract files just by comparing the magic number
with the first X bytes. I used the script provided here, with great success:
http://scrapbook.zscaler.com/2012/05/bro-script-to-extract-artifacts-from.html

However, in Bro 2.2, things seem to have changed. Most examples and docs now only
seem to use the MIME-type to determine if a file will be extracted or not, e.g. here:
http://www.bro.org/sphinx-git/frameworks/file-analysis.html

I also see that some sort of "magic number database" (/bro/share/bro/magic/) has been included, but I find little
documentation on its role in file extraction, or on the format it uses.

Have I missed something essential here?
If anyone could help me better understand how file extraction works now in Bro 2.2, it would be most appreciated! :)

Best regards,
Marius P. Haugen.

I'm just wondering: is it possible to extract files based solely on their magic number using Bro 2.2?
In Bro 2.1, it was possible to extract files just by comparing the magic number
with the first X bytes. I used the script provided here, with great success:
http://scrapbook.zscaler.com/2012/05/bro-script-to-extract-artifacts-from.html

However, in Bro 2.2, things seem to have changed. Most examples and docs now only
seem to use the MIME-type to determine if a file will be extracted or not, e.g. here:
http://www.bro.org/sphinx-git/frameworks/file-analysis.html

You can try handling the “file_new” event, compare f$bof_buffer (Beginning Of File Buffer) to whatever magic you want, and then add the file extraction analyzer to f if it matches (similar to the examples in that webpage you cite, except using f$bof_buffer as the condition instead of f$mime_type).

The “file_new” event is network protocol agnostic so if it’s important to only extract stuff over HTTP, check the value of f$source to find the protocol over which it’s transferred.
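
A minimal sketch of that approach for Bro 2.2 might look like the following. The "%PDF" signature, the HTTP-only restriction, and the output filename pattern are assumptions chosen for illustration, not part of the original suggestion; substitute whatever magic bytes and naming scheme you need.

```bro
# Sketch: attach the extraction analyzer when the beginning of a file
# matches a chosen magic number. The "%PDF" magic, the HTTP-only check,
# and the "extract-..." filename are illustrative assumptions.
event file_new(f: fa_file)
    {
    # bof_buffer is an optional field, so make sure it is set first.
    if ( ! f?$bof_buffer )
        return;

    # Optional: only consider files transferred over HTTP.
    if ( f$source != "HTTP" )
        return;

    # sub_bytes() takes a 1-based start index; compare the first 4 bytes.
    if ( sub_bytes(f$bof_buffer, 1, 4) != "%PDF" )
        return;

    local fname = fmt("extract-%s-%s", f$source, f$id);
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
    }
```

Dropping the f$source check makes the script extract matching files regardless of the protocol they were carried over.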

I also see that some sort of "magic number database" (/bro/share/bro/magic/) has been included, but I find little
documentation on its role in file extraction, or on the format it uses.

The magic database is used by libmagic (the library which implements [1]) to determine the value of f$mime_type. See the magic(5) man page [2] for how magic files are formatted.

- Jon

[1] Ian Darwin's Fine Free File Command
[2] http://linux.die.net/man/5/magic

Hi Jon,

thanks a lot for taking the time to answer my question!
Comparing against the bof_buffer works like a charm! Again, thanks!

- Marius