I am using Zeek LTS version 5.0.7-0, and multiple pcaps are processed in parallel.
All files are extracted with the extension:
How can I prevent the extraction of “Unknown” and “Archive” files, which take a long time and slow down the processing of the pcaps? I don’t want to extract such files from the pcaps. How can I skip these types when they appear and still extract all the other file types?
I would rather not use it, because changing the files in use might break the existing setup. But as you mentioned, I think I could get the mime_type with file_sniff.
Once I have the mime_type, if I use an if check to take no action for files that arrive in Binary or Archive format, could I still extract the rest content with Files::add_analyzer(f, Files::ANALYZER_EXTRACT)?
I’m not exactly sure what you mean by “rest content”. Generally, you would not load the extract-all-files.zeek script; instead, you would put a script in place that contains the logic you want and load only that file.
The following extracts JPEG, PNG, and GIF files and ignores all others, while counting the ignored mime types and printing the counts at the end (only useful for pcap processing). There are more topics, such as building regular expressions for the wanted mime types or changing the extracted filename, but I hope this gives you a start.
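$ cat my-extract-files.zeek
option wanted_mimes = set("image/jpeg", "image/png", "image/gif");
global ignored_mime_type_summary: table[string] of count &default=0;
event file_sniff(f: fa_file, meta: fa_metadata) {
    if ( ! meta?$mime_type ) # Ignoring unset mime type
        return;
    if ( meta$mime_type !in wanted_mimes ) { # Count the ignored type and skip it
        ++ignored_mime_type_summary[meta$mime_type];
        return;
    }
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT); # Extract wanted types only
}
event zeek_done() { # Print the summary once the pcap has been processed
    for ( m, c in ignored_mime_type_summary )
        print c, m;
}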
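
For the original question (skip archive and unknown files, extract everything else), the same idea can be turned into a deny-list. The following is only a sketch under a few assumptions: the script name skip-slow-types.zeek and the unwanted_mimes pattern are made up for illustration, and the archive MIME types listed are guesses that should be checked against the mime_type column of your own files.log. It also touches on the two topics mentioned above, matching MIME types with a regular expression and changing the extracted filename. Load it instead of my-extract-files.zeek, not alongside it, since both handle file_sniff.

```zeek
# skip-slow-types.zeek (hypothetical name)

# Assumed deny-list of archive types; adjust to what your files.log shows.
const unwanted_mimes = /application\/(zip|x-gzip|x-tar|x-7z-compressed|x-rar)/ &redef;

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    # No MIME type could be determined: treat the file as "unknown" and skip it.
    if ( ! meta?$mime_type )
        return;

    # "pattern == string" is an exact match against the whole string.
    if ( unwanted_mimes == meta$mime_type )
        return;

    # Build a filename from the file ID and a sanitized MIME type,
    # e.g. "image/png" becomes "image_png".
    local fname = fmt("%s-%s", f$id, gsub(meta$mime_type, /[^a-zA-Z0-9]/, "_"));

    # Attach the extraction analyzer only to files that passed both checks.
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
    }
```

It would be run the same way as the script above, for example zeek -r trace.pcap skip-slow-types.zeek (trace.pcap is a placeholder). With the default settings, the extracted files end up in an extract_files directory under the working directory.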