Extract Specific File Types (Not All Files)

Hello all,

I am using Zeek LTS version 5.0.7-0 and processing multiple pcaps in parallel.

All files are extracted with this script:

…/policy/frameworks/files/extract-all-files.zeek

How can I prevent the extraction of “Unknown” and “Archive” files, which take a long time and slow down pcap processing? I don’t want to extract such files from the pcaps. How can I skip these types when they appear and still extract all other file types?

Thanks,

Hello @SFD -

The extract-all-files policy script doesn’t support filtering the way you want it I’m afraid. Have you considered using the hosom/file-extraction Zeek package instead?

If you don’t want to use the package, look at its main.zeek file and how it leverages the file_sniff() event instead of file_new() to get access to the mime type.

Hope this helps,
Arne


Thanks for your suggestions.

I would rather not use the package, so that the changes it makes to the files in use do not disturb my existing setup. But as you mentioned, I think I could get the mime_type with file_sniff.

Once I have the mime_type, if I use an if check to take no action for files that arrive in Binary or Archive format, could I still extract the rest content with Files::add_analyzer(f, Files::ANALYZER_EXTRACT)?

Thank you,

I’m not exactly sure what you mean by “rest content”. Generally, you would not load the extract-all-files.zeek script at all. Instead, put a script in place that contains the logic you want and load only that file.

The following would extract jpeg, png and gif files and ignore all others (it also tracks the count of ignored mime types and prints them at the end, which is only useful for pcap processing). There are more topics, like building regular expressions for the wanted mime types, changing the extracted filename, etc., but I hope that gives you a start.

$ cat my-extract-files.zeek 
@load base/files/extract

export {
        option wanted_mimes = set(
                "image/jpeg",
                "image/png",
                "image/gif",
        );
}

global ignored_mime_type_summary: table[string] of count &default=0;

event file_sniff(f: fa_file, meta: fa_metadata)
        {
        if ( ! meta?$mime_type )  # Ignoring unset mime type
                {
                ++ignored_mime_type_summary["<unknown>"];
                return;
                }

        if ( meta$mime_type !in wanted_mimes )
                {
                ++ignored_mime_type_summary[meta$mime_type];
                return;
                }

        Files::add_analyzer(f, Files::ANALYZER_EXTRACT);
        }

event zeek_done()
        {
        for ( m, c in ignored_mime_type_summary )
                print c, m;
        }
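
The two follow-up topics mentioned above (a regular expression for the wanted mime types and a custom extracted filename) could be sketched roughly as follows. This is only an illustration under assumptions: the option name wanted_mime_pattern and the "<source>-<file id>" naming scheme are made up for this example, while $extract_filename is the argument that base/files/extract adds to the analyzer arguments.

```zeek
@load base/files/extract

export {
	# Hypothetical option: pattern that must match the whole mime type.
	option wanted_mime_pattern = /image\/(jpeg|png|gif)/;
}

event file_sniff(f: fa_file, meta: fa_metadata)
	{
	# No mime type was detected, so there is nothing to match against.
	if ( ! meta?$mime_type )
		return;

	# "==" between a pattern and a string is an exact (whole-string) match.
	if ( wanted_mime_pattern == meta$mime_type )
		{
		# Hypothetical naming scheme: <source>-<file id>, e.g. HTTP-F1a2b3.
		local fname = fmt("%s-%s", f$source, f$id);
		Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
		                    [$extract_filename=fname]);
		}
	}
```

Load this instead of the set-based script, not in addition to it, so that only one handler attaches the extract analyzer to each file.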

Assume that the unwanted mime types are the Binary and Archive formats. By “rest content” I mean all types other than these two.
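
For this "skip a few types, extract the rest" case, the allow-list in the script above can be inverted into a deny-list. The following is a minimal sketch, not a tested configuration: the option name unwanted_mimes is invented here, and the listed mime type strings are assumptions about what Binary and Archive files are reported as. Check the mime_type column of your own files.log for the values that actually occur.

```zeek
@load base/files/extract

export {
	# Hypothetical option. The values are example assumptions only;
	# replace them with the mime types seen in your files.log.
	option unwanted_mimes: set[string] = set(
		"application/zip",
		"application/x-gzip",
		"application/x-dosexec"
	);
}

event file_sniff(f: fa_file, meta: fa_metadata)
	{
	# Unset mime type: treat the file as "unknown" and do not extract it.
	if ( ! meta?$mime_type )
		return;

	# Skip the types on the deny-list.
	if ( meta$mime_type in unwanted_mimes )
		return;

	# Everything else is extracted.
	Files::add_analyzer(f, Files::ANALYZER_EXTRACT);
	}
```

As with the allow-list script, this is meant to be loaded in place of extract-all-files.zeek, since that script would otherwise still extract every file.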

Thanks for the script.

I will use that script instead of .../policy/frameworks/files/extract-all-files.zeek. I tried the code by changing the contents of extract-all-files.zeek, and it worked for the image files.

I will find the other mime types. After adding them, the list will be complete. I hope it works smoothly 🙂