File extraction package

Hello,

We are trying to customize the file extraction package https://github.com/hosom/file-extraction

Does anyone have suggestions on how to get any of these done?

Is there a way to define which networks the "file extraction package" extracts files from, instead of extracting files from all the networks defined in networks.cfg? Example: if I have 7 subnets defined in networks.cfg, I only want the file extraction package to extract files from 2 of the 7.

Yes, just make a set[subnet] and add the networks you want to it. networks.cfg just auto-generates one for you called Site::local_nets.
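A minimal sketch of that idea, assuming it goes in a script Zeek loads after the package (for example site/local.zeek). The set name `extraction_nets`, the helper `file_in_extraction_nets`, and the two subnets are hypothetical placeholders. It relies on the package's `FileExtraction::ignore` hook, where a `break` stops a file from being extracted, and skips any file whose connections have no endpoint in the chosen networks.

```zeek
# Hypothetical set: the 2 (of 7) networks you want files extracted from.
# Replace these placeholder subnets with your own.
const extraction_nets: set[subnet] = { 192.168.1.0/24, 10.10.0.0/16 } &redef;

# Hypothetical helper: T if any connection carrying the file has an
# originator or responder address inside extraction_nets.
function file_in_extraction_nets(f: fa_file): bool
{
    if ( ! f?$conns )
        return F;

    for ( cid in f$conns )
    {
        if ( cid$orig_h in extraction_nets || cid$resp_h in extraction_nets )
            return T;
    }

    return F;
}

# A break in the ignore hook suppresses extraction for this file.
hook FileExtraction::ignore(f: fa_file, meta: fa_metadata)
{
    if ( ! file_in_extraction_nets(f) )
        break;
}
```

To keep using every network from networks.cfg, the same check can test against `Site::local_nets` instead of `extraction_nets`.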

Is there a way to deduplicate the extracted files? Example: if a file was sent to 20 people, I only want to see it once instead of 20 times.

The easiest way to do this part is to just name each file after its hash, so every copy of the same content ends up under the same filename, but you could also track recently seen files with a set[string].
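A rough sketch of the set[string] approach, assuming SHA1 hashing is enabled (the default site/local.zeek loads frameworks/files/hash-all-files) and that `FileExtraction::path` is absolute, so `f$info$extracted` holds the full on-disk path. The set name `seen_extracted_hashes` and the one-day expiry are hypothetical choices. Because the hash is only known once the whole file has been seen, this deletes duplicate copies after they are written rather than preventing the write.

```zeek
# Hypothetical set of SHA1 hashes for files extracted recently.
# Entries expire one day after being added, so the set stays bounded.
global seen_extracted_hashes: set[string] &create_expire=1 day;

event file_hash(f: fa_file, kind: string, hash: string)
{
    # Only look at SHA1 results for files that were actually extracted.
    if ( kind != "sha1" || ! f?$info || ! f$info?$extracted )
        return;

    if ( hash in seen_extracted_hashes )
    {
        # Same content was extracted recently: delete this copy.
        # Assumes f$info$extracted is the full path (absolute FileExtraction::path).
        unlink(f$info$extracted);
        return;
    }

    add seen_extracted_hashes[hash];
}
```

Naming files by hash avoids the delete step, since duplicates land on the same filename; the set-based version lets you choose how long "recent" is.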

We would also like to exclude certain file types when they arrive via SMB. Example: rather than excluding all .pdf files, I only want to exclude .pdf files coming via SMB.

If you look at how the plugins in that package are written, they are
just small scripts containing an if statement:

https://github.com/hosom/file-extraction/blob/master/scripts/plugins/extract-pdf.zeek

so you would just need something like this (extract PDFs unless they came over SMB):

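# MIME types this policy applies to (also used by the ignore hook below).
const pdf_types: set[string] = { "application/pdf" };

# In the extract hook, a break marks the file for extraction.
hook FileExtraction::extract(f: fa_file, meta: fa_metadata) &priority=5
{
    if ( meta?$mime_type && f$source != "SMB" && meta$mime_type in pdf_types )
        break;
}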

Or keep extracting all PDFs and ignore the ones that come from SMB:

# In the ignore hook, a break skips extraction. Uses the pdf_types set above.
hook FileExtraction::ignore(f: fa_file, meta: fa_metadata)
{
    if ( meta?$mime_type && f$source == "SMB" && meta$mime_type in pdf_types )
        break;
}

Thanks for the response, Justin. How do I make a "set[subnet]", and which file do I add it to?

Also, is there a way to have Zeek organize my extracted files by hour? I want Zeek to store all files extracted during each hour in a separate timestamped folder.

I currently store the extracted files in this directory, which I created: /logs/bro/spool/extracted_files/

I set that path in:
/usr/local/zeek/share/zeek/site/file-extraction/config.zeek

redef path = "/logs/bro/spool/extracted_files/";