- extracted filename with md5

Hi everyone,

I want to extract files and have their names include their MD5 hash.
The problem is that the MD5 hash only becomes available in the file_hash event, while file extraction has to be set up in earlier events such as file_new or file_over_new_connection.

Any ideas on how to accomplish this?

Thanks
B

Hi,

Below you can find a script that extracts files and renames them to include the file’s MD5 hash. I’m using the file_sniff event to start the extraction, and at this point I save each file under a name built from the timestamp and the file ID. Extracted files are saved in a top-level directory.

Later on, in the file_state_remove event (at which point the file’s MD5 should be available), I rename the file using the MD5 hash while keeping the file’s extension. The MD5 is not always available at that point, though: one possible cause is Zeek having missed some bytes of the file. In that case the script returns early and the file keeps its original name. Renamed files are moved into a subdirectory named after the date on which the file was seen.
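One more reason the MD5 can be missing: f$info$md5 is only filled in when an MD5 analyzer has been attached to the file. Many setups get this from the hash-all-files policy script loaded in local.zeek. If yours does not load it, a minimal sketch of attaching the analyzer yourself could look like this. The set name hashed_mime_types is made up for the example and is not part of the script further down.

```zeek
@load base/files/hash

# Hypothetical set for this sketch; mirror whatever MIME types you extract.
const hashed_mime_types: set[string] = {
	"image/jpeg",
	"image/png",
} &redef;

event file_sniff(f: fa_file, meta: fa_metadata)
	{
	if ( ! meta?$mime_type || meta$mime_type !in hashed_mime_types )
		return;

	# Attach the MD5 analyzer so that f$info$md5 can be set
	# by the time file_state_remove fires.
	Files::add_analyzer(f, Files::ANALYZER_MD5);
	}
```

Alternatively, adding "@load frameworks/files/hash-all-files" hashes every file Zeek sees, which is simpler but does more work than needed here.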

The script below lets you choose the MIME types of the files you want to extract, and it restricts extraction to files downloaded by one given IP address (target_client). The location where files are extracted can be changed as well. Feel free to adapt it to your needs.

Cheers,
Liviu

# Exec::run is used below to move the extracted files.
@load base/utils/exec

# MIME-types to be extracted
const extracted_mime_types = set(
        # Images:
        "image/jpeg",
        "image/png"
);

# Client for which to extract files
const target_client = 10.0.0.1 &redef;

redef FileExtract::prefix = "/data/zeek/extracted_files/";

export {
    ## Path where extracted files are saved
    const file_extract_path: string = "/data/zeek/extracted_files/" &redef;
}

# File extraction
event file_sniff(f: fa_file, meta: fa_metadata)
{
        # Check the right mime-type to extract.
        if ( ! meta?$mime_type || meta$mime_type !in extracted_mime_types )
                return;

        # Only extract files received by the target client.
        if ( target_client !in f$info$rx_hosts )
                return;

        for (i in meta$mime_types)
        {
                if(meta$mime_types[i]$mime in extracted_mime_types)
                {
                        local fext = split_string(meta$mime_types[i]$mime, /\//)[1];
                        local ntime = fmt("%D", network_time());
                        local fname = fmt("%s_%s.%s", ntime, f$id, fext);
                        Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
                        break;
                }
        }
}


event file_state_remove(f: fa_file)
{

        if ( !f$info?$extracted || !f$info?$md5 || FileExtract::prefix == "" )
                return;

        local orig = f$info$extracted;

        local split_orig = split_string(f$info$extracted, /\./);
        local extension = split_orig[|split_orig|-1];

        local ntime = fmt("%D", network_time());
        local ndate = sub_bytes(ntime, 1, 10);
        local dest_dir = fmt("%s%s", FileExtract::prefix, ndate);
        mkdir(dest_dir);
        local dest = fmt("%s/%s.%s", dest_dir, f$info$md5, extension);

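        # Move the file asynchronously. With a prefix ending in "/" this
        # yields a double slash in the source path, which is harmless.
        local cmd = fmt("mv %s/%s %s", FileExtract::prefix, orig, dest);
        when ( local result = Exec::run([$cmd=cmd]) )
                {
                # Nothing to do once mv has finished.
                }
        # Record the new location in the file's info record.
        f$info$extracted = dest;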
}
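
If spawning mv through Exec::run feels heavy, Zeek’s built-in rename() function may be enough for the renaming step, assuming source and destination sit on the same filesystem. The following is only a sketch of that idea, not a drop-in replacement: it skips the per-date subdirectory and leaves the file in FileExtract::prefix. It is meant to be used instead of the file_state_remove handler above, not alongside it.

```zeek
event file_state_remove(f: fa_file)
	{
	if ( ! f$info?$extracted || ! f$info?$md5 )
		return;

	# Keep whatever follows the last dot as the extension.
	local parts = split_string(f$info$extracted, /\./);
	local ext = parts[|parts| - 1];

	local new_name = fmt("%s.%s", f$info$md5, ext);
	local src = build_path_compressed(FileExtract::prefix, f$info$extracted);
	local dst = build_path_compressed(FileExtract::prefix, new_name);

	# rename() returns F on failure, e.g. when the source file is missing.
	if ( rename(src, dst) )
		f$info$extracted = new_name;
	else
		Reporter::warning(fmt("could not rename %s to %s", src, dst));
	}
```

Because rename() runs synchronously, f$info$extracted is only updated once the file has actually been renamed.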