Renaming carved files

I’m trying to find a simple way to rename a carved file back to its original file name using bro-script rather than having bash rip it out of files.log. I have seen the mime type analyzers on git that re-add the extension based on known mime types, but I’d rather be able to immediately identify the original file name as it came across the wire. I don’t need the unique session identifier because by the time I’m using bro file analysis I already have the individual session pcap isolated.

I’m guessing there should be a way to capture the files.log table data in bro-script, match the unique file identifier, then rename the file with the filename string from files.log.

This is a tricky thing to do regardless of how you do it. What happens when the file was transferred over something besides protocols with URLs? Or, what if the file is a PE and includes an original name in its manifest but resides at a different URL?

-AK

I’m not expecting there to be a filename associated with every file, but if the filename was in the pcap, for SMTP attachments, FTP file transfers, or HTTP sessions this shouldn’t be a complicated thing to do. I’m looking at this from a network analyst point of view in making it more efficient for them to quickly disseminate information. Maybe the fact that there is no filename for the extracted data makes it more/less interesting depending on the situation. I’m not looking for bro to try to make up a filename based on URI, but rather just get the information from the HTTP header if the filename is present (which I think is how bro gets the filename in files.log for HTTP sessions). In which case just ripping it out of files.log would be the right thing to do. I guess the real question is, is it possible to do that in bro-script? Or is it just more realistic to do that using python/shell?

Maybe this is useful:

https://github.com/Security-Onion-Solutions/securityonion-bro-scripts/blob/master/file-extraction/extract.bro

Regards,

Daniel

So the problem I’m running into with this extraction script is here (I’ve already got a script that handles the extracted metadata mime types):

local fname = fmt("/nsm/bro/extracted/%s-%s.%s", f$source, f$id, ext);

I don’t need f$source or f$id in the filename. What I’m searching for is being generated here in main.bro. I just need a way to grab this information and add it to the extract.bro script to rename the extracted file.

https://www.bro.org/sphinx-git/scripts/base/frameworks/files/main.bro.html#type-Files::Info

Files::Info

filename: [`string`](https://www.bro.org/sphinx-git/script-reference/types.html#type-string) [`&log`](https://www.bro.org/sphinx-git/script-reference/attributes.html#attr-&log) [`&optional`](https://www.bro.org/sphinx-git/script-reference/attributes.html#attr-&optional)

A filename for the file if one is available from the source for the file. These will frequently come from “Content-Disposition” headers in network protocols

The logic (forgive my terrible syntax) should be along the lines of

if f$filename is not empty,

local fname = fmt(outputdir, f$filename, ext);

else

local fname = fmt(outputdir, f$source, f$id, ext);

Michael,

I haven’t tested this beyond validating the syntax, but I think the logic you’re looking for is below. You would still have to add the dynamic extension mapping and maybe make outputdir configurable with an export {} block. Basically, you have to check whether the filename is set. I would caution you, however, that there are many instances where it is not set. If you’re looking for a more robust file extraction strategy, I would recommend [1]. There’s some additional overhead in moving files around, but it allows you to store files by hash once extraction is complete. This should greatly reduce your disk usage and the overhead of any follow-on processing.

event file_sniff(f: fa_file, meta: fa_metadata)
{
local fname = "";
local outputdir = "/data/bro/extracted_files/";
local ext = ".out";

# .. logic here to generate ext (with starting .) and outputdir (with ending /)
if ( f?$info && f$info?$filename )
   fname = cat(outputdir, f$info$filename, ext);
else
   fname = cat(outputdir, f$source, f$id, ext);

Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
}

[1] https://github.com/hosom/bro-file-extraction

Derek,

This is nearly spot on. Here’s what I have in main.bro from the git link you provided. It almost works, but something is wrong with the syntax, as it’s giving me errors. If I comment out the if/else statement, f$info$filename gives me the content-disposition filename extracted from the protocol. But I need a check in line to see whether f$info$filename is empty; if it’s empty, it should go ahead and try to figure out a mime-type extension. Very close, and it’s probably something very obvious I’m overlooking.

@load ./file-extensions

module FileExtraction;

export {
    ## Path to store files
    const path: string = "" &redef;
    ## Hook to include files in extraction
    global extract: hook(f: fa_file, meta: fa_metadata);
    ## Hook to exclude files from extraction
    global ignore: hook(f: fa_file, meta: fa_metadata);
}

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    if ( meta?$mime_type && !hook FileExtraction::extract(f, meta) )
        {
        if ( !hook FileExtraction::ignore(f, meta) )
            return;
        if ( meta$mime_type in mime_to_ext )
            local fext = mime_to_ext[meta$mime_type];
        else
            fext = split_string(meta$mime_type, /\//)[1];

        if ( f$info$filename != "" )
            local fname = cat("%s%s-%s", path, f$source, f$info$filename);
        else
            local fname = cat("%s%s-%s.%s", path, f$source, f$id, fext);
        Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
            [$extract_filename=fname]);
        }
    }

error in /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, line 26 and /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, line 28: already defined (FileExtraction::fname)
error in /opt/bro/share/bro/base/frameworks/files/./main.bro, lines 18-28 and /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, line 30: incompatible record types (Files::AnalyzerArgs and [$extract_filename=FileExtraction::fname])
error in /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, line 30 and /opt/bro/share/bro/base/frameworks/files/./main.bro, lines 18-28: type mismatch ([$extract_filename=FileExtraction::fname] and Files::AnalyzerArgs)
error in /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, lines 29-30: argument type mismatch in function call (Files::add_analyzer(FileExtraction::f, Files::ANALYZER_EXTRACT, [$extract_filename=FileExtraction::fname]))
warning in /opt/bro/share/bro/site/file-extraction/plugins/./…/./main.bro, line 30: expression value ignored (Files::add_analyzer(FileExtraction::f, Files::ANALYZER_EXTRACT, [$extract_filename=FileExtraction::fname]))

Disregard my last message. The fix was to stop going off on my own with != "" and use the ?$ checks instead, to use fmt instead of cat, and to remove the unnecessary second local statement. Thank you to everyone that lent a hand in this.

The correct script (which now works…)

@load ./file-extensions

module FileExtraction;

export {
    ## Path to store files
    const path: string = "" &redef;
    ## Hook to include files in extraction
    global extract: hook(f: fa_file, meta: fa_metadata);
    ## Hook to exclude files from extraction
    global ignore: hook(f: fa_file, meta: fa_metadata);
}

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    if ( meta?$mime_type && !hook FileExtraction::extract(f, meta) )
        {
        if ( !hook FileExtraction::ignore(f, meta) )
            return;
        if ( meta$mime_type in mime_to_ext )
            local fext = mime_to_ext[meta$mime_type];
        else
            fext = split_string(meta$mime_type, /\//)[1];

        if ( f?$info && f$info?$filename )
            local fname = fmt("%s%s-%s", path, f$source, f$info$filename);
        else
            fname = fmt("%s%s-%s.%s", path, f$source, f$id, fext);
        Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
            [$extract_filename=fname]);
        }
    }

I actually had this fully implemented a long time ago (naming files as they were named on the wire), but then I ripped it all out because it gave attackers the ability to control files being written on your file system. FireEye just got caught doing nearly this same thing recently and it turned out to be an evasion for them. I generally would not recommend going down the path of letting attackers control file names on your disk because you're likely to open a much larger hole than an evasion if you aren't extremely careful.

I am curious why you would like to do that though? Is it purely for convenience when you are doing analysis?

  .Seth
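If wire-supplied names are used despite the risk described above, one possible mitigation is to reduce the name to a conservative character set and always prefix it with the Bro file ID. The following is only a sketch under stated assumptions: `sanitize_filename` is a hypothetical helper that does not appear in any script in this thread, the allow-list is deliberately narrow, and the code has not been run against live traffic.

```bro
# Hypothetical helper, not part of the scripts posted in this thread.
function sanitize_filename(name: string): string
    {
    # Replace every character outside a small allow-list with "_".
    # "/" and "\" are outside the list, so the result cannot contain
    # a path separator.
    return gsub(name, /[^a-zA-Z0-9._]/, "_");
    }

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    # The filename is optional; skip files that did not carry one.
    if ( ! f?$info || ! f$info?$filename )
        return;

    # Prefixing with f$id keeps two transfers of the same name from
    # overwriting each other and ensures the result never starts with ".".
    local fname = fmt("%s-%s", f$id, sanitize_filename(f$info$filename));

    Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
        [$extract_filename=fname]);
    }
```

This trades fidelity for safety: names that differ only in stripped characters become indistinguishable, so files.log remains the place to look up the exact name seen on the wire.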

This is pretty common practice among forensic network analysis tools. The page preview function is one of the reasons Netwitness is so popular with analysts. It is dangerous as well, since it will attempt to render entire HTTP pages from carved files. I’ve recommended that the analysts just look in files.log if they want to see the original file name. From my perspective, the best solution is the mime type file analysis. To take it a step further, a simple check could verify that the mime type matches the file extension seen in the content-disposition header.
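The extension check suggested above could be sketched roughly as follows. The table `expected_ext` and its four entries are assumptions made up for illustration; the thread's real mapping lives in the separate file-extensions script (`mime_to_ext`), whose contents are not shown here. The comparison is deliberately naive, so a file named `.jpeg` would be reported against an expected `jpg`.

```bro
# Illustrative mapping only; extend or replace it for real use.
const expected_ext: table[string] of string = {
    ["application/pdf"] = "pdf",
    ["application/x-dosexec"] = "exe",
    ["image/jpeg"] = "jpg",
    ["image/png"] = "png"
} &redef;

event file_sniff(f: fa_file, meta: fa_metadata)
    {
    # Only check files with a sniffed type that the table knows about.
    if ( ! meta?$mime_type || meta$mime_type !in expected_ext )
        return;

    # Only check files that carried a name on the wire.
    if ( ! f?$info || ! f$info?$filename )
        return;

    # Take the text after the last dot of the lower-cased name.
    local parts = split_string(to_lower(f$info$filename), /\./);
    if ( |parts| < 2 )
        return;
    local ext = parts[|parts| - 1];

    if ( ext != expected_ext[meta$mime_type] )
        print fmt("extension mismatch: %s sniffed as %s but named %s",
                  f$id, meta$mime_type, f$info$filename);
    }
```

Printing keeps the sketch short; in a deployment the mismatch would more likely be raised through the notice framework or written to its own log.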

> This is pretty common practice among forensic network analysis tools. The page preview function is one of the reasons Netwitness is so popular with analysts. Dangerous as well, it will attempt to render entire pages of HTTP based off of carved files. I've recommended the analysts just look in files.log if they want to see the original file name.

I've never used netwitness, but wow. I suppose you're saying that you need the files named as they were on the remote server so the page display works? I would expect more html/css munging to be required even with the files named in the same way though, so you might as well just name the files in another way. :)

> From my perspective, the best solution is the mime type file analysis. To take it a step further a simple check to see if the mime type matches the file extension seen in the content-disposition header.

I'd be curious to see how many files don't match their declared mime types, I bet a lot. I thought about writing a script to do this once, but then stopped myself because at the very least, there are lots of favicon files that are jpegs and gifs, but the remote server even declares in the header that it's actually an icon file (since servers typically just base it on the file extension). I would still be interested to see what people's experiences are if anyone ever takes it on though (i.e., does it catch anything worth following).

Thanks,
  .Seth

> I'd be curious to see how many files don't match their declared mime types, I bet a lot. I thought about writing a script to do this once, but then stopped myself because at the very least, there are lots of favicon files that are jpegs and gifs, but the remote server even declares in the header that it's actually an icon file (since servers typically just base on the file extension). I would still be interested to see what people's experiences are if anyone ever takes it on though (i.e., does it catch anything worth following).

That's exactly the kind of script I've just written. Will send an update how it behaves in a week or so. It's going to be deployed in several busy offices.

This code works well. Any nice way to remove the spaces from the DOCs?

Cheers,

JB
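On removing spaces from extracted names, one option is to rewrite the name with `gsub` before it is handed to `fmt`. This is a minimal sketch: `strip_spaces` is a hypothetical helper name, and the input string in `bro_init` is made up purely to exercise it.

```bro
# Hypothetical helper: replace each space or tab with an underscore.
function strip_spaces(name: string): string
    {
    return gsub(name, /[[:blank:]]/, "_");
    }

event bro_init()
    {
    # Illustrative input only; prints the rewritten name.
    print strip_spaces("Quarterly Report 2016.doc");
    }
```

In the working script above, that would mean passing `strip_spaces(f$info$filename)` to `fmt` in place of `f$info$filename`.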