File Scanning Capability

Hello again,

I was hoping to get some guidance on how to best use Bro to process email files. My end goal is to strip out inbound email attachments, identify the file type, then run a distinct set of external tools against them. Each file type would have a different set or order of tools.

I will without a doubt eventually incorporate “http-ext-identified-files.sig” instead of what I am currently using, but I am having trouble determining where to integrate the logic for handling each file type. As it currently works, I am saving off every pdf and word doc, which would be unnecessary if I used bro to call the external tools and evaluate the results.

Current logic (this method calls for the external tools to be run against the directory by cron and are independent of Bro):
# If the hot flag is set, dump the MIME-decoded attachment to its own file for analysis.
if ( session$entity_is_hot )
    {
    if ( session$entity_filename == hot_pdf_attachment_filenames )
        {
        # Build the filename out of the MD5, length, and filename.
        hot_attachment_dumpname = fmt("dumped_pdf_files/%s:%d:%s", session$content_hash, length, session$entity_filename);
        }
    if ( session$entity_filename == hot_word_attachment_filenames )
        {
        hot_attachment_dumpname = fmt("dumped_doc_files/%s:%d:%s", session$content_hash, length, session$entity_filename);
        }
    # Get a raw filehandle (note open() instead of open_log_file()), write the data out, and be sure to close the fh.
    hot_attachment_dump_fh = open( hot_attachment_dumpname );
    write_file(hot_attachment_dump_fh, data);
    close(hot_attachment_dump_fh);
    }

What I would like to be able to do:

if ( session$entity_filename == hot_pdf_attachment_filenames )
    {
    hot_attachment_dumpname = fmt("dumped_pdf_files/%d:%s", length, session$entity_filename);
    hot_attachment_dump_fh = open( hot_attachment_dumpname );
    write_file(hot_attachment_dump_fh, data);
    close(hot_attachment_dump_fh);
    # scan_pdf_file() does not exist yet; it stands for the call to the external tools.
    scan_pdf_file(hot_attachment_dumpname);
    }

scan_pdf_file would include something like this:

scanpdf.py (which would include clamscan, pdfid.py, cymruMHR, ssdeep…etc) The pdf python script can pass the results back to bro for handling.

if ( result == bad )
{
alert
}
else
{
delete file, carry on or log results somewhere then delete file
}
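
A minimal sketch of what a wrapper such as scanpdf.py might look like, written in Python since the thread already leans on Python tooling. The names `run_tool` and `scan_file`, and the idea of pairing each tool with its own verdict function, are assumptions made for illustration. The real command lines, and the way each tool signals a hit, have to come from that tool's own documentation.

```python
import subprocess
import sys


def run_tool(argv, timeout=60):
    """Run one external tool without a shell and return (exit_code, stdout)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout


def scan_file(path, tools):
    """Run every tool against path and return the names of the tools that flagged it.

    tools is a list of (name, argv_prefix, is_bad) tuples. is_bad(exit_code, stdout)
    is a hypothetical per-tool predicate that decides whether the output is a hit.
    """
    flagged = []
    for name, argv_prefix, is_bad in tools:
        exit_code, stdout = run_tool(argv_prefix + [path])
        if is_bad(exit_code, stdout):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Stand-in "tools" built from the Python interpreter so the sketch runs anywhere.
    demo_tools = [
        ("always_clean", [sys.executable, "-c", "print('OK')"],
         lambda code, out: code != 0),
        ("keyword_check", [sys.executable, "-c", "print('/JavaScript 1')"],
         lambda code, out: "/JavaScript" in out),
    ]
    print(scan_file("sample.pdf", demo_tools))
```

An empty result corresponds to the "delete file, carry on" branch above, and a non-empty one to the alert branch. Keeping the tool list as data is one way to give each file type a different set or order of tools.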

The scan for office docs would be similar, but use ‘OfficeMalScanner’ instead of pdfid.py and pdf-parser.py. If I get this to work, I would like to do something very similar with http files.

How can I call the external tools? Is this the right place to be doing this?

I read in Robin’s ‘Advanced Scripting’ presentation from the 2009 workshop about injecting external information but am still confused how to do the alternative.

I would be surprised if this capability doesn’t already exist and suppose I might be going about this all wrong. I would just prefer to incorporate the file scans in Bro vice running them completely independently. If I wasn’t clear or am completely out in left field feel free to be honest. I won’t be offended.

Thanks in advance!

Will

I will without a doubt eventually incorporate "http-ext-identified-files.sig" instead of what I am currently using, but I am having trouble determining where to integrate the logic for handling each file type. As it currently works, I am saving off every pdf and word doc, which would be unnecessary if I used bro to call the external tools and evaluate the results.

That won't actually work quite right. The http-ext-identified-files.sig file uses special signature keywords that the http analyzer provides to detect file types. It's not directly applicable to SMTP/MIME transfers.

Current logic (this method calls for the external tools to be run against the directory by cron and are independent of Bro):
        hot_attachment_dump_fh = open( hot_attachment_dumpname );
        write_file(hot_attachment_dump_fh, data);
        close(hot_attachment_dump_fh);

In what event are you currently running using this code?

The scan for office docs would be similar, but use 'OfficeMalScanner' instead of pdfid.py and pdf-parser.py. If I get this to work, I would like to do something very similar with http files.

Makes sense.

How can I call the external tools? Is this the right place to be doing this?

You can't currently do this in a way that would be feasible on live traffic. The problem is that the call to the external tool would block Bro and cause it to start dropping packets. There is a "when" statement that can help build asynchronous function calls, though: the stack state is saved and used again when the function call returns. I don't know whether the system() function (I think this is what you're looking for to run external programs) can be used with the when statement, though.

If you are looking to run this on tracefiles for now though, you can certainly just use the system function to call your external tool. It takes a single argument (a string) that is the command line you'd like to run. There is a function for defanging data if you need to do that too (taking something off the line and using it in the command line) named str_shell_escape. You do need to make sure that the data that is defanged with str_shell_escape is placed within double-quotes.
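
For comparison, here is the same defanging idea sketched in Python rather than Bro, in case the command line ends up being assembled in an external script. This is an analogue, not the Bro function: `shlex.quote` is Python's standard way to make a string safe for a POSIX shell, while `build_scan_command` and the `pdf_scan.py -f` command line are placeholders.

```python
import shlex


def build_scan_command(untrusted_filename):
    """Return a shell command line with the filename quoted for safe use."""
    # shlex.quote leaves harmless names alone and single-quotes anything else.
    return "pdf_scan.py -f " + shlex.quote(untrusted_filename)


if __name__ == "__main__":
    print(build_scan_command("report.pdf"))
    print(build_scan_command("x.pdf; rm -rf ~"))
```

Attachment filenames come straight off the wire, so anything that interpolates them into a shell command needs this kind of quoting. Passing an argument list to the process instead of a single string avoids the shell entirely.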

I would be surprised if this capability doesn't already exist and suppose I might be going about this all wrong. I would just prefer to incorporate the file scans in Bro vice running them completely independently. If I wasn't clear or am completely out in left field feel free to be honest. I won't be offended.

Nope, not out in left field at all and personally I'm a bit ashamed I never wrote a mime-ext.bro script that was a bit more capable like the http-ext script. I'm going to be rewriting the mime.bro script for the next release though and it will definitely have file extraction and identification capabilities built into it. However, we are going to be working toward a much more generalized notion of files for some future release of Bro. I've worked a bit on how that may proceed, but unfortunately we definitely won't be anywhere close to ready with that for the next release.

  .Seth

I will without a doubt eventually incorporate “http-ext-identified-files.sig” instead of what I am currently using, but I am having trouble determining where to integrate the logic for handling each file type. As it currently works, I am saving off every pdf and word doc, which would be unnecessary if I used bro to call the external tools and evaluate the results.

That won’t actually work quite right. The http-ext-identified-files.sig file uses special signature keywords that the http analyzer provides to detect file types. It’s not directly applicable to SMTP/MIME transfers.

Understandable. Being that there are so many different types it would be beneficial enough to create a signature file for SMTP/MIME. I would be happy to share it when I get it done.

Current logic (this method calls for the external tools to be run against the directory by cron and are independent of Bro):

hot_attachment_dump_fh = open( hot_attachment_dumpname );
write_file(hot_attachment_dump_fh, data);
close(hot_attachment_dump_fh);

In what event are you currently running using this code?

Here is the entire event:

event mime_entity_data(c: connection, length: count, data: string)
    {
    local session = get_session(c, T);

    # MD5 hashing is now a builtin function, so just call it and dump the result into the content_hash field.
    # That field in the info struct was already there; this just fills it.
    session$content_hash = md5_hash(data);

    # Log the first 256 bytes of the attachment and the MD5 hash.
    mime_log_msg(session, "data", fmt("%d: %s", length, sub_bytes(data, 0, 256)));
    mime_log_msg(session, "all data", fmt("MD5: %s", session$content_hash));

    # If the hot flag is set, dump the MIME-decoded attachment to its own file for analysis.
    if ( session$entity_is_hot )
        {
        if ( session$entity_filename == hot_pdf_attachment_filenames )
            {
            # Build the filename out of the MD5, length, and filename.
            hot_attachment_dumpname = fmt("dumped_pdf_files/%s:%d:%s", session$content_hash, length, session$entity_filename);
            }
        if ( session$entity_filename == hot_word_attachment_filenames )
            {
            hot_attachment_dumpname = fmt("dumped_doc_files/%s:%d:%s", session$content_hash, length, session$entity_filename);
            }

        # Get a raw filehandle (note open() instead of open_log_file()), write the data out, and be sure to close the fh.
        hot_attachment_dump_fh = open( hot_attachment_dumpname );
        write_file(hot_attachment_dump_fh, data);
        close(hot_attachment_dump_fh);

        # Log to the hot logfile as well.
        mime_log_hot_msg(session, "hot data", fmt("%d: %s", length, sub_bytes(data, 0, 256)));
        mime_log_hot_msg(session, "hot data", fmt("File dumped: %s MD5: %s", session$entity_filename, session$content_hash));
        }
    }

I attached the modified mime.bro in case anyone wants to see the rest of it.

The scan for office docs would be similar, but use ‘OfficeMalScanner’ instead of pdfid.py and pdf-parser.py. If I get this to work, I would like to do something very similar with http files.

Makes sense.

How can I call the external tools? Is this the right place to be doing this?

You can’t currently do this in a way that would be feasible on live traffic. The problem is that the call to the external tool would block Bro and cause it to start dropping packets. There is a “when” statement that can help build asynchronous function calls, though: the stack state is saved and used again when the function call returns. I don’t know whether the system() function (I think this is what you’re looking for to run external programs) can be used with the when statement, though.

I suppose the short answer is yes. I was looking for something like the system() call, along the lines of modifying the PyBroccoli example below.

PyBroccoli Example:

from broccoli import *

@event
def pong(src_time, dst_time):
    print "pong event: time=%f/%f s" % (dst_time - src_time, current_time() - src_time)

bc = Connection("127.0.0.1:47758")
bc.send("ping", time(current_time()))

To:

@event (event == dumped pdf file)
def pass_pdf(file):
    system(pdf_scan.py -f dumped_file.pdf > tempdir)

With what you mentioned taken into account, we can’t ask bro to wait on the results, but maybe we could dump the results to a logfile for alerting?
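
One way to picture the logfile idea is a small Python poller that runs outside Bro, picks up newly dumped files, and appends one verdict line per file to a log that something else can alert on. This is only a sketch under assumptions: `scan_new_files`, the tab-separated log format, and the caller-supplied `scan` function are all made up for illustration.

```python
import os
import time


def scan_new_files(dump_dir, seen, scan, log_path):
    """Scan files in dump_dir not seen before and append one log line per file.

    scan(path) is a caller-supplied function returning a short verdict string.
    Returns the number of files processed in this pass.
    """
    processed = 0
    for name in sorted(os.listdir(dump_dir)):
        path = os.path.join(dump_dir, name)
        if name in seen or not os.path.isfile(path):
            continue
        verdict = scan(path)
        # One tab-separated line per file: timestamp, filename, verdict.
        with open(log_path, "a") as log:
            log.write("%f\t%s\t%s\n" % (time.time(), name, verdict))
        seen.add(name)
        processed += 1
    return processed
```

Calling this in a loop with a short sleep gets closer to real time than cron without ever making Bro wait. The log should live outside the dump directory, and a real version would need to avoid picking up a file that is still being written.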

If you are looking to run this on tracefiles for now though, you can certainly just use the system function to call your external tool. It takes a single argument (a string) that is the command line you’d like to run. There is a function for defanging data if you need to do that too (taking something off the line and using it in the command line) named str_shell_escape. You do need to make sure that the data that is defanged with str_shell_escape is placed within double-quotes.

I would be surprised if this capability doesn’t already exist and suppose I might be going about this all wrong. I would just prefer to incorporate the file scans in Bro vice running them completely independently. If I wasn’t clear or am completely out in left field feel free to be honest. I won’t be offended.

Nope, not out in left field at all and personally I’m a bit ashamed I never wrote a mime-ext.bro script that was a bit more capable like the http-ext script. I’m going to be rewriting the mime.bro script for the next release though and it will definitely have file extraction and identification capabilities built into it. However, we are going to be working toward a much more generalized notion of files for some future release of Bro. I’ve worked a bit on how that may proceed, but unfortunately we definitely won’t be anywhere close to ready with that for the next release.

Maybe you should charge "more" for Bro...

No, you all are doing a great job on this project. I just wish I could do more to help.

.Seth


Seth Hall
International Computer Science Institute
(Bro) because everyone has a network
http://www.bro-ids.org/

Will

mime.bro (11.7 KB)

Hi Will:

Seems like you would probably want to use the python broccoli bindings
to send an event to a python client, here's what I'm doing with my
"stomper" code, which looks up urls on the fly in a malware database:

# In your bro startup script
@load listen-clear

redef Remote::destinations += {
        ["remote_stomper"] = [ $host=127.0.0.1, $events = /remote_check_URL/, $connect=F, $ssl=F ]
...

#within bro policy

# Here we send to the broccoli client for checking/processing
event remote_check_URL(++stomper_seqno, c, is_orig, host, uri, ts);

.....................

On the python side, the relevant sections from the python code, which
is running as a daemon accepting events from bro and acting on them:

#! /usr/bin/env python

I forgot to mention here that you can do the file detection fully at the script layer with the identify_data function. It takes a string containing the data at the beginning of a file and a boolean argument. If the boolean is true, it returns the MIME type (from libmagic); otherwise it returns the description of the file (again, from libmagic).
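
To illustrate the idea of identifying a file from its first bytes and then choosing a tool chain per type, here is a small Python sketch. It is a deliberately tiny stand-in for libmagic, not a reimplementation of identify_data: the signature table covers only three formats, the MIME labels are a simplification (the OLE2 container is also used by Excel and PowerPoint files), and the `TOOL_CHAINS` mapping simply reuses tool names mentioned in this thread.

```python
# Leading-byte signatures for a few formats; a tiny stand-in for libmagic.
MAGIC_PREFIXES = [
    (b"%PDF-", "application/pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/msword"),  # OLE2 container
    (b"PK\x03\x04", "application/zip"),
]

# Hypothetical mapping from detected type to an ordered list of tools to run.
TOOL_CHAINS = {
    "application/pdf": ["clamscan", "pdfid.py", "pdf-parser.py"],
    "application/msword": ["clamscan", "OfficeMalScanner"],
}


def identify(data):
    """Return a MIME-style label based on the first bytes of data."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"


def tools_for(data):
    """Return the ordered tool chain for this content, or an empty list."""
    return TOOL_CHAINS.get(identify(data), [])


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(tools_for(b"%PDF-1.4\n"))
```

Keying the decision on content rather than on the attachment filename means a renamed file still lands in the right chain.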

  .Seth

I implemented a straw man version of what you are doing for html file
transfers - in particular looking at PDF files via the pdfid tool. As
Jim pointed out, it is trivial to do a python->bro event call back via
Broccoli. I will post the code when I get back home - it is more of a
hack, but might prove to be helpful.

cheers,
scott

Thanks for that example Jim!

That gives me a bunch of other ideas. The best thing about using this method would be near real-time scanning and notifications vice running a cron’d script at a given interval.

In your code below, what are you asking Bro to do, if anything, with the returned value?

# If the category signals a block
bro_conn.send("stomper_block", seqno)

return

# Main program - Initialize and call event loop

# Setup the connection to bro
bro_conn = broccoli.Connection("127.0.0.1:47758")

# Event loop
bro_event_loop(bro_conn)

Will

Thanks Scott. That would be great.

I assume you meant ‘http’ file transfers?

I have a very limited amount of experience analyzing pdf files, but understand that there are many characteristics that can be used to narrow down the files that actually need to be analyzed. I am interested in parsing the results of pdfid.py and, if conditions are met, passing the results in an alert, and potentially triggering pdf-parser.py to add more content for analysis to the alert.

I would be very interested in seeing what you are doing.

Will

Hi Will,

When bro receives the event, it will raise a notice that will execute
a custom host-pair-drop-connectivity script that drops the
source/destination host pair for a short period to interrupt the
connection in realtime.

seqno is used by bro to keep track of which request it sent, so that
the returning event can be matched to the request that was made. The
requests are kept in a table whose entries expire rapidly (the timeout
is longer than the expected response time of the python program).
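
The bookkeeping behind such a table can be sketched in Python as follows. This only illustrates the logic and is not Jim's code: on the Bro side the same effect would normally come from a table with an expiration attribute. The class name `PendingRequests` and its methods are hypothetical.

```python
import time


class PendingRequests:
    """Map sequence numbers to request context, with a fixed time to live."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # seqno -> (expiry_time, context)

    def add(self, seqno, context, now=None):
        now = time.time() if now is None else now
        self.entries[seqno] = (now + self.ttl, context)

    def take(self, seqno, now=None):
        """Return and remove the context for seqno, or None if unknown or expired."""
        now = time.time() if now is None else now
        expiry, context = self.entries.pop(seqno, (0.0, None))
        return context if now < expiry else None

    def expire(self, now=None):
        """Drop every entry whose time to live has passed; return how many."""
        now = time.time() if now is None else now
        stale = [s for s, (expiry, _) in self.entries.items() if now >= expiry]
        for s in stale:
            del self.entries[s]
        return len(stale)
```

A reply that arrives after its entry has expired is simply ignored, which is why the timeout needs to exceed the scanner's expected response time.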

BTW:

I believe there was a bug in my code above (I put it down half-baked a
while ago, and haven't picked it up in a while) - the broccoli event
should have the same number of arguments as the bro event that sends
it, and vice versa.

Understood. Thanks again for the info.

Will