MD5 Hashing

What is the correct way to turn on MD5 hashing in SMTP and HTTP logs?
Which variables do I need to set in my share/bro/site/local.bro ?

-Chris

# Windows executables are hashed by default (it's a regex matching the mime type of the file)
redef HTTP::generate_md5 += /image.*/;
redef SMTP::generate_md5 += /image.*/;

Those were pulled from these pages in our docs…
http://www.bro-ids.org/documentation/scripts/base/protocols/http/file-hash.html#id-HTTP::generate_md5
http://www.bro-ids.org/documentation/scripts/base/protocols/smtp/entities.html#id-SMTP::generate_md5

This is being seriously reworked for 2.1 right now too. There is going to be a file analysis policy where you will be able to declare more easily, and with much finer granularity, when you'd like to do certain analyses.

  .Seth

Sounds simple enough.

So, hypothetically, if I wanted SMTP to MD5 hash all mime types that
are image.* or application.*, I would add the lines below to my
local.bro?

redef SMTP::generate_md5 += /image.*/;
redef SMTP::generate_md5 += /application.*/;

I'm assuming that the += operator appends new regular expressions. Is
that correct?

-Chris

> So, hypothetically, if I wanted SMTP to MD5 hash all mime types that
> are image.* or application.*, I would add the lines below to my
> local.bro?
>
> redef SMTP::generate_md5 += /image.*/;
> redef SMTP::generate_md5 += /application.*/;

Yep, just keeping in mind that the PDF mime type falls within application/ too (and a number of others).

> I'm assuming that the += operator appends new regular expressions. Is
> that correct?

Correct.

.Seth
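
As a minimal sketch of the same idea: since += ORs the new pattern onto the existing one, the two redef lines above should be equivalent to a single redef that uses regex alternation. This assumes the default value of SMTP::generate_md5 is left in place and that the line goes in local.bro as discussed.

```bro
# Sketch: hash SMTP entities whose sniffed MIME type is image/* or
# application/*. The += adds these alternatives to the default pattern
# instead of replacing it.
redef SMTP::generate_md5 += /image.*|application.*/;
```

Keep in mind that application.* is broad (PDFs and many other types fall under it), so this may hash far more entities than expected.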

Will the changes in 2.1 allow for passing of data to an MD5 function?
Or will it (the file analysis policy) use protocol knowledge + magic
number to determine if it should be MD5'd or not?

I only ask because seeing an exe downloaded with a mime type of
image/jpg is not completely uncommon.

> Will the changes in 2.1 allow for passing of data to an MD5 function?
> Or will it (the file analysis policy) use protocol knowledge + magic
> number to determine if it should be MD5'd or not?

The generate_md5 variable is only a cheat mechanism I put in place. You actually have a lot more flexibility than that if you write a bit of code. The HTTP::Info record is extended in the scripts/base/protocols/http/file-hash.bro script with a field named "calc_md5". If you set that field to true (T) before the first chunk of data is seen, Bro will calculate an MD5 sum for the transfer. For example, if you handle the http_header event, you just check your condition and then set the field to T. Here's a short and dumb example…

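# Header names are delivered upper-cased, but values arrive as sent,
# so lower-case the value before comparing.
event http_header(c: connection, is_orig: bool, name: string, value: string)
  {
  if ( ! is_orig && name == "CONTENT-TYPE" && to_lower(value) == "image/jpg" )
    c$http$calc_md5 = T;
  }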

This will make Bro calculate MD5 sums for any HTTP transfer where the server sent jpg as the content type (this is not what the generate_md5 variable matches against, as I mention below).

> I only ask because seeing an exe downloaded with a mime type of
> image/jpg is not completely uncommon.

Those MIME types are sniffed from the content (we ignore the Content-Type header). If it's a Windows executable, it will be detected as such.

  .Seth
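
The calc_md5 approach described above is not limited to header checks. Here is a rough sketch that hashes every HTTP response coming from a set of servers, whatever the MIME type. The set name hash_servers and the 192.0.2.0/24 range are made up for illustration, and the sketch assumes the base HTTP scripts (including file-hash.bro) are loaded, as they are by default.

```bro
# Hypothetical option: servers whose responses should always be hashed.
const hash_servers: set[subnet] = { 192.0.2.0/24 } &redef;

# http_reply fires when the status line is seen, i.e. before any body
# data, so calc_md5 is set before the first chunk arrives.
event http_reply(c: connection, version: string, code: count, reason: string)
  {
  if ( c?$http && c$id$resp_h in hash_servers )
    c$http$calc_md5 = T;
  }
```

Because hash_servers is declared &redef, a site could extend it in local.bro with redef hash_servers += { ... }; rather than editing the script itself.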