Bro 2.2 File Extraction (RHEL 6.5)

Hey Bro List,

I’m trying to set up File Extraction using Bro 2.2 on a RHEL 6.5 system and it’s not functioning properly (no files are being extracted from the pcap).

Here is what I’ve tried:

  1. I put whatever.bro into the directory: /opt/bro/share/bro/site
  2. I edited “local.bro” and told it to “load whatever.bro”
  3. I verified all configuration syntax: broctl check
  4. I addressed any errors (none)
  5. I installed the script: broctl install
  6. Then I bounced bro: broctl restart

To test the bro file extraction capabilities, my “whatever.bro” script contains the following:

-----------START
#This produces logs only, no extracted files
event file_new(f: fa_file)
    {
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT);
    }
-----------END

My pcap (produced by tcpdump) contains a five-minute section of traffic where I downloaded a few HP printer drivers to test. Wireshark was able to extract the files, so we know the pcap file integrity is good.

I ran this on the command line to have Bro extract the HP printer driver files from the same pcap file:

bro -C -r my_pcap_file

Logs are produced in the pwd, but no extracted files.

Any ideas?

Two separate things are going on here. Broctl is really focused around running Bro on live traffic and orchestrating all of the complexity involved in that. You are then separately trying to run the Bro binary on a trace file and get output.

Your whatever.bro script is installed and ready to be used when Bro is run with broctl. Since you're just running Bro directly here though, you will want to load your script on the command line like this:

  bro -C -r my_pcap_file whatever.bro

You could also load the full local.bro script if you want that functionality too like this:

  bro -C -r my_pcap_file local.bro whatever.bro

Does that explain things better?

  .Seth

Yes it does!

What I’m trying to do is “Verify that broctl is configured for File Extraction properly”. My method was to test broctl by using bro on the CLI. Your explanation is good information.

I’m going to try that now and update the list on results.

Too easy, that worked! It created the extracted files in the ‘pwd’. I checked the md5 sums and they matched the files Wireshark extracted from the same pcap. I’ll run another test on a tcpdump file and verify the md5 as well.

Three questions then:

  1. Can I safely assume, based on these test results, that broctl will perform the same way as bro?
  2. If so, where will broctl place the ‘extracted_files’ directory?
  3. Lastly, what’s the best way to investigate these files (I’m capturing all exe downloads on HTTP)? For example, the directory ‘extracted_files’ will be full of HTTP-blahblah names. How would I correlate those file names to their actual file names? Is that information stored in the conn.log, files.log, http.log, packet_filter.log, or weird.log?

Thanks for your time.

JW

> Too easy, that worked! It created the extracted files in the 'pwd'. I checked the md5 sums and they matched the files Wireshark extracted from the same pcap.


Great!

> 1. Can I safely assume, based on these test results, that broctl will perform the same way as bro?

Generally yes. Broctl is just a control harness for Bro that runs it in a certain way.

> 2. If so, where will broctl place the 'extracted_files' directory?

Unfortunately that will be in the <prefix>/spool/{node-name} directory. You can set it to something system-wide though like this...

redef FileExtract::prefix = "/extract/here/";

That directory will just need to exist and multiple Bro processes will write extracted files there.

> 3. Lastly, what's the best way to investigate these files (I'm capturing all exe downloads on HTTP)? For example, the directory 'extracted_files' will be full of HTTP-blahblah names. How would I correlate those file names to their actual file names? Is that information stored in the conn.log, files.log, http.log, packet_filter.log, or weird.log?

Unfortunately again, that's something where you may want to write a script that can take the file names and inspect the logs.

  .Seth

That’s great information. When you say “you can set it to something system-wide though like this”, what file do I edit, or is that entry something I put at the top of my “whatever.bro”?

No problem about writing a script. We are a big perl/php/shell shop, I guess my question is, what files would I need to parse / correlate to determine the correct / original name of the exe?

Thanks again for your help!

> That's great information. When you say "you can set it to something system-wide though like this"
> What file do I edit, or is that entry something I put at the top of my "whatever.bro"?

You could add that directly to local.bro or add it to your whatever.bro script and load that script in local.bro.

I guess my comment about "system-wide" was far too non-specific. :)

What I meant is that if you're running a number of worker (traffic sniffing) processes on a single host they will each have their own spool directory which will cause them all to write files to separate subdirectories of their spool/ directory. If you set the prefix to be an absolute path it will cause all of the processes to write their files to that same directory but I don't know what your deployment looks like so I may be giving unhelpful advice.

> No problem about writing a script. We are a big perl/php/shell shop, I guess my question is, what files would I need to parse / correlate to determine the correct / original name of the exe?

Ah! That's complicated. You can refer to the "filename" field in the files log. For any files that were extracted, you should be able to find the name of the file that was written to disk in the "extracted" field in the files.log. So, take the filename you have on disk, search for that in the files.log, then look at the "filename" field.

One gotcha here though. We have taken a somewhat tough line on what we consider a "filename". The basic gist is that in order to be a filename it must be something explicitly declared as a filename. In other words, we don't yank path components from HTTP requests to assign as file names. If we did, you'd very likely extract a bunch of files named index.asp and others like that. HTTP actually declares a header field where filenames can be explicitly passed through. Those are extracted and given as filenames in files.log. Other protocols provide file names in various ways as well.
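Since a perl/php/shell shop was mentioned, here is a minimal shell sketch of the lookup described above. It assumes the usual tab-separated Bro log layout with a "#fields" header line and the column names "extracted" and "filename"; the function name map_extracted is made up for illustration. Columns are located by name from the header, so the sketch does not rely on a fixed column order.

```shell
# Print "<name on disk><TAB><declared filename>" for every extracted file
# listed in a Bro files.log (path given as $1).
# Assumption: the log has a tab-separated "#fields" header line that
# contains the column names "extracted" and "filename".
map_extracted() {
    awk -F'\t' '
        /^#fields/ {
            # $1 is the literal "#fields", so header column i is data column i-1.
            for (i = 2; i <= NF; i++) col[$i] = i - 1
            next
        }
        /^#/ { next }    # skip the remaining header and footer lines
        ("extracted" in col) && $(col["extracted"]) != "-" {
            name = ("filename" in col) ? $(col["filename"]) : "-"
            print $(col["extracted"]) "\t" name
        }
    ' "$1"
}

# Usage: map_extracted files.log
```

A "-" in the second column means no filename was explicitly declared for that transfer, which per the gotcha above is likely to be common for HTTP.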

  .Seth

Very Very interesting.

No worries about specifics, I usually ask if I’m still unsure, but thanks for the clarification!
Historically, the standard bro logs (“communication.log, weird.log, etc.”) that are created under dated directories are what we currently use. We are now adding HTTP exe file carving to our requirements, and my thought was how to know what the original .exe filename was, since we keep a db of md5s of known exes from the OS that are used for comparisons. The problem is, I won’t know what file/md5_value to compare it to since I won’t know the original filename. Hope that makes sense.

For example, if a user downloads something.exe (via http), bro will create a HTTP-blahblah file name. My problem at that point is, how do I know what the user tried to download: was it “notepad.exe” or “maliciousIntent.exe”? I will only have a directory full of HTTP-blahblah names, correct? That was where I was trying to go. Perhaps I misunderstood your response and you already answered me? If so, apologies, but I still seem to be missing the connection between the bro-created file name when it’s carved and the actual filename of the exe that the user attempted to download.

> The problem is, I won't know what file/md5_value to compare it to since I won't know the original filename. Hope that makes sense.

If you're running Bro with broctl, you will already have hashes (md5 and sha1) for every file transferred in your files.log.
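As a rough illustration of using those hashes, the shell sketch below lists extracted files whose md5 does not appear in a known-hash list. Everything about the list is an assumption (a plain text file with one md5 per line), as are the function name unknown_md5s and the files.log column names "md5" and "extracted", which are read from the "#fields" header.

```shell
# Print "<extracted name><TAB><md5>" for extracted files whose md5 is not
# in a known-hash list.
#   $1: path to files.log
#   $2: hypothetical known-hash file, one md5 per line
# Assumption: files.log has a "#fields" header naming "md5" and "extracted".
unknown_md5s() {
    awk -F'\t' -v known="$2" '
        BEGIN {
            # Load the known hashes into a lookup table.
            while ((getline line < known) > 0) ok[line] = 1
        }
        /^#fields/ { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
        /^#/ { next }
        ("md5" in col) && ("extracted" in col) && $(col["extracted"]) != "-" {
            h = $(col["md5"])
            if (!(h in ok)) print $(col["extracted"]) "\t" h
        }
    ' "$1"
}

# Usage: unknown_md5s files.log known_md5s.txt
```

This compares by hash only, so it works without knowing the original filename.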

> For example, if a user downloads something.exe (via http), bro will create a HTTP-blahblah file name. My problem at that point, is how do I know what the user tried to download, was it "notepad.exe" or "maliciousIntent.exe"? I will only have a directory full of HTTP-blahblah names, correct? That was where I was trying to go. Perhaps I misunderstood your response and you already answered me?

# Look at the extracted files.
$ ls ./extract_files
  extract-HTTP-FsRNbD323oiMhWA761

# Look at the line in files.log that maps to that file.
$ grep extract-HTTP-FsRNbD323oiMhWA761 files.log
1407384770.727269 FsRNbD323oiMhWA761 1.2.3.4 5.6.7.8 CuTpVT1LB2eQP0eMP4 HTTP 0 EXTRACT application/x-dosexec - 1.151308 49152 49152 0 0 F - - - - extract-HTTP-FsRNbD323oiMhWA761

# Look for the HTTP request that maps to that file.
$ grep FsRNbD323oiMhWA761 http.log
1407384770.568614 CuTpVT1LB2eQP0eMP4 5.6.7.8 1066 1.2.3.4 80 1 GET 1.2.3.4 /lprx.php - - 0 49152 200 OK - - - (empty) - - - - - FsRNbD323oiMhWA761 application/x-dosexec

You can see in that example that the best file name we could have possibly hoped to extract for that connection would be "lprx.php" which I don't think is what you want. That is real traffic (with modified field data) from a compromised host downloading an update to the malware installed on it.

> If so, apologies, but I still seem to be missing the connection between the bro-created file name when it's carved and the actual filename of the exe that the user attempted to download.

Ah, ok. I can explain a bit more here. Before arriving at the current model, I spent a lot of time thinking about how to flexibly name files. What I realized is that I don't want aspects of the network traffic to be able to affect the name of the file being written to disk (by default at least, you can do whatever you want in your own scripts). There could be maliciously named files or attempts to play with the path to write into sensitive areas of the file system. By giving the files being written to disk names that were totally fabricated by the Bro process we sidestep any of these potential issues. You can use the name of the extracted file to then pivot back into the logs.
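That pivot could be scripted roughly as follows. This is only a sketch: it assumes the default extract-<source>-<file ID> naming seen in the example above, and that http.log has a "#fields" header with columns named "host", "uri" and "resp_fuids" (column names not shown in this thread, so check them against your own logs). The function name http_request_for is made up.

```shell
# Given an extracted file name such as extract-HTTP-FsRNbD323oiMhWA761,
# print "<host><TAB><uri>" of the HTTP request whose response carried it.
#   $1: extracted file name (default naming assumed)
#   $2: path to http.log
http_request_for() {
    # The file ID is whatever follows the last "-" in the name.
    fuid=${1##*-}
    awk -F'\t' -v fuid="$fuid" '
        /^#fields/ { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
        /^#/ { next }
        ("resp_fuids" in col) {
            # resp_fuids may hold several comma-separated file IDs.
            n = split($(col["resp_fuids"]), ids, ",")
            for (j = 1; j <= n; j++)
                if (ids[j] == fuid)
                    print $(col["host"]) "\t" $(col["uri"])
        }
    ' "$2"
}

# Usage: http_request_for extract-HTTP-FsRNbD323oiMhWA761 http.log
```

Matching on the exact file ID rather than grepping the whole line avoids false hits where the ID happens to appear in another field.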

  .Seth

This is awesome information, I’ve been busy “playing” with my extract.bro and scripts to see what kind of process I can come up with. Thanks again for all the assistance!!