HTTP analyzer and de-obfuscating the payload

While writing a few policies to track an extremely basic malware
"protocol" that sits on top of HTTP, I ran into a few questions that I
haven't been able to find answers for.

1. Are binpac analyzers preferred over the hand-written ones? From what
I can tell (and I may be wrong), the HTTP binpac analyzer does not generate
an http_entity_data event, so using http-extract-items is not possible. Is
it possible to extract HTTP items using the binpac analyzer, or am I
better off sticking with the hand-written one?

2. When processing events, e.g. http_message_done, is it possible to
access the entire assembled stream without writing it to disk first? I
have some malware traffic that I would like to analyze with Bro, but the
data is obfuscated within the HTTP body using layers of XOR,
compression, and encryption. Ideally, I would use Bro to
de-obfuscate the streams and provide additional info in the log files
instead of running Python scripts after Bro. I have no problem
writing the BiFs (I've already created an XOR one), but I want to make
sure the data is available before I write them.

3. Along the same lines as #2, is the assembled stream available for
connections that are not http?

Any help is appreciated. Thanks in advance.

> Is it possible to extract http items using the binpac analyzer or am I
> better off sticking with the hand-written one?

Binpac analyzers are preferred when writing new analyzers, but some of the binpac analyzers are not at feature parity with their handwritten counterparts (HTTP is the primary problem in this regard). For now, I recommend not using the --enable-binpac flag when doing HTTP analysis.

> 2. When processing events, e.g. http_message_done, is it possible to
> access the entire assembled stream without writing it to disk first?

No. Generally, when doing stream analysis with Bro you have two options. The best, if your analysis method allows it, is to do the analysis in a streaming fashion, processing chunks of data as they become available. If your analysis method needs random access to the data, then you are probably best off writing it to disk and kicking off an external process (from within Bro) once the stream is complete and the file is closed. The output of that analysis could then be fed back into Bro using Broccoli.
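A minimal Python sketch of the streaming style, applied to the kind of layering described in question 2. It assumes, purely for illustration, a repeating-key XOR applied over zlib-compressed data; the key, the layer order, and the class name are hypothetical, and none of this is Bro API. The point is that the state (the XOR key position and the decompressor) has to be carried from one chunk to the next, so no single chunk needs the rest of the stream.

```python
import zlib


class StreamingDecoder:
    """Undo a (hypothetical) XOR-over-zlib encoding one chunk at a time."""

    def __init__(self, key: bytes):
        if not key:
            raise ValueError("key must not be empty")
        self.key = key
        self.offset = 0                       # bytes seen so far; aligns the XOR key
        self.inflater = zlib.decompressobj()  # keeps zlib state between chunks

    def feed(self, chunk: bytes) -> bytes:
        """Decode one chunk and return whatever plaintext is ready so far."""
        klen = len(self.key)
        unxored = bytes(b ^ self.key[(self.offset + i) % klen]
                        for i, b in enumerate(chunk))
        self.offset += len(chunk)
        return self.inflater.decompress(unxored)

    def finish(self) -> bytes:
        """Flush any plaintext still buffered inside the decompressor."""
        return self.inflater.flush()


# Usage: encode a sample payload, then decode it in 7-byte pieces
# (deliberately not a multiple of the 3-byte key length).
key = b"\x13\x37\x42"
payload = b"beacon " * 200
compressed = zlib.compress(payload)
wire = bytes(b ^ key[i % len(key)] for i, b in enumerate(compressed))

dec = StreamingDecoder(key)
out = b"".join(dec.feed(wire[i:i + 7]) for i in range(0, len(wire), 7))
out += dec.finish()
print(out == payload)  # True
```

A real encryption layer would need the same treatment: a cipher that can be driven incrementally, with its state kept per stream.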

You typically don't want to try storing large streams in memory because it would be far too easy to use all available memory and crash Bro. Of course, if you are running Bro on tracefiles instead of live network interfaces that may not be a concern.
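If you do buffer in memory (more reasonable for offline tracefile work), two habits help: a hard cap on each stream so one huge transfer can't exhaust memory, and dropping all per-connection state when the connection ends. Here is a minimal Python sketch of that bookkeeping. The class, its method names, the connection-id format, and the truncate-at-cap policy are all hypothetical; this is not a Bro API.

```python
from collections import defaultdict


class BoundedStreamStore:
    """Buffer per-connection stream data in memory with a per-direction cap."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.chunks = defaultdict(list)   # (conn_id, is_orig) -> list of chunks
        self.sizes = defaultdict(int)     # (conn_id, is_orig) -> bytes kept
        self.truncated = set()            # directions that hit the cap

    def add(self, conn_id, is_orig: bool, data: bytes) -> None:
        """Append data for one direction, dropping anything past the cap."""
        key = (conn_id, is_orig)
        room = self.max_bytes - self.sizes[key]
        if len(data) > room:
            self.truncated.add(key)
            data = data[:max(room, 0)]
        if data:
            self.chunks[key].append(data)
            self.sizes[key] += len(data)

    def finish(self, conn_id):
        """Return both directions as (data, truncated) and free their memory."""
        result = {}
        for is_orig in (True, False):
            key = (conn_id, is_orig)
            data = b"".join(self.chunks.pop(key, []))
            result[is_orig] = (data, key in self.truncated)
            self.sizes.pop(key, None)
            self.truncated.discard(key)
        return result


store = BoundedStreamStore(max_bytes=8)
store.add("conn-1", True, b"hello ")
store.add("conn-1", True, b"world")   # only 2 bytes fit under the cap
store.add("conn-1", False, b"ok")
res = store.finish("conn-1")
print(res[True])   # (b'hello wo', True)
print(res[False])  # (b'ok', False)
```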

> 3. Along the same lines as #2, is the assembled stream available for
> connections that are not http?

It depends on the protocol and the analyzer. If you search through the event.bif.bro file for "_data", you will find the analyzer events that are likely delivering a stream of data. The analyzers that currently have _data events are HTTP, SMTP, POP3, and MIME. Unfortunately, some of the other obvious ones, like SMB and NFS, don't have _data events yet. We accept patches, though, if you'd like to add support for that. :)

Is there a protocol or set of protocols in particular that you'd like to see supported with _data events?

  .Seth

> 2. When processing events, e.g. http_message_done, is it possible to
> access the entire assembled stream without writing it to disk first?
>
> No. Generally, when doing stream analysis with Bro you have two options. The best, if your analysis method allows it, is to do the analysis in a streaming fashion, processing chunks of data as they become available. If your analysis method needs random access to the data, then you are probably best off writing it to disk and kicking off an external process (from within Bro) once the stream is complete and the file is closed. The output of that analysis could then be fed back into Bro using Broccoli.

I didn't think of using Broccoli to feed it back into the system. I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit to having additional information in Bro's log files.

> You typically don't want to try storing large streams in memory because it would be far too easy to use all available memory and crash Bro. Of course, if you are running Bro on tracefiles instead of live network interfaces that may not be a concern.

All the analysis that I have been doing (and will be doing) is with tracefiles on a machine that is not connected to a network. I figured there was a chance I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was overthinking it.

> 3. Along the same lines as #2, is the assembled stream available for
> connections that are not http?
>
> It depends on the protocol and the analyzer. If you search through the event.bif.bro file for "_data", you will find the analyzer events that are likely delivering a stream of data. The analyzers that currently have _data events are HTTP, SMTP, POP3, and MIME. Unfortunately, some of the other obvious ones, like SMB and NFS, don't have _data events yet. We accept patches, though, if you'd like to add support for that. :)

I figured that you would accept patches. It has been a while since I've used C++, but I'm hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how Bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually.

> Is there a protocol or set of protocols in particular that you'd like to see supported with _data events?

I haven't seen anything yet, but I'm sure that I'll come across something eventually.

Thanks for all the help.

> I didn't think of using Broccoli to feed it back into the system. I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit to having additional information in Bro's log files.

It's especially useful when you're running Bro on a live network, because the information gained from the external analysis could feed back into Bro and change its behavior if the same thing is seen again. As a personal exercise, I'm going to start including concrete examples when I talk about techniques in Bro. :) So, here's my concrete example...

Bro identifies a Windows executable being downloaded over HTTP, so it begins calculating an MD5 sum of the bytes being transferred. It could also save the file to disk. When the transfer is done, the on-disk filename could be sent to an external process, which grabs the file, does something like run it through VirusTotal, and returns the result of that scan to Bro. If the file is determined to be malicious, an alarm could be raised about the initial transfer and the MD5 sum could be added to a set of malicious MD5 sums. The URL of the file could also be added to a set of URLs. In the future, if any host downloads a file with that MD5 sum or from the same URL, an alarm would be raised automatically without waiting for the external analysis. This full scenario is not currently implemented in Bro, but things are lining up to make this sort of analysis possible.
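The caching part of that scenario (check the known-bad sets first, and only wait on the external scan for something new) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the scanner is a caller-supplied stand-in, so no real VirusTotal API is called, and the class and method names are hypothetical, not Bro code.

```python
import hashlib


class KnownBadTracker:
    """Remember MD5 sums and URLs that an external scan flagged as malicious."""

    def __init__(self, scanner):
        self.scanner = scanner   # hypothetical callable: bytes -> bool (True = malicious)
        self.bad_md5s = set()
        self.bad_urls = set()

    def check(self, url: str, body: bytes) -> str:
        digest = hashlib.md5(body).hexdigest()
        # Fast path: alarm immediately on anything already known to be bad.
        if digest in self.bad_md5s or url in self.bad_urls:
            return "alarm (known bad)"
        # Slow path: ask the external analysis, then remember the verdict.
        if self.scanner(body):
            self.bad_md5s.add(digest)
            self.bad_urls.add(url)
            return "alarm (scan result)"
        return "clean"


scans = []

def fake_scanner(body: bytes) -> bool:
    """Stand-in for an external scan; flags anything starting with 'MZ'."""
    scans.append(body)
    return body.startswith(b"MZ")

tracker = KnownBadTracker(fake_scanner)
r1 = tracker.check("http://example.com/a.exe", b"MZ\x90\x00payload")    # scanned
r2 = tracker.check("http://example.org/copy.exe", b"MZ\x90\x00payload") # same MD5
r3 = tracker.check("http://example.com/a.exe", b"MZ other bytes")       # same URL
print(r1)          # alarm (scan result)
print(r2)          # alarm (known bad)
print(r3)          # alarm (known bad)
print(len(scans))  # 1
```

Only the first download reaches the (slow) scanner. The later ones match on MD5 or URL and alarm right away, which is the behavior change the scenario describes.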

If you have ideas for analysis scenarios that you'd like to see implemented, I'd really like to hear them!

> All the analysis that I have been doing (and will be doing) is with tracefiles on a machine that is not connected to a network. I figured there was a chance I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was overthinking it.

You could either keep a table of strings or concatenate the strings together as new data comes in. I'll include some examples here.

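Using these inputs...

    global a = "first string";
    global b = "second string";
    global output = "";

You can do this...

    # Keep each piece in a table indexed 1, 2, 3, ... and join them once at the end.
    global stuff: string_array = table();
    stuff[|stuff|+1] = a;
    stuff[|stuff|+1] = b;
    output = cat_string_array(stuff);

Or this...

    # Concatenate directly; each call builds a new string.
    output = string_cat(a, b);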

> I figured that you would accept patches. It has been a while since I've used C++, but I'm hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how Bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually.

Bro currently doesn't have any support for RDP, but I think that a lot of the support for SSL decryption is already in place. I haven't ever done it, though, so I don't know whether it is complete and working.

  .Seth

> 3. Along the same lines as #2, is the assembled stream available for
> connections that are not http?

> It depends on the protocol and the analyzer.

Note, there are also generic tcp_contents() and udp_contents() events.
They likewise return the stream piecemeal.
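Because these events deliver the stream in pieces, a byte pattern you are looking for (say, a marker in a malware protocol) can be split across two deliveries. Here is a minimal Python sketch of one way to handle that without buffering the whole stream: carry the last len(marker) - 1 bytes into the next search. The function name, and the idea of passing it the successive chunk contents, are illustrative assumptions, not Bro API.

```python
def find_marker(chunks, marker: bytes):
    """Yield absolute stream offsets of `marker` in data delivered piecemeal."""
    if not marker:
        raise ValueError("marker must not be empty")
    tail = b""      # last len(marker) - 1 bytes of the stream seen so far
    consumed = 0    # stream bytes delivered before the current chunk
    for chunk in chunks:
        window = tail + chunk
        base = consumed - len(tail)   # absolute offset of window[0]
        start = 0
        while True:
            idx = window.find(marker, start)
            if idx == -1:
                break
            yield base + idx
            start = idx + 1
        # Keep only bytes that could start a marker finished by the next chunk.
        # The tail is shorter than the marker, so no match is reported twice.
        keep = min(len(marker) - 1, len(window))
        tail = window[len(window) - keep:]
        consumed += len(chunk)


# "EVIL" is split across the first two chunks and the last two chunks.
print(list(find_marker([b"xxEV", b"IL-yyE", b"VIL"], b"EVIL")))  # [2, 9]
```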

    Vern

> Or this...
> output = string_cat(a, b);

One caveat is that the string_cat approach is essentially O(N^2) in the
size of the reassembled stream, because it winds up repeatedly copying the
entire string. Ideally we'd fix this under the hood, one fine day ...
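To put a rough number on that caveat, here is a simple cost model in Python. It assumes that each concatenation copies the whole accumulated string plus the new chunk into a fresh string, and that collecting chunks and joining them once copies each byte once. The function names are hypothetical, and the model counts copied bytes only; it says nothing about Bro's actual allocator.

```python
def bytes_copied_by_concat(sizes):
    """Bytes copied when each chunk is appended via a full-string copy.

    Models output = string_cat(output, chunk): every step copies the whole
    result so far plus the new chunk, so the total grows quadratically.
    """
    total = 0
    length = 0
    for n in sizes:
        length += n
        total += length   # the new string is built from scratch each time
    return total


def bytes_copied_by_join(sizes):
    """Bytes copied when chunks are kept separately and joined once."""
    return sum(sizes)


chunks = [1000] * 100   # a 100 KB stream delivered in 1 KB pieces
print(bytes_copied_by_concat(chunks))  # 5050000
print(bytes_copied_by_join(chunks))    # 100000
```

For that stream, repeated concatenation copies about 50 times as many bytes as a single join, and the ratio keeps growing with the number of pieces.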

    Vern