File Analysis Inconsistencies

I crafted a custom file analysis plugin that attaches to specific MIME types via file_sniff and fires an appropriate event once processing has been completed.

I had to jump through a few hoops to make a file analysis plugin first, but those were cleared, and everything runs and loads appropriately (verified with `bro -NN`). My test regime is very straightforward: I have several PCAPs cooked up that contain simple HTTP file GETs (the files otherwise extract properly and do not exhibit missing_bytes), and I am running them via `bro -C -r <>.pcap`. My issue is that execution is utterly inconsistent: with zero changes between runs, it is effectively a coin flip.

My plugin has a secondary verification step that checks that the data passed to it is appropriate. When I dump the buffers being processed, a correct execution clearly has the proper data in it. An invalid execution, with nothing changed other than running it again, shows a buffer of what appears to be completely random data. This is confusing, since the MIME type fires correctly, which seems to indicate a bug somewhere in the data path.

I currently cannot supply the file analysis plugin for inspection, but I would very much appreciate insight into how to find the root cause. It very much seems to be upstream: if I run the analysis portion of the plugin as a freestanding executable outside of Bro against the data transferred via HTTP, everything works perfectly and the structures are filled accordingly.

I saw BIT-1832, and there could be similar root causes there, but I have not had time to investigate further. Again, the issues I am raising occur when replaying a PCAP from the command line, not even with “live” network traffic or tcpreplay over a NIC/dummy interface.

Aaron

> I crafted a custom file analysis plugin that attaches to specific MIME types via file_sniff and fires an appropriate event once processing has been completed.
>
> I had to jump through a few hoops to make a file analysis plugin first, but those were cleared, and everything runs and loads appropriately (verified with `bro -NN`). My test regime is very straightforward: I have several PCAPs cooked up that contain simple HTTP file GETs (the files otherwise extract properly and do not exhibit missing_bytes), and I am running them via `bro -C -r <>.pcap`. My issue is that execution is utterly inconsistent: with zero changes between runs, it is effectively a coin flip.
>
> My plugin has a secondary verification step that checks that the data passed to it is appropriate. When I dump the buffers being processed, a correct execution clearly has the proper data in it. An invalid execution, with nothing changed other than running it again, shows a buffer of what appears to be completely random data. This is confusing, since the MIME type fires correctly, which seems to indicate a bug somewhere in the data path.

That sounds a lot like an uninitialized buffer somewhere. I wonder whether compiling Bro and your plugin with -fsanitize=address would trigger something.
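One common way to end up with "random" bytes in a streaming analyzer is to keep the pointer handed to a per-chunk callback and read through it later, after the caller has freed or reused that memory; AddressSanitizer reports that kind of access. The sketch below is standalone C++, not the Bro plugin API: `ChunkAccumulator`, its `DeliverStream`/`EndOfFile` methods, and `SimulateReusedBuffer` are hypothetical names that only loosely mirror a streaming file analyzer. It copies each chunk into storage it owns, so the final analysis never depends on the caller's buffer outliving the call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a streaming file analyzer (not the Bro API).
class ChunkAccumulator {
public:
    // Called once per chunk. The caller owns `data` and may free or reuse
    // it as soon as this returns, so copy the bytes instead of keeping
    // the pointer around for later.
    bool DeliverStream(const unsigned char* data, std::uint64_t len)
    {
        assert(!finished_ && "chunk delivered after end of file");
        buffer_.insert(buffer_.end(), data, data + len);
        return true;
    }

    // Called once the whole file has been seen; any final analysis reads
    // only the owned copy in buffer_.
    bool EndOfFile()
    {
        finished_ = true;
        return true;
    }

    const std::vector<unsigned char>& Data() const { return buffer_; }

private:
    std::vector<unsigned char> buffer_;  // owned copy of every chunk so far
    bool finished_ = false;
};

// Feed `file` through one scratch buffer that is overwritten after every
// delivery, the way a caller reusing its buffers might, and return what
// the accumulator ends up holding.
std::string SimulateReusedBuffer(const std::string& file, std::size_t chunk)
{
    assert(chunk > 0);
    ChunkAccumulator acc;
    std::vector<unsigned char> scratch(chunk);
    for (std::size_t off = 0; off < file.size(); off += chunk) {
        std::size_t n = std::min(chunk, file.size() - off);
        std::copy(file.data() + off, file.data() + off + n, scratch.begin());
        acc.DeliverStream(scratch.data(), n);
        // Clobber the scratch buffer; the accumulator must not care.
        std::fill(scratch.begin(), scratch.end(), static_cast<unsigned char>(0xAA));
    }
    acc.EndOfFile();
    return std::string(acc.Data().begin(), acc.Data().end());
}
```

If `DeliverStream` stored `data` instead of copying it, the result would contain the 0xAA filler rather than the file, which is the same symptom in miniature.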

> I currently cannot supply the file analysis plugin for inspection, but I would very much appreciate insight into how to find the root cause. It very much seems to be upstream: if I run the analysis portion of the plugin as a freestanding executable outside of Bro against the data transferred via HTTP, everything works perfectly and the structures are filled accordingly.

If you are seeing what looks like random data in your plugin, you should be able to reproduce this behavior with a file analysis plugin that just dumps the buffers to stdout (as hex?). Can you rip out all the custom logic in your plugin, leaving something that just dumps the buffers as-is? That should leave you with the hello world of file analysis plugins. If that shows the problem, we should be able to figure out where it is coming from.
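A minimal sketch of the dumping half of such a hello-world plugin, in standalone C++: `HexDump` is a hypothetical helper, not part of Bro, and wiring it into a real analyzer's per-chunk callback is left out. It formats bytes as lowercase two-digit hex, 16 per line, separated by spaces, which makes dumps from two runs easy to diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format `len` bytes as lowercase two-digit hex, 16 bytes per line,
// separated by single spaces; every line ends with '\n'.
std::string HexDump(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::string out;
    char byte[3];  // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(byte, sizeof(byte), "%02x", static_cast<unsigned>(data[i]));
        out += byte;
        if (i + 1 == len || (i + 1) % 16 == 0)
            out += '\n';
        else
            out += ' ';
    }
    return out;
}
```

Printing `HexDump(data, len)` for every chunk received, and nothing else, keeps the plugin small enough that any difference between runs has to come from the data being handed to it.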

I don't think file analysis is inherently broken somewhere, otherwise the Bro test suite would fail. I think this would have to point to something unique about your plugin. I think you are the first person to build an out-of-tree file analysis plugin, so there may be an issue with the bro<->plugin interface for file analysis itself. If that is the case, extracting something like the built-in MD5 analysis plugin to an external plugin and calling it 'mymd5' would show the same problems.

> I saw BIT-1832, and there could be similar root causes there, but I have not had time to investigate further. Again, the issues I am raising occur when replaying a PCAP from the command line, not even with “live” network traffic or tcpreplay over a NIC/dummy interface.

That does sound similar, but I'm not sure if they were seeing different results on the same pcap on different runs.

Justin,

Indeed, breaking new ground is always interesting. As for the code:

https://github.com/aeppert/test_file_analyzer

The file I am using for this case:
https://www.bro.org/static/exchange-2013/faf-exercise.pcap

After building and installing the plugin, I run `bro -C -r faf-exercise.pcap`.

My suspicion is it’s either unbelievably trivial and I keep missing it because I am the only one staring at it, or it’s a rather deep rabbit hole.

Aaron

Yeah, it sounds worth checking for memory errors with a tool that can detect them. I think there’s also the ‘memory’ sanitizer (-fsanitize=memory) to detect uninitialized reads? Valgrind’s memcheck tool has also helped me a lot with such things; IIRC, something like `valgrind --leak-check=full --track-origins=yes ...`

- Jon

Thanks for putting that together; now I see what you mean. Building the plugin with ASAN confirms it is trying to access uninitialized memory:

$ /usr/local/bro/bin/bro -C -r faf-exercise.pcap
TEST::Finalize total_len = 65960
BUFFER
00 ea 09 00 50 61 00 00 80 eb 09 00 50 61 00 00
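Read as little-endian 64-bit words (an assumption: x86-64 is little-endian), the 16 bytes above decode to 0x000061500009ea00 and 0x000061500009eb80, which are 0x180 bytes apart. Values like that look more like pointers than file content, which fits a buffer that is not holding the file's bytes at all; this is an interpretation of one line of output, not a diagnosis. `LoadLE64` and `kDump` below are hypothetical names for a standalone check of the arithmetic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble 8 bytes as a little-endian unsigned 64-bit value
// (byte 0 is least significant), independent of host byte order.
std::uint64_t LoadLE64(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return v;
}

// The 16 bytes printed under BUFFER in the run above.
const unsigned char kDump[16] = {
    0x00, 0xea, 0x09, 0x00, 0x50, 0x61, 0x00, 0x00,
    0x80, 0xeb, 0x09, 0x00, 0x50, 0x61, 0x00, 0x00,
};
```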

Justin,

Thank you. I peeled the egg off my face and updated the GitHub code accordingly.

However, I have run into an additional interesting tidbit: whether I use event file_sniff to attach an analyzer or Files::register_for_mime_types, neither generates a files.log entry when I am not running a PCAP from the command line. So any kind of normal network processing, or playing a PCAP over a listening interface via tcpreplay, will not cause the analyzer to fire properly.

However, if I attach the EXTRACT analyzer, all processing goes as expected. What nuance could I be missing here? The plugin initializes essentially like the existing file analysis analyzers, except that it’s a plugin. Is there a hard and fast requirement that, in order for the file analysis framework to work properly, the file has to be explicitly extracted? I would assume most additional analysis would not use disk resources to extract files, but would instead observe what it needs and move on.

Any insight from anyone would be greatly appreciated.

Thank you,

Aaron