Extract files not authentic copy of file

Hello,

The configuration is extracting certain file types, but the extracted files are not authentic replicas of the files in the stream. The hashes do not match the real files at the user’s endpoint. Upon inspecting the extracted files, there seem to be mismatched and duplicated streams.

How can this be corrected? I would like the extracted files to be exactly what the user would download.

Thank you kindly for your help.

Ambros

Hello,

When I load the arp_main script (https://gist.github.com/grigorescu/a28b814a8fb626e2a7b4715d278198aa) in local.bro, Bro logs only ARP traffic and nothing else.
I only get these logs:
stdout
stderr
stats
notice
arp

When I don’t load this ARP script, Bro logs all traffic normally.
Do you know why?

Thanks in advance

Nicolas.

Removing this line should fix things:

redef capture_filters += { ["arp"] = "arp" };

Thanks, Justin. I updated the gist (which is just hosting a copy of the
script found in the mailing list) to remove that line.

It's been on my todo list to turn that into a Bro package.

  --Vlad

"Azoff, Justin S" <jazoff@illinois.edu> writes:

Great! Thank you very much, it works.

Nicolas.

Are you having any trouble with dropped packets? If you are dropping a lot of packets, it's possible for your extracted files to be problematic.

   .Seth

Along with that, another possibility is that the host does some transformation
before storing the file. What types of files are these?

    Vern

Thank you Seth and Vern.

I’m unsure whether any packets are being dropped. How would I check if packets are being dropped?

Would dropped packets also cause duplicated streams? I’m seeing the same text repeated anywhere from 2-4 times in extracted files.

I’m looking at PDF, EXE, and various MS Office files.

I’m unsure whether any packets are being dropped. How would I check if packets are being dropped?

One heuristic you can use is the capture_loss.log. It will give an estimated percentage of dropped packets based on TCP analysis.
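
As a rough illustration of reading that log outside of Bro, here is a minimal Python sketch. It assumes the log is in Bro's default tab-separated format with a "#fields" header line and a percent_lost column, and that the capture-loss script is loaded so the log exists at all. The function names are made up for this example.

```python
def read_capture_loss(lines):
    """Parse capture_loss.log lines (default TSV format) into dicts keyed by field name."""
    fields = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header looks like: "#fields<TAB>ts<TAB>ts_delta<TAB>peer<TAB>..."
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def worst_loss(rows):
    """Return the highest percent_lost value seen, or 0.0 if there are no rows."""
    return max((float(r["percent_lost"]) for r in rows), default=0.0)


# Usage (the path is an assumption; adjust it to your log directory):
#   with open("capture_loss.log") as f:
#       rows = read_capture_loss(f)
#   print("worst interval: %.2f%% estimated loss" % worst_loss(rows))
```

A consistently non-zero percent_lost would suggest that extracted files may have gaps. A value near zero does not prove that nothing was dropped, since it is only an estimate.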

Would dropped packets also cause duplicated streams? I’m seeing the same text repeated anywhere from 2-4 times in extracted files.

That seems unlikely to me. The way that the file extraction analyzer and the files framework work should prevent this sort of behavior.
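
To confirm whether an extracted file really contains duplicated data, one option is to look for byte runs that occur more than once. The sketch below is a generic Python helper, not part of Bro; the function name and the 64-byte window are arbitrary choices for illustration. It keeps every window in memory, so it is only suitable for small files, and some formats repeat content legitimately (padding, shared resources in Office files), so treat hits as a hint rather than proof.

```python
def find_repeated_runs(data, window=64):
    """Return (first_offset, repeat_offset) pairs for window-sized byte runs
    that reappear later in data without overlapping their first occurrence."""
    first_seen = {}  # window bytes -> offset where they were first seen
    repeats = []
    i = 0
    last = len(data) - window
    while i <= last:
        chunk = data[i:i + window]
        j = first_seen.get(chunk)
        if j is None:
            first_seen[chunk] = i
            i += 1
        elif i - j >= window:
            # Non-overlapping repeat: record it and skip past this window.
            repeats.append((j, i))
            i += window
        else:
            # Overlaps its own first occurrence (e.g. a run of one byte); ignore.
            i += 1
    return repeats


# Usage (the file name is hypothetical):
#   with open("extracted_file.bin", "rb") as f:
#       for first, again in find_repeated_runs(f.read()):
#           print("bytes at offset %d repeat at offset %d" % (first, again))
```

Comparing the reported offsets against the original file from the endpoint should show whether the repeats exist in the source or only in the extracted copy.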

   .Seth