Ok thanks for the update.
I have tested the following two modules to extract files (the paths are the ones I have with SecurityOnion):
/opt/bro/share/bro/file-extraction/extract.bro, which gives me files in the /nsm/bro/extracted folder
/opt/bro/share/bro/policy/frameworks/files/extract-all-files.bro, which saves files in the /nsm/bro/spool/test-seconion-eth0-1/extract_files folder (the md5sum is also written to my files.log).
I encounter a very problematic issue:
When I download the winrar installer (.exe) I get it correctly extracted (md5sums match) in both output folders (via HTTP)
When I download Firefox installer (.exe) I get nothing (it’s via HTTPS so I suppose it’s the reason why)
When I download audacity (.exe) through HTTP, I get an incorrect .exe file. The original file has a size of 26.5 MB and what I collect in my “extract_files” folder has a size of 1.4 kB. Obviously the md5sums don’t match.
For the moment I can’t trust what I get with Bro since the md5sums don’t match. If I download malware, how can I be sure that I’ll get the complete file and be able to submit it to VT for an accurate analysis?
ps: I’ll try the scripts you sent me and hope the files will be extracted correctly
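For reference, the size and md5 comparison described above can be scripted so that every extracted file is checked the same way. This is only a minimal sketch: the function names `md5_of` and `same_file` are made up for illustration, and it assumes a trusted copy of the original download is available locally.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(reference, extracted):
    """True only if both files have the same size and the same MD5."""
    ref, ext = Path(reference), Path(extracted)
    # A truncated extraction (e.g. 1.4 kB instead of 26.5 MB) fails here
    # without having to hash anything.
    if ref.stat().st_size != ext.stat().st_size:
        return False
    return md5_of(ref) == md5_of(ext)
```

Calling `same_file` with the path of the original installer and the path of the file found in the extract_files folder returns False for a truncated extraction like the audacity one.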
It's very possible that you encountered packet loss. You can look at either the "missed_bytes" field in conn.log or the "missing_bytes" field in files.log. If either of those isn't zero, then you probably dropped packets.
Damn, now that I look at those field names, we unfortunately ended up naming them differently.
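A quick way to run that check over a whole log is a small script like the one below. It is a sketch, not an official tool: it assumes the logs are in Bro's default tab-separated format with a `#fields` header line, and the helper name `rows_with_gaps` is invented for this example.

```python
def rows_with_gaps(log_lines, field):
    """Return the log rows whose `field` holds a non-zero byte count.

    `log_lines` is any iterable of lines from a tab-separated Bro log
    (assumed default format, column names taken from the #fields line).
    """
    columns = None
    hits = []
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # "#fields<TAB>ts<TAB>uid..." -> keep only the column names
            columns = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or columns is None:
            continue  # skip other header/footer lines and blank lines
        row = dict(zip(columns, line.split("\t")))
        value = row.get(field, "-")  # "-" marks an unset field
        if value.isdigit() and int(value) > 0:
            hits.append(row)
    return hits
```

Opening conn.log and calling `rows_with_gaps(fh, "missed_bytes")`, or files.log with `"missing_bytes"`, lists the connections or files for which content was missed; an empty list means those fields give no sign of dropped packets.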
Ok so I tried something. I downloaded audacity, notepad++ and 7zip (over HTTP from filehippo rather than from the official sites, to make sure the downloads were plain HTTP).
I captured the download with Wireshark, and I found the PE in the pcap, even with bro -r extract_file.
When I just load the extract_file plugin and download my exe files, the extracted files are incomplete (they are much smaller than the real ones).
In addition to that, I suspected that it might have been caused by the -C option but even without this option, my bro -r pcapfile.pcap extract_file module could extract the whole executable.
In interactive mode though, I don’t extract the whole executable.
tldr: The live capture doesn’t extract the whole file but the bro -r pcapfile.pcap path/extract_file does work
Ok so I added the -C option to my brocfg and it works now. I was misled by my tests. My Bro installation does not check the checksums at all and I can capture all the files correctly.