I am currently investigating an issue with HTTP file extraction using the file analyzer: very frequently I see missing_bytes in files.log, which leaves the file incomplete, so Bro neither extracts the file nor generates a hash.
I am running Bro in a virtual machine, sniffing in promiscuous mode on an interface that is attached to a virtual switch.
After examining a number of packet captures, I tracked the problem down: missing_bytes shows up when Bro sees an ACK arrive out of order, before the packet it acknowledges.
It seems to me that no TCP reassembly is being done for the HTTP analyzer (or the file analyzer?): Bro’s documentation indicates that reassembled TCP payloads are only delivered via the tcp_contents event.
Does anyone have any information on how to make this work? Is it a configuration problem or…
I’d appreciate any tips you may have. Thanks!
Take a look at capture_loss.log to see if you are in fact not seeing complete connections.
Missed bytes is telling you that there may be a problem in the acquisition of packets. Have you verified with a packet capture in Wireshark that you can reassemble the connection to get a complete file?
I would also create a clean pcap of the file transfer, test that you are getting hits on the hash, and then figure out the issue with the packet acquisition. Sometimes you have to disable checksum verification on the NIC to get things working.
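When replaying a clean pcap, one quick way to see which transfers still come out incomplete is to scan files.log for entries with a non-zero missing_bytes. The following is only a minimal sketch: it assumes the default tab-separated ASCII log format with a `#fields` header line, and the helper names (`read_bro_log`, `incomplete_files`) and the trimmed-down sample log are made up for illustration (a real files.log has many more columns).

```python
def read_bro_log(text):
    """Parse a tab-separated Bro log into a list of dicts keyed by field name."""
    fields = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The header line names the columns, separated by tabs.
            fields = line.split("\t")[1:]
        elif line.startswith("#") or not line:
            # Skip other metadata lines (#separator, #types, ...) and blanks.
            continue
        elif fields is not None:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def incomplete_files(rows):
    """Return the rows whose missing_bytes value is greater than zero."""
    bad = []
    for row in rows:
        value = row.get("missing_bytes", "0")
        # "-" is the placeholder Bro writes for an unset field.
        if value not in ("-", "") and int(value) > 0:
            bad.append(row)
    return bad


# Hypothetical, heavily trimmed sample; not taken from a real capture.
SAMPLE = "\n".join([
    "#separator \\x09",
    "#fields\tfuid\tsource\tseen_bytes\tmissing_bytes",
    "FaBcD1\tHTTP\t20480\t0",
    "FaBcD2\tHTTP\t18000\t2480",
])

if __name__ == "__main__":
    for row in incomplete_files(read_bro_log(SAMPLE)):
        print(row["fuid"], "is missing", row["missing_bytes"], "bytes")
```

To use it on a real log, read the file contents (for example with `open("files.log").read()`) and pass them to `read_bro_log` in place of `SAMPLE`.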
Thanks for the reply. I understand from the documentation that missing_bytes is supposed to indicate missing packets. I researched that problem previously and ended up disabling the interface’s TCP optimization options, including checksum, as shown in another Bro-related thread. Disabling them worked: I no longer see any missing packets when I capture on the virtual machine’s interface.
However, the problem here seems different to me. According to the packet capture, all the packets do arrive. The difference is that the ACK arrives before the packets it acknowledges. Wireshark flags the ACK as acknowledging an unseen segment, and the acknowledged packets show up immediately afterwards (Wireshark marks those as retransmissions).
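To make the ordering concrete, here is a rough sketch of how such "ACK before data" packets could be picked out of a trace. It is not how Bro or Wireshark do it internally. It assumes the packets of a single connection have already been reduced to relative sequence numbers in capture order, and it ignores the sequence numbers consumed by SYN and FIN; the function name and the dict layout are made up for illustration.

```python
def find_early_acks(packets):
    """Return indices of packets whose ACK covers bytes not yet seen.

    Each packet is a dict with keys:
      dir    - "c2s" (client to server) or "s2c" (server to client)
      seq    - relative sequence number of the first payload byte
      length - payload length in bytes
      ack    - relative acknowledgement number
    """
    # Highest (seq + length) observed so far in each direction.
    highest_seen = {"c2s": 0, "s2c": 0}
    early = []
    for index, pkt in enumerate(packets):
        other = "s2c" if pkt["dir"] == "c2s" else "c2s"
        # An ACK beyond what the sniffer has seen from the other side
        # means the acknowledged data has not arrived at the sniffer yet.
        if pkt["ack"] > highest_seen[other]:
            early.append(index)
        end = pkt["seq"] + pkt["length"]
        if end > highest_seen[pkt["dir"]]:
            highest_seen[pkt["dir"]] = end
    return early


# Hypothetical trace: the client's ACK for 1460 bytes is captured
# before the server segment that carries those bytes.
TRACE = [
    {"dir": "c2s", "seq": 0, "length": 100, "ack": 0},     # HTTP request
    {"dir": "c2s", "seq": 100, "length": 0, "ack": 1460},  # ACK seen first
    {"dir": "s2c", "seq": 0, "length": 1460, "ack": 100},  # data seen after
    {"dir": "c2s", "seq": 100, "length": 0, "ack": 1460},  # ACK in order
]

if __name__ == "__main__":
    print(find_early_acks(TRACE))
```

In this made-up trace only the second packet (index 1) is flagged, since its ACK number is ahead of everything captured from the server at that point.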
I have an HTTP capture, linked below, that shows this sequence.