Inconsistent file size during extraction

Hi all,

I’m seeing instances where the size of an extracted file doesn’t match what is reported in files.log. Here is a redacted example:


#fields ts fuid tx_hosts rx_hosts conn_uids source depth analyzers mime_type filename duration local_orig is_orig seen_bytes total_bytes missing_bytes overflow_bytes timedout parent_fuid md5 sha1 sha256 extracted extracted_cutoff extracted_size
#types time string set[addr] set[addr] set[string] string count set[string] string string interval bool bool count count count count bool string string string string string bool count
1517528771.042220 Fz2Z2m3zwQcc3gqDS3 x.x.x.x x.x.x.x CpaGD227W0Cy2BA1Tf HTTP 0 EXTRACT application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 0.258350 - F 219414 12977556 0 0 F - - - - extract-1517528771.04222-HTTP-Fz2Z2m3zwQcc3gqDS3 F -

File on disk:
219414 Feb 1 16:04 extract-1517528771.04222-HTTP-Fz2Z2m3zwQcc3gqDS3

The file on disk is the same size as the number of bytes sent to the file analyzer (the seen_bytes field), but it should be the same size as the total_bytes field. I’ve seen this happen many times (though, relatively speaking, it is rare).

Any thoughts on this behavior? I’m seeing this on Bro 2.5.1.


Seems that this particular connection may be affected by tapping issues.

Yep, I was going to comment that that’s probably the issue, but I’ll give some more details on why things may end up that way.

“total_bytes” - the size of the file when it is known through some secondary mechanism, such as the file size being transmitted as part of a protocol or a file being read off disk.
“seen_bytes” - represents the number of actual bytes of data that were passed into the file analysis framework.

This is another case where small packet loss issues can have outsized effects: once bytes are lost, the bytes that follow can’t be reassembled into the file correctly, and you don’t get any more data.
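
If you want to see how often this happens, here is a rough Python sketch that scans a files.log and lists the records where seen_bytes is smaller than total_bytes. It assumes the default tab-separated ASCII log format with a #fields header line; the function name find_incomplete_files and the sample records are made up for illustration and are not part of Bro.

```python
def find_incomplete_files(lines):
    """Return (fuid, seen_bytes, total_bytes) for every files.log record
    where fewer bytes reached file analysis than the protocol announced.

    Assumes the default tab-separated ASCII log format with a '#fields'
    header line (an assumption; adjust for JSON or custom separators).
    """
    fields = []
    incomplete = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names the columns, e.g. "#fields<TAB>ts<TAB>fuid..."
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            # Skip the other metadata lines (#separator, #types, #close, ...)
            continue
        record = dict(zip(fields, line.split("\t")))
        seen = record.get("seen_bytes", "-")
        total = record.get("total_bytes", "-")
        # "-" means the value is unset, e.g. the total size was never known
        if not seen.isdigit() or not total.isdigit():
            continue
        if int(seen) < int(total):
            incomplete.append((record.get("fuid"), int(seen), int(total)))
    return incomplete


if __name__ == "__main__":
    # Hypothetical, trimmed-down records for illustration only
    sample = [
        "#fields\tfuid\tseen_bytes\ttotal_bytes\n",
        "FexampleA\t219414\t12977556\n",
        "FexampleB\t4096\t4096\n",
    ]
    for fuid, seen, total in find_incomplete_files(sample):
        print(fuid, "saw", seen, "of", total, "bytes")
```

To run it against a real log, pass an open file object instead of the sample list, e.g. find_incomplete_files(open("files.log")). Any fuid it reports is a candidate for the kind of truncated extraction described above.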

Also, nice to see you on the mailing list again, Josh!


Yup, that clears up some things I forgot. And thanks, happy to be active again!