Bytes in conn.log are way too large

Hi,

I have a problem with the byte counts in conn.log: they are
way too large [1]. My Bro is running offline on traces.

To get rid of this issue, I tried to use large-conns.bro, but it looks
like large-conns.bro has a problem when reading a trace from stdin. I
tried it with bro-1.3.2 and with a current branch from Robin.

I get the following error when reading from stdin:
/home/bro-conn-log/bin/bro: problem with trace file - - bad dump file format

replay:/data/gregor/blub# /home/bro-conn-log/bin/bro -r - gm_conn < /data/pcap/slice-0000.cr.pcap
/home/bro-conn-log/bin/bro: problem with trace file - - bad dump file format

replay:/data/gregor/blub# cat /data/pcap/slice-0000.cr.pcap | /home/bro-conn-log/bin/bro -r - gm_conn
/home/bro-conn-log/bin/bro: problem with trace file - - bad dump file format

But reading the file directly works:
replay:/data/gregor/blub# /home/bro-conn-log/bin/bro -r /data/pcap/slice-0000.cr.pcap gm_conn weird
..... this works.

Since my traces consist of several slices, I really do want to
read from stdin.

[1]
I had this problem with two different traces. The first is only
uni-directional, i.e., Bro sees only one side of the connection. The
traces contained 50GB of IP data. The sum of the bytes from conn.log was
approx. 2TB(!), however. I then checked for particularly large
connections (i.e., > 1GB). All of them had a state with RSTs, and judging
from the duration, the number of bytes was clearly bogus.

I also tried it on a second trace. This one had both directions. 600GB
IP data and conn.log reported 1.9TB. The total # of connections in the
trace is 29M. Of these 62.000 are larger than 1GB. Of these large
flows only XXX were terminated without RSTs.

[2]
# cat gm_conn.bro:
const number_of_regions = 64;
const regin_size = 32 * 1024;
@load large-conns

#@load dpd
@load conn

redef Scan::suppress_scan_checks = T;
redef ignore_checksums = T;

redef dpd_conn_logs = F;

# If we see only one side of a conn, we must reduce these
# timers
redef tcp_inactivity_timeout = 60 secs;
redef udp_inactivity_timeout = 45 secs;
redef icmp_inactivity_timeout = 30 secs;

--
Gregor Maier gregor@net.t-labs.tu-berlin.de
TU Berlin / Deutsche Telekom Labs gregor.maier@tu-berlin.de
Sekr. TEL 4, FG INET www.net.t-labs.tu-berlin.de
Ernst-Reuter-Platz 7
10587 Berlin, Germany

I also tried it on a second trace. This one had both directions. 600GB
IP data and conn.log reported 1.9TB. The total # of connections in the
trace is 29M. Of these 62.000 are larger than 1GB. Of these large
flows only XXX were terminated without RSTs.

Sorry, I sent the mail too early:

Of the 29M connections, 628 are > 1GB, and of those 487 are terminated
with a RST.
Furthermore, a lot of these large connections had very short durations
(<<1sec) and had "traffic" in only one direction.

What about adding some sanity checks, so that the byte values are
meaningful even if not using large-conns.bro? Otherwise one cannot rely
on the byte values in conn.log at all.
Maybe such checks could be:
* a "maximum bandwidth" a connection must not exceed
* require that bytes/packets are seen in both directions

cu
gregor

To get rid of this issue, I tried to use large-conns.bro, but it looks
like large-conns.bro has a problem when reading a trace from stdin.

Hmmm, indeed it does. It's because the secondary filter needs to reopen
the packet source, and in this case a second open of stdin gets in trouble
because both filters share the same kernel file descriptor.

It works if you instead use -r filename.
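
The effect described above can be illustrated outside Bro. The following is only a minimal sketch in Python, not Bro's actual packet-source code: the function name read_twice_from_pipe and the toy payload are made up for illustration. It shows that a second handle on the same pipe does not start again at the file header, which would be consistent with the "bad dump file format" error.

```python
import os


def read_twice_from_pipe(payload: bytes, header_len: int):
    """Read a 'header' from a pipe, then try to read it again via a
    duplicated descriptor (hypothetical stand-in for a second open)."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, payload)
    os.close(write_fd)

    # First reader consumes the header bytes from the pipe.
    first = os.read(read_fd, header_len)

    # A duplicated descriptor refers to the same underlying pipe, so it
    # continues where the first reader stopped instead of at the start.
    second_fd = os.dup(read_fd)
    second = os.read(second_fd, header_len)

    os.close(read_fd)
    os.close(second_fd)
    return first, second


if __name__ == "__main__":
    first, second = read_twice_from_pipe(b"HEADERpacket-data", 6)
    print(first, second)
```

With a regular file opened by name, each open gets its own offset, so a second reader does see the header again. That matches the observation that -r filename works.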

Since my traces consist of several slices, I really do want to
read from stdin.

Note, you can use "ipsumdump --collate -w whole-shebang.trace *.trace" to
glue together multiple pcap files into a single coherent trace.

    Vern

What about adding some sanity checks, so that the byte values are
meaningful even if not using large-conns.bro? Otherwise one cannot rely
on the byte values in conn.log at all.
Maybe such checks could be:
* a "maximum bandwidth" a connection must not exceed
* require that bytes/packets are seen in both directions

These are reasonable features to add, but I don't think we'll give them much
priority ourselves. (I.e., if you want to contribute it, we'll integrate it.)

    Vern

Yeah, various post-processing scripts are using such heuristics to
avoid the problem when analyzing connection logs.

Robin
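
The two sanity checks proposed in this thread (a maximum bandwidth and traffic in both directions) could be applied in post-processing roughly as follows. This is a minimal sketch under stated assumptions, not one of the scripts mentioned above: the ConnRecord type, the is_plausible function, the 1 Gbit/s default and the burst allowance are all hypothetical, and parsing conn.log into these fields is left out because the column layout depends on the Bro version.

```python
from dataclasses import dataclass


@dataclass
class ConnRecord:
    """Hypothetical, already-parsed view of one conn.log entry."""
    duration: float   # connection duration in seconds
    orig_bytes: int   # bytes reported for the originator side
    resp_bytes: int   # bytes reported for the responder side


def is_plausible(rec: ConnRecord,
                 max_bytes_per_sec: float = 125_000_000.0,  # assumed 1 Gbit/s link
                 burst_allowance: int = 1_000_000,          # assumed slack for tiny durations
                 require_both_directions: bool = False) -> bool:
    """Return True if the byte counts pass the two heuristic checks."""
    # Check 2: optionally require traffic in both directions.
    # Leave this off for uni-directional traces.
    if require_both_directions and (rec.orig_bytes == 0 or rec.resp_bytes == 0):
        return False

    # Check 1: neither direction may exceed what the link could carry
    # in the reported duration, plus a fixed allowance.
    limit = max_bytes_per_sec * max(rec.duration, 0.0) + burst_allowance
    return max(rec.orig_bytes, rec.resp_bytes) <= limit


if __name__ == "__main__":
    bogus = ConnRecord(duration=0.5, orig_bytes=2_000_000_000, resp_bytes=0)
    normal = ConnRecord(duration=10.0, orig_bytes=5_000_000, resp_bytes=100_000)
    print(is_plausible(bogus), is_plausible(normal))
```

The allowance is added rather than dividing by the duration so that short connections with zero or near-zero duration and only a few bytes are not flagged. Both thresholds would need tuning to the monitored link.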