Differences in processing multiple traces with BRO and ipsumdump

Hi everyone,

I am puzzled about the outcomes of using ipsumdump or BRO for processing multiple pcap files.

I am using BRO to analyze anomalies in 12 hours of captured network traffic, which was saved in 4 GB pcap files. I want BRO to handle the cases where a connection has been split across two or more files. I was using ipsumdump to solve this, but I found that some files have errors and cause ipsumdump to crash with this message:

ToDump(bigPcap1.pcap): Inappropriate ioctl for device

Using the capinfos tool I found that some of my files contain a record whose packet size is larger than the maximum (65535 bytes), so I used tshark to cut off the damaged part of the file. For example:

capinfos: An error occurred after reading 3830659 packets from “trace2.pcap”: File contains a record that’s not valid.
(pcap: File has 4065648712-byte packet, bigger than maximum of 65535)

So I created a reduced version of trace2.pcap with tshark:
/usr/sbin/tshark -c 3830659 -r trace2.pcap -w trace2-new.pcap
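
The packet count passed to `tshark -c` above came from the capinfos error message. As a rough alternative, here is a minimal Python sketch that walks the record headers of a classic pcap file and reports how many records precede the first one whose captured length exceeds 65535. The function name `count_valid_records` is hypothetical, and the sketch assumes the classic pcap format with microsecond timestamps (not pcapng).

```python
import struct

# Magic number bytes of a classic pcap file, mapped to the struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # written on a little-endian machine
    b"\xa1\xb2\xc3\xd4": ">",  # written on a big-endian machine
}
MAX_CAPLEN = 65535


def count_valid_records(path, max_caplen=MAX_CAPLEN):
    """Return (n_valid, bad_caplen).

    n_valid is the number of records read before the first oversized one.
    bad_caplen is the offending captured length, or None if the scan
    reached the end of the file (or a truncated final record) first.
    """
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24 or header[:4] not in PCAP_MAGICS:
            raise ValueError("not a classic pcap file: %s" % path)
        endian = PCAP_MAGICS[header[:4]]
        n_valid = 0
        while True:
            rec = f.read(16)  # ts_sec, ts_usec, caplen, wirelen
            if len(rec) < 16:
                return n_valid, None
            _sec, _usec, caplen, _wirelen = struct.unpack(endian + "IIII", rec)
            if caplen > max_caplen:
                return n_valid, caplen
            data = f.read(caplen)
            if len(data) < caplen:  # file ends in the middle of a packet
                return n_valid, None
            n_valid += 1
```

The first element of the result could then be used as the `-c` argument when rewriting the file.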

This solution seemed to work fine: none of the ***-new.pcap files show errors when read with capinfos or wireshark. Even so, some of them still cause problems during processing. For example:

I processed the following files in 3 different ways:
trace1.pcap, trace2-new.pcap, trace3.pcap (trace2.pcap was replaced because of the packet size error)

FIRST TRY - using ipsumdump with collate option:
ipsumdump --collate -w - trace* | bro -r - brolite myenvironment -f "tcp or udp or icmp" dpd_conn_logs=T dpd detect-protocols dyn-disable irc-bot proxy ftp

Output:
9.7 MB conn.log with 114861 lines (number of connections)

SECOND TRY - using ipsumdump without collate option:
ipsumdump -w - trace* | bro -r - brolite myenvironment -f "tcp or udp or icmp" dpd_conn_logs=T dpd detect-protocols dyn-disable irc-bot proxy ftp

Output:
19 Mbytes conn.log with 228922 lines with 950 repeated connections

THIRD TRY - without ipsumdump:

/usr/local/bro/bin/bro -r trace1.pcap -r trace2-new.pcap -r trace3.pcap brolite todai -f "tcp or udp or icmp" dpd_conn_logs=T dpd detect-protocols dyn-disable irc-bot proxy ftp 2>bro-error3.log

Output:
15 Mbytes conn.log with 169168 lines, connections are not repeated

COMMENTS:
The pcap files do not contain overlapping traffic (I checked this with trace-summary, comparing the first and last packet seen in each file).
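
For a larger set of files, the same first-packet/last-packet comparison could be automated. The sketch below is only an illustration: `find_overlaps` is a hypothetical helper, and it assumes the first and last timestamps of each file have already been collected (for example from trace-summary output) as plain numbers.

```python
def find_overlaps(spans):
    """Report files whose time ranges overlap.

    spans: list of (name, first_ts, last_ts) tuples.
    Returns a list of (earlier_name, later_name) pairs. Each file is
    compared against the previously seen file that ends latest, which is
    enough to detect whether any overlap exists at all.
    """
    overlaps = []
    latest_name, latest_end = None, None
    for name, first, last in sorted(spans, key=lambda s: s[1]):
        # A strict comparison lets two files share a boundary timestamp.
        if latest_end is not None and first < latest_end:
            overlaps.append((latest_name, name))
        if latest_end is None or last > latest_end:
            latest_name, latest_end = name, last
    return overlaps
```

An empty result means no file starts before an earlier file has ended.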
I tried ipsumdump both with and without the collate option because, when I ran ipsumdump on its own (without bro), the merged pcap file was 7.9 GB with the collate option but 12 GB without it (trace1.pcap: 4 MB, trace2-new.pcap: 3.9 GB, trace3.pcap: 4 GB). Besides, while running ipsumdump --collate alone, the progress bar showed something like this:
66%****************** |8017MB ETAToDump(LargerTrace.pcap): Success
100%****************************|12113MB
But the progress bar for ipsumdump without the collate option did not split, and it reached 100% at 12113MB.

If anyone can illuminate this matter, it will be a great help.

Veronica

It looks like ipsumdump might be changing the snaplen to 2000 bytes when it writes out the pcap file. I don’t see a runtime option to change the snaplen.
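
One way to check the snaplen hypothesis is to compare the snaplen field in the global header of an input file with that of the merged output. The following is a minimal sketch, assuming classic pcap files (not pcapng); `pcap_snaplen` is a hypothetical helper name.

```python
import struct


def pcap_snaplen(path):
    """Return the snaplen stored in the 24-byte global header of a pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24:
        raise ValueError("file too short to be a pcap file: %s" % path)
    magic = header[:4]
    # Both the microsecond and the nanosecond magic numbers are accepted.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file: %s" % path)
    # After the magic: version_major, version_minor, thiszone, sigfigs,
    # snaplen, linktype.
    _major, _minor, _zone, _sigfigs, snaplen, _linktype = struct.unpack(
        endian + "HHiIII", header[4:]
    )
    return snaplen
```

If the value for the merged file is smaller than for the inputs, packets longer than the new snaplen would have been truncated on output.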

Another tool you can try to merge those files is tcpslice from ftp://ftp.ee.lbl.gov/tcpslice.tar.gz. I have been able to preserve the snaplen using tcpslice.

tcpslice trace*.pcap -w - | bro -r - …

Sri

Hi, Sridhar!

I tried again with other tools (mergecap and tcpslice) and found that they all behaved the same way. After analyzing their output with bro I got the same number of connections as when I passed the individual pcap files directly to bro (169168 connections).

Moreover, after running more tests with other pcap files, I realized that ipsumdump was having problems with one of my files, although that file can be read by many other tools without any problems. That is why bro found a different number of connections when using the output of ipsumdump.

I am still puzzled by ipsumdump, because the difference in the number of connections is large and the tool gives no hint that a problem exists, so it is easy to end up with a wrong analysis in bro.

Veronica Estrada
Nakao’s Laboratory
Univ. of Tokyo

I still puzzled over ipsumdump because the difference in connection number
is big and the tool does not give you any hint about the existence of a
problem, thus it is easy to get a wrong analysis with bro.

Hmmmm - we make heavy use of ipsumdump for trace analysis, and haven't run
across this sort of problem before. If you can put together a demonstration
of the problem, send it to Eddie Kohler <kohler@cs.ucla.edu> (the ipsumdump
developer), he's quite responsive in fixing bugs. Also, cc me on the note,
as I'd like to understand the issue better.

    Vern

I used to use ipsumdump to stitch together multiple pcap files into one, but
have found on occasion that it doesn't always output in timestamp sorted order.
Don't have a testcase right now, but IIRC, it occurred if using a large number
of files.

Consequently, I wrote a little utility 'tcpsort', which although it has its
deficiencies (an in-memory sort of timestamps, which restricts the total size
of the input files, and two passes through the input files) works for the
purpose of stitching multiple pcap files together in timestamp sorted order.
I can post it if there's interest.
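
For comparison, here is a minimal Python sketch of stitching pcap files together in timestamp order. It is not the tcpsort utility described above: instead of an in-memory sort it uses a streaming k-way merge (`heapq.merge`), which only works under the assumption that each input file is already sorted by timestamp internally. It further assumes little-endian classic pcap files that all share the same link type and snaplen. The names `read_records` and `merge_pcaps` are hypothetical.

```python
import heapq
import struct

LE_MAGIC = b"\xd4\xc3\xb2\xa1"  # classic pcap, little-endian, microseconds


def read_records(path):
    """Yield ((sec, usec), raw_record_bytes) for each packet in a pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if header[:4] != LE_MAGIC:
            raise ValueError("unsupported pcap variant: %s" % path)
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                return
            sec, usec, caplen, _wirelen = struct.unpack("<IIII", rec)
            data = f.read(caplen)
            if len(data) < caplen:  # truncated final record, stop here
                return
            yield (sec, usec), rec + data


def merge_pcaps(in_paths, out_path):
    """Merge time-sorted pcap files into out_path; return the packet count."""
    with open(in_paths[0], "rb") as f:
        global_header = f.read(24)  # reuse the first file's global header
    streams = [read_records(p) for p in in_paths]
    written = 0
    with open(out_path, "wb") as out:
        out.write(global_header)
        # heapq.merge keeps only one pending record per input in memory.
        for _ts, raw in heapq.merge(*streams, key=lambda r: r[0]):
            out.write(raw)
            written += 1
    return written
```

Because every input stays open for the duration of the merge, several hundred inputs need a correspondingly high open-file limit. Memory use stays at roughly one packet per input file.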

Dear Jim/Vern,

Sorry for the delayed answer. I found that ipsumdump has problems with some specific files regardless of the number of pcap files, although of course a larger number of input files increases the chance of hitting a problem (unfortunately I cannot figure out the reason). I tried to use tcpslice instead, but my server crashed twice, apparently because tcpslice was trying to merge 300 files.
I have not tested it again, to avoid further problems.
Any help is welcome, but timestamp order does not seem to be the problem in my case.
My goal is to provide BRO with enough input data to recognize complete connections, detect protocols, and avoid any weird activity caused by connections being split across several pcap files.

Thank you,

Veronica Estrada
Nakao Laboratory - Network Systems Research Group
University of Tokyo

I suppose this means that you don't know of any specific differences in the problematic trace files?

  .Seth

Actually, this problem is more related to ipsumdump. However, it can
affect BRO input, so I will briefly explain my findings and we can discuss
further details by e-mail. I have just tested ipsumdump with different
traces. I used ipsumdump 1.78 (libclick-1.7.0) on Fedora 8.

Using wireshark I saw that my files contain some malformed packets,
particularly packets for Ethernet and FC (Fibre Channel) protocols.

I found that malformed FC packets are not a problem for ipsumdump.
Malformed Ethernet packets are different: ipsumdump cannot correctly
handle files that contain them. My experiments with tcpslice
confirmed that tcpslice can deal with them.

This may be a problem if the user doesn't notice the presence
of malformed Ethernet packets and ipsumdump is used in quiet mode
inside a script, since no error messages are printed. I first
noticed the problem in the progress bar printed by ipsumdump: the
progress bar split into several partial bars before it eventually reached
100%. The bar does not split when the input files don't contain
malformed Ethernet packets. A user can check the size of the output file,
but the error is easy to miss this way because the size can also
differ when the input pcap files overlap.

A good thing about ipsumdump is that it can deal with a terabyte of
output and hundreds of input files. On the other hand, when I used
tcpslice, the server crashed (probably because of the tcpslice
process).

Veronica Estrada
Nakao Laboratory - Network Systems Research Group
University of Tokyo