Processing many files with Bro

Hello,

I am processing several hours of captured traffic split into pcap files that each cover one minute of traffic. Currently I am using this basic script to do that.

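#!/bin/bash
# Directory holding the per-minute pcap files (first argument).
path="$1"
for f in "$path"/*; do
    # Tag each run's logs with the trace file name.
    export BRO_LOG_SUFFIX="$(basename "$f")"
    /usr/local/bro/bin/bro -r "$f" brolite mysite
done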

But my goal is for Bro to recognize connections that are split across several files. I think one solution is to modify some variables and make them “persistent”. Is that correct? Which variables should I modify?

The other solution: I know that the split pcap files can be merged into one bigger file, but I will have problems with memory, and Bro may crash if it has a limit on the size of the pcap file it can process. So I am not considering this option.

Best regards!

Veronica Estrada
Nakao Laboratory
The University of Tokyo

I would go for the merge option. Bro *shouldn't* have memory problems as long as you are expiring all of the accumulated state often enough. When you run against the large tracefile, make sure you load the "profiling" script so you can see how much memory your various global variables are holding. That should tease out any variables you may need to tune to reduce memory usage.

Personally, I've processed a single multi-hundred gig tracefile with a single Bro instance on a machine with 512 megs of memory and didn't encounter any trouble.

   .Seth

Veronica Estrada wrote:

But my goal is for Bro to recognize connections that are split across
several files. I think one solution is to modify some variables and
make them "persistent". Is that correct? Which variables should I
modify?

I would rather try to write a pcap application that removes the pcap file
headers from a set of input files, and then have this application read the
files one by one and pipe the output to Bro.
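
A minimal shell sketch of the approach above, not a tested tool. It assumes every input is a classic libpcap file (not pcapng) with the same byte order, snaplen and link type, so the 24-byte global header of the first file is kept and the header of each later file is dropped. The function name concat_pcaps and the example directory are made up for illustration.

```shell
#!/bin/bash
# Sketch: concatenate classic pcap files into one stream on stdout.
# Assumption: all inputs share byte order, snaplen and link type, so a
# single global header (taken from the first file) describes all packets.
concat_pcaps() {
    local first=1 f
    for f in "$@"; do
        if [ "$first" -eq 1 ]; then
            cat "$f"            # keep the 24-byte global header of the first file
            first=0
        else
            tail -c +25 "$f"    # start at byte 25, skipping the global header
        fi
    done
}

# Hypothetical usage:
#   concat_pcaps /data/traces/*.pcap | /usr/local/bro/bin/bro -r - brolite mysite
```

The files are emitted in argument order, so this only yields monotone timestamps if the file names sort chronologically and the traces do not overlap in time.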

The other solution: I know that the split pcap files can be merged into
one bigger file, but I will have problems with memory, and Bro may crash
if it has a limit on the size of the pcap file it can process. So I am
not considering this option.

I am not aware of any problems with Bro reading huge input files. We are
using Bro operationally and have instances analyze terabytes of traces in one
run. But of course the more data you put in, the more state might be built up.

   best
   Fabian

That's probably the best solution and you can do it on the fly: have
your merge tool (e.g., tcpslice) write to stdout and Bro read from
stdin with "-r -". The effect on memory will indeed be that of one
large pcap file, but if that causes trouble, you should tweak the
Bro configuration.
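
A rough sketch of that pipeline, under these assumptions: tcpslice writes the merged trace to stdout when no -w option is given, the traces do not overlap in time, and the file names sort chronologically. The function name analyze_merged and the MERGE_CMD and BRO_BIN variables are hypothetical, added only so the tools can be swapped.

```shell
#!/bin/bash
# Sketch: merge per-minute traces on the fly and feed them to a single Bro run.
# MERGE_CMD and BRO_BIN are illustrative knobs, not part of Bro or tcpslice.
MERGE_CMD=${MERGE_CMD:-tcpslice}
BRO_BIN=${BRO_BIN:-/usr/local/bro/bin/bro}

analyze_merged() {
    local dir=$1
    shift
    # The merge tool writes one combined trace to stdout;
    # "-r -" makes Bro read that trace from stdin.
    "$MERGE_CMD" "$dir"/*.pcap | "$BRO_BIN" -r - "$@"
}

# Hypothetical usage:
#   analyze_merged /data/traces brolite mysite
```

Nothing is written to disk this way, but Bro still sees one long trace, so its state has to be expired as for a single large file.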

Using &persistent is unlikely to do what you want as it stores only
script-level state, not internal state for connections that cross
file boundaries.

Robin

Thanks everyone for the answers,

My original question was connected with a second problem. I am trying to associate a summary of wrong fragments with the corresponding line in the connection summary.

To avoid the same connection being split and analyzed in different Bro runs, I will go for the second option, as you suggested. After that, I will have the majority of connections summarized in the same conn.bro file. But even after solving this, I am still confused about how to associate the wrong fragment count with its corresponding connection logged in conn.bro.

To my understanding, wrong fragments are reported through the flow_weird event, and they don't have an associated c$id, only src and dst addresses.

My questions:

  1. How can I find the connection that generated that wrong fragment event?
  2. Should I assign the fragment to the last connection registered in conn.bro whose initiation time is before the fragment I want to count? I don't think this is enough. For instance, if two different connections between A and B are active, I cannot distinguish them.

Besides, I read about active and passive timeouts on connections (Flow-based TCP Connection Analysis by Limmer and Dressler).
I don't understand how this topic is treated in Bro, since I can find only one type of timeout (tcp_inactivity_timeout). Is this timeout the active timeout? I think there are probably other timeouts, such as handshake timeouts, that I am missing.

Maybe I am getting into the details of Bro's design, but I want to understand what I am doing, and what I shouldn't do, to get the wrong fragment count inside the conn.bro file.

Sorry, maybe I should open another thread with this e-mail. I was not sure how to deal with it.

Veronica Estrada
Nakao Laboratory
The University of Tokyo

Yet another tool:

% ipsumdump --collate -w - *.pcap | bro -r - http-request etc

The switch --collate ensures monotone timestamps.

   Matthias

Yeah, indeed that's a bit better than tcpslice, because ipsumdump will
correctly collate traces that overlap in time, while IIRC tcpslice won't.

    Vern

Sorry, I couldn’t make it work.

ipsumdump --collate -w *.pcap | $BROHOME/bin/bro -r - brolite mysite
/usr/local/bro-1.5-dep/bin/bro: problem with trace file - - truncated dump file; tried to read 24 file header bytes, only got 0

Veronica

ipsumdump --collate -w *.pcap | $BROHOME/bin/bro -r - brolite mysite

That needs to be:

  ipsumdump --collate -w - *.pcap | $BROHOME/bin/bro -r - brolite mysite

Unfortunately as you issued it, ipsumdump wrote the output to the first
file in the expansion of *.pcap, overwriting it :-(.
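
To see the effect for yourself, it can help to print the argument list the shell builds after expanding *.pcap. The helper below is only an illustration (show_args is a made-up name, and a.pcap and b.pcap are example files); it prints each argument on its own line, so the word that follows -w is easy to spot.

```shell
#!/bin/bash
# Print each argument on its own line, exactly as a command would receive it
# after the shell has expanded any glob patterns.
show_args() {
    printf '%s\n' "$@"
}

# Hypothetical usage in a directory containing a.pcap and b.pcap:
#   show_args -w *.pcap      # the word after -w is a.pcap, a trace file
#   show_args -w - *.pcap    # the word after -w is "-", meaning stdout
```

Running a command through such a helper first is a cheap check before pointing an output option at a directory full of traces.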

    Vern