Traffic Volume Calculation Using Bro's Connection Log

Hi,

I’m facing a small problem when running Bro. I’m trying to calculate the volume of traffic generated per host. I have a set of pcap files, each containing traffic from a single host. I thought I could run Bro on each pcap file and then sum the orig_bytes and resp_bytes columns in conn.log to get the total volume of traffic for one host. However, when I run Bro on a 250 MB pcap file, the sum of these two columns is only about 107 MB, not the 250 MB I expected. Is there an alternative method for calculating the volume of traffic generated by one host?

Here’s the script I ran to get the sum:
cat conn.log | awk 'BEGIN{FS="\t"; count=0;} {count=count+$10; count+=$11} END {print count;}'

This was the output of the script (which I expected would be 250 MB instead):
107790112 bytes

It would be great if you could help me resolve this issue!

Thank you,
Zainab

I thought I could run Bro on each pcap file, and then sum the orig_bytes and resp_bytes columns in conn.log to get the total volume of traffic for one host. However when I run Bro on a 250 MB pcap file, the sum of these two columns is only 107 MB approximately, and not 250 MB as I expected.


It's a matter of overhead and unmeasured data. The orig_bytes and resp_bytes fields count only payload bytes, so none of the headers (e.g. TCP, UDP, ICMP, IP, Ethernet) are counted. Also, if you have any packet types that we don't support, those won't be counted either. There is also some amount of overhead inherent in PCAP.
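For anyone repeating the measurement, here is a minimal sketch of the same sum that looks the columns up by name from the "#fields" header line instead of hard-coding $10 and $11. The function name sum_conn_fields is made up for illustration, and the sketch assumes the default tab-separated ASCII log format with "-" marking unset values.

```shell
# Sum named conn.log columns, locating them via the "#fields" header line.
# Hypothetical helper; assumes tab-separated logs that start with "#fields".
# Usage: sum_conn_fields conn.log orig_bytes resp_bytes
sum_conn_fields() {
    log="$1"; shift
    awk -F'\t' -v want="$*" '
        BEGIN { n = split(want, names, " "); total = 0 }
        # Header looks like: #fields<TAB>ts<TAB>uid<TAB>...
        # so header field i describes data column i - 1.
        $1 == "#fields" {
            for (i = 2; i <= NF; i++)
                for (j = 1; j <= n; j++)
                    if ($i == names[j]) col[i - 1] = 1
            next
        }
        /^#/ { next }                        # skip other metadata lines
        {
            for (c in col)
                if ($c != "-") total += $c   # "-" marks an unset field
        }
        END { printf "%.0f\n", total }       # avoid exponent notation
    ' "$log"
}
```

Calling `sum_conn_fields conn.log orig_bytes resp_bytes` should reproduce the payload-only total discussed above. If the log has no "#fields" line, the function prints 0.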

Is there any alternate method for calculating the volume of traffic generated by one host?


You are going to need to be more specific about what you are looking for.

  .Seth

Hi Seth/John,

Thank you for your responses. I have a follow-up question.

Here’s a quick recap of what I need to do: I want to use Bro to calculate the total volume of traffic captured in a pcap file, including all headers up to (and including) Ethernet headers.

Following your suggestion to sum the resp_ip_bytes and orig_ip_bytes columns of the conn.log generated over the trace, I now use this script to calculate the volume:

cat conn.log | awk 'BEGIN{FS="\t"; count=0;} {count=count+$17; count+=$19} END {print count}'

I’ve tried running this over conn logs generated from two different pcap files. In both cases, I get a count that is smaller than the size of the pcap file, which is fine because, as you said, Ethernet headers and pcap headers are still not included. The problem is that when I ran this script on the conn.log generated over a 500 GB trace, the output was 376 GB. If I calculate the total Ethernet header size (assuming 14 bytes per packet) and the total pcap header size (assuming 16 bytes per packet) for that trace, it comes to around 21 GB. That means (500 - 376 - 20 =) ~104 GB is still unaccounted for. I’m trying to understand why that would be.
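The estimate described above can be sketched directly from conn.log, using the packet counters next to the IP byte counters. This is only a rough sketch under several assumptions: the default column order (orig_pkts in $16, orig_ip_bytes in $17, resp_pkts in $18, resp_ip_bytes in $19), untagged Ethernet frames with a 14-byte header, a classic pcap file (24-byte file header and 16-byte per-packet record header, not pcapng), and no snaplen truncation. The function name estimate_pcap_bytes is made up for illustration.

```shell
# Estimate the size of a classic pcap file from the conn.log Bro wrote for it.
# Hypothetical helper; column positions assume the default conn.log layout.
estimate_pcap_bytes() {
    awk -F'\t' '
        /^#/ { next }                    # skip "#fields" and other metadata
        {
            pkts     += $16 + $18        # orig_pkts + resp_pkts
            ip_bytes += $17 + $19        # orig_ip_bytes + resp_ip_bytes
        }
        END {
            overhead = pkts * (14 + 16)  # Ethernet header + pcap record header
            printf "ip_bytes=%.0f packets=%.0f overhead=%.0f estimate=%.0f\n",
                   ip_bytes, pkts, overhead, ip_bytes + overhead + 24
        }
    ' "$1"
}
```

Run it as `estimate_pcap_bytes conn.log` and compare the estimate with the size reported by `ls -l` for the trace. Any packets that never make it into a conn.log record are invisible to this calculation, so a large remaining gap points at traffic outside conn.log rather than at the per-packet header arithmetic.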

Perhaps that is because of packets with unknown transport protocols? You said that packet types that are not supported will also not be included in the byte count. By unsupported packet types, are you referring to connections for which the value of the enum “proto” is “unknown_protocol”? If yes, I’ve seen that conn.log is showing ONLY TCP, UDP, and ICMP in the “proto” field for my 500 GB trace. Does this mean that when the protocol is unknown, the record is not included in conn.log at all? Because that’s the only explanation I can think of for the unaccounted 104 GB!

Can you please comment, and also tell me if I’m doing something incorrect, and if so, how I should be calculating the volume instead?

Thank you,

Zainab

You can't do this right now. :)

Due to how we handle Ethernet headers (and VLAN and MPLS), that data is just not made available. Additionally, any non-IP traffic will be hard to include in the measurement. What we likely need to do is keep global counters that track the size of data pulled from libpcap. We already have a packet counter for that, like this…

resource_usage()$num_packets

I'm not saying that the resource_usage built-in function will stay around forever though; it's very possible that we'll reorganize that somewhat in the future.

  .Seth

Try using ipsumdump …

Amir