Help with searching logs

Im still learning, so bare with me. I ran the following command:

bro-cut id.orig_h orig_bytes < conn.log             \
    > sort                                          \
    > awk '{ if (host != $1) {                      \
                 if (size != 0)                     \
                     print $1, size;                \
                  host=$1;                          \
                  size=0                            \
              } else                                \
                  size += $2                        \
            }                                       \
            END {                                   \
                if (size != 0)                      \
                     print $1, size                 \
                }'                                  \
    > sort -rnk 2                                     \
    > head -n 10

This worked well to show me the top 10 hosts (originators). What Im trying to do is show the top 10 hosts and the time (ts). Maybe show the resp_bytes field too, if that is possible. Any help would be greatly appreciated.

Thanks!

Im still learning, so bare with me. I ran the following command:

bro-cut id.orig_h orig_bytes < conn.log \
    > sort \
    > awk '{ if (host != $1) { \
                 if (size != 0) \
                     print $1, size; \
                  host=$1; \
                  size=0 \
              } else \
                  size += $2 \
            } \
            END { \
                if (size != 0) \
                     print $1, size \
                }' \
    > sort -rnk 2 \
    > head -n 10

This worked well to show me the top 10 hosts (originators). What Im
trying to do is show the top 10 hosts and the time (ts). Maybe show
the resp_bytes field too, if that is possible. Any help would be
greatly appreciated.

Thanks!
- --

Mike

You cut those fields out when you did your bro-cut. The id.orig_h and orig_bytes are both fields in the log. If you want other fields, you just add those to your list. E.g.:
bro-cut ts id.orig_h orig_bytes resp_bytes < conn.log ….etc

-Sam

I had tried that, but it only outputs the ts field. I need something
else in that awk statement, just don't know what.

The script is lying to you. Here's the correct script:

bro-cut id.orig_h orig_bytes < conn.log \
    > sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n \
    > awk 'BEGIN { size=0;host="" } \
           { if (host != $1) { \
                 if (size != 0) \
                     print host, size; \
                  host=$1; \
                  if ($2 != "-") \
                     size=$2 \
                  else \
                     size=0 \
              } else \
                  if ($2 != "-") \
                     size += $2 \
            } \
            END { \
                if (size != 0) \
                     print host, size \
                }' \
    > sort -rnk 2 \
    > head -n 10

Since this script summarizes, having timestamps in there would not be useful. And, if you want to change the logic to responder, change "orig" to "resp" in the first line. Yes, it would be nice to have the top conversations, not just the top talkers, which would combine both orig and resp, but I'm not sure the result would justify the work.

A few months ago I went through this and we were hoping that the doc would have been changed to show a correct script, but it has not been, apparently.

Hm, I get a syntax error in that script now. Let me figure this out...

Seems as though an awk update tightened some of the syntax restrictions. This script works:

bro-cut id.orig_h orig_bytes < conn.log \
    > sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n \
    > awk 'BEGIN { size=0;host="" } \
           { if (host != $1) { \
                 if (size != 0) \
                     print host, size; \
                  host=$1; \
                  if ($2 != "-") { \
                     size=$2 } \
                  else { \
                     size=0; } \
              } else { \
                  if ($2 != "-") \
                     size += $2 } \
            } \
            END { \
                if (size != 0) \
                     print host, size \
                }' \
    > sort -rnk 2 \
    > head -n 10

The conversations could be done by a script that takes conn.log as input, merges orig_h and resp_h into one field while adding together their bytes, sorting on that, then putting the result through the same awk script. The output might look something like this:

1.2.3.4|5.6.7.8 123456789
...

depending on what you choose for your conjoining character. Also, note that if there are multiple conversations between two IP addresses then you will be adding up all those conversations and presenting the result as one line - that might not be exactly what you are looking for.

Welcome to the world of IDS data mining. Sharpen your awk/sort/sql/perl/bash skills - they will come in very handy.

Thanks! This will give me something to go on.

Just for grins, I worked on the conversation angle a bit. Here's what I came up with as a first hack.

$ bro-cut id.orig_h id.resp_h orig_bytes resp_bytes </nsm/bro/logs/current/conn.log | awk -f convlist.awk | sort -rnk 2 | head -20

convlist.awk:
{
tot=0;
if ($3 != "-") {tot=$3};
if ($4 != "-") {tot+=$4};
print $1".."$2, tot;
}

Output:
166.213.168.9..192.168.11.30 6369575986
166.213.168.8..192.168.11.30 4304514612
192.168.61.21..193.120.199.16 4294967294
70.208.5.91..192.168.13.45 3532074861
192.168.56.166..23.7.65.224 3067067778
166.213.168.22..192.168.11.30 1981760165
174.232.193.204..172.31.251.32 1934543198
166.213.168.22..192.168.11.30 1853700164
166.213.168.22..192.168.11.30 1848753206
166.213.168.22..192.168.11.30 1757683563
166.213.168.22..192.168.11.30 1706006024
166.213.168.22..192.168.11.30 1657117082
166.213.168.22..192.168.11.30 1612028622
166.213.168.22..192.168.11.30 1607600859
166.213.168.22..192.168.11.30 1584767941
166.213.168.22..192.168.11.30 1543389093
166.213.168.22..192.168.11.30 1533513278
166.213.168.22..192.168.11.30 1467372828
166.213.168.22..192.168.11.30 1254028676
192.168.56.166..23.7.65.224 1214734460

I used ".." to join the two IP addresses together so if I wanted to I could extract and sort without changing the original sort command very much. Essentially, this shows the top 20 separate conversations since the current conn.log was started. Notice that several of them have the same endpoints. If I start summarizing using "sort -t '.' <etc>", then I get

$ bro-cut id.orig_h id.resp_h orig_bytes resp_bytes </nsm/bro/logs/current/conn.log | awk -f convlist.awk | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n -k 6,6n -k 7,7n -k 8,8n -k 9,9n | awk -f toptalk.awk | sort -rnk 2 | head -20
166.213.168.22..192.168.11.30 45782918899
166.213.168.9..192.168.11.30 14376535154
192.168.56.166..23.7.65.224 9097513513
172.31.251.13..218.30.26.68 5826064097
172.31.251.171..192.168.14.81 4321601956
166.213.168.8..192.168.11.30 4304514612
192.168.61.21..193.120.199.16 4294967294
70.208.5.91..192.168.13.45 3532111771
192.168.21.103..23.7.65.224 3059490250
166.213.168.30..192.168.11.30 2285013919
166.137.181.202..192.168.13.45 2083142285
174.232.193.204..172.31.251.32 1934845607
172.31.251.13..218.30.26.70 1198383801
192.168.226.96..72.21.81.253 1175467069
161.97.154.10..172.31.251.32 1137167682
174.239.96.134..192.168.13.45 920925068
192.168.171.25..204.107.64.58 488202352
172.31.251.71..192.168.12.116 477085260
192.168.6.97..74.125.225.161 459930229
172.31.251.13..192.168.12.73 412551956

So, it can be done. Interesting exercise. Note that the "toptalk.awk" script referenced in the command is simply the awk script from before (see below in this email thread), created as a separate file.

Just so you know, the third line from the top in the output is to a Twitter address. I need to find out wtf is going on there. 9GB from/to Twitter? Really?

No, not really. I discovered that Bro is not reporting the bytes correctly. That makes these numbers totally suspect. I wonder what's going on there.

scastle@nsm:~/scripts$ zcat /nsm/bro/logs/2013-04-03/conn.19:00:00-20:00:00.log.gz | bro-cut -d ts id.orig_h id.resp_h orig_bytes resp_bytes | grep 192.168.56.166.23.7.65.224
2013-04-03T19:21:28+0000 192.168.56.166 23.7.65.224 0 1200242775
2013-04-03T19:21:28+0000 192.168.56.166 23.7.65.224 0 1214734459
2013-04-03T19:21:33+0000 192.168.56.166 23.7.65.224 540 3920
2013-04-03T19:21:51+0000 192.168.56.166 23.7.65.224 0 1214734460
2013-04-03T19:21:40+0000 192.168.56.166 23.7.65.224 0 1200242776
2013-04-03T19:21:28+0000 192.168.56.166 23.7.65.224 1112 1200442670
2013-04-03T19:21:33+0000 192.168.56.166 23.7.65.224 1270 4325
2013-04-03T19:24:42+0000 192.168.56.166 23.7.65.224 0 0
2013-04-03T19:24:41+0000 192.168.56.166 23.7.65.224 0 0
2013-04-03T19:24:46+0000 192.168.56.166 23.7.65.224 572 3920
2013-04-03T19:24:47+0000 192.168.56.166 23.7.65.224 572 3920
2013-04-03T19:24:42+0000 192.168.56.166 23.7.65.224 1105 405
2013-04-03T19:24:41+0000 192.168.56.166 23.7.65.224 504 26430
2013-04-03T19:21:28+0000 192.168.56.166 23.7.65.224 0 3067067778

Looks like resp_bytes is not being properly shown sometimes. Hmm, missed_bytes seems to be large here, too. Sigh - I still don't know what's going on. If missed_bytes is nonzero, the orig and resp bytes can't be trusted. More work and research.

Here's the unfiltered output:
scastle@nsm:~/scripts$ zcat /nsm/bro/logs/2013-04-03/conn.19:00:00-20:00:00.log.gz | bro-cut -d | grep 192.168.56.166.\*23.7.65.224
2013-04-03T19:21:28+0000 YTOQlbbx1Pg 192.168.56.166 64878 23.7.65.224 80 tcp - 6.655300 0 1200242775 SF T 1200242775 ShhhhAFf 14 788 8 472 (empty)
2013-04-03T19:21:28+0000 bLfZjXnY29h 192.168.56.166 64888 23.7.65.224 80 tcp - 6.284761 0 1214734459 SF T 1214734459 ShhhAhFfr 15 840 10 572 (empty)
2013-04-03T19:21:33+0000 zs1xf4L3iZ6 192.168.56.166 64906 23.7.65.224 443 tcp ssl 5.754741 540 3920 SF T 0 ShADadFfRR 16 1372 12 4568 (empty)
2013-04-03T19:21:51+0000 ramah8M2Oc1 192.168.56.166 64888 23.7.65.224 80 tcp - 24.008354 0 1214734460 RSTR T 0 hArR 2 92 3 160 (empty)
2013-04-03T19:21:40+0000 JObupHqiDjl 192.168.56.166 64878 23.7.65.224 80 tcp - 36.214781 0 1200242776 RSTR T 0 hArR 3 144 5 260 (empty)
2013-04-03T19:21:28+0000 XkrirkTDGzk 192.168.56.166 64876 23.7.65.224 80 tcp http 113.531073 1112 1200442670 SF T 1200442265 ShhhADadhfF 20 2212 15 1273 (empty)
2013-04-03T19:21:33+0000 PtKQpijzTff 192.168.56.166 64905 23.7.65.224 443 tcp ssl 116.851227 1270 4325 SF T 0 ShADadFfRR 18 2206 13 5025 (empty)
2013-04-03T19:24:42+0000 tAwhU1nQ8e8 192.168.56.166 65044 23.7.65.224 80 tcp - 7.372079 0 0 SF T 0 ShAFfar 6 324 5 268 (empty)
2013-04-03T19:24:41+0000 wWH8h7RtYg6 192.168.56.166 65003 23.7.65.224 80 tcp - 8.491360 0 0 SF T 0 ShAFfar 6 324 5 268 (empty)
2013-04-03T19:24:46+0000 lsIG9TsJ0Ti 192.168.56.166 65081 23.7.65.224 443 tcp ssl 6.057423 572 3920 SF T 0 ShADadFfR 18 1532 12 4584 (empty)
2013-04-03T19:24:47+0000 TXZBimJtLN6 192.168.56.166 65083 23.7.65.224 443 tcp ssl 5.903914 572 3920 SF T 0 ShADadFfR 18 1532 12 4584 (empty)
2013-04-03T19:24:42+0000 qtKgUBFlRNa 192.168.56.166 65041 23.7.65.224 80 tcp http 96.772052 1105 405 SF T 0 ShADadfF 8 1533 7 1587 (empty)
2013-04-03T19:24:41+0000 KmKMWDZ7Kk2 192.168.56.166 65001 23.7.65.224 80 tcp http 104.511714 504 26430 SF T 0 ShADadfF 18 1452 24 29134 (empty)
2013-04-03T19:21:28+0000 R4eSKC00lqe 192.168.56.166 64879 23.7.65.224 80 tcp - 46.186418 0 3067067778 SF T 0 ShhAFfarR 17 932 15 872 (empty)

This may be unrelated (or even something I'm doing terribly wrong), but I noticed what I think is a similar issue when working with the capture-loss script that led me to believe that the underlying code in Bro may not be properly dealing with it when the 32 bit sequence number in a TCP packet rolls over. I spoke briefly to Seth about it, and he asked me to provide the .bro code I was using as well as a pcap demonstrating my concerns -- I failed to do either. The modified capture_loss script I'm running results in:

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path capture_loss
#open 2013-04-03-21-00-08
#fields ts ts_delta peer gaps acks percent_lost gap_bytes ack_bytes percent_bytes_lost
#types time interval string count count string count count string
1365022809.033023 60.000027 worker-4-4 3 170598 0.002 4239898009 4652853950 91.125
1365022809.059350 60.000047 worker-1-1 0 129786 0.000 0 301416506 0.000
1365022808.638408 60.000054 worker-2-1 1 143234 0.001 1611842887 1956581780 82.381
1365022808.885606 60.000046 worker-3-4 1 144122 0.001 1350348978 1680367485 80.360
1365022809.328350 60.000060 worker-3-3 0 148226 0.000 0 364927843 0.000
1365022809.292290 60.000081 worker-4-2 1 118706 0.001 1609627515 1900369040 84.701

I'll try to throw something together and get it submitted.

-- KS

Keith Schoenefeld
Information Security Analyst
Baylor University
254-710-6667


The orig_bytes and resp_bytes fields can still be trusted even with missed_bytes being something greater than zero. Those two fields are calculated with TCP sequence number counting and so they can cope with packet loss.

  .Seth

This is what I get for not reading the whole email. Bro has/had an issue with middle boxes sending RST packets to kill TCP connections (great firewall of China being a primary offender) where it would use the sequence number from the RST packet instead of the sequence number from the initial syn or syn-ack. It resulted in these connections like you have here with very few packets and huge data sizes. It's fixed in master and if you want more context to the problem you can refer to the ticket where we tracked the issue and fix:
  http://tracker.bro.org/bro/ticket/730

  .Seth

So, this was not included in 2.1? I'm thinking "git clone --recursive git://git.bro.org/bro.git" will get me a new copy that includes this, right?

Correct :slight_smile:

.Seth