SumStats question

Hi,
I am trying to use the Bro SumStats framework. Based on the examples, I came up with the script shown at the end of this email. In the script, I count the number of HTTP requests for each method + URI combination.

As the framework requires, I call SumStats::observe for each request. I expected the SumStats totals to add up to the number of requests in my pcap, but this doesn't seem to be the case. I am trying to understand whether I made a mistake in how I am using the framework or whether something else is going on.

For example, I ran the script on the try.bro.org website using the http.pcap available there. By my count, the pcap contains 197 requests, so when I dump each stat into a log file I expect the hits column to add up to 197. It doesn't. Running the script against my own pcap also gives numbers different from what I expect.

Any help understanding the issue would be appreciated. Thanks,

Dk.

PS: You can copy and paste this script into the try.bro.org website and run it against http.pcap.

    @load base/utils/site
    @load base/frameworks/sumstats

    module HttpStats;

    export {
        redef enum Log::ID += { LOG };

        type Info: record {
            ts: time &log;
            method: string &log;
            uri: string &log;
            hits: count &log;
        };

        global update_http_stats: function(method: string, uri: string);
    }

    global scount: count = 0;

    event bro_init() &priority=5
        {
        print "Creating HttpStats log stream and HTTP sumstats";
        flush_all();

        # Create the stream.
        Log::create_stream(HttpStats::LOG, [$columns=Info, $path="http-stats"]);

        local r1 = SumStats::Reducer($stream="http-stats", $apply=set(SumStats::SUM));

        SumStats::create([$name="http-stats",
                          $epoch=5sec,
                          $reducers=set(r1),
                          $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                              {
                              local r = result["http-stats"];
                              local host_uri_vec = split_string(key$str, /,/);
                              local method = host_uri_vec[0];
                              local uri = host_uri_vec[1];
                              #local hits = double_to_count(floor(r$sum));
                              local hits = double_to_count(floor(r$num));

                              # Prep the record.
                              local log_rec: Info = [$ts=ts, $method=method, $uri=uri, $hits=hits];
                              Log::write(HttpStats::LOG, log_rec);
                              }
                         ]);
        }

    event bro_done()
        {
        Reporter::info(fmt("scount=%d", scount));
        }

    function update_http_stats(method: string, uri: string)
        {
        local key = cat_sep(",", "-", method, uri);

        scount += 1;

        # Count URI hits.
        SumStats::observe("http-stats", SumStats::Key($str=key), SumStats::Observation($num=1));
        }

    event http_request(c: connection, method: string, original_URI: string, unescaped_URI: string, version: string)
        {
        update_http_stats(method, unescaped_URI);
        }

Hi!

This is all my fault :( Currently trybro limits log output to 200 lines for each file: it shows the first 100 and the last 100. I had always intended to make that more obvious and to allow that '200' parameter to be changed, but forgot all about it. It was mostly done as a performance optimization: the log output can be quite large, and the result would either take too long to transfer to the client or the browser would freeze trying to render a table with 20k rows. The good news is that it is already a parameter on the backend; it just needs to be exposed through the API.

If you increase the epoch interval in your script to 500 seconds ($epoch=500sec), all the records are shown, since the total number of rows is then just under 200.

If you run it with a local bro binary, you should get the output you are expecting as well.

That said, the script you posted would likely have issues if run on a cluster. The short epoch interval, combined with a potentially large number of unique keys in SumStats, would put a heavy load on the manager. If you're not running it on a cluster against live traffic, it should work fine, though. If you do want to run that exact analysis on a cluster, I can write you a version that uses events directly and would perform a bit better under load.

Hi Justin,
Thanks for responding. My problem is not with try.bro.org but with how SumStats seems to work. I was only using try.bro.org to demonstrate the issue in case someone wanted to try my test.

Hi,

While trying to reproduce your problem I found that this was fixed a few months ago:

I ended up tracking down the root cause, only to realize it is already fixed in 2.6 :) It never hurts to practice Bro script debugging, though. It turns out the old SumStats framework script was deleting entries from a table while iterating over it, which is undefined behavior in Bro (and in many other languages).
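For readers unfamiliar with this pitfall, here is a minimal Bro sketch of the general safe pattern. It is not the actual change made in 2.6; the `counts` table and the threshold of 3 are hypothetical. The idea is to collect the keys to remove during the loop and delete them only after the iteration has finished.

```bro
# Hypothetical table, used only to illustrate the pattern.
global counts: table[string] of count = {
    ["a"] = 1,
    ["b"] = 5,
    ["c"] = 2
};

event bro_init()
    {
    # Unsafe: calling `delete counts[k]` inside `for ( k in counts )`
    # is undefined behavior, so some entries may be missed.

    # Safer: first collect the keys to remove ...
    local doomed: set[string] = set();
    for ( k in counts )
        {
        if ( counts[k] < 3 )
            add doomed[k];
        }

    # ... then delete them once the loop over `counts` is done.
    for ( k in doomed )
        delete counts[k];

    # Only the entry for "b" should remain.
    print counts;
    }
```

Deferring the deletes costs one temporary set, but the iteration over `counts` never sees the table change underneath it.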

I have a directory containing http.pcap and your script (saved as s.bro).

Running it in a Bro 2.5.5 container and summing the hits column gives 128 instead of 197:

    justin@mbp:~/b$ docker run -t -i --rm -v `pwd`:/b broplatform/bro:2.5.5
    root@cbd05c9035c3:/# cd /b
    root@cbd05c9035c3:/b# bro -r http.pcap s.bro
    Creating HttpStats log stream and HTTP sumstats
    1320279683.449294 ./s.bro, line 55: scount=197
    root@cbd05c9035c3:/b#
    root@cbd05c9035c3:/b# cat http-stats.log |bro-cut hits | awk '{s+=$1} END {printf "%.0f\n", s}'
    128

Doing the same test again with Bro 2.6, released yesterday, gives the correct result of 197:

    justin@mbp:~/b$ docker run -t -i --rm -v `pwd`:/b broplatform/bro:2.6
    root@869655245d1d:/# cd /b
    root@869655245d1d:/b# bro -r http.pcap s.bro
    Creating HttpStats log stream and HTTP sumstats
    1320279683.449294 ./s.bro, line 55: scount=197
    root@869655245d1d:/b#
    root@869655245d1d:/b# cat http-stats.log |bro-cut hits | awk '{s+=$1} END {printf "%.0f\n", s}'
    197

Thanks for investigating this, Justin. I had been scratching my head over it for two days :)

By the way, I am using 2.4.1. Since my requirements were very simple, I ended up creating my own table and periodically writing the accumulated counts to the log using the 'schedule' primitive. That works correctly. Hopefully I can drop it and move to the SumStats version when I upgrade to Bro 2.6.
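A minimal sketch of that kind of workaround, assuming Bro 2.4-era script syntax. The module name HttpCounts, the http-counts log path, the flush_counts event, the write_counts function and the 5-second flush_interval are all hypothetical choices, not taken from Dk's actual script. Counts accumulate in a plain table keyed by [method, uri]. A scheduled event writes them out and resets the table after each interval, and bro_done writes whatever is left.

```bro
# Hypothetical module: a sketch of a schedule-based counter, not Dk's script.
module HttpCounts;

export {
    redef enum Log::ID += { LOG };

    type Info: record {
        ts:     time   &log;
        method: string &log;
        uri:    string &log;
        hits:   count  &log;
    };

    ## Hypothetical flush interval (an assumption, not from the thread).
    const flush_interval = 5sec &redef;
}

# Hit counts accumulated since the last flush, keyed by [method, uri].
global hit_counts: table[string, string] of count &default=0;

global flush_counts: event();

# Log every accumulated count, then reset the table.
function write_counts()
    {
    for ( [m, u] in hit_counts )
        {
        local rec: Info = [$ts=network_time(), $method=m, $uri=u,
                           $hits=hit_counts[m, u]];
        Log::write(HttpCounts::LOG, rec);
        }

    # Clear only after the loop has finished, never while iterating.
    clear_table(hit_counts);
    }

event flush_counts()
    {
    write_counts();
    schedule flush_interval { flush_counts() };
    }

event bro_init()
    {
    Log::create_stream(HttpCounts::LOG, [$columns=Info, $path="http-counts"]);
    schedule flush_interval { flush_counts() };
    }

event http_request(c: connection, method: string, original_URI: string,
                   unescaped_URI: string, version: string)
    {
    hit_counts[method, unescaped_URI] = hit_counts[method, unescaped_URI] + 1;
    }

event bro_done()
    {
    # Write whatever accumulated after the last scheduled flush.
    write_counts();
    }
```

Because the table is cleared only after each full pass, every request should be counted exactly once, so the hits column of http-counts.log ought to sum to the number of requests.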

Thanks again.

Dk.