Using "dbl" instead of "num" in SumStats

Hi

By default, SumStats applies its calculations to “num” rather than “dbl”. How can I make it apply them to “dbl” instead?

Thanks

Hui Lin

Hi Hugo:

The observation record is defined (share/bro/base/frameworks/sumstats/main.bro) as:

## Represents data being added for a single observation.
## Only supply a single field at a time!
type Observation: record {
	## Count value.
	num: count &optional;
	## Double value.
	dbl: double &optional;
	## String value.
	str: string &optional;
};

so in SumStats::observe, you would supply the dbl optional value instead of num, e.g.

SumStats::observe("mysumstat", 
	                  SumStats::Key($host=foo), 
	                  SumStats::Observation($dbl=bar));

(don't supply more than 1 optional value).
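
For context, a minimal end-to-end sketch of feeding double values to SumStats might look like the following. It assumes a Zeek release where the init event is zeek_init (older Bro releases use bro_init). The stream name "example.latency", the sumstat name, the record_latency helper and the 1-minute epoch are all hypothetical, chosen only for illustration.

```zeek
@load base/frameworks/sumstats

# Hypothetical helper: call it wherever a double-valued measurement is available.
function record_latency(host: addr, latency: double)
    {
    # Only $dbl is set; $num and $str are left unset.
    SumStats::observe("example.latency",
                      SumStats::Key($host=host),
                      SumStats::Observation($dbl=latency));
    }

event zeek_init()
    {
    # The reducer's $stream must match the first argument of SumStats::observe.
    local r1 = SumStats::Reducer($stream="example.latency",
                                 $apply=set(SumStats::AVERAGE, SumStats::STD_DEV));

    SumStats::create([$name="example-latency-stats",
                      $epoch=1min,
                      $reducers=set(r1),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          local r = result["example.latency"];
                          # %.6f keeps sub-second values visible.
                          print fmt("%s: n=%d avg=%.6f stddev=%.6f",
                                    key$host, r$num, r$average, r$std_dev);
                          }]);
    }
```

The calculations in $apply are the same ones that would be used with $num; only the field set in the Observation changes.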

Hope this helps.  BTW: I'm interested in the uses that folks find for sumstats.  Care to comment on your use case?

Jim

Hi Jim,

Thanks for the help. It seems that I made a stupid mistake. I did exactly what you suggested, replacing num with dbl in the observation. However, when I copied the print fmt call from the example into the callback function, I forgot to make it print enough decimal places. So I always got 0, which made me think that “observe” was still using “num”. I hope this record helps others who want to use double values instead of count values in SumStats.
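
The always-0 symptom can be reproduced with fmt alone. The sketch below assumes the copied format string used a specifier with no decimal places, such as %.0f; the thread does not say which specifier it actually was, and the latency value is made up.

```zeek
event zeek_init()
    {
    # Made-up sub-second latency, in seconds.
    local latency = 0.0042;

    # No decimal places: a small value is rounded and should show up as 0.
    print fmt("%.0f", latency);

    # Six decimal places: the value should stay visible (0.004200).
    print fmt("%.6f", latency);
    }
```

The observation itself is unaffected; only the way the result is printed hides the value.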

Yes, as you may know, I contributed the DNP3 analyzer to Bro with Robin and Seth. I still use Bro to measure network traces of DNP3 traffic for my research. At first I was a little daunted by SumStats, but it turned out to be very easy. I use application-layer events to calculate the round-trip time between DNP3 requests and responses, and call SumStats::observe to record the latency (to calculate goodput). Then I calculate the average and standard deviation. RTT is a very basic network measurement, so I find SumStats very useful.

May I suggest a few things for SumStats? Maybe I missed something, but I don’t know how to directly obtain the number of observations recorded in SumStats, so I had to declare another global variable to count them. It would be useful to know directly how many observations have been recorded so far. I need that number to calculate the 95% or 99% confidence interval. It would be great if confidence intervals could be included directly in SumStats as well.

Best,

Hugo

Hi Hugo:

Each result record returned to epoch_result has a ‘num’ field, which is a count of the number of observations that made up that result - is that what you’re looking for? If you’re looking for a grand total of observations, I suppose they could be totalled up from the result records.

Take care,

Jim

Hi Jim,

I think the ‘num’ field is what I am looking for. However, when I tried it, its value differed from the count that I keep manually. Here is the code I use to count. As you can see, what I am trying to do is simple: whenever an observation is made, I increment a global variable. However, when I print both in the epoch callback function, the global's value differs from ‘num’.

if (…)
{
	total_res = total_res + 1;
	SumStats::observe("dnp3 rtt", SumStats::Key(), SumStats::Observation($dbl=latency));
}

Best,

Hugo

Hmmm, that SumStats::observe line doesn’t seem quite correct. Generally, observations are in the form:

SumStats::observe("foo", [$host=bar], [$dbl=val]);

Assuming what you sent was just a typo, it would be interesting to know whether the same behavior is seen both in a cluster and standalone, as SumStats uses a different code path for those two cases. If only the cluster gives a different result (likely less than the manual count), then I would be concerned that not all cluster results are being received by the manager when it composes the results.

Jim

I am afraid that is not a typo. I copied and pasted it from the documentation at https://docs.zeek.org/en/stable/frameworks/sumstats.html#examples. I think what I wrote is consistent with what you provided; I just call the Key and Observation constructors directly for the second and third parameters. I am using the standalone version to analyze a pcap. More interestingly, as the periodic epoch callback function prints it out, the “num” field of the epoch result can decrease!

Hi Jim,

I think that I finally got it. The code is correct, but my interpretation was not. Whatever calculation we apply to the observations, e.g., average or sum, covers only the data collected within that epoch. So the ‘num’ field is the number of observations within that epoch, while I was recording the accumulated number of observations so far. Originally I didn’t like this, as I thought it would be more convenient to have statistics over all the data. However, it does give me some benefits. As I am using very low-end computers and switches for experiments, I can easily tell when the network becomes stable, e.g., has less packet loss, based on the RTT in each epoch.
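
To make the per-epoch behavior concrete, here is a rough sketch of an epoch callback that keeps a running total next to the per-epoch ‘num’ and derives an approximate 95% confidence interval for the epoch's mean RTT. It assumes a Zeek release with the zeek_init event, a normal approximation (multiplier 1.96; 2.576 would be used for 99%), and that std_dev is an acceptable estimate of the spread. The sumstat name and the 10-second epoch are hypothetical; the stream name "dnp3 rtt" is the one used earlier in this thread.

```zeek
@load base/frameworks/sumstats

# Running total across epochs; each epoch's r$num only covers that epoch.
global total_obs: count = 0;

event zeek_init()
    {
    local r1 = SumStats::Reducer($stream="dnp3 rtt",
                                 $apply=set(SumStats::AVERAGE, SumStats::STD_DEV));

    SumStats::create([$name="dnp3-rtt-per-epoch",
                      $epoch=10sec,
                      $reducers=set(r1),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          local r = result["dnp3 rtt"];
                          total_obs += r$num;

                          # Mixed count/double arithmetic yields a double.
                          local n = r$num + 0.0;
                          # Half-width of the interval under the normal approximation.
                          local half = 1.96 * r$std_dev / sqrt(n);

                          print fmt("epoch n=%d (total so far %d) avg=%.6f +/- %.6f",
                                    r$num, total_obs, r$average, half);
                          }]);
    }
```

Because every epoch starts from scratch, the interval describes that epoch only; a confidence interval over the whole trace would need either a single epoch spanning the capture or separate bookkeeping.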

P.S. As I am now a faculty member and have included Bro in my teaching, I think that SumStats would be suitable for a class project as well.

Best regards,

Hui Lin