SumStats undocumented feature - changing the epoch

I’ve found it convenient to use an undocumented feature of SumStats: changing the epoch. This comes in particularly handy when creating statistics for human consumption, as it is often useful to synchronize to a logging interval. For example, if hourly stats are desired, it is useful to give the first sumstat a shorter epoch that ends on the hour, and then have subsequent sumstats trigger on the hour.

Looking into this, I realized that the epoch can be changed if the argument to SumStats::create is a variable rather than the usual anonymous record. Then, in epoch_result or epoch_finished, the timeout for the next epoch can be recomputed on the fly using calc_next_rotate().

However, this fails to work as expected, because the next epoch is scheduled before epoch_result and epoch_finished are executed. What does work is the following hack:

  1. Create the initial sumstat with an epoch that will synchronize to the logging interval
  2. Immediately change the epoch to the desired interval

Example:

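    # Declare the custom event so it can be scheduled from bro_init().
    global setup_sumstat: event();

    event bro_init()
        {
        # Defer setup so that network_time() will be initialized.
        schedule 0 usec { setup_sumstat() };
        }

    event setup_sumstat()
        {
        … blah …
        local mysumstat: SumStats::SumStat;
        mysumstat = [
            $name="mysumstat",
            # calc_next_rotate() already returns the interval remaining until
            # the next 10-minute boundary, so network_time() is not subtracted.
            $epoch=calc_next_rotate(10 min),
            etc…
        ];
        SumStats::create(mysumstat);

        # The SumStat has now been created and its initial epoch scheduled;
        # change the epoch to the regular interval for all later epochs.
        mysumstat$epoch = 10 min;
        }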
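
For reference, a self-contained version of the hack might look like the sketch below. The stream name "conn.count", the sumstat name "conn-per-10min", the SUM reducer, and the connection_established handler are illustrative assumptions standing in for the parts elided above; they are not taken from my actual script. It also assumes that calc_next_rotate() returns the interval remaining until the next boundary, and that the record passed to SumStats::create is kept by reference, which is the behavior the hack relies on.

```bro
@load base/frameworks/sumstats

# Custom event used to defer setup until network_time() is valid.
global setup_sumstat: event();

event bro_init()
    {
    schedule 0 usec { setup_sumstat() };
    }

event setup_sumstat()
    {
    # Hypothetical reducer: sum the observations on the "conn.count" stream.
    local r1 = SumStats::Reducer($stream="conn.count", $apply=set(SumStats::SUM));

    local ss: SumStats::SumStat = [
        $name="conn-per-10min",
        # First epoch: only the time left until the next 10-minute boundary.
        $epoch=calc_next_rotate(10 min),
        $reducers=set(r1),
        $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
            {
            print fmt("%s: %.0f connections", ts, result["conn.count"]$sum);
            }
    ];

    SumStats::create(ss);

    # The first epoch is already scheduled; later epochs use the full interval.
    ss$epoch = 10 min;
    }

event connection_established(c: connection)
    {
    # One observation per established connection, under a single empty key.
    SumStats::observe("conn.count", SumStats::Key(), SumStats::Observation($num=1));
    }
```

The first epoch is shorter than the rest, so the first result line covers a partial interval and should probably be treated differently from the later ones.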

It would be convenient if the epoch could be changed in epoch_result or epoch_finished, but some internals would need to change: the reschedule would have to take place after the results are processed, which could throw the timing off a bit. On the other hand, unless one is interested in exact statistics over a known time period (as I am), the small amount of jitter probably wouldn’t be noticeable or significant.

The above is horribly hackish. A different approach for accomplishing the goal would be to allow user scripts to schedule the end of the epoch:

  1. Mark epoch as &optional.

  2. Expose and document SumStats::finish_epoch as part of the public API

  3. Make the minor changes to not schedule SumStats::finish_epoch if epoch is undefined.

By not defining epoch, a script would indicate that it will manage epoch timing itself. The script would schedule the first epoch based on the logging interval and, in the epoch_finished function, schedule each successive epoch to stay in sync with the logging interval.

Any comments or suggestions?

Jim