In a previous email, I asked the question: Does anyone use sumstats outside of a cluster context?
In my case, the answer is yes, as I perform development on a laptop, and run bro standalone to test new bro policies.
I found several different bugs in share/bro/base/frameworks/sumstats/non-cluster.bro, specifically SumStats::process_epoch_results
- The policy returns sumstats results 50 at a time, and then reschedules itself for the next 50 after 0.01 seconds… Unfortunately, the reschedule is:
schedule 0.01 secs { process_epoch_result(ss, now, data1) };
instead of:
schedule 0.01 secs { **SumStats::**process_epoch_result(ss, now, data1) };
so it silently fails after the first 50 results. Would be nice to have a warning if a script schedules an event that doesn’t exist.
2.The serious issue with the policy, though, is that the ‘for’ loop over the result table is the main loop, with up to 50 items processed and deleted within the loop, the expectation being that the iteration will not thus be disturbed.
The attached program (hash_test.bro) demonstrates that this is not the case (should output 1000, 0, but the 2nd value comes back non-zero), in line with the documented caveat: “Currently, modifying a container’s membership while iterating over it may result in undefined behavior, so do not add or remove elements inside the loop.” I didn’t examine bro source code to appreciate the reason, but surmise that table resizing and rehashing would account for the issue.
The consequences of this issue are that, under certain circumstances:
- Not all data will be returned by SumStats at the epoch
- SumStats::finish_epoch may not be run.
To address the issue can be done via a rearrangement of the code, along the lines of the following pseudocode (boundary conditions, etc. ignored)
original (modifies table inside ‘for’ loop):
i=50;
for (foo in bar)
{
process bar[foo];
delete bar[foo];
–i;
if (i == 0) break;
}
to (modifies table outside ‘for’ loop):
i=50;
while (i >0)
{
for (foo in bar)
{
break;
}
process bar[foo];
delete bar[foo]
–i;
}
… there are a few other subtleties in the code (keeping a closure on the result table so that sumstats can clear the original table & proceed to the next epoch, and not running SumStats::finish_epoch if the result table was empty to begin with).
A bit of rearrangement fixes the bugs while preserving the original behavior, with the help of a wrapper event that checks for an empty result table, and if not makes an explicit copy for further processing by the actual event doing the work.
An additional ‘for’ loop around the result table could be used to keep it all in one event, but looks too much like black magic (and still, albeit probably in a safe way, depending on undefined behavior) - I prefer clear, understandable code (ha!), rather than “dark corner” features. Six months later, when I look at the code, I won’t be able to remember the clever trick I was using
Attached please find hash_test.bro & (patched) non-cluster.bro
Jim Mellander
ESNet
hash_test.bro (183 Bytes)
non-cluster.bro (2.15 KB)