Test timing measurements (Re: [Bro-Commits] [git/btest] topic/robin/timing: Adding a timing mode that records test execution times per host. (808fd76))

(Moving from bro-commits to bro-dev).

Instruction counts are probably going to have a strong dependency on
the compiler version / options used to generate the code. I believe
these counts could additionally be influenced by e.g. library
upgrades, even when restricted to a single host and using a specific
compiler / options.

True, but I'm not sure that's necessarily a bad thing. If the count
changes significantly, I'd say it's worth understanding where the
change is coming from. btest won't complain as long as deviations stay
within a reasonable range (1% by default; I don't know whether that's
the right value).

I'm also not sure whether instruction count is the right metric; there
are plenty of others one could measure, like cycles etc. I was just
thinking this might be the most stable.
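
Roughly, "within a reasonable range" just means a relative comparison
against the stored baseline; a quick sketch in Python (the function and
parameter names are illustrative, not btest's actual code):

    # Sketch: flag a measurement that deviates from the baseline by more
    # than a relative tolerance (1% here, matching the current default).
    def exceeds_tolerance(measured, baseline, tolerance=0.01):
        if baseline == 0:
            return measured != 0
        return abs(measured - baseline) / float(baseline) > tolerance

    # A 0.5% change stays quiet, a 3% change gets reported.
    exceeds_tolerance(100500000, 100000000)   # -> False
    exceeds_tolerance(103000000, 100000000)   # -> True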

One alternative approach to tracking IDs for timing baselines might be
to use system tools to gather a list of all libraries bro is linked
against.

A problem with this is that btest doesn't know about Bro. :-) The way
I'm doing it currently is that the instruction count is measured for
each BTEST-EXEC command that's part of a test, and the per-command
counts are then summed up into a single number. I'd like to keep it so
that btest can measure arbitrary command lines (which is part of the
challenge of finding a stable way of doing so ...).
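
Roughly, the per-test number comes together like this; just a sketch,
with the counting mechanism left as a placeholder (not btest's actual
code):

    # Sketch: run each BTEST-EXEC command of a test, measure it, and sum
    # the per-command instruction counts into one number for the baseline.
    def measure_test(commands, count_instructions):
        # count_instructions is a placeholder for whatever runs a single
        # command line and returns its instruction count.
        return sum(count_instructions(cmd) for cmd in commands)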

Additionally, formatting the temporary file in a human-readable way
and keeping it as part of / in addition to the baseline can yield
potentially useful information when looking into timing differences.

It's, more or less, human-readable:

    > cat Baseline/_Timing/2a6b457d90e93b6688f312f87f677c5c
    tests.m57-long 705347795206
    tests.ipv6 104508274160
    tests.m57-short 68458131160
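
The format is just one test name and its summed instruction count per
line, so it's also easy to read back programmatically; a sketch
(illustrative, not part of btest):

    # Sketch: parse a _Timing baseline file into {test name: count}.
    def read_baseline(path):
        counts = {}
        with open(path) as f:
            for line in f:
                name, value = line.split()
                counts[name] = int(value)
        return counts

    # e.g. read_baseline("Baseline/_Timing/2a6b457d90e93b6688f312f87f677c5c")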

What I'm mostly wondering about is whether it's worth committing data
that's very specific to a single user/machine to the repos?

Robin

(Moving from bro-commits to bro-dev).

Instruction counts are probably going to have a strong dependency on
the compiler version / options used to generate the code. I believe
these counts could additionally be influenced by e.g. library
upgrades, even when restricted to a single host and using a specific
compiler / options.

True, but I'm not sure that's necessarily a bad thing.

I don't think it's a bad thing either, exactly. When I wrote this, I was thinking it would make the results a little easier to interpret if code changes could be tested independently of system changes ... but you're right that the system *is* going to have an effect on how bro runs and should therefore be included in the benchmark.

I do think it's important to be able to isolate system-level vs. code-level changes, but just storing the commit should be enough to do that. I think this is already happening, so ... looks good to me.
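
Storing the commit could be as simple as asking git at baseline time; just an illustration, not something btest does today:

    # Sketch: record the current commit next to the timing baseline so
    # that code-level and system-level changes can be told apart later.
    import subprocess

    def current_commit():
        return subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()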

What I'm mostly wondering about is whether it's worth committing data that's very specific to a single user/machine to the repos?

From an academic standpoint, I think it could be useful to review these counts just to see how much they actually vary from platform to platform. Also, if the instruction counts *are* pretty consistent across platforms, then that might e.g. be an indication that we / the compiler aren't taking advantage of some more complex instructions that exist on a particular platform.

Additionally, given the number of people working on the project, this kind of data probably wouldn't take much space. It also seems like it'd be pretty easy to purge this data in the event it didn't end up being useful.

--Gilbert

Yeah, I think you're right; I've committed my baselines for the
external tests now. Thinking about this, maybe btest should also record
a short description of the platform along with that.
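
Something as simple as Python's platform module would probably do for
that; a sketch of what could be stored (not implemented yet):

    # Sketch: a short, human-readable platform description to store
    # alongside the timing baseline.
    import platform

    def platform_description():
        return "%s %s (%s)" % (platform.system(), platform.release(),
                               platform.machine())

    # e.g. "Linux 3.2.0-4-amd64 (x86_64)"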

Robin