effects of &synchronized and &mergeable

Hi,

I'm trying to write a script to count how many ICMP Destination
Unreachable messages hosts receive. To do that, I'm thinking of using a
table like the below and incrementing the value for each destination
unreachable message.

global icmp_too_many_destination_unreachable_table: table[addr] of count = {}
        &default=0
        &create_expire=icmp_too_many_destination_unreachable_window
        &synchronized
        &mergeable;
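
Roughly, I'm planning to do the increment from the ICMP handler, something
like this (just a sketch; I'm assuming Bro's icmp_unreachable event and that
resp_h is the host the unreachable message is addressed to):

    event icmp_unreachable(c: connection, icmp: icmp_conn, code: count, context: icmp_context)
        {
        # Count the message against the host it is addressed to.
        ++icmp_too_many_destination_unreachable_table[c$id$resp_h];
        }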

I'm a bit unclear about exactly what &synchronized and &mergeable do
though:

Is increment a single atomic operation or is it implemented as multiple
atomic operations (fetch, locally add one, store, return)? I.e. if two
cluster nodes do ++icmp_too_many_destination_unreachable_table[host] at
the same time for the same host, is the value guaranteed to be
incremented twice? Is it guaranteed that the value returned by the two
increments will be different?

If increment is atomic, is it still atomic when incrementing a default
value? I.e., if a host isn't in the table when two nodes simultaneously
increment its count, is the count always properly set to two? If a host
is in the table and one node deletes it while another node increments
it, is the resulting value always either 0 or 1, or can the value be
old_value + 1? Does it matter if the delete is because of &create_expire
or because of an explicit delete?

Is &mergeable necessary in this case? I couldn't figure out from the
documentation if &mergeable applies to the outer table or to its values
if those values are container types.

global icmp_too_many_destination_unreachable_table: table[addr] of count = {}
        &default=0
        &create_expire=icmp_too_many_destination_unreachable_window
        &synchronized
        &mergeable;

Short version: using &synchronized (without &mergeable, which isn't needed
here) should work, but there's a better solution coming up.

Longer version follows.

First, regarding your questions:

Is increment a single atomic operation or is it implemented as multiple
atomic operations (fetch, locally add one, store, return)?

Neither. :slight_smile: What happens is (fetch, locally add one, send "add one"
over to the other nodes, store, return). The other nodes receive "add
one" and replay that operation locally. In other words, each node
applies the same operation locally and will eventually reach the same
value (because all will see, e.g., two increments when two nodes do
that simultaneously), but they don't have a consistent view at all
times (because there's a delay in propagating the updates, and no
locking in place that would guarantee global consistency).

There are a number of design decisions/trade-offs behind this scheme; if
you're curious, the details are here: http://www.icir.org/robin/papers/acsac05.pdf
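
To illustrate what that means for your table (just a sketch of the
semantics, not of the actual code paths):

    # Two workers execute this for the same host h at about the same time:
    ++icmp_too_many_destination_unreachable_table[h];

    # Each applies its increment locally and forwards the "add one"
    # operation to its peers. For a short while the nodes may disagree
    # on the current value, but once both operations have propagated,
    # every node sees old_value + 2.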

If increment is atomic, is it still atomic when incrementing a default
value? I.e., if a host isn't in the table when two nodes simultaneously
increment its count, is the count always properly set to two?

Yes, likewise, because what's actually sent is two increments, and both
nodes will still start with the default.

If a host is in the table and one node deletes it while another node
increments it, is the resulting value always either 0 or 1, or can
the value be old_value + 1?

Now it's getting tricky and I'm not quite sure off the top of my head,
but I believe this leads to a race condition and depends on the order of
the operations (per the paper linked to above, we deliberately accept
race conditions and do a "best effort synchronization").

Does it matter if the delete is because of &create_expire or because
of an explicit delete?

It doesn't matter, IIRC ...

Is &mergeable necessary in this case? I couldn't figure out from the
documentation if &mergeable applies to the outer table or to its values
if those values are container types.

It's the latter. I'm impressed that you even got so far in figuring
that out. :slight_smile:

Second, two more notes:

    - in some sense &synchronized is a legacy mechanism. It works and
      is supported, but we're moving away from using it. One
      replacement is the upcoming "metrics framework", which is a
      general mechanism to measure/count "stuff". It will have cluster
      transparency built in that "just works" and should support your
      counting application nicely. Internally, that framework sends
      events around rather than using &synchronized. It's scheduled to
      be part of Bro 2.2.

    - we have been kicking around the idea of removing &synchronized
      completely. It has a number of drawbacks (the loose semantics
      and race conditions; a lack of control over which nodes get
      updates), and internally it's very complex to implement. The
      idea is to replace it with something simpler but better defined
      (like a distributed key-value store) that would be wrapped with
      script-layer frameworks to provide for easy use.

      But that's probably more than you wanted to know. :slight_smile:

Robin

If a host is in the table and one node deletes it while another node
increments it, is the resulting value always either 0 or 1, or can
the value be old_value + 1?

Now it's getting tricky and I'm not quite sure off the top of my head,
but I believe this leads to a race condition and depends on the order of
the operations (per the paper linked to above, we deliberately accept
race conditions and do a "best effort synchronization").

Was doing some code skimming and found that the "remote_check_sync_consistency" flag and "remote_state_inconsistency" event might be something that can at least be used to check if an operation has led to inconsistent state.
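
Something along these lines, I think (just a sketch based on my skimming;
the exact names and signatures should be double-checked against
init-bare.bro and event.bif):

    # Have Bro include the expected old value with each state operation so
    # that peers can detect when their copy has diverged.
    redef remote_check_sync_consistency = T;

    event remote_state_inconsistency(operation: string, id: string,
                                     expected_old: string, real_old: string)
        {
        print fmt("inconsistent state for %s during %s: expected %s, got %s",
                  id, operation, expected_old, real_old);
        }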

    Jon

[Taking to bro-dev]

    - we have been kicking around the idea of removing &synchronized
      completely. It has a number of drawbacks (the loose semantics
      and race conditions; a lack of control over which nodes get
      updates), and internally it's very complex to implement. The
      idea is to replace it with something simpler but better defined
      (like a distributed key-value store) that would be wrapped with
      script-layer frameworks to provide for easy use.

Seth and I have been mulling over this, and I'd be curious what others
think about this. If we'd remove the &synchronized stuff, we could
throw out a lot of C++-level code and complexity.

A distributed key-value store could probably be implemented simply as
input/output plugins, and with the upcoming sqlite interface we'd get
persistence built in there, too. That generally sounds quite appealing
to me. The main drawback is that I/O capabilities would no longer
directly map to Bro data structures; in particular, it's not possible
to keep references within non-atomic data types across the
communication channel. Roughly speaking, we could exchange what we can
currently log, but not more (i.e., no nested records, tables, etc.).
On the other hand, we could build script-level frameworks that get some
of that back transparently by rolling stuff out internally.
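
To make the restriction concrete with a made-up example:

    # A flat record like this is what we can log today, so it could go
    # through the new channel:
    type Endpoint: record {
        host: addr;
        svc: port;
        num_unreachables: count;
    };

    # This one couldn't, because it nests a container inside the record:
    type EndpointState: record {
        host: addr;
        seen_codes: table[count] of count;
    };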

We could even go a step further then and send events over that channel
as well. And that in turn might let us eventually remove all the
current communication code and replace it with something nicer, maybe
indeed an external library, as we've been discussing earlier already.

Robin

That's right (re the consistency check), but it's also more expensive
because Bro then includes the expected values in the communication.
Whether it matters depends on the communication volume.

Robin

Seth and I have been mulling over this, and I'd be curious what others
think about this. If we'd remove the &synchronized stuff, we could
throw out a lot of C++-level code and complexity.

Hmmmm. I've always liked that &synchronized gives us a general capability
rather than presupposing the nature of cross-Bro state coordination.
I take it your view is that we now have enough experience with clusters
to conclude that we aren't making full use of the generality, so we should
consider the maintenance/complexity gains we could achieve by removing it.
Is that the right way to summarize it?

What about for non-cluster distributed deployments? As I understand it,
LBL's "Deep Bro" vision is to coordinate Bros that are analyzing different
traffic streams (and with higher intercommunication latencies between those
Bros). One thing I'm wondering is whether that use-case might still benefit
from more general semantics.

We could even go a step further then and send events over that channel
as well. And that in turn might let us eventually remove all the
current communication code and replace it with something nicer, maybe
indeed an external library, as we've been discussing earlier already.

Here do you mean essentially do explicit synchronization rather than
implicit? Or do you mean changing the paradigm for how implicit
synchronization works?

    Vern

> If a host is in the table and one node deletes it while another node
> increments it, is the resulting value always either 0 or 1, or can
> the value be old_value + 1?

Now it's getting tricky and I'm not quite sure off the top of my head,
but I believe this leads to a race condition and depends on the order of
the operations (per the paper linked to above, we deliberately accept
race conditions and do a "best effort synchronization").

So if I understand correctly, there's a race condition where some nodes
have 0 and some have 1, but none have old_value + 1, right? I think 0
and 1 are close enough for this application that this should be fine.

    - in some sense &synchronized is a legacy mechanism. It works and
      is supported, but we're moving away from using it. One
      replacement is the upcoming "metrics framework", which is a
      general mechanism to measure/count "stuff". It will have cluster
      transparency built in that "just works" and should support your
      counting application nicely. Internally, that framework sends
      events around rather than using &synchronized. It's scheduled to
      be part of Bro 2.2.

Is it usable in a testing environment yet? Is the interface with
external scripts mostly stable? Where do you recommend I start reading
(code or documentation) to learn how to use it?

    - we have been kicking around the idea of removing &synchronized
      completely. It has a number of drawbacks (the loose semantics
      and race conditions; a lack of control over which nodes get
      updates), and internally it's very complex to implement. The
      idea is to replace it with something simpler but better defined
      (like a distributed key-value store) that would be wrapped with
      script-layer frameworks to provide for easy use.

It sounds like this is still in design stages, is that right?

      But that's probably more than you wanted to know. :slight_smile:

Not at all, thanks for the explanations!

I take it your view is that we now have enough experience with clusters
to conclude that we aren't making full use of the generality, so we should
consider the maintenance/complexity gains we could achieve by removing it.

While it's a general mechanism, it comes with its own limitations, in
particular there's no control with whom to synchronize; it's everybody
or nobody. That could be solved in principle but only at the expense
of further complexity.

But the real answer is: we aren't making much use of &synchronized as it
is:

    > grep -R '\&synchronized' scripts/
    scripts/policy/protocols/conn/known-hosts.bro: global known_hosts: set[addr] &create_expire=1day &synchronized &redef;
    scripts/policy/protocols/conn/known-services.bro: global known_services: set[addr, port] &create_expire=1day &synchronized;
    scripts/policy/protocols/ssl/known-certs.bro: global certs: set[addr, string] &create_expire=1day &synchronized &redef;
    scripts/policy/protocols/ssl/validate-certs.bro: &read_expire=5mins &synchronized &redef;
    scripts/policy/protocols/ssh/detect-bruteforcing.bro: &read_expire=guessing_timeout+1hr &synchronized &redef;
    scripts/base/frameworks/software/main.bro: &synchronized

(Note that all but one are in the optional "policy" set).

In other words, we are already implementing cluster synchronization
with events, not &synchronized.
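
The typical pattern looks roughly like this (sketched from memory against
the 2.x cluster framework; the module and event names are made up):

    @load base/frameworks/cluster

    module Example;

    export {
        # Workers raise this; the manager keeps the authoritative count.
        global unreachable_seen: event(host: addr);
    }

    # Have the cluster framework forward the event from workers to the manager.
    redef Cluster::worker2manager_events += /Example::unreachable_seen/;

    @if ( Cluster::local_node_type() == Cluster::MANAGER )
    global counts: table[addr] of count &default=0;

    event Example::unreachable_seen(host: addr)
        {
        ++counts[host];
        }
    @endif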

There's a conceptual change with 2.0 that makes &synchronized less
useful. Originally the attribute was meant for the user: by simply
attaching &synchronized to a table, things get taken care of. The new
2.0 frameworks however work at a higher level, with their own APIs
already hiding clusterization transparently internally. With that, the
focus is shifting from what helps the user to what helps the
frameworks.

That, along with the best-effort-only semantics of &synchronized and
its internal complexity, leaves me wondering whether the better long-term
strategy is something else.

What about for non-cluster distributed deployments? As I understand it,
LBL's "Deep Bro" vision is to coordinate Bros that are analyzing different
traffic streams

That's exactly where the current &synchronized becomes hard to use
because you can't select what state to exchange between which parts of
the deep-bro setup; the one-set-of-state-for-all doesn't really apply
anymore there.

One thing I'm wondering is whether that use-case might still benefit
from more general semantics.

I'm thinking to take out some of the generality that &synchronized
provides, but in return add some new flexibility/capabilities that we
currently don't have (better semantics, sharing of subsets of state,
persistence that's closely tied in).

Here are some further thoughts (mine; I don't know if this aligns with
what Seth wants ...):

I like the idea of having a transparent key-value store that's both
distributed and persistent. Scripts get an API to insert/delete values
indexed by strings, and Bro guarantees that they will show up everywhere
(we might even be able to do some strict form of global consistency
here; not sure). The master node keeps a persistent copy on disk that
survives restarts. Other frameworks can then use this new API to
distribute/store state.

Actually, it wouldn't be a single key-value store; scripts should be
able to create new, separate ones on demand. And they could specify
which nodes to sync each one with; or maybe other nodes could subscribe
to individual stores by their name. Maybe let's call the stores "views".
For example, in a tiered deep-cluster, a set of nodes monitoring a
subnet could use their own view that's not propagated to those for
other subnets (and we could extend that mechanism to events to share
them more selectively as well).
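
Purely hypothetical, but at the script level it could end up feeling
something like this (all names invented, just to sketch the idea):

    # Nothing like this exists yet; "View" and its functions are placeholders.
    event bro_init()
        {
        # Create (or attach to) a named store; the manager would keep a
        # persistent copy on disk.
        local subnet_view = View::create("subnet-10", [$persistent=T]);

        # Only nodes that subscribe to a view by name receive its updates.
        View::subscribe("subnet-10");

        View::put(subnet_view, "10.0.0.1/unreachables", "42");
        local n = View::lookup(subnet_view, "10.0.0.1/unreachables");
        }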

Here do you mean essentially do explicit synchronization rather than
implicit?

Yes, in terms of mechanism. However for most users it would still be
transparent as long as they use the standard frameworks. And if they
don't, they'd at least get a very intuitive/familiar key-value data
model.

Just brainstorming,

Robin

I like the idea of having a transparent key-value store that's both
distributed and persistent.

Oh, this might work, and it has the additional benefit of being very simple: easy to implement and easy to remember how it works. We'd basically be severely restricting what people can do so that we can do more stuff automatically.

Actually, it wouldn't be a single key-value store; scripts should be
able to create new, separate ones on demand. And they could specify
which nodes to sync each one with; or maybe other nodes could subscribe
to individual stores by their name. Maybe let's call the stores "views".

I like this, and it could be the first step toward the data distribution and persistence framework (data framework?) we were talking about. So far I had been having a hard time figuring out what this would look like, but I was probably trying to make it too complicated. If I think within the boundaries you are laying out in the proposal, I can imagine creating everything I want in script land.

I like it so far, I'll have to do a bit more thinking and maybe some example scripting to find edge cases where it might not work or be particularly burdensome.

  .Seth

While it's a general mechanism, it comes with its own limitations ...

Ah, I see. Thanks for sketching this. What you & Seth frame then seems
like a reasonable approach to me.

    Vern

So if I understand correctly, there's a race condition where some nodes
have 0 and some have 1, but none have old_value + 1, right? I think 0
and 1 are close enough for this application that this should be fine.

I'd describe it as: they all have "old_value + x", but for a while "x"
might differ per node. They all converge on the right "new_value" soon
though, with "new_value = old_value + <all increments>".

Is it usable in a testing environment yet? Is the interface with
external scripts mostly stable?

It looks like we're going to do one more iteration on the API exposed
to external scripts, likely soon. The current code is in
topic/seth/metrics-merge.

It sounds like this is still in design stages, is that right?

Yeah, actually it's even still at the "is this the right way to go
forward?" stage ... :slight_smile:

Robin

I've put some thoughts together here:

    http://www.bro-ids.org/development/projects/comm-ng.html

Still quite rough.

Robin