I take it your view is that we now have enough experience with clusters
to conclude that we aren't making full use of the generality, so we should
consider the maintenance/complexity gains we could achieve by removing it.
While it's a general mechanism, it comes with its own limitations; in
particular, there's no control over whom to synchronize with: it's
everybody or nobody. That could be solved in principle, but only at the
expense of further complexity.
But the real answer is that we aren't making much use of &synchronized:
> grep -R '\&synchronized' scripts/
scripts/policy/protocols/conn/known-hosts.bro: global known_hosts: set[addr] &create_expire=1day &synchronized &redef;
scripts/policy/protocols/conn/known-services.bro: global known_services: set[addr, port] &create_expire=1day &synchronized;
scripts/policy/protocols/ssl/known-certs.bro: global certs: set[addr, string] &create_expire=1day &synchronized &redef;
scripts/policy/protocols/ssl/validate-certs.bro: &read_expire=5mins &synchronized &redef;
scripts/policy/protocols/ssh/detect-bruteforcing.bro: &read_expire=guessing_timeout+1hr &synchronized &redef;
(Note that all but one are in the optional "policy" set.)
In other words, we are already implementing cluster synchronization
with events, not &synchronized.
There's a conceptual change in 2.0 that makes &synchronized less
useful. Originally, the attribute was meant for the user: by simply
attaching &synchronized to a table, everything gets taken care of. The
new 2.0 frameworks, however, work at a higher level, with their own APIs
already hiding the clustering internally. With that, the
focus is shifting from what helps the user to what helps the
That, along with the merely "best effort" semantics of &synchronized and
its internal complexity, leaves me wondering whether a different
long-term strategy would be better.
What about non-cluster distributed deployments? As I understand it,
LBL's "Deep Bro" vision is to coordinate Bros that are analyzing different
That's exactly where the current &synchronized becomes hard to use,
because you can't select which state to exchange between which parts of
the Deep Bro setup; the one-set-of-state-for-all model doesn't really apply.
One thing I'm wondering is whether that use case might still benefit
from more general semantics.
I'm thinking of taking out some of the generality that &synchronized
provides, but in return adding some new flexibility and capabilities that
we currently don't have (better semantics, sharing of subsets of state,
closely integrated persistence).
Here are some further thoughts (mine; I don't know if this aligns with
what Seth wants ...)
I like the idea of having a transparent key-value store that's both
distributed and persistent. Scripts get an API to insert/delete values
indexed by strings, and Bro guarantees that they will show up everywhere
(we might even be able to provide some strict form of global consistency
here; not sure). The master node keeps a persistent copy on disk that
survives restarts. Other frameworks can then use this new API to
Actually, it wouldn't be a single key-value store; scripts should be
able to create new, separate ones on demand and specify which nodes to
sync each one with. Alternatively, other nodes could subscribe to
individual stores by name. Maybe let's call the stores "views".
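To make the view idea concrete, here is a minimal Python sketch under several assumptions: a single master owns every view and rewrites a JSON file after each change, other nodes subscribe to a view by name and receive a snapshot followed by every later update, and all "replication" happens synchronously in one process. The classes `Master` and `Node` and their methods are hypothetical illustrations, not an existing Bro API; a real implementation would replicate over the network and would still have to settle the consistency question raised above.

```python
import json
import os
import tempfile


class Node:
    """Hypothetical Bro node holding local replicas of the views it subscribes to."""

    def __init__(self, name):
        self.name = name
        self.replicas = {}  # view name -> {key: value}

    def apply(self, view, op, key, value=None):
        # Apply one update pushed by the master to the local replica.
        replica = self.replicas.setdefault(view, {})
        if op == "insert":
            replica[key] = value
        else:  # op == "delete"
            replica.pop(key, None)


class Master:
    """Hypothetical master: owns every view, persists it, and forwards updates."""

    def __init__(self, state_dir):
        self.path = os.path.join(state_dir, "views.json")
        self.views = {}        # view name -> {key: value}
        self.subscribers = {}  # view name -> [Node, ...]
        if os.path.exists(self.path):
            # Restore persisted views after a restart.
            with open(self.path) as f:
                self.views = json.load(f)

    def _save(self):
        # Rewrite the full state after every change; simple, not efficient.
        with open(self.path, "w") as f:
            json.dump(self.views, f)

    def subscribe(self, view, node):
        # A new subscriber first gets a snapshot of the view's current contents.
        contents = self.views.setdefault(view, {})
        self.subscribers.setdefault(view, []).append(node)
        node.replicas.setdefault(view, {})
        for key, value in contents.items():
            node.apply(view, "insert", key, value)

    def insert(self, view, key, value):
        self.views.setdefault(view, {})[key] = value
        self._save()
        for node in self.subscribers.get(view, []):
            node.apply(view, "insert", key, value)

    def delete(self, view, key):
        self.views.get(view, {}).pop(key, None)
        self._save()
        for node in self.subscribers.get(view, []):
            node.apply(view, "delete", key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        master = Master(state_dir)
        w1, w2 = Node("worker-1"), Node("worker-2")
        master.subscribe("known_hosts", w1)
        master.insert("known_hosts", "10.0.0.1", True)
        master.subscribe("known_hosts", w2)  # late joiner receives a snapshot
        print(w2.replicas["known_hosts"])  # {'10.0.0.1': True}
        restarted = Master(state_dir)      # simulate a master restart
        print(restarted.views)             # {'known_hosts': {'10.0.0.1': True}}
```

Only nodes subscribed to a view ever see its updates, which is the selectivity &synchronized lacks, and a restarted master recovers its state from disk.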
For example, in a tiered deep-cluster, a set of nodes monitoring a
subnet could use their own view that's not propagated to those for
other subnets (and we could extend that mechanism to events to share
them more selectively as well).
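The same subscription idea could carry over to events. The Python sketch below is entirely hypothetical (`EventBus`, the group names, and the `scan_detected` event are made up for illustration): an event published to a group is delivered only to the nodes that joined that group, so each subnet tier stays isolated while an aggregator can join several groups.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical router that delivers events only to nodes in a named group."""

    def __init__(self):
        self.members = defaultdict(set)  # group name -> set of node names
        self.inbox = defaultdict(list)   # node name -> [(group, event, args), ...]

    def join(self, group, node):
        self.members[group].add(node)

    def publish(self, group, event, *args):
        # Unlike an everybody-or-nobody broadcast, only the group's members receive it.
        for node in sorted(self.members[group]):
            self.inbox[node].append((group, event, args))


if __name__ == "__main__":
    bus = EventBus()
    # Two subnet tiers, each with its own group; the aggregator joins both.
    for node in ("sensor-a1", "sensor-a2", "aggregator"):
        bus.join("subnet-a", node)
    for node in ("sensor-b1", "aggregator"):
        bus.join("subnet-b", node)
    bus.publish("subnet-a", "scan_detected", "192.168.1.7")
    print(bus.inbox["sensor-a2"])  # [('subnet-a', 'scan_detected', ('192.168.1.7',))]
    print(bus.inbox["sensor-b1"])  # []
```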
Here, do you essentially mean doing explicit synchronization rather than
Yes, in terms of mechanism. However, for most users it would still be
transparent as long as they use the standard frameworks. And if they
don't, they'd at least get a very intuitive/familiar key-value data