[desired broker api as oppose to whats in known-hosts.bro]

SO I came across a sample of Broker-API usage:

when (local res = Broker::exists(Cluster::cluster_store, Broker::data("known_hosts")))
  {
           local res_bool = Broker::refine_to_bool(res$result);
    if(res_bool)
    {
    when ( local res2 = Broker::lookup(Cluster::cluster_store, Broker::data("known_hosts")) )
      {
      local res2_bool = Broker::set_contains(res2$result, Broker::data(host));

      if(!res2_bool)
        {
        Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"), Broker::data(host));
        Log::write(Known::HOSTS_LOG, [$ts=network_time(), $host=host]);
        }
      }
    timeout 10sec
                  { print "timeout"; }
                 }
             }
             timeout 20sec
             { print "timeout"; }

Now this isn't too complicated but I find it cumbersome and one needs to understand execution flow since "when" is involved etc etc etc.

Yeah.. there's a lot of things wrong with how that is being done. There are a few things going on here.

One is that &synchronized is no longer functions. I think we should bring this back, it may not be in the form of &synchronized, but at least some way to create a simple data structure that is automatically kept in sync between nodes.

The other is that the api that known hosts is currently using is too high level:

Broker::exists(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::lookup(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::set_contains(res2$result, Broker::data(host))
Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"), Broker::data(host));

which in english is:

1. see if the known_hosts table exists (why would it not exist?)
2. transfer the entire known_hosts table over from the data node
3. see if it contains host
4. add host if not present

And (due to probably an oversight), it does this twice resulting in the known_hosts table being transferred twice.

This would work a lot better if it keep a persistent copy of the known_hosts set between calls and only updated it from the data node if the host wasn't found. The only downside there is that the entire table is still being copied between nodes instead of just updates.

To accomplish what known hosts really needs, which is just "Have I seen this host before", we could just do something like:

local added = Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"), Broker::data(host));
if(added) {
    # host did not previously exist in the set

The only problem in this case is there is no local cache to prevent the same host from being checked multiple times. That would require a local copy of the set, or like you said, a bloomfilter of sorts (probably one of those reverse bloomfilters that has false negatives but not false positives).

So, for the case of tracking things using a set across the cluster all one needs is a simple function that:

Checks to see if the item is in the local cache or bloom filter
Sends it over to the data node and inspects the response (new or duplicate)

Things get a little more complicated in that I want the ability to scale out the data nodes. So that means the slight variation:

Checks to see if the item is in the local cache or bloom filter
Sends it over to the data node that corresponds to the hash of the item and inspects the response (new or duplicate)

so from a users point of view, the Broker part of the function could just be

if(Broker::check_or_add_to_set("known_hosts", host)) {
    Log::write(Known::HOSTS_LOG, [$ts=network_time(), $host=host]);
}

Another way of writing this, which corresponds to your 'event based' approach is to just have the function instead do:

Check to see if the item is in the local cache or bloom filter
Send an event over to the data node that says a new host was potentially found.

For known hosts purposes, the data node doesn't even need to send anything back to the worker, it can just log it(or not).

It would help me think about this if you could outline some of your use cases for broker stores. I have a good idea of what needs to be done to fix known hosts/services/certs and sumstats/scan detection. But I don't know what things you have in mind :slight_smile:

The first step will be getting this down to (some version of) this
form instead:

    Broker::exists(Cluster::cluster_store, "known_hosts")
    result = Broker::lookup(Cluster::cluster_store, "known_hosts")
    Broker::set_contains(result, host)
    Broker::add_to_set(Cluster::cluster_store, "known_hosts"), host);

This should be relatively straight-forward to achieve once we have the
changes in that Matthias is currently working on.

And hopefully we can then also get rid of the "when" statement for
using these.

As reminder, this is the current proposal for next Broker steps:

    https://www.bro.org/development/projects/broker-lang-ext.html

Whether we want to go further, like with Aashish's &store proposal, we
can see then, I'm still a bit undecided there. Either way, a better
Bro-side UI for Broker is coming, we just need to get the various
pieces in place first that people are working on currently before we
can move forward.

Robin

I'm playing with rewriting some of the scripts for 2.6 right now with events because I think that the broker api is too complicated and could potentially have too many side effects. (like overuse of "when" for instance).

  .Seth

Yeah, that makes sense. We can switch back once we have something
better.

Robin