Broker data store API

TL;DR:

  - Does anyone use Broker's RocksDB backend?
  - Brief overview of the revamped data store frontend API

I've been working on the Broker data store API a bit, trying to come up
with the smallest common denominator possible for an initial release. So
far I have ported the in-memory and SQLite backends over. This made me
wonder: did anyone ever use (or want to use) the RocksDB backend in
production? I wonder if we can keep it out of Bro 2.5.

Regarding the API, here's a snippet that illustrates the user-facing
parts:

  // Setup an endpoint.
  context ctx;
  auto ep = ctx.spawn<blocking>();

  // Attach a master datastore with backend. The semantics of
  // "attaching" are open-or-create: if a master exists under the
  // given name, use it, otherwise create it.
  backend_options opts;
  opts["path"] = "/tmp/test.db";
  auto ds = ep.attach<master, sqlite>("foo", std::move(opts));
  if (!ds)
    std::terminate();

  // Perform some asynchronous operations.
  ds->put("foo", 4.2);
  ds->put(42, set{"x", "y", "z"});
  ds->remove(42, "z"); // data at key 42 is now {"x", "y"}
  ds->increment("foo", 1.7); // data at key "foo" is now 5.9

  // Add a value that expires after 10 seconds.
  ds->put("bar", 4.2, time::now() + std::chrono::seconds(10));

  // Get data in a blocking fashion.
  auto x = ds->get<blocking>("foo"); // Equivalent to get("foo"); the
                                     // blocking API is the default.

  // Get data in a non-blocking fashion. The function then() returns
  // immediately and one MUST NOT capture any variables on the stack by
  // reference in the callback. The runtime invokes the callback as soon
  // as the result has arrived.
  ds->get<nonblocking>("foo").then(
    [=](const data& d) {
      cout << "data at key 'foo': " << d << endl;
    },
    [=](const error& e) {
      if (e == ec::no_such_key)
          cout << "no such key: foo" << endl;
    }
  );

Here's another setup with two peering endpoints, one having a master and
one a clone (directly taken from the unit tests). This illustrates how
data stores and peering go hand in hand.

  context ctx;
  auto ep0 = ctx.spawn<blocking>();
  auto ep1 = ctx.spawn<blocking>();
  ep0.peer(ep1);
  auto m = ep0.attach<master, memory>("flaka");
  auto c = ep1.attach<clone>("flaka");
  REQUIRE(m);
  REQUIRE(c);
  c->put("foo", 4.2);
  std::this_thread::sleep_for(propagation_delay); // master -> clone
  auto v = c->get("foo");
  REQUIRE(v);
  CHECK_EQUAL(v, data{4.2});
  c->decrement("foo", 0.2);
  std::this_thread::sleep_for(propagation_delay); // master -> clone
  v = c->get("foo");
  REQUIRE(v);
  CHECK_EQUAL(v, data{4.0});

I think this API covers the most common use cases. It's always easy to
add functionality later, so my goal is to find the smallest common
denominator.

    Matthias

My recollection is that it was just nice to have an optional backend that users could choose, perhaps if they needed better performance relative to SQLite. But I probably took the time to try and get it working just as reassurance that the data store API could accommodate a variety of backends. Not sure about the choice of RocksDB in particular — could have just been that it happened to pop up on people’s radar at the right time.

Given those historical reasons for it existing, it would make sense to me if it were temporarily ignored or removed completely (unless there are people already invested in using it). Hope that helps.

- Jon

> Not sure about the choice of RocksDB in particular — could have just
> been that it happened to pop up on people’s radar at the right time.

It's certainly an industrial-strength key-value store, so I think it's a
solid choice for those who need better performance along with persistence.

> Given those historical reasons for it existing, it would make sense to
> me if it were temporarily ignored or removed completely (unless there
> are people already invested in using it).

My plan was to put it on hold for now, just to have fewer moving parts.
It's great that you already invested the time to understand the API and
come up with an implementation. Same for SQLite: it took me only a day
to convert your backend code and read up on SQLite here and there. I
would imagine it will be the same for RocksDB.

That said, adding backends is fortunately a quite mechanical task, and a
new backend is easy to ship in an incremental release. I'm curious to
find out what types of backends users would like to see and use once
they build broker-enabled applications.

    Matthias

I can't speak to whether or not it is 'needed', but I have had the desire to use it in the past... The only thing preventing me from doing so was the fact that Broker is currently a fast-moving target.

Generally speaking, I wanted to do it so that I could save state between cluster restarts (specifically for authentication data).

> I can't speak to whether or not it is 'needed', but I have had the
> desire to use it in the past... The only thing preventing me from
> doing so was the fact that Broker is currently a fast-moving target.

Good to know. Scott Campbell also uses the current version of Broker in
his projects and has mentioned the need for a scalable, high-performance
storage backend.

> Generally speaking, I wanted to do it so that I could save state
> between cluster restarts (specifically for authentication data).

How many keys do you anticipate in your data store? And what's the rate
of updates? Any ballpark estimate would be useful here.

Given the interest in a scalable backend, I will bring back support for
a RocksDB backend.

    Matthias

The number of key/value pairs would depend on the scale of the environment in the case of the authentication framework. In my last implementation... it was one record per user/host pair... which could scale into the tens of thousands of key/value pairs pretty quickly. I haven't looked at that stuff in a while since I'm eagerly awaiting your rewrite of the Broker APIs :-)