[Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)

I would keep the individual data services scalable but allow the user to specify their distribution across the data nodes. As Jon already wrote, it could look like this (I added Spam and Scan pools):

[data-1]
type = data
pools = Intel::pool

[data-2]
type = data
pools = Intel::pool, Scan::pool

[data-3]
type = data
pools = Scan::pool, Spam::pool

[data-4]
type = data
pools = Spam::pool

However, this approach likely results in confusing config files and, as Jon wrote, it's hard to define a default configuration. In the end this is an optimization problem: how do we assign data services (pools) to data nodes to get the best performance (in terms of speed, memory usage, and reliability)?

I guess there are two possible approaches:
1) Let the user do the optimization, i.e. provide a way to assign data services to data nodes as described above.
2) Let the developer specify constraints for the data service distribution across data nodes and automate the optimization. A minimal example would be to specify, for each data service, a minimum and maximum (or default) number of data nodes (e.g. Intel on 1-2 nodes and Scan detection on all available nodes). More complex specifications could require that a data service isn't scheduled on data nodes together with (particular) other services.
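
As an illustration only (the section and option names below are hypothetical, not an existing format), such constraints could be written in the same config-file style as above, with the actual assignment of pools to data nodes then computed automatically:

[Intel::pool]
min_nodes = 1
max_nodes = 2

[Scan::pool]
min_nodes = 1
max_nodes = all

[Spam::pool]
min_nodes = 1
not_with = Intel::pool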

Another thing that might need to be considered is deep clusters. If I remember correctly, there has been some work on that in the context of broker. For a deep cluster there might even be hierarchies of data nodes (e.g. root intel nodes managing the whole database and second-level data nodes serving as caches for worker nodes on a per-site level).

Jan

> 2) Let the developer specify constraints for the data service
> distribution across data nodes and automate the optimization. The
> minimal example would be that for each data service a minimum and
> maximum or default number of data nodes is specified (e.g. Intel on 1-2
> nodes and Scan detection on all available nodes). More complex
> specifications could require that a data service isn't scheduled on data
> nodes together with (particular) other services.

I like the idea of having some algorithm that can automatically allocate nodes into pools, and I think it could be done in a way that provides a sane default yet remains customizable enough for users, at least for the most common use-cases.

It seems so far we can roughly group the needs of script developers into two categories: they either have a data set that can trivially be partitioned across data nodes, or one that can't. The best we can provide for the latter is replication/redundancy, plus the option of exclusive/isolated use of a node or set of nodes.

An API that falls out from that is:

type Cluster::Pool: record {
  # mostly opaque...
};

type Cluster::PoolSpec: record {
  topic: string; # topic prefix to which the pool's nodes subscribe
  node_type: Cluster::node_type &default = Cluster::DATA;
  max_nodes: int &default = -1; # negative number means "all available nodes"
  exclusive: bool &default = F; # if T, don't share the pool's nodes with other pools
};

global Cluster::register_pool: function(spec: PoolSpec): Pool;

Example script-usage:

global Intel::pool: Cluster::Pool;
const Intel::max_pool_nodes = +2 &redef;
const Intel::use_exclusive_pool_nodes = F &redef;

const Intel::pool_spec = Cluster::PoolSpec(
  $topic = "bro/cluster/pool/intel",
  $max_nodes = Intel::max_pool_nodes,
  $exclusive = Intel::use_exclusive_pool_nodes
) &redef;

event bro_init() { Intel::pool = Cluster::register_pool(Intel::pool_spec); }

And other scripts would be similar except their default $max_nodes is still -1, using all available nodes.
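
For example, a hypothetical Scan script that keeps the defaults could be as simple as (Scan names are illustrative, not an existing script):

global Scan::pool: Cluster::Pool;

const Scan::pool_spec = Cluster::PoolSpec(
  $topic = "bro/cluster/pool/scan"
  # $max_nodes keeps its default of -1 (all available data nodes),
  # $exclusive keeps its default of F
) &redef;

event bro_init() { Scan::pool = Cluster::register_pool(Scan::pool_spec); }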

I think this also makes the user experience straightforward: the default configuration will always be functional, and the scaling procedure is still mostly "just add more data nodes" and occasionally either "toggle the $exclusive flag" or "increase $max_nodes", depending on the user's circumstances. The latter options don't necessarily address the fundamental scaling issue for the user completely, but it seems like the best we can do at this level of abstraction.

- Jon

Just a quick summary of key points of this thread related to cluster-layout, messaging patterns, and API (omitting some minor stuff from Robin’s initial feedback).

- "proxy" nodes will be renamed at a later point toward the end of the project
  ("proxy" actually makes sense to me, but "data" seems to have caught on
  so I'll go w/ that unless there are other suggestions)

- "data" nodes will connect within clusters differently than previous "proxy"
  nodes. Each worker connects to every data node. Data nodes do not connect
  with each other.

- instead of sending logs statically to Broker::log_topic, there will now be
  a "const Broker::log_topic_func = function(id: Log::ID, path: string) &redef"
  to better support multiple loggers and failover use-cases (see the sketch
  after this list)

- add new, explicit message routing or one-hop relaying (e.g. for the simple
  use-case of "broadcast from this worker to all workers")

- add a more flexible pool membership API to let scripters define their own data
  pool constraints that users can then customize (outlined in previous email)
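
Regarding the Broker::log_topic_func item, a minimal sketch of what a redef might look like (assuming the function returns the topic string; the topic prefix here is just an example):

redef Broker::log_topic_func = function(id: Log::ID, path: string): string
  {
  # A failover-aware setup could instead pick the topic based on which
  # logger nodes are currently reachable.
  return "bro/logs/" + path;
  };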

Let me know if I missed anything.

- Jon

Sounds good to me. We should probably label the new parts experimental
for now, as I'm sure we'll iterate some more as people get experience
with them.

Robin