send_id (Re: [Bro-Commits] [git/bro] topic/jsiwek/actor-system: Finish port of control framework to use broker. (8dddae1))

Jon, replacing send_id() may indeed work better with an extension at
the C++/Broker level. I'd like to avoid introducing new dependencies
on Bro's serialization code, as I'm very much hoping that once the old
communication code code goes we won't need that serialization layer
anymore either (I know we're using it for opaque values over Broker
too, but that's quite contained and should be easy to replace).

Some thoughts here:

     - I'm thinking the best approach may be a new Bro-specific
       message for Broker, similar to the log-create/write messages.
       We can add that to the Bro shim in Broker. It'd send the name
       of the ID and the new value as a broker::data instance, and the
       receiving Bro updates the value as the message comes in.

     - There's one larger problem with replacing send_id() though: the
       old communication system has logic to send large values
       incrementally, so that send_id() won't block stuff. As Seth
       reminded me the SumStats framework is relying on that quite
       extensively for sending tables around. Incremental operation is
       something we don't have with Broker. I think that's ok, we can
       replace the few existing use cases of sending large values with
       something else. For SumStats that should probably be data
       stores. I don't remember if there even are further instances of
       this, my guess is no (I don't think we need to consider
       broctl's configuration updating here; those values are small
       and a non-incrememtnal send_id() is fine).

     - Another approach for broctl's updating could be switching to
       the upcoming configuration framework, which takes a different
       approach to dynamic reconfiguration. However, it's a bit out
       still until we can that switch completely, so for now providing
       a substitute for send_id() that can cover the simple uses cases
       looks like the best way to go.

Robin

I believe that Robin meant the intel framework instead of sumstats. (Hopefully this avoids some confusion)

.Seth

I believe that Robin meant the intel framework instead of sumstats.
(Hopefully this avoids some confusion)

Thanks for the clarification! I was thinking about send_id() in context of the intel framework as well. As you might noticed, I enjoyed playing around with the intel framework :slight_smile: Thus, some questions to make sure I got everything correctly:

Jon, replacing send_id() may indeed work better with an extension at
the C++/Broker level. I'd like to avoid introducing new dependencies
on Bro's serialization code, as I'm very much hoping that once the old
communication code code goes we won't need that serialization layer
anymore either (I know we're using it for opaque values over Broker
too, but that's quite contained and should be easy to replace).

So sending opaque values will still be possible using broker, right?

      - There's one larger problem with replacing send_id() though: the
        old communication system has logic to send large values
        incrementally, so that send_id() won't block stuff. As Seth
        reminded me the SumStats framework is relying on that quite
        extensively for sending tables around. Incremental operation is
        something we don't have with Broker. I think that's ok, we can
        replace the few existing use cases of sending large values with
        something else. For SumStats that should probably be data
        stores.

As far as I understand the broker data stores (straight forward key-value stores), a data store does not fit for the intel framework, as it uses e.g. the patricia-trie implementation in tables to efficiently match subnets. Additionally I was thinking about using cuckoo-filters, implemented as opaque type, to further improve matching on workers.

However, the intel framework uses send_id() only initially to transfer the current min_data_store to newly connected workers. Every further update is handled "manually". Thus I guess there would be two options:
1. Sending all data at once. Maybe ok for that use case.
2. Sending stuff incrementally using some script-layer logic.
Am I right here?

Thanks,
Jan

Thanks for the clarification! I was thinking about send_id() in context
of the intel framework as well.

Yep, I meant Intel framework of course. :slight_smile:

So sending opaque values will still be possible using broker, right?

Yes, correct (one downside of opaque values is that only Bro itself
can send & receive them, for external Broker clients they will remain,
um, opaque :slight_smile:

As far as I understand the broker data stores (straight forward
key-value stores), a data store does not fit for the intel framework, as
it uses e.g. the patricia-trie implementation in tables to efficiently
match subnets.

Good point. Support for efficient subnet lookups is something we
should probably add to Broker stores at some point, but that's for the
future.

1. Sending all data at once. Maybe ok for that use case.

That would be ok for the new Intel client I think, but sending the
whole thing will put load on the sending Bro as well, which could be a
problem. It depends on the volume of the data of course, it's hard to
say where the limit is.

2. Sending stuff incrementally using some script-layer logic.

This might be the best way to go then. In the future I'd like to have
a script-level framework that offers some higher-level communication
patterns on top of Broker, like this one: "send large table safely".
For now, the Intel framework could implement that itself and then
maybe we can even reuse that implementation later by making it more
generic.

Robin

Jon, replacing send_id() may indeed work better with an extension at
the C++/Broker level.

    - I'm thinking the best approach may be a new Bro-specific
      message for Broker, similar to the log-create/write messages.
      We can add that to the Bro shim in Broker. It'd send the name
      of the ID and the new value as a broker::data instance, and the
      receiving Bro updates the value as the message comes in.

Sounds ok. Were you going to work on adding such a message or want me to try?

    - There's one larger problem with replacing send_id() though: the
      old communication system has logic to send large values
      incrementally, so that send_id() won't block stuff.

Thanks, I had missed that fact.

Generally for porting scripts to use Broker, I was probably going to end up doing the most straightforward thing I can think of to just get things to work and hope people follow commit messages well enough to call out anything questionable. Do you think there’s a better process for capturing high-level requirements earlier?

- Jon

Sounds ok. Were you going to work on adding such a message or want me to try?

I can work on it, but it'll probably take me a few days to get to it.
If can make progress with other stuff in the meantime, I'll do that;
otherwise give it a try if you want.

Generally for porting scripts to use Broker, I was probably going to
end up doing the most straightforward thing I can think of to just get
things to work and hope people follow commit messages well enough to
call out anything questionable.

That works for me. The downside for you might be a bit of wasted time
if discussion afterwards leads to a different approach. So feel free
to also ask for feedback upfront, whatever seems best. Either way,
getting it to work by applying the most straightforward approach is
definitely the right rule of thumb.

Robin

I agree, that seems to be the preferable solution in anticipation of a higher-level communication framework. I will keep following the broker development and give it a try.

Thanks,
Jan