[Proposal] Language extensions for better Broker support

Bro's current Broker framework has a few pretty inelegant API parts
because Bro's scripting language doesn't support some of its
operations well currently. I've put some thoughts together on
potential language extensions to improve the situation and come to a
nicer Broker framework API:

    https://www.bro.org/development/projects/broker-lang-ext.html

Feedback welcome, this is just a first draft.

Robin

Asynchronous executions without when: yes!

Was just talking to Vlad about this yesterday. The examples get even worse as soon as you need to do more than one broker operation in sequence. Something like this with when statements would be unmaintainable:

#Check to see if any of these keys exists
local v1 = Broker::lookup(h, 41);
local v2 = Broker::lookup(h, 42);
local v3 = Broker::lookup(h, 43);

if (v1 || v2 || v3) {
    Broker::set(h, ...)
}

Or I could see a trivial example like this for counting things per day:

event connection_established(c: connection)
{
  # not hardcoded ideally..
  Broker::inc(h, fmt("connections:2016-12-02:addr:%s, c$id$orig_h), 1);
  Broker::inc(h, fmt("connections:2016-12-02:addr:%s, c$id$resp_h), 1);
  Broker::inc(h, fmt("connections:2016-12-02:port:%s, c$id$orig_p), 1);
  Broker::inc(h, fmt("connections:2016-12-02:port:%s, c$id$resp_p), 1);
}

Having a way to send a batch of operations would be nice, but that's a separate issue :slight_smile:

Something like this with when statements would be
unmaintainable:

Yeah, exactly.

local v1 = Broker::lookup(h, 41);
local v2 = Broker::lookup(h, 42);
local v3 = Broker::lookup(h, 43);

Having a way to send a batch of operations would be nice, but that's a separate issue :slight_smile:

Good thought, those three lookups above could all proceed in parallel,
and then one would wait for them all to finish before continuing.

Robin

Looks like a big improvement, some ideas:

Not that syntax is super important to nail down right away, but to me, “v as T” syntax is simpler than "as<T>(v)”.

Is there a reason "type(v)” can’t be stored? I’d probably find it more intuitive if it could, else I can see myself forgetting or making mistakes related to that. Alternative to even providing “type(v)”, you could have a “v is T” operation and to use an explicit/new “typeswitch (v)” statement instead of re-using “switch (type(v))”.

Between the two switch syntaxes you gave, maybe provide both and allow mixing of syntax between cases. E.g.:

switch ( type(v) ) {
    case bool -> b:
        print “it’s a bool, and I need to inspect the value", b;
        break;

    case count:
        print “it’s a count, but I don’t care what the value is, I just wanted to know if it’s a count”;
        break;
}

"Asynchronous executions without when”: I’d go with an explicit keyword to denote when a handler will yield execution — it makes it easier to understand the behavior of a function at a glance without requiring a perfect memory for what functions cause execution to yield. Full co-routine support is a nice step to take and this is already similar enough that adding a keyword like “yield” at this point may make it easier to evolve/plan the language into having generalized co-routine support. It might even be useful to try and spec out the co-routine support now and see how this specific use-case fits in with that.

- Jon

Feedback welcome, this is just a first draft.

I like this. Some initial feedback:

  - In the switch statement, I would require that a user either provide
    a default case or fully enumerates all cases. Otherwise it's too
    easy to cause harder-to-debug run-time errors.

  - For the abbreviated switch statement, what do you think about this
    variation?

        switch ( type(v) ) {
            case b: bool
                print "bool", b;
                break;

            case s: set[int]
                print "set", s;
                break;

            case r: some_record
                print "record", r;
                break;

    Since we don't use the arrow operator anywhere and already declare
    variable type with colons as in Pascal, this could feel more natural
    to Bro.

  - Why do we need a string comparison here?
    
        if ( type(v) == string )

    Wouldn't it suffice to have

        if ( type(v) == X )

    where X is a type instance (e.g., addr, count, set[int]) or a
    user-defined type (e.g., connection)? In other words, I don't
    understand why equality comparison need to be between a type and a
    string.

- Uplevelling, why do we need type() in the first place? Can't we just
   overload the switch statement to work on types as well? Or do you
   have other use cases in mind?

- Regarding asynchronous execution:

     local h = async lookup_hostname(“www.icir.org”);

   I like the explicit keyword here to signal asynchrony. Ignoring the
   when statement for now, users rely on non-preempted execution both
   within a function and an event handler, and I would argue that many
   scripts are built around this assumption. When these semantics
   change, it could become harder to reason about the execution model.
   However, if we make timeouts mandatory, I woudn't mind dropping the
   async keyword.

    Matthias

I really like the suggested improvements and I agree to Jon and Matthias
regarding the use of "async" to make corresponding calls explicit. The
first thing I thought of was the get_current_packet() or the
get_current_packet_header() function. If there is some asynchronous
execution in context of a "new" packet, a transparent syntax might
obfuscate what is going on there. I also like the switch syntax Matthias
suggested.

Jan

Thanks for the feedback. I'm replying below to all three of you, as
it's all related. I'm hearing strong support for making asynchronous
calls explicit, which is fine with me.

“v as T” syntax is simpler than "as<T>(v)”.

I see one disadvantage with the "as" syntax: the whitespace will lead
to additional parentheses being required in some contexts. For
example, with T being a record: "(v as T)$foobar" vs
"as<T>(v)$foobar". Not necessarily a showstopper though. The "as"
syntax does feel more in line with other Bro syntax.

Are there other ideas to express casts?

Is there a reason "type(v)” can’t be stored? I’d probably find it
more intuitive if it could, else I can see myself forgetting or making
mistakes related to that.

Mind elaborating how you would expect to use that? My reasoning for
foregoing storage was that I couldn't really see much use for having a
type stored in variable in the first place because there aren't really
any operations that you could then use on that variable (other than
generic ones, like printing it).

  Alternative to even providing “type(v)”, you could have a “v is T”

I like that. That avoids the question of storing types altogether.

  operation and to use an explicit/new “typeswitch (v)” statement
  instead of re-using “switch (type(v))”.

Hmm ... I'd prefer not introduce a new switch statement. However,
seems the existing switch could just infer that it's maching types by
looking at case values: if they are types, it'd use "is" for
comparision.

    case bool -> b:
    case count:

Good point, I like that.

It might even be useful to try and spec out the co-routine support now
and see how this specific use-case fits in with that.

Need to think about that, would clearly be nice if we could pave the
way for that already.

  - In the switch statement, I would require that a user either provide
    a default case or fully enumerates all cases. Otherwise it's too
    easy to cause harder-to-debug run-time errors.

Full enumeration doesn't seem possible, that would mean all possible
types, no?

Are we requiring "default" for the standard switch statement? If so, I
agree it makes sense to do the same here (otherwise, not so sure,
because of consistency).

            case b: bool
            case s: set[int]
            case r: some_record

Interesting thought. I had this at first:

    case b: bool: ...
    case s: set[int]: ...
    case r: some_record: ...

That mimics the existing syntax more closely, but ends up being ugly
with the two colons. I kind of like your idea, even though it makes
the syntax a bit non-standard. However, I'm not sure the parser could
deal with that easily (because there's no clear separation between the
type and the following code). Also, this might not fit so well with
Jon's idea of making the identifier optional.

  - Why do we need a string comparison here?
    
        if ( type(v) == string )

I believe you misread this: "string" *is* a type.

- Uplevelling, why do we need type() in the first place? Can't we just
   overload the switch statement to work on types as well? Or do you
   have other use cases in mind?

My main other use case was offering if-style comparision as well for
types (see previous point). But if we do "is" for that, we indeed
wouldn't need the type() anymore.

   However, if we make timeouts mandatory, I woudn't mind dropping the
   async keyword.

I'm going back and forth on whether to make timeouts mandatory. I
think we need to have a default timeout in any case, we cannot have
calls linger around forever. But explicitly specifying one each time
takes away some of the simplicity. On the other hand, with a default
timeout we'd probably need to throw runtime errors instead of
returning something, and Bro isn't good with tons of runtime errors
(which could happen here).

The first thing I thought of was the get_current_packet() or the
get_current_packet_header() function.

Can you elaborate? These functions aren't asynchronous currently.
Would you change them to being so; and if so, what would that do?

That was just another example for a situation in which "hidden"
asynchronous execution could easily lead to unintended behavior. Calls
of get_current_packet() in context of the same event handler will return
different results in case the execution gets interrupted due to an async
call. But that's most likely an esoteric edge case. Sorry for the noise :slight_smile:

Jan

Ah, got it, I had misread what you meant. Thanks,

Robin

> Alternative to even providing “type(v)”, you could have a “v is T”

I like that. That avoids the question of storing types altogether.

I think we can cover all type-based dispatching with "is" and "switch."
In fact, I see "x is T" as syntactic sugar for:

function is(x: any) {
    switch(x) {
      default:
        return false;
      case T:
        return true;
    }
}

> It might even be useful to try and spec out the co-routine support now
> and see how this specific use-case fits in with that.

Need to think about that, would clearly be nice if we could pave the
way for that already.

+1

Full enumeration doesn't seem possible, that would mean all possible
types, no?

Yeah, not in a distributed systems, at least. I think I was coming from
C++ here where dispatching on type-safe unions requires complete
enumeration of all overloads. In Bro, we don't need that because we
cannot know the complete list of types a priori.

Are we requiring "default" for the standard switch statement? If so, I
agree it makes sense to do the same here (otherwise, not so sure,
because of consistency).

Sounds good to me.

> if ( type(v) == string )

I believe you misread this: "string" *is* a type.

Ah, now that makes sense!

> - Uplevelling, why do we need type() in the first place? Can't we just
> overload the switch statement to work on types as well? Or do you
> have other use cases in mind?

My main other use case was offering if-style comparision as well for
types (see previous point). But if we do "is" for that, we indeed
wouldn't need the type() anymore.

Good, I like "is" better than a type function because of readability.

But explicitly specifying one each time takes away some of the
simplicity.

On the other hand, with a default timeout we'd probably need to throw
runtime errors instead of returning something, and Bro isn't good with
tons of runtime errors (which could happen here).

How is a timeout different from a failed computation due to a different
reason (e.g., connection failure, store backend error)? I think we need
to consider errors as possible outcomes of *any* asynchronous operation.
More generally, we need well-defined semantics for composing
asynchronous computations. For example, what do we do here?

    # f :: T -> U
    # g :: T
    local x = async f(async g())?

One solution: if g fails, x should contain g's error---while f will
never execute. But in the common case of success, the user just wants to
work with x as an instance of T, e.g., when T = count:

    count y = 42;
    print x + y; # use x as type count

If x holds an error, then Bro would raise a runtime error in the
addition operator. Would that make sense?

    Matthias

"(v as T)$foobar” vs "as<T>(v)$foobar”.

Could just do some time trials to see which one people can type faster. There’s online typing speed tests that let you enter the test’s text. I consistently typed "(v as T)$foobar” faster.

Is there a reason "type(v)” can’t be stored?

Mind elaborating how you would expect to use that?

Was thinking I’d end up doing something like (more a personal habit/style thing):

  local t = type(v);

  if ( t == count )
    …
  else if ( t == int )
    ...

But this example came to mind only because the draft didn’t have “v is T”. Point seems moot now.

seems the existing switch could just infer that it's maching types by
looking at case values: if they are types, it'd use "is" for
comparision.

+1

- In the switch statement, I would require that a user either provide
   a default case or fully enumerates all cases. Otherwise it's too
   easy to cause harder-to-debug run-time errors.

Are we requiring "default" for the standard switch statement? If so, I
agree it makes sense to do the same here (otherwise, not so sure,
because of consistency).

The “default" case is optional. If it were required, I wouldn’t feel safer — maybe I don’t frequently make/see mistakes related to that and/or don’t find them hard to debug.

More often I’ve seen forgotten “break” at the end of a cases causing unintentional fallthroughs, but Bro does require explicit “break” or “fallthrough” statement to end cases.

- Jon

Sounds like agreement on most suggestions, I'll update the web page
shortly with the conclusions.

Couple further comments:

I consistently typed "(v as T)$foobar” faster.

Personally, I see this more as a question of readability (as opposed
to typeability :). But it's a matter of taste, and I'd be fine with
using "as" instead of "cast<>".

How is a timeout different from a failed computation due to a different
reason (e.g., connection failure, store backend error)?

I'm thinking such errors need to be addressed by the function itself,
using whatever mechanism it deems appropriate (e.g., signaling trouble
through its return value). The problem is that we don't have any
further error handling mechanisms (e.g., exceptions) in the language
right now (that's a project for some other time). I see timeouts as
different as that's something the function itself may have a hard time
catching internally, if at all; we couldn't rely on that. It's kind of
last line of defense against trouble: if for whatever reason the
function ends up never returning, we'll catch it and clean up at
least.

asynchronous operation. More generally, we need well-defined
semantics for composing asynchronous computations.

I think this would make sense only if we already had some model for
propagating errors as part of the language. In the absence of that, I
don't really see much to do here. For the normal (non-error) case,
return values are just passed along as expected. And in the abnormal
error case, there's not much else to do than abort the whole event
handler.

Robin

Personally, I see this more as a question of readability (as opposed
to typeability :). But it's a matter of taste, and I'd be fine with
using "as" instead of "cast<>".

Probably aligned with that thought is consistency and intuition: we
don't have C++-style angle brackets in Bro, so "as" feels more in line
with, e.g., type declarations of the form table[T] of U. Angle brackets
would feel like a one-off, whereas space-separated short keywords fit in
more naturally.

I'm thinking such errors need to be addressed by the function itself,
using whatever mechanism it deems appropriate (e.g., signaling trouble
through its return value). The problem is that we don't have any
further error handling mechanisms (e.g., exceptions) in the language
right now (that's a project for some other time). I see timeouts as
different as that's something the function itself may have a hard time
catching internally, if at all; we couldn't rely on that. It's kind of
last line of defense against trouble: if for whatever reason the
function ends up never returning, we'll catch it and clean up at
least.

Indeed, timeouts are a last line of defense and generated by the
runtime, as opposed to the function itself. You point out that these are
two separate kinds of errors.

However, from a user perspective, I think errors should be treated
uniformly. For example, when performing a Broker store lookup of a key,
but the returned value has a type different than expected, script
execution (in particular: a chain of asynchronous functions) should not
continue. The same thing happens for a timeout, even though the
runtime---as opposed to the actual function---generated the error.

The alternative strikes me as complicated, because it is application
dependent. If a response to an asynchronous request doesn't arrive
within a time bound, *the effect* might be equivalent to an error within
the function itself. An invalid cast might generated a runtime error to
the console (as Bro does now for use of uninitialized variables) and
abort, whereas a timeout would simply abort.

In Bro, one way to represent represent errors would be as a record with
an optional error field. If the field is set, an error took place and
one can examine it. Either the function returns one of these explicitly,
or the runtime (i.e., the interpreter) populates one.

Here's a strawman of what I mean:

    type result = record {
      error: failure &optional;
      value: any &optional;
    }

    function f(x: T) : result {
      if (store lookup with bad cast)
        return [$error=..];
      if (nothing to do)
        return ;
      if (computed something)
        return [$data=42];
    }

    local r = f(x);
    if (r$?error) {
      // handle error, check if timeout or some other form of error
      // occurred by inspecting the failure instance.
    }

Then, the runtime would allow for automatic chaining of asynchronous
functions returning instances of type result:

    function f(x: T) : result;
    function g(y: U) : result;

    // Calls g iff f did not fail and return value unpacking succeeded.
    local r = g(f(x));

Here, the runtime could do the unpacking of f's result, checking for
errors, and feeding the result data into g---assuming types work out.

In essence, this is a monad for a fixed set of types. The runtime
performs the unboxing of return values automatically.

    Matthias

In Bro, one way to represent represent errors would be as a record with
an optional error field. If the field is set, an error took place and
one can examine it. Either the function returns one of these explicitly,
or the runtime (i.e., the interpreter) populates one.

Here's a strawman of what I mean:

   type result = record {
     error: failure &optional;
     value: any &optional;
   }

That type of structure for error handling/propagation seems fine to me.
(Don’t see the advantage of using exceptions instead).

Then, the runtime would allow for automatic chaining of asynchronous
functions returning instances of type result:

   function f(x: T) : result;
   function g(y: U) : result;

   // Calls g iff f did not fail and return value unpacking succeeded.
   local r = g(f(x));

I might prefer just doing the unpacking myself. Having separate/explicit checks for each individual return value would naturally occur over time anyway (e.g. when debugging, adding extra logging, improving error handling).

How common do you expect async function chains to occur? Any specific examples in mind where auto-chaining is very helpful? Maybe it’s more of a nice-to-have feature than a requirement?

- Jon

I might prefer just doing the unpacking myself. Having
separate/explicit checks for each individual return value would
naturally occur over time anyway (e.g. when debugging, adding extra
logging, improving error handling).

How common do you expect async function chains to occur? Any specific
examples in mind where auto-chaining is very helpful? Maybe it’s more
of a nice-to-have feature than a requirement?

This is a bit of a chicken-and-egg problem: we don't have much use of
when at the moment, because it's difficult to nest. If we provide a
convenient means for composition from the get-go, I'd imagine we see a
quicker adoption of the new features.

The standard workflow I anticipate is a lookup-compute-update cycle.
Here's a half-baked example:

  function test(attempts: count): result {
    if (attempts > threshold)
      NOTICE(...);
    return [$value = attempts + 1]; // no error
  }

  function check(..) : result {
    local key = fmt("/outbound/%s", host);
    local r = async lookup(store, key);
    if (r?$error)
      return local?$error;
    return async put(store, key, failed_attempts + 1);
  }

With the proposed monad, you could factor the implementation of the
check function check into a single line:

  local r = put(store, key, test(lookup(store, key)));
  
Conceptually, this is equivalent to "binding" your function calls:

  local r = lookup(store, key))
        >>= test
        >>= function(x: count) { return put(store, key, x); }; // curry

...where >>= would be the bind operator performing error/value
unpacking. Less functional clutter with the "do" notation:

  do
    x1 <- lookup store key
    x2 <- test x1
    x3 <- put store key x2

The last two snippets derail the conversation a bit---just to show where
these ideas come from.

Note that the function "test" is synchronous in my example and that this
doesn't matter. The proposed error handling makes it straight-forward to
mix the two invocation styles.

    Matthias

For prototyping purposes, I see the convenience in that, but wonder if the runtime will do something that’s useful and widely applicable enough for that to translate directly into production code. What exactly does the runtime do if “lookup” fails here besides turn the outer function calls into no-ops?

Just guessing, but in many cases one would additionally want to log a warning and others where they even want to schedule the operation to retry at a later time. i.e. the treatment of the failure varies with the context. Is this type of composition flexible enough for that?

- Jon

> local r = put(store, key, test(lookup(store, key)));

For prototyping purposes, I see the convenience in that, but wonder if
the runtime will do something that’s useful and widely applicable
enough for that to translate directly into production code. What
exactly does the runtime do if “lookup” fails here besides turn the
outer function calls into no-ops?

The runtime doesn't do anything but error propagation. That's a plus in
my opinion, because it's simple to understand and efficient to
implement.

Just guessing, but in many cases one would additionally want to log a
warning and others where they even want to schedule the operation to
retry at a later time. i.e. the treatment of the failure varies with
the context. Is this type of composition flexible enough for that?

It's up to the user to check the result variable (here: r) and decide
what to do: abort, retry, continue, or report an error. Based on what
constitutes a self-contained unit in an algorithm, there are natural
points where one would try again. In the above example, perhaps at the
granularity of that put-test-lookup chain. To try again after a timeout,
I would imagine it as follows:

    local algorithm = function() : result {
      return put(store, key, test(lookup(store, key))) &timeout = 3 sec;
    };

    local r = result();
    do {
      r = algorithm();
    } while (r?$error && r$error == timeout);

Based on how flexible we design the type in r$error, it can represent
all sorts of errors. The runtime could abort for critical errors, but
give control back to the user when it's a recoverable one.

    Matthias

> type result = record {
> error: failure &optional;
> value: any &optional;
> }

I don't really like using a record like that, as that would associate
specific semantics with what's really a user-definable type. I.e.,
we'd hardcode into the language that the record used here needs to
have exactly these fields. On top of that, it also feels rather clumsy
to me as well. Indeed it's one of the things I don't like about the
current Broker framework API (which currently has to do it that way
just because there's no better mechanism). My draft proposal extends
opaque types to support conversion to boolean to allow for more
elegant error checks. We could generalize that to support other data
types as well, although I don't really see a need for going beyond
opaque right now (assuming we also add the cast operator, per the
proposal, so that we aren't passing anys around).

With the proposed monad, you could factor the implementation of the
check function check into a single line:

  local r = put(store, key, test(lookup(store, key)));

Honestly, I think this operates on a level that's quite beyond any
other concepts that the language currently offers, and therefore
doesn't really fit in. I'm pretty certain this wouldn't really enter
the "active working set" of pretty much any Bro user out there. For
the time being at least, explicit error checking seems good/convinient
enough to me.

r$error == timeout

Based on how flexible we design the type in r$error, it can represent
all sorts of errors.

That's a piece that I like: Being able to flag different error types.
Maybe the conversion to bool that I proposed originally should really
be a conversion to a dedicated error type, so that one can
differentiate what happened. I need to think about it, but that could
eliminate the need for simply aborting the event handler on timeout
(which in turn would help with the problem that Bro isn't good at
aborting code)

Robin

I don't really like using a record like that, as that would associate
specific semantics with what's really a user-definable type.

It was only meant to illustrate the idea of error handling and function
composition. These ideas still hold up when substituting the
user-defined result type with the proper language-level construct.

We could generalize that to support other data types as well, although
I don't really see a need for going beyond opaque right now

Going beyond opaque would have the advantage of applying a uniform error
handling strategy to both synchronous and asynchronous code. That's not
needed right now, because "when" doesn't make it easy to handle errors,
but it could be an opportunity to provide a unified mechanism for a
currently neglected aspect of the Bro language.

Maybe the conversion to bool that I proposed originally should really
be a conversion to a dedicated error type, so that one can
differentiate what happened.

I like that.

    Matthias

I like that too. Having nicely generalized error handling in Bro would be such a huge benefit for script authors.

.Seth