Pattern matching for the Bro language

TL;DR:

    function f() : any;

    local result = "";
    # "x" in the cases below names the value returned by f().
    switch( f() )
      {
      case addr:
        if ( x in 10.0.0.0/8 )
          result = "got it!";
      case string:
        result = "f() failed: " + x;
      }

I want to propose introducing pattern matching for the Bro language.
Pattern matching is a powerful concept, found especially in
functional languages such as Haskell, ML, Erlang, and Rust. It
enables typesafe dispatching based on the type of a value. Other
languages often can go beyond type-based dispatching and also enable
"value" dispatch. We *kinda* have this with the when statement in an
asynchronous form, which monitors a given expression value, and whenever
the operands change, the expression is re-evaluated.

But, let's get back to type-based dispatch and "any". The "any" type is
really just a bolt-on fix for the lack of a more sophisticated type
system. We use (and abuse) it anywhere where we need polymorphism and
want to bypass the type system. Today, Bro doesn't have generic
programming facilities besides "any". I hope this will change in the
future; introducing pattern matching is the first step in this
direction.

I believe we will see more and more asynchronous operations in Bro,
in particular with the proliferation of Broker. This requires better
language support, for example when users store data remotely and need
to wait for an answer. The asynchrony often introduces sum types:
either the result comes back or an error occurs. The above example is
such a sum type: either an addr or a string. If "x" has neither type,
Bro would raise an error at runtime. Here's another example:

    function lookup(key: string) : any;

    when ( local x = lookup("key") )
      {
      local result = "";
      switch( x )
        {
        case addr:
          if ( x in 10.0.0.0/8 )
            result = "contained";
        case string:
          result = "error: lookup() failed: " + x;
        }
      }

When we ask a store for data, the runtime doesn't know the type until it
gets a result back. Because there can be multiple return types, "switch"
provides a means to extract the value in a type-safe manner.
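
For comparison, here is a rough sketch of the same either-or result in
Rust, one of the languages mentioned above. The names LookupResult and
describe are made up for illustration, and the first-octet check is a
stand-in for the 10.0.0.0/8 subnet test:

```rust
use std::net::Ipv4Addr;

// Hypothetical result of a remote lookup: either an address or an
// error message. This plays the role of the "addr or string" sum type.
enum LookupResult {
    Addr(Ipv4Addr),
    Error(String),
}

// Dispatch on the variant. Each arm sees the value with its concrete type.
fn describe(x: LookupResult) -> String {
    match x {
        LookupResult::Addr(a) => {
            // An IPv4 address is in 10.0.0.0/8 exactly when its first octet is 10.
            if a.octets()[0] == 10 {
                "contained".to_string()
            } else {
                String::new()
            }
        }
        LookupResult::Error(msg) => format!("error: lookup() failed: {}", msg),
    }
}
```

Unlike the "any"-based version, leaving out one of the two arms here is
rejected at compile time rather than surfacing as a runtime error.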

Some languages (Ruby comes to mind) design switch as an expression,
which would allow constructs like:

      local result = switch( x )
        {
        case T:
        case U:
        };

Personally, I like this functional treatment, but C-seasoned folks may
have a harder time with it.

If you have any thoughts on this, please chime in.

    Matthias

> I want to propose introducing pattern matching for the Bro language.

Per our discussion yesterday, I like this notion in general. (Seems we
need a better term for it, though, as "pattern matching" is very generic,
and it will confuse some people who'll think it refers to NIDS rules
rather than generic type safety!)

> Some languages (Ruby comes to mind) design switch as an expression,
> which would allow constructs like:
>
>       local result = switch( x )
>         {
>         case T:
>         case U:
>         };

Personally, this strikes me as a tad weird, since now "result" might not
have a statically determined type, so we're back to it being "any".
So I'd want to wait on going this far until we have use cases where
it clearly would help with code clarity.

    Vern

>       local result = switch( x )
>         {
>         case T:
>         case U:
>         };

> Personally, this strikes me as a tad weird, since now "result" might not
> have a statically determined type, so we're back to it being "any".

To avoid falling back to "any land," the additional constraint in this
case would be that each case block would have to have a return statement
with the same type.
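
As a point of comparison, and not as a suggestion for Bro syntax: Rust's
match is an expression, and its compiler enforces exactly this
constraint, so the result always has a statically known type. A minimal
sketch, where Value and name_of are hypothetical names:

```rust
// Hypothetical stand-in for a value that is either a T or a U.
enum Value {
    T(u64),
    U(bool),
}

// "match" is an expression: every arm must yield the same type
// (&'static str here), so the function's result type is fixed.
fn name_of(x: &Value) -> &'static str {
    match x {
        Value::T(_) => "T",
        Value::U(_) => "U",
    }
}
```

If one arm produced, say, a number instead of a string, the compiler
would reject the function, which is the behavior the constraint above
asks for.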

The use case I had in mind is returning from a function.

    function f(x: any) : string
        {
        return switch(x)
            {
            case T:
                return "T";
            case U:
                return "U";
            }
        }

Though that's simply syntactic sugar for:

    function f(x: any) : string
        {
        local result = "";
        switch(x)
            {
            case T:
                result = "T";
            case U:
                result = "U";
            }
        return result;
        }

I don't feel very strongly about it.

    Matthias

Had discussed this with Matthias before, but for the record: I like
it, too. :-) (This form; less the one with return values, at least for
now).

As one additional note, even with this added, we wouldn't otherwise
extend the operations that are allowed on "any" instances. Right now,
there's actually not much one can do with them, and it would stay that
way to avoid people starting to generally skip the typing system
(e.g., one cannot assign an "any" to another "any"; more generally,
one cannot pass them around arbitrarily). The "switch" is for using
"any" safely in cases where it cannot be avoided (which is primarily
bifs with return values that cannot be statically typed).

    Robin

I like this proposal a lot too.

  .Seth