attributes & named types

For some scripting I'm doing, I'm running into the problem that
attributes associated with named types don't get propagated. For
example,

  type a: table[count] of count &default = 5;
  global c: a;
  print c[9];

complains that c[9] isn't set, instead of returning the value 5.

Having dived[*] into this and examined some potential fixes, I've identified
that the problem is that (1) the attribute "&default = 5" in the above
example gets associated with the identifier 'a' rather than with a's type,
and (2) when the parser processes the second line, early in the process 'a'
gets converted to its underlying type, with the attributes lost at that
point since, internally, BroType's do not have attributes.

This is a pain to deal with. If we simply add attributes to BroType's and
for statements like the first line associate the attributes with the type,
then a sequence like:

  type a: table[count] of count &default = 5;
  type b: a &read_expire = 1 min;

will wind up changing the attributes associated with 'a' even though it
really shouldn't.

I think the right solution for this is to introduce a new BroType called
a NamedType. A NamedType has an identifier, an underlying BroType, and a
set of attributes. When a NamedType is constructed, for its attributes
it combines both the explicit attributes in its declaration (like the
&read_expire for 'b' above) and the implicit (i.e., it inherits/copies the
&default from 'a').

I plan to implement this soon, so please speak up if you have thoughts.

    Vern

[*] The dive also exposed some other bugs in attribute processing, which
    I'll enter into the tracker shortly.

I think the right solution for this is to introduce a new BroType called
a NamedType. A NamedType has an identifier, an underlying BroType, and a
set of attributes. When a NamedType is constructed ...

Huh, turns out there's already a "TypeType", which is the equivalent.
All I need to do, I believe, is allow these to have attributes.

    Vern

My recollection is TypeType is the type that a value gets when the
stored-value is actually just a type and doesn't quite fit what's
needed here.

Take the original example:

        type a: table[count] of count &default = 5;
        type b: a &read_expire = 1 min;

Declaring a variable of type 'a' or type 'b' doesn't mean that the
values stored in that variable are types themselves, they're still
just storing values of the table-type (but with different attributes
depending on 'a' or 'b').

Other ideas I'm having for solving the overall problem are:

(1) 'a' and 'b' need to actually be distinct BroType's. Instead of
'b' being a reference/alias to 'a' with extended attributes, just
create it as it's own BroType by first cloning 'a', then adding any
additional attributes.

(2) BroType could somehow store a mapping of identifier -> attributes
so that on declaring a variable of either 'a' or 'b', we can choose
which attributes apply. But this is relying on the idea that you have
to push the desired attributes into each new Val because you can't
mutate the underlying the BroType since multiple Vals of either type
'a' or 'b' are going to be referencing it.

- Jon

Other ideas I'm having for solving the overall problem are:

(1) 'a' and 'b' need to actually be distinct BroType's. Instead of
'b' being a reference/alias to 'a' with extended attributes, just
create it as it's own BroType by first cloning 'a', then adding any
additional attributes.

I originally went down this route, but this involves adding attributes to
all types, since currently types don't have attributes. Seems like a poor
fit given there's only one class of type (named ones) that need this.

(2) BroType could somehow store a mapping of identifier -> attributes
so that on declaring a variable of either 'a' or 'b', we can choose
which attributes apply.

That seems more hacky than (1), as it involves adding still more to
types-in-general.

Seems simplest to me to have TypeType's (and only those) include attributes.
The rest of it is easy to do from there. We could also do this with a
separate NameType, but I don't see what that gains, since TypeType's already
come into existence because of binding names to types.

    Vern

My understanding was just that TypeType's currently are *not* used
when creating type aliases. Example:

# "mytype" is an ID w/ type "count" (i.e. it's a type alias).
# It's not using "TypeType" at this point.
type mytype: count;

function foo(x: any)
    { print x; }

# Passing the type name/alias as a value.
# The local variable/argument 'x' becomes of type
# "TypeType". It's not of type "count".
foo(mytype);

# Here, 'y' and 'x' are now actually type "count".
# They aren't a "TypeType".
local y: mytype = 3;
foo(y);

At least that's how I think it's currently working, so are you going
start using TypeType as a means of type aliasing in addition to adding
attributes to them?

Just trying to clarify/understand how things currently work vs. what is planned.

- Jon

My understanding was just that TypeType's currently are *not* used
when creating type aliases.

Correct.

# Passing the type name/alias as a value.
# The local variable/argument 'x' becomes of type
# "TypeType". It's not of type "count".
foo(mytype);

(Note that here any attributes are lost, since there's currently no
place for TypeType's to hold them.)

At least that's how I think it's currently working, so are you going
start using TypeType as a means of type aliasing in addition to adding
attributes to them?

Not quite. Rather, the type of "mytype" would be a TypeType, which would
have attributes. The TypeType instance however would not know that it
belongs to "mytype" (just as is currently the case).

We could continue to support a call like "foo(mytype)" above by hoisting
the base type (what the TypeType points to) when evaluating the expression
"mytype", just as currently the identifier gets turned into a TypeType at
that point. That said, just what is the use for calls like "foo(mytype)"
anyway? Seems a bit peculiar, but maybe I'm missing something.

    Vern

> At least that's how I think it's currently working, so are you going
> start using TypeType as a means of type aliasing in addition to adding
> attributes to them?

Not quite. Rather, the type of "mytype" would be a TypeType, which would
have attributes. The TypeType instance however would not know that it
belongs to "mytype" (just as is currently the case).

Ok, think I got it now (and agree it seems like a possible way forward
regarding attributes).

We could continue to support a call like "foo(mytype)" above by hoisting
the base type (what the TypeType points to) when evaluating the expression
"mytype", just as currently the identifier gets turned into a TypeType at
that point. That said, just what is the use for calls like "foo(mytype)"
anyway? Seems a bit peculiar, but maybe I'm missing something.

The actual usage I know is from Log::create_stream(), like this one:

  Log::create_stream(DPD::LOG, [$columns=Info, $path="dpd"]);

There, the Info is a type that's being stored into the $columns record
field of type 'any'. It's not exactly the same as "passed as function
argument" example I gave, but same idea.

I think TypeType was added particularly for this logging framework usage.

- Jon

Hmmmm I've looked into this and there are some subtle issues.

First, I tried to make this work using TypeType's like I had sketched, and
it turns out to be a mess. Too many points where a decision has to be
made whether to access the base type (what the named type points to) rather
than the TypeType itself.

I then had an Aha and realized it can instead be done in the grammar, by
associating different semantics with resolving type names depending on the
context in which they appear. I have this working. It's pretty simple, too.

HOWEVER: running on the test suite points up an issue I hadn't anticipated.
We have attributes associated with named types that currently aren't expected
to propagate. One example is from share/bro/base/init-bare.bro:

  ## A connection's identifying 4-tuple of endpoints and ports.

To better understand the existing behavior, here’s the commit that introduced this (specifically with regards to conn_id): https://github.com/bro/bro/commit/38a1aa5a346d10de32f9b40e0869cdb48a98974b

The &log keyword now operates as discussed:

    - When associated with individual record fields, it defines them
      as being logged.

    - When associated with a complete record type, it defines all fields
      to be logged.

    - When associated with a record extension, it defines all added
      fields to be logged.

    Note that for nested record types, the inner fields must likewise
    be declared with &log. Consequently, conn_id is now declared with
    &log in bro.init.

I think the discussion this is referring to is here: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2011-March/001107.html

(2) This makes me wonder about adding an operator to remove an
attribute if present. For example, you could imagine wanting
to now do something like:

type my_conn_info: record {
id: conn_id -&log;

};

as a way of specifying “if conn_id’s have a &log attribute,
I don’t want to inherit it”.

I’ve found myself wishing to remove an attribute recently, so this train of thought is relevant. I had imagined something slightly different, which was to maintain &log as it currently exists, but to also be able to explicitly set it to T or F, e.g.:

id: conn_id &log=F;

That would allow me to also be able to use redefs to configure whether or not I want to log something:

const log_conn = T &redef;

id: conn_id &log=log_conn;

I think that if we add something like this for &log, it might make sense to add it for other keywords too.

–Vlad

Thanks for the pointers & thoughts! A quick question, more in a bit:

To better understand the existing behavior, here's the commit that
introduced this (specifically with regards to conn_id):
&log keyword, and vector logging. · bro/bro@38a1aa5 · GitHub
...
> Note that for nested record types, the inner fields must likewise
> be declared with &log. Consequently, conn_id is now declared with
> &log in bro.init.

Does your understanding of this accord with the current behavior when
running on testing/btest/scripts/base/frameworks/logging/attr.bro ?
The test suite result has it not logging Log$id, even though it's of
type conn_id, which has &log. (For my new version, it does log it.)

    Vern

Thanks for the pointers & thoughts! A quick question, more in a bit:

To better understand the existing behavior, here’s the commit that
introduced this (specifically with regards to conn_id):
https://github.com/bro/bro/commit/38a1aa5a346d10de32f9b40e0869cdb48a98974b

Note that for nested record types, the inner fields must likewise
be declared with &log. Consequently, conn_id is now declared with
&log in bro.init.

Does your understanding of this accord with the current behavior when
running on testing/btest/scripts/base/frameworks/logging/attr.bro ?
The test suite result has it not logging Log$id, even though it’s of
type conn_id, which has &log. (For my new version, it does log it.)

Hmm. I had to think about this for a bit, and I think it does agree with the commit message. It’s rather subtle, but because the message talks about how the fields “must likewise be declared with &log,” I can see how the expectation would be that both the conn_id declaration in init-bare and the usage in your record need to have the &log keyword to be logged. However, before reading that commit message, that was not my expectation for how Bro would behave.

I’ve been playing around with this a bit more, and I think that what’s described in the commit message is not the current behavior. Specifically, the following seem to behave the same:

type conn_id: record {
orig_h: addr;
orig_p: port;
resp_h: addr;
resp_p: port;
} &log;

type conn_id: record {
orig_h: addr &log;
orig_p: port &log;
resp_h: addr &log;
resp_p: port &log;
};

This example demonstrates that all fields are still logged: http://try.bro.org/#/trybro/saved/275829

In my mind, if the keyword is applied to a record, I would expect any new fields added to that record to also be logged. However, if I use conn_id as defined in init-bare (with &log applied to the record), and I redef conn_id as follows, it will not log the new field:

redef record conn_id += {
nolog: bool &optional;
}

I believe that applying &log to a record is just shorthand to applying it individually to all fields on that record, whenever you define or redef that record.

Simply put, I think the current behavior is not correct, and that we should take this opportunity to determine what the behavior should be.

–Vlad

I believe the reason for not doing that is that then one couldn't add
a field that's *not* being logged (because currently we don't have
remove-an-attribute support).

I like the "&log=T|F" syntax to control this more directly, as long as
"&log" remains being equivalent to "&log=T".

Generally we need to be very careful changing if we want to change any
current semantics here, as it will impact custom log files that people
create in their own scripts.

Robin

Yeah, I think the reasoning makes sense, and that seemed to be the consensus from the discussion on bro-dev in 2011. My point is simply that with the current behavior, it’s not clear (or, AFAICT, documented) that adding &log to a record is just a shorthand for adding &log to each attribute, and that it really has no meaning for the record as a whole.

–Vlad

Thanks a bunch for the further context & discussion.

The more I've delved into this, the more convinced I've become that we have
a basic architectural problem with attributes: they're associated with
identifiers and values, but not types ... *except* for hacky ways for
records and record fields.

My alternative implementation for type names fares a bit better with the
example you gave at Try Zeek ... but still
gives counterintuitive behavior when I introduce a minor variant (I'll
spare you the details), with the problem being that a subsequent use of
&log (rather than the use when a record is declared) isn't propagated to
the record's individual fields.

I could I guess add code to do that propagation ... but testing further,
none of this fixes the original problem that I cared about, which is
to be able to declare types with &default values for tables, ala BIT-248:

  type str_tbl: table[string] of string &default="";

Here the problem is that the only opportunity to associate a &default
attribute with a table is when instantiating a table value. It doesn't
work if str_tbl is instead used to define a record field, similar to the
lack of propagation for &log.

I think what we need to do is rethink the basic architecture/structure of
attributes. In particular, types in general (not just named types) should
be able to have attributes associated with them. The attributes associated
with an identifier are those associated with its type plus those directly
associated with the identifier (like &redef).

While doing this, we can also think about mechanisms for removing attributes.
I don't think the "&attr=F" approach mentioned earlier on this thread will do
the trick, since it's syntactically/semantically quite weird for attributes
that already take expressions as values, such as &default or &read_expire.

    Vern

I think what we need to do is rethink the basic architecture/structure of
attributes. In particular, types in general (not just named types) should
be able to have attributes associated with them. The attributes associated
with an identifier are those associated with its type plus those directly
associated with the identifier (like &redef).

Sounds worth pursuing. I think this was also one of the routes
originally offered, but not sure if it actually got attempted or there
were other complications. Hard to remember all the twists this issue
has taken, but you probably have the freshest view of things at the
moment to decide which way to try going.

While doing this, we can also think about mechanisms for removing attributes.
I don't think the "&attr=F" approach mentioned earlier on this thread will do
the trick, since it's syntactically/semantically quite weird for attributes
that already take expressions as values, such as &default or &read_expire.

Yeah, attr removal seems to warrant its own unique syntax. But might
help to just review which attrs one may actually want to remove (or
even make sense to remove) -- seems like it's only &log in the first
place so maybe doesn't warrant a generalized mechanism ?

A related idea I haven't thought through: how about providing a BIF
that does attr removal/modification? Actually seems more powerful to
be able to change attributes at runtime rather than just parse-time.

Another thought/worry that may or may not be valid for generalized
attr remove/modification: seems there may be opportunity to create
non-sensical states. e.g. the sequence of (1) create a value of the
type "foo" which initially has attr &bar, (2) later remove &bar from
type foo, (3) are the existing values of type foo still coherent now
that they lack &bar ? Obviously made up the type/attr, but probably
have to think that sequence through for each existing attribute to
make sure behavior is well-defined for each.

- Jon

Oh, I like that too and it dovetails very nicely with the existing syntax. I remember those discussions we had about &log back then and I remember being the one that pushed for it but even then I didn't feel particularly comfortable with it (as I recall you feeling).

   .Seth