IPv6 literal addr constants

Representing compressed-hex IPv6 addresses (replacing consecutive fields of zeros with ::) in scripts as literal constants can be ambiguous with identifiers that use "::" as a namespace resolver.

For example, the lexer will treat "aaaa::bbbb" as the "bbbb" identifier in the "aaaa" namespace/module. Specifically, this is the case where someone tries to write an address that uses only the first and last 16-bit fields and the first nibble of both fields is a letter.

I think this would be uncommon, but it may not be obvious what is wrong when someone actually runs into it, though it's easy to work around once you know what's going on. Any ideas for fixing the ambiguity, or does it seem reasonable to just document it?

+Jon
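
The ambiguity described above can be reproduced outside of Bro. The following is a minimal sketch, assuming a simplified scoped-identifier pattern (SCOPED_ID and readings are hypothetical names, not taken from scan.l) and using Python's standard ipaddress module as the address check:

```python
import ipaddress
import re

# Hypothetical, simplified pattern for a module-scoped identifier
# such as "aaaa::bbbb" (assumption: not the actual scan.l rule).
SCOPED_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)+")


def readings(token):
    """Return the set of ways a token could be interpreted."""
    found = set()
    if SCOPED_ID.fullmatch(token):
        found.add("identifier")
    try:
        ipaddress.IPv6Address(token)
        found.add("address")
    except ValueError:
        pass
    return found


for tok in ("aaaa::bbbb", "2607::bbbb", "aaaa::1", "GLOBAL::x"):
    print(tok, sorted(readings(tok)))
```

Only tokens whose fields on both sides of "::" start with a letter and consist solely of hex digits come back with both readings, which matches the narrow case described in the message.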

The example that I found yesterday was a607:f8b0::/32 (I get an error message from bro, "unknown identifier a607"). If I write it
as a607:f8b0::0:0:0:0:0/32, then I still get the same
error message. Writing it without a double colon
a607:f8b0:0:0:0:0:0:0/32 seems to work.

If the first digit is in the range 0-9 (and not in
the range a-f), then bro does not complain (such
as 2607:f8b0::/32).

-Daniel

Right, the current rule in scan.l for compressed hex notation looks for the first nibble to be a digit and not a letter. That's fixable, but as I was testing more potential address formats I ran into the ambiguity I mentioned before which doesn't look like it's so easy to work around.

+Jon
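
To illustrate why a digit-first rule produces the behavior Daniel reported, here is a rough sketch. The patterns are assumptions written for this example and are not the real scan.l rules; FULL, COMPRESSED_DIGIT_FIRST, and lexes_as_address are hypothetical names, and prefix lengths such as /32 are left out.

```python
import re

H = r"[0-9A-Fa-f]{1,4}"  # one 16-bit field in hex

# Full eight-field form: any hex digit may come first.
FULL = re.compile(rf"{H}(?::{H}){{7}}")

# Hypothetical compressed form that insists on a leading decimal digit,
# mimicking the behavior described in the thread (not the real rule).
# It does not check that the total number of fields is valid.
COMPRESSED_DIGIT_FIRST = re.compile(
    rf"[0-9][0-9A-Fa-f]{{0,3}}(?::{H})*::(?:{H}(?::{H})*)?"
)


def lexes_as_address(text):
    """True if either sketch pattern accepts the whole string."""
    return bool(FULL.fullmatch(text) or COMPRESSED_DIGIT_FIRST.fullmatch(text))


for text in (
    "a607:f8b0::",
    "a607:f8b0::0:0:0:0:0",
    "a607:f8b0:0:0:0:0:0:0",
    "2607:f8b0::",
):
    print(text, lexes_as_address(text))
```

Under these assumptions the two compressed forms starting with "a" are rejected, while the fully written-out form and the form starting with "2" are accepted.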

I don't recall the genesis of this rule (which I probably added a long
time ago), but it could be that starting with a digit is intentional,
because otherwise examples like the one given earlier of aaaa::bbbb are
fully ambiguous. With the rule, you can write 0aaaa::bbbb and it will
(should!) parse correctly. Maybe that's too ugly. I'm not sure there's
a better fix, however.

    Vern

Could we just disallow module names that could be interpreted as addresses?

  .Seth

Yeee-uck! "Identifiers begin with a letter followed by zero or more digits
or letters. However, they must include at least one letter in the range g-z."

    Vern
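
For a sense of what such a restriction would rule out, here is a small sketch. It assumes the check is simply "could this name be read as a single 16-bit hex field"; HEX_FIELD and could_be_address_field are hypothetical names used only for this illustration.

```python
import re

# One to four hex digits: the shape of a single IPv6 address field.
HEX_FIELD = re.compile(r"[0-9A-Fa-f]{1,4}")


def could_be_address_field(name):
    """True if an identifier could also be read as one hex field."""
    return HEX_FIELD.fullmatch(name) is not None


for name in ("aaaa", "face", "dead", "cafe", "HTTP", "added"):
    print(name, could_be_address_field(name))
```

Short names made up only of the letters a-f would be affected; anything longer than four characters or containing a letter in g-z would not.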

How about enclosing IPv6 literals in brackets, e.g. [aaaa::bbbb]? As
with URLs this would also allow IPv6 addresses with ports, e.g.
[2620:83:8000:102::c9]:22.

    Craig
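
As a rough illustration of the bracketed form with an optional URL-style port, here is a sketch in Python. BRACKETED and parse_bracketed are hypothetical names, and this is not how Bro's lexer would implement it; the address itself is validated with the standard ipaddress module.

```python
import ipaddress
import re

# "[" address "]" optionally followed by ":" and a decimal port.
BRACKETED = re.compile(r"\[([0-9A-Fa-f:.]+)\](?::([0-9]{1,5}))?")


def parse_bracketed(text):
    """Split '[addr]' or '[addr]:port' into (IPv6Address, port or None)."""
    m = BRACKETED.fullmatch(text)
    if m is None:
        raise ValueError(f"not a bracketed IPv6 literal: {text!r}")
    addr = ipaddress.IPv6Address(m.group(1))  # raises ValueError if invalid
    port = int(m.group(2)) if m.group(2) is not None else None
    if port is not None and port > 65535:
        raise ValueError(f"port out of range: {port}")
    return addr, port


print(parse_bracketed("[aaaa::bbbb]"))
print(parse_bracketed("[2620:83:8000:102::c9]:22"))
```

Because the brackets delimit the address, the trailing ":22" can no longer be confused with another address field.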

I think the right fix would be not having the lexer make the decision,
but do it later when we can say whether there's an identifier with
that name. But that's not easy to do. :(

Robin

I like that; it's a well-defined, standard syntax.

Robin

I like it too, but won't it be yet another problem for differentiating between record definition, tuple definition, and now IPv6 definition?

  .Seth

Me too.

That's a bad idea as well. If you have a typo in your identifier, bro won't complain anymore and will just assume it's an IPv6 literal.

cu
gregor

+1

That is a good point. I'm not fully sure but I believe it should be
less of a problem than with the current syntax. Most of the other
usages have some characters in there that aren't valid inside an
address (or the other way round: the address' ':' and '::' aren't
valid in them). But the trick is to define the lexer so that it picks
the addresses but leaves the other ones alone ...

Robin

Yeah, I think the patterns for bracketed IPv6 literals are going to be specific enough to not be ambiguous with the other uses of brackets.

Sounds like enough people like that syntax, so I'll add it. But what should we do with the old syntax for IPv6 literals? Should it be removed at this time, deprecated until 2.2, or kept indefinitely?

+Jon
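
A sketch of how a bracketed-address pattern can stay clear of other bracket uses is shown below. The pattern is an assumption made for illustration (BRACKETED_V6 and is_bracketed_v6 are hypothetical names) and is not the rule that was added to scan.l; embedded IPv4 forms are left out.

```python
import re

H = r"[0-9A-Fa-f]{1,4}"  # one 16-bit field in hex

BRACKETED_V6 = re.compile(
    rf"\[(?:{H}(?::{H}){{7}}"                      # eight full fields
    rf"|(?:{H}(?::{H})*)?::(?:{H}(?::{H})*)?)\]"   # compressed with "::"
)


def is_bracketed_v6(text):
    """True if the whole string looks like a bracketed IPv6 literal."""
    return BRACKETED_V6.fullmatch(text) is not None


for text in ("[aaaa::bbbb]", "[::1]", "[$a=1, $b=2]", "[a, b]", "[idx]"):
    print(text, is_bracketed_v6(text))
```

Record constructors, tuples, and plain index expressions contain characters such as "$", "=", "," or spaces, or lack the colons an address needs, so they fall through. Under this sketch, a bracketed scoped identifier made up only of hex-like parts would still match; the thread does not say how the real lexer treats that case.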

I vote for just removing. Now is the one time where we can break IPv6
stuff to make it better.

Robin

ACK. Let's break it now!