more set operators? (equality/subset)

I've implemented the set operators we discussed, and am now developing
test cases.

In the process, I'm finding that it would be handy to have set equality
and subset/superset testing. These would be (for type equivalence sets):

  s1 == s2 iff both sets have exactly the same members

  s1 < s2 iff every element in s1 is in s2, but s2 has some
           elements not in s1

  analogous <=, >, >=, != operators

These can already be implemented in terms of existing operators
(if I haven't screwed something up):

  s1 == s2 <=> |s1 & s2| == |s1 | s2|
  s1 < s2 <=> |s1 & s2| == |s1| && |s1| < |s2|
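
A quick way to sanity-check the two identities above is to brute-force them over every pair of subsets of a small universe. The sketch below uses Python's built-in sets as a stand-in for Bro sets; the helper names (`eq_via_sizes`, `proper_subset_via_sizes`, `identities_hold`) are made up for illustration and are not part of Bro.

```python
from itertools import combinations

def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def eq_via_sizes(s1, s2):
    # s1 == s2  <=>  |s1 & s2| == |s1 | s2|
    return len(s1 & s2) == len(s1 | s2)

def proper_subset_via_sizes(s1, s2):
    # s1 < s2  <=>  |s1 & s2| == |s1| && |s1| < |s2|
    return len(s1 & s2) == len(s1) and len(s1) < len(s2)

def identities_hold(universe):
    """Compare both size-based tests against Python's own == and < on sets."""
    all_sets = list(subsets(universe))
    return all(
        eq_via_sizes(a, b) == (a == b)
        and proper_subset_via_sizes(a, b) == (a < b)
        for a in all_sets
        for b in all_sets
    )

print(identities_hold({1, 2, 3, 4}))
```

This only checks sets of simple values, where membership is unambiguous; it says nothing about sets whose elements are records.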

Any concerns with adding these too?

    Vern

> s1 == s2 iff both sets have exactly the same members
>
> s1 < s2 iff every element in s1 is in s2, but s2 has some
>          elements not in s1
>
> [...]
>
> Any concerns with adding these too?

I actually have a small question when thinking about these - which I
should already have raised about the intersect operators. What happens
when sets contain records or other complex types in these cases?

From what I can tell, Bro so far refuses to compare records, the reason
being (I think) that we do not have comparison operators properly
implemented internally. For example, the following script:

type A: record {
  a: string;
};

event bro_init()
  {
  local i = A($a="a");
  local j = A($a="a");
  print i == j;
  }

outputs:

$ bro test.bro
error in ./test.bro, line 9: illegal comparison (i == j)

I assume what will at the moment happen with sets is that the pointers of
records are checked for equality, not the content, which is arguably a bit
non-intuitive.

As I said this is more of a concern about the already added operators - in
principle I don't have a problem with ==, but I think it should work for
other complex datatypes too.

Johanna

> I assume what will at the moment happen with sets is that the pointers of
> records are checked for equality

Specifically, in my branch they are checked for whether the composite hash
index matches. Happily, this works:

  type A: record {
    a: string;
  };

  event bro_init()
    {
    local i = A($a="a");
    local j = A($a="a");
    print set(i) | set(j);
    }

which, when run, prints

  {
  [a=a]
  }

and if you change j to be $a="b" then you get:

  {
  [a=b],
  [a=a]
  }

This in fact suggests we could implement record equality by converting the
records to hash indices and then comparing those.
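
The idea can be sketched outside Bro as follows: flatten each record into a key built from its field values, recursing into nested records, and call two records equal when their keys match. This is Python, not Bro, and it does not reproduce Bro's composite hash; the `Record` class and the `index_key` and `records_equal` helpers are hypothetical names used only for illustration.

```python
class Record:
    """Stand-in for a record value. Like a raw pointer comparison,
    Python's default == on such objects compares identity, not content."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def index_key(value):
    """Flatten a (possibly nested) record into a tuple of (field, key) pairs."""
    if isinstance(value, Record):
        return tuple((name, index_key(v)) for name, v in vars(value).items())
    return value

def records_equal(r1, r2):
    """Content equality: compare the flattened keys rather than the objects."""
    return index_key(r1) == index_key(r2)

i = Record(a="a")
j = Record(a="a")
print(i == j)               # identity comparison
print(records_equal(i, j))  # content comparison via keys

# Nested records are handled by the recursion in index_key.
outer1 = Record(id=1, inner=Record(a="a"))
outer2 = Record(id=1, inner=Record(a="b"))
print(records_equal(outer1, outer2))
```

The sketch assumes every field value is either a record or a simple value with well-defined equality; container-valued fields such as sets are not covered.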

    Vern

Oh, neat. If that actually works in all cases (so also with records of
records, etc) I would be totally on board with this. Hash equality was
something that I missed a few times :).

Johanna

> Oh, neat. If that actually works in all cases (so also with records of
> records, etc)

Well, it almost does. I tried it with records that contain records and
that's fine. For records that contain sets, it often works in my testing,
but not always, evidently due to the randomized hash keying, since I can
make it go away by always loading the same seeds.

The same problem occurs with set deletion: deleting from a set of
records-containing-sets sometimes fails to delete an element that's
indeed in the set. (Hmmm, and we also don't support sets of sets, which
seems like a natural thing to have.)

I think the right answer for this is to have some sort of canonical ordering
for hash keys. Seems like a pain given the need to also randomize hash
keys. I'll file a ticket, but won't aim to fix it this go-around.
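
One possible reading of "canonical ordering" is sketched below in Python: if the key for a set-valued field is built by walking the members in whatever order the set happens to yield them, two equal sets can produce different keys; sorting the members first makes the key independent of traversal order while the hash itself stays seeded. This is an assumption about the shape of the problem, not a description of Bro's hashing code. The function names are hypothetical, the keyed hash is `hashlib.blake2b`, and the NUL-separated encoding assumes members contain no NUL bytes.

```python
import hashlib

def keyed_digest(data: bytes, seed: bytes) -> bytes:
    """Seeded hash; a different seed gives different digests for the same data."""
    return hashlib.blake2b(data, key=seed, digest_size=16).digest()

def key_in_traversal_order(members, seed):
    """Order-sensitive: hashes members exactly in the order they are supplied."""
    return keyed_digest(b"\x00".join(m.encode() for m in members), seed)

def key_canonical(members, seed):
    """Order-independent: sorts the encoded members before hashing."""
    return keyed_digest(b"\x00".join(sorted(m.encode() for m in members)), seed)

seed = b"per-run random seed"
first = ["x", "y", "z"]   # the same set members ...
second = ["z", "x", "y"]  # ... visited in a different order

print(key_in_traversal_order(first, seed) == key_in_traversal_order(second, seed))
print(key_canonical(first, seed) == key_canonical(second, seed))
```

The seed still randomizes the resulting key, so canonical ordering and keyed hashing are not in conflict in this sketch; the cost is a sort on every key computation.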

    Vern