serialization problems

I've been having a heck of a time porting the istate.events unit test to the policy-scripts-new branch. I understand the script changes that need to be made, but the comparison the test does between the serialized events from each Bro instance has started to fail: the two instances report differing argument values in some places, and I'm trying to understand why.

I think the difference isn't a result of the new policy scripts themselves, but probably just that they're exercising some part of the serialization code that wasn't exercised before. Here's a simpler example script I wrote that (I think) shows the same kind of thing I was running into with the new http scripts:

---------------- event_serialize_test.bro ----------------

type I: record {
    method: string;
    cnt: string &default="";
};

type S: record {
    pending: table[count] of I;
};

type C: record {
    somefield: string &default="blah";
    state: S;
};

global E: event(c: C);

event E(c: C)
    {
    print c;
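    # Modify the record after printing it; records are passed by reference,
    # so the second invocation of E sees the updated values.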
    c$state$pending[1]$method = "after event";
    c$state$pending[1]$cnt += "*";
    }

event bro_init()
    {
    capture_events("events.bst");
    local c: C;
    local i: I;
    c$state$pending[1] = i;
    c$state$pending[1]$method = "by init";
    event E(c);
    event E(c);
    }

---------------- event_serialize_test.bro ----------------

Here's some output that looks ok:

$ ./src/bro event_serialize_test.bro
[somefield=blah, state=[pending={
[1] = [method=by init, cnt=]
}]]
[somefield=blah, state=[pending={
[1] = [method=after event, cnt=*]
}]]

But here are the serialized events:

$ ./src/bro -x events.bst event_serialize_test.bro
Event [1308772552.798098] E([somefield="blah", state=[pending={[1] = [method="by init", cnt=""]}]])
Event [1308772552.798098] E([somefield="blah", state=[pending={[1] = [method="by init", cnt=""]}]])

So the value of the 'pending' table doesn't seem right to me for the second serialization of event E.

After enabling the serialization debug logs, I think what I'm seeing is that the first event is serialized with the full table value, but the second event is serialized with just a reference to the first's even though that value has changed.

Does this seem like a problem or am I not really on the right track?

- Jon

> After enabling the serialization debug logs, I think what I'm seeing
> is that the first event is serialized with the full table value, but
> the second event is serialized with just a reference to the first's
> even though that value has changed.

Yes, that's exactly what's happening. Generally, the serialization
framework sends an object only the first time, and from then on sends
just references to it (i.e., unique IDs).

While for your example this is not ideal, it's a trade-off with three
other objectives: (1) keeping the volume manageable (always sending
everything would quickly become a lot; and it's actually not easy to
find what has changed since last time); (2) maintaining correct
references where they are needed (think about two record types A and B
both having a subfield of record type C; now separate instances of A
and B can reference the *same* instance of C, and if we send A and B
to a peer, that structure needs to be recreated over there); and (3)
making this all work with remote peers that come and go during
run-time as they please ...
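
To make (2) concrete, here's a quick standalone sketch (separate from
your test script; the type names just mirror the A/B/C above) of two
record types sharing one instance of a third:

type C: record {
    n: count &default=0;
};

type A: record {
    c: C;
};

type B: record {
    c: C;
};

event bro_init()
    {
    local shared: C;
    local a: A = [$c=shared];
    local b: B = [$c=shared];

    # a$c and b$c alias the same instance of C, so a peer receiving
    # both records has to recreate that sharing on its side.
    shared$n = 42;
    print a$c$n, b$c$n;    # both print 42
    }

If a and b are both sent to a peer, the receiver needs to end up with
a$c and b$c pointing at the same value again, and the per-object IDs
are what make that reconstruction possible.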

Also, the use-case you describe is rare in actual Bro scripts: it
involves a single record instance that's passed into two events and
modified in between. (Well, I *believe* it's still rare; with scripts
getting more complex these days, that may be changing ...).

And note that for Bro-to-Bro communication, record modifications are
normally covered by using &synchronized. For that, state operations
are sent across the channel and then replayed on the other side to
reflect the update.
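
A minimal sketch of what that usage looks like (the table and the
event handler here are just an example):

# Updates to this table are propagated to peers as state operations
# and replayed on their side to reflect the change.
global hosts_seen: table[addr] of count &synchronized;

event connection_established(c: connection)
    {
    if ( c$id$orig_h !in hosts_seen )
        hosts_seen[c$id$orig_h] = 0;

    ++hosts_seen[c$id$orig_h];
    }

That way it's the operations on the value that travel, rather than a
re-serialization of the whole value.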

This whole scheme of sending object IDs around is, btw, also the
reason why we can't do real broadcasts ...

Robin

>> After enabling the serialization debug logs, I think what I'm
>> seeing is that the first event is serialized with the full table value,
>> but the second event is serialized with just a reference to the first's
>> even though that value has changed.

> Yes, that's exactly what's happening. Generally, the serialization
> framework sends an object only the first time, and from then on sends
> just references to it (i.e., unique IDs).

Ok. It also makes more sense to me when I consider how the serialized events will be replayed -- against the same given input with `bro -R`, the values should actually come out the same.
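Something along these lines, I assume (guessing that -R takes the captured event file the same way -x does above):

$ ./src/bro -R events.bst event_serialize_test.bro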

But the simple example I gave doesn't really show what's happening with the istate.events test in policy-scripts-new like I thought it did...

I think what's happening there is that the sender serializes events into events.bst as expected, with references where appropriate, but for a given event the receiver serializes the full value into its own events.bst. So when both sides read back their version of events.bst, some values actually differ. I could be wrong, though; the serialization debug log was pretty hard to read for a situation that complex.

I'll commit my current, failing version of the test to policy-scripts-new so you can take a look; if that's not enough to go on, I'll try to come up with another simple example.

- Jon

Ok, I'll take a look (potentially not before we merge the branch, but I
hope that will be soon :-)

Robin

Give me a couple of days to do a bit more cleanup: SSL isn't done, and I want to clean up the Intelligence framework. I definitely agree we're very close to being able to merge it.

The cluster framework can come after the merge, but I don't think I have too much longer on that either.

  .Seth