Updating / Accessing ConnVal from child analyzers

Hi,

has anybody (Robin?) a good idea on how I could update / access a
connections's ConnVal from Child analyzers? In my case that would be to
facilitate counting and reporting of number of packets (and bytes) seen
on the wire for this connection.

What Bro currently does:
  * Conn keeps a ConnVal pointer. A call to BuildConnVal() will update
    that pointer.
  * BuildConnVal() calls root_anlyzer->UpdateEndpointVal() to get the
    current size and state. (root_analyzer is either UDP_Analyzer or
    TCP_Analyzer).
  * UpdateEndpointVal() is abstract in TransportAnalyzer, which both
    UDP_Analyzer and TCP_Analyzer inherit.

My counting analyzer (ConnSize) are children (in the DPD sense, not the
class hierarchy sense) of UDP_Analyzer and TCP_Analyzer.

Two ways come to mind:
a) My ConnSize analyzer could just update the ConnVal on every packet.
   Every analyzer has a pointer to its Conn class. But the pointer is
   private in Analyzer.h, so derived classes don't have access to it.
   However, the Conn instance is passed to the ConnSize constructer, so
   I could just keep a copy of this conn instance.
   Then I have access to the conn instance. In order to access to the
   ConnVal, I'd need to add a method to conn to update the ConnVal, or
   make my analyzer a friend of Conn (since the ConnVal is protected).
   I would also have to update ConnVal on every packet (instead of only
   when requested by BuildConnVal).

b) I make my ConnSize analyzer inherit from TransportAnalyzer and
   implement UpdateEndpointVal.
   TCP_Analyzer::UpdateEndpointVal and UDP_Analyzer::UpdateEndpointVal
   could then iterate through their children, check whether they are
   derived from TransportAnalyzer (I think I can do this check with a
   dynamic_cast, but I don't know how standard this is). If the child
   is a TransportAnalyzer, the parent can call the child's
   UpdateEndpointVal.
   I guess one consideration here would be whether we see other
   use-cases in which child analyzers update the connection record
   (actually it's the endpoint record) or whether that would only ever
   be used for my counting.

I think a) is really ugly.

What are your thoughts on b)?

Does anybody have better/other ideas?

cu
Gregor

a) My ConnSize analyzer could just update the ConnVal on every packet.
   Every analyzer has a pointer to its Conn class. But the pointer is
   private in Analyzer.h, so derived classes don't have access to it.

(It's private but there is an accessor method, Conn(), so you would
not have to store a copy.)

   I would also have to update ConnVal on every packet (instead of only
   when requested by BuildConnVal).

Yeah, that's the part I don't like here.

I think a) is really ugly.

What are your thoughts on b)?

b) is really ugly as well. :slight_smile: (Even more so than (a) IMHO ...)

Here's another idea:

    - move UpdateEndpointVal() from TransportAnalyzer to Analyzer

    - have BuildConnVal() iterate over the analyzer tree and call
      *every* analyzer's UpdateEndpointVal. Doing so gives all
      analyzers the chance to update the value as appropiate, with
      most often just doing nothing. In principle, there could be
      conflicts with multiple analyzers updating the same fields,
      but in practice that shouldn't happen and we can just declare
      that situation as "undefined".

This adds a tiny bit of overhead but my guess is that it would be
quite a bit less noticable (if at all) than updating the counters
with every packet.

What do you think?

Robn

Here's another idea:

    - move UpdateEndpointVal() from TransportAnalyzer to Analyzer

    - have BuildConnVal() iterate over the analyzer tree and call
      *every* analyzer's UpdateEndpointVal. Doing so gives all
      analyzers the chance to update the value as appropiate, with
      most often just doing nothing. In principle, there could be
      conflicts with multiple analyzers updating the same fields,
      but in practice that shouldn't happen and we can just declare
      that situation as "undefined".

This adds a tiny bit of overhead but my guess is that it would be
quite a bit less noticable (if at all) than updating the counters
with every packet.

I really think that the overhead should be small. After all, you have to
call DeliverPacket/DeliverStream on every *packet*, but
UpdateEndpointVal() only whenever an event is generated.

What do you think?

If we do that, we should probably generalize this, so that Analyzers can
update the full ConnVal record and not just the endpoints. Afterall, it
might well be the case that analyzers want to store some information
that's for the connection as a whole and not just one side of the
connection. So instead of of UpdateEndpointVal, we could have an:
  UpdateConnVal (RecordVal *connval,
         RecordVal *orig_endp, RecordVal *resp_endp)

(note that the endpoint val would also be accessible by extracting them
from the connval, but I think this is easier).

One disadvantage is now, that analyzers can change the actual ConnVal,
including starttime. (OTOH, they can also change the services set).
Plus, analyzers should not change the pointers to the endpoints that are
stored in the connval.

What we could do though is to add an additional RecordVal to ConnVal,
that analyzers can overwrite:
type connection: record {
    id: conn_id;
    orig: endpoint;
    resp: endpoint;
    start_time: time;
    duration: interval;
    service: string_set; # if empty, service hasn't been determined
    addl: string;
    hot: count; # how hot; 0 = don't know or not hot
    history: string;
    foo: conn_addl &optional;
};

type foo_addl: record {
   bar: string &optional;
}

Then we could have an
   UpdateConnVal (RecordVal *foo_addl,
         RecordVal *orig_endp, RecordVal *resp_endp)

(We could then even move the "history" field into the optional foo_addl,
since it's often not set. This might save some memory.)
(I actually wanted to name the additional foo field in connection addl,
but unfortunately this name is already taken :frowning: )

What do you think about that?

cu
Gregor

> Here's another idea:
>
> - move UpdateEndpointVal() from TransportAnalyzer to Analyzer
>
> - have BuildConnVal() iterate over the analyzer tree and call

This sounds good to me too.

One disadvantage is now, that analyzers can change the actual ConnVal,
including starttime.

This doesn't strike me as a significant problem in practice. It's not as
though we're dealing with adversarial Analyzer's :-).

What we could do though is to add an additional RecordVal to ConnVal,
that analyzers can overwrite:

That strikes me as more complexity than is merited if the only concern is
isolating what Analyzers can do. (However, maybe this would be reasonable
as a generalization of $history.)

    Vern

  UpdateConnVal (RecordVal *connval,
         RecordVal *orig_endp, RecordVal *resp_endp)

I can see passing the fill conn_val in, that would give more
flexibility at not much cost. But:

(note that the endpoint val would also be accessible by extracting them
from the connval, but I think this is easier).

... not sure about this: passing them in seems redundant, and it's
not much work to get the fields, so I'd just do the connval.

    foo: conn_addl &optional;

type foo_addl: record {
   bar: string &optional;
}

I don't see much benefit with this, given that one will still have
to adapt foo_addl when an analyzer needs to add something. It also
creates another structure that mixes information from different
analyzers (which "connection" already does).

Robin