Logging VLAN IDs

Dear Bro developers,

I've been tasked with trying to modify the Bro source code so that
conn.log includes the VLAN IDs (including 802.1ah) that have been observed
in packets associated with that connection. I've scoped out a solution,
but I want to run it by you first before I start to go for it, in case I'm
missing something really big.

PktSrc::Process() does processing of VLAN and 802.1ah, but it just skips
over them by advancing the data pointer. I will, in addition, store those
VLAN IDs in a new member of the modified PktSrc class. This gets passed on
through net_packet_dispatch() and NetSessions::DispatchPacket(). At this
point NetSessions::NextPacket() gets called, but since the PktSrc doesn't
get passed to it, I'd need another way to pass it the VLAN ID. I am
considering two options:

1. duplicate NextPacket() adding a new parameter to pass it the VLAN IDs,
and call that instead, or
2. store the VLAN IDs in the NetSessions class, in DispatchPacket() so
it¹s available to NextPacket() and DoNextPacket() <- Is there a reason
this wouldn¹t work, e.g. issues with multi-threading/multi-processing?

Is there one option that seems better to you?

NetSessions::DoNextPacket() is called next and I would also need a
modification to pass it VLAN IDs, using one of the options above. In this
method we finally get access to the appropriate Connection instance, so I
would store the VLAN IDs in that instance in DoNextPacket().

I'd need to modify the Connection class in Conn.h to include a new member
for tracking VLAN IDs. I'd modify Connection::BuildConnVal() and
scripts/base/init-bare.bro's connection record to make the VLAN IDs
available to scripts. Lastly, I'd write a script to redef the conn Info
structure and handle one or more connection events (perhaps
connection_state_remove) to copy the VLAN IDs from the connection record
to the Info record.

Is there anything I'm missing? Is there a better way to approach this?

And a big +1 to this. Would love to be able to filter VLAN's as well as we have listen to physical interfaces that have other interfaces mirrored that include some unwanted VLAN's.

James

(Cc'ing bro-dev, I suggest we continue the thread there).

This sounds generally reasonable, however I think we could take the
opportunity here to generalize this a bit more for generally including
link-layer information into connection handling.

One thing that I didn't quite get form your description is if the
objective is really just to get the VLAN ID into conn.log, or whether
you also want to use it for defining what constitutes a connection in
the first place. The latter would aim at the situations where the same
IP addresses can appear on different VLANs for independent
connections. Right now, Bro can't keep them apart, but if we made the
VLAN part of the connection index into the session table, it would
treat them separately. Same applies to other link-level features that
could sometimes be useful to be a part of a connection's ID (like MAC
addresses).

With that in mind, some thoughts on generalizing this (note, I not
sureif you're working from 2.3 or git master. The PktSrc API has
changed a bit recently, I'll take git as my starting point).

    - One challenge is passing the the VLAN ID through to the various
      packet-related methods. You're suggesting additional parameters,
      which would work. However, these methods are already taking a
      bunch of parameters, and if in the future we wanted to pass
      through further link-layer info, we'd have to add even more. A
      more flexible alternative would be switching to simply passing a
      Packet structure around that encapsulates all the information,
      including what's already there (e.g., timestmap, pcap_hdr,
      payload, etc.). The new PktSrc API already has such a class:
      PktSrc::Packet; from a quick look I think we could elevate that
      to be something passed around more generally, and then extend it
      accordingly.

    - For the connections, I would store the VLAN inside the ConnID
      struct, and then modify BuildConnIDHashKey() to take it into
      account. That way, the session table will make it part of its
      index. Same for the script-land conn_id record; that will then
      make script-level tables work that index by conn_id.

    - Extending the ConnID like this could actually be made a run-time
      option: I believe it shouldn't be too difficult to let users
      chose the fields defining a ConnID, so that they can decide if,
      say, they want to VLAN to be in there or not. We could predefine
      a set of potential features to choose from, along with some
      script-land API to pick the set to use, with the current 4-tuple
      being the default. (This could be a 2nd step for later; if the
      first two points above were in place, this extension should
      become mainly a question of finding the right configuration
      interface.)

I haven't thought this thruogh too carefully, so it's conceivable that
I'm missing something. But I think it would be really helpful for many
folks to get more flexibility into the definition of what consitutes a
connection, with VLANs being a good initial target to support.

Robin

Hi Robin, thanks for the reply. You can remove bro@bro.org if you want,
and I will follow your lead.

I was working from 2.3.2. I¹ll take a look at git master next.

In my specific case, I¹m just interested in logging VLAN IDs. For each
connection it could be a set of VLAN IDs, perhaps multiple IDs tracked in
each direction. I¹m dealing with a scenario where multiple streams of
traffic are coming into my Bro sensor¹s interface. I want a way to
demultiplex them, without relying upon IP address blocks. My ultimate goal
is to be able to identify which networks a connection is associated with.
If the VLAN IDs are just encapsulated in the connection ID, that would not
gain me anything.

Also, having link-level features be part of the connection index may be
dubious, but I¹ll let the experts decide. With MAC addresses, for example,
a packet might take a different route than other packets in the
connection, but I wouldn¹t want Bro to treat it as a different connection.
Perhaps the same thought applies to VLANs, I don¹t know. I¹m also having
difficulty coming up with a security scenario where it provides additional
benefit to have link-level features in the connection index, except
perhaps detecting packet spoofing.

Looking forward to your thoughts on this,

Hi Robin,

I thought more about your generalized idea and would like to follow up. To
start, adding link-level features to the connection ID hash, while perhaps
useful in some contexts, does not provide us the functionality we desire.
I have an incoming feed of VLAN-tagged traffic (both VLAN and 802.1ah)
with perhaps dozens of different VLANs, and I would like to handle the
connections differently in scripts but also mainly in offline log analysis
depending upon which VLANs the traffic is associated with.

Initially I had proposed simply adding the VLAN Ids to the conn.log file,
but that is certainly too specific of a solution. What are your thoughts
on exposing link-level features at the script layer for connections? For
example, if all observed VLAN tags for a connection were in a set variable
of the script-level Connection record, I could then label my data by
matching VLAN Ids, then process them differently accordingly. Thoughts?

What if we did a combination of what I suggested and your thoughts
here? We carry link-level features through to script-land inside the
connection record, and in addition allowed to transfer a custom subset
over to the connection ID for hashing? The latter could be done later
as a second step.

Robin

That sounds good! Both ideas seem to add an interesting level of
additional flexibility and analytic potential.

Apologies for resurrecting an old thread.

I’m wondering if anyone has given any further thought to or done any work on this. While looking at BIT-1480 (adding ERSPAN decapsulation support), I was reminded of what a mess Sessions.cc currently is. I think moving towards passing a Packet structure around would help to simplify things a lot - possibly by breaking up the code into per-protocol classes.

Curious to hear any thoughts. Thanks,

–Vlad

I'm wondering if anyone has given any further thought to or done any work
on this.

Yep, it's in place. :slight_smile: See fb848f795de8ef22987ba01980ff65be1231b312.

was reminded of what a mess Sessions.cc currently is. I think moving
towards passing a Packet structure around would help to simplify
things a lot

And that too: fe3579f1b4c82db3cd9cd6f372c1a5387bac24ac and
c72d191ab5c960be94c9f516aa94ec214f54b0c7

Does that do what you're looking for?

Robin

Oooh, yes, thank you. I’m not sure how I missed that, but that looks nice.