Hi + LL Analyzer

Hi everyone,

I am a researcher at KIT, where I’m also doing my PhD. I had the chance to meet some of you at the last (well, first) Bro Europe workshop.

My research work is focused on the cyber-physical security of communication networks of substations based on IEC 61850. Thus, I was wondering if I could use Zeek for network monitoring of some layer 2 protocols (GOOSE and SV).

I already briefly discussed the topic with Jan. However, I would still like to know in more detail what has already been done and what the current state is, and to start gathering some ideas about the topic.

Best,

Ghada

To add a bit more context: The idea is to implement a plugin interface for low-level analyzers (see https://github.com/zeek/zeek/issues/248) and collect requirements on the list.

Some first thoughts and questions:
- What would be the lowest layer to build upon, or should everything be pluggable down to the packet source?
- What about the concept of connections? For some LL protocols the concept might be counterintuitive.
- The interface should support passing payload to other analyzers. Does it make sense to come up with a generalized DPD mechanism?

Jan

(I realized this slipped through the cracks, sorry for the late
feedback, hope it still helps)

- What would be the lowest layer to build upon, or should everything be
pluggable down to the packet source?

I see three pieces here overall that I think can be tackled
independently:

(1) Link-layer: Currently hardcoded in Packet::ProcessLayer2()

(2) IP-Layer: Currently hardcoded in NetSessions::NextPacket()

(3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket().

Case (1) is all about skipping the header to get to IP. There's some
redundancy across cases, though, and MPLS makes it all more messy.

With (2), a plugin would be able to add support for non-IP protocols.
However, due to Bro generally assuming that it is analyzing IP, the
plugin would either need to take care of such packets completely (like
ARP does), or eventually get to an IP packet that it can then feed
back for further analysis (for example, if it is some kind of tunnel).

Similar for (3): A plugin would be able to add support for further
transport layer protocols, but it'd be mostly about stripping
additional headers to eventually get to TCP/UDP/ICMP.

There's also a more general version of (2) and (3) where we'd remove
Bro's assumption of analyzing TCP/IP protocols. But that's a separate,
large effort by itself.

On a technical level, plugging in such low-level analyzers needs to be
very efficient, in particular if we move the currently hardcoded cases
into the plugins as well (as I think we should; similar to how
application-layer analyzers have all moved into internal plugins).
Then the lookup-the-analyzer-and-dispatch operation will happen
multiple times for every packet.

- What about the concept of connections? For some LL protocols the
concept might be counterintuitive.

Couple cases there:

- If there's really no sense of a connection, then the plugin will
  need to take complete care of the packets, as the rest of Bro
  assumes connection-semantics.

- If it's just the definition of what defines a connection that is
  different, then I think we could make that more flexible. I've been
  hoping for a while that we can make Bro's notion of connection IDs
  dynamic, so that it's not necessarily just the 5-tuple. There are
  use cases outside of new protocols for this, too. For example, one
  could include the VLAN ID to deal with overlapping IP ranges in
  independent VLANs.
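
To make the VLAN example concrete, here is a minimal C++17 sketch of a connection key that optionally extends the 5-tuple. The names FlowKey and FlowTable are hypothetical and are not Zeek's actual connection ID types; addresses are kept as strings only to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical key type for illustration; not Zeek's real connection ID.
struct FlowKey {
    std::string src_addr;
    std::string dst_addr;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
    // Optional extra component. Leaving it unset for all flows gives
    // plain 5-tuple behavior.
    std::optional<uint16_t> vlan;

    bool operator<(const FlowKey& o) const {
        return std::tie(src_addr, dst_addr, src_port, dst_port, proto, vlan) <
               std::tie(o.src_addr, o.dst_addr, o.src_port, o.dst_port, o.proto, o.vlan);
    }
};

// Hypothetical table that counts packets per flow.
struct FlowTable {
    std::map<FlowKey, int> packets;

    void Add(const FlowKey& key) { ++packets[key]; }
    std::size_t Size() const { return packets.size(); }
};
```

With such a key, the same address and port pairs seen in two different VLANs end up as two separate flows instead of being merged into one.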

- The interface should support passing payload to other analyzers. Does
it make sense to come up with a generalized DPD mechanism?

Not quite sure what you're thinking here, but I believe that fully
solving this would require addressing Bro's overall assumption of
analyzing TCP/IP. For now, maybe the best way would be just having the
analyzer call back into entry points corresponding to the various
layers where analysis would then proceed as normal. I.e., some
variation of: ProcessLinkLayer(...), ProcessIP(...),
ProcessTransport(data), ProcessAppLayer(...). The caller would be
responsible for providing all the right (meta-)data, like IP headers.
Were you thinking something different / more general?
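
As a rough illustration of the "call back into entry points" idea, the C++17 sketch below shows a toy tunnel analyzer that strips its own header and hands the remainder to an IP entry point. Everything here is an assumption made for illustration: the EntryPoints type, the signature of ProcessIP, and the fixed 4-byte encapsulation header are invented and do not reflect existing Zeek code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the core's per-layer entry points.
struct EntryPoints {
    std::vector<std::string> entered;  // records which layers were entered

    // Stand-in for the regular IP path; this sketch accepts IPv4 only.
    bool ProcessIP(const uint8_t* data, std::size_t len) {
        if (len == 0 || (data[0] >> 4) != 4)
            return false;
        entered.push_back("ip");
        return true;
    }
};

// Hypothetical analyzer for a made-up tunnel with a fixed 4-byte header.
struct ToyTunnelAnalyzer {
    static constexpr std::size_t kHeaderLen = 4;

    // Strips the tunnel header and feeds the inner packet back into the
    // IP entry point. Returns false if there is no inner packet or the
    // IP entry point rejects it.
    bool Analyze(EntryPoints& ep, const uint8_t* data, std::size_t len) const {
        if (len <= kHeaderLen)
            return false;
        return ep.ProcessIP(data + kHeaderLen, len - kHeaderLen);
    }
};
```

The caller stays responsible for handing over a buffer that really starts at the inner IP header, which matches the point about the caller providing the right (meta-)data.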

Robin

I see three pieces here overall that I think can be tackled
independently:

(1) Link-layer: Currently hardcoded in Packet::ProcessLayer2()

(2) IP-Layer: Currently hardcoded in NetSessions::NextPacket()

(3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket().

At first glance it looks like IP-layer multiplexing is done in NetSessions::{NextPacket, DoNextPacket}, and the Transport-layer is handled in Manager::BuildInitialAnalyzerTree in the context of initializing a connection.

Case (1) is all about skipping the header to get to IP. There's some
redundancy across cases, though, and MPLS makes it all more messy.

One thing that comes to my mind here is whether it might be possible to pass information such as VLAN tags, MPLS labels or link layer addresses to upper layers in a generic way without hardcoding. However, that might be out of scope for now.

With (2), a plugin would be able to add support for non-IP protocols.
However, due to Bro generally assuming that it is analyzing IP, the
plugin would either need to take care of such packets completely (like
ARP does), or eventually get to an IP packet that it can then feed
back for further analysis (for example, if it is some kind of tunnel).

The non-IP packet might also contain a Transport-layer PDU. I guess it should be possible to pass these on as well.

There's also a more general version of (2) and (3) where we'd remove
Bro's assumption of analyzing TCP/IP protocols. But that's a separate,
large effort by itself.

That is the central point. So a first step would be to rely on TCP/IP in the "middle" of the stack but allow pluggable Link-layer protocols. Those might feed their data to the TCP/IP pipeline or handle them on their own. The next step would be the IP-layer.

On a technical level, plugging in such low-level analyzers needs to be
very efficient, in particular if we move the currently hardcoded cases
into the plugins as well (as I think we should; similar to how
application-layer analyzers have all moved into internal plugins).
Then the lookup-the-analyzer-and-dispatch operation will happen
multiple times for every packet.

One question here would be whether it makes sense to assume that the set of LL-analyzers that should be available is known at compile time.

- What about the concept of connections? For some LL protocols the
concept might be counterintuitive.

Couple cases there:

- If there's really no sense of a connection, then the plugin will
   need to take complete care of the packets, as the rest of Bro
   assumes connection-semantics.

Maybe there is another general abstraction that is worth supporting as well. I was thinking of request-reply pairs that can be correlated. However, I haven't put much thought into this yet.

- If it's just the definition of what defines a connection that is
   different, then I think we could make that more flexible. I've been
   hoping for a while that we can make Bro's notion of connection IDs
   dynamic, so that it's not necessarily just the 5-tuple. There are
   use cases outside of new protocols for this, too. For example, one
   could include the VLAN ID to deal with overlapping IP ranges in
   independent VLANs.

I think this would be part of the larger effort to rethink Zeek's notion of connections. This could be addressed together with implementing a flexible mechanism to make metadata like LL-addresses available in the context of a connection.

- The interface should support passing payload to other analyzers. Does
it make sense to come up with a generalized DPD mechanism?

Not quite sure what you're thinking here, but I believe that fully
solving this would require addressing Bro's overall assumption of
analyzing TCP/IP. For now, maybe the best way would be just having the
analyzer call back into entry points corresponding to the various
layers where analysis would then proceed as normal. I.e., some
variation of: ProcessLinkLayer(...), ProcessIP(...),
ProcessTransport(data), ProcessAppLayer(...). The caller would be
responsible for providing all the right (meta-)data, like IP headers.
Were you thinking something different / more general?

While I haven't looked into it, I noticed that there are distinct PIA implementations for TCP and UDP. If we allow plugging in new transport protocols, they might need their own PIA to support the analysis of known protocols like HTTP. However, if we keep the focus on TCP/IP as suggested, that would be out of scope for now.

Jan

At first glance it looks like IP-layer multiplexing is done in
NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled
in Manager::BuildInitialAnalyzerTree in context of initializing a
connection.

Well, there, too. :-) That's indeed doing the packet dispatching, while
DoNextPacket() sets up state mgmt. It's all not quite clear cut, which
is part of the problem.

That is the central point. So a first step would be to rely on TCP/IP in the
"middle" of the stack but allow pluggable Link-layer protocols. Those might
feed their data to the TCP/IP pipeline or handle them on their own. The next
step would be the IP-layer.

Yeah, that sounds good to me.

One question here would be whether it makes sense to assume that the set of
LL-analyzers that should be available is known at compile time?

The built-in ones can be known, but any added through dynamic plugins
can't really. We'll know only at runtime what the final set is. But we
could precompute a lookup table in advance at startup that maps link
types to analyzers.
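
A minimal C++17 sketch of such a precomputed table follows. It is not taken from Zeek: LLAnalyzer, NamedAnalyzer, Registry, the Freeze() step and the table size of 512 are hypothetical names and choices made up for illustration. Plugins register while they load, the table is flattened once at startup, and the per-packet lookup is then a bounds check plus an array index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical base class for link-layer analyzers.
class LLAnalyzer {
public:
    virtual ~LLAnalyzer() = default;
    virtual const std::string& Name() const = 0;
};

// Trivial concrete analyzer, only used to exercise the registry.
class NamedAnalyzer : public LLAnalyzer {
public:
    explicit NamedAnalyzer(std::string name) : name_(std::move(name)) {}
    const std::string& Name() const override { return name_; }

private:
    std::string name_;
};

// Hypothetical registry: collects registrations, then freezes them into
// a flat array indexed by link type.
class Registry {
public:
    // Called by built-in and dynamically loaded plugins during startup.
    void Register(uint16_t link_type, LLAnalyzer* analyzer) {
        assert(!frozen_ && "registration must happen before Freeze()");
        pending_[link_type] = analyzer;
    }

    // Called once after all plugins are loaded.
    void Freeze() {
        table_.fill(nullptr);
        for (const auto& [type, analyzer] : pending_)
            if (type < table_.size())
                table_[type] = analyzer;
        frozen_ = true;
    }

    // Per-packet path: returns nullptr if no analyzer handles the type.
    LLAnalyzer* Lookup(uint16_t link_type) const {
        return link_type < table_.size() ? table_[link_type] : nullptr;
    }

private:
    std::map<uint16_t, LLAnalyzer*> pending_;
    std::array<LLAnalyzer*, 512> table_{};  // size is an arbitrary assumption
    bool frozen_ = false;
};
```

Link types above the table size are silently ignored in this sketch; a real implementation would need a fallback (for example a hash map) or a larger table for them.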

I think this would be part of the larger effort to re-think Zeek's notion of
connections. This could be addressed together with implementing a flexible
mechanism to make meta data like LL-addresses available in context of a
connection.

Yep.

If we allow plugging in new transport protocols, they might need
their own PIA to support the analysis of known protocols like HTTP
etc.

Yeah, or a more generic PIA that provides its own hook for plugins.
The main difference between TCP/UDP PIAs is packet vs stream
semantics, iirc. That might generalize sufficiently, but not sure.

Robin

The question here would be whether LL-analyzers have to be linked dynamically. Another option would be to require users to rebuild Zeek if they need additional LL-analyzers. The analyzers would still be modular, but using some metaprogramming one might be able to generate efficient dispatching code at compile time. If the focus is on performance, we could benchmark both approaches and decide based on the results.
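
To illustrate what the compile-time variant could look like, here is a small C++17 sketch using a fold expression over a fixed list of analyzer types. All names (EthernetAnalyzer, ToyAnalyzer, Dispatch, DispatchBuiltIn) and the link type value 200 are hypothetical; the value 1 for Ethernet corresponds to libpcap's DLT_EN10MB. Whether this actually beats a runtime lookup table would have to be measured.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical analyzers; each one declares the link type it handles.
struct EthernetAnalyzer {
    static constexpr uint16_t kLinkType = 1;  // DLT_EN10MB in libpcap
    static std::string_view Name() { return "ethernet"; }
};

struct ToyAnalyzer {
    static constexpr uint16_t kLinkType = 200;  // made-up value
    static std::string_view Name() { return "toy"; }
};

// Expands into a short-circuiting chain of comparisons over the analyzer
// list, which is fixed at compile time. Returns the name of the matching
// analyzer, or an empty view if none matches.
template <typename... Analyzers>
std::string_view Dispatch(uint16_t link_type) {
    std::string_view result;
    bool matched =
        ((link_type == Analyzers::kLinkType && (result = Analyzers::Name(), true)) || ...);
    return matched ? result : std::string_view{};
}

// The set of analyzers compiled into this (hypothetical) build.
inline std::string_view DispatchBuiltIn(uint16_t link_type) {
    return Dispatch<EthernetAnalyzer, ToyAnalyzer>(link_type);
}
```

Adding an analyzer in this scheme means extending the type list in DispatchBuiltIn and rebuilding, which is exactly the trade-off against dynamically loaded plugins.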

Jan

Well, the point of the plugin API is being able to add new
functionality externally through an independently compiled shared
library. Excluding link-layer analyzers from that would feel like a
gap to me. That said, we definitely need to benchmark performance to
make sure it's feasible. My hunch is that a lookup table should be
good enough, but we'll see.

Robin