zeek 4.0 packet analysis questions

Hello.

Please see the attached zeek 4.0 questions file.

Thanks!

zeek_packet_analysis_questions.txt (3.17 KB)

  uint32_t len; /// Actual length on wire
  uint32_t cap_len; /// Captured packet length

I'm trying to understand the difference between these two variables.

`len` is the number of bytes sent on the wire and `cap_len` is the
number of bytes that were captured/recorded, which may be the smaller
of the two depending on the configured "snapshot length" or "snaplen".
That's the common pcap parlance, and you can read more about it in the
libpcap/tcpdump or Wireshark docs.
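
To make the relationship concrete, here is a minimal sketch of how a capture
with a given snaplen ends up with the two values. `PacketLengths`, `capture()`
and `is_truncated()` are hypothetical names made up for illustration; this is
not Zeek's Packet class or its capture code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two fields quoted above (not Zeek's Packet class).
struct PacketLengths {
    std::uint32_t len;     // bytes the packet occupied on the wire
    std::uint32_t cap_len; // bytes that were actually captured/recorded
};

// Models a capture: at most `snaplen` bytes of a `wire_len`-byte packet are kept.
PacketLengths capture(std::uint32_t wire_len, std::uint32_t snaplen) {
    PacketLengths p{wire_len, std::min(wire_len, snaplen)};
    assert(p.cap_len <= p.len); // the captured part can never exceed the wire length
    return p;
}

// True when the snaplen cut the packet short.
bool is_truncated(const PacketLengths& p) {
    return p.cap_len < p.len;
}
```

With a snaplen of 9216, a 1514-byte packet keeps `len == cap_len == 1514`; with
a snaplen of 96, the same packet has `len == 1514` but `cap_len == 96`.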

Previously, using the Zeek 3.2.3 version of this source file, the "len" and
"cap_len" seemed to have the same value ..

Typically they'll be the same. For live capture, Zeek defaults to a snaplen
of 9216 bytes via this setting:

https://docs.zeek.org/en/master/scripts/base/init-bare.zeek.html#id-Pcap::snaplen

1) Regarding the above "len" field, is it the length of the Ethernet "packet"?
   (Or, the length of the Ethernet "frame"?)

   According to wikipedia.com (Ethernet frame - Wikipedia), an Ethernet packet
   includes:
      A 7 octet "Preamble"
      A 1 octet "Start frame delimiter" (SFD)
      Then, the Ethernet "frame".

   The Ethernet "frame" starts at the "destination MAC" field (and extends through the "Frame check sequence" field).

2) What does the "cap_len" variable contain?
     1) The length of the Ethernet packet (including the Preamble & SFD) -or-
     2) The length of the Ethernet frame? (Where the "length" field is the size of the "payload" and is at most 1500 bytes)
     3) Or, something else?

In terms of what's done for Ethernet parsing specifically: it's in terms of
"Ethernet frames" since the 8-byte preamble/SFD is data that's not made
available past the hardware receiving the packets.

Also note that some APIs in Zeek, like packet-analysis, may use `len` in
parameter names when `cap_len` is actually the thing being passed in
(e.g. likely in cases where it may not help to know the theoretical number
of bytes on the wire versus what is actually available to parse). So the
actual values/meaning of `len`/`cap_len` can change depending on what part
of the packet-processing logic you're currently looking at, but the values
will generally decrement toward zero the further along you go as headers
get parsed and stripped.
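
As a rough illustration of the "decrement toward zero" point, the sketch below
walks a captured buffer and shrinks the remaining length as each header is
stripped. `Cursor`, `strip()` and `next_ipv4_header_len()` are hypothetical
helpers, not Zeek APIs, and the example assumes an untagged Ethernet frame
(14-byte header) carrying IPv4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Untagged Ethernet header: destination MAC (6) + source MAC (6) + EtherType (2).
constexpr std::size_t kEthernetHeaderLen = 14;

// Hypothetical cursor over captured bytes (not a Zeek type).
struct Cursor {
    const std::uint8_t* data; // next unparsed byte
    std::size_t remaining;    // starts out as cap_len
};

// Advances past an n-byte header. Returns false, leaving the cursor untouched,
// if fewer than n captured bytes remain.
bool strip(Cursor& c, std::size_t n) {
    if (c.remaining < n)
        return false;
    c.data += n;
    c.remaining -= n;
    return true;
}

// IPv4 header length in bytes: the low 4 bits of the first byte (IHL) times 4.
std::size_t next_ipv4_header_len(const Cursor& c) {
    assert(c.remaining >= 1);
    return static_cast<std::size_t>(c.data[0] & 0x0F) * 4;
}
```

Starting from a 64-byte captured frame, stripping the Ethernet header leaves
50 bytes, and stripping a 20-byte IPv4 header after that leaves 30.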

So, my conclusion is that the TotalLen() routine returns the combined length of:
  the IP header
  the IP packet payload.

The docs for IP_Hdr::TotalLen() are correct: it's the total length of headers
*and* payload as reported by the IP packet itself.
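
For reference, here is a minimal sketch of what "as reported by the IP packet
itself" means for IPv4: the Total Length field sits in bytes 2-3 of the header
in network byte order and covers the header plus the payload. The function
names below are hypothetical and this is not how IP_Hdr is implemented; it
only reads the raw header bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimum IPv4 header size (no options).
constexpr std::size_t kMinIPv4HeaderLen = 20;

// Reads the IPv4 "Total Length" field: bytes 2-3, big-endian.
// The value covers the IP header *and* the payload.
std::uint16_t ipv4_total_len(const std::uint8_t* hdr, std::size_t avail) {
    assert(avail >= kMinIPv4HeaderLen);
    return static_cast<std::uint16_t>((hdr[2] << 8) | hdr[3]);
}

// Header length in bytes: the low 4 bits of byte 0 (IHL) times 4.
std::size_t ipv4_header_len(const std::uint8_t* hdr) {
    return static_cast<std::size_t>(hdr[0] & 0x0F) * 4;
}

// Payload length is whatever the total length leaves after the header.
// Returns 0 for a malformed header whose total length is smaller than its IHL.
std::size_t ipv4_payload_len(const std::uint8_t* hdr, std::size_t avail) {
    const std::size_t total = ipv4_total_len(hdr, avail);
    const std::size_t header = ipv4_header_len(hdr);
    return total >= header ? total - header : 0;
}
```

For a header starting with the bytes 0x45 0x00 0x05 0xDC, the total length is
1500, the header length is 20, and the payload length is 1480. None of these
values include the Ethernet header.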

But, line 75 of the file
...zeek-4.0.0-rc3/src/packet_analysis/protocol/ip/IP.cc file, states:

        // total_len is the length of the packet minus all of the headers so far, including IP

Based on this comment, the "total_len" would not include:
  the length of the Ethernet header
  the length of the IP header.

So, is this comment correct?

No, looks incorrect, thanks for reporting. Hopefully just a mistake in the
comment and not surrounding logic -- I'll take a closer look and follow up if I
find anything that needs correction.

Or, should it say something like:
        // total_len is the length of the Ethernet packet minus the Ethernet header

At this point, it could be agnostic of whether the data is embedded in Ethernet
or not -- I would just remove the comment altogether and let the correct
documentation of IP_Hdr::TotalLen() be sufficient explanation.

I fixed the comment here. TotalLen() returns the full size of the IP portion of the packet, including the IP header and the payload.

Tim