Inaccuracy of orig_bytes and resp_bytes?

With reference to the conn.log spec (base/protocols/conn/main.zeek, Book of Zeek, git/master):

orig_bytes: The number of payload bytes the originator sent. For TCP this is taken from sequence numbers and might be inaccurate (e.g., due to large connections)
resp_bytes: The number of payload bytes the responder sent. See orig_bytes

Are these values inaccurate to the point that they should not be relied upon? Here is an example of a common conn.log entry for TCP traffic:

orig_bytes: 2,145,967,013
resp_bytes: 81,091,225
duration: 15.638
orig_ip_bytes: 104
resp_ip_bytes: 340

The orig_bytes and resp_bytes values appear ‘impossible’ given the number of bytes, the short duration, and the respective ip_bytes values. Such entries distort visualizations and analysis that use orig_bytes and resp_bytes. Are others relying on orig_ip_bytes and resp_ip_bytes instead of orig_bytes and resp_bytes? Should orig_bytes ever be larger than orig_ip_bytes?

We are running Zeek 5.0.3 on an RPM/RedHat-based multi-node cluster with pf_ring, Myricom 10Gb NICs, OpenSearch, and Corelight’s zeek2es.

Any comments on the experiences of others with these records would be appreciated.

Ryan

More on this topic: in a sample of 37.4 million "proto": "tcp" conn.log entries, approximately 30% have orig_bytes larger than orig_ip_bytes.

Considering the conn.log spec:

orig_ip_bytes: Number of IP level bytes that the originator sent (as seen on the wire, taken from the IP total_length header field).

This should always be larger than "orig_bytes" by at least "orig_pkts" * 20 bytes, since 20 bytes is the minimum size of an IPv4 header.
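The expectation above can be checked against a batch of log entries. The following is only a minimal sketch, assuming conn.log is written as JSON with one entry per line; the function names are hypothetical and not part of Zeek. It uses 20 bytes per packet as a lower bound, so it under-counts overhead for IPv6 or for headers with options.

```python
import json

MIN_IPV4_HEADER = 20  # bytes; minimum IPv4 header without options


def violates_ip_bound(entry):
    """Return True if orig_bytes is larger than orig_ip_bytes can account for."""
    orig_bytes = entry.get("orig_bytes")
    ip_bytes = entry.get("orig_ip_bytes")
    pkts = entry.get("orig_pkts")
    if orig_bytes is None or ip_bytes is None or pkts is None:
        return False  # a field was not logged, so there is nothing to compare
    return orig_bytes + pkts * MIN_IPV4_HEADER > ip_bytes


def fraction_violating(lines):
    """Fraction of TCP entries (JSON, one per line) that violate the bound."""
    total = 0
    bad = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("proto") != "tcp":
            continue
        total += 1
        if violates_ip_bound(entry):
            bad += 1
    return bad / total if total else 0.0
```

Passing an open file handle for a JSON conn.log to fraction_violating should give a number comparable to the 30% figure quoted above.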

Here are more details on the example given above:

    "proto": "tcp",
    "duration": 15.637575,
    "orig_bytes": 2145967013,
    "resp_bytes": 81091225,
    "conn_state": "RSTR",
    "local_orig": false,
    "local_resp": true,
    "missed_bytes": 81091225,
    "history": "ShAgr",
    "orig_pkts": 2,
    "orig_ip_bytes": 104,
    "resp_pkts": 6,
    "resp_ip_bytes": 340,

If "orig_pkts": 2 is to be trusted, the theoretical maximum for "orig_bytes" is 2 * (65,535 - 20) = 131,030 bytes, i.e. two maximum-size IPv4 packets, each minus a 20-byte IP header. Here it is showing 2,145,967,013 bytes.

If "resp_pkts": 6 is to be trusted, the theoretical maximum for "resp_bytes" is 6 * (65,535 - 20) = 393,090 bytes. Here it is showing 81,091,225 bytes.

I notice that most of the impacted conn.log entries have "conn_state": "RSTR", a "history" that contains a "g" (for gaps), and a large value for "missed_bytes".

Perhaps the code should include a condition: if the value computed for "orig_bytes" or "resp_bytes" is larger than the number of IPv4 packets * 65,535 bytes, set the corresponding _bytes value to “-1” (unknown) instead of logging impossibly large values.
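Until something like that exists in Zeek itself, the same condition could be applied as a post-processing step before the data reaches the dashboards. This is only a sketch, assuming each entry has already been parsed into a dict; sanitize_bytes is a hypothetical helper, and it uses None rather than -1 to mark "unknown". One caveat: if Zeek missed packets, the packet counts are too low as well, so this may also blank some values that were real.

```python
MAX_IPV4_PACKET = 65_535  # largest value the IPv4 total_length field can hold


def sanitize_bytes(entry):
    """Return a copy of entry with implausible payload counts set to None."""
    cleaned = dict(entry)
    for side in ("orig", "resp"):
        payload = entry.get(f"{side}_bytes")
        pkts = entry.get(f"{side}_pkts")
        if payload is None or pkts is None:
            continue  # nothing to compare against
        # More payload than the observed packets could possibly carry.
        if payload > pkts * MAX_IPV4_PACKET:
            cleaned[f"{side}_bytes"] = None
    return cleaned
```

Applied to the example entry above, both "orig_bytes" and "resp_bytes" would be replaced, while the packet and ip_bytes fields are left untouched.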

Ryan

@ryanwmcronald - thanks for the data and observations.

Should orig_bytes ever be larger than orig_ip_bytes?

The orig_bytes and resp_bytes for TCP are determined from the TCP sequence/ack numbers within packets, while the orig_ip_bytes and resp_ip_bytes account for packets actually seen by Zeek on the wire, as the docs state. So yes, in the presence of packet loss/gaps (as your entry shows) that can happen.
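To illustrate why sequence-number accounting can diverge so far from the bytes seen on the wire, here is a simplified model. It is not Zeek's actual implementation, and payload_from_seq is a hypothetical name. It estimates payload as the distance between the initial sequence number and the highest sequence number observed, modulo 2^32.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def payload_from_seq(isn, next_seq):
    """Estimate payload bytes from the initial and next-expected sequence numbers.

    The SYN consumes one sequence number, so it is subtracted.
    """
    return (next_seq - isn - 1) % SEQ_SPACE
```

With an initial sequence number of 1000 and a next-expected sequence number of 1025, this yields 24 bytes. A single packet carrying a far-away sequence or ack number (for example an RST after a gap) moves the estimate by that whole distance, even though only a few hundred IP-level bytes were captured. Whether that is what happened in the entry above cannot be told from the log alone.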

More on this topic: in a sample of 37.4 million "proto": "tcp" conn.log entries, approximately 30% have orig_bytes larger than orig_ip_bytes.

Do you incur packet loss? You can try the stats and capture-loss policy scripts, or calculate the percentage of conn entries with gaps in them yourself.
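For the do-it-yourself route, a rough sketch follows. It assumes JSON-formatted conn.log input with one entry per line, and treats an entry as having a gap when its "history" contains "g" or "G" or its "missed_bytes" is non-zero. The function name gap_percentage is made up for this example.

```python
import json


def gap_percentage(lines):
    """Percentage of TCP conn.log entries that show a content gap."""
    total = 0
    gaps = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("proto") != "tcp":
            continue
        total += 1
        history = entry.get("history") or ""
        missed = entry.get("missed_bytes") or 0
        # "g"/"G" in history marks a content gap on either side.
        if "g" in history.lower() or missed > 0:
            gaps += 1
    return 100.0 * gaps / total if total else 0.0
```

A high percentage here would point toward capture loss rather than genuinely large transfers.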

You could further look into weird.log for the affected connections and see if something stands out.

If it appears unrealistic that a 2GB TCP connection between the two hosts involved above should happen, would it be possible to capture and share an anonymized pcap containing the few packets of such a connection that Zeek actually saw (I know this isn’t easy)? That would allow us to dig into the details and possibly explain what’s going on.