Missing HTTP events with Zeek

Hello!
I’m trying to set up Zeek to get HTTP data from the network interface, but I’m getting fewer HTTP events than I expected.

So I recorded a pcap file and fed it to both Zeek and tshark. tshark has no problem finding the expected number of HTTP requests and responses, while I struggle to get the same with Zeek.

This is how I run zeek:

zeek -b -r test.pcap -C http_test.zeek

This is my http_test.zeek:

@load base/protocols/http

global log_file: file;

event zeek_init() {
    print "zeek_init";
    log_file = open("zeek-output.log");
}

event zeek_done() {
    print "zeek_done";
    close(log_file);
}


event http_request(c: connection, method: string, original_URI: string, unescaped_URI: string, version: string) {
    local msg = fmt(
        "http_request: connection$uid: %s, method: %s, original_URI: %s",
        c$uid, method, original_URI,
    );
    print log_file, msg;
}

event http_reply(c: connection, version: string, code: count, reason: string) {
    local msg = fmt(
        "http_reply: connection$uid: %s, code: %s",
        c$uid, code,
    );
    print log_file, msg;
}

event http_event(c: connection, event_type: string, detail: string) {
    local msg = fmt(
        "http_event: connection$uid: %s, event_type: %s, detail: %s",
        c$uid, event_type, detail
    );
    print log_file, msg;
}

event http_connection_upgrade(c: connection, protocol: string) {
    local msg = fmt(
        "http_connection_upgrade: connection$uid: %s, protocol: %s",
        c$uid, protocol
    );
    print log_file, msg;
}

The results I get with tshark are:

$ for stream_id in $(tshark -r test.pcap -T fields -e tcp.stream -Y "http" | sort -n | uniq); do
    echo stream ${stream_id}
    echo "    requests $(tshark -r test.pcap -Y "tcp.stream == ${stream_id} && http.request" | wc -l)"
    echo "    responses $(tshark -r test.pcap -Y "tcp.stream == ${stream_id} && http.response" | wc -l)"
done

stream 0
    requests 53
    responses 54
stream 1
    requests 1000
    responses 1000
stream 2
    requests 943
    responses 944

The results I get with zeek are:

$ for event in http_request http_reply http_event http_connection_upgrade; do
    echo "event ${event} $(grep ${event} zeek-output.log | wc -l)"
done

event http_request 1082
event http_reply 1944
event http_event 1
event http_connection_upgrade 0

My Zeek log file contains only two unique connection IDs, instead of the three streams that tshark found. One of the connections looks fine. The other one, after a couple of request events followed by a response event, has this:

http_event: connection$uid: CFlMto4idR419sL4we, event_type: content gap, detail: seq=440640, len=4096

After this message there are only http_reply events, with no requests at all.
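A per-connection breakdown of the same log lines up more directly with the per-stream tshark counts above. This is a minimal sketch that assumes the exact line format written by http_test.zeek (`<event>: connection$uid: <uid>, ...`); the function name `count_events_by_uid` is made up for illustration.

```shell
# Print "<uid> <event> <count>" for every (uid, event) pair in the log.
# Assumes each line starts with "<event>: connection$uid: <uid>, ...".
count_events_by_uid() {
    # Split on ":" or "," followed by optional spaces:
    #   $1 = event name, $2 = "connection$uid", $3 = the uid itself.
    awk -F'[:,] *' '{ n[$3 " " $1]++ } END { for (k in n) print k, n[k] }' "$1" |
        LC_ALL=C sort
}
```

Usage: `count_events_by_uid zeek-output.log`. A connection whose http_reply count is far above its http_request count is the one worth inspecting.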

Is this the expected behavior? I’ve seen this, and I get that lost TCP packets can lead to reassembling issues, but this still seems off, and tshark doesn’t seem to struggle with the same pcap file.

Maybe there are some Zeek reassembly settings I can play with?

Any help is appreciated.

(The forum doesn’t let me attach the pcap file, so here’s the link in case you want to look at it yourself.)

Running Zeek with -b tells it to run in “bare mode”, with almost all standard stuff turned off. In particular, Zeek will not run the HTTP analyzer (nor a bunch of others). What do you get if you remove -b?

I get the same results.

I don’t need anything but HTTP data - this is why I run zeek with -b. And my .zeek file starts with @load base/protocols/http - that should load everything necessary for HTTP events to function, as far as I’m aware.

Thanks, I had missed that you did that load.

Running on the PCAP (great that you’re able to share this!), Zeek flags three connections. One of them has a history of ^dADafF which means it started with server-to-client traffic (the ^, meaning Zeek had to flip the flow’s directionality) and doesn’t include an initial TCP handshake. Because of that, it goes unanalyzed by the HTTP analyzer. A second one has a history of ShADadGt which means it has a content gap. HTTP processing will stop at that point.
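The history strings above can be read letter by letter. The helper below is only a sketch: `decode_history` is a hypothetical name, and it covers just a subset of the flags listed in the conn.log documentation. Uppercase letters are packets from the originator, lowercase ones from the responder.

```shell
# Expand a Zeek conn.log history string into one line per flag.
decode_history() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r ch; do
        if [ "$ch" = "^" ]; then
            echo "^ direction flipped by Zeek"
            continue
        fi
        # Uppercase = originator, lowercase = responder.
        up=$(printf '%s' "$ch" | tr '[:lower:]' '[:upper:]')
        if [ "$ch" = "$up" ]; then side="orig"; else side="resp"; fi
        case "$up" in
            S) what="SYN without ACK" ;;
            H) what="SYN+ACK" ;;
            A) what="pure ACK" ;;
            D) what="payload data" ;;
            F) what="FIN" ;;
            R) what="RST" ;;
            G) what="content gap" ;;
            T) what="retransmitted payload" ;;
            *) what="other (see the conn.log documentation)" ;;
        esac
        echo "$ch $side: $what"
    done
}
```

For example, `decode_history ShADadGt` lists the handshake first (`S`, `h`) and the originator-side content gap (`G`) near the end, while `decode_history '^dADafF'` has no `S` or `h` line at all.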

Thanks for the explanation!

  1. Is there any way to configure Zeek to produce HTTP events for TCP connections for which Zeek hasn’t seen the initial handshake?
  2. Is there any way to configure Zeek so that it doesn’t abandon HTTP processing completely on a content gap, but instead tries to recover later and resumes producing HTTP events for the gap-free sections of the TCP stream?

For (1), no, it doesn’t look like you can. Looking at the parser, it assumes that the starting state is expecting an HTTP request. So for a PCAP like the one you’re using, which starts with a response, the analyzer wouldn’t be able to process the connection even if it did pick up mid-stream.

For (2), the HTTP parser has no such recovery, with one exception: if it can determine that a gap lies wholly within an HTTP response, it skips over the gap and continues beyond it.
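To check whether the bytes behind a content gap are really absent from the capture, one option is to ask tshark how many packets it flags with `tcp.analysis.lost_segment` ("previous segment not captured") in a given stream. This is a sketch rather than a definitive test; `count_lost_segments` is a hypothetical helper name, and the stream index is the one tshark reports in `tcp.stream`.

```shell
# Count packets that tshark marks as following a hole in the TCP sequence space.
# $1: pcap file, $2: tshark TCP stream index
count_lost_segments() {
    tshark -r "$1" -Y "tcp.stream == $2 && tcp.analysis.lost_segment" |
        wc -l | tr -d ' '
}
```

Usage: `count_lost_segments test.pcap 2`. A non-zero result would suggest the segments are missing from the pcap itself, which is consistent with Zeek reporting a gap even though tshark still parses the HTTP messages around it.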
