Hello!
I’m trying to set up Zeek to get HTTP data from the network interface, but I’m getting fewer HTTP events than I expected.
So I recorded a pcap file and fed it both to Zeek directly and to tshark. tshark has no problem getting the correct number of HTTP requests and responses, while I struggle to get the same with Zeek.
This is how I run zeek:
zeek -b -r test.pcap -C http_test.zeek
This is my http_test.zeek:
@load base/protocols/http
global log_file: file;

event zeek_init() {
    print "zeek_init";
    log_file = open("zeek-output.log");
}

event zeek_done() {
    print "zeek_done";
    close(log_file);
}

event http_request(c: connection, method: string, original_URI: string, unescaped_URI: string, version: string) {
    local msg = fmt(
        "http_request: connection$uid: %s, method: %s, original_URI: %s",
        c$uid, method, original_URI,
    );
    print log_file, msg;
}

event http_reply(c: connection, version: string, code: count, reason: string) {
    local msg = fmt(
        "http_reply: connection$uid: %s, code: %s",
        c$uid, code,
    );
    print log_file, msg;
}

event http_event(c: connection, event_type: string, detail: string) {
    local msg = fmt(
        "http_event: connection$uid: %s, event_type: %s, detail: %s",
        c$uid, event_type, detail
    );
    print log_file, msg;
}

event http_connection_upgrade(c: connection, protocol: string) {
    local msg = fmt(
        "http_connection_upgrade: connection$uid: %s, protocol: %s",
        c$uid, protocol
    );
    print log_file, msg;
}
The results I get with tshark are:
$ for stream_id in $(tshark -r test.pcap -T fields -e tcp.stream -Y "http" | sort -n | uniq); do
    echo stream ${stream_id}
    echo " requests $(tshark -r test.pcap -Y "tcp.stream == ${stream_id} && http.request" | wc -l)"
    echo " responses $(tshark -r test.pcap -Y "tcp.stream == ${stream_id} && http.response" | wc -l)"
done
stream 0
requests 53
responses 54
stream 1
requests 1000
responses 1000
stream 2
requests 943
responses 944
The results I get with zeek are:
$ for event in http_request http_reply http_event http_connection_upgrade; do
    echo "event ${event} $(grep ${event} zeek-output.log | wc -l)"
done
event http_request 1082
event http_reply 1944
event http_event 1
event http_connection_upgrade 0
In my Zeek log file there are only two unique connection uids, instead of the three streams that tshark found. One of the connections looks fine; the other, after a couple of request events followed by a response event, has this:
http_event: connection$uid: CFlMto4idR419sL4we, event_type: content gap, detail: seq=440640, len=4096
After this message there are only http_reply events on that connection, with no http_request events at all.
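To line the Zeek numbers up with the per-stream tshark numbers, a per-connection breakdown of the log might be easier to read than the totals. Something like the following could do it. This is only a sketch: `tally_events` is a hypothetical helper name, and it assumes every log line has exactly the `event_name: connection$uid: UID, ...` layout that the script above prints.

```shell
# Hypothetical helper: count events per connection uid in a log written by
# http_test.zeek. Assumes whitespace-separated lines of the form
#   http_request: connection$uid: <UID>, method: ...
tally_events() {
    awk '{
        event = $1; sub(/:$/, "", event)   # "http_request:" -> "http_request"
        uid = $3;   sub(/,$/, "", uid)     # "<UID>,"        -> "<UID>"
        count[uid " " event]++
    }
    END { for (key in count) print key, count[key] }' "$1" | LC_ALL=C sort
}
```

Running `tally_events zeek-output.log` should print one line per uid and event type, which makes it easier to see on which connection the requests stop.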
Is this the expected behavior? I’ve seen this, and I get that lost TCP packets can lead to reassembling issues, but this still seems off, and tshark doesn’t seem to struggle with the same pcap file.
Maybe there are some Zeek TCP reassembly settings I can play with?
Any help is appreciated.
(It doesn’t let me attach the pcap file, so here’s the link in case you want to look at it yourself.)