http body q.

Hi,
I am trying to understand the behavior of bro with respect to logging http request when the http request has a large body.

In my script, I am trying to log http body. I agree, http bodies can be large. However, I need the body for further parsing and analysis of traffic based on the content of the body content. To capture the body, I am setup events for http_entity_data and http_end_entity. In the ‘http_entity_data’ event, I am accumulating the body data into a request variable. In the end_entity event I am encoding body data using base64_encode (since body can include non printable characters).

This seems to work fine for small bodies. However, for large bodies, I noticed that the log gets written without the body getting encoded. To debug, I added a log filter. In the log predicate call, I can see the http log writing happening before the end_entity even is called.

Is this how it’s supposed to work?

Sorry, hit send before I finished my message.

Is there something I can do to ensure my end_entity event is invoked before http_log event is called? Any input is appreciated. Thanks.

Dk.

This seems to work fine for small bodies. However, for large bodies, I noticed that the log gets written without the body getting encoded.

For performance reasons, there's an option which throttles the maximum
amount of data provided in entity_data events to 1500 bytes by
default:

https://docs.zeek.org/en/stable/scripts/base/init-bare.bro.html#id-http_entity_data_delivery_size

So you can see if changing that `http_entity_data_delivery_size`
option works for you.

Is there something I can do to ensure my end_entity event is invoked before http_log event is called? Any input is appreciated. Thanks.

I think normally that should not be the order of things. Might be
easier to explain what's going on if you provide an example script and
pcap that reproduces what you are seeing.

- Jon

Hi Jon,
I’ve attached a minimal script that does what I am describing. However, when I try this minimal script on try.bro.org it works correctly as I expected. However, in my cluster setup, the end_entity event from response body is coming first, then log-filter call, followed by end entity call for request. Since I have other events I am processing for the request, not sure if they are affecting the order. Giving a priority of 10 for the end_entity event did not fix it either. Running the minimalist script I gave here also causes the problem on my system but not try.bro.org website. Really puzzling :frowning:

fyi, I’ve included the logs I see in my reporter log which indicates the order of the events at the end of this email

Dk.

PS: pcap link https://www.dropbox.com/s/epx2nz60d06uor9/long-user.pcap?dl=0

@load base/protocols/http/main

module HTTP;

export {
redef record Info += {
request_body: string &log &optional;
request_body_complete: bool &log &default=F;

response_body: string &log &optional;
response_body_complete: bool &log &default=F;
log_pred: bool &default=T;
};
}

event bro_init()
{
Log::remove_default_filter(HTTP::LOG);
Log::add_filter(HTTP::LOG, [$name=“new-default”,
$pred(rec: HTTP::Info) = {
Reporter::info(fmt(“request body complete: %s, bsize=%d”
, rec$request_body_complete?“true”:“false”
, |rec$request_body|));
return rec$log_pred;
}
]);
}

event http_entity_data(c: connection, is_orig: bool, length: count, data: string)
{
if (is_orig) {

request body accumulation

if (is_orig && !c$http?$request_body)
c$http$request_body = “”;

c$http$request_body += data;
} else {

response body accumulation.

if (!is_orig && !c$http?$response_body)
c$http$response_body = “”;

c$http$response_body += data;
}
}

event http_end_entity(c: connection, is_orig: bool)
{
Reporter::info(fmt("%s: end_entity called for %s, bsize=%d"

, c$uid, is_orig?“request”:“response”, |c$http$request_body|));

if (is_orig) {
if (|c$http$request_body| > 0) {
c$http$request_body = encode_base64(c$http$request_body);
c$http$request_body_complete = T;
}
} else {
if (|c$http$response_body| > 0) {
c$http$response_body = encode_base64(c$http$response_body);
c$http$response_body_complete = T;
}
}
}

Logs order on my setup:

For the pcap you gave, it looks like that is actually the real order
of the packets: the client is in the middle of sending the request
body, but we see the server's OK response before the request is even
finished (server also starts trying to reset connection at that
point). So seems like it's weirdness to blame on that particular HTTP
server? Or do you generally see this same pattern for other servers,
too?

- Jon

Uhm! didn’t notice that. Thanks for pointing out. Looks like my earlier experiment on try.bro.org was with another server.
I am getting the same behavior now on my setup and try.bro.org setup with the two pcaps.