Log an Arbitrarily Long Collection

Greetings Brofolk,

I have become increasingly interested in Bro lately, and I am starting to explore how my organization can use it as a general network processor to generate some verbose logs that we can then export for indexing and analytics.

The first use case I would like to explore involves generating verbose HTTP logs so that we can identify suspicious characteristics such as direct-to-IP host headers, missing headers, unexpected ordering of headers, RFC compliance issues, etc.

I spent some time auditing the main.bro script for the HTTP protocol, and then proceeded to make some edits to add additional record fields. Specifically, I created a new record:

type HeaderValue: record {
h: string &optional;
v: string &optional;
};

and within the Info type, I added the following members:

client_headers: vector of HeaderValue &log &optional;
client_headers_count: count &default=0;

server_headers: vector of HeaderValue &log &optional;
server_headers_count: count &default=0;

I decided to use a vector because I want to keep track of header order. Further down, in the http_header event handler, I add each header to the appropriate vector, indexed by the count field, which I increment each time.

When I fire up bro, I get the following error message:

error in /usr/local/bro/share/bro/base/protocols/http/./main.bro, line 83: &log applied to a type that cannot be logged (&log)

So presumably Bro doesn’t like the idea of generating a log entry that includes a vector type (no less consisting of record members, I’m not even sure what that would have looked like but was hoping to find out). The next best thing I can think of doing is recording this information with some custom delimiter in a single string field, such as:

Accept:*/*|||Accept-Language:en-US|||User-Agent:Mozilla 4.0|||Host:somehost.com||||Connection:Keep-Alive

Further downstream I plan to convert the tab-delimited content to JSON anyways prior to indexing.

Is this a good solution for including an arbitrarily long collection field in my HTTP logs? Is there a better way to accomplish this?

Also, I have a feeling that directly editing http/main.bro is a bad practice. Should I instead be adding this script to the policy branch, redefining the HTTP Info object and handling the http header event in there?

Thanks for your attention!

Christian

server_headers: vector of HeaderValue &log &optional;
server_headers_count: count &default=0;

I decided to use a vector because I want to keep track of header order. Further down, in the http_header event handler, I add each header to the appropriate vector, indexed by the count field, which I increment each time.

A minor point: you don't really have to store the count since it's implicit in the vector if you use `|…|` to get the size of the container. E.g.

  |c$http$server_headers|

in this case would be equivalent to:

  c$http$server_headers_count
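
As a minimal sketch of that idiom (the vector and header names below are made up for illustration), the size operator can also serve as the next free index when appending, so no separate counter is needed:

```bro
event bro_init()
	{
	# Hypothetical stand-in for c$http$server_headers.
	local names: vector of string = vector();

	# |names| is the current size, so it is also the next free index.
	names[|names|] = "SERVER";
	names[|names|] = "CONTENT-TYPE";

	# Two elements were appended, so the size is now 2.
	print |names|;
	}
```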

error in /usr/local/bro/share/bro/base/protocols/http/./main.bro, line 83: &log applied to a type that cannot be logged (&log)

So presumably Bro doesn't like the idea of generating a log entry that includes a vector type (no less consisting of record members, I'm not even sure what that would have looked like but was hoping to find out).

A vector of an atomic type, say `vector of string`, should work. So an option would be to store the header names and values in two different vectors.

The next best thing I can think of doing is recording this information with some custom delimiter in a single string field

That could work also and might be better if you're more concerned w/ human readability.

Also, I have a feeling that directly editing http/main.bro is a bad practice. Should I instead be adding this script to the policy branch, redefining the HTTP Info object and handling the http header event in there?

Take a look at "policy/protocols/http/header-names.bro". Copy that to your own script (possibly storing it in $prefix/share/bro/site) and modify it to do what you want (log header name+value in whatever format). Then have bro load that script: if you're already loading local.bro, adding an @load in there for the new script is one way.
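
For illustration, the load step might look like the fragment below. The file name my-http-headers.bro is hypothetical, and the sketch assumes the script sits in the same site directory as local.bro:

```bro
# In $prefix/share/bro/site/local.bro
# "my-http-headers" is a hypothetical name for the copied and modified script,
# assumed to be stored next to local.bro.
@load ./my-http-headers
```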

- Jon

So presumably Bro doesn't like the idea of generating a log entry that includes a vector type (no less consisting of record members, I'm not even sure what that would have looked like but was hoping to find out).

We don't allow logging collections of complex types. You can log vectors, but only vectors of simple atomic types (addr, string, subnet, count, double, etc.), which doesn't include records. You can log records too, but they can't be part of a collection type (set, table, vector). If we had allowed the logging framework to log those types, we would have been introducing a lot of headaches for ourselves down the road.

Accept:*/*|||Accept-Language:en-US|||User-Agent:Mozilla 4.0|||Host:somehost.com||||Connection:Keep-Alive

That would work but it's just as nasty as I'm guessing you feel like it is. :)

Is this a good solution for including an arbitrarily long collection field in my HTTP logs? Is there a better way to accomplish this?


There are a number of ways of doing it, each with upsides and downsides. The way I have done it (I have a script floating around somewhere…) is to add fields to HTTP::Info:

client_header_names: vector of string &log &optional;
client_header_values: vector of string &log &optional;
server_header_names: vector of string &log &optional;
server_header_values: vector of string &log &optional;
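
A rough sketch of how those fields could be added and filled from a separate script, without touching main.bro, is below. This is not the script mentioned above, just one way it might look. The handler priority is an assumption chosen so that the handler runs after the base HTTP script has set up c$http.

```bro
@load base/protocols/http

# Extend the stock HTTP::Info record instead of editing main.bro.
redef record HTTP::Info += {
	client_header_names:  vector of string &log &optional;
	client_header_values: vector of string &log &optional;
	server_header_names:  vector of string &log &optional;
	server_header_values: vector of string &log &optional;
};

# &priority=3 is an assumption: lower than the base script's handler,
# so the HTTP state should already exist when this runs.
event http_header(c: connection, is_orig: bool, name: string, value: string) &priority=3
	{
	# No HTTP state for this connection: nothing to attach headers to.
	if ( ! c?$http )
		return;

	if ( is_orig )
		{
		# Create both client vectors on first use.
		if ( ! c$http?$client_header_names )
			{
			c$http$client_header_names = vector();
			c$http$client_header_values = vector();
			}

		# Appending at index |v| preserves header order.
		c$http$client_header_names[|c$http$client_header_names|] = name;
		c$http$client_header_values[|c$http$client_header_values|] = value;
		}
	else
		{
		# Same logic for headers sent by the server.
		if ( ! c$http?$server_header_names )
			{
			c$http$server_header_names = vector();
			c$http$server_header_values = vector();
			}

		c$http$server_header_names[|c$http$server_header_names|] = name;
		c$http$server_header_values[|c$http$server_header_values|] = value;
		}
	}
```

Keeping names and values in parallel vectors means the entry at a given index in one vector pairs with the entry at the same index in the other.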

Also, I have a feeling that directly editing http/main.bro is a bad practice.

Yes, it's bad practice.

Should I instead be adding this script to the policy branch, redefining the HTTP Info object and handling the http header event in there?

Yep.

  .Seth

Jon, Seth,

Thank you both for the responses. I have a much better sense now for how to organize my scripts within the folder structure, and have found the relevant documentation on this subject.

Jon, I took your advice and used header-names.bro as a template. However, it seems to me that header-names.bro in the policy folder has a couple of key logic flaws. The boolean constants are meant to allow both client and server headers to be enumerated. However, if you look at the “http_header” handler, you can see that it returns immediately for any event that is not “is_orig”; in other words, it returns for server responses with no work done. Furthermore, if you set the boolean constant for server responses to true (T), the logic in the event handler is such that the client header names just get populated into the server header names vector.

I have attached a (lightly tested) modified and expanded version of this script called header-names-and-vals.bro. Happy to hear if I am mistaken in my understanding of the bro code, as this is the first time I have written my own script. Also, happy to hear feedback/best practices on how to escape commas in a vector listing (I just gsubbed them with &#2c).

Thanks for your attention.

Regards,
Christian

header-names-and-vals.bro (1.68 KB)

Jon, I took your advice and used header-names.bro as a template. However, it seems to me that header-names.bro in the policy folder has a couple of key logic flaws. The boolean constants are meant to allow both client and server headers to be enumerated. However, if you look at the "http_header" handler, you can see that it returns immediately for any event that is not "is_orig"; in other words, it returns for server responses with no work done. Furthermore, if you set the boolean constant for server responses to true (T), the logic in the event handler is such that the client header names just get populated into the server header names vector.

I think your read is correct. The tracking of server headers was probably something that got added in as an after-thought and never tested since that script isn't loaded by default anywhere. Thanks for pointing that out.
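
To make the intended control flow concrete, here is a hedged sketch of direction handling with one toggle per side. It is not the shipped header-names.bro or the attached script. The module name, toggle names, and field names are all hypothetical, and the priority is an assumption.

```bro
@load base/protocols/http

module HeaderSketch;

export {
	# Hypothetical toggles, one per direction.
	const log_client_names = T &redef;
	const log_server_names = T &redef;
}

# Hypothetical field names, chosen to avoid clashing with other scripts.
redef record HTTP::Info += {
	sketch_client_names: vector of string &log &optional;
	sketch_server_names: vector of string &log &optional;
};

event http_header(c: connection, is_orig: bool, name: string, value: string) &priority=3
	{
	# Return early only when there is no HTTP state, never on direction alone.
	if ( ! c?$http )
		return;

	if ( is_orig && log_client_names )
		{
		if ( ! c$http?$sketch_client_names )
			c$http$sketch_client_names = vector();

		c$http$sketch_client_names[|c$http$sketch_client_names|] = name;
		}
	else if ( ! is_orig && log_server_names )
		{
		# Server headers go into the server vector, not the client one.
		if ( ! c$http?$sketch_server_names )
			c$http$sketch_server_names = vector();

		c$http$sketch_server_names[|c$http$sketch_server_names|] = name;
		}
	}
```

Testing the direction and the toggle together in each branch keeps a server header from ever landing in the client vector, whichever toggles are enabled.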

I have attached a (lightly tested) modified and expanded version of this script called header-names-and-vals.bro.

Your version looks good, though the initial check for c?$http being set is still a nice thing to leave in.

Also, happy to hear feedback/best practices on how to escape commas in a vector listing (I just gsubbed them with &#2c).

I think they should automatically get escaped when appearing within a container value and you shouldn't have to worry about it. Did you see differently?

- Jon

Thanks for your response Jon. You’re right, commas are already escaped as “\x2c”, so gsub is not needed. Thank you for the heads up!
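
For anyone who wants to check the escaping themselves, a small standalone sketch follows. The module, record, and field names are hypothetical, and the exact log line is not reproduced here. Per the exchange above, the comma inside the first element should show up as \x2c in the written log.

```bro
module CommaDemo;

export {
	# Hypothetical log stream used only for this experiment.
	redef enum Log::ID += { LOG };

	type Info: record {
		vals: vector of string &log;
	};
}

event bro_init()
	{
	Log::create_stream(CommaDemo::LOG, [$columns=Info]);

	# The first element contains a comma, which is also the separator
	# between elements of a logged container value.
	local rec: Info = [$vals=vector("text/html, */*", "en-US")];
	Log::write(CommaDemo::LOG, rec);
	}
```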

Regards,
Christian

header-names-and-vals.bro (1.58 KB)