New http policy scripts

Hi,

I've looked at the new http policy scripts and have some comments (first: thanks Seth for this re-organization, looks great):

It appears that by default protocols/http now analyzes both HTTP headers and HTTP payload. Both are quite expensive in terms of CPU time, particularly body analysis.

I would opt to *not* include those if just protocols/http is loaded (which will always be loaded by default in the future). HTTP is usually going to be the most expensive analysis (due to traffic volume) anyway, so we should give users an easy way to adjust the load according to their traffic and available hardware. So, I would opt to only do HTTP request and HTTP reply analysis by default and provide users with an easy option to

a) load HTTP-header analysis. E.g., protocols/http/headers
b) load HTTP-body analysis. E.g., protocols/http/body

(or name a) and b) http/medium and http/heavy respectively)

(I can see the value of always doing header analysis, so I think I could accept HTTP header analysis by default if others really want this, but I really think body analysis should not be done by default)
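
As a rough sketch of how this split might look from a user's local policy file: the paths protocols/http/headers and protocols/http/body are only the names suggested above, so they are hypothetical and need not exist in the tree.

```bro
# Request/reply analysis only (what plain protocols/http would do
# under this proposal).
@load protocols/http

# Opt in to header analysis (hypothetical script path from the proposal).
@load protocols/http/headers

# Opt in to body analysis, the most expensive part (hypothetical path).
@load protocols/http/body
```

A site with little CPU headroom would keep only the first line; the other two are added as hardware allows.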

Also note that there is an http_headers event (note the "s" at the end). This event gives you all the headers with one event call. You will lose the order of events, and you'll only get the headers once the header block is complete (i.e., after the empty line).
In my experience it's a *lot* faster if you only use the http_headers event instead of relying on individual http_header() events (*). I would therefore opt to use the http_headers() event whenever possible!! (Particularly for everything that gets loaded by default)
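
A minimal sketch of a handler for the batched event, under some assumptions: in the Bro trees I know of, this event is declared as http_all_headers and takes a mime_header_list (a table of records with name and value fields), so check the exact name and signature against event.bif in your version. The User-Agent lookup is only an illustrative use.

```bro
# One call per message, carrying all headers, instead of one
# http_header event per header line.
# Assumption: event name and argument types as in event.bif.
event http_all_headers(c: connection, is_orig: bool, hlist: mime_header_list)
	{
	# Only look at the client side of the connection.
	if ( ! is_orig )
		return;

	for ( i in hlist )
		{
		local h = hlist[i];

		# Normalize the case before comparing header names.
		if ( to_upper(h$name) == "USER-AGENT" )
			print fmt("%s user-agent: %s", c$id$orig_h, h$value);
		}
	}
```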

cu
Gregor

I knew that someone would make this argument eventually. :)

There are two parts to my argument in favor of including both by default.

By removing some of the analysis like that by default, you are basically taking a runtime optimization step, as you pointed out, since it certainly does cause overhead to do everything. The problem is that it would make the usage of Bro more obtuse for users since it would be a singular optimization specifically for a type of traffic that just happens to be prevalent on most networks. I think that in 99% of cases, people want everything anyway (people running Bro on live traffic for operational security purposes at least). That has been my experience.

The other side of this is the http_body events. I don't like how I'm doing that either, but it's a stopgap until we have the more general file analyzer that would do everything I'm doing in the base http analysis scripts internally (identification, hashing, extraction, etc).

I do agree that I'm doing some pretty egregious stuff in some of those scripts from an optimization perspective, but I think that optimization attempts in Bro scripts have led to incredibly convoluted scripts and dependency chains. I'm going to press instead for optimizations that allow the scripts to remain well structured. For instance, what about just disabling the http_header or http_data events if you don't want those done? This should already be doable with disable_event_group, like this:
  
  disable_event_group("http-body");
  disable_event_group("http-header");

  .Seth

After chatting with Seth, I'm actually starting to like this idea. Maybe we could add policy scripts that, when loaded, disable these groups to shed load.
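
Such a load-shedding script might be as small as the sketch below. The file name is hypothetical; the group names are the ones from Seth's example, and whether they match the groups actually defined for the HTTP events would need checking.

```bro
# Hypothetical file: protocols/http/shed-body.bro
# Loading this script turns off the body-related events at startup,
# leaving the rest of the HTTP scripts untouched.
event bro_init()
	{
	disable_event_group("http-body");
	}
```

A second script of the same shape could disable "http-header", so users could shed load in two steps.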

cu
Gregor