Bro programming intro

Hello.

I want to modify the SQL Injection detection in policy/protocols/http/detect-sqli.bro to include a vector that tracks the associated http request uids and includes them in an additional log field. After getting it working I would like to apply it generally to other Notices such as SSH Password_Guessing.

How should this be implemented? I do not understand how the timing and garbage collection or expiration of Vals works. I do not know what is possible from the scripting layer versus what requires modifying the base or policy scripts.
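To make the goal concrete, here is roughly the kind of state I have in mind (a rough sketch only; `uids_by_src`, the module name, and the 10-minute window are placeholders I made up):

```bro
module TrackUIDs;

export {
	## HTTP request uids seen per originator. With &create_expire,
	## each entry is discarded 10 minutes after it is created, so
	## the table cannot grow without bound.
	global uids_by_src: table[addr] of vector of string
		&create_expire = 10 mins;
}

event http_request(c: connection, method: string, original_URI: string,
                   unescaped_URI: string, version: string)
	{
	if ( c$id$orig_h !in uids_by_src )
		uids_by_src[c$id$orig_h] = vector();
	# Append this request's uid to the originator's list.
	uids_by_src[c$id$orig_h][|uids_by_src[c$id$orig_h]|] = c$uid;
	}
```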

Reading the source and docs helps but I could use some pointers to help accelerate the process.

Thanks !

–TC


The upcoming release actually results in this script getting rewritten a bit because of a rewrite of the metrics (now measurement) framework. The new version actually keeps samples of the requests. It will be relatively easy to write your own script that tracks uids instead of urls, but the benefit to sampling the urls is that if you have Bro send you email for the notice, it will add those sample urls to the email (it's been very convenient for determining if something is a false positive without even searching the logs).

Otherwise, with the metrics framework in 2.1 there isn't a good way to do it.

.Seth

When are you all planning the next release version?

Thanks!

Ron Jenkins (SnortCP, VCP (3/4), MCNE, CNE6, MCP,CCNA)
RMJ Consulting, LLC. "Bringing Companies and Solutions Together"
Makers of Active Response System(ARS) & Log Siphon
Owner / Senior Architect

I hate answering this way, but when it's ready. :-)

We have several things we're trying to finish up now.

  .Seth

Fully understand.

Thanks!

Ok. I am skeptical of how much emphasis is placed on doing things within BroIDS. Simply buffering uids per Notice seems much easier and less resource-intensive than storing additional samples. Where is the limit on tracking too much state or using too many cycles within the "IDS"? I am wary of inadvertently creating DoS conditions with a philosophy that may encompass every script I write in Bro.

I am still interested in a list of key papers on the internals if anyone has a few.

To me, the ability to work with items outside of BroNSM is useful, and easier than rewriting a BroNSM script and restarting a cluster whenever I want to look at something differently or trim logs. Searching for items and guessing which requests are related is more time-consuming. Long term, I can see tweaking a Bro script to perform better. I am very selective about using email as an alert mechanism.

Using samples makes sense, as do uids; samples involve content and sound larger than a simple int32, but limiting those is fine as well, just as you would the uids. How do you plan to implement the sampling? By time or by unique requests? Can an attack tool run a number of SQL injection attempts and end the last 5 with something benign? I'd rather analyze the specifics outside of BroNSM before going back and tweaking BroNSM.

Thanks for the /research link.

I have to point out what I interpret as not answering the question about how to buffer data across time. I'm not sure how to interpret that other than "go figure it out yourself", or wait for $next_release where it will exist in an altered form. :P Maybe what I'm doing is stupid, but maybe it will be clever.

Hi:

One paper I can think of off-hand that appeared in RAID a few years back and may be relevant here:

http://www.icir.org/vern/papers/autoconf-raid08.pdf

On principle, there isn't a hard and fast rule for exactly how much state you can allocate or how many cycles you can spend processing individual packets: it's really going to depend on load, and will likely need to be experimentally determined (and constantly tuned).

One way to do this is to record snippets of the traffic you normally see at the border and run bro against that same set of traffic over and over, while modifying the scripts bro has loaded to see what happens (bro's --pseudo-realtime option can help get a realistic sample of what happens if you want to run this locally, or you can alternatively replay traces onto a local link to possibly get more realistic results). This kind of testing can help you identify the limits of what a specific bro configuration can handle in your environment.

Also, I’d like to point out that some folks simply use bro for offline trace analysis and the like; while bro does well as a real-time tool, it doesn’t necessarily have to be.

“I am skeptical of how much emphasis is placed on doing things within BroIDS.”

Relevant cliche: Premature optimization is the root of much evil.

The emphasis, I think, isn't placed on doing things within bro. Instead, I believe the emphasis is placed on doing things that work for the folks who are deploying bro into their environments. Most of the discussion focuses on the scripting layer because, in my humble opinion, that is the most straightforward interface to bro's event engine, and it's fast enough to do what folks need it to. In the event it isn't, there's always the cluster model: there's at least one piece of pretty cool hardware I know of that rewrites destination MAC addresses and allows you to load-balance across a cluster pretty effectively, and PF_RING can also let you cluster on a local box. Often, investing in clustering winds up being cheaper (in both time and money) than trying to throw more experts at the problem to squeeze that last 15 Mbps out of a single node.

Then again, trying to squeeze more out of a single node is always a fun programming challenge to solve :-)

Normal disclaimer applies: just my $0.02, I’m not an expert, etc.

–Gilbert Clark

The ability to work with items outside of BroNSM to me is useful and easier than rewriting a BroNSM script and restarting a cluster when I want to look at something differently or trim logs.

We're even moving away from "NSM" now.
  http://blog.bro.org/2013/03/broorg-new-home-for-bro.html

I am very selective with using email as an alert mechanism.

I fully believe this will expand over time. Email is just the obvious way that we support right now. Is there some other specific tool you would like to see Bro integrated with?

Using samples makes sense, as do uids; samples involve content and sound larger than a simple int32, but limiting those is fine as well, just as you would the uids. How do you plan to implement the sampling? By time or by unique requests? Can an attack tool run a number of SQL injection attempts and end the last 5 with something benign? I'd rather analyze the specifics outside of BroNSM before going back and tweaking BroNSM.

If you run on a cluster, it would become very hard for an attacker to ensure that the last 5 requests forwarded to the analyst are benign. Samples are collected on each worker and then interleaved and size-limited again when the measurement results are merged at the end. I'm sure there are still ways an attacker could mess with analysts, but it's not as obvious as just sending a few benign requests at some specific point.

Regardless, it's just Bro scripts that track the content, and they can typically be modified fairly easily.

  .Seth

Wow!!! What the heck did it cost to get that domain name???

Congrats. Educause SPC is coming up and I'm sure Bro is going to be a hot topic of conversation :-)

Cheers,
Harry

Thanks for your comments, good to reflect on over coffee. FWIW I am running the cluster model with plenty of RAM and CPU to spare with a near default config.

I do not believe what I am trying to learn and implement is premature optimization but thanks for the reminder, I’ve made that mistake a few times.

Another thing I would like to do is tag every orig_h and resp_h with additional identifiers relative to the prefix, sorta like BGP ASNs. I usually use a Patricia-Trie for this. Is there a special data type and BiF I should consider?

From the documentation on extending logging [1], it seems that is a bit beyond the scripting layer. I read about the input framework [2], and it seems it might work for this application, but I'm not sure if this is best. What do you suggest? I would like all logs that have an orig_h and/or resp_h to include the tags.

[1] http://www.bro.org/documentation/logging.html#extending

[2] http://www.bro.org/documentation/input.html

Thanks !

–TC

Another thing I would like to do is tag every orig_h and resp_h with additional identifiers relative to the prefix, sorta like BGP ASNs. I usually use a Patricia-Trie for this. Is there a special data type and BiF I should consider?

When indexed by the `subnet` type, the `set` and `table` types should be using a Patricia-Trie internally. So you might be able to use something like a `table[subnet] of MyTag` for mapping orig_h/resp_h to however you want to define the `MyTag` type (probably an enum would work).
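For example (a minimal sketch; `MyTag`, `net_tags`, and the prefixes are made-up placeholders, and it assumes a Bro version where an address index into a `table[subnet]` returns the most specific match):

```bro
type MyTag: enum { DMZ, LABNET, GUEST };

# Lookups against a table indexed by subnet use longest-prefix
# matching internally (the Patricia trie mentioned above).
global net_tags: table[subnet] of MyTag = {
	[10.1.0.0/16] = DMZ,
	[10.2.0.0/16] = LABNET,
	[192.168.0.0/16] = GUEST,
};

event connection_established(c: connection)
	{
	if ( c$id$orig_h in net_tags )
		print fmt("%s is tagged %s", c$id$orig_h, net_tags[c$id$orig_h]);
	}
```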

From the documentation on extending logging [1], it seems that is a bit beyond the scripting layer. I read about the input framework [2], and it seems it might work for this application, but I'm not sure if this is best. What do you suggest? I would like all logs that have an orig_h and/or resp_h to include the tags.

My opinion would be that extending the logging would be easier, but I don't know all the details of how you want to use it. The way I'm thinking, you'd basically do the same thing as the documentation describes, maybe start with conn.log:

(1) add a field to the Conn::Info record for the orig/resp tags
(2) pick a time at which to look up the orig_h/resp_h in your tag table and assign them to the fields in a Conn::Info instance. Handling the Conn::log_conn or connection_state_remove event would be two ways to do this.

Then you can see if it makes sense to extend other logs in a similar way or whether conn.log is adequate.
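Those two steps might look roughly like this (a sketch only; the `net_tags` table and tag values are made-up placeholders, and it assumes an address index into a `table[subnet]` returns the most specific match):

```bro
# (1) Add tag fields to the record behind conn.log.
redef record Conn::Info += {
	orig_tag: string &log &optional;
	resp_tag: string &log &optional;
};

# Hypothetical mapping from prefixes to tags.
global net_tags: table[subnet] of string = {
	[10.1.0.0/16] = "dmz",
	[192.168.0.0/16] = "guest",
};

# (2) Fill the fields in just before the connection is logged.
event connection_state_remove(c: connection)
	{
	if ( c?$conn && c$id$orig_h in net_tags )
		c$conn$orig_tag = net_tags[c$id$orig_h];
	if ( c?$conn && c$id$resp_h in net_tags )
		c$conn$resp_tag = net_tags[c$id$resp_h];
	}
```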

    Jon

You got the right section in the logging framework docs. I'll give an example of adding ASNs, like you mentioned...

redef record Conn::Info += {
  orig_asn: count &log &optional;
  resp_asn: count &log &optional;
};

# c$conn is populated by the base conn scripts, so do the lookups
# just before the connection is logged.
event connection_state_remove(c: connection)
  {
  if ( c?$conn )
    {
    c$conn$orig_asn = lookup_asn(c$id$orig_h);
    c$conn$resp_asn = lookup_asn(c$id$resp_h);
    }
  }

You need to have the MaxMind ASN database in place for the lookup_asn function to work. Anyway, it's pretty easy. :-)

  .Seth

Thanks for the detail and examples. Makes more sense when you consider a single Bro process.

I'm not sure what you mean by this? Logs don't really have anything to do with multiple processes in most cases.

  .Seth

Don’t worry, neither do I. Good to know !

Wow!!! What the heck did it cost to get that domain name???

Not as much as we were afraid it would. :-)

Congrats. Educause SPC is coming up and I'm sure Bro is going to be a hot topic of conversation :-)

I'll see you there then. I'm going to be participating in the "Ask the Expert" session during the REN-ISAC event following the SPC.

  .Seth