raw bytes question

Hi list,

Is there an event I can hook that would allow me to do a regex on the
raw bytes of a packet if I knew the hex pattern of the bytes I want to
match?

-Tim

Per the mail I sent you earlier, that looks like a task for a
signature.

Robin

Hi Tim,

Is there an event I can hook that would allow me to do a regex on the
raw bytes of a packet if I knew the hex pattern of the bytes I want to
match?

If you want an example of working with signatures and policy script, I went ahead and added a script for detecting SSN leakage that works by having a signature that is subsequently handled in policy script. It uses a list of known US SSNs for your organization and filters out false positives by using that list. We've caught quite a few minor violations with this script since we started running it.

Here's the policy script:
   http://github.com/sethhall/bro_scripts/blob/819d078ad9cf59d9f594f2682fcd6d3c8b89d6ad/ssn-exposure.bro

The corresponding signature definition file is here:
   http://github.com/sethhall/bro_scripts/blob/819d078ad9cf59d9f594f2682fcd6d3c8b89d6ad/ssn.sig

Let me know if you have any problems understanding what's happening between the signature definition and the policy script. That simple interaction is a little muddied by the rest of the script.
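
As a rough illustration of that interaction (a Python sketch, not the linked Bro script or signature), the idea is: a broad pattern flags anything SSN-shaped, and a second step keeps only hits that appear in a list of known SSNs. The names SSN_PATTERN, KNOWN_SSNS and find_ssn_exposures, and the sample number, are all made up for the example.

```python
import re

# Stand-in for the signature: any SSN-shaped run of digits,
# with or without hyphens, not embedded in a longer number.
SSN_PATTERN = re.compile(rb"(?<!\d)(\d{3})-?(\d{2})-?(\d{4})(?!\d)")

# Hypothetical list of SSNs the organization knows about (digits only).
KNOWN_SSNS = {"078051120"}


def find_ssn_exposures(payload: bytes) -> list[str]:
    """Return the known SSNs that appear in a payload."""
    hits = []
    for match in SSN_PATTERN.finditer(payload):
        candidate = b"".join(match.groups()).decode("ascii")
        # Stand-in for the policy-script step: a pattern hit only
        # counts if it is one of the known SSNs.
        if candidate in KNOWN_SSNS:
            hits.append(candidate)
    return hits


exposed = find_ssn_exposures(b"id=078-05-1120&ref=123-45-6789")
```

Here both numbers match the pattern, but only the first survives the list check, which is how the list filters out false positives.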

   .Seth

This raises a question that I’ve been wondering since poring over the 1.4 manual regarding how well Bro greps packets. Specifically, the manual says that signatures are off by default and that the grepping is per-packet with no stream reassembly capabilities. It also appears that there’s no particularly fancy pattern matching engine under the hood, indicating that matching on full snaplengths for many signatures produces high load. I haven’t measured this myself, so I’m wondering if this is the case. Does anyone have any statistical (or anecdotal) evidence as to how many sigs can run under a subnet with mostly web client traffic?

Thanks,

Martin

This raises a question that I've been wondering since poring over the 1.4
manual regarding how well Bro greps packets. Specifically, the manual says
that signatures are off by default and that the grepping is per-packet with
no stream reassembly capabilities.

Uh, does the manual really say that? Can you point me to where you
found these statements?

The signature engine is not really "off by default". Rather (like most
functionality in Bro), it's only activated on demand when your
configuration actually defines any signatures. It's true that we
don't ship with many pre-built signatures[1]. But DPD for example
uses those in policy/sigs/dpd.bro, and they are activated once you
turn on DPD by loading dpd.bro.

Likewise, pattern matching *is* usually done stream-wise, not on
packets. More precisely, whenever Bro has reassembly enabled for a
particular connection, the pattern matching is performed after
reassembly. Only if Bro does not reassemble a connection does
pattern matching proceed on individual packets. Generally, you can tell Bro
pretty precisely which connections you want it to reassemble; by
default, it reassembles the *beginning* of all TCP connections, and
it then keeps the reassembler enabled for those for which it has
found a suitable application-layer protocol analyzer.
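
To illustrate why the distinction matters, here is a small Python sketch (not Bro code). It assumes a made-up byte pattern written with hex escapes and uses in-order concatenation as a simplified stand-in for reassembly; a real reassembler also has to deal with reordering, retransmissions and overlaps.

```python
import re

# Hypothetical pattern given as hex bytes: CR LF followed by "EVIL".
PATTERN = re.compile(rb"\x0d\x0a\x45\x56\x49\x4c")


def match_per_packet(packets: list[bytes]) -> bool:
    """Search each packet payload on its own."""
    return any(PATTERN.search(payload) for payload in packets)


def match_stream(packets: list[bytes]) -> bool:
    """Search the concatenated payloads (simplified reassembly)."""
    return PATTERN.search(b"".join(packets)) is not None


# The pattern straddles the boundary between the two payloads.
packets = [b"GET / HTTP/1.0\r\nEV", b"IL: yes\r\n"]

per_packet_hit = match_per_packet(packets)
stream_hit = match_stream(packets)
```

Per-packet matching misses the pattern here because neither payload contains it in full, while matching on the joined stream finds it.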

For more details (including options to control matching), please see
this blog posting:

        The ICSI Networking Group Blog: Bro's Signature Engine

It also appears that there's no particularly fancy pattern matching
engine under the hood, indicating that matching on full snaplengths
for many signatures produces high load.

Likewise, I'm wondering where you got the impression that there's no
"fancy engine" (or what you'd consider a fancy one to look like :-).
There's a paper describing the internals of Bro's approach in more
detail if you are curious:

       http://www.icir.org/robin/papers/ccs03.ps
       
The paper also discusses various trade-offs in signature matching as
well as the difficulty of fairly comparing multiple engines against
each other.

I haven't measured this myself, so I'm wondering if this is the
case. Does anyone have any statistical (or anecdotal) evidence as
to how many sigs can run under a subnet with mostly web client
traffic?

The only systematic measurements I'm aware of are actually those in
the older CCS paper mentioned above. Most people seem to use Bro's
engine with only a small number of signatures, as it's usually
deployed as *support* for script-level analysis rather than as the
primary detection tool by itself. I remember one specific case in
which someone used a large number of signatures and had some
performance trouble initially; that however was solvable by tuning
the engine's options a bit.

Hope this helps,

Robin

[1] Ignoring the ancient ones converted from Snort which aren't
really useful anymore.

Robin,

Thanks for the quick reply. The “off by default” comment comes from section 7.6.1 of the user manual which states “Signature matching is off by default.” I understand that Bro’s emphasis (and therefore distinction from its competition) is that it relies as little as possible on signature matching. So much so that my concern as a newcomer to Bro is that signature matching is de-emphasized enough that it could suffer in performance.

For stream reassembly, I worded my question poorly. The blog post you mentioned (which was what I was thinking of when I wrote the questions) states that reassembly is only done on the first 1K of streams. So, I (perhaps unreasonably) do not consider that reassembly because I am very regularly interested in the 1K-2K range of a stream.

I read the CCS paper (though it’s rather old!) and I think I now have a much better idea of what the internal sig matching engine uses, namely DFA (or at least that’s what it used to use). I’m wondering how this compares with the Aho-Corasick NFA implementation of simple (non-regexp) string matching a la Snort, both in performance and memory consumption. I’d also be interested in comparisons on CPU cache efficiency.
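
For reference, this is the kind of multi-string matcher I mean: a generic textbook Aho-Corasick sketch in Python. It is not Snort's implementation and not Bro's DFA; the function names are made up for the example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of byte strings."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pat)
    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(automaton, data):
    """Return (offset, pattern) pairs for every match in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits


automaton = build_automaton([b"he", b"she", b"hers"])
hits = search(automaton, b"ushers")
```

All patterns are matched in a single pass over the input, following failure links on a mismatch instead of restarting. A DFA compiled from regular expressions precomputes those transitions for every state and input byte, which is where the memory and cache questions come from.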

Thanks,

Martin

Thanks for the quick reply. The "off by default" comment comes from section
7.6.1 of the user manual which states "Signature matching is off by
default."

I see. That paragraph is actually not referring to the signature
engine itself but to the set of
Snort-converted-and-further-augmented signatures that were shipped
as part of the Bro-Lite environment (which is technically still
there but hasn't been maintained for years and will be removed
soon.) But I see how that can be confusing; the text doesn't really
make that distinction clear.

states that reassembly is only done on the first 1K of streams. So, I
(perhaps unreasonably) do not consider that reassembly because I am very
regularly interested in the 1K-2K range of a stream.

Well, I'd call it "reassembly of the first 1K". As I wrote in the
mail and in the blog posting, that's all configurable. Different
people require different trade-offs.
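
To make the trade-off concrete, here is a tiny Python sketch (not Bro configuration). The `limit` parameter is a made-up stand-in for however much of the stream gets reassembled and matched, and the byte values are arbitrary.

```python
def match_with_limit(stream: bytes, pattern: bytes, limit: int) -> bool:
    """Look for pattern only within the first `limit` bytes of the stream."""
    return pattern in stream[:limit]


# A marker that sits roughly 1.5K into the stream.
marker = b"\xde\xad\xbe\xef"
stream = b"A" * 1500 + marker + b"B" * 100

found_in_first_1k = match_with_limit(stream, marker, 1024)
found_in_first_2k = match_with_limit(stream, marker, 2048)
```

With a 1K limit the marker is never seen; raising the limit to 2K finds it, at the cost of buffering and scanning more data per connection.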

least that's what it used to use). I'm wondering how this compares with the
Aho-Corasick NFA implementation of simple (non-regexp) string matching a la
Snort, both in performance and memory consumption.

The paper actually compares with Snort, though with the Snort of
2003. I can't comment on any recent versions.

I'd also be interested in comparisons on CPU cache efficiency.

That is an interesting question indeed.

Robin