Getting matched substrings ???

I read this sentence about the regular-expression implementation in the
paper "Bro: A System for Detecting Network Intruders in Real-Time":
"Second, we anticipate matching sets of patterns
and wanting to know which subset were matched by a given set of
text...". I thought I could get the matched substring from the signatures,
but unfortunately I can't figure out how...

(That text refers to regular-expression matching on general strings, rather
than the context-based signature analyzer that Robin added to Bro, by the
way.)

Since writing that, Bro's style has moved more towards pushing extraction
of elements into either the event engine itself, or into built-in functions,
rather than trying to do it using regular expressions over strings. If it
were easy to add subexpressions to Bro's RE matcher, I'd be happy to do so,
but it's quite a bit of work.
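
To illustrate what "subexpressions" would provide, here is a minimal Python sketch using a general-purpose regex engine. It is not Bro code, and Bro's RE matcher does not offer this, as noted above. The payload string is a made-up example.

```python
import re

# Hypothetical payload chunk, invented for illustration only.
payload = "HTTP/1.1 200 OK\r\nServer: Apache/1.3.29 (Unix)\r\n\r\n"

# The parenthesized subexpression captures only the token that
# follows "Server: ", not the whole matched text.
match = re.search(r"Server: (\S+)", payload)
server = match.group(1) if match else None
```

Here `match.group(0)` would be the whole matched text, while `match.group(1)` is just the captured subexpression.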

If you give an example of where you want to do this, perhaps we can suggest
alternate ways of structuring your analyzer.

    Vern

Robin Sommer wrote:

> I thought I could get the matched substring by the signatures,
> but unfortunately I can't get out of it...
   
event signature_match(state: signature_state, msg: string, data: string)

The 'data' parameter of the signature_match event contains the
payload that led to the match. (More precisely, it contains the
last chunk of payload that eventually triggered the match; for TCP,
it depends on the reassembly what exactly is passed).

Is this what you're looking for?

Robin

In fact, I currently use the "data" parameter to get the whole payload, but what I really wanted was to get only the part that matched.
Here is a simple example of what I'd like to do:

signature apache-server {
    ip-proto == tcp
    src-port == 80
    payload /Server: [aA][pP][aA][cC][hH][eE].*/
    event "Apache"
    tcp-state responder
}

Then, in a policy script, I thought I could extract "Apache/version" using the function sub_bytes() and associate it with the IP address of the host (contained in the signature_state). That would be easy, since I know the information I need starts 8 characters ("Server: ") after the beginning of the matched substring.
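
The fixed-offset idea described here can be sketched in Python as follows. This only illustrates the logic and is not Bro policy code: the function name `server_token`, the sample payload, and the address 192.0.2.10 are all hypothetical. The trailing `.*` of the signature is written as `[^\r\n]*` so that the match stops at the end of the header line.

```python
import re

HEADER_PREFIX = "Server: "  # the 8 characters to skip

# Same idea as the signature's pattern, limited to one header line.
SERVER_RE = re.compile(r"Server: [aA][pP][aA][cC][hH][eE][^\r\n]*")

def server_token(payload):
    """Return the first token after "Server: ", or None if no match."""
    m = SERVER_RE.search(payload)
    if m is None:
        return None
    matched = m.group(0)                   # only the matched substring
    rest = matched[len(HEADER_PREFIX):]    # skip the 8-character prefix
    return rest.split()[0]                 # keep "Apache/<version>" only

# Hypothetical payload and host address, for illustration only.
payload = ("HTTP/1.1 200 OK\r\n"
           "Server: Apache/1.3.29 (Unix)\r\n"
           "Content-Type: text/html\r\n\r\n")
software_by_host = {"192.0.2.10": server_token(payload)}
```

Note that this depends on having the matched substring itself; with only the whole payload chunk, the position of "Server: " has to be searched for first.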

To sum up, I'd like to collect host characteristics such as: this host (IP address W.X.Y.Z) is now running Apache 1.3.29.

This sounds exactly like what software.bro is doing. Have you tried
that? (You also need to load http-reply.bro as it doesn't use the
signature engine but the HTTP decoder).

Robin

Sounds great, in fact! I've just tested it, and it will certainly help me. Thanks!

Now, the next stage will be to extract other information contained in URLs, such as sensitive CGIs.
I've just seen that http-request.bro implements such features, so I'm having a look at it...

Yohann.