Bro: TCP, regex

Greetings.

I have a question regarding Bro's analysis.

Consider a TCP connection: as the segments come in, they are
'deliver'ed to different analyzers.
If segments arrive out of order, the TCP Reassembler stores
them and delivers them in order.
Now, if regular expression matching is done at a later stage,
will it match across different deliveries?
For example, suppose a regex has to match across N bytes, where N is
large (say 1 MB). Is this possible with Bro, or is the window for
matching smaller?

e.g.

TCP connection established
<start of data> (regular expression partially matched)
<more data>
...1MB data
<end of data> (regular expression match completed)

Is a regular expression match like this possible with Bro?

Yes, it will, if you're referring to Bro's signatures. Signature
matching is performed on the payload *stream*, independent of any
packet boundaries (this is different from Snort, or at least it was
different when I last looked at it; perhaps things have changed
these days).

On the scripting layer, things work a bit differently. You can use
regexps there to match on a string, but the string needs to be
available in full at that time. You cannot save the matching
state so that you could pass in more data later. However, that's
usually not a problem, because the core already extracts the right
semantic units from the protocols, and you can then match on those. A
typical example is URLs from HTTP sessions: the core ensures
that a script always sees complete URLs, since stream reassembly
happens before the HTTP decoder extracts the URLs. So matching a
regexp on the URL you get from the core will work fine even if
the URL crossed packet boundaries in the original packet stream.

Robin

> Yes, it will, if you're referring to Bro's signatures.
>
> ..
>
> On the scripting layer, things work a bit differently.

Do Bro's signatures work on a different layer than the
scripting/policy layer?

> Signature
> matching is performed on the payload *stream*, independent of any
> packet boundaries (this is different from Snort, or at least it was
> different when I last looked at it; perhaps things have changed
> these days).

In the code, which are the relevant files I need to look at to understand
whether this is done as you describe?

RE.cc, TCP_Contents.cc?

Thanks

> Do Bro's signatures work on a different layer than the
> scripting/policy layer?

Yes, signature matching is done inside the core. Only if there's
a match is an event passed to the policy layer.

> In the code, which are the relevant files I need to look at to understand
> whether this is done as you describe?

The code implementing the signatures is in Rule*.{h,cc}.

Robin