regarding regular expression matching

Greetings all,

I am trying to collect some statistics on regular expression matching in
an IDS, and I am using Bro for this. As input I am using Snort signatures
(converted with the s2b utility). I have a few questions about the Bro
code and would also like to confirm my understanding of it.

As I understand it, Bro does reg-exp matching by building a tree of rules
(not more than RE_level levels deep, I guess). Looking at the
RuleMatcher.cc code, this is done by grouping signatures: each set of
signatures is compiled and a DFA is built for it, and the process is
repeated for the entire tree. Have I understood this correctly?
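
To make my understanding concrete, here is a minimal Python sketch of the
grouping idea. It is not Bro code: the signature fields, the names
(SIGS, build_matchers, match) and the grouping key are made up for
illustration, and Python's re module is a backtracking engine rather
than a DFA, so it only stands in for the "one compiled matcher per set of
signatures" part.

```python
import re
from collections import defaultdict

# Hypothetical signatures: (id, ip_proto, src_port_range, payload pattern).
SIGS = [
    ("sig-1", "tcp", (80, 80), r"GET /admin"),
    ("sig-2", "tcp", (80, 80), r"cmd\.exe"),
    ("sig-3", "udp", (53, 53), r"version\.bind"),
]

def build_matchers(sigs):
    """Group signatures by header conditions, then compile one matcher per group."""
    groups = defaultdict(list)
    for sig_id, proto, ports, pattern in sigs:
        groups[(proto, ports)].append((sig_id, pattern))

    matchers = {}
    for key, members in groups.items():
        # One combined pattern per group; each alternative gets a named
        # group (s0, s1, ...) so a hit can be traced back to its signature.
        combined = "|".join(
            f"(?P<s{i}>{pat})" for i, (_, pat) in enumerate(members)
        )
        matchers[key] = (re.compile(combined), [sid for sid, _ in members])
    return matchers

def match(matchers, proto, port, payload):
    """Run only the matchers whose header conditions fit this packet."""
    hits = []
    for (p, (lo, hi)), (regex, ids) in matchers.items():
        if p == proto and lo <= port <= hi:
            for m in regex.finditer(payload):
                hits.append(ids[int(m.lastgroup[1:])])
    return hits
```

In this sketch the two tcp/80 signatures end up in one compiled matcher
and the udp/53 signature in another, which is how I picture one DFA per
set of signatures.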

However, I do not completely understand the condition under which a node
becomes a child of another node. Dumping the tree structure, I see that
one node (a particular signature) with a given src-port range and IP
protocol sits at one level, while another node (another signature) with
the same settings sits at a lower level (further down the tree).

What exactly is the significance of RE_level? Is it meant to give some
caching advantage?

Additionally, each DFA machine (a DFA built from one set of reg-exps)
has its own state cache; a single centralised cache is not used. Is that
right? (I have disabled the EXPIRE_DFA_STATE flag.)
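
To show what I mean by "its own state cache", here is a small Python
sketch of a lazily built DFA. It is only an illustration under my own
assumptions: the class LazyDFA, the toy NFAs and the miss counter are
hypothetical and are not taken from the Bro source.

```python
class LazyDFA:
    """DFA whose states are computed on demand from an NFA and cached per instance."""

    def __init__(self, nfa, start, accepting):
        self.nfa = nfa                      # {(nfa_state, char): set of nfa_states}
        self.start = frozenset([start])     # a DFA state is a set of NFA states
        self.accepting = frozenset(accepting)
        self.cache = {}                     # private cache: (dfa_state, char) -> dfa_state
        self.misses = 0                     # number of transitions computed so far

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.cache:
            # Cache miss: build the successor DFA state from the NFA.
            self.misses += 1
            nxt = set()
            for s in state:
                nxt |= self.nfa.get((s, ch), set())
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def matches(self, data):
        state = self.start
        for ch in data:
            state = self.step(state, ch)
        return bool(state & self.accepting)

# Two toy machines, one for "ab" and one for "ac"; each keeps its own cache.
dfa_ab = LazyDFA({(0, "a"): {1}, (1, "b"): {2}}, start=0, accepting={2})
dfa_ac = LazyDFA({(0, "a"): {1}, (1, "c"): {2}}, start=0, accepting={2})
```

Running input through dfa_ab fills only dfa_ab.cache; dfa_ac.cache stays
empty until dfa_ac itself sees input. With a centralised cache both
machines would share one table instead.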

Finally, I see that the built-in Bro rules are built separately. While
the DFA for these is being built, I notice that two copies of the same
regular expression are built: one called MATCH ANYWHERE and the other
MATCH EXACTLY. In MATCH EXACTLY, the same reg-exp as in MATCH ANYWHERE
is used, but with the restriction that the match has to start at the
beginning of the line. Wouldn't MATCH EXACTLY be a subset of MATCH
ANYWHERE?
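
The subset relation I have in mind can be sketched in Python as follows.
This models only my reading of the two modes (MATCH EXACTLY anchored at
the start, MATCH ANYWHERE unanchored); the function names, the pattern
and the inputs are made up, and Bro's actual semantics may differ.

```python
import re

def match_anywhere(pattern, data):
    # Unanchored: the pattern may start at any offset in the input.
    return re.search(pattern, data) is not None

def match_exactly(pattern, data):
    # Anchored: the pattern must start at the beginning of the input.
    return re.match(pattern, data) is not None

PATTERN = r"foo[0-9]+"
INPUTS = ["foo42 bar", "bar foo42", "bar", ""]

# (input, anchored result, unanchored result) for each sample input.
results = [
    (s, match_exactly(PATTERN, s), match_anywhere(PATTERN, s)) for s in INPUTS
]

# Every input accepted by the anchored match is also accepted by the
# unanchored one, but not the other way round ("bar foo42").
subset_holds = all((not exact) or anywhere for _, exact, anywhere in results)
```

Under this reading the anchored result never adds a hit that the
unanchored one misses, which is why I wonder what the second copy is
needed for.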

Also, I do not see MATCH EXACTLY and MATCH ANYWHERE being built for
Snort-imported signatures (I am currently using only a small subset). Is
this true for all Snort-converted signatures?

Thank you for your very patient reading!

Regards
Govind