Hopefully this isn't too simplistic a question, but I'm just getting
started with Bro.
In the text pattern syntax for Bro [1], is there an easy way to define
word boundaries, similar to how some of the RegEx dialects use '\b',
'\<', '\>', etc.? [2]
I'm trying to match specific strings in a data stream, for example
the word "nmap". I've tried several approaches based on past RegEx
knowledge, but I'm having trouble coming up with a single pattern that
handles every case. An example Bro test script is attached; hopefully
it's clear.
Fundamentally, is there a syntax reference for pattern matching, or does
it conform to a commonly known dialect (e.g., POSIX-style or PCRE
RegEx)?
I know Bro's regex syntax is almost exactly the same as Flex's (it differs only in a few edge cases). I'm not positive, but from a cursory Google search it seems Flex doesn't support word boundaries.
Well, okay. From what I can tell experimentally, it doesn't have
working shortcuts like "\s" or "[:space:]" either, so I guess I'm left
to do it more like *this* attachment.
Unless I'm missing something obvious. I'd be happy to be wrong on this one.
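For anyone following along, here is a rough sketch of that manual approach. It is written in Python's re only so it can be run and checked, not as Bro code. The boundary is spelled out as "start of text or a non-word character" on the left and "a non-word character or end of text" on the right, using only groups, alternation, negated classes and anchors. The helper name word_pattern is hypothetical, and whether Bro's Flex-based engine accepts ^ and $ inside a group this way is an assumption you would need to verify in a Bro test script.

```python
import re

# Characters that \b would treat as "word" characters (ASCII only).
WORD_CHARS = "A-Za-z0-9_"


def word_pattern(term):
    """Hypothetical helper: match `term` only when it is not part of a longer word.

    The boundary is written out explicitly instead of using a \\b shortcut:
    (start-of-text or non-word char) on the left,
    (non-word char or end-of-text) on the right.
    """
    left = "(^|[^" + WORD_CHARS + "])"
    right = "([^" + WORD_CHARS + "]|$)"
    return re.compile(left + re.escape(term) + right)


nmap = word_pattern("nmap")

cases = {
    "nmap": True,
    "running nmap -sS 10.0.0.1": True,
    "(nmap)": True,
    "nmapper": False,
    "xnmap": False,
}

for text, expected in cases.items():
    found = nmap.search(text) is not None
    print(repr(text), found)
    assert found == expected, text
```

Unlike a real \b, this version consumes the neighbouring character, so the match span includes it. That only matters if you need exact offsets rather than a yes/no answer.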
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University http://marylou.byu.edu
For future list readers: yes, I was missing something obvious. Word
boundaries are genuinely missing, but I was using shortcuts like
'[:space:]' incorrectly.
In short, '[:space:]' and others like it are not character classes
themselves, but they can appear inside a character class. '[:space:]' is
not equivalent to '[ \f\n\r\t\v]', but '[[:space:]]' is.
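The same pitfall can be illustrated with Python's re, which treats a bare [:space:] as an ordinary bracket expression: a set containing the literal characters : s p a c e. Python has no POSIX named classes, so this sketch spells out the set [ \f\n\r\t\v] that [[:space:]] stands for in the C locale. It demonstrates the bracket-expression rule in general, not Bro's own engine.

```python
import re

# A bare [:space:] is just a bracket expression: it matches exactly one of
# the characters ':', 's', 'p', 'a', 'c', 'e' and nothing else.
bare = re.compile(r"[:space:]")

# Python's re has no POSIX named classes, so spell out the C-locale set
# that [[:space:]] stands for in POSIX-style syntax.
space = re.compile(r"[ \f\n\r\t\v]")

for ch in [" ", "\t", "\n", "a", ":", "x"]:
    print(repr(ch), "bare:", bool(bare.fullmatch(ch)), "space:", bool(space.fullmatch(ch)))
```

So a pattern written with a bare [:space:] silently matches letters like "a" and never matches whitespace, which is easy to mistake for "the shortcut doesn't work".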
Thanks for the feedback on this, Robin. Sorry for the unnecessary list
noise.
Yes. I had seen that. And I just missed the double-bracket detail.
Having said that, this looks to me like as much of a RegEx dialect as
any other. The extended shortcuts I've been referring to are
reasonably common, but not required. And the first sentence on that
page even says the following:
The patterns in the input ... are written using an extended set of regular expressions.
So Flex and other RegEx dialects share a lot of features and syntax. Whether
or not it's truly RegEx seems like a purely semantic question.