Patterns and Word Boundaries

Hopefully this isn't too simple a question, but I'm just getting
started with Bro.

In the text pattern syntax for Bro [1], is there an easy way to define
word boundaries, similar to how some of the RegEx dialects use '\b',
'\<', '\>', etc.? [2]

I'm trying to match specific strings in a data stream, for example
the word "nmap". I've tried several approaches based on past RegEx
knowledge, and I'm having trouble coming up with a single pattern that
would handle it all. Example bro test script attached; hopefully it's

Fundamentally, is there a syntax reference for pattern matching, or does
it conform to a commonly known dialect (e.g. POSIX-style RegEx, or PCRE

[2] Regex Tutorial - \b Word Boundaries

patterns.wordboundary.testcase.bro (1.47 KB)

Bro’s regex syntax is almost exactly the same as Flex's, differing only in a few edge cases. I am not positive, but from a cursory Google search it seems Flex doesn’t support word boundaries.
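
Where word-boundary anchors are missing, a common workaround in other regex dialects is to spell the boundary out: require a non-word character, or the start or end of the input, on each side of the word. The following is a minimal sketch in Python (not Bro), using only alternation, anchors, and negated bracket expressions. The helper name has_word and the sample strings are made up for illustration, and whether Bro accepts the same pattern unchanged (in particular '^' and '$' inside a group) is an assumption that would need testing.

```python
import re

# Explicit stand-in for \bnmap\b: a non-word character (or the start/end
# of the input) must sit on each side of "nmap".
EXPLICIT = re.compile(r"(^|[^A-Za-z0-9_])nmap([^A-Za-z0-9_]|$)")

# Python's built-in word boundary, kept only for comparison.
BUILTIN = re.compile(r"\bnmap\b")


def has_word(text):
    """Return True if "nmap" appears as a whole word in text."""
    return EXPLICIT.search(text) is not None


samples = ["nmap", "run nmap -sS now", "(nmap)", "unmapped", "nmaps", "xnmap"]
for s in samples:
    # Print the sample, the explicit result, and the \b result side by side.
    print(repr(s), has_word(s), BUILTIN.search(s) is not None)
```

One difference from a real '\b': the explicit form consumes the neighbouring character, so it is suited to a yes/no test rather than to extracting the matched word.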


Well, okay. From what I can tell experimentally, it doesn't have
working shortcuts like "\s" or "[:space:]" either, so I guess I'm left
to do it more like *this* attachment.

Unless I'm missing something obvious. I'd be happy to be wrong on this one.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University

patterns.wordboundary.testcase.bro (889 Bytes)

It does actually support the standard "[:...:]" cases.


For future list-viewers, yes, I was missing something obvious. The word
boundaries are genuinely missing, but I was using the shortcuts like
'[:space:]' incorrectly.

In short, '[:space:]' and others like it are not character classes
themselves, but they can appear inside a character class. '[:space:]' is
not the equivalent of '[ \f\n\r\t\v]', but '[[:space:]]' is.
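
The single-bracket mistake can be illustrated in Python, with a caveat: Python's re module has no POSIX classes, so this sketch only shows how a plain bracket expression reads '[:space:]' as the set of literal characters ':', 's', 'p', 'a', 'c', 'e'. That Bro parses the single-bracket form the same way is an assumption, and the written-out set stands in for '[[:space:]]'. The helper name matches is made up for illustration.

```python
import re

# "[:space:]" on its own is just a bracket expression listing literal characters.
single = re.compile(r"[:space:]")

# Stand-in for "[[:space:]]", written out as the explicit set from the post.
explicit = re.compile(r"[ \f\n\r\t\v]")


def matches(pattern, ch):
    """Return True if the single character ch is matched by pattern."""
    return pattern.fullmatch(ch) is not None


for ch in [" ", "\t", "s", ":", "x"]:
    # Print the character, then whether each pattern accepts it.
    print(repr(ch), matches(single, ch), matches(explicit, ch))
```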

Thanks for the feedback on this, Robin. Sorry for the unnecessary list

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University

Have you read this?

Regex != Flex

Yes. I had seen that. And I just missed the double-bracket detail.

Having said that, this looks to me like as much of a RegEx dialect as
any other. Those extended shortcuts like I've been referring to are
reasonably common, but not required. And the first sentence on that
page even says the following:

The patterns in the input ... are written using an extended set of regular expressions.

So Flex and RegEx at least share a lot of features and syntax. Whether
or not it's truly RegEx seems like a purely semantic discussion.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University