Patterns and Word Boundaries

Hopefully this isn't too simple a question, but I'm just getting
started with Bro.

In the text pattern syntax for Bro [1], is there an easy way to define
word boundaries, similar to how some of the RegEx dialects use '\b',
'\<', '\>', etc.? [2]

I'm trying to match specific strings in a data stream, for example the
word "nmap". I've tried several approaches based on past RegEx
knowledge, but I'm having trouble coming up with a single pattern that
handles every case. An example Bro test script is attached; hopefully
it's clear.

Fundamentally, is there a syntax reference for pattern matching, or does
it conform to a commonly known dialect (e.g. POSIX-style RegEx or PCRE
RegEx)?

[1] https://www.bro.org/sphinx/scripting/index.html#pattern
[2] Regex Tutorial - \b Word Boundaries

patterns.wordboundary.testcase.bro (1.47 KB)

I know Bro’s regex syntax is almost exactly the same as Flex’s (differing only in a few edge cases). I am not positive, but from a cursory Google search it seems Flex doesn’t understand word boundaries.

-Sam

Well, okay. From what I can tell experimentally, it doesn't have
working shortcuts like "\s" or "[:space:]" either, so I guess I'm left
to do it more like *this* attachment.

Unless I'm missing something obvious. I'd be happy to be wrong on this one.
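
For anyone reading along, here is a rough sketch of the longhand idea. It
is written in Python only because its `re` module makes the logic easy to
check; it is not Bro syntax, and I have not verified how anchors behave
inside a Bro pattern. The helper name `contains_word` and the sample
strings are made up for illustration, and "word character" is assumed to
mean an ASCII letter, digit, or underscore.

```python
import re

# Longhand stand-in for \bnmap\b: the word must be preceded by the start of
# the input or a non-word character, and followed by a non-word character
# or the end of the input.
WORD_CHARS = "A-Za-z0-9_"  # assumption: ASCII-only notion of a word character
NMAP_LONGHAND = re.compile(
    r"(^|[^" + WORD_CHARS + r"])nmap([^" + WORD_CHARS + r"]|$)"
)


def contains_word(data: str) -> bool:
    """Return True if 'nmap' appears in data as a standalone word."""
    return NMAP_LONGHAND.search(data) is not None


if __name__ == "__main__":
    samples = ["nmap", "run nmap -sS host", "(nmap)", "zenmap", "nmapper", "my_nmap"]
    for sample in samples:
        print(repr(sample), contains_word(sample))
```

The cost of this approach is that the match also consumes the neighbouring
character on each side, which a true '\b' would not.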

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

patterns.wordboundary.testcase.bro (889 Bytes)

It does actually support the standard "[:...:]" cases.

Robin

For future list-viewers, yes, I was missing something obvious. The word
boundaries are genuinely missing, but I was using the shortcuts like
'[:space:]' incorrectly.

In short, '[:space:]' and the others like it are not character classes
on their own; they are only valid inside a bracket expression. So
'[:space:]' is not the equivalent of '[ \f\n\r\t\v]', but '[[:space:]]' is.
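
A small sketch of why the single-bracket form fails quietly rather than
raising an error. Python's `re` module has no POSIX named classes, so it
cannot show '[[:space:]]' directly; the explicit set stands in for it
here. Python does parse a bare '[:space:]' as a set of the literal
characters ':', 's', 'p', 'a', 'c' and 'e'. My understanding is that
Flex-style lexers read it the same way, but I have not confirmed that
against Bro itself. The helper name `matches_one_char` is hypothetical.

```python
import re

# A bare [:space:] is an ordinary set of the literal characters
# ':', 's', 'p', 'a', 'c' and 'e'; it has nothing to do with whitespace.
bare = re.compile(r"[:space:]")

# Stand-in for [[:space:]], using the explicit set quoted in the post
# (Python's re does not support POSIX named classes).
explicit = re.compile(r"[ \f\n\r\t\v]")


def matches_one_char(pattern, ch):
    """Return True if the single character ch matches pattern in full."""
    return pattern.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in [" ", "\t", "s", ":", "x"]:
        print(repr(ch), matches_one_char(bare, ch), matches_one_char(explicit, ch))
```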

Thanks for the feedback on this, Robin. Sorry for the unnecessary list
noise.

Lloyd Brown

Have you read this?

http://flex.sourceforge.net/manual/Patterns.html

Regex != Flex

Yes. I had seen that. And I just missed the double-bracket detail.

Having said that, this looks to me like as much of a RegEx dialect as
any other. Those extended shortcuts like I've been referring to are
reasonably common, but not required. And the first sentence on that
page even says the following:

The patterns in the input ... are written using an extended set of regular expressions.

So Flex and RegEx at least share a lot of features and syntax. Whether
or not it's truly RegEx seems like a purely semantic discussion.

Lloyd Brown