Bro regex documentation

Can anyone point me at documentation on Bro’s builtin string/pattern functions? The reference manual on the wiki points me at strings.bif.bro which doesn’t have a lot of documentation around it.

Does bro support back-references? I am trying to look for specific patterns in a tcp stream and need to be able to log out said patterns to a file.


Can anyone point me at documentation on Bro's builtin string/pattern functions?

The regular expressions are most similar to flex's regular expressions (with minor differences), but you can typically assume that they are POSIX regular expressions.
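
As a rough illustration of how patterns are used from a script, here is a minimal sketch, assuming a Bro 2.x interpreter. The sample string and the patterns are made up for the example; `==` performs an exact (whole-string) match and `in` performs an embedded (substring) match.

```bro
event bro_init()
    {
    # Hypothetical sample input, not taken from real traffic.
    local s = "version 2.1 released";

    # Exact match: the pattern must match the entire string.
    if ( /version .*/ == s )
        print "exact match";

    # Embedded match: the pattern may match anywhere inside the string.
    if ( /[0-9]+\.[0-9]+/ in s )
        print "embedded match";
    }
```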

Does bro support back-references?

No. I'll let Robin or Vern give more detail here if they want to; I'm definitely not qualified to explain all of the reasons that back-references aren't supported. :)

I am trying to look for specific patterns in a tcp stream and need to be able to log out said patterns to a file.

Why don't the string splitting functions (defined in strings.bif) work for your scenario?


Yeah, those can be used together to get what I want. I wanted to see if there was something similar to the match function in gawk, where the function returns an array of all of the substrings captured by the groups in your pattern. I didn't see anything like it.


Doesn't the match function in gawk only return the position of the beginning of the match?

I think the split_all function should work for what you are trying to do. Here's a note from the source code...

# For example, split_all("a-b--cd", /(\-)+/) returns {"a", "-", "b",
# "--", "cd"}: odd-indexed elements do not match the pattern
# and even-indexed ones do.

split_all will give you all of the pieces matching the split regex and the pieces between them. You can just look at the odd-numbered indexes of the string_array if you want what didn't match (the text between separators) and the even-numbered indexes if you want what did match.
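
To make that concrete, here is a minimal sketch that reuses the example from the source comment and prints only the matching pieces. It assumes a Bro version in which split_all is still available under that name and returns a string_array indexed from 1. Note that iteration order over a table is not guaranteed, so the lines may not come out in index order.

```bro
event bro_init()
    {
    # Same input and pattern as in the strings.bif comment above.
    local parts = split_all("a-b--cd", /(\-)+/);

    for ( i in parts )
        {
        # Even indexes hold the text that matched the pattern;
        # odd indexes hold the text between matches.
        if ( i % 2 == 0 )
            print fmt("matched at index %d: %s", i, parts[i]);
        }
    }
```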


The match function will also extract the substrings captured by your regex pattern and return them in an array if you provide the optional third parameter. So if you do match(input, /a=(.+) b=(.+) c=(.+)/, ret), you will get the values you want in the array ret. I think with what you pointed out for the split functions I should be able to get what I need, though.
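
For what it's worth, something close to that gawk call can be approximated with the split functions by treating the "a=", "b=" and "c=" labels as separators. This is only a sketch under assumptions: the input string is hypothetical, the values must not themselves contain text that looks like a label, and the first element of the result is whatever precedes the first label (possibly an empty string).

```bro
event bro_init()
    {
    # Hypothetical input in the a=... b=... c=... shape from the gawk example.
    local input = "a=1 b=2 c=3";

    # Split on each label (with an optional leading space); the pieces
    # left between the separators are the values.
    local vals = split(input, / ?[abc]=/);

    for ( i in vals )
        print fmt("piece %d: %s", i, vals[i]);
    }
```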


The quick answer here is that Bro matches regexps with DFAs. While
DFAs are very efficient, they can't do backreferences.