Since there are now UDP payload signatures by default for Teredo/AYIYA DPD, we had talked about checking out the potential/necessity for optimizing those signatures to only check for matches on first packets of a connection. I don't think it's worth doing now because (1) the default settings only do matching on a connection for the first 1K payload and (2) the internals don't seem to support such an option that well because, internally, multiple patterns get compiled together into a DFA to check matching and the interface to it is geared towards checking if any pattern was matched, not checking if a given pattern didn't match.
So does it sound reasonable to leave out this feature?
Unrelated to that, I was checking how UDP payload patterns were actually matched and found unexpected behavior. The docs say:
"Regular expressions are implicitly anchored, i.e., they work as if prefixed with the ^ operator. For reassembled TCP connections, they are anchored at the first byte of the payload stream. For all other connections, they are anchored at the first payload byte of each packet. To match at arbitrary positions, you can prefix the regular expression with .*, as done in the examples above."
But for a UDP connection made up of 2 packets with payloads "XXXX" and then "YYYY", I still need the ".*" prefix to match on the 2nd:
> payload and (2) the internals don't seem to support such an option
> that well because, internally, multiple patterns get compiled together
> into a DFA to check matching and the interface to it is geared towards
> checking if any pattern was matched, not checking if a given pattern
> didn't match.
That's indeed something hard to get around, and we wouldn't change
that. The performance savings would only kick in later (there's
potentially more logic that triggers upon a regexp match). However,
it's hard to say if that would change much, in particular with the 1K
buffer as you say.
So yes, assuming nobody is seeing significant performance impact with
the recent changes (which I haven't in my tests on traces), we can
leave things as they are right now. As a test, we could create
something like a "worst-case trace" that only has traffic of the kind
relevant here and measure if the signature matching makes a noticeable
difference.
> "Regular expressions are implicitly anchored, i.e., they work as if
> prefixed with the ^ operator. For reassembled TCP connections, they
> are anchored at the first byte of the payload stream. For all other
> connections, they are anchored at the first payload byte of each
> packet. To match at arbitrary positions, you can prefix the regular
> expression with .*, as done in the examples above."
This is indeed the intended behaviour.
> Changing the pattern to /YYYY/ or /^YYYY/ results in no match (but
> does match if I flip order of packets). Is the bug in the docs or the
> code?
That looks like a bug in the code. Also reminds me that we should
really have unit tests for the signature engine ...
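To make the difference concrete, here is a minimal sketch of the two
anchoring models in Python. It uses Python's re module as a stand-in
and is not the signature engine; the function names are made up for
illustration. Per-packet anchoring is what the docs describe for
non-TCP traffic. Stream anchoring is only an assumption about what
the code might be doing, but it would be consistent with the results
reported above.

```python
import re

def match_per_packet(pattern, payloads):
    """Indices of packets that match when the pattern is anchored at
    the first payload byte of each packet (documented non-TCP case)."""
    rx = re.compile(pattern, re.DOTALL)
    # re.match only succeeds at position 0, i.e. it is implicitly anchored.
    return [i for i, payload in enumerate(payloads) if rx.match(payload)]

def match_stream(pattern, payloads):
    """True if the pattern matches when anchored only at the first byte
    of the concatenated payloads (documented reassembled-TCP case)."""
    rx = re.compile(pattern, re.DOTALL)
    return rx.match(b"".join(payloads)) is not None

packets = [b"XXXX", b"YYYY"]

# Documented behaviour for UDP: the 2nd packet matches without ".*".
per_packet_hits = match_per_packet(rb"YYYY", packets)

# Stream-anchored model: no match without ".*", match with it, and a
# match without it once the packet order is flipped.
stream_plain = match_stream(rb"YYYY", packets)
stream_prefixed = match_stream(rb".*YYYY", packets)
stream_flipped = match_stream(rb"YYYY", packets[::-1])
```

A unit test for the signature engine could check the same four cases
against a two-packet UDP trace.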
> As a test, we could create
> something like a "worst-case trace" that only has traffic of the kind
> relevant here and measure if the signature matching makes a noticeable
> difference.
I did some tests with 25,702,400 total 1-byte (\x58) payload UDP packets over 25,100 connections of 1,024 packets each, and the worst performance impact I saw was a +0.2% difference when adding the new UDP signatures.
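For reference, the shape of that workload can be sketched in Python as
below. This only enumerates (source port, payload) pairs and checks
the totals; it does not write a trace file. Using one source port per
connection and starting at port 1024 are assumptions made for the
sketch, not details of the actual test. Note that 1,024 one-byte
packets add up to exactly 1K of payload per connection, which lines up
with the default 1K matching limit mentioned earlier.

```python
CONNECTIONS = 25_100
PACKETS_PER_CONNECTION = 1_024
PAYLOAD = b"\x58"  # a single "X" byte

def worst_case_packets(connections=CONNECTIONS,
                       packets_per_connection=PACKETS_PER_CONNECTION,
                       payload=PAYLOAD,
                       base_port=1024):
    """Yield (src_port, payload) for every packet of the workload.

    Each connection gets its own source port (hypothetical choice) so
    that it forms a distinct UDP flow.
    """
    for conn in range(connections):
        src_port = base_port + conn
        for _ in range(packets_per_connection):
            yield src_port, payload

total_packets = CONNECTIONS * PACKETS_PER_CONNECTION
bytes_per_connection = PACKETS_PER_CONNECTION * len(PAYLOAD)

# Small sample: 2 connections of 3 packets each.
sample = list(worst_case_packets(connections=2, packets_per_connection=3))
```

Here total_packets comes out to 25,702,400 and bytes_per_connection to
1,024.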