Use of GPUs for signature matching?

Bro currently follows a single-threaded model in which every incoming packet is first filtered, then analyzed to determine its protocol based on content signatures (rather than just port numbers), and finally handled according to the user-defined policy for that protocol. While Bro provides mechanisms to distribute the processing of the resulting policy events, the protocol analysis poses a performance bottleneck in that it might not be able to keep up with the rate of incoming packets.

In Bro's signature matching engine, a connection sometimes triggers more than one signature and so cannot immediately be associated with a single protocol. As more packets of the connection arrive, a better decision about the protocol involved can be made. During this process, different protocol analyzers may be spawned and killed until the right protocol is finally identified. The signature matching itself is done via regular expression matching.

I believe that GPUs can be used here to perform parallel signature matching by different protocol analyzers, thus speeding up the protocol analysis phase. With this, Bro would be able to operate at a higher packet rate than it does now.
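
To sketch what I mean (purely illustrative; the patterns below are
made up and a CPU process pool stands in for what would really be a
GPU kernel): payloads from many connections would be collected into
a batch, and all candidate protocol signatures matched against the
batch in parallel.

    import re
    from multiprocessing import Pool

    # Illustrative payload signatures, loosely in the spirit of DPD
    # signatures; these are not Bro's actual patterns.
    SIGNATURES = {
        "http": re.compile(rb"^[ \t]*(GET|HEAD|POST) "),
        "ssh":  re.compile(rb"^SSH-[12]\."),
        "smtp": re.compile(rb"^(HELO|EHLO) ", re.IGNORECASE),
    }

    def match_protocols(payload):
        # Return the protocols whose signature matches this payload.
        return {name for name, sig in SIGNATURES.items() if sig.search(payload)}

    def classify_batch(payloads, workers=4):
        # One payload per connection; the pool stands in for a GPU
        # kernel matching all signatures against the batch at once.
        with Pool(workers) as pool:
            return pool.map(match_protocols, payloads)

    if __name__ == "__main__":
        batch = [b"GET /index.html HTTP/1.1\r\n", b"SSH-2.0-OpenSSH_5.3\r\n"]
        print(classify_batch(batch))  # [{'http'}, {'ssh'}]

A real implementation would presumably compile the signatures into
automata and ship those plus the batched payloads to the GPU instead
of using a process pool.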

If this is true, I would like to work on it. I would appreciate it if you could share your thoughts.

Snort's packet processing throughput increased by 60% with the use of GPUs ( http://www.springerlink.com/content/b3m7662014272t8m/ ), and Suricata has plans to introduce GPU support ( http://blog.securitymonks.com/2010/08/26/three-little-idsips-engines-build-their-open-source-solutions/ ).

Thank you,
Sunjeet Singh

That's generally right and, as the Snort work demonstrates,
parallelizing signature matching across GPUs can indeed improve
performance quite a bit. For Bro, however, improving signature
performance is actually not that crucial as its main performance
bottlenecks are elsewhere (the single most important bottleneck
today is the script interpreter).

Thus, while generally improving the performance of Bro's signature
engine would certainly still be nice (and I appreciate your interest
in helping with this!), I'm not sure it's actually worth spending
the time that a solid GPU-based implementation would require.

I'd be happy to provide you with some further thoughts on directions
you could work on for improving Bro's performance. Write me a mail
off-list if you're interested.

Robin

For Bro, however, improving signature
performance is actually not that crucial as its main performance
bottlenecks are elsewhere (the single most important bottleneck
today is the script interpreter).

Robin, can you elaborate on this a bit? I'm very surprised that
pattern matching would not be the first bottleneck.

Related to that, I've watched the debate fly back and forth between
Marty Roesch (of Snort) and Victor Julien (of Suricata) on the pros
and cons of multithreading, and I'd like to hear your take. Marty's point was
that multithreading leads to CPU cache inefficiency which incurs a
penalty greater than the boost to the pattern matching in parallel and
therefore suggests flow-pinned load-balancing for scaling. Do you
have an opinion on the matter?

Thanks,

Martin

Robin, can you elaborate on this a bit? I'm very surprised that
pattern matching would not be the first bottleneck.

The answer is quite simple actually: Bro just doesn't do that much
pattern matching. While it has a pattern engine similar to what
Snort/Suricata are relying on, a typical Bro setup doesn't use it
very much at all: typically there are just a few signatures
configured, often just for doing dynamic protocol detection.

Bro is doing a lot of other things instead, in particular deep
stateful protocol analysis and execution of its analysis scripts.
Especially the latter is getting more and more expensive compared to
Bro's other components: scripts are becoming larger and more
complex, they track more state, and they have more traffic to
analyze. The script interpreter is a piece we haven't spent much
time on optimizing yet (it's indeed still an *interpreter* ...), and
it actually accounts for a large share of Bro's CPU (and also
memory) footprint these days.

Note that executing scripts written in Bro's language is very
different from doing pattern matching; improving regexp performance
is not going to help much at all with the scripts. That's quite
different from Snort/Suricata obviously, which don't do much else
than pattern matching.

Marty's point was that multithreading leads to CPU cache
inefficiency which incurs a penalty greater than the boost to the
pattern matching in parallel and therefore suggests flow-pinned
load-balancing for scaling. Do you have an opinion on the matter?

It's hard to answer that in a few sentences, but generally I agree
that a flow-based load-balancing scheme is a reasonable approach for
the lowest layer of the system. Many NIDS (including Snort and Bro)
do much of their work on a per-flow basis, so parallelizing at that
granularity certainly makes a lot of sense and avoids communication
overhead (and hence also cache issues). Generally, such a flow-based
scheme can then be implemented either at the system/process level
(i.e., running more than one instance of the NIDS, with a suitable
frontend load-balancer splitting up the work, either externally or
internally); or at the thread level (multiple threads fed by a
master thread). Conceptually, that doesn't make much of a
difference, and the former is what we're doing with the Bro Cluster.
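
Just to make "flow-based" concrete, the pinning itself is trivial;
something along these lines (illustrative only), where both
directions of a connection map to the same instance:

    import hashlib

    def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
        # Sort the endpoints so A->B and B->A produce the same key.
        a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        return "%s|%s:%d|%s:%d" % (proto, a[0], a[1], b[0], b[1])

    def pick_instance(src_ip, src_port, dst_ip, dst_port, proto, n_instances):
        # Hash the symmetric 5-tuple and map it onto one instance.
        key = flow_key(src_ip, src_port, dst_ip, dst_port, proto).encode()
        digest = hashlib.md5(key).digest()
        return int.from_bytes(digest[:4], "big") % n_instances

    # Both directions of a connection land on the same instance:
    assert pick_instance("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 8) == \
           pick_instance("10.0.0.2", 80, "10.0.0.1", 1234, "tcp", 8)

The interesting questions are less about the hashing and more about
where it happens (external frontend vs. internal dispatch) and how
state that spans flows gets shared.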

Now, Snort has the "advantage" that such a simple flow-based scheme
is pretty much all it needs to do for parallelizing. Because there's
not much happening after the pattern matching step, there's also no
need for further coordination between the instances/threads. For
Bro, however, this is where things actually start to get
interesting: since much of its CPU time is spent on the scripts,
Amdahl's Law tells us that we need to parallelize the interpreter if
we want to scale. Unfortunately, parallelizing the execution of a
free-form Turing-complete language isn't exactly trivial ...
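
To put rough numbers on that (the 50% figure below is purely
illustrative, not a measurement): if half of the CPU time went into
script execution and only the rest were parallelized, the overall
speedup would stay below 2x no matter how many cores you add.

    def amdahl_speedup(serial_fraction, n_cores):
        # Amdahl's Law: speedup = 1 / (s + (1 - s) / n).
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

    # Hypothetical 50% serial share for script execution:
    for n in (2, 8, 64):
        print(n, round(amdahl_speedup(0.5, n), 2))
    # 2 1.33
    # 8 1.78
    # 64 1.97   <- capped below 2x as long as the scripts stay serial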

Robin

Ok, this makes a lot of sense now. So you're saying that for the few
true pattern matching activities Bro has to do, there's plenty of CPU
to spare, but for script execution such as going to time-machine,
extracting files from pcap, etc., you're running out of CPU.

So if you're running into a performance challenge with the scripting
language, would you consider switching from the native Bro scripting
language to an embedded interpreter from something like Perl, Python,
or Lua? That in and of itself probably would hurt performance, but my
guess is that it would take a lot less time to embed something and
then multi-thread it than to roll your own from scratch. With the
number of CPU cores climbing exponentially, a small performance hit
would probably be acceptable if it can be offset by running on
multiple cores. I think a well-known script language would also be a
lot less scary for newcomers to Bro and really increase its user
base.

Hi Martin,

So if you're running into a performance challenge with the scripting
language, would you consider switching from the native Bro scripting
language to an embedded interpreter from something like Perl, Python,
or Lua? That in and of itself probably would hurt performance, but my
guess is that it would take a lot less time to embed something and
then multi-thread it than to roll your own from scratch.

That's likely not true. The performance hit would probably be quite large with many of the dynamic languages. I don't know about Lua, but with Perl and Python being dynamically typed, they do a lot of acrobatics whenever variables are created, accessed, and modified, which doesn't work very well with the soft real-time constraints that Bro needs to function within.

I think a well-known script language would
also be a lot less scary for newcomers to Bro and really increase its
user base.

I think that everyone who starts working with Bro has a point where they get frustrated with having to learn a new language (I know I did), but then after some time they start to recognize the reason that Bro has its own language. The Bro policy script language is a large part of what makes Bro, Bro. :) It's a domain-specific language for doing event analysis, and Bro's core has been made to turn network traffic into a stream of events so that it can be analyzed in this style. General-purpose scripting languages would likely have to use strange syntaxes to get some of the features and functionality of the Bro language.
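
To illustrate the style (this is a rough Python analogue I'm making
up for illustration, not Bro's actual API or syntax): the core turns
traffic into events such as an HTTP request being seen, and policy
code simply registers handlers for the events it cares about.

    # Rough analogue of the event-analysis style; the event name and
    # dispatch API here are invented for illustration.
    handlers = {}

    def on(event_name):
        def register(fn):
            handlers.setdefault(event_name, []).append(fn)
            return fn
        return register

    def dispatch(event_name, **kwargs):
        for fn in handlers.get(event_name, []):
            fn(**kwargs)

    @on("http_request")
    def log_request(host, uri):
        print("HTTP request to %s%s" % (host, uri))

    # The core would generate events like this from live traffic:
    dispatch("http_request", host="example.com", uri="/index.html")

In Bro the event stream and the handler mechanism are built into the
language itself, which is exactly what a general-purpose language
would have to emulate.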

What will likely increase Bro's user base in a big way is for Bro to do a lot of interesting detections out of the box. There's likely only ever going to be a fairly small proportion of users who would learn or heavily use the scripting language, even if it were Python or Perl. More documentation is going to help too. :)

  .Seth

for the few
true pattern matching activities Bro has to do, there's plenty of CPU
to spare

Right.

but for script execution such as going to time-machine,
extracting files from pcap, etc., you're running out of CPU.

Yes, in general for script execution, though that usually doesn't involve
the Time Machine or pcap files.

So if you're running into a performance challenge with the scripting
language, would you consider switching from the native Bro scripting
language to an embedded interpreter from something like Perl, Python,
or Lua?

No, because we view Bro's domain-specific language as a big plus.

With the number of CPU cores climbing exponentially, a small
performance hit would probably be acceptable if it can be offset by
running on multiple cores.

Note, we have a major project on multicore network security analysis, which
focuses on Bro. So this is definitely on our radar. Here, having a
domain-specific language can be a significant win, since we can leverage
particular semantics for optimization that we couldn't if we used a general
interpreter.

I think a well-known script language would
also be a lot less scary for newcomers to Bro and really increase its
user base.

I wonder if it's really the particulars of the language. Bro's scripting
language isn't itself that peculiar or hard to pick up. What gets harder is
(1) the large set of predefined events, (2) language quirks in support of
things like state management (but we'd need those anyway), and (3) the lack
of adequate "here's the overall model" and "here's the paradigm for XYZ"
documentation - which we're definitely aiming to fix.

    Vern