Performance improvement from compiling Zeek scripts to C++ code

Hi everyone,

I have a few questions about improving Zeek's overall performance by compiling Zeek scripts to C++. As Dr. Vern mentioned in the post, recent Zeek releases (5.x) can compile Zeek scripts to C++ code and run them directly within Zeek (as described in the README guide on GitHub).

We did some initial testing, but the results were not as expected. Here are some test details:

  1. We extended the default HTTP handling script in the base directory by adding a few more fields to the original record and extracting the bodies of HTTP requests and replies through events. We named the new extension script http-ext.zeek and put it in the site directory.

  2. We load http-ext.zeek from the local.zeek script in the same directory.

  3. We followed the instructions and compiled the script using zeek -O gen-C++ http-ext.zeek.

  4. The network traffic we used for the test contained a mix of HTTP and DNS, with ~1.5K connections per second (CPS) and an HTTP payload of ~3.9 KB.

  5. We ran a single Zeek process using -O use-C++ local.zeek and analyzed the network traffic. Zeek reported no drops, and the top command showed a CPU usage of ~52%.

  6. For comparison, we ran Zeek again under the same conditions, but with the -O ZAM option. Zeek again reported no drops, with a CPU usage of ~48%.
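As a rough sanity check on the load described in step 4, the HTTP payload rate can be estimated from the two figures given. This is only a sketch: it assumes "CPS" means connections per second, that each connection carries one ~3.9 KB HTTP payload, and that 1 KB is 1,000 bytes. It ignores DNS traffic, headers, and TCP/IP overhead, and the variable names are illustrative.

```python
# Figures from step 4; the unit interpretations are assumptions, not measurements.
connections_per_second = 1_500   # "~1.5K CPS", read as connections per second
http_payload_bytes = 3_900       # "~3.9 KB", read as 3,900 bytes per connection

# Payload-only rate: one payload per connection (assumed).
payload_bytes_per_second = connections_per_second * http_payload_bytes
payload_mbit_per_second = payload_bytes_per_second * 8 / 1_000_000

print(f"{payload_bytes_per_second / 1_000_000:.2f} MB/s")   # 5.85 MB/s
print(f"{payload_mbit_per_second:.1f} Mbit/s")              # 46.8 Mbit/s
```

Under these assumptions the test offers roughly 47 Mbit/s of HTTP payload to the single Zeek process.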

However, the results show that ZAM optimization performs similarly to, or better than, compilation to C++, which is contrary to what we expected. Based on our limited understanding, we thought that compiling the extended HTTP script into C++ code would bypass the Zeek script engine, and since over 90% of our test traffic was HTTP, we expected a significant performance improvement (on the order of 10% to 20%). But it seems that’s not the case.
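To put the two CPU readings from steps 5 and 6 side by side, the gap can be expressed both in percentage points and relative to the ZAM run. This is a minimal sketch using only the approximate figures quoted above; the variable names are illustrative.

```python
# Approximate CPU usage as reported by top in steps 5 and 6.
cpu_use_cpp = 52.0   # percent, run with -O use-C++
cpu_zam = 48.0       # percent, run with -O ZAM

# Absolute gap in percentage points, and the gap relative to the ZAM run.
point_gap = cpu_use_cpp - cpu_zam
relative_gap = point_gap / cpu_zam

print(f"C++ run used {point_gap:.1f} points more CPU than ZAM")   # 4.0 points
print(f"That is {relative_gap:.1%} more, relative to ZAM")        # 8.3%
```

So rather than the hoped-for reduction, the compiled run used about 8% more CPU than the ZAM run in this single pair of measurements.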

Are there any mistakes in our tests or in our understanding? We also have some additional questions about the compilation process.

Since http-ext.zeek extends the default HTTP scripts, will the HTTP handling scripts in the base directory also be compiled to C++? If not, can we compile those scripts as well and expect an improvement in the TCP-to-HTTP analysis pipeline?

We appreciate any help and are happy to provide more details about the tests if needed.

Thank you!

If you’re active on Zeek Slack, let’s take the discussion there. There are a lot of specifics to go into that’ll be easier to cover on Slack than here. However, if that’s not an option, let me know and we can iterate here.

Sure, Dr. Vern, let’s discuss on Slack. Thanks!

I would be happy if you could share the conclusions from your Slack discussion here.

Hello,

Dr. Vern has fixed several bugs in the optimization code, and I believe all of the fixes have been merged into the latest Zeek release. The optimization mechanism now works out of the box.

You can find both the ZAM and C++ optimization guides at: https://github.com/zeek/zeek/tree/master/src/script_opt

Depending on the traffic model, both ZAM and C++ should improve performance, although the extent of the improvement may vary. Note that you can only use one of the two optimization methods at a time.

I wonder if Dr. Vern (@Vern) has additional insights or if there’s anything I’ve misunderstood.

Thank you.

What you sketch is correct. Those who are interested should note that script optimization remains experimental, and is not yet heavily tested, so bugs continue to turn up. It’d be great to hear from users who encounter problems - if possible, first confirming that they still manifest when running off of the latest version in GitHub, since I’m making frequent updates there.