How can I better optimize pcap processing in Zeek?

Hello all,

I am using Zeek LTS version 5.0.7-0. I use C++ threads to process multiple pcap files in parallel, which has given me some speedup. I extract all files during processing. When content-heavy pcaps come in, processing becomes extremely slow. What can I do about this? How can I speed it up?

(One solution I have in mind: how can I prevent files of unknown type from being written out? If I can do this, fewer files will be produced and I can get meaningful results faster.)

Thanks,

Hi there,

It sounds like you’re either modifying Zeek or writing a plugin. You will find that parallelizing pcap processing at this level will get tricky fast. The natural route in Zeek to parallelize processing is to build out a larger cluster, running more worker nodes. This doesn’t immediately work for pcaps, though, due to various subtleties around traffic load balancing and time management.

The good news is that we’ve started to work toward supporting this capability. It will take a bit for this to land fully. In the meantime, to parallelize pcap processing I recommend you run multiple individual instances of Zeek. If you configure each so that logs end up in a separate location (assuming you’re logging to files), you can then aggregate the outputs from there. This approach has drawbacks, for example around continuous state-keeping and state-sharing, but depending on your setting may work out fine.
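As a rough illustration of the multiple-instances approach, here is a minimal Python sketch that starts one `zeek -r` process per pcap, each in its own working directory so that the log files do not collide. The function names (`build_job`, `run_job`, `run_all`), the directory layout, and the worker count are assumptions made for this example, not anything Zeek prescribes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_job(pcap, out_root):
    """Return the Zeek command and a dedicated log directory for one pcap."""
    # Hypothetical layout: one subdirectory per pcap, named after the file.
    log_dir = Path(out_root) / Path(pcap).stem
    # Use an absolute pcap path because Zeek will run from inside log_dir.
    cmd = ["zeek", "-r", str(Path(pcap).resolve())]
    return cmd, log_dir


def run_job(pcap, out_root):
    """Run a single Zeek instance and return its exit code."""
    cmd, log_dir = build_job(pcap, out_root)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Zeek writes its logs into the current working directory,
    # so each instance gets its own directory via cwd.
    return subprocess.run(cmd, cwd=log_dir, check=False).returncode


def run_all(pcaps, out_root, max_workers=4):
    """Process all pcaps with a bounded number of concurrent Zeek processes."""
    pcaps = list(pcaps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda p: run_job(p, out_root), pcaps))
    return dict(zip(pcaps, codes))
```

A call such as `run_all(sorted(Path("pcaps").glob("*.pcap")), Path("zeek-out"))` (both directory names are placeholders) would then leave one set of logs per pcap under `zeek-out/`. Note that pcaps with the same file name in different source directories would map to the same log directory in this sketch.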

There’s an interesting recent community contribution around pcap processing: emnahum/zeek-pcapovertcp-plugin on GitHub (Zeek packet acquisition via PCAP over TCP). I have not had a chance to explore it, but you could take a look and see if it’s useful to you.

More general recommendations include BPF filtering to remove unwanted traffic, and disabling analyzers via Analyzer::disable_analyzer().

I’m not sure what you mean by “file output with unknown type”.

Hope this helps,
Christian

First of all, thanks for the detailed information.

As you suggested, I ran a separate Zeek process for each pcap file, which improved processing time.

What I actually want is to filter the extracted files. Instead of extracting all files, I would like to, for example, skip binary or archive (zip, tar, etc.) files and keep only the other file types. That way, I think the number of files will be reduced and the system will be faster. But when I tried to implement this filtering (in .../policy/frameworks/files/extract-all-files.zeek), I got errors related to mime_type and file_sniff, so I cannot filter the files.

Thanks,

Cool, take a look at this package: hosom/file-extraction on GitHub (extract files from network traffic with Zeek).

You can install it by saying zkg install file-extraction.

It’s dated but its approach remains valid, so you could use it as a guide for your own implementation. Also make sure to read the file extraction framework docs.

Hope this helps,
Christian

Thank you for the information you have provided.

In addition, I want to find all the file types (by MIME type) that Zeek extracts. Are the MIME types listed in the file-extension.zeek script in the GitHub repository you mentioned all the MIME types that Zeek extracts? Can you recommend a resource that lists all the MIME types Zeek recognizes?

Thanks,

Please see the docs I pointed you at, here.
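
As a complement to the docs, one empirical way to see which MIME types Zeek actually identified in your own traffic is to tally the mime_type column of files.log. The Python sketch below assumes the default tab-separated log format (not JSON); the function name `tally_mime_types` is made up for this example. It counts files Zeek saw, which is not necessarily the same set as the files it extracted.

```python
from collections import Counter


def tally_mime_types(lines):
    """Count the values of the mime_type column in a Zeek TSV files.log."""
    fields = None
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#fields"):
            # Header row: "#fields<TAB>ts<TAB>fuid<TAB>..."
            fields = line.split("\t")[1:]
            continue
        # Skip other metadata lines, blank lines, and data before the header.
        if not line or line.startswith("#") or fields is None:
            continue
        row = dict(zip(fields, line.split("\t")))
        # In the default format "-" marks an unset field, i.e. Zeek did not
        # identify a MIME type for that file.
        counts[row.get("mime_type", "-")] += 1
    return counts
```

For example, `with open("files.log") as f: print(tally_mime_types(f).most_common())` would list the observed MIME types by frequency, with "-" standing for files whose type was not identified.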

Best,
Christian

I really appreciate it. Thank you so much for your help.

Best wishes,
