Modern developer tooling for Zeek script

The typical experience of developing in a programming language has changed substantially since Zeek script was first introduced in the mid-1990s. Today users rightfully expect an inclusive environment with approachable ways to interact with the community, as well as development tools that aid both in understanding source code and in writing it.

For some time, Zeek has provided https://try.zeek.org as a low-barrier way to explore and share code. Zeek can also validate code without executing it via zeek --parse-only <file>. More recently, Zeek has moved to Discourse and Slack for asynchronous and synchronous communication, respectively.

Nevertheless, development tooling was lacking and mostly centered around careful reading of the Zeek framework documentation and manual inspection of installed system scripts. Users have come to expect a development environment which supports source formatting, syntax checking in their editors, code navigation, and code completion. This post reports on such tooling for the Zeek ecosystem.

Prelude: a reusable parser for Zeek script

A crucial ingredient for developing tooling for Zeek script was a new, reusable parser. This might seem surprising since Zeek itself already includes a parser for its scripting language.

Our requirements for a parser are (in no particular order):

  1. In order for tools to support different Zeek versions, the parser needs to be able to parse script syntax for a wide range of Zeek versions.
  2. Formatting requires that the parser can produce a full, lossless representation of the parsed source code, i.e., we require parsing to be able to produce a concrete syntax tree (CST).
  3. Use cases like code completion require a parser which can successfully parse syntactically or semantically invalid script code. This means that parsing should depend only on the language grammar and must tolerate errors.
  4. Tools should not be restricted to a particular implementation language. The parser bindings should be ergonomic and not introduce too much overhead.
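Requirement 2 can be illustrated with a small sketch. The toy lexer below is not the Tree-sitter grammar: its token kinds are hypothetical and it ignores strings and other Zeek lexical details. It keeps whitespace and # comments as ordinary tokens, so concatenating all token texts reproduces the input exactly, which is the property a formatter relies on.

```rust
/// Token kinds for a toy, lossless Zeek-like lexer (hypothetical, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    Comment,
    Word,
    Punct,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token<'a> {
    pub kind: Kind,
    pub text: &'a str,
}

/// Splits `src` into tokens without dropping anything, including whitespace and comments.
pub fn tokenize(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        let (kind, len) = if c.is_whitespace() {
            let end = rest.find(|ch: char| !ch.is_whitespace()).unwrap_or(rest.len());
            (Kind::Whitespace, end)
        } else if c == '#' {
            // A comment runs to the end of the line; the newline itself is whitespace.
            (Kind::Comment, rest.find('\n').unwrap_or(rest.len()))
        } else if c.is_alphanumeric() || c == '_' {
            let end = rest
                .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
                .unwrap_or(rest.len());
            (Kind::Word, end)
        } else {
            (Kind::Punct, c.len_utf8())
        };
        tokens.push(Token { kind, text: &rest[..len] });
        rest = &rest[len..];
    }
    tokens
}

/// Reassembles the source; for a lossless representation this is the identity.
pub fn unparse(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.text).collect()
}
```

A real CST additionally nests these tokens into syntax nodes, but the round-trip property is the same.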

Zeek’s built-in parser makes these requirements hard to satisfy:

  1. It evolves together with the rest of Zeek, so support for older language constructs might be dropped.
  2. While the produced parser output can contain references to source locations (e.g., for producing diagnostic messages), some source constructs are not directly represented.
  3. During parsing it performs syntactic as well as some semantic validation. If the input contains syntax errors, the parser is unable to recover and produces incomplete output.
  4. Currently the Zeek script parser is not a reusable component (e.g., a shared library). Consuming its output requires working with C++ types, so integrating it through the foreign function interfaces (FFI) typical of other languages would require a C wrapper.
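The difference between giving up on a syntax error and recovering from it can be illustrated with a deliberately tiny, hypothetical statement splitter; this is not how Tree-sitter recovers from errors internally. When the final statement lacks its ;, it is kept as an error node instead of aborting, so later stages still see all of the input.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Stmt {
    /// A statement terminated by `;`.
    Complete(String),
    /// Trailing text without a terminating `;`, kept instead of aborting the parse.
    Error(String),
}

/// Parses the body of a `{ ... }` block into statements (toy example),
/// recovering from a missing final `;` instead of failing.
pub fn parse_block_body(body: &str) -> Vec<Stmt> {
    let mut stmts = Vec::new();
    let mut rest = body;
    while !rest.trim().is_empty() {
        match rest.find(';') {
            Some(i) => {
                stmts.push(Stmt::Complete(rest[..i].trim().to_string()));
                rest = &rest[i + 1..];
            }
            None => {
                // No terminator left: record the remainder as an error node and stop,
                // rather than discarding the whole block.
                stmts.push(Stmt::Error(rest.trim().to_string()));
                break;
            }
        }
    }
    stmts
}
```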

For these reasons we chose to develop a new, independent parser with Tree-sitter, a parser generator for writing fast, robust, and dependency-free parsers for any programming language. Grammars are written in JavaScript and compiled into C parser code which can be used from a number of different languages.

Formatting code with zeek-format

As a first use case, support for formatting Zeek source code was implemented in the zeekscript set of tools. zeekscript is implemented in Python and can be installed with pipx or pip:

$ pipx install zeekscript
  installed package zeekscript 1.1.0, installed using Python 3.10.8
  These apps are now globally available
    - zeek-format
    - zeek-script
done! ✨ 🌟 ✨

The primary user-facing tool is zeek-format which formats Zeek source code:

$ zeek-format --help
usage: zeek-format [-h] [--version] [--inplace] [--recursive] [FILES ...]

A Zeek script formatter

positional arguments:
  FILES            Zeek script(s) to process. Use "-" to specify stdin as a
                   filename. Omitting filenames entirely implies reading from
                   stdin.

options:
  -h, --help       show this help message and exit
  --version, -v    show version and exit
  --inplace, -i    change provided files instead of writing to stdout
  --recursive, -r  process *.zeek files recursively when provided directories
                   instead of files. Requires --inplace.

zeek-format produces source code formatted in the style of Zeek’s system scripts, e.g.,

$ echo 'event foo(x: Foo) { print x; }' | zeek-format -
event foo(x: Foo)
{
	print x;
}

zeek-format will attempt to format as much source code as possible, even if the input contains errors, e.g.,

$ echo 'function missing_semicolon() { print 1 }' | zeek-format -
function missing_semicolon()
{
	print 1
}

zeek-format has been tested on Zeek system scripts and is currently provided as an early preview.
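Because zeek-format reads from stdin when given - and writes the result to stdout (see the help output above), other programs such as an editor integration can pipe a buffer through it. The helper below is a hypothetical sketch, not code from zeekscript or the language server; it returns None if the program cannot be started or exits with an error.

```rust
use std::io::Write;
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: pipes `source` through `program -` and returns its stdout.
/// Returns `None` if the program cannot be started or exits unsuccessfully.
pub fn format_with(program: &str, source: &str) -> Option<String> {
    let mut child = Command::new(program)
        .arg("-") // "-" selects stdin as the input file
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .ok()?;
    let mut stdin = child.stdin.take()?;
    let input = source.to_owned();
    // Write from a separate thread so a program that streams output cannot deadlock
    // on a full pipe; dropping `stdin` at the end closes it, signalling EOF.
    let writer = thread::spawn(move || stdin.write_all(input.as_bytes()));
    let output = child.wait_with_output().ok()?;
    writer.join().ok()?.ok()?;
    if !output.status.success() {
        return None;
    }
    String::from_utf8(output.stdout).ok()
}
```

With zeek-format installed, format_with("zeek-format", text) would return the formatted text.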

Tooling integration for editors

With a fast and robust parser in place we were able to start working on improved editor integration for Zeek script.

Language server protocol

In the past, providing language integration for editors suffered from m × n complexity: to integrate m programming languages into n editors, m × n integrations were required. In other words, providing an integrated development experience for Zeek script required an individual plugin for each editor. As a result, integration was often poor outside of major languages or popular editors.

Since then, the language server protocol (LSP) has emerged as the de facto standard for providing editor integration for programming languages. LSP solves the m × n problem by defining an API over which editors and language servers communicate, which reduces it to an m + n problem: we now require only m editor-agnostic language servers and n language-agnostic LSP clients for editors. Since LSP integration is available for many editors, we only need to implement a single language server for Zeek script.
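To make the difference concrete, the counts can be computed directly; the figures of 10 languages and 8 editors used below are arbitrary example values.

```rust
/// Integrations needed without a shared protocol: one per (language, editor) pair.
pub fn pairwise_integrations(languages: u32, editors: u32) -> u32 {
    languages * editors
}

/// Components needed with LSP: one server per language plus one client per editor.
pub fn lsp_components(languages: u32, editors: u32) -> u32 {
    languages + editors
}
```

For 10 languages and 8 editors this gives 80 pairwise plugins versus 18 components, and adding Zeek script as one more language costs one server instead of eight plugins.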

Language servers are typically implemented as separate binaries which the editor spawns when it detects a certain language. Client and server then communicate by sending each other notifications (e.g., the editor notifying the server that a new file was opened, or the server notifying the client of new diagnostics for syntax errors) or request/response messages (e.g., the client requesting hover information, target locations for go-to-definition, or formatted documents). During initialization, client and server announce their respective capabilities, so feature support in the server can be built out gradually.
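On the wire, the LSP base protocol frames each JSON-RPC 2.0 message with a Content-Length header (the body length in bytes) followed by a blank line; notifications carry no "id", while requests do. tower-lsp handles this for the Zeek language server. The sketch below only illustrates the framing, recognizes just the Content-Length header, and uses hand-written JSON bodies for brevity.

```rust
/// Frames a JSON-RPC payload as an LSP base-protocol message:
/// a `Content-Length` header (in bytes), a blank line, then the UTF-8 body.
pub fn frame(body: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

/// Splits one framed message off the front of `buf`.
/// Returns the body and the remaining input, or `None` if the message is incomplete.
pub fn unframe(buf: &str) -> Option<(&str, &str)> {
    let header_end = buf.find("\r\n\r\n")?;
    let len: usize = buf[..header_end]
        .split("\r\n")
        .find_map(|line| line.strip_prefix("Content-Length: "))?
        .trim()
        .parse()
        .ok()?;
    let start = header_end + 4;
    let end = start.checked_add(len)?;
    if buf.len() < end {
        return None; // body not fully received yet
    }
    Some((buf.get(start..end)?, &buf[end..]))
}
```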

Usage

  1. An installation of Zeek system scripts is required. We support either automatic discovery of their locations with zeek-config (needs to be in PATH), or manually specifying their location via the ZEEKPATH environment variable.
  2. Install the Zeek language server, e.g., from precompiled binaries. If you compile from source, consider using a tool such as cargo-update to pull in improvements and bug fixes.
  3. If your editor does not support LSP out of the box, install an LSP plugin and configure it to launch zeek-language-server.
  4. Edit.
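The discovery logic in step 1 could look roughly like the following. This is a sketch under assumptions: it gives an explicitly set ZEEKPATH precedence over zeek-config, and it assumes that zeek-config --zeekpath prints a colon-separated search path. The zeek-config query is passed in as a closure so the logic can be exercised without a Zeek installation.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Returns directories to search for Zeek system scripts.
/// Assumed precedence: a non-empty ZEEKPATH wins; otherwise ask zeek-config.
pub fn script_dirs(
    zeekpath_env: Option<String>,
    zeek_config: impl FnOnce() -> Option<String>,
) -> Vec<PathBuf> {
    let raw = zeekpath_env
        .filter(|p| !p.trim().is_empty())
        .or_else(zeek_config)
        .unwrap_or_default();
    // Both sources are assumed to be colon-separated directory lists.
    raw.trim()
        .split(':')
        .filter(|p| !p.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Queries `zeek-config --zeekpath` (assumed flag); zeek-config must be in PATH.
pub fn query_zeek_config() -> Option<String> {
    let out = Command::new("zeek-config").arg("--zeekpath").output().ok()?;
    if !out.status.success() {
        return None;
    }
    String::from_utf8(out.stdout).ok()
}
```

In a real program one would call script_dirs(std::env::var("ZEEKPATH").ok(), query_zeek_config).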

For Visual Studio Code we have created an extension which packages everything needed for single-click installation and automatic updating. Additionally, this extension also contains actions to post code to https://try.zeek.org, and snippets for common code and syntax highlighting, both contributed by Fupeng Zhao.

Detour: Implementation

We chose to implement the server in Rust since it supports safely implementing concurrent architectures like servers and provides a good ecosystem of related tools.

The LSP-specific components are implemented on top of tower-lsp which implements glue for message (de)serialization and dispatching of calls to API handlers. With that we can concentrate on implementing API handlers inside its LanguageServer trait:

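// Handler inside our implementation of tower-lsp's LanguageServer trait.
async fn goto_declaration(
    &self,
    params: GotoDefinitionParams,
) -> Result<Option<GotoDeclarationResponse>> {
    // Resolve the symbol at `params.text_document_position_params` here.
    todo!()
}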
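Conceptually, the dispatching that tower-lsp provides maps a JSON-RPC method name to a handler and answers unsupported methods with the JSON-RPC "method not found" error (-32601). The synchronous Handlers trait below is a hypothetical stand-in for illustration only; tower-lsp's actual LanguageServer trait uses async methods with typed parameters, and its default method implementations likewise answer unimplemented methods with this error.

```rust
/// JSON-RPC 2.0 error code for unknown or unsupported methods.
pub const METHOD_NOT_FOUND: i64 = -32601;

/// Hypothetical, synchronous handler trait with string params for brevity.
pub trait Handlers {
    fn hover(&self, params: &str) -> Result<String, i64>;

    /// Optional feature: the default reports it as unsupported.
    fn goto_declaration(&self, _params: &str) -> Result<String, i64> {
        Err(METHOD_NOT_FOUND)
    }
}

/// Routes a request by its LSP method name to the matching handler.
pub fn dispatch(server: &impl Handlers, method: &str, params: &str) -> Result<String, i64> {
    match method {
        "textDocument/hover" => server.hover(params),
        "textDocument/declaration" => server.goto_declaration(params),
        _ => Err(METHOD_NOT_FOUND),
    }
}
```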

The server itself runs in a multithreaded tokio runtime which, in addition to allowing concurrent handling of messages, also allows us to parallelize work, e.g., to make information available sooner by processing Zeek system files concurrently.
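The idea of processing system files concurrently can be sketched without tokio using scoped threads from Rust's standard library (Rust 1.63 or later); this is a swapped-in technique, not the server's actual code. Files are split into roughly one chunk per available CPU, and analyze is a hypothetical placeholder for per-file work such as parsing and collecting declarations. Results come back in input order.

```rust
use std::thread;

/// Runs `analyze` over all (name, text) pairs on several threads,
/// returning (name, result) pairs in the original order.
pub fn process_all<F>(files: &[(String, String)], analyze: F) -> Vec<(String, usize)>
where
    F: Fn(&str) -> usize + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);
    let analyze = &analyze;
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|(name, text)| (name.clone(), analyze(text.as_str())))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the results in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```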

In order to provide hover or completion information, we needed to implement minimal support for scoped name and type resolution. Since an editor requests this information only for a small number of localized tokens, we chose not to implement full resolution. Instead, we perform a local lookup in the file by climbing up the CST. This tends to be fast enough for files of typical size and nesting depth. Only if no declaration is found locally do we need to search declarations in all loaded files.
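The lookup strategy can be sketched on a simplified tree in which each node records its parent and the names it declares directly. The Node layout and the globals map (standing in for declarations from all loaded files) are hypothetical, not the server's data structures.

```rust
use std::collections::HashMap;

/// Hypothetical simplified tree node: a parent link plus the (name, type) pairs
/// it declares directly, e.g. function parameters or `local` variables.
pub struct Node {
    pub parent: Option<usize>,
    pub declares: Vec<(String, String)>,
}

/// Resolves `name` starting at node `at`: first walk up through the enclosing nodes,
/// and only if that fails fall back to declarations from all loaded files.
pub fn resolve<'a>(
    nodes: &'a [Node],
    globals: &'a HashMap<String, String>,
    mut at: usize,
    name: &str,
) -> Option<&'a str> {
    loop {
        let node = &nodes[at];
        if let Some((_, ty)) = node.declares.iter().find(|(n, _)| n == name) {
            return Some(ty.as_str());
        }
        match node.parent {
            Some(p) => at = p,
            None => break,
        }
    }
    globals.get(name).map(String::as_str)
}
```

Because the walk stops at the innermost match, local declarations naturally shadow global ones.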

Zeek's system scripts are already extensively documented with docstrings in Zeek's zeekygen format, which are used, e.g., to generate API docs. Since the parser preserves these docstrings in the CST, we can extract them and expose them to users.
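As an illustration of zeekygen conventions, ## comments directly above a declaration and a trailing ##< comment on the same line document that declaration, while ##! documents the whole script. The hypothetical docstring function below extracts such docs from raw lines rather than from the CST, and does not handle ##< occurring inside string literals.

```rust
/// Collects zeekygen-style docs for the declaration on line `decl_line` (0-based):
/// the run of `##` comment lines directly above it, plus a trailing `##<` comment.
/// Script-level summaries (`##!`) are skipped.
pub fn docstring(src: &str, decl_line: usize) -> Vec<String> {
    let lines: Vec<&str> = src.lines().collect();
    let mut docs = Vec::new();
    // Walk upward over contiguous `##` lines.
    let mut i = decl_line.min(lines.len());
    while i > 0 {
        let l = lines[i - 1].trim_start();
        if l.starts_with("##") && !l.starts_with("##!") && !l.starts_with("##<") {
            docs.push(l[2..].trim().to_string());
            i -= 1;
        } else {
            break;
        }
    }
    docs.reverse();
    // A `##<` comment on the declaration line documents it as well.
    if let Some(pos) = lines.get(decl_line).and_then(|l| l.find("##<")) {
        docs.push(lines[decl_line][pos + 3..].trim().to_string());
    }
    docs
}
```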

Since outside of bare mode (activated by zeek -b|--bare-mode) Zeek implicitly loads a considerable number of source files, we cache information in a salsa database where possible; e.g., we cache all external declarations visible in a given file. This information is precomputed at startup for Zeek system files so that the editor gets fast feedback when it is first requested.
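The caching idea can be approximated with a small memoization layer. This is a simplified stand-in, not salsa's API: salsa also tracks dependencies between queries automatically, whereas here each cached result depends on a single file's revision only, and the hypothetical global_decls query uses a crude text scan for global declarations.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Toy database: inputs carry the revision at which they last changed, and derived
/// results are reused until their input changes.
#[derive(Default)]
pub struct Db {
    files: HashMap<String, (u64, String)>, // path -> (revision, text)
    revision: u64,
    cache: RefCell<HashMap<String, (u64, Vec<String>)>>,
    pub computations: Cell<u32>, // counts cache misses, for illustration
}

impl Db {
    pub fn set_file(&mut self, path: &str, text: &str) {
        self.revision += 1;
        self.files.insert(path.to_string(), (self.revision, text.to_string()));
    }

    /// Names declared with `global` in `path`, recomputed only if that file changed.
    pub fn global_decls(&self, path: &str) -> Vec<String> {
        let (rev, text) = match self.files.get(path) {
            Some(entry) => entry,
            None => return Vec::new(),
        };
        if let Some((cached_rev, decls)) = self.cache.borrow().get(path) {
            if cached_rev == rev {
                return decls.clone();
            }
        }
        self.computations.set(self.computations.get() + 1);
        let decls: Vec<String> = text
            .lines()
            .filter_map(|line| line.trim_start().strip_prefix("global "))
            .map(|rest| rest.split(':').next().unwrap_or("").trim().to_string())
            .collect();
        self.cache.borrow_mut().insert(path.to_string(), (*rev, decls.clone()));
        decls
    }
}
```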

When handling completion during editing, source files are typically not syntactically valid, e.g., statements might not yet be terminated with ;. Here we benefit from Tree-sitter CSTs holding information on the full source even if it contains errors: by inspecting the source of CST nodes containing errors, we can often infer the context required for completion.
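Since the text up to the cursor is available even when the CST contains errors, a simple heuristic can already pick the kind of completion to offer. The completion_context function below is a hypothetical sketch operating on the current line's text; it is not the server's logic, which inspects CST nodes.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CompletionKind {
    /// Complete a script path after `@load`.
    LoadPath(String),
    /// Complete an event name after `event`.
    EventHandler(String),
    /// Complete an identifier from its partially typed prefix.
    Identifier(String),
}

fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == ':'
}

/// Guesses what to complete from the current line up to the cursor. It works on raw
/// text, so it does not matter whether the statement has been terminated yet.
pub fn completion_context(line_before_cursor: &str) -> CompletionKind {
    let trimmed = line_before_cursor.trim_start();
    if let Some(path) = trimmed.strip_prefix("@load ") {
        return CompletionKind::LoadPath(path.trim().to_string());
    }
    if let Some(name) = trimmed.strip_prefix("event ") {
        // Only while still typing the event name, not inside a handler's parameters or body.
        if name.trim().chars().all(is_ident_char) {
            return CompletionKind::EventHandler(name.trim().to_string());
        }
    }
    // The partial identifier is the trailing run of identifier characters (incl. `::`).
    let start = line_before_cursor
        .char_indices()
        .rev()
        .find(|&(_, c)| !is_ident_char(c))
        .map_or(0, |(i, c)| i + c.len_utf8());
    CompletionKind::Identifier(line_before_cursor[start..].to_string())
}
```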

We do not yet make use of Tree-sitter's incremental parsing (applying edits to an already parsed CST so that the source does not need to be reparsed fully) since, in our experience, parsing is fast enough for typical file sizes. It could still speed things up for huge files, provided we also make the representation of CST nodes independent of their concrete source locations so that dependent computations can be cached.
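The point about source locations can be illustrated with cache keys. In this hypothetical sketch, a key derived from a node's absolute offset changes when text is inserted earlier in the file, while a key derived only from the node's kind and text stays stable, so results cached under it could be reused after such an edit.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical node identity that includes the absolute start offset.
#[derive(Hash)]
pub struct AbsNode<'a> {
    pub kind: &'a str,
    pub start: usize,
    pub text: &'a str,
}

/// Hypothetical position-independent identity: only the kind and the covered text.
#[derive(Hash)]
pub struct RelNode<'a> {
    pub kind: &'a str,
    pub text: &'a str,
}

/// Derives a cache key from a node identity.
pub fn key<T: Hash>(node: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    hasher.finish()
}
```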

Implementation status

We implement at least partial support for the following API calls:

  • hover
    Provide information for the symbol under the cursor.
  • completion
    Provide completions at the cursor location. In addition to completion of identifiers, we have also implemented completion for file names in @load statements and for event handler names after the event keyword.
  • document symbol
    Show all symbols in the document. Editors can use this to e.g., show document outlines.
  • workspace symbol
    Search for symbols across the full workspace. This can be used, e.g., to search for identifiers in both loaded and not yet loaded files.
  • declaration
    Find declaration of symbol under the cursor, e.g., a variable or function. For filenames in @load statements this returns the location of the loaded file.
  • definition
    Find the definition of the symbol under the cursor. This differs from the declaration if a symbol is declared first and defined later.
  • implementation
    Find all implementations of a particular hook or event handler.
  • signature help
    Show the signature of the called function while editing function call arguments.
  • folding range
    Fold code according to syntax, i.e., independent of actual formatting.
  • document formatting
    Format document with zeek-format if available. Some editors can be configured to trigger formatting automatically, e.g., when saving a file.
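A syntax-based folding computation can be sketched by matching { and } pairs that span several lines; since it follows the brace structure rather than indentation, folds track the nesting of blocks regardless of formatting style. This hypothetical folding_ranges function works on raw text (it assumes string literals do not span lines, treats # as starting a comment, and ignores regular-expression literals) rather than on the CST as the server does; the field names mirror LSP's FoldingRange.

```rust
/// A folding range in 0-based line numbers, mirroring LSP's FoldingRange fields.
#[derive(Debug, PartialEq, Eq)]
pub struct Fold {
    pub start_line: u32,
    pub end_line: u32,
}

/// Computes folds from matching `{`/`}` pairs that span more than one line.
pub fn folding_ranges(src: &str) -> Vec<Fold> {
    let mut open: Vec<u32> = Vec::new(); // lines of currently unmatched `{`
    let mut folds = Vec::new();
    for (line_no, line) in src.lines().enumerate() {
        let line_no = line_no as u32;
        let (mut in_string, mut escaped) = (false, false);
        for c in line.chars() {
            if in_string {
                match (escaped, c) {
                    (true, _) => escaped = false,
                    (false, '\\') => escaped = true,
                    (false, '"') => in_string = false,
                    _ => {}
                }
                continue;
            }
            match c {
                '"' => in_string = true,
                '#' => break, // the rest of the line is a comment
                '{' => open.push(line_no),
                '}' => {
                    if let Some(start) = open.pop() {
                        if line_no > start {
                            folds.push(Fold { start_line: start, end_line: line_no });
                        }
                    }
                }
                _ => {}
            }
        }
    }
    folds.sort_by_key(|f| (f.start_line, f.end_line));
    folds
}
```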

Exploration: Putting it all together

As an exploration of how a host-agnostic tool like https://try.zeek.org could evolve, we have created a self-contained environment for developing Zeek scripts: a Zeek playground consisting of a preconfigured Visual Studio Code workspace on top of a fully containerized Zeek (a dev container). Such an environment would allow developing Zeek scripts on platforms without a Zeek installation, or even on platforms currently not supported by Zeek.

Development containers can be opened locally with Visual Studio Code. In this case the container environment is provisioned and started, and the host editor connects to it. The editor guides the user through installing any needed components. Files in the project directory are made available in the container.

Additionally, GitHub supports running dev containers. In this case, the dev container is started on GitHub servers, and a browser-based Visual Studio Code instance is launched and connected to it. Files remain on GitHub infrastructure, which also provides tooling to manage them, tear down containers, and optionally commit changes to user repositories.

Epilogue

A self-contained parser for Zeek script has allowed us to implement first iterations of tooling, bringing a more modern development experience to Zeek script. While still a work-in-progress, the existing implementation should make developing Zeek script more accessible by allowing easier code exploration with better access to existing API docs, and giving contextual help in the form of code completion.


As a follow-up to this post, I would be curious to know how you write Zeek script code (if at all). I created a simple poll below, but feel free to leave free-form replies as well (this is Discourse after all :stuck_out_tongue:).

Development environment

Which editor do you primarily use to edit Zeek scripts?
  • I do not edit Zeek scripts
  • Visual Studio Code
  • Sublime
  • Atom
  • IntelliJ IDEA
  • Eclipse
  • vi/vim/nvim
  • Emacs
  • an editor not listed here

On what platform do you most often edit Zeek scripts?
  • Windows
  • Linux
  • macOS
  • FreeBSD
  • other

In the environment where you typically edit Zeek scripts, can you install additional tooling such as editor plugins?
  • yes
  • no

When thinking of editing code in a scripting language like Zeek script, what editor integrations are most important to you?
  • syntax highlighting
  • code navigation
  • automatic code formatting
  • access to documentation
  • code completion
  • automatic linting/syntax checking
  • debugging support
  • snippet support

zeek-format

Are you currently using zeek-format?
  • yes
  • no

If not, why?
  • not interested
  • unsupported on the platform I use
  • unclear how to install it
  • poor experience when I tried it out
  • does not match the style I care about
  • I did not know about zeek-format and might set it up later

Language server

Are you currently using the language server mentioned above?
  • yes
  • no

If not, why?
  • not interested
  • unsupported on the platform I use
  • unclear how to set it up for my editing environment
  • poor experience when I tried it out
  • does not support feature(s) I care about
  • I did not know about this language server and might set it up later

If you are using the language server mentioned above, how do you install the server binary?
  • it comes with a plugin I use
  • I download binaries from the release page
  • I build from source

How do you learn about new releases for this language server?
  • Slack posts
  • I follow the GitHub repository
  • handled by some automation for me
  • visiting the GitHub repository
  • I never learn about new releases

I am closing the polls. Thank you for leaving feedback, all very useful input. If you have anything else to share, feel free to leave it in this topic or bring it up in #tooling.