The typical experience of developing in a programming language has changed substantially since Zeek script was first introduced in the mid-90s. Today users rightfully expect an inclusive environment with approachable ways to interact with the community, and development tools which aid in both source comprehension and development itself.
For some time, Zeek has provided https://try.zeek.org as a low-barrier way to explore and share code. Zeek can also be invoked to validate code instead of executing it with zeek --parse-only <file>. More recently, Zeek has moved to Discourse as well as Slack for asynchronous and synchronous communication.
Nevertheless, development tooling was lacking and mostly centered around careful reading of the Zeek framework documentation and manual inspection of installed system scripts. Users have come to expect a development environment which supports source formatting, syntax checking in their editors, code navigation, and code completion. This post reports on such tooling for the Zeek ecosystem.
Prelude: a reusable parser for Zeek script
A crucial ingredient in developing tooling for Zeek script was a new, reusable parser. This might appear surprising since Zeek itself already includes a parser for its scripting language.
Our requirements for a parser are (in no particular order):
- In order for tools to support different Zeek versions, the parser needs to be able to parse script syntax for a wide range of Zeek versions.
- Formatting requires that the parser can produce a full, lossless representation of the parsed source code, i.e., we require parsing to be able to produce a concrete syntax tree (CST).
- Use cases like code completion require a parser which can successfully parse syntactically or semantically invalid script code. This means that parsing must depend only on the language grammar and must not perform semantic validation.
- We do not want to be restricted to using only certain programming languages. The parser bindings should be ergonomic and not introduce too much overhead.
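To make the "lossless" requirement concrete, here is a minimal sketch of the underlying idea. It is not the Tree-sitter grammar or its API; the types `Kind` and `Piece` and the functions `lex` and `render` are hypothetical names for illustration, and the lexer only knows a tiny subset of the syntax. The point is that whitespace and comments are kept as pieces, so concatenating the pieces gives back the input byte for byte.

```rust
/// Kinds of pieces a lossless lexer keeps; whitespace and comments are
/// retained instead of being thrown away.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Whitespace,
    Comment,
    Word,
    Punct,
}

/// A piece of source text together with its kind.
#[derive(Debug, PartialEq, Eq)]
struct Piece<'a> {
    kind: Kind,
    text: &'a str,
}

/// Split `src` into pieces that together cover every byte of the input.
fn lex(src: &str) -> Vec<Piece<'_>> {
    let mut pieces = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_whitespace() {
            let len = rest
                .find(|c: char| !c.is_whitespace())
                .unwrap_or(rest.len());
            (Kind::Whitespace, len)
        } else if first == '#' {
            // Comments run to the end of the line (newline not included).
            (Kind::Comment, rest.find('\n').unwrap_or(rest.len()))
        } else if first.is_alphanumeric() || first == '_' {
            let len = rest
                .find(|c: char| !(c.is_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            (Kind::Word, len)
        } else {
            // Everything else is kept as a single-character piece.
            (Kind::Punct, first.len_utf8())
        };
        let (text, tail) = rest.split_at(len);
        pieces.push(Piece { kind, text });
        rest = tail;
    }
    pieces
}

/// Concatenating all pieces reproduces the input exactly.
fn render(pieces: &[Piece<'_>]) -> String {
    pieces.iter().map(|p| p.text).collect()
}
```

A formatter built on such a representation can rewrite only the whitespace pieces and leave everything else, including comments, untouched.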
Zeek’s built-in parser makes these requirements hard to satisfy:
- It evolves together with the rest of Zeek. Support for language constructs might be dropped.
- While the produced parser output can contain references to source locations (e.g., for producing diagnostic messages), some source constructs are not directly represented.
- During parsing it performs syntactic as well as some semantic validation. If the input contains syntax errors, the parser is unable to recover and produces incomplete output.
- Currently the Zeek script parser is not a reusable component (e.g., a shared library). Consuming its output requires working with C++ types so integration with typical foreign function interfaces (FFI) of other languages requires a C wrapper.
For these reasons we chose to instead develop a new, independent parser with Tree-sitter, which allows writing fast, robust, and dependency-free parsers for any programming language. Grammars are written in JavaScript and compiled into C parser code which can be used from a number of different languages.
Formatting code with zeek-format
As a first use case, support for formatting Zeek source code was implemented in the zeekscript set of tools. zeekscript is implemented in Python and can be installed with pipx or pip:
$ pipx install zeekscript
installed package zeekscript 1.1.0, installed using Python 3.10.8
These apps are now globally available
- zeek-format
- zeek-script
done! ✨ 🌟 ✨
The primary user-facing tool is zeek-format which formats Zeek source code:
$ zeek-format --help
usage: zeek-format [-h] [--version] [--inplace] [--recursive] [FILES ...]
A Zeek script formatter
positional arguments:
FILES Zeek script(s) to process. Use "-" to specify stdin as a
filename. Omitting filenames entirely implies reading from
stdin.
options:
-h, --help show this help message and exit
--version, -v show version and exit
--inplace, -i change provided files instead of writing to stdout
--recursive, -r process *.zeek files recursively when provided directories
instead of files. Requires --inplace.
zeek-format produces source code formatted in the style of Zeek’s system scripts, e.g.,
$ echo 'event foo(x: Foo) { print x; }' | zeek-format -i
event foo(x: Foo)
{
print x;
}
zeek-format will attempt to format as much source code as possible, even if the input contains errors, e.g.,
$ echo 'function missing_semicolon() { print 1 }' | zeek-format -i
function missing_semicolon()
{
print 1
}
zeek-format has been tested on Zeek system scripts and is currently provided as an early preview.
Tooling integration for editors
With a fast and robust parser in place we were able to start working on improved editor integration for Zeek script.
Language server protocol
In the past, providing language integration for editors suffered from m × n complexity: in order to integrate m programming languages into n editors, m × n integrations were required. In other words, providing an integrated experience for development in Zeek script required individual plugins for each editor. This meant that outside of major languages or big editors, integration was often poor.
Since then the language server protocol (LSP) has emerged as the de facto standard way to provide editor integration for programming languages. LSP solves the m × n problem by defining an API over which editors and language servers can communicate. The m × n problem then reduces to an m + n problem: we now require only m editor-agnostic language servers and n language-agnostic LSP clients for editors. LSP integration is available for many editors, so we only need to implement a single language server for Zeek script.
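The difference is easy to underestimate, so here is the arithmetic as a small sketch. The function names and the numbers in the usage note are made up for illustration.

```rust
/// Number of integrations when every editor needs its own plugin for
/// every language (the m × n case).
fn bespoke_integrations(languages: u32, editors: u32) -> u32 {
    languages * editors
}

/// Number of components with LSP: one server per language plus one
/// client per editor (the m + n case).
fn lsp_components(languages: u32, editors: u32) -> u32 {
    languages + editors
}
```

For a hypothetical 20 languages and 10 editors, `bespoke_integrations(20, 10)` is 200 while `lsp_components(20, 10)` is 30, and adding one more language costs one server instead of ten plugins.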
Language servers are typically implemented as separate binaries which are spawned by the editor when a certain language is detected. The client and server then communicate by sending each other notifications (e.g., the editor notifying the server that a new file was opened, or the server notifying the client of new diagnostics for syntax errors), or request/response messages (e.g., the client requesting information for hover, target locations when going to definitions, or formatted documents). During initialization, client and server announce their respective capabilities so that feature support in the server can be built out gradually.
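On the wire, these messages are JSON-RPC payloads, each preceded by a Content-Length header and a blank line. The following is a minimal sketch of that framing only; the helper names `frame`, `unframe` and `did_open_notification` are hypothetical, the file URI and the "zeek" language id are made-up example values, and a real server would leave all of this to a library.

```rust
/// Wrap a JSON-RPC payload in the LSP base-protocol framing: a
/// `Content-Length` header (in bytes), a blank line, then the payload.
fn frame(payload: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", payload.len(), payload)
}

/// Split one framed message off the front of `input`, returning the
/// payload and whatever follows it. Returns `None` if the header is
/// missing or the payload has not fully arrived yet.
fn unframe(input: &str) -> Option<(&str, &str)> {
    let (headers, rest) = input.split_once("\r\n\r\n")?;
    let length = headers
        .lines()
        .find_map(|line| line.strip_prefix("Content-Length: "))?
        .trim()
        .parse::<usize>()
        .ok()?;
    if rest.len() < length || !rest.is_char_boundary(length) {
        return None;
    }
    Some(rest.split_at(length))
}

/// A notification carries a method but no `id`, so no response is
/// expected. URI and language id are example values.
fn did_open_notification() -> String {
    frame(r#"{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"file:///tmp/example.zeek","languageId":"zeek","version":1,"text":""}}}"#)
}
```

A request differs from a notification only in carrying an `id`, which the other side echoes back in its response.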
Usage
- An installation of Zeek system scripts is required. We support either automatic discovery of their locations with zeek-config (needs to be in PATH), or manually specifying their location via the ZEEKPATH environment variable.
- Install the Zeek language server, e.g., from precompiled binaries. If you compile from source, consider using e.g., cargo-update to pull in improvements or bug fixes.
- If your editor does not support LSP out of the box, install an LSP plugin and configure it to launch zeek-language-server.
- Edit.
For Visual Studio Code we have created an extension which packages everything needed for single-click installation and automatic updating. Additionally, this extension also contains actions to post code to https://try.zeek.org, and snippets for common code and syntax highlighting, both contributed by Fupeng Zhao.
Detour: Implementation
We chose to implement the server in Rust since it supports safely implementing concurrent architectures like servers and provides a good ecosystem for related tools.
The LSP-specific components are implemented on top of tower-lsp which implements glue for message (de)serialization and dispatching of calls to API handlers. With that we can concentrate on implementing API handlers inside its LanguageServer trait:
async fn goto_declaration(
    &self,
    params: GotoDeclarationParams,
) -> Result<Option<GotoDeclarationResponse>> {
    // Look up the declaration of the symbol at the requested position.
    todo!()
}
The server itself runs in a multithreaded tokio runtime which, in addition to allowing concurrent handling of messages, also allows us to parallelize work, e.g., to provide faster availability of information by concurrently processing Zeek system files.
In order to provide hover or completion information we needed to implement minimal support for scoped name and type resolution. Since an editor would request this information for a small number of localized tokens, we chose not to implement full resolution. We instead perform local lookup in the file by climbing up the CST. This tends to be fast enough for files of typical size or nesting. Only if no resolution was found locally do we need to search declarations in all loaded files.
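The lookup by climbing can be sketched as follows. This is a simplified model and not the server's actual data structures: `Node`, `Tree`, `add` and `resolve` are hypothetical names, nodes only record their parent and the names declared directly in them, and types are plain strings.

```rust
use std::collections::HashMap;

/// A simplified syntax-tree node: a link to its parent and the names
/// declared directly in its scope, mapped to a type name.
struct Node {
    parent: Option<usize>,
    decls: HashMap<&'static str, &'static str>,
}

/// Nodes stored in a flat vector; indices act as node ids.
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Add a node under `parent` and return its id.
    fn add(
        &mut self,
        parent: Option<usize>,
        decls: &[(&'static str, &'static str)],
    ) -> usize {
        self.nodes.push(Node {
            parent,
            decls: decls.iter().copied().collect(),
        });
        self.nodes.len() - 1
    }

    /// Resolve `name` by walking from `start` towards the root; the
    /// innermost declaration wins. `None` means the caller would have
    /// to fall back to searching declarations in other loaded files.
    fn resolve(&self, start: usize, name: &str) -> Option<&'static str> {
        let mut current = Some(start);
        while let Some(id) = current {
            let node = &self.nodes[id];
            if let Some(&ty) = node.decls.get(name) {
                return Some(ty);
            }
            current = node.parent;
        }
        None
    }
}
```

The cost of a lookup is bounded by the nesting depth at the cursor rather than by the size of the file, which is why this stays cheap for typical scripts.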
System Zeek scripts are already extensively documented with docstrings in Zeek’s zeekygen format which are used to e.g., generate API docs. Since the parser preserves these docstrings in the CST, we can extract them and expose them to users.
Outside of bare mode (activated by zeek -b|--bare-mode) Zeek implicitly pulls in a considerable number of source files, so information is cached in a salsa database whenever possible; e.g., we cache all external declarations visible in a certain file. Such information is precomputed at startup for Zeek system files to provide fast editor feedback when first requested.
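The effect of such caching can be illustrated with a hand-rolled memo table. To be clear, this is not the salsa API, which additionally tracks dependencies between queries and invalidates them automatically; `DeclCache` and its methods are hypothetical names, and the `compute` function stands in for whatever expensive analysis produces the declarations.

```rust
use std::collections::HashMap;

/// A hand-rolled memo table standing in for a query database. This is
/// NOT the salsa API, only the underlying idea.
struct DeclCache {
    /// Hypothetical, expensive computation of the declarations visible
    /// in a file (e.g., by walking everything it loads).
    compute: fn(&str) -> Vec<String>,
    /// Results computed so far, keyed by file name.
    cache: HashMap<String, Vec<String>>,
    /// Number of times `compute` actually ran.
    misses: usize,
}

impl DeclCache {
    fn new(compute: fn(&str) -> Vec<String>) -> Self {
        DeclCache { compute, cache: HashMap::new(), misses: 0 }
    }

    /// Return the declarations for `file`, computing them at most once.
    fn visible_decls(&mut self, file: &str) -> &[String] {
        if !self.cache.contains_key(file) {
            self.misses += 1;
            let decls = (self.compute)(file);
            self.cache.insert(file.to_string(), decls);
        }
        &self.cache[file]
    }

    /// Drop the cached entry when a file changes so it is recomputed.
    fn invalidate(&mut self, file: &str) {
        self.cache.remove(file);
    }

    /// Precompute entries up front, e.g., for system scripts at startup.
    fn warm_up(&mut self, files: &[&str]) {
        for file in files {
            self.visible_decls(file);
        }
    }
}
```

After `warm_up`, the first hover or completion request for a system script is served from the cache instead of triggering the computation.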
When handling completion while editing, source files typically are not syntactically valid, e.g., statements might not yet be terminated with ;. In this case, we benefit from the ability of Tree-sitter CSTs to hold the information on the full source, even if it contains errors, and can inspect source code of CST nodes containing errors to often successfully infer required context for completion.
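As a rough illustration of inferring completion context from incomplete source, the sketch below works on raw text instead of CST error nodes, which is a deliberate simplification and not how the server does it. The function names `completion_prefix` and `complete` are hypothetical, and the candidate names in the usage note are only examples.

```rust
/// Return the partial identifier immediately before byte offset
/// `cursor`. Zeek identifiers may be namespaced with `::`, so `:` is
/// treated as part of the identifier here (a simplification).
fn completion_prefix(source: &str, cursor: usize) -> &str {
    // Fall back to the whole source if `cursor` is not a valid offset.
    let before = source.get(..cursor).unwrap_or(source);
    let start = before
        .rfind(|c: char| !(c.is_alphanumeric() || c == '_' || c == ':'))
        // Step past the non-identifier character that was found.
        .map(|i| i + before[i..].chars().next().map_or(1, char::len_utf8))
        .unwrap_or(0);
    &before[start..]
}

/// Keep only the candidates that start with the typed prefix.
fn complete<'a>(candidates: &[&'a str], prefix: &str) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|c| c.starts_with(prefix))
        .collect()
}
```

With the unterminated statement `print Conn::lo` and the cursor at its end, the prefix is `Conn::lo`, and of the candidates `Conn::log_conn`, `Conn::LOG` and `HTTP::log_http` only the first one remains.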
We do not yet make use of Tree-sitter’s incremental parsing ability (the ability to apply source edits to an already parsed CST to avoid having to reparse it fully) since in our experience parsing was fast enough for typical file sizes. This could still provide a speedup for huge files if we also make the way CST nodes are represented independent of concrete source location so dependent computations can be cached.
Implementation status
We implement at least partial support for the following API calls:
- hover: Provide information for the symbol under the cursor.
- completion: Provide completions at cursor location. In addition to completion of identifiers we have also implemented completion for file names in @load, and event handlers after event.
- document symbol: Show all symbols in the document. Editors can use this to e.g., show document outlines.
- workspace symbol: Search symbol in the full workspace. This can be used to e.g., search for identifiers in loaded as well as not loaded files.
- declaration: Find declaration of symbol under the cursor, e.g., a variable or function. For filenames in @load statements this returns the location of the loaded file.
- definition: Find definition of symbol under the cursor. This is relevant if a symbol was first declared and defined later.
- implementation: Find all implementations of a particular hook or event handler.
- signature help: Completion triggered when editing function call arguments.
- folding range: Fold code according to syntax, i.e., independent of actual formatting.
- document formatting: Format document with zeek-format if available. Some editors can be configured to trigger formatting automatically, e.g., when saving a file.
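To give a feel for what such a handler computes, here is a sketch of folding ranges derived from block structure instead of indentation. It matches braces in the raw text and ignores strings and comments, whereas a real implementation would walk the CST; the function name `folding_ranges` is hypothetical.

```rust
/// Compute zero-based (start_line, end_line) pairs for every `{ ... }`
/// block that spans more than one line, sorted by start line.
fn folding_ranges(source: &str) -> Vec<(usize, usize)> {
    // Lines of the currently open, not yet closed braces.
    let mut open = Vec::new();
    let mut ranges = Vec::new();
    for (line_no, line) in source.lines().enumerate() {
        for ch in line.chars() {
            match ch {
                '{' => open.push(line_no),
                '}' => {
                    // Unbalanced closing braces are ignored.
                    if let Some(start) = open.pop() {
                        if line_no > start {
                            ranges.push((start, line_no));
                        }
                    }
                }
                _ => {}
            }
        }
    }
    ranges.sort();
    ranges
}
```

Because the ranges follow the braces, a block folds the same way whether or not the file has been run through a formatter.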
Exploration: Putting it all together
As an exploration on how a host-agnostic tool like https://try.zeek.org could evolve, we have created a self-contained environment to develop Zeek scripts. For that we have created a Zeek playground which consists of a preconfigured Visual Studio Code workspace on top of a fully containerized Zeek (a dev container). Such an environment would allow development of Zeek scripts on platforms without a Zeek installation, or even on platforms currently not supported by Zeek.
Development containers can be opened with Visual Studio Code locally. In this case the container environment will be provisioned and started, and the host editor can connect to it. The editor will guide the user through any installation of needed components. Files in the project directory are made available in the container.
Additionally, GitHub supports running dev containers. In this case, the dev container is started on GitHub servers, and a browser-based Visual Studio Code instance is launched and connected to it. Files remain in GitHub infrastructure which also provides tooling to manage them, tear down containers, and optionally commit changes to user repositories.
Epilogue
A self-contained parser for Zeek script has allowed us to implement first iterations of tooling, bringing a more modern development experience to Zeek script. While still a work-in-progress, the existing implementation should make developing Zeek script more accessible by allowing easier code exploration with better access to existing API docs, and giving contextual help in the form of code completion.
- Tree-sitter grammar for Zeek script: https://github.com/zeek/tree-sitter-zeek
- zeekscript: https://github.com/zeek/zeekscript
- Language server for Zeek script: https://github.com/bbannier/zeek-language-server
- Zeek Slack channel for tooling: #tooling