Building a Redis Analyzer with Spicy

You can find the source code referenced throughout this post here - though the current version is quite different from what is described in this post.

Redis is an in-memory, key-value database. Its primary use is for caching. It does this over the network, but it’s only really meant to be used by “trusted clients”:

Redis is designed to be accessed by trusted clients inside trusted environments.

What if you want to make sure nothing bad happens to your Redis instance? Say you want to analyze the traffic going to and from that Redis instance just to be extra sure it’s secure. Well, you can - Zeek and Spicy have you covered. This is the first of a two-part miniseries of posts in which I’ll be diving into how I went about writing a Spicy Redis parser.

As for myself, I’m Evan, the newest member of Corelight’s open source Zeek team. I joined in August 2024. My background is mostly with parsers and compilers, so working with Spicy came relatively quickly for me.

I won’t be dwelling too much on how to actually parse the Redis serialization protocol - that’s too specific to be worthwhile for any general audience. For what it’s worth, the Spicy docs have a good tutorial that can get you up and running. Instead, I want to focus more on what isn’t covered in documentation, like design decisions and how I chose to solve certain edge cases.

As a word of warning, this is not a tutorial and it’s not meant to be perfect Zeek or Spicy code. It’s an exploration on how to approach writing a Spicy analyzer for Redis from nothing. This was my first analyzer, after all.

Redis RESP

When a Redis client and server talk, it's all just serialized data. Any two communicating endpoints need a structured way to transmit data - like protocol buffers. For Redis, that serialization format is the Redis serialization protocol (or RESP).

It turns out RESP is very straightforward to parse. There are a couple weird cases: a bulk string starts with its length, but if you give a length of -1, it’s a special “null bulk string.” From what I can tell, though, those are mostly documented. The edge cases aren’t really hard to plan for.
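To make that edge case concrete, here's the bulk-string rule as a small Python sketch (an illustration of the format only, not the Spicy parser from this post): the length comes first, and -1 signals the null bulk string.

```python
def parse_bulk_string(buf: bytes):
    """Parse a RESP bulk string like b"$5\r\nhello\r\n".

    Returns (value, remaining_bytes). The special "null bulk string"
    b"$-1\r\n" yields None, since it carries no payload at all.
    """
    assert buf.startswith(b"$")
    header, rest = buf.split(b"\r\n", 1)
    length = int(header[1:])  # e.g. b"$5" -> 5
    if length == -1:          # the null bulk string edge case
        return None, rest
    value, rest = rest[:length], rest[length:]
    assert rest.startswith(b"\r\n")  # payload is terminated by CRLF
    return value, rest[2:]
```

For example, `parse_bulk_string(b"$5\r\nhello\r\n")` yields `(b"hello", b"")`, while `parse_bulk_string(b"$-1\r\n")` yields `(None, b"")`.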

But, what in the world can you do with a bunch of serialized data? At this point, there are no commands, keys, or values. It’s all just data. You can store it and use your human brain, but that is just an interpretation of the serialized data. Nothing in its structure indicates what it does. So we need another pass!

Spicy’s role

Spicy is a binary parser generator (that is, it generates parsers that create structure from binary data), specialized for network traffic - like Redis! All you have to do is describe the structure of the data, and Spicy generates a parser that imposes that structure on the stream of binary data.

So what data are we parsing? RESP in the general case looks like this:


<type><data>\r\n

Where <type> is a byte representing how to interpret <data>, followed by a carriage return and line feed (CRLF).

If you are parsing and the first byte you see is +, that means what follows is a simple string. Here's what that might look like in Spicy:

public type Data = unit {
    ty: uint8; # This will eventually become an enum for each possible type
    if (self.ty == '+') { # … and this will become a switch over those types
        simple_string: # TODO
    };
};

So this parses a byte (uint8), then if that is a ‘+’, it starts trying to parse some simple_string field (left unimplemented). What is a simple string in RESP?

Simple strings are encoded as a plus (+) character, followed by a string. The string mustn’t contain a CR (\r) or LF (\n) character and is terminated by CRLF (i.e., \r\n).

Simple! So we just parse until a CRLF. That’s super convenient in Spicy:

simple_string: bytes &until=b"\x0d\x0a";

That’ll parse until hex code 0x0D followed by 0x0A - CRLF.

Later iterations of the parser will defensively limit the size of each field. Leaving this as-is would allow simple_string to grow until you run out of memory - which could be very bad.

We can then shove this into some resp.spicy file:

# resp.spicy
module RESP;
public type Data = unit {
    ty: uint8; # This will eventually become an enum for each possible type
    if (self.ty == '+') {
        simple_string: bytes &until=b"\x0d\x0a";
    };
    on %done {
        print self;
    }
};

And run it with some test data to see what it prints:

$ printf "+Hi there\r\n" | spicy-driver resp.spicy
[$ty=43, $simple_string=b"Hi there"]

With that, we can parse simple strings passed via RESP.

I won’t go through every case. It turns out that the remaining cases are pretty simple to parse from here. We have a bunch of types - the first byte. Then, we use a big switch statement that determines how to parse the remainder based on that first byte. Each part is split with CRLF. This works well!
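Conceptually, that "big switch" can be modeled in Python like this (a sketch of the dispatch idea only - the real analyzer is a Spicy unit, and only a few RESP types are shown here):

```python
def parse_data(buf: bytes):
    """Dispatch on the RESP type byte, mirroring the 'big switch'.

    Returns (python_value, remaining_bytes).
    """
    ty, rest = buf[0:1], buf[1:]
    if ty == b"+":                      # simple string
        line, rest = rest.split(b"\r\n", 1)
        return line, rest
    if ty == b"-":                      # simple error
        line, rest = rest.split(b"\r\n", 1)
        return Exception(line.decode()), rest
    if ty == b":":                      # integer
        line, rest = rest.split(b"\r\n", 1)
        return int(line), rest
    if ty == b"*":                      # array: length, then that many elements
        line, rest = rest.split(b"\r\n", 1)
        elements = []
        for _ in range(int(line)):
            element, rest = parse_data(rest)
            elements.append(element)
        return elements, rest
    raise ValueError("unrecognized type byte: %r" % ty)
```

Arrays recurse: each element is itself a `<type><data>\r\n` item, so `parse_data(b"*2\r\n:1\r\n:2\r\n")` returns `([1, 2], b"")`.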

Aside: Unrecognized first byte

There is, sadly, one case where you can't "parse the remainder based on that first byte." In fact, the server works perfectly fine if you just send some command followed by CRLF. You see this in pipelining - the documentation's example that sends three PING commands never even serializes them:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

This is called inlining - just arbitrary text followed by CRLF.

How can we account for that? Spicy will just throw an exception if a switch statement doesn’t have an explicit case for the provided value. For example, if we don’t have a default (*) arm for the switch, sending an inline PING command will make the Spicy parser throw an exception:

$ printf "PING\x0d\x0a" | spicy-driver resp.spicy --parser RESP::Data

[error] terminating with uncaught exception of type spicy::rt::ParseError: no matching case in switch statement for value 'DataType::<unknown-80>' (resp.spicy:12:5-35:10)

It turns out, inlining is a bit annoying to deal with. If we don’t recognize the first byte, we should go back and store all of it until CRLF. For now, I went a bit overboard and made this monstrosity:

* -> not_serialized: bytes &convert=(pack(cast<uint8>(self.ty), spicy::ByteOrder::Network) + $$) &until=b"\x0d\x0a";

… all that does is prepend the self.ty byte to the bytes that get parsed. That can look nicer, but hey, it works.

Note that this changed in the current version to using random access in Spicy to reparse the first byte, which is almost certainly better. You can read more about that in the second post on the Spicy Redis analyzer.

Two stage parsing

Now we have some serialized data, but no idea what that data means. So, we need to parse it again! All a parser does is consume less-structured input and produce more-structured output. With that in mind, let's take a detour. Consider the following construct in any programming language:

if (i > 3) {
}

Parsing the RESP data resembles tokenizing a program in a compiler. Tokenizing just splits the bytes from a file into "tokens" that represent meaningful pieces. For example, the '(' character would get transformed into an LPAREN token. After tokenizing, the snippet above may produce the following token stream:

IF LPAREN ID GREATER_THAN NUMBER RPAREN LBRACE RBRACE

But, this is unstructured. It’s just a stream of tokens that would love to get put into some object. A compiler would then parse this stream of tokens and create some sort of node. Here’s how that might look (in as intuitive a format as I can muster):

|-statement.If
  |-condition: expression.BinaryOp
    |-lhs: expression.Identifier "i"
    |-rhs: expression.Number 3
    |-kind: GT
  |-body: statement.CompoundStatement {}
  |-else: null

So back to RESP, we parsed the binary data on the wire into proper RESP format - much like a compiler tokenizing a program. Now we want to parse that “tokenized” data in order to get useful information, like commands sent over the wire.

Well, at this point, you can do that wherever you want (Zeek script, C++, Spicy). But, for this analyzer, the second pass parsing is done in Spicy.

Doing more in Spicy

Commands in Redis are just serialized data in RESP. From the Redis documentation:

Clients send commands to a Redis server as an array of bulk strings.

So, if we want to parse a command, we need to grab some array (of bulk strings) and process that. We can then shove that into a data structure, then off into the void (well, to a Zeek event).

First, we need to determine whether a given array is indeed a command. As an example, I’ll use a GET command, like GET my_key. After the Spicy parser gets its hands on it, the serialized data gets printed like this:

[$ty=DataType::ARRAY, $array=[$num_elements=2, $elements=[
    [$ty=DataType::BULK_STRING, $bulk_string=[$length=3, $content=b"GET"]],
    [$ty=DataType::BULK_STRING, $bulk_string=[$length=6, $content=b"my_key"]]
]]]

So we can crudely determine if this is a GET command. Does the array have two elements? Is the first element GET? That means it’s a GET command!

That’s pretty intuitive in Spicy code:


public function is_get(arr: RESP::Array): bool {
    # GET key
    if (arr.num_elements != 2)
        return False;
    local cmd = command_from(arr);
    return cmd && (*cmd == Command::GET);
}

This uses some command_from function I won’t bore you with - it returns an optional enum value associated with the command. We get that enum value by comparing the first array element against known commands.

Here we’re using Spicy like a general purpose programming language! This has nothing to do with generating a parser for network traffic yet, it’s simply a function which takes in a value and returns another. Spicy isn’t just a parser generator, it’s a programming language.
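The post doesn't show command_from itself; as a hypothetical sketch of its logic in Python (the enum members and the case normalization are my assumptions, not the analyzer's actual code):

```python
from enum import Enum
from typing import Optional

class Command(Enum):
    GET = 1
    SET = 2
    AUTH = 3

def command_from(first_element: bytes) -> Optional[Command]:
    """Map the first token of a client array to a known command.

    Redis command names are case-insensitive, so normalize before
    the lookup; return None (the "optional" part) for anything we
    don't recognize.
    """
    try:
        return Command[first_element.upper().decode()]
    except (KeyError, UnicodeDecodeError):
        return None
```

So `command_from(b"get")` yields `Command.GET`, and an unrecognized token yields `None`.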

Now that we know it’s a GET command, we need to take that data and put it in some structure to pass along to Zeek. That looks pretty simple too:

type Get = struct {
    key: bytes;
};

public function make_get(arr: RESP::Array): Get {
    return [$key = arr.elements[1].bulk_string.content];
}

When this Get struct is passed into Zeek, it magically transforms into the following record in Zeek script:

type GetCommand: record {
    key: string;
};

So how does it get sent into Zeek world? With a definition in a .evt file, of course!

on RESP::Array if ( Zeek_RESP::is_get(self) ) -> event RESP::get_command($conn, $is_orig, Zeek_RESP::make_get(self));

So if our is_get function returns true, then we trigger the event get_command and send the result of make_get off into Zeek. In Zeek script, that event signature looks like:

event RESP::get_command(c: connection, is_orig: bool, command: GetCommand)

The Get struct in Spicy and GetCommand in Zeek script aren’t related; they just translate to each other automatically since they look the same - cool.

That is the foundation for any specific command event we could want. If we want an event for AUTH commands, we’d do the same thing for that. I’m not quite sure that an event is useful for GET commands, but it’s one of the simplest commands that you’ll frequently use.

What do we log?

One of the first questions you need to answer when making a new analyzer is “what do I log?” For RESP (that is, the serialization protocol), the answer is honestly “nothing.” You simply don’t have enough information from just some serialized data over the wire. If you shove everything into some “data log” then it’s just too much to do anything meaningful with.

But what about Redis commands? Most of the communication you see is a client sending commands to a server. Those commands may set or get keys, publish to a channel, or get configuration values. Having those seems useful - just a list of what commands were sent to the server would give a pretty good idea of what was going on. Many commands act on keys and values (Redis is a key-value database, after all), so having that generic info would also be nice.

So what is a command? We know from before it’s an array of bulk strings - so any time the client sends one, it’s a command.

Well, you’re missing a case - inlining! This acts like a command for my locally running Redis server:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

All Redis actually handles from the client is an array of bulk strings or inlined data. We discussed how to handle the array of bulk strings before, and it’s similar here. Here’s a simple Spicy function that will “tokenize” that inline data, if we get it:

public function inline_command(inline: RESP::Data): vector<bytes> {
    # Only call this if it's inline :)
    assert inline?.not_serialized;
    return inline.not_serialized.split();
}

Well, we can leave it there. Don’t ask what happens if you have a “quoted string with spaces.”
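If you do ask: the inline format honors quoting, so a plain split() would mangle quoted arguments. As a hedged Python sketch, the standard shlex module can stand in for Redis's own argument-splitting logic (which it only approximates):

```python
import shlex

def tokenize_inline(raw: bytes) -> list:
    """Split an inline command while keeping quoted arguments intact.

    shlex's quoting rules differ from Redis's in the fine details,
    but they handle the "quoted string with spaces" case.
    """
    return [tok.encode() for tok in shlex.split(raw.decode())]
```

So `tokenize_inline(b'SET greeting "hi there"')` keeps the quoted value as one token rather than two.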

How do we log this?

So now: every array of bulk strings will be treated as a command. Also, every unserialized request will be treated as a command. The rest is actually pretty straightforward: we search through every Redis command, note which ones operate on keys and their values, then structure the data. That slightly structured data is sent into Zeek world in a record.

In order to turn the tokenized data (either from the bulk strings composing the array or the split inline data) into a Command, we have a new function:


function parse_command(raw: vector<bytes>): Command {

The resulting Command just contains the generic info in one structure:

type Command = struct {
    raw: vector<bytes>;
    command: bytes;
    key: optional<bytes>;
    value: optional<bytes>;
    known: optional<KnownCommand>;
};

There are a few cases where this may not work - for example, a command may consist of multiple tokens. Also, it’s impossible to enumerate every known command, so the known field is an optional enum that is only set if we recognize the command.

This gets transformed into some analogous record in Zeek script auto-magically. That Zeek script record can choose to log only parts of this (like the command, key, and value fields).

The remainder of this is a simple solution: large switch statements! We parse the command, set the known enum field if we recognize it, then switch on that to figure out which arguments correspond to a key or value. Quite frankly, I’d show it, but it’s ugly and unruly enough that I don’t think that’s a good idea :)
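To give a feel for it without the full monstrosity, here's a tiny Python model of that switch: a table from command name to the argument positions holding the key and value. The commands and positions listed here are illustrative assumptions, not the analyzer's actual table.

```python
# Hypothetical table: which argument positions hold the key and the
# value for a handful of commands. (The real analyzer covers many more.)
KEY_VALUE_POSITIONS = {
    b"GET": (1, None),
    b"SET": (1, 2),
    b"PUBLISH": (1, 2),   # the channel is treated as the "key"
}

def extract_key_value(raw):
    """Pull the generic key/value info out of a tokenized command."""
    positions = KEY_VALUE_POSITIONS.get(raw[0].upper())
    if positions is None:
        return None, None
    key_pos, value_pos = positions
    key = raw[key_pos] if key_pos is not None and key_pos < len(raw) else None
    value = raw[value_pos] if value_pos is not None and value_pos < len(raw) else None
    return key, value
```

An unrecognized command simply yields no key and no value, mirroring the optional fields in the Command struct.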

The end result

Finally, we’re left with something pretty nice. We’ll just log the client’s commands, so there is no additional upkeep, just send the data into logs. Here’s an example where a server will try to GET a factorial, but if it’s not cached, it’ll calculate and cache all factorials up to the given number:

GET :1:factorial_50 -
SET :1:factorial_1 1
SET :1:factorial_2 2
SET :1:factorial_3 6
SET :1:factorial_4 24
SET :1:factorial_5 120
SET :1:factorial_6 720
SET :1:factorial_7 5040
SET :1:factorial_8 40320
SET :1:factorial_9 362880
SET :1:factorial_10 3628800
SET :1:factorial_11 39916800
SET :1:factorial_12 479001600
SET :1:factorial_13 6227020800
SET :1:factorial_14 87178291200
SET :1:factorial_15 1307674368000
SET :1:factorial_16 20922789888000
SET :1:factorial_17 355687428096000
SET :1:factorial_18 6402373705728000
SET :1:factorial_19 121645100408832000
SET :1:factorial_20 2432902008176640000
SET :1:factorial_21 51090942171709440000
SET :1:factorial_22 1124000727777607680000
SET :1:factorial_23 25852016738884976640000
SET :1:factorial_24 620448401733239439360000

I removed the timestamp, uid, and 4-tuple for your viewing pleasure.

That’s about as clean as I’d hope for a generic approach to logging “something useful” about Redis. So, I’m happy with it.

Overall, RESP itself is a terribly uninteresting protocol. What Redis does with this RESP could be very interesting. Parsing RESP is more like tokenizing. The second parse step is what actually turns that unstructured data into commands that are useful for the user.

The analyzer explored here is nowhere near complete. The events are a bit minimal and the client/server relationship could be more fleshed out. That’s what we’ll explore next time :)


Part 2

We left off with a functional but incomplete analyzer for Redis. It analyzed RESP (Redis Serialization Protocol) traffic, crudely created “command” objects, then sent that off into Zeek script in events in order to log it. It created something nice, but there are some big holes.

If you want more on the initial intricacies of making a Redis analyzer in Spicy, check out the first post. This is building off that post, so it may not make sense without the first.

Here we’ll discuss the biggest of the holes after the first part: the client/server relationship.

Redis client/server

As we discussed in the first post, there’s a difference between RESP (that is, serializing data) and Redis commands. RESP is just a way to serialize data, but it does have one small notion of a client/server relationship, even though it’s not entirely clear. Here’s a quote from the “inline commands” section in the RESP documentation:

Basically, to issue an inline command, you write space-separated arguments in a telnet session. Since no command starts with * (the identifying byte of RESP Arrays), Redis detects this condition and parses your command inline.

This seems to imply that the client can only send data as an array (which starts with *) or inline (that is, without any serialization). If we boot up a Redis server and try sending just one simple string:

$ (printf "+SET hi there\r\n"; sleep 1) | nc localhost 6379
-ERR unknown command '+SET', with args beginning with: 'hi' 'there'

It seems like the server is interpreting +SET as a command, rather than a serialized piece of data. Sure enough, we can find this in the Redis source code:

/* Determine request type when unknown. */
if (!c->reqtype) {
    if (c->querybuf[c->qb_pos] == '*') {
        c->reqtype = PROTO_REQ_MULTIBULK;
    } else {
        c->reqtype = PROTO_REQ_INLINE;
    }
}

if (c->reqtype == PROTO_REQ_INLINE) {
    if (processInlineBuffer(c) != C_OK) break;
} else if (c->reqtype == PROTO_REQ_MULTIBULK) {
    if (processMultibulkBuffer(c) != C_OK) break;
} else {
    serverPanic("Unknown request type");
}

This will mark any request that starts with * as multibulk and anything else as inline. So, our big Data unit doesn’t need to be quite so big for clients - and, as a bonus, we can just say everything that comes from the client is a command. We will no longer use the same structure for both the client and the server! Even if that’s not what the RESP docs seem to suggest (to me, at least).
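In Python terms, the server's decision boils down to a one-liner (a trivial model of the C snippet above):

```python
def request_type(request: bytes) -> str:
    """Mirror Redis's dispatch: a request starting with '*' is a RESP
    array ("multibulk"); anything else is treated as inline."""
    return "multibulk" if request.startswith(b"*") else "inline"
```

So `request_type(b"*1\r\n$4\r\nPING\r\n")` is "multibulk" while a bare `request_type(b"PING\r\n")` is "inline".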

Spicy client/server

So now we have a new set of requirements: clients and servers don’t send the same thing back and forth. The client may send one of two possibilities, while the server may send any serialized data back.

Before, our main message parsing looked like this:

public type Messages = unit {
    : (Data &synchronize)[];
};

Which was used for both clients and servers. Now, we’ll have two:

public type ClientMessages = unit {
    : (ClientData &synchronize)[];
};

public type ServerMessages = unit {
    : (ServerData &synchronize)[];
};

Then, we need to tell Zeek to use ClientMessages for the client and ServerMessages for the server in the .evt file:

protocol analyzer spicy::Redis over TCP:
    parse originator with RESP::ClientMessages,
    parse responder with RESP::ServerMessages;

Any originator data is parsed with ClientMessages. Any responder data is parsed with ServerMessages.

The new data

The client data is just a simplified version of Data from before. Instead of a big switch that enumerates all possibilities, we have two possibilities, so we can just go back to an if statement:


if (self.ty == DataType::ARRAY) {
    multibulk: Array;
} else {
    inline: RedisBytes &convert=(pack(cast<uint8>(self.ty), spicy::ByteOrder::Network) + $$) &max-size=1024;
};

Unfortunately for anyone with eyes, the big unruly &convert stayed for inline data. Again, that’s just to prepend the byte from ty to the remaining bytes until CRLF. While we’re here, let’s fix that using random access in Spicy.

First we store the starting point on initialization:

var start: iterator<stream>;
on %init() { self.start = self.input(); }

Then, the ty field just “unparses” itself if it’s not an array:

ty: uint8 &convert=DataType($$) {
    if (self.ty != DataType::ARRAY) {
        # This is inline, so we need to reparse `ty`
        self.set_input(self.start);
    }
}

Voila, we have our complete ClientData:

public type ClientData = unit {
    %synchronize-after = b"\x0d\x0a";

    var start: iterator<stream>;
    on %init() { self.start = self.input(); }

    # Clients can only be an array or inline
    ty: uint8 &convert=DataType($$) {
        if (self.ty != DataType::ARRAY) {
            # This is inline, so we need to reparse `ty`
            self.set_input(self.start);
        }
    }

    if (self.ty == DataType::ARRAY) {
        multibulk: Array;
    } else {
        inline: RedisBytes &max-size=1024;
    };
};

Server data

Server data is actually much easier: it’s just serialized RESP data:

public type ServerData = unit {
    %synchronize-after = b"\x0d\x0a";
    data: Data;
};

So why does the ServerMessages unit not just use Data directly? Well, imagine an array of two elements:

[$ty=DataType::ARRAY, $array=[
    $num_elements=2,
    $elements=[
        [$ty=DataType::SIMPLE_STRING, $simple_string=[$content=b"Hi"]],
        [$ty=DataType::SIMPLE_STRING, $simple_string=[$content=b"there"]]
    ]
]]

We have the outer Data object (with $ty=DataType::ARRAY) and two inner Data objects (with $ty=DataType::SIMPLE_STRING). If we trigger an event for some Data object, we’ll get three events, one for each. With the ServerData wrapper, we can trigger an event on all ServerData and only get one event per server response.

Revisiting command parsing

If you recall, there were two steps to parsing the Redis client data:

  1. Parse serialized data as RESP
  2. Form the serialized RESP into Redis commands

The second step was done with spicy function calls from the .evt file. There were separate cases for arrays of bulk strings and inline data (handled by functions like make_command and inline_command). We can clean a lot of that up. It’s now absolutely trivial to know that something is a command when parsing the RESP data: if it’s from the client, it’s a command.

Since anything from the client is a command, we can add an extra step after parsing the client data that “parses” the parsed data. For this, we’ll use a variable and a hook, like so:

public type ClientData = unit {
    # …
    var command: Redis::Command;
    on %done {
        self.command = Redis::make_command(self);
    }
};

Okay, so every ClientData sets its command from the result of make_command. That is defined in some redis.spicy file here:

public function make_command(command: RESP::ClientData): Command {
    if (command?.multibulk)
        return bulk_command(command);
    else
        return inline_command(command);
}

Here we finally have a relatively elegant solution: all client data gets reparsed into a Command object. Bulk commands are already “tokenized,” while inline commands need to get special handling to tokenize them (read: splitting on whitespace, but not if it’s quoted). The Command object from make_command has anything you may need:

type Command = struct {
    raw: vector<bytes>;
    command: bytes;
    key: optional<bytes>;
    value: optional<bytes>;
    known: optional<KnownCommand>;
};

One of the more powerful pieces here is the known field, where if we recognize the command, we store it as an enum. Remember in the previous post, we used the is_get method which got a KnownCommand from a command_from function. This recalculated the KnownCommand every time it was called. We can now avoid that recalculation by using the known field. For example, this is how we can check if a command is GET now:

public function is_get(data: RESP::ClientData): bool {
    return data.command.known && *(data.command.known) == KnownCommand::GET && |data.command.raw| >= 2;
}

Just with some extra validation to make sure we don’t do anything silly.

This also cleans up the .evt file substantially. Here’s the event definition that triggers for each command:

on RESP::ClientData -> event Redis::command($conn, $is_orig, self.command);

There are no conditions nor extra function calls! It just shoves what we parsed into an event. Nice and simple. Before, we needed to (try to) make sure we weren’t sending server data as a command - but now we know!

So, separating the client and server really simplified client data. It also allowed us to avoid repeating work. But, the real reason for this preliminary work is to correlate a request with the server’s response.

Associating requests with responses

Now we want to connect some client data to the server data that responded to it. I implemented this in Zeek script, but depending on your use, it may be a lot easier to use Spicy’s contexts to solve this problem.

That choice isn’t an endorsement of Zeek script for this particular problem - contexts would probably be better in the general case. But the Redis analyzer has a couple of considerations, such as commands that can disable server responses, and I wanted to keep the understanding of what Redis commands do out of the Spicy code entirely - that way, Zeek script can decide what to do with it. Still, that doesn’t mean this couldn’t have been solved in a better way.

We already have all client data getting sent to Zeek script, but not server data, so we’ll add that to the .evt file:


on RESP::ServerData -> event Redis::server_data($conn, $is_orig, Zeek_Redis::make_server_data(self));

And what’s the result of make_server_data? We can’t currently send over recursive data types, so we’ll trim it down. Here’s the structure I came up with:

public type ZeekServerData = struct {
    err: bool;
    data: optional<bytes>;
};

So it tells you whether this is an error or not. Then, data will get set if we have some really common cases like simple_string or simple_error - that seems like a good enough start for now.

Now we’ll go over into Zeek script for the rest of this. That same structure is here in Zeek script world with attached &log annotations:

type ServerData: record {
    ## Was this an error?
    err: bool &log;
    ## The string response, if it was a simple string or error
    data: string &log &optional;
};

Then we have an Info object that stores the valuable information that we’ll later log (some generic fields removed):

type Info: record {
    # …

    ## The Redis command
    cmd: Command &log &optional;
    ## The response for the command
    response: ServerData &log &optional;
};

Then, we need a way to correlate the Command and its response. Naively, you could just use one Info object attached to the connection, then the following should happen for most requests/responses:

  1. The command event is triggered. Set the Info$cmd field in that event.
  2. The server_data event is triggered. Set the Info$response field in that event.
  3. Log the Info record in the server_data event.

Except that won’t work.

Why not? Well, you can send multiple commands, then get multiple responses. If we sent three commands and got three responses, the above would overwrite Info$cmd twice, use the third command for the first response, then log two responses without an associated command.

Instead, we need a queue. That’s implemented in some State record:

type State: record {
    ## Pending requests.
    pending: table[count] of Info;
    ## Current request in the pending queue.
    current_request: count &default=0;
    ## Current response in the pending queue.
    current_response: count &default=0;
};

So we have some association of a count to that request’s Info, then track the current “pointer” to a request and response.

Then we add a bit to the connection record so this persists between events:

redef record connection += {
    redis: Info &optional;
    redis_state: State &optional;
};

Okay, setup complete! So the goal here is to correlate command N with response N, then log those when they’re complete. For now, we’re going to make a lot of assumptions (frankly, probably too many assumptions). That’ll be good to get the proof-of-concept.

First, when we get a Command event, we need to put that into the State. I’ll do that in a helper function, set_state, so the handling in the event is pretty minimal:

# In command event
++c$redis_state$current_request;
set_state(c, T);
c$redis$cmd = command;

And set_state’s signature looks like this:

function set_state(c: connection, is_orig: bool)

We use the c$redis field after set_state - so we expect set_state to properly set the c$redis field. The “appropriate” Info value depends on whether this was a request or a response, which is why we passed in the is_orig boolean. Based on that, the set_state function starts by identifying the correct index:

local current: count;
if ( is_orig ) current = c$redis_state$current_request;
else current = c$redis_state$current_response;

Then we just need to grab that pending Info and set the field in the connection object:

# In case this is the first time we see this request/response
if ( current !in c$redis_state$pending )
    c$redis_state$pending[current] = new_redis_session(c);

c$redis = c$redis_state$pending[current];

Given that set_state, we can see that the server_data event has almost identical handling, just with a different is_orig value:

# In server_data event
++c$redis_state$current_response;
set_state(c, F);
c$redis$response = data;

There is one extra piece in the server_data event, though:

Log::write(Redis::LOG, c$redis);
delete c$redis_state$pending[c$redis_state$current_response];

Since we’re making the assumption that a response always comes after a request, we just assume it’s ready for logging and send it off to get logged.
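The whole bookkeeping scheme can be modeled in a few lines of Python (a sketch of the logic above, not the actual Zeek script):

```python
class RedisState:
    """Model of the request/response correlation from the Zeek script:
    commands and responses each advance their own counter, and a
    pending table joins entry N of one with entry N of the other."""

    def __init__(self):
        self.pending = {}          # count -> {"cmd": ..., "response": ...}
        self.current_request = 0
        self.current_response = 0
        self.logged = []           # stands in for Log::write

    def on_command(self, cmd):
        self.current_request += 1
        info = self.pending.setdefault(self.current_request, {})
        info["cmd"] = cmd

    def on_server_data(self, data):
        self.current_response += 1
        info = self.pending.setdefault(self.current_response, {})
        info["response"] = data
        # A response closes out its request: log it and drop it.
        self.logged.append(info)
        del self.pending[self.current_response]
```

With pipelining, three PINGs followed by three PONGs pair up in order, and nothing is left pending afterward.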

Requests without responses

Okay, there’s one more point with logging. There are a number of times that Redis will not send exactly one response for one command. You can even control this manually with the CLIENT REPLY command.

With CLIENT REPLY, you may disable responses. Therefore, we can’t just increment current_response and expect it to line up with the requests. Instead, we watch for CLIENT REPLY commands, then if we turn responses back on, we make the response number “catch up” with the request number. This is done with a new response_num function (which I will not show):

local previous_response_num = c$redis_state$current_response;
c$redis_state$current_response = response_num(c);

Then, we log anything between previous_response_num and current_response, and finally this one:

while ( previous_response_num < c$redis_state$current_response )
    {
    if ( previous_response_num == 0 )
        {
        ++previous_response_num;
        next;
        }

    if ( previous_response_num in c$redis_state$pending )
        {
        Log::write(Redis::LOG, c$redis_state$pending[previous_response_num]);
        delete c$redis_state$pending[previous_response_num];
        }
    previous_response_num += 1;
    }

# Log this one
Log::write(Redis::LOG, c$redis);
delete c$redis_state$pending[c$redis_state$current_response];

So that extra logic is only necessary because we can manually disable responses. This sort of logic can vary a fair bit depending on what you care about in your protocol. You can imagine that many protocols will have some nuance to them that would require similar logic, though.

The logs

Remember the factorial logs from last time? Well, now we can log the response right after each command!

GET :1:factorial_50 - F -
SET :1:factorial_1 1 F OK
SET :1:factorial_2 2 F OK
SET :1:factorial_3 6 F OK
SET :1:factorial_4 24 F OK
SET :1:factorial_5 120 F OK
SET :1:factorial_6 720 F OK
SET :1:factorial_7 5040 F OK
SET :1:factorial_8 40320 F OK
SET :1:factorial_9 362880 F OK
SET :1:factorial_10 3628800 F OK
SET :1:factorial_11 39916800 F OK
SET :1:factorial_12 479001600 F OK
SET :1:factorial_13 6227020800 F OK
SET :1:factorial_14 87178291200 F OK
SET :1:factorial_15 1307674368000 F OK
SET :1:factorial_16 20922789888000 F OK
SET :1:factorial_17 355687428096000 F OK
SET :1:factorial_18 6402373705728000 F OK
SET :1:factorial_19 121645100408832000 F OK
SET :1:factorial_20 2432902008176640000 F OK
SET :1:factorial_21 51090942171709440000 F OK
SET :1:factorial_22 1124000727777607680000 F OK
SET :1:factorial_23 25852016738884976640000 F OK
SET :1:factorial_24 620448401733239439360000 F OK
SET :1:factorial_25 15511210043330985984000000 F OK
SET :1:factorial_26 403291461126605635584000000 F OK
SET :1:factorial_27 10888869450418352160768000000 F OK

The server said OK for each of them and never errored.

Here’s what an error would look like:

SET bad1 "unclosed double quotes T ERR Protocol error: unbalanced quotes in request

This was from an inline request with unbalanced quotes. You can see T in the fourth column referring to the err field - so it was an error. It had a protocol error message. Cool.

Conclusions

So, the client and server have an obvious relationship. One sends a command, the other responds. We want to associate those two things together. There are a bunch of considerations though, not all of which are easy to solve.

There is one hole that seems pretty difficult to account for: synchronization. Synchronization lets the analyzer recover to a known state after unexpected input puts it in a very weird one. But then our counts are going to be off, since we never send an event into Zeek script to account for what was skipped. These sorts of edge cases may be more or less important to handle elegantly, depending on your use case and your protocol.

Most protocols need somewhat different handling to associate a request and a response. In fact, the code I wrote for Redis is based on HTTP’s. But, at a certain point, you have to consider your specific protocol and its edge cases. That requires extra work and a keen eye to spot each case. Solving those problems may be hard (or impossible!), but knowing about them can help you create a solution that mitigates any potential issues.