Empty log fields

We said we'd use "na" for unset record fields instead of "-", but now
I'm pondering which variant of that *exactly*:

    (1) NA
    (2) N/A
    (3) na
    (4) n/a

I'm voting for (4). Other opinions?

Robin

(1) NA
(2) N/A
(3) na
(4) n/a

I'm voting for (1).

(Sorry Robin :-)

    Matthias

I vote 2. :-) Isn't democracy grand?

I think I like #4 best too.

  .Seth

Haha, now that we're getting a nice spread of opinions on this, I'll toss a couple more in the pot…

  5. nil
  6. null

Since this value really represents a value not being set, nil/null would make sense.

  .Seth

$\{\phi\}$ for the empty set

The correct LaTeX symbol for the empty set is actually $\emptyset$. Let's have
UTF-8 logs so that we can use "∅" (U+2205) for the empty set ;-).

SCNR,

    Matthias

We could also base64 encode a transparent 1x1 PNG image and insert
that there ...

Robin

I propose a haiku:

the data
given packet form
never here

More seriously: n/a would break a scheme that used '/' as a separator character, and would be a little tougher to deal with than null / nil / other character strings, I believe.
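
To illustrate the separator concern, here is a minimal Python sketch. The record contents and the `render` helper are made up for the example and are not taken from any actual log writer; the only point is that a placeholder containing the separator changes the column count.

```python
# Hypothetical record: three fields, the middle one unset (None).
fields = ["10.0.0.1", None, "80"]

def render(fields, sep, unset):
    # Join the fields, substituting the placeholder for unset values.
    return sep.join(unset if f is None else f for f in fields)

line_na = render(fields, "/", "n/a")    # placeholder contains the separator
line_nil = render(fields, "/", "nil")   # placeholder does not

# Splitting on the separator recovers the columns only in the second case.
cols_na = line_na.split("/")
cols_nil = line_nil.split("/")

print(len(cols_na), cols_na)
print(len(cols_nil), cols_nil)
```

With '/' as the separator, the "n/a" line splits into four columns instead of three, while the "nil" line keeps its three columns.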

--Gilbert

Ok, let's do it the way any good democracy deals with different
opinions[1]: we don't do any of the original suggestions.

I can live with Seth's "nil" as well. Deal?

Robin

[1] No, that's *not* filibustering.

I can live with Seth's "nil" as well. Deal?

Deal.

    Matthias

I can live with Seth's "nil" as well. Deal?

I guess I can live with that, but I wish that ASCII included Matthias' suggestion of the null character. Weirdly, I think that would be best, since empty string and no value would both be represented by a single byte. nil it is.

[1] No, that's *not* filibustering.

Yeah, Congress doesn't filibuster either. They only need to *threaten* that they will.

  .Seth

Sorry for going back to this another time, but I just made the change
and don't really like the result. We now have tons of "nil" in there
just because there are so many unset fields. "-" looks much better.

So here's another suggestion: let's set empty fields simply to
"(empty)". How about that? That looks much better and empty fields are
much less frequent than unset fields.

Robin

So here's another suggestion: let's set empty fields simply to
"(empty)". How about that?

I like it because it is self-descriptive, but isn't that a little
verbose? I don't have really compelling alternatives though, maybe
"(0)", "()", or "(|)" as generic empty set representation?

    Matthias

It's also for empty strings, and I don't find any of them very
intuitive, I have to say.

Robin

It's also for empty strings, and I don't find any of them very
intuitive, I have to say.

Yeah, I agree. It's hard to find one that is expressive and terse! One
more try though :-). What about (") ?

    Matthias

How about (-) (set of empty)? Would be kind of logical in my opinion (admittedly only for sets/vectors and not for strings).

Johanna

I’d really prefer that it be left at a single hyphen, as it cuts down on log size. It’s also a convention that a ton of other programs use. The only acceptable alternative to me would be a totally empty field, as it is still parsable because it’s between the delimiters. You guys are debating what the visual output of the log files should be by manipulating the raw output when you’re really debating how programs like bro-cut should output empty fields. For me, the logs are database data, and it would be silly to write out “nil” in a database (the DB will understand the lack of data to be NULL). You want the logs to be a data model, and how they are presented to an end user should be dictated by the accessing program (view).

What you motivate is precisely the need for binary logs, which we aim to
ship with 2.1. This addresses both the log size and representation issues,
as null values are a NUL byte and empty values their type-specific
binary equivalent.

Clearly, it makes much more sense to use the binary log format when
sending logs to a database. Going further, one could create a custom
database backend that writes the logs directly from the Bro process to
the database, without the intermediate step of serializing them to the
binary format. In 2.1, we have a CouchDB backend that demonstrates this
usage.

Unfortunately, for ASCII logs there is a trade-off between clarity and
conciseness. While omitting the null/empty representation entirely is
the most space-efficient way to go, it may break text-based tools that
expect a strictly columnar format and have no notion of a field separator.
Moreover, if a user needs to separate the cases of null (no value there)
vs. empty (e.g., the empty string ""), we need two separate
representations. Some users may also find an explicit clue about missing
values less confusing.
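
A small Python sketch of the first problem, assuming a hypothetical tab-separated line whose middle field is omitted entirely. Python's argument-less `split()` stands in here for any tool that splits on runs of whitespace rather than on a specific separator.

```python
# Hypothetical tab-separated log line; the middle field is left out entirely.
line = "10.0.0.1\t\t80"

# A separator-aware reader still sees three columns, the middle one empty.
by_tab = line.split("\t")

# A whitespace-based reader collapses the gap, so every later column
# shifts one position to the left.
by_whitespace = line.split()

print(by_tab)
print(by_whitespace)
```

Note that even the separator-aware reader only gets an empty string back for the middle column, so it cannot tell whether the field was unset or held the empty string "".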

I propose something new: in addition to allowing the field separator to
be customized, we allow similar redefinitions for null and empty values.
By default, they are the same character, namely the dash, but can be
easily redef'ed.
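
Roughly what I have in mind, sketched in Python rather than in Bro script. The function name `format_record` and its parameter names are invented for illustration and are not an existing API; `None` stands for an unset field and `""` for an empty one.

```python
def format_record(values, sep="\t", unset="-", empty="-"):
    """Render one log line; None means unset, "" means empty."""
    out = []
    for v in values:
        if v is None:
            out.append(unset)    # no value was ever set
        elif v == "":
            out.append(empty)    # value is set, but empty
        else:
            out.append(str(v))
    return sep.join(out)

# Hypothetical record with one unset and one empty field.
record = ["10.0.0.1", None, "", 80]

# Defaults: both cases render as a dash and are indistinguishable.
default_line = format_record(record)

# Redefined placeholders: the two cases can now be told apart.
custom_line = format_record(record, unset="nil", empty="(empty)")

print(default_line)
print(custom_line)
```

Users who only care about compact logs keep the defaults, and those who need to separate null from empty pay for the extra bytes only if they opt in.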

    Matthias