Bro's escaping of non-printable characters behaves unexpectedly

Paul_Pearce · February 18, 2015, 12:20am

Hello everyone,

I'm encountering a problem where I am unable to reconstruct original
inputs from bro log files. This example summarizes the problem:

Paul_Pearce · February 18, 2015, 1:15am

Hello,

That was a poor example, as it used \0, which is special-cased by
Bro's escaping functionality.

This problem also extends beyond non-printable characters to non-ASCII
(Unicode) characters. Here's another example with the Unicode character
for the registered sign ® (\xc2\xae in UTF-8).

johanna · February 18, 2015, 2:12am

Hello Paul,

I think the reason the ASCII writer of Bro's logging framework does not
support arbitrary binary data is that it was conceived as a framework
for writing human-readable log files, not arbitrary binary data.

If you want to write binary data to log files, I would recommend
base64-encoding it first, using the encode_base64 BiF.
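
As a rough illustration of the base64 suggestion above, here is a minimal Python sketch of the round trip from the log consumer's side. It assumes the field was base64-encoded (for example with encode_base64) before being logged, using the standard base64 alphabet; the helper names encode_field and decode_field are hypothetical and only stand in for the encode and decode steps.

```python
import base64


def encode_field(raw: bytes) -> str:
    """Encode raw bytes as ASCII-only base64 text, safe for a text log column."""
    return base64.b64encode(raw).decode("ascii")


def decode_field(logged: str) -> bytes:
    """Recover the exact original bytes from a base64-encoded log column."""
    return base64.b64decode(logged, validate=True)


# A value containing the UTF-8 bytes for the registered sign (0xc2 0xae)
# and a NUL byte, neither of which is printable ASCII.
original = b"/path/\xc2\xae\x00end"
logged = encode_field(original)

# The logged form is plain ASCII and decodes back to the identical bytes.
assert decode_field(logged) == original
```

The trade-off is that the logged value is no longer human-readable, which is why this fits binary payloads better than fields that are mostly text.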

If you are ok with just using the standard methods for writing to files
outside of the logging framework, you can put them into binary mode, as
you probably are aware.

Johanna

Paul_Pearce · February 18, 2015, 5:15am

Hey Johanna,

Thanks for taking the time to respond.

> I think the reason the ASCII writer of Bro's logging framework does not
> support arbitrary binary data is that it was conceived as a framework
> for writing human-readable log files, not arbitrary binary data.

I'm going to push back a bit on characterizing this as supporting
arbitrary binary data. These are Unicode characters appearing in URIs
($http$URI) that I'm encountering in actual network traffic, and I'm
encountering them somewhat frequently. The problem manifests itself in
the standard http.log, as well as in the extensions I'm working on.

I realize the RFC does not permit Unicode in URLs, but given that such
characters do occur in practice (browsers will just silently handle
them), this seems like something worth supporting.

I'll also point out that Bro's ASCII logging facilities do currently
support logging these characters; they simply do so in an
unrecoverable/non-canonical way. What I'm proposing is
standardization/cleanup for the escaping that Bro already performs.
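
To make "recoverable" concrete, here is a minimal Python sketch of one reversible escaping scheme. It is not Bro's actual writer code; the function names escape and unescape and the exact rules (\xNN for bytes outside printable ASCII, \\ for a literal backslash) are assumptions chosen for illustration. The point is that the escape character itself is escaped, so a raw byte and the literal text of its escape sequence never produce the same log output.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def escape(raw: bytes) -> str:
    """Escape bytes into printable ASCII so the original can be recovered."""
    out = []
    for b in raw:
        if b == 0x5C:
            # Escape the backslash itself; this is what makes the scheme reversible.
            out.append("\\\\")
        elif 0x20 <= b < 0x7F:
            # Printable ASCII passes through unchanged.
            out.append(chr(b))
        else:
            # Everything else (control bytes, UTF-8 bytes) becomes \xNN.
            out.append(f"\\x{b:02x}")
    return "".join(out)


def unescape(text: str) -> bytes:
    """Invert escape(); raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text.startswith("\\\\", i):
            out.append(0x5C)
            i += 2
        elif text.startswith("\\x", i):
            hex_part = text[i + 2:i + 4]
            if len(hex_part) != 2 or any(c not in HEX_DIGITS for c in hex_part):
                raise ValueError(f"bad \\x escape at offset {i}")
            out.append(int(hex_part, 16))
            i += 4
        else:
            raise ValueError(f"unknown escape at offset {i}")
    return bytes(out)


# The registered sign as raw UTF-8 bytes versus the literal text "\xc2\xae":
raw_bytes = b"\xc2\xae"
literal_text = b"\\xc2\\xae"

# The two inputs escape differently, and each round-trips exactly.
assert escape(raw_bytes) != escape(literal_text)
assert unescape(escape(raw_bytes)) == raw_bytes
assert unescape(escape(literal_text)) == literal_text
```

Without the backslash rule, both inputs in the example would be written identically, and a reader of the log could not tell which one was on the wire.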

Thanks.
-Paul

Christian_Struck · February 18, 2015, 1:22pm

Hey Paul,

> I realize the RFC does not permit unicode in URLs, but given that they
> do occur in practice (browsers will just silently handle them), this
> seems like something worth supporting.

I think what you are looking for is this.

Best

Christian
