smtp log strangeness

While parsing SMTP logs I noticed a lot of strange data in my from/to/subject fields.

Example:

"subject":"=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?="

"subject":"=?Windows-1252?Q?Automatic_reply:_CBO_drops_the_March_base=85line?=",

"from":"\u0022NAMEOFPERSON\u0022 first.middle.last@something.com"

Why am I getting all of this extra info in these fields?

I am printing logs as JSON not CSV.

Thanks in advance

Hi,

> Why am I getting all of this extra info in these fields?

The subject headers look strange because they use the MIME encoded-word
syntax (RFC 2047), which lets headers carry encodings other than ASCII (see
Unicode and email - Wikipedia).
The from header includes the display name (see
RFC 5322 - Internet Message Format). Bro logs the
content of the headers without further processing, so you get this
extra info.
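
To show what is inside those values, here is a minimal Python sketch (not part of Bro) that decodes an encoded-word subject and splits a from header into display name and address using only the standard library. The \u0022 in the JSON output is just the JSON escape for a double quote. The helper names are made up for illustration, and the angle brackets around the address in the sample from value are an assumption about what the raw header looked like.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_encoded_words(value: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?Q|B?...?=) into plain text."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them into a single Unicode string.
    return str(make_header(decode_header(value)))


def split_from(value: str) -> tuple[str, str]:
    """Split a from header into (display name, address)."""
    name, address = parseaddr(value)
    # The display name may itself contain encoded-words.
    return decode_encoded_words(name), address


if __name__ == "__main__":
    print(decode_encoded_words(
        "=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?="))
    print(decode_encoded_words(
        "=?Windows-1252?Q?Automatic_reply:_CBO_drops_the_March_base=85line?="))
    # Assumed raw header form, with angle brackets around the address.
    print(split_from('"NAMEOFPERSON" <first.middle.last@something.com>'))
```

Both sample subjects decode to the same text with a horizontal ellipsis character; only the charset used to carry it differs (three bytes in UTF-8, the single byte 0x85 in Windows-1252).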

Regards,
Jan

Yep! There is a hacky script I wrote a while ago to deal with this stuff too (we need to integrate it into the analyzer at some point, though):
  https://github.com/sethhall/bro-junk-drawer/blob/master/smtp-decode-encoded-word-subjects.bro

If you load that script, it adds another field to smtp.log named "decoded_subject".
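
If loading a script on the Bro side is not an option, a similar field can be added when post-processing the JSON log. The following is a rough Python sketch, not the script linked above; it assumes one JSON object per log line and a "subject" field as in the examples, and the function name is hypothetical.

```python
import json
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def add_decoded_subject(line: str) -> str:
    """Return the JSON log line with a "decoded_subject" field added."""
    record = json.loads(line)
    subject = record.get("subject")
    if isinstance(subject, str):
        try:
            decoded = str(make_header(decode_header(subject)))
        except (HeaderParseError, LookupError, UnicodeError):
            # Malformed encoded-word or unknown charset: keep the raw value.
            decoded = subject
        record["decoded_subject"] = decoded
    # ensure_ascii=False keeps decoded characters readable in the output.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    sample = '{"subject": "=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?="}'
    print(add_decoded_subject(sample))
```

The original "subject" field is left untouched, so the raw header value stays available for matching against what was seen on the wire.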

  .Seth