While parsing SMTP logs I noticed a lot of strange data in my from/to/subject fields.
Example:
"subject":"=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?="
"subject":"=?Windows-1252?Q?Automatic_reply:_CBO_drops_the_March_base=85line?=",
"from":"\u0022NAMEOFPERSON\u0022 first.middle.last@something.com"
Why am I getting all of this extra info in these fields?
I am writing the logs as JSON, not CSV.
Thanks in advance
Jan
#2
Hi,
> Why am I getting all of this extra info in these fields?
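The subject headers look strange because they use the MIME encoded-word
syntax (RFC 2047), which lets headers carry characters outside ASCII (see
https://en.wikipedia.org/wiki/Unicode_and_email#Unicode_support_in_message_header).
Each encoded word has the form =?charset?encoding?text?=, where the
encoding is Q (similar to quoted-printable) or B (base64).
The from header includes the display-name (see
https://tools.ietf.org/html/rfc5322#section-3.4); the \u0022 sequences
are just the JSON escape for the double quotes around it. As Bro logs the
content of the headers without further processing, you are getting this
extra info.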
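
If you want to check what such a value decodes to outside of Bro, here is a minimal Python sketch using only the standard library. The function name decode_encoded_words is made up for illustration, and the from value with angle brackets around the address is an assumption about what the original header looked like, not something taken from the log above.

```python
from email.header import decode_header
from email.utils import parseaddr


def decode_encoded_words(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded words comes back as plain str.
            parts.append(chunk)
    return "".join(parts)


# The two subjects from the question; both contain an ellipsis character,
# encoded once as UTF-8 (E2 80 A6) and once as Windows-1252 (85).
print(decode_encoded_words("=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?="))
print(decode_encoded_words(
    "=?Windows-1252?Q?Automatic_reply:_CBO_drops_the_March_base=85line?="))

# Assumed header form: display-name followed by the address in angle brackets.
name, addr = parseaddr('"NAMEOFPERSON" <first.middle.last@something.com>')
print(name, addr)
```

Both subjects decode to the same text with a real ellipsis character, which shows that the odd-looking bytes are just two different charsets encoding the same character.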
Regards,
Jan
Yep! There is a hacky script I wrote a while ago to deal with this stuff too (we need to integrate it into the analyzer at some point, though):
https://github.com/sethhall/bro-junk-drawer/blob/master/smtp-decode-encoded-word-subjects.bro
If you load that script, it adds another field to smtp.log named "decoded_subject".
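
If loading an extra script is not an option, a similar field could be added after the fact by post-processing the JSON log. This is only a rough sketch in Python, not how the Bro script works: the helper names are made up, and it assumes one JSON object per line with the subject in a "subject" field, as in the examples above.

```python
import json
from email.header import decode_header


def decode_encoded_words(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def add_decoded_subject(line: str) -> dict:
    """Parse one JSON log line and add a decoded_subject field if possible."""
    record = json.loads(line)  # also turns \u0022 back into a plain double quote
    if "subject" in record:
        record["decoded_subject"] = decode_encoded_words(record["subject"])
    return record


# Sample line built from the values in the question (raw string keeps \u0022 as-is).
sample = (r'{"subject":"=?utf-8?q?CBO_drops_the_March_base=E2=80=A6line?=",'
          r'"from":"\u0022NAMEOFPERSON\u0022 first.middle.last@something.com"}')
record = add_decoded_subject(sample)
print(record["decoded_subject"])
print(record["from"])
```

The original subject field is left untouched, so the raw header is still available next to the decoded one.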
.Seth