I've been thinking about how to handle this for a while. The data that is being written into the log is technically already UTF-8; it's just that non-ASCII bytes are escaped.
I think we can deal with this by adding a switch that makes the logs "UTF-8". It would incur a bit of overhead because each string would have to be scanned for valid UTF-8 sequences before being written, and then only the invalid bytes would be escaped.
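The scan-and-escape idea above could be sketched roughly like this — in Python rather than Bro script, and with a function name and `\xNN` escape style that are my assumptions, not Bro's actual internals:

```python
def escape_invalid_utf8(raw: bytes) -> str:
    """Keep valid UTF-8 sequences as-is; escape only invalid bytes as \\xNN."""
    out = []
    i = 0
    while i < len(raw):
        # Try to decode a valid UTF-8 sequence (1-4 bytes) starting at i.
        for length in (1, 2, 3, 4):
            try:
                out.append(raw[i:i + length].decode("utf-8"))
                i += length
                break
            except UnicodeDecodeError:
                continue
        else:
            # No valid sequence starts here: escape this single byte.
            out.append("\\x%02x" % raw[i])
            i += 1
    return "".join(out)
```

In Python specifically, `raw.decode("utf-8", errors="backslashreplace")` gives a very similar result in one call; the loop just makes the per-string scanning cost that Seth mentions explicit.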
Does the JSON log writer make this simpler for users? I think Bro writes out valid JSON,
so any JSON parser should give you proper UTF-8 strings.
It writes out valid JSON, but strings aren't handled as well as they could be. That's why I was saying that non-ASCII bytes are escaped according to the JSON spec, which has other problems.
> I've been thinking about how to handle this for a while. The data that
> is being written into the log is technically already UTF-8, it's just
> that non-ascii bytes are escaped.
>
> I think we can deal with this by making a switch for the logs to make
> them "UTF-8". It would incur a bit of overhead because each string
> would have to be scanned for valid UTF-8 characters before being written
> and then only non-valid bytes would be escaped.
>
>   .Seth
I see.
So I need to convert the escaped non-ASCII bytes back into UTF-8.
I want the logs to be readable, even if that adds a bit of overhead.
Is there a sample Bro script that does this?
It's hard for me because I'm new to Bro scripting.
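Until there is a Bro-side switch, the un-escaping can also be done as a post-processing step outside Bro. A minimal Python sketch, assuming the writer escaped each non-ASCII byte as a `\xNN` hex sequence (the function name is mine, and a literal `\x` in the original data would be misinterpreted):

```python
import re

def unescape_field(field: str) -> str:
    """Turn \\xNN escapes back into raw bytes, then decode the result as UTF-8."""
    raw = re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        field,
    ).encode("latin-1")  # latin-1 maps code points 0-255 back to single bytes
    # Invalid sequences are replaced rather than raising, since log data
    # is not guaranteed to be well-formed UTF-8.
    return raw.decode("utf-8", errors="replace")
```

For example, a log field containing `caf\xc3\xa9` would come back as `café`.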