script to extract elastic search mapping from header of bro-logs

Hello,

Many of us use Elasticsearch as a sink for Bro logs. I am thinking
about writing a script to extract the correct mapping from the Bro
log header.

This would mean:
* mapping data types:
  string, addr, enum -> string
  int, count, port -> long
  interval, double -> double
  time -> date (format epoch_millis)
* setting 'not_analyzed' for types like addr, where analysis makes no sense
* handling container types (table, set, vector)
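
A minimal sketch of the steps listed above, in Python. It assumes the
usual Bro ASCII header lines (#separator, #fields, #types) and the
"string"/"not_analyzed" mapping vocabulary used in this thread. The
names SCALAR_TYPES, es_field and mapping_from_header are made up for
illustration, and types not in the list above (bool, subnet, ...) are
deliberately rejected rather than guessed.

```python
import re

# Scalar type table taken from the list above. Marking only addr as
# not_analyzed is an assumption; other types could be added to it.
SCALAR_TYPES = {
    "string": {"type": "string"},
    "enum": {"type": "string"},
    "addr": {"type": "string", "index": "not_analyzed"},
    "int": {"type": "long"},
    "count": {"type": "long"},
    "port": {"type": "long"},
    "interval": {"type": "double"},
    "double": {"type": "double"},
    # Bro normally writes time as epoch seconds, so values would likely
    # need converting before they match this format.
    "time": {"type": "date", "format": "epoch_millis"},
}

CONTAINER = re.compile(r"^(?:set|vector|table)\[(.+)\]$")


def es_field(bro_type):
    """Return a field mapping for one Bro type string."""
    match = CONTAINER.match(bro_type)
    if match:
        # Any Elasticsearch field may hold an array of values, so a
        # container is mapped as its element type.
        bro_type = match.group(1)
    if bro_type not in SCALAR_TYPES:
        raise ValueError("no mapping defined for Bro type %r" % bro_type)
    return dict(SCALAR_TYPES[bro_type])


def mapping_from_header(lines):
    """Build a {"properties": ...} mapping from the header of a Bro log."""
    sep = "\t"
    fields = types = None
    for line in lines:
        if not line.startswith("#"):
            break  # header is over, data rows follow
        line = line.rstrip("\n")
        if line.startswith("#separator"):
            # This line is space-delimited, e.g. "#separator \x09".
            sep = line.split(" ", 1)[1].encode().decode("unicode_escape")
        elif line.startswith("#fields"):
            fields = line.split(sep)[1:]
        elif line.startswith("#types"):
            types = line.split(sep)[1:]
    if fields is None or types is None or len(fields) != len(types):
        raise ValueError("incomplete or inconsistent Bro header")
    return {"properties": {f: es_field(t) for f, t in zip(fields, types)}}
```

Calling mapping_from_header(open("conn.log")) would then give a dict
that can be serialized to JSON and used as the mapping for one type.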

Any ideas? Has anyone done this before?

Franky

Hi,

In case you are talking about importing a Bro ASCII log into a database:
I did something like that for Postgres once. My script automatically
created tables with the right types (including ones like inet) and
converted sets and vectors to Postgres arrays.

Source is at https://github.com/0xxon/bro-utils

Johanna

Hi,

Elasticsearch gets difficult, because there's a lot of context-specific
data that should be captured too, especially when it comes to indexing.
For example, I liked to index domain names with reverse-path
tokenization, using '.' as the delimiter, so that www.ncsa.illinois.edu
will show up in searches for "edu," "illinois.edu," "ncsa.illinois.edu,"
and "www.ncsa.illinois.edu." Capturing this context can be very tricky,
and I don't think that it's currently available in the ASCII logs.
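
To make the tokenization concrete, here is a small Python sketch. The
function reverse_domain_tokens and the tokenizer/analyzer name
"domain_reverse" are made up for illustration. The settings dict
assumes Elasticsearch's path_hierarchy tokenizer with its delimiter and
reverse options, which is one way to get this behaviour and not
necessarily the setup described above.

```python
def reverse_domain_tokens(name, delimiter="."):
    """List every suffix of a domain name, shortest first."""
    labels = name.split(delimiter)
    return [delimiter.join(labels[i:])
            for i in range(len(labels) - 1, -1, -1)]


# Index settings that should yield the same set of tokens inside
# Elasticsearch (assumption: path_hierarchy with reverse enabled).
DOMAIN_ANALYSIS = {
    "analysis": {
        "tokenizer": {
            "domain_reverse": {
                "type": "path_hierarchy",
                "delimiter": ".",
                "reverse": True,
            }
        },
        "analyzer": {
            "domain_reverse": {
                "type": "custom",
                "tokenizer": "domain_reverse",
            }
        },
    }
}

print(reverse_domain_tokens("www.ncsa.illinois.edu"))
# ['edu', 'illinois.edu', 'ncsa.illinois.edu', 'www.ncsa.illinois.edu']
```

The catch is the one raised above: nothing in the log header says which
string fields hold domain names, so a mapping script would need that
list from somewhere else.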

I'd be curious if anyone has thoughts on how to improve this.

  --Vlad

Frank Meier <franky.meier.1@gmx.de> writes: