ElasticSearch plugin

Is there anyone here relying on the elasticsearch writer plugin in the bro-plugins repository? It doesn't appear to work with current versions of elasticsearch anymore and it has always had trouble at sites with high rates of logging.

If we don't get much of a response on this, we will be deprecating and/or removing the elasticsearch writer. There should be more reliable mechanisms available soon anyway, by either writing to a Kafka server and then forwarding to ElasticSearch, or writing files as JSON and then forwarding to ElasticSearch.

Thanks,
  .Seth

I never used it with Bro; however, I am really interested too.

Not I...straight up using rsyslog to pipe to Logstash.

James

I would be interested in this working, as it does not work with later versions of Elastic.

I use it a whole bunch, but it is quite clunky…

Part of me wishes bro would just write JSON to syslog, so that we could use the native rsyslog queuing and output modules (much more widely supported).

Any chance that could be easily implemented?

Cheers,

JB
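To make the JSON-over-syslog idea concrete, here is a minimal Python sketch of what a single Bro record might look like once framed as a syslog message. It assumes the "@cee:" cookie convention that rsyslog's mmjsonparse module looks for, a simplified header (priority and tag only, no timestamp or hostname), and local0/info as facility and severity. The function name to_syslog_line and the tag bro_conn are made up for illustration; nothing here is an existing Bro writer.

```python
import json

FACILITY_LOCAL0 = 16  # syslog facility code for local0 (assumed choice)
SEVERITY_INFO = 6     # syslog severity code for informational


def to_syslog_line(record, tag="bro_conn"):
    """Wrap one log record as '<PRI>tag: @cee: {json}'.

    The tag and the simplified header are illustrative assumptions.
    """
    pri = FACILITY_LOCAL0 * 8 + SEVERITY_INFO
    # Compact, key-sorted JSON keeps the message small and deterministic.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "<%d>%s: @cee: %s" % (pri, tag, payload)


if __name__ == "__main__":
    print(to_syslog_line({"id.orig_h": "1.2.3.4", "proto": "tcp"}))
```

Lines framed like this could be handed to a local rsyslog socket instead of being written to disk first, which is the point of the request above.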

You can tell bro to write to the json logs as usual, and then use rsyslog with the imfile module.

I hate sucking IOPs out of my boxes if I can help it… Is there no clean way to write directly to rsyslog? I can crank the allowable message size up fairly large, and then either write directly to a local file, or simply ship off box.

Writing to a file, only to immediately tail that file seems a bit clunky if you ask me, but what do I know :).

Thoughts?

Cheers,

JB

I believe supporting dots in field names again is on the fix list for Elastic Stack v5 which is currently in development, so that part might at least get fixed on the Elastic end. Technically I believe that fix is a plus even for folks using another plugin such as Kafka as those folks still potentially had to do something to rewrite the field names. I can't speak for anything else that might be broken in the plugin.

Here is the reference to that bit in the Elastic Blog: https://www.elastic.co/blog/elasticsearch-5-0-0-alpha3-released

I could still see cases where someone could have a low volume Bro + local ES where it might not be desirable or necessary to run a bunch of other stuff in between Bro and ES, but maybe it just isn't too big a deal then to just let Logstash do the work of reading local log files assuming that one is OK with still writing normal Bro log output in addition to ES. An example might be if I wanted to do something akin to a stand-alone Security Onion node, but with Bro and Elastic instead of Bro and ELSA.

I'm not using that functionality currently and will probably look at something like Kafka, but I already have a log volume where the overhead of running something like Kafka probably makes sense.

~Gary

I have bro output json, then use logstash to ship to redis, where another set of logstash servers pulls it out to process and insert into elasticsearch. One of the filters removes the dots so I can upgrade to elasticsearch 2. I plan to replace the first logstash with Filebeat.

-Landy

What if you hate using logstash because liblognorm is much, much faster than a regex-based engine like logstash's? I’d much prefer to simply get the data into rsyslog (which should be trivial), then use rsyslog queueing and batching, which is much more flexible, IMO. Use RELP to reliably forward, send to kafka, do whatever, but once you’ve got it into rsyslog you’ve got pretty solid queueing which you can use, along with a whole host of output modules.

I’d just love to see a better pure syslog integration, without having to write to disk and then read back from disk. That starts to cause problems when you’re capturing 10Gb/s+.

Cheers,

JB

I've actually had instances where it "appeared" that Bro couldn't process conn events fast enough and Bro "appeared" to drop events further down the pipeline or outright crash under certain high-stress situations. I suspected it was related to not being able to write the conn.log events to disk fast enough, as I'd see a huge spike in calls to write the conn.log file and subsequent dips in the counters for all of the protocol-based logs. I was writing the counters from the workers to an external stats server whenever a script called its specific write log event, so I could see the spikes and correlate them to drops in other areas. However, I could never definitively prove that disk I/O for log writes was the bottleneck as opposed to some other processing task, as some of the system monitors on the master side would drop stats when it was too busy. Ultimately I largely addressed the causes of the processing spikes (DDoS attacks etc.) upstream from Bro, but I could see the potential for wanting to directly forward events to an external location and not write them locally at all, instead of trying to scale a single Bro master to handle writing hundreds of thousands or more events per second to disk.

~Gary

Seth Hall <seth@icir.org> writes:

> Is there anyone here relying on the elasticsearch writer plugin in the
> bro-plugins repository? It doesn't appear to work with current
> versions of elasticsearch anymore and it has always had trouble at
> sites with high rates of logging.

I think we should be a bit cautious here. Let's not forget that this is
really an ElasticSearch and NSQ writer. I've had very good success with
NSQ at high rate, so I don't really see much value to the second
argument.

> If we don't get much of a response on this we will be deprecating
> and/or removing the elasticsearch writer. There should be more
> reliable mechanisms available soon anyway by either writing to a Kafka
> server and then forwarding to ElasticSearch or writing files as JSON
> and then forwarding to ElasticSearch.

Do we know what the specific problems are with new versions of
ElasticSearch? Since the writer is just writing out JSON, either it's
doing something that's not compatible (which I'd think would be an easy
fix), or there's an issue with the JSON writer, which would affect
people regardless of how they get their logs to ElasticSearch.

The only concrete issue I've heard of is 'no periods in field names',
which I believe there are fixes for here:

https://github.com/danielguerra69/bro-debian-elasticsearch/tree/master/bro-patch

I think the better solution would simply be to make the record separator
redef-able in the formatter. I can *maybe* see the argument for using
'.' instead of '$' in the ASCII logs, but since the other separators are
user-definable, I think this one should be as well.
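As a rough illustration of what a configurable separator would mean for the output, here is a small Python sketch (not Bro's formatter, just the idea). The function name flatten and its sep parameter are hypothetical; it simply joins nested record field names with whatever separator the caller picks.

```python
def flatten(record, sep=".", prefix=""):
    """Flatten nested dicts into one level, joining field names with sep."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested records, carrying the joined name along.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    conn = {"id": {"orig_h": "1.2.3.4", "orig_p": 123}, "proto": "tcp"}
    print(flatten(conn))           # dotted names, as in the logs today
    print(flatten(conn, sep="_"))  # names without periods
```

With sep set to something other than a period, the "no periods in field names" restriction would not be hit, and nothing downstream would need to rewrite field names.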

As far as which is more reliable, I think that should be up to the users
to decide. Personally, I'd rather use NSQ for a number of reasons
(easier to setup and manage, latency is over an order of magnitude less
compared with Kafka, etc.), and there are issues with JSON output to the
disk as well (unnecessary IOPs as someone mentioned).

This is already out of the Bro source code; I see more benefits than
downsides to leaving it in the bro-plugins repo.

I do agree that a RELP writer would be a great addition, and then we
could just use their great collection of output modules:

http://www.rsyslog.com/doc/v8-stable/configuration/modules/idx_output.html

  --Vlad

So it’s settled then!! When will the RELP writer be done?!? :)

Cheers,

JB

I know we talked about this at one point. I think the real fix is to log nested records natively in JSON.

The ascii writer needs to expand nested fields, but the json writer doesn't, so it can natively log a conn record as

{"id": {"orig_h": "1.2.3.4", "orig_p": 123, "resp_h": "5.6.7.8", "resp_p": 456}, ... }

For what it’s worth, using the de_dot filter in logstash with the following config converts the fields to be nested, and didn’t even require any changes to any of my kibana queries or dashboards. Everything just worked. ElasticSearch is happy and I can upgrade to v2 now and nothing changed from the user’s point of view. All I did was tack this on the end of my filter config file on my logstash servers.

filter {
  de_dot {
    nested => true
  }
}
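For anyone not running logstash, the transformation is roughly the following, sketched in Python. This is not the plugin's code, only an approximation of the nesting behaviour described above; the function name nest is made up, and it does not handle the case where a name is both a plain field and a prefix of another field.

```python
def nest(flat, sep="."):
    """Turn {'id.orig_h': x} into {'id': {'orig_h': x}}."""
    nested = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = nested
        # Walk (or create) one dict level per name component but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


if __name__ == "__main__":
    event = {"id.orig_h": "1.2.3.4", "id.resp_p": 456, "proto": "tcp"}
    print(nest(event))
```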

Of course, I wouldn’t complain about bro just nesting correctly in JSON. :)

-Landy

> I think we should be a bit cautious here. Let's not forget that this is
> really an ElasticSearch and NSQ writer. I've had very good success with
> NSQ at high rate, so I don't really see much value to the second
> argument.


Are you proposing that you'll take over responsibility for the module?

I think it would make sense to have a separate NSQ module too if you find value in that. That way if/when ES or NSQ specific tweaks (or other HTTP-based outputs) come into play we aren't creating a mess of various configuration options in a single module.

> I think the better solution would simply be to make the record separator
> redef-able in the formatter. I can *maybe* see the argument for using
> '.' instead of '$' in the ASCII logs, but since the other separators are
> user-definable, I think this one should be as well.

This already exists in topic/seth/log-framework-ext and hopefully will be getting merged soon along with some other logging framework changes I did recently.

  .Seth

Hah! Interesting.

I wanted to briefly thank everyone that has participated in this thread so far. It's really worthwhile to hear where people are struggling and see how everyone has addressed things for their own situation. We are still working on making it easier to do the sort of integration that everyone is working toward and should hopefully be addressing some of the pain points in the 2.5 release.

  .Seth

Hi All,

I have been playing with elastic for a while. It works well, and
besides the dot there are a few script changes needed to
avoid name/type confusion. A few have been solved, but
I use these changes in my docker image on this subject.
Mapping is also very important to make things work. After
this you are ready to dump. For the kibana config I used
elasticsearchdump (an alpine elasticdump). I preconfigured
kibana with searches, visualisations and dashboards.
In an ideal world, I would write to kafka combined with
an elastic-river for kafka. Graylog is implemented like this.
But compiling the kafka plugin ends in complaints; it needs
more time, reading, installing, etc. TODO ...
Currently I’m quite happy with my elastic combination;
it is way faster when there are no errors, and elastic does
a lot with the current git. Elastic is memory hungry and prefers
to run on 3 nodes.

Regards,

Daniel

For the details on docker check this (I had to split them because
of dockerhub compile time).
#docker-compose
https://github.com/danielguerra69/bro-debian-elasticsearch/blob/master/docker-compose.yml

#docker image (check develop for your source experiments)
https://hub.docker.com/r/danielguerra/bro-debian-elasticsearch/

#preparations
https://github.com/danielguerra69/debian-bro-develop

#compiling bro
https://github.com/danielguerra69/bro-debian-elasticsearch

bro script changes<<<<<

RUN sed -i "s/version: count \&log/socks_version: count \&log/g" /usr/local/bro/share/bro/base/protocols/socks/main.bro
RUN sed -i "s/\$version=/\$socks_version=/g" /usr/local/bro/share/bro/base/protocols/socks/main.bro
RUN sed -i "s/version: string \&log/ssl_version: string \&log/g" /usr/local/bro/share/bro/base/protocols/ssl/main.bro
RUN sed -i "s/\$version=/\$ssl_version=/g" /usr/local/bro/share/bro/base/protocols/ssl/main.bro
RUN sed -i "s/version: count \&log/ssh_version: count \&log/g" /usr/local/bro/share/bro/base/protocols/ssh/main.bro
RUN sed -i "s/\$version =/\$ssh_version =/g" /usr/local/bro/share/bro/base/protocols/ssh/main.bro
RUN sed -i "s/version: string \&log/snmp_version: string \&log/g" /usr/local/bro/share/bro/base/protocols/snmp/main.bro
RUN sed -i "s/\$version=/\$snmp_version=/g" /usr/local/bro/share/bro/base/protocols/snmp/main.bro

mapping script <<<<<<<

#!/bin/bash
until curl -XGET elasticsearch:9200/; do
  >&2 echo "Elasticsearch is unavailable - sleeping"
  sleep 5
done

>&2 echo "Elasticsearch is up - executing command"

curl -XPUT elasticsearch:9200/_template/fixstrings_bro -d '{
  "template": "bro-*",
    "index": {
      "number_of_shards": 7,
      "number_of_replicas": 1
    },
    "mappings" : {
      "http" : {
        "properties" : {
          "status_msg" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "user_agent" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "uri" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "conn" : {
        "properties" : {
          "orig_location" : {
            "type" : "geo_point"
          },
          "resp_location" : {
            "type" : "geo_point"
          }
        }
      },
      "files" : {
        "properties" : {
          "mime_type" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "location": {
        "properties" : {
          "ext_location" : {
            "type" : "geo_point"
          }
        }
      },
      "notice" : {
        "properties" : {
          "note" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "ssl" : {
        "properties" : {
          "validation_status" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "server_name" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "dns" : {
        "properties" : {
          "answers" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "query" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "intel" : {
        "properties" : {
          "sources" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "seen_indicator_type" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "seen_where" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "weird" : {
        "properties" : {
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "query" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }'
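Since the template above is mostly the same two stanzas repeated, it could also be generated. Below is a minimal Python sketch that rebuilds the same body from two small tables; the names NOT_ANALYZED, GEO_POINTS and build_template are made up for illustration, and the layout (including the top-level "index" block) is copied from the curl command as-is rather than checked against any particular Elasticsearch version.

```python
import json

# String fields to store without analysis, per Bro log type (from the template above).
NOT_ANALYZED = {
    "http": ["status_msg", "user_agent", "uri"],
    "files": ["mime_type"],
    "notice": ["note"],
    "ssl": ["validation_status", "server_name"],
    "dns": ["answers", "query"],
    "intel": ["sources", "seen_indicator_type", "seen_where"],
    "weird": ["name", "query"],
}

# Fields holding coordinates, per log type (from the template above).
GEO_POINTS = {
    "conn": ["orig_location", "resp_location"],
    "location": ["ext_location"],
}


def build_template(pattern="bro-*", shards=7, replicas=1):
    """Return the index template body as a plain dict."""
    mappings = {}
    for log_type, fields in NOT_ANALYZED.items():
        props = mappings.setdefault(log_type, {"properties": {}})["properties"]
        for field in fields:
            props[field] = {"type": "string", "index": "not_analyzed"}
    for log_type, fields in GEO_POINTS.items():
        props = mappings.setdefault(log_type, {"properties": {}})["properties"]
        for field in fields:
            props[field] = {"type": "geo_point"}
    return {
        "template": pattern,
        "index": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": mappings,
    }


if __name__ == "__main__":
    # Print the body; it could be saved to a file and passed to curl -d @file.
    print(json.dumps(build_template(), indent=2))
```

Adding a field then becomes a one-line change to one of the tables instead of another hand-written stanza.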