NSQ plugin getting deprecated in 2.5

I saw a notice in the 2.5 release notes and I read through the June ’16 conversation about the elasticsearch plugin. I wanted to add my $0.02. For people who are trying to analyze large traffic flows, it becomes imperative not to rely on the disk subsystem for transport. Our current flow looks like:

Bro -> NSQ -> Logstash -> ElasticSearch

We tried to use the Redis plugin first, but it was not built in a way that makes it possible to use it with Logstash (I have two or three open issues on GitHub). Moving to NSQ was the only way we could really deploy the service. I’m open to switching to a different message broker, but I think it is a bit over-ambitious to deprecate a plugin that works perfectly well (for NSQ, at least) without having a viable alternative (RELP, a better Redis plugin, a dedicated NSQ plugin).

Thanks
- Munroe

Hi Munroe,

Too bad it’s deprecated. There is a running Docker example:

https://hub.docker.com/r/danielguerra/bro-debian-elasticsearch/

In the new repo, the best way to do it would be to use the Kafka plugin.
From Kafka you can use an Elasticsearch river.

Regards,

Daniel

You make it sound like its being deprecated means more than that someone decided to label it as such.

- Munroe

I don't know enough about NSQ/ElasticSearch to say much about the
quality of the plugin. Is there a consensus that it works fine with
NSQ, but not with ElasticSearch? The older thread seems to suggest
that. Note that the problem with the record field separators has been
addressed by now; Bro 2.5 comes with this new option:
https://www.bro.org/sphinx-git/scripts/base/frameworks/logging/main.bro.html?highlight=log%3A%3Adefault_scope_sep#id-Log::default_scope_sep
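For context on why that separator matters: ES 2.0 began rejecting field names containing dots, and Bro's scoped record fields serialize as names like `id.orig_h`. A rough Python illustration of the renaming that a different scope separator gives you (the field names here are just examples, not tied to any particular log):

```python
def rescope(record, old_sep=".", new_sep="_"):
    """Rewrite scoped field names, e.g. 'id.orig_h' -> 'id_orig_h',
    so that ES 2.x accepts them."""
    return {name.replace(old_sep, new_sep): value
            for name, value in record.items()}

conn_entry = {"ts": 1480000000.0, "id.orig_h": "10.0.0.1", "id.resp_p": 443}
print(rescope(conn_entry))  # no dots left in any field name
```

In Bro itself this is of course just a redef of Log::default_scope_sep; the snippet only shows the effect on the emitted field names.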

I'm wondering if there's anybody who'd be interested in taking over
ownership of the plugin. We are planning to move bro-plugins/* into
separately distributed Bro packages anyway, using the new Bro package
manager. If somebody wanted to take ownership of the plugin that way,
they could just start maintaining a package for it. An option could
also be turning it into an NSQ-only plugin?

Robin

I was thinking the same thing.

I think the issue with the elasticsearch plugin was that bro -> remote ES never worked well in practice, and users were better off using something else to get logs into ES.

But bro -> local NSQ was rock solid for the people that used it.

Also, another thing to keep in mind is that only a few lines in the entire plugin are actually specific to NSQ; with a few strings moved into options, it could possibly be turned into a generic HTTP/JSON log writer.
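To make that concrete, here is a minimal sketch of what the generic core could look like (the function names and the injected `post` callable are hypothetical, not the plugin's actual API; newline-delimited JSON is the shape that both nsqd's /mpub endpoint and Logstash can consume):

```python
import json

def make_batch(entries):
    """Serialize a list of log entries to a newline-delimited JSON body."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in entries)

def flush(entries, post):
    """Send one batch via an injected `post(body) -> status_code` callable,
    keeping the transport (libcurl in the real plugin) swappable."""
    return post(make_batch(entries))

# Usage with a stand-in transport that just reports success:
status = flush([{"ts": 1.0, "uid": "C1"}], lambda body: 200)
```

Everything endpoint-specific (URL, topic, headers) would live behind the `post` callable, which is more or less Justin's point about how little of the plugin is NSQ-specific.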

I love Kibana as a frontend, and Elasticsearch. In 2.4.1 it
worked fine with proportionally scaled Elasticsearch capacity and nginx.
When Elasticsearch is underpowered for the rate at which you are
creating logs, you obviously get problems (timeouts).
The efficiency of Elasticsearch is not so great… Java.

I think it’s better to focus on Kafka and use an Elasticsearch river.
This is how Graylog works, and that goes pretty well.

It would be handy to have a unique name/type combination,
no matter what kind of log it appears in
(e.g. `version` as a string in ssh.log but an integer in http.log).
Every collision like this causes lots of error logging in Elasticsearch.
A script sanity check would be great for this.
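That sanity check could be as simple as walking the log schemas and flagging any field name that maps to more than one type (the schema dictionary and its type names below are invented for illustration):

```python
def field_type_collisions(schemas):
    """Given {log_name: {field_name: type_name}}, report fields whose
    type differs across logs -- the collisions that trip ES mappings."""
    seen = {}        # field -> (log, type) where first seen
    collisions = []
    for log, fields in sorted(schemas.items()):
        for field, ftype in sorted(fields.items()):
            if field in seen and seen[field][1] != ftype:
                collisions.append((field, seen[field], (log, ftype)))
            seen.setdefault(field, (log, ftype))
    return collisions

schemas = {"ssh.log": {"version": "string"},
           "http.log": {"version": "integer"}}
print(field_type_collisions(schemas))  # flags the ssh/http 'version' clash
```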

Maybe Elastic wants to be the owner of this plugin?
Splunk also provides a Bro plugin + config.

Daniel Guerra

I can try to summarize the current status of the plugin, to give this
discussion some additional context:

* A large stream of log output to NSQ is working. I was pushing about 1
   billion log lines/day for months with no issues.

* ElasticSearch output stopped working with ElasticSearch version 2.0,
   since they changed the delimiter rules. However, this should be
   fixable with the change Seth introduced for 2.5 (and perhaps we
   should update that to be the default?)

* A medium to large stream of log output to ElasticSearch requires a
   lot of tuning and I think is still problematic. I think memory slowly
   creeps up in most cases (ElasticSearch starts garbage-collecting, and
   stops responding for a while). I haven't done work with ElasticSearch
   2.0 to see how that affects this. Perhaps splitting out the logger
   node will help with this? I'm not sure.

* I think that the main issue that Seth was referencing is that the log
   writer doesn't check the response code from NSQ or ElasticSearch. If
   the server responds with a 500 or other error code, it might make
   sense to retry sending the messages a couple of times? Right now,
   they just get dropped, so this can be a lossy log writer.
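The retry idea from that last point could look something like this (a sketch only; the `post(body) -> status_code` callable and the retry/backoff parameters are made up, not anything the plugin currently has):

```python
import time

def send_with_retry(post, body, retries=3, backoff=0.5):
    """Retry on server errors instead of silently dropping the batch."""
    for attempt in range(retries + 1):
        status = post(body)
        if status < 500:       # 2xx/4xx: stop; a 4xx won't improve on retry
            return status
        if attempt < retries:
            time.sleep(backoff * (2 ** attempt))   # exponential backoff
    return status              # still failing: caller decides (spool? drop?)
```

Even with retries there is still a policy question for batches that never succeed, which is the "lossy log writer" problem above.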

So, I'm a bit hesitant to deprecate this, since I think it still works
for NSQ, and it still works (in some cases) for ElasticSearch.
Ironically, it works better for ElasticSearch in 2.5 than it would for
2.4.1, since the delimiter configuration option was introduced.

That being said, I'm also hesitant to take this on myself, simply
because we don't have an ElasticSearch cluster at NCSA.

I think it makes sense to generalize this as an HTTP/JSON log writer,
but we still need to tackle the question of what we do with messages
that fail to be delivered.

Generalizing it might be a bit tricky. For example, ElasticSearch needs
to post to http://1.2.3.4:9000/$log_name, while NSQ needs to add a
line containing the log_name before each log line.

  --Vlad

"Azoff, Justin S" <jazoff@illinois.edu> writes:

> That wasn't really NSQ that required that; it was whatever was pulling the records out of NSQ and pushing them into ES that wanted that.

> I think the new logging ext stuff that was added for kafka would make that extra record redundant now.

You're right, that could be skipped, but then you run into the issue of having only a single queue, which could cause trouble if one log type is overwhelming everything else.

  .Seth

For NSQ, the destination queue is part of the URL that is POSTed to, and it can still be per log stream.

The plugin currently sends it all to one queue, but it could work the same way the kafka plugin does, with one queue per log stream.
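For what it's worth, a sketch of how per-stream topics could be derived (the default port and the `bro_` naming scheme are made up for illustration; nsqd's HTTP API does take the topic as a query parameter on /pub):

```python
def nsq_pub_url(host, stream, port=4151):
    """Build a per-log-stream publish URL for nsqd's HTTP /pub endpoint,
    e.g. the 'conn' stream goes to its own 'bro_conn' topic."""
    topic = "bro_" + stream.replace(".", "_")
    return "http://%s:%d/pub?topic=%s" % (host, port, topic)

print(nsq_pub_url("localhost", "conn"))
# http://localhost:4151/pub?topic=bro_conn
```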

> For NSQ the destination queue is part of the URL that is POSTed to and can still be per log stream.

Yep, that was Vlad's point about that being added to the URL when sending to NSQ. :-)

> The plugin currently sends it all to one queue, but it could work the same as the kafka plugin does with one queue per log stream.

I think what makes the most sense here would be to fork off the ElasticSearch plugin and create an NSQ-specific plugin. If someone wanted to go crazy with options, I could imagine even making a generic HTTP writer plugin, as you suggested earlier. I suspect it would be quite hard to get that right for any number of different HTTP endpoints; it probably makes more sense to just tailor it for whatever is receiving logs on the other end.

  .Seth