Bro Log ingestion

Hello,

Requirement:
I’m trying to find the most efficient way to ingest all of Bro’s logs, where Bro is running on multiple servers, and consolidate them at a single server/point for querying, mining, reporting, etc. The servers are running Red Hat 6.5 and Bro 2.3 built from source with file extraction enabled (HTTP protocol, exe files). All Bro logs and extracted files seem to be owned by root:root by default, but I’d like to have them available to a non-root group once they reach that single server/interface for the analyst.

(My apologies if this has been covered, but I do not know where to search other than just ask or google it. )

Current setup
Red Hat is running fine and Bro 2.3 with file extraction is working fine. So no worries there; I just need the best methodology for ingesting all the Bro logs (and extracted files) to a single point for analysis/mining/querying/reporting, etc.

Research
Looking around and doing some reading, I’ve found two possible solutions, ELSA and Logstash, although I don’t know them very well or what their capabilities are. I’d like to know if they are viable, especially given my scenario, or if there is something better. A how-to so I can set it up would also be appreciated.

I look forward to your reply, thanks!

JW

You might want to skip the Logstash piece and push the data directly to
ElasticSearch per [1] unless you have a specific requirement. From there
you could use Kibana [2] or whatever to interface with the data stored in
ElasticSearch.

[1] https://www.bro.org/sphinx/frameworks/logging-elasticsearch.html
[2] http://www.elasticsearch.org/overview/kibana/
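
For reference, wiring the ElasticSearch writer up in Bro 2.3 is roughly the following in local.bro (a minimal sketch based on [1]; the hostname is a placeholder, and Bro needs to have been built with libcurl for the writer to be available):

@load tuning/logs-to-elasticsearch
redef LogElasticSearch::server_host = "es-collector.example.com";   # placeholder, wherever ElasticSearch runs
redef LogElasticSearch::server_port = 9200;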

Thanks Steven, I’ll take a look at those.
I’m assuming my central point server would then need Apache with ElasticSearch and Kibana installed. I’m sure more questions will come as I start looking into this. Thanks again for the info!

Jonathon,

As a nit-pick, just because the files are owned by root doesn’t mean they aren’t world-readable. :) The absolute simplest solution to make the logs viewable by non-root users is to scp them to a centralized server, but I’m guessing you want something a little fancier than that.
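
Something like this from cron on each sensor would do it (a sketch only; the user, host, and paths are made up):

# run as a user that can read the logs (placeholder user/host/paths):
scp -p /usr/local/bro/logs/current/*.log analyst@collector.example.com:/data/bro/$(hostname)/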

If you can do it, go with free Splunk. If you can afford it, go with paid Splunk.

Otherwise:

For log viewing with Elasticsearch, Kibana works great, but you could also check out Brownian: https://github.com/grigorescu/Brownian.

For log storage, if you want to consider something other than Elasticsearch, VAST is an option: https://github.com/mavam/vast. There’s no GUI, though, so that might be a downer for you.

As far as Elasticsearch architecture goes, using Bro to write directly into Elasticsearch is definitely the easiest option. The only concern with this setup is that when Elasticsearch gets busy, nobody is happy: Elasticsearch has a tendency to drop writes when it is too occupied. Combined with the fact that (to the best of my knowledge) the Elasticsearch writer is ‘send it and forget it’, this could result in some hardship if you under-build your Elasticsearch cluster or go through a period of unusually high utilization.

Seth has written some interesting stuff using NSQ, but I’m not sure that it is technically ‘supported’. His NSQ setup allows you to send events to Elasticsearch at a rate that Elasticsearch is comfortable with.

Lastly, you could use the Logstash agent to send logs to a Redis server, which buffers the logs for additional Logstash agents to pull from and parse to insert into Elasticsearch. At the moment, I think that this is the most redundant setup. If you want as many logs to make it into Elasticsearch as possible while keeping the Bro side of things as simple as possible, this is likely the way to go. The downside is that this can require quite the large amount of infrastructure… and the only way to find out exactly how much your environment will need is to build it and see. It also requires that you keep up to date in knowledge on 3 pieces of software and how they interact…
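
As a rough sketch of that shape (the hostnames, paths, and Redis key below are placeholders, and the actual Bro parsing filter is omitted):

# shipper.conf, on or near each Bro box:
input  { file { path => "/usr/local/bro/logs/current/*.log" } }
output { redis { host => "redis.example.com" data_type => "list" key => "bro" } }

# indexer.conf, on the Elasticsearch side:
input  { redis { host => "redis.example.com" data_type => "list" key => "bro" } }
filter { }   # grok/csv parsing of the Bro fields goes here
output { elasticsearch_http { host => "127.0.0.1" } }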

Hopefully that helps at least a little!

-Stephen

I’m not sure it’s an option for you, but I’m using Splunk to ingest logs from multiple Bro sensors. It’s a great way to complement the other data I have in Splunk, and after creating some field extractions it becomes really easy to search the data or build statistics from it.

John Landers

I am using logstash.

I have Bro 2.3 running on a sensor and the logs are sent to a collector via syslog-ng. There, they are written to disk, where they are read by Logstash and sent to Elasticsearch. I use logrotate to gzip these files once they get close to about a gig and keep them around just in case ES craps out or I need to process them in other ways. I use Squert (www.squertproject.org) to browse them once they’re in ES, but Kibana would probably be a more versatile tool.
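
The Logstash side of that is essentially just a file input feeding an elasticsearch output, something like (your paths will differ, and the parsing filter is left out here):

input  { file { path => "/var/log/bro/*.log"  type => "bro" } }
output { elasticsearch_http { host => "127.0.0.1" } }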

I process anywhere from 1800-2500 entries/second on an 8-core box with 96GB of RAM running FreeBSD.

If you want to quickly PoC something take a look at securityonion (http://blog.securityonion.net/).

Jonathon,

As pointed out, a Redis solution may be an ideal open-source route, e.g. http://michael.bouvy.net/blog/en/2013/11/19/collect-visualize-your-logs-logstash-elasticsearch-redis-kibana/

Quite the responses, thanks!

Here are my thoughts.

I saw your post Doug, and on some of our projects we can use Security Onion w/Bro and ELSA, but in this case it must be a RHEL-based solution. The solution Stephen R. demo’d with the Kibana approach [1] is pretty nice. But it brought an issue to my attention. It appears that Logstash needs to start up listening on a different port, 9292. I’m wondering if I missed something, or why Kibana wouldn’t simply run as a plugin or additional module under Apache on port 443. We are in a highly regulated network, and if I stand up an Apache server (where all the Bro logs are going to be placed) and the Apache server is listening on a non-secure (!443) port such as 9292, then it causes flags to be thrown up everywhere and always kills my project. Additional thoughts on that?

Stephen H, not a nit-pick at all, great post! =) My method for moving the logs from all the sensors to a central collector is at this point still in the works. My best route is probably to use ‘rsync’. The problem I have right now is that Bro logs and extracted files have 600 permissions when they are created. The cause is simply the umask for root on the servers, which is set to 077. Since the servers are configured (correctly) to not allow SSH by root, my rsync proposal also died, since all the files are accessible by root only. Also, I’m unable to change the umask of root (regulations, not know-how), so short of creating an every-minute chmod 644 cron job, I’m scratching my head on how to get the logs over to the collector/Apache server.

You make an excellent point though " The downside is that this can require quite the large amount of infrastructure… and the only way to find out exactly how much your environment will need is to build it and see. It also requires that you keep up to date in knowledge on 3 pieces of software and how they interact…"
The knowledge and infrastructure count / increase is a large flag that will prohibit that endeavor (but great to know about).

You, John L., and Will H. all point to Splunk as your solution, which gives me another option. But I have the same “question about ingestion” =) How did you get the logs from the multiple sensors to the “ingestion / collector server”? Rsync, SCP, owner/permission issues? I’m interested for sure. But the cost is a big no-no as well; as Will H. indicated, the cost can go up based on usage, and I do need a truly open-source, free solution, so I am now leaning back toward ElasticSearch / Logstash unless I missed something.

Paul H., you get to use FreeBSD… <drool>… Man do I miss FreeBSD! Give me packages or give me death, haha. Ever since we were forced to use RHEL I miss it more and more! But to your comments, this sentence really caught my attention: “…the logs are sent to a collector via syslog-ng…” Then you said “There, they are written to disk where they are read by logstash and sent to elasticsearch”. Since I’m leaning toward the Logstash / ElasticSearch method, based on the above thoughts, can you share a bit more on how you set up syslog-ng, Logstash, and Elasticsearch? That seems to be really close to meeting my requirement. I’m assuming you installed them from source and set them to YES in rc.conf to start on boot. I’m more interested in the details of the conf files, what arguments the daemons start up with, and especially how you were able to get the syslog-ng piece working between the sensor and the collector.

[1] http://www.appliednsm.com/parsing-bro-logs-with-logstash/

Thanks again to all, this is great stuff.

JW

Quite the responses, thanks!

Here are my thoughts.

I saw your post Doug, and on some of our projects we can use Security
Onion w/Bro and ELSA, but in this case it must be a RHEL-based
solution. The solution Stephen R. demo'd with the Kibana approach [1]
is pretty nice. But it brought an issue to my attention. It appears
that Logstash needs to start up listening on a different port, 9292. I'm
wondering if I missed something or why Kibana wouldn't simply run as a
plugin or additional module under Apache on port 443. We are in
a highly regulated network, and if I stand up an Apache server
(where all the Bro logs are going to be placed), and the Apache server
is listening on a non-secure (!443) port such as 9292, then it
causes flags to be thrown up everywhere and always kills my project.
Additional thoughts on that?

To set this straight: Logstash itself doesn't listen on any port unless configured to do so. The Elasticsearch engine behind it does; you'd need the backend Elasticsearch instance listening on port 9200, and your client workstation will need to be able to connect to it on that port. As for Kibana, it works just fine with any current Apache install.
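
For example, Kibana 3 is nothing more than static files; you drop it under Apache and point it at Elasticsearch (the paths and version below are just examples):

cp -r kibana-3.1.0/* /var/www/html/kibana/
# then in config.js the only setting that usually matters is where Elasticsearch lives, e.g.:
#   elasticsearch: "http://"+window.location.hostname+":9200",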

Stephen H, not a nit-pick at all, great post! =) My method for moving
the logs from all the sensors to a central collector at this point
is still in the works. My best route is probably to use 'rsync'. The
problem I have right now is that Bro logs and extracted files have 600
permissions when they are created. The cause is simply the umask for
root on the servers, which is set to 077. Since the servers are
configured (correctly) to not allow SSH by root, then my rsync
proposal also died since all the files are accessible by root only.
Also, I'm unable to change the umask of root (regulations, not
know-how), so short of creating an every-minute chmod 644 cron job, I'm
scratching my head on how to get the logs over to the collector/
Apache server.

Rsyslog on my sensors has been excellent for piping to a listening Logstash instance (high ports mean I can run it as a standard user). Alternatively, you can use a cheesy hack of "sudo /usr/bin/tail -f conn.log | logger -d -n remote.syslog.ip -P <logstash port> -u /tmp/ignored".

That hack worked while I was getting my rsyslog instance able to read the conn.log file. Since rsyslog is running as root, it's able to read the Bro files.
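
The shape of it is roughly this (not my exact config; the path, tag, and port are placeholders):

# on the sensor, rsyslog legacy syntax as shipped with RHEL 6:
$ModLoad imfile
$InputFileName /usr/local/bro/logs/current/conn.log
$InputFileTag bro_conn:
$InputFileStateFile stat-bro-conn
$InputRunFileMonitor
*.* @collector.example.com:5514        # single @ = UDP to the Logstash listener

# on the collector, the matching logstash.conf input:
input { udp { port => 5514 type => "bro_conn" } }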

  

You make an excellent point though " The downside is that this can
require quite the large amount of infrastructure… and the only way
to find out exactly how much your environment will need is to build it
and see. It also requires that you keep up to date in knowledge on 3
pieces of software and how they interact…"
The knowledge and infrastructure count / increase is a large
flag that will prohibit that endeavor (but great to know about).

Both you, John L., and Will H. indicate Splunk though as your
solution which gives me another option. But I have the same
"question about ingestion" =) How did you get the logs from the
multiple sensors to the "ingestion / collector server"? Rsync, SCP,
owner / permission issues? I'm interested for sure. But… the cost is
a big no-no as well. As Will H. indicated the cost can go up based on
usage, I do need a truly open-source free solution, so I am now
leaning back to ElasticSearch / LogStash unless I missed something.

Paul H., you get to use FreeBSD... <drool>... Man do I miss FreeBSD!
Give me packages or give me death, haha. Ever since we were forced to
use RHEL I miss it more and more! But to your comments, this sentence
really caught my attention: "...the logs are sent to a collector via
syslog-ng..." Then you said "There, they are written to disk where they
are read by logstash and sent to elasticsearch". Since I'm leaning
toward the Logstash / ElasticSearch method, based on the above thoughts,
can you share a bit more on how you set up syslog-ng, logstash, and
elasticsearch? That seems to be really close to meeting my
requirement. I'm assuming you installed them as source and set them in
the rc.conf to enabled YES to start up on boot. I'm more interested in
the details of the conf files, what arguments the daemons
start up with, and especially how you were able to get the syslog-ng
piece working between the sensor and the collector.

[1] http://www.appliednsm.com/parsing-bro-logs-with-logstash/

With the Logstash entries from the above link, orig_bytes, orig_ip_bytes, resp_bytes, and resp_ip_bytes are not treated as integers, so you'll need this in the filter section of your logstash.conf:

mutate {
        convert => [ "resp_bytes", "integer" ]
        convert => [ "resp_ip_bytes", "integer" ]
        convert => [ "orig_bytes", "integer" ]
        convert => [ "orig_ip_bytes", "integer" ]
}

Let me know if you need any assistance... I have a fully working setup: a single backend host running Logstash/Elasticsearch/Kibana, with a syslog server piping firewall hits to it, and an IDS piping Bro's conn.log and Snort IDS logs to it.

James

As it relates to Splunk, you can consume the data in a number of ways. I use a universal forwarder – agent on the box – and configure it to monitor the logs I want to consume (conn.log, dns.log, files.log, etc.) in the Bro “current” working directory.

So, as Bro logs to file, the data gets replicated to the Splunk indexer by the agent. Once the file rolls, I don’t care anymore. Though if you wanted to ingest old logs, that would be pretty easy to accomplish as well. (Just reference the Splunk documentation on the inputs.conf config file.)
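
A minimal inputs.conf on the forwarder looks something like this (the index and sourcetype names are just examples; adjust to your own scheme and paths):

[monitor:///usr/local/bro/logs/current/conn.log]
sourcetype = bro_conn
index = bro

[monitor:///usr/local/bro/logs/current/dns.log]
sourcetype = bro_dns
index = bro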

John Landers

We also feed our Bro logs into Splunk and have been pretty happy with that. We have a pretty good idea of what our daily volume looks like, and have been able to plan comfortably around that. We’ve only been bitten by unusually large spikes in volume once or twice in the couple of years that we’ve been Splunking our data.

Excellent information, James. Thanks for the vote of confidence too, John, but you guys are making it harder, haha. It seems I need more information to determine the best course, as opinions are split between using Splunk or Logstash.

James, couple questions on your post.

So if I understand correctly, ElasticSearch is what listens (as a virtual Apache module, I’m assuming?), and LogStash merely feeds ElasticSearch the logs. Getting logs to the server that is running LogStash and ElasticSearch is where rsyslog vs. Splunk vs. whatever else comes into play… correct?

You indicated “Rsyslog on my sensors has been excellent for piping to a listening Logstash instance (high ports mean I can run it as a standard user).” Does this mean you have LogStash listening on a high port that rsyslog connects to? If so, this would be a problem for me. In my over-regulated environment, the logs have to be transferred on a low port, preferably a known standard port (such as ssh/22), and the logs must be transferred over an encrypted channel. This is the main reason I initially wanted to use rsync, which uses ssh, encrypts the connection, and obviously runs on a known/standard low port, 22. The problem is that rsync runs with the permissions of the user running it, in this case a non-root user. And since root is not allowed to SSH into a box, I cannot use rsync. So… can you elaborate a bit more on what ports you are using (or is it random high ports), whether it’s encrypted, or if you have any other thoughts on how I can solve the movement of the Bro logs in a secure manner?

Once I have a good solution for getting the Bro logs over to the collector/apache server, I’d be real excited to discuss some more details about logstash.conf and configuring it to feed ElasticSearch.

Any additional thoughts from the group are welcome, thanks again for the assistance thus far!

Ok… so Elasticsearch is its own, self-contained application, as is Logstash. Logstash formats the data to go into Elasticsearch (or Bro can go direct). Kibana is the web front end that you use to get to the Elasticsearch backend.

This link http://logstash.net/docs/1.4.2/ contains all the inputs, codecs, filters, and outputs that Logstash supports. I’ve been using the udp input for sending to the remote Logstash/Elasticsearch server. I chose to have Logstash listening on a high port, but you can just as easily choose something else. I am not sure about encryption, so you’ll want to look through that list.

Also, you can test all of this in one shot by downloading the Logstash tar.gz, starting it with the embedded Elasticsearch, and then starting the web agent… it’s pretty cool for testing.
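
Roughly (the version number and file names here are just examples):

tar xzf logstash-1.4.2.tar.gz && cd logstash-1.4.2
bin/logstash agent -f logstash.conf &     # with output { elasticsearch { embedded => true } } in the config
bin/logstash web &                        # serves the Kibana UI on http://localhost:9292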

James

If you need a protocol that plays well with Logstash, can operate on a custom ‘low port’ (sigh…), and also supports encryption, you have just described lumberjack. Lumberjack is supported by the Logstash agent (when you run full-blown Logstash on the Bro boxes as forwarders) or by a smaller, slimmer application called logstash-forwarder. You’re going to limit yourself a lot with that configuration, though. First of all, you’re going to need a very good working knowledge of TLS: the logstash-forwarder and agent are both very picky about their TLS, and requests for options allowing users to circumvent some of the checks have largely been met with opposition.
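
For reference, the listening side is a lumberjack input along these lines (the port and certificate paths below are placeholders); the logstash-forwarder config on the Bro boxes then points at that port with the matching certificate:

input {
  lumberjack {
    port            => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key         => "/etc/pki/tls/private/logstash-forwarder.key"
    type            => "bro"
  }
}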

As far as getting Logstash to connect to your Elasticsearch cluster, the documentation is very clear. Depending on which version of Elasticsearch you’re using, the steps will vary. If you want some help configuring it, I could help you offline… it seems like a pretty big distraction for me to send Logstash and Elasticsearch configuration tips over this mailing list. :)