I would appreciate recommendations for a DB server that is most suited for ingesting and digesting Bro logs.
I know of some use cases involving Splunk and the Splunk Bro app, but price- and performance-wise (10 Gbps of incoming traffic) it does not seem to be the best solution out there.
Does anyone have any experience with Bro and Elasticsearch, Redis, or MySQL?
I am looking into different solutions and would appreciate your thoughts.
Elasticsearch is fantastic. Very good at displaying information, and the newest version has alerting, some graph analysis, and basic machine learning. Let me know if you need help getting started.
I have done some proof of concept work with PostgreSQL (mostly in AWS RDS) and have been very happy with the results so far. Of course the rub is you need to set up the schema, but it is pretty straightforward to ingest after that from the JSON.
What I’ve done is load JSON into a text field of a temp table, then cast that as JSON on insert (there was a little trick to getting this right that I don’t recall off the top of my head). My load process is currently out of service but I can try to look up my code for this if you need it.
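In case it helps, here is a minimal sketch of the staging pattern described above: raw JSON lines go into a one-column text table, then get cast to JSON when copied into the real schema. The table and column names here are made up for illustration, and the validation helper is a hypothetical addition, not the poster's actual code:

```python
import json

# Hypothetical SQL for the two-step load: stage raw text, then cast on insert.
# Table and column names are assumptions, not from the thread.
STAGE_SQL = "CREATE TEMP TABLE conn_stage (raw text);"
INSERT_SQL = """
INSERT INTO conn (ts, orig_h, resp_h, proto)
SELECT (raw::jsonb)->>'ts',
       (raw::jsonb)->>'id.orig_h',
       (raw::jsonb)->>'id.resp_h',
       (raw::jsonb)->>'proto'
FROM conn_stage;
"""

def stage_rows(log_lines):
    """Validate each Bro JSON log line before staging it."""
    rows = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        json.loads(line)      # fail fast on malformed JSON
        rows.append((line,))  # one-tuple per row, ready for executemany()
    return rows

sample = ['{"ts":"2018-05-01T12:00:00Z","id.orig_h":"10.0.0.1",'
          '"id.resp_h":"10.0.0.2","proto":"tcp"}']
print(len(stage_rows(sample)))  # 1
```

The two-step load keeps malformed lines from aborting a bulk insert, which may be the kind of trick being alluded to.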
You could do a document database that would handle the JSON gracefully, but then you’re constantly paying the parse tax. Works great if you don’t actually want to use your data, though.
If you use standard Bro text files you've got more parsing to do, but it's certainly doable. I like having JSON Bro output to avoid that heavy lifting.
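For anyone stuck with the standard tab-separated output, the parsing isn't too bad since the `#fields` header names the columns. A minimal sketch (ignores the `#types` line and does no type conversion):

```python
def parse_bro_tsv(lines):
    """Parse Bro's tab-separated log format into dicts, using the
    #fields header for column names. Comment lines start with '#'."""
    fields, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line.startswith("#") or not line:
            continue
        else:
            records.append(dict(zip(fields, line.split("\t"))))
    return records

sample = [
    "#separator \\x09",
    "#fields\tts\tid.orig_h\tid.resp_h\tproto",
    "#types\ttime\taddr\taddr\tenum",
    "1525168800.0\t10.0.0.1\t10.0.0.2\ttcp",
]
recs = parse_bro_tsv(sample)
print(recs[0]["proto"])  # tcp
```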
I've put Bro data in Solr, Elasticsearch, HDFS, Splunk, and MongoDB with success, but for different use cases. What are you looking to do with the data?
The Apache Metron project supports bro logs natively and can index in hdfs, solr, or elasticsearch. If you don’t want to buy into the entire project (a bit of a heavy lift if you don’t already run Ambari and Hadoop or aren’t interested in security data analytics) there may be reusable components that are helpful. Let me know if you’re interested in digging in and I can help. A part of this project is the kafka writer plugin, used as a buffer between bro and an indexed store. https://packages.bro.org/packages/view/7388aa77-4fb7-11e8-88be-0a645a3f3086
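For reference, wiring the Kafka writer plugin up looks roughly like the following Bro script snippet. This is a sketch based on the plugin's documented settings, not tested here; the exact `@load` path depends on how you installed the plugin (e.g. via bro-pkg), and the broker address and log set are placeholders:

```bro
# Load the Metron Kafka writer plugin (path varies by install method).
@load packages/metron-bro-plugin-kafka/Apache/Kafka

# Which Bro log streams to ship to Kafka.
redef Kafka::logs_to_send = set(Conn::LOG, DNS::LOG, HTTP::LOG);

# librdkafka options; broker address is a placeholder.
redef Kafka::kafka_conf = table(["metadata.broker.list"] = "localhost:9092");
```

From there, each database gets its own Kafka consumer, so slow indexers don't back-pressure Bro itself.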
This isn't meant to be a commercial; I've heard great things about Bro data going into Postgres and Redis as well.
I like Kibana as a frontend, so the natural choice would be Elastic. I switched to Elassandra, though; plain Elasticsearch was way too slow for Bro on its own. With a file buffer or a broker like Kafka in front, all goes well. If you use Elastic, split the Bro log types (e.g. conn, ssl, etc.) into separate indices to avoid mapping collisions.
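One simple way to do that split is to route each record to a per-log-type, per-day index before bulk indexing. The naming scheme below is an assumption for illustration, not something prescribed in this thread:

```python
from datetime import datetime, timezone

def bro_index_name(log_path, ts, prefix="bro"):
    """Route a Bro record to a per-type, per-day Elasticsearch index,
    e.g. 'bro-conn-2018.05.01'. Keeping conn, ssl, dns, ... in separate
    indices avoids mapping collisions between fields of the same name.
    (Hypothetical naming scheme; adjust prefix/date format to taste.)"""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y.%m.%d")
    return "{}-{}-{}".format(prefix, log_path, day)

print(bro_index_name("conn", 1525168800.0))  # bro-conn-2018.05.01
```

Daily indices also make retention trivial: drop whole indices instead of deleting documents.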
MySQL is a great database, but consider time-based databases (or time-based partitioning), because after 100 million records the performance goes down.
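If you do stay on MySQL, range-partitioning the table by day gets you some of the same benefit. A sketch with a made-up `conn_log` schema (column names and partition boundaries are placeholders):

```sql
-- Hypothetical conn table, partitioned by day so old partitions can be
-- dropped cheaply (ALTER TABLE ... DROP PARTITION) instead of slow DELETEs.
CREATE TABLE conn_log (
  ts     DATETIME NOT NULL,
  orig_h VARCHAR(45),
  resp_h VARCHAR(45),
  proto  VARCHAR(8),
  KEY (ts)
)
PARTITION BY RANGE (TO_DAYS(ts)) (
  PARTITION p20180501 VALUES LESS THAN (TO_DAYS('2018-05-02')),
  PARTITION p20180502 VALUES LESS THAN (TO_DAYS('2018-05-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);
```

In practice you'd create new partitions on a schedule rather than hard-coding them.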
RE: Clark, not to hijack the thread but that isn’t true. Assuming you’re referring to the note in the plugin that says “Metron currently doesn’t support IPv6 source or destination IPs in the default enrichments” this just means there isn’t a built-in example enrichment that supports IPv6. The platform itself has full IPv6 support end to end without issue, I have been doing it for years. If you want to chat more on this we should talk elsewhere.
Thank you for the clarification! That’s a great reason to “hijack” the convo.
I admit I was rather shocked to read that and I'm glad to hear I was mistaken. I have been intrigued by Metron and would like to take a closer look. Alas, the Hadoop requirement is a pretty heavy lift - it would be my only actual use for Hadoop (in a peta-scale data analytics environment, btw…). Though in fairness, I'd like to have a legitimate reason to do more than play with Hadoop and conclude it doesn't fit our other use cases.
I've had some success using Graylog. I send Bro logs via rsyslog to a Graylog collector and use pipeline processing rules in Graylog for message enrichment. https://github.com/alias454/graylog-bro-content-pack.
I’ve decided to simultaneously deploy several solutions with the same traffic and benchmark them in retrospect.
Candidates are Oracle DB, ELK, and Splunk.
Since no single Bro writer exists for all of the above databases, I will use the Kafka writer plugin, with a Kafka queue as a middleman feeding each database's consumer.
I will update when results are in.
Feel free to respond with any further insights.