Get the license usage down in Splunk when indexing Bro logs

Hi all,

We’re currently working on deploying Bro sensors to various offices and I’ve come to realise that the Bro logs are quite ‘expensive’ when it comes to Splunk licenses. To say the least.

We have discussed various solutions but most of them fall down because we’d lose the ability to correlate events unless all the logs go into Splunk.

At the moment we’re running it pretty much ‘out of the box’, so we can save some GBs per day by turning off certain scripts, but that will probably not be enough.

Someone mentioned that turning on JSON logging instead of the standard logging in Bro could save a considerable amount of space in your SIEM. Have any of you tested this, and can you back that statement up?

I was hoping that someone else had encountered this before and had come up with some solution(s) to this issue?

Thanks in advance, Mike

The JSON logs will always be larger than the default tab-delimited format. With JSON, every log event repeats the "column" names, versus a single set of headers in the delimited format.
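For reference, switching Bro’s ASCII writer to JSON is a one-line redef (option names as of Bro 2.4+), and you can see the repeated-field-name overhead for yourself by comparing the resulting logs. A minimal sketch:

```zeek
# Enable JSON output for all ASCII logs. Each event now carries its
# own field names, so raw log volume -- and therefore Splunk license
# consumption -- goes up, not down.
redef LogAscii::use_json = T;

# Optional: ISO8601 timestamps instead of epoch seconds.
redef LogAscii::json_timestamps = JSON::TS_ISO8601;
```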

Well, if you are collecting both netflows and conn logs, you could get rid of one. If you are recapturing syslogs with Bro and sending them to Splunk, you could trim duplication there. If you are finding a ton of certificate information in your Bro logs, you might realize some cost savings there. But I don’t have much advice beyond: don’t send the same info twice, and look for large amounts of data that you don’t really use in Splunk, like maybe certificates.

Some people choose to implement Bro log filters, which can result in a significant reduction in log volume. For example, if you filter out all S0 connections originating from outside of your organization (and you also happen to listen outside of a firewall), that alone can remove a substantial amount of log volume.

In order to reduce logging load for some of our logging plugins we’ve applied filters that do things like drop S0 connections from conn, only send traffic to/from the Internet, or only send traffic to/from sensitive zones. I have an example of one of our configs here (the example also filters out IPv6 in a way that works for pre-2.5 - now there are is_v{4,6}_subnet() functions to handle this).
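As a rough sketch of what such a filter can look like (field names are from the stock conn.log, and Site::is_local_addr() assumes Site::local_nets is populated for your networks), something along these lines drops externally-originated S0 connections from conn.log:

```zeek
@load base/protocols/conn
@load base/utils/site

event bro_init()
    {
    # Replace the default conn.log filter with one that applies a
    # predicate: return T to keep a record, F to drop it.
    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "drop-external-s0",
        $pred(rec: Conn::Info) = {
            # Drop S0 (attempt, no reply) connections whose originator
            # is outside our local networks; keep everything else.
            if ( rec?$conn_state && rec$conn_state == "S0" &&
                 ! Site::is_local_addr(rec$id$orig_h) )
                return F;
            return T;
        }
    ]);
    }
```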


I think I know where Mike’s misunderstanding comes from - the JSON logs are larger in raw size (which is what counts against license volume), but will use less space on the indexer disk than the default TSV, because the field extractions happen at search time instead of index time.

To build on some of the suggestions, you may also want to tier your logging. You can send high-value logs (those that can kick off an investigation) to Splunk, while sending lower-value logs (those that help in investigations but don’t start them) to something like ELK or ELSA / ODE. This adds a little operational overhead and involves more moving parts, but can significantly reduce your licensing costs.
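One way to implement that split within Bro itself - a sketch, assuming your Splunk forwarder is configured to monitor only specific log filenames - is to rename the output path of the lower-value streams so a different shipper (e.g. Filebeat into ELK) picks them up instead:

```zeek
# Hypothetical sketch: write conn.log under a different filename that
# the Splunk forwarder ignores and an ELK shipper tails, while
# high-value logs (notice.log, intel.log, ...) keep their default paths.
event bro_init()
    {
    local f = Log::get_filter(Conn::LOG, "default");
    f$path = "conn-elk";           # written out as conn-elk.log
    Log::add_filter(Conn::LOG, f); # same name, so it replaces "default"
    }
```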


Many thanks for your reply - that looks really useful. I’ll have a look at your examples.

Cheers, Mike


Many thanks for your reply.

That’s the one! :wink: So it won’t help much with our license usage - but it will save disk space, then.

Cheers, Mike


I’m starting to lean towards this as being the next step to look into once I’ve looked at the suggestions from the list.

More suggestions are of course very welcome :wink:

Cheers, Mike


I’ve been looking at the various ways of dropping the S0 connections from the outside but haven’t found a simple way of doing this. My Google-Fu seems to have abandoned me, hopefully just temporarily.

Would you mind sharing some examples on how you and your organisation implemented this?

Cheers, Mike