problem ingesting bro json logs into splunk

We are getting a spurious sourcetype when ingesting bro json logs into splunk.

Specifically, we are getting a sourcetype of bro_00. There is no log file named this, and the Splunk forwarder is just pushing the raw logs into Splunk for indexing. There is no massaging of the log data. Does anyone know why this sourcetype is popping up?

Do you have the Splunk TA installed? (https://splunkbase.splunk.com/app/1617/)

The TA will dynamically create sourcetypes based on the log name.

Dynamic source typing based on log filename

Match: conn.log, bro.conn.log, md5.bro.conn.log, whatever.conn.log

[BroAutoType]
DEST_KEY = MetaData:Sourcetype
SOURCE_KEY = MetaData:Source
REGEX = ([a-zA-Z0-9-]+)(?:.[0-9-])?(?:.[0-9:-])?.log
FORMAT = sourcetype::bro_$1
WRITE_META = true

There are no 00.log files in Bro, so the automatic generation of the sourcetype bro_00 makes no sense. It does not follow the standard sourcetype pinning that all the other log files generate. Running find . -name "00*" in the parent logs directory reports zero logs of this type. This only occurred when we moved off of the Bro standard log format to JSON format.
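For what it's worth, the BroAutoType regex quoted earlier can pull a trailing "00" out of a longer rotated-log filename, which would produce a bro_00 sourcetype even though no 00.log file exists. A quick sketch in Python; the rotated filename below is hypothetical, so substitute whatever names your JSON log rotation actually produces:

```python
import re

# Regex exactly as it appears in the TA's BroAutoType stanza.
BRO_AUTOTYPE = r"([a-zA-Z0-9-]+)(?:.[0-9-])?(?:.[0-9:-])?.log"

def sourcetype_for(source):
    """Mimic FORMAT = sourcetype::bro_$1 applied to the source path."""
    m = re.search(BRO_AUTOTYPE, source)
    return "bro_" + m.group(1) if m else None

print(sourcetype_for("conn.log"))                    # bro_conn, as intended
# Hypothetical rotated name: the leftmost full match lands on the final "00".
print(sourcetype_for("conn.00:00:00-01:00:00.log"))  # bro_00
```

If that matches what you are seeing, the culprit is the regex against your file naming, not the forwarder.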


We’ve used Bro and Splunk at our organization for a couple of years now. We use Splunk props and transforms configs to ingest the Bro logs in the format we want, with additional attributes and aliases.

transforms.conf:

[remove_hash_comments]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

[bro_conn_extractions]
DELIMS = "\t"
FIELDS = ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,duration,orig_bytes,resp_bytes,conn_state,local_orig,local_resp,missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes,tunnel_parents,orig_cc,resp_cc,sensorname
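As an aside, a FIELDS list like the one above does not have to be maintained by hand; it can be derived from the log's own #fields header line. A small Python sketch (the header here is truncated for brevity):

```python
def fields_for_splunk(header_line):
    """Turn a Bro '#fields' header line into a transforms.conf FIELDS value."""
    assert header_line.startswith("#fields\t")
    return ",".join(header_line.rstrip("\n").split("\t")[1:])

header = "#fields\tts\tuid\tid.orig_h\tid.orig_p\tproto\n"
print(fields_for_splunk(header))  # ts,uid,id.orig_h,id.orig_p,proto
```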

props.conf:

[bro_conn]
REPORT-bro_conn_extract = bro_conn_extractions
TRANSFORMS-sourcetype = remove_hash_comments
SHOULD_LINEMERGE = false
TRUNCATE = 0
KV_MODE = none
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s.%6N

inputs.conf:

[monitor:///your_log_path/log/bro/conn.log]
index = bro_conn
sourcetype = bro_conn
_TCP_ROUTING = primary_indexers

brett

Ah, that's for the tab-delimited logs, not the JSON logs, though. I actually did it that way for years; I even have a Python program that helps you generate the config:

https://github.com/JustinAzoff/bro_scripts/blob/2.0/generate_splunk_configs.py

But, I wouldn't use this method - the splunk TA app for bro is better.

As far as I know the transforms/props method only does the field lookups at search time, not at index time like the TA app configures.

Whenever the bro logs change and a column is added or removed, all those search time field lookups break.
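To make that concrete: positional DELIMS/FIELDS extraction silently mis-maps every field after an inserted column. A toy sketch (the inserted column name is made up):

```python
# Field names taken from the start of the bro_conn FIELDS list above.
fields = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h"]

old_row = "1454000000.123456\tCk8fA1\t10.0.0.1\t52000\t192.168.1.5"
# Suppose Bro adds a new column (hypothetical "community_id") after uid:
new_row = "1454000000.123456\tCk8fA1\t1:abcd\t10.0.0.1\t52000"

old = dict(zip(fields, old_row.split("\t")))
new = dict(zip(fields, new_row.split("\t")))

print(old["id.orig_h"])  # 10.0.0.1 -- correct
print(new["id.orig_h"])  # 1:abcd   -- shifted; every later field is wrong too
```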

The Bro TA assumes TSV extractions. The move to JSON is probably causing the Splunk auto-sourcetyper to do some funky things.

[source::...bro..log]
SHOULD_LINEMERGE = false
TRUNCATE = 0
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %s.%6N
TRANSFORMS-BroAutoType = BroAutoType, TrashComments
INDEXED_EXTRACTIONS = TSV
FIELD_HEADER_REGEX = ^#fields\t(.*)
FIELD_DELIMITER = \t
FIELD_QUOTE = \t
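For JSON-format logs, a props stanza along these lines should work instead. This is only a sketch under assumptions: the sourcetype name is made up, and it uses search-time KV_MODE = json rather than index-time INDEXED_EXTRACTIONS = json, which is the lighter-weight choice for index size:

```
[bro_conn_json]
SHOULD_LINEMERGE = false
TRUNCATE = 0
KV_MODE = json
TIME_PREFIX = "ts":
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %s.%6N
```

TIME_PREFIX/TIME_FORMAT assume Bro's default epoch-seconds ts values; if you log ISO timestamps instead, adjust TIME_FORMAT accordingly.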

Sorry, hope I’m not hijacking; a quick question very closely related to this: is the Splunk app for Bro that Brandon linked to here supposed to parse out all the various Bro 2.4.1 log types’ fields correctly?
In other words, is the latest version of the Splunk TA for Bro supposed to parse Bro log fields properly, given the way the log fields/columns are laid out in Bro 2.4.1? I think the Splunk Add-on for Bro IDS was written for Bro 2.1 or 2.2. Do the changes made in subsequent versions such as 2.4.1 break the fields being parsed out in Splunk when using this add-on, or does Splunk need to update the add-on to work properly with Bro 2.4.1?

Thank you,

-Drew

The problem with the Splunk app is that indexing is occurring at time of ingest. This causes the indices of the Bro data to grow extremely fast. Using JSON and not the Bro app means the data is simply indexed by Splunk (fields come out at search time), resulting in far smaller indices on the Splunk indexing servers. This is specifically why we moved away from TSV and to JSON: it was nuking disk storage for those indices.


Odd, I'd expect it to be about the same. The indexed data should be the same, and even though every json record includes the field names, they compress well.

It's possible that the bro app indexing the fields individually is what makes the indexes larger... if you do something like

    id_resp_p=6379

(or whatever the field shows up as for you)

does that find the records immediately, or does it have to scan through all the data?

without individual field indexes you would have to do something like

    6379 id_resp_p=6379

and hope that speeds it up, if you're trying to do something like

    id_orig_p=80

Then this will be pretty slow:

    80 id_orig_p=80

Drew,

It should work just fine, assuming TSV headers are present, as it keys off the headers for the extractions.

I would assume your index volume (license usage) is significantly greater though?

You’re right, the raw-to-indexed ratio is abysmal with TSV; we’re getting 0.29:1, so the indexes come out at over three times the size of the raw data.

It is the tsidx files in Splunk that grow out of control when using the Bro TSV app. Hope this helps for anyone interested.



I do wonder if it’s even faster having the pre-search-time extractions in the tsidx files. I suppose if you’re going for a specific IP, the bloom filters may help?

I’ve been really hesitant to move to JSON, simply because of the added raw volume impact on licensing. Bro is already over 250GB/day for us using TSV files.

Heh. We have a multi-TB license for splunk… Bro is one of the largest consumers of that license…


I’ve attached a modified version of the Splunk TA for bro, that accommodates bro logs in json format. Let me know if you have any problem with it.

Thanks,
Steve

TA-bro_json.tar.gz (4.14 KB)