Good day everyone,
I wanted to share what I have done regarding uploading my Bro cluster's logs to Azure HDInsight Blob storage, then each day creating HIVE tables from the logs. My Bro cluster averages around 55-60Gbps, so sorting through logs via zgrep, or even Elastic, is far from ideal. I found that searching all of a day's logs for a specific file ID took around 30 minutes, but with HDInsight it was under a minute.
Now I am sure there are some big data scientists on this forum, so forgive my newbness on Hadoop (HDInsight is Hortonworks Hadoop). I am also only a basic Python programmer, so the script is simple.
It is my hope that this work helps someone else, or at least gets them started. I will see if I can sanitize my Bro cluster build documents and send those out as well, in the hope that they also help.
Here we go:
First I modify each Bro log file name to be Hadoop naming-convention friendly (Hadoop doesn't like colons in path names), then each hour I upload my log files with the below python script:
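To illustrate just the renaming pass (this is only a minimal sketch of that step, not the actual upload script; the directory path and function names here are hypothetical), rotated Bro logs carry timestamps with colons, which can be swapped for underscores before upload:

```python
import os

def hadoop_safe(name: str) -> str:
    """Replace colons, which Hadoop path names reject, with underscores."""
    return name.replace(":", "_")

def sanitize_dir(log_dir: str) -> None:
    """Rename every file in log_dir whose name contains a colon."""
    for fname in os.listdir(log_dir):
        if ":" in fname:
            os.rename(os.path.join(log_dir, fname),
                      os.path.join(log_dir, hadoop_safe(fname)))

# Example: a rotated Bro log name before and after sanitizing
print(hadoop_safe("conn.14:00:00-15:00:00.log.gz"))
```

Running the example prints `conn.14_00_00-15_00_00.log.gz`, a name that Hadoop and Blob storage paths will accept.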