Hi all,
I’m using bro 2.5.1 for network security monitoring , the message queue is kafka componment (the bro-to-kafka plugin version is v0.5.0, librdkafka version is v0.9.5).
Now i have encountered an error when network traffic up to 1.6Gbps, the error message is segment fault from src/Event.cc#90
, bro crashed.
The following listed is our test environment informations:
CPU: 32 core Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Memory: 64G
NIC: Speed 10000Mb/s
Storage: 2TB SATA, 100GB SSD
Below listed information is backtrace from core dump. (more on gist)
#0 SetNext (this=0x0, n=0x7fe292ebd490) at /opt/download/bro/src/Event.h:21 #1 EventMgr::QueueEvent (this=0xc302c0 , event=event@entry=0x7fe292ebd490) at /opt/download/bro/src/Event.cc:90 #2 0x00000000005fe6a7 in QueueEvent (obj=0x0, mgr=0x0, aid=0, src=0, vl=0x7fe2e2bedb80, h=…, this=) at /opt/download/bro/src/Event.h:88 #3 Reporter::DoLog (this=0x29aabb0, prefix=prefix@entry=0x908cd7 “error”, event=…, out=0x0, conn=conn@entry=0x0, addl=addl@entry=0x0, location=location@entry=true, time=time@entry=true, postfix=postfix@entry=0x0, fmt=fmt@entry=0x7fe36c719d70 “Kafka send failed: %s”, ap=ap@entry=0x7fe36aa3eaf8) at /opt/download/bro/src/Reporter.cc:350 #4 0x00000000005fee8f in Reporter::Error (this=, fmt=fmt@entry=0x7fe36c719d70 “Kafka send failed: %s”) at /opt/download/bro/src/Reporter.cc:76 #5 0x00007fe36c717fa9 in logging::writer::KafkaWriter::DoWrite (this=0x6369270, num_fields=, fields=, vals=0x69d2080) at /opt/download/bro/aux/plugins/kafka/src/KafkaWriter.cc:156 #6 0x000000000089e495 in logging::WriterBackend::Write (this=0x6369270, arg_num_fields=, num_writes=1000, vals=0x6dc7bf0) at /opt/download/bro/src/logging/WriterBackend.cc:301 #7 0x0000000000662180 in threading::MsgThread::Run (this=0x6369270) at /opt/download/bro/src/threading/MsgThread.cc:371 #8 0x000000000065eaa8 in threading::BasicThread::launcher (arg=0x6369270) at /opt/download/bro/src/threading/BasicThread.cc:205 #9 0x00007fe36e8ce2b0 in ?? () from /lib64/libstdc++.so.6 #10 0x00007fe36ed2ce25 in start_thread () from /lib64/libpthread.so.0 #11 0x00007fe36e03634d in clone () from /lib64/libc.so.6
Varibles on frame 1
(gdb) f 1 #1 EventMgr::QueueEvent (this=0xc302c0 , event=event@entry=0x7fe292ebd490) at /opt/download/bro/src/Event.cc:90 90 tail->SetNext(event); (gdb) info args this = 0xc302c0 event = 0x7fe292ebd490 (gdb) info locals done = (gdb) p head $1 = (Event *) 0x7fe3540c81c0 (gdb) p tail $2 = (Event *) 0x0 (gdb) p event
$3 = (Event *) 0x7fe292ebd490
During test, the variable tail
is NULL pointer always when bro crashed, however the variable head
is NULL or not.
on my research, in the huge network traffic scenario, KakfaWriter write log to kafka exceed the limit of
configuration queue.buffering.max.messages(default is 100000)
or queue.buffering.max.kbytes(default is 4000000, 4G in other words)
in librdkafka,
and QUEUE_FULL error raised by librdkafka, then KafkaWriter call Reporter::Error to report the runtime error.
so KafkaWriter::DoWrite lead too many action to call Reporter::Error function.
I guess the issue cause with concurrency access to the varible tail
without lock, then it set to be a NULL pointer, but i don’t know why.
Then call the function SetNext
with the NULL pointer, segment fault was raised.
The above is my guesswork, maybe there is another reason.
Wish someone could help.
Best regards,
Myth