Hi,
Just wondering if anyone has run into this problem. I'm running Robin's 1.4 cluster code on a stand-alone AMD dual-core machine that is monitoring a 200 Mbit connection. I've been running this setup for a couple of months, and it has been working well.
I noticed that Bro seemed to be missing some packets, and Seth told me to look for DroppedPackets in the notice.log. From what I recall, I was dropping upwards of 90% of the packets after filtering. The strange thing was the primary bro process CPU usage seemed to be low (20-30%), even though it was dropping most of the packets. I would have expected CPU to be high in trying to keep up. Is there some throttling mechanism to prevent the CPU from being maxed out?
I turned on some restrict filters to bring the DroppedPackets down. I turned off various things like HTTP, HTTPS, DNS, IMAP, SMTP, and a few others. After that, the ratio of packets dropped after filtering to packets received was less than one percent.
So that is the history, and here is my problem. When I start the cluster, the above-mentioned ratio is less than 1%, and it stays below 1% for several days to a week. Then, for no apparent reason, it jumps as high as 99% and stays stuck there. This has happened twice in the last couple of weeks. Whatever caused this problem also caused the logs to stop being rotated. They are showing a timestamp of June 7th (MST). When I run a cluster status, it shows the cluster is running, and I can see both bro-1.4-robin processes running, but their CPU usage has dropped to 0.00%. I think the CPU usage for one of those processes had typically been around 16-20% when the dropped ratio was less than 1%.
Restarting the cluster should clear the problem again for a few days. Is there any other troubleshooting I can do before restarting to determine the cause of the problem?
Below are a few lines showing the high ratio of dropped packets. These are some of the last lines logged, so based on the timestamps, everything stopped around 01:48 GMT on June 8, 2009.
1244425212.229043:DroppedPackets:NOTICE_FILE:bro::::::::::317741 packets dropped after filtering, 387325 received, 1117174 on link::@26-c775-10c8ed
1244425222.229050:DroppedPackets:NOTICE_FILE:bro::::::::::16000 packets dropped after filtering, 68358 received, 192124 on link::@26-c775-10c91b
1244425236.737126:DroppedPackets:NOTICE_FILE:bro::::::::::311615 packets dropped after filtering, 341963 received, 979510 on link::@26-c775-10c939
1244425247.488693:DroppedPackets:NOTICE_FILE:bro::::::::::79770 packets dropped after filtering, 172843 received, 477468 on link::@26-c775-10c981
1244425261.942659:DroppedPackets:NOTICE_FILE:bro::::::::::315110 packets dropped after filtering, 315358 received, 878859 on link::@26-c775-10c987
1244425690.711860:DroppedPackets:NOTICE_FILE:bro::::::::::10878167 packets dropped after filtering, 10878305 received, 31316595 on link::@26-c775-10c994
Here are some earlier logs showing what the ratio normally looks like.
1243649373.579172:DroppedPackets:NOTICE_FILE:bro::::::::::1752 packets dropped after filtering, 333996 received, 612544 on link::@88-1bd3-bd5af
1243649383.579295:DroppedPackets:NOTICE_FILE:bro::::::::::1333 packets dropped after filtering, 342521 received, 627890 on link::@88-1bd3-bd5c2
1243649393.579339:DroppedPackets:NOTICE_FILE:bro::::::::::920 packets dropped after filtering, 326511 received, 605614 on link::@88-1bd3-bd5d1
1243649403.579424:DroppedPackets:NOTICE_FILE:bro::::::::::1336 packets dropped after filtering, 318679 received, 615016 on link::@88-1bd3-bd5ec
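In case it helps anyone compare the two sets of lines, here is a rough Python sketch that pulls the counts out of a DroppedPackets line and computes the dropped-to-received ratio along with the GMT time. It assumes the colon-delimited layout shown above; the function name `parse_dropped` and the regular expression are illustrative only and are not part of Bro.

```python
import re
from datetime import datetime, timezone

# Matches the leading epoch timestamp and the three counters in the
# message text of a DroppedPackets notice, as laid out in the lines above.
# (Hypothetical helper pattern; not something Bro itself provides.)
DROPPED_RE = re.compile(
    r"^(?P<ts>\d+\.\d+):DroppedPackets:.*?"
    r"(?P<dropped>\d+) packets dropped after filtering, "
    r"(?P<received>\d+) received, (?P<link>\d+) on link"
)

def parse_dropped(line):
    """Return (utc_time, dropped, received, ratio), or None if no match."""
    m = DROPPED_RE.search(line)
    if m is None:
        return None
    dropped = int(m.group("dropped"))
    received = int(m.group("received"))
    when = datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc)
    # Guard against a zero "received" count to avoid dividing by zero.
    ratio = dropped / received if received else 0.0
    return when, dropped, received, ratio

if __name__ == "__main__":
    sample = (
        "1243649373.579172:DroppedPackets:NOTICE_FILE:bro::::::::::"
        "1752 packets dropped after filtering, 333996 received, "
        "612544 on link::@88-1bd3-bd5af"
    )
    when, dropped, received, ratio = parse_dropped(sample)
    print(when.strftime("%Y-%m-%d %H:%M:%S"), f"{ratio:.2%}")
```

Run against the lines in this message, the earlier samples come out at roughly 0.5%, while the last two lines before logging stopped are above 99.9%.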
Tyler