We’re currently using Endace DAG capture cards to feed directly to bro, snort, and a rolling packet capture.
The network we’re currently looking at has a high number of retransmissions (at one point we counted 45% of the traffic as retransmissions).
Bro is currently logging each packet as a separate connection in conn.log, and is failing to run the protocol analyzers correctly (e.g. it’ll detect a session as FTP, but will only log the action, not the login or the response).
What’s weird is that if I run bro against the rolling pcap, it works correctly. This problem only occurs when bro is listening to the device directly.
This problem is still occurring with 2.3.1, so I’m at a loss. I enabled the capture-loss module, and it’s reporting 0%. The capture card doesn’t seem to be dropping anything either.
Seen anything similar or have any suggestions for troubleshooting/fixing?
If I were to bet, I'd guess it has something to do with how the Endace card is load-balancing packets across your bro workers. If the retransmission packets are ending up on different workers than the original session, then each worker will think it's got a new session, and log it accordingly.
How do you have the Endace card configured? (for the 9.2X2 I have, n_tuple_select is the pertinent config option.)
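To make the load-balancing point concrete, here is a minimal sketch of direction-independent flow hashing. It is not how the Endace card implements n_tuple_select; the function names, worker count, and addresses are all made up for illustration. The idea is only that if the worker is chosen from the addresses and ports alone, both directions of a session and any retransmissions land on the same worker.

```python
import zlib

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a key that is identical for A->B and B->A.

    Sorting the two endpoints removes the direction, so requests and
    replies of one session produce the same key.
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{a[0]}:{a[1]}|{b[0]}:{b[1]}"

def pick_worker(key, n_workers):
    # crc32 is stable between runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode()) % n_workers

N_WORKERS = 4  # hypothetical number of bro workers

# Hypothetical FTP session, seen in both directions.
orig = flow_key("10.0.0.5", 40000, "192.168.1.10", 21, "tcp")
reply = flow_key("192.168.1.10", 21, "10.0.0.5", 40000, "tcp")

# A retransmission carries the same addresses and ports as the original
# packet, so it produces the same key and therefore the same worker.
print(pick_worker(orig, N_WORKERS) == pick_worker(reply, N_WORKERS))  # True
```

A scheme that spreads traffic packet by packet (round robin, for example) does not have this property, and that is the case where each worker would believe it is seeing a new session.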
I think if it were getting multiple copies of each packet, it’d be logging those as well. The dup3 on the card just duplicates the stream so each process receives the same traffic. If a process attaches to the stream (dag0:0, :2, and :4) that stream can’t be attached from another process.
Looking back, we had some time over the weekend when the traffic was a little slower and there weren’t any errors. Likewise, we ran a test at a previous site (a mockup of this one) that didn’t have the issue. We’ve been using this setup for a few years, and this is the first time I’ve seen this issue. It’s just really weird.
I wonder how I could check to see if there’s something causing bro to break when talking to the card? It’s just weird that I can’t recreate it anywhere else.
I just had an epiphany when thinking about your response: the site we’re dealing with configured the SPAN incorrectly. Looking back through, I’m seeing every packet twice. I think they configured it to SPAN every port on that switch, including the SPAN. I’m going to have them fix that, maybe use the VLANs as the source as I had originally directed them.
I had them correct the issue; they were in fact sending us duplicate packets.
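For anyone wanting to check for this kind of SPAN duplication, a rough sketch follows. It assumes you already have the packets as (timestamp, raw bytes) pairs in time order, e.g. from whatever pcap reader you prefer; the function name and the 10 ms window are arbitrary choices, not anything from bro or the DAG tools. It counts frames that are byte-for-byte identical to one seen very recently. A genuine TCP retransmission will often differ in the IP header (the ID field, for instance), so it usually would not be counted here, but treat that as a rule of thumb rather than a guarantee.

```python
import hashlib
from collections import OrderedDict

def count_duplicates(packets, window=0.01):
    """Count byte-identical frames seen again within `window` seconds.

    packets: iterable of (timestamp_seconds, raw_bytes), in time order.
    Returns (duplicate_count, total_count).
    """
    seen = OrderedDict()  # digest -> timestamp of most recent sighting
    dupes = 0
    total = 0
    for ts, raw in packets:
        total += 1
        # Evict digests whose last sighting is older than the window.
        while seen:
            _, oldest_ts = next(iter(seen.items()))
            if ts - oldest_ts > window:
                seen.popitem(last=False)
            else:
                break
        digest = hashlib.sha1(raw).digest()
        if digest in seen:
            dupes += 1
        seen[digest] = ts
        seen.move_to_end(digest)  # keep entries ordered by last sighting
    return dupes, total

# Synthetic input: the second frame repeats the first almost immediately;
# the last frame has the same bytes but arrives far outside the window.
sample = [(0.0, b"A"), (0.0001, b"A"), (0.5, b"B"), (2.0, b"A")]
print(count_duplicates(sample))  # (1, 4)
```

If the duplicate count comes out close to half of the total, every packet is probably being delivered twice.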
The issue still persists, though, and I’m not sure what else to check.
Is there anything like a timeout that could be set that’d be causing it to assume the connection never completed?
I’ve just found out one of our other teams has been having a similar problem, but with 2.3 only. They had discussed it with Liam, but never found a resolution. It’s weird that I’m having the same issue with both 2.1 and 2.3.
Could you send me some lines from conn.log?
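Alongside a few raw lines, a quick tally can show whether conn.log really has one entry per packet. The sketch below assumes the default tab-separated bro log format with a "#fields" header line and uses the standard id.orig_h, id.orig_p, id.resp_h, id.resp_p, proto and conn_state columns. The function name is made up, and the sample lines are synthetic with a shortened field list, purely to show the shape of the input.

```python
from collections import Counter

def tally_conn_log(lines):
    """Count conn.log entries per 5-tuple and per conn_state.

    lines: iterable of text lines from a tab-separated bro conn.log.
    Returns (per_flow, states) as two Counter objects.
    """
    fields = []
    per_flow = Counter()
    states = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header line names the columns; the first token is "#fields".
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # other header/footer lines
        rec = dict(zip(fields, line.split("\t")))
        key = (rec.get("id.orig_h"), rec.get("id.orig_p"),
               rec.get("id.resp_h"), rec.get("id.resp_p"), rec.get("proto"))
        per_flow[key] += 1
        states[rec.get("conn_state")] += 1
    return per_flow, states

# Synthetic example input, NOT real log data; real logs have more columns.
SAMPLE = [
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tconn_state",
    "1.0\tC1\t10.0.0.5\t40000\t192.168.1.10\t21\ttcp\tOTH",
    "1.1\tC2\t10.0.0.5\t40000\t192.168.1.10\t21\ttcp\tOTH",
    "1.2\tC3\t10.0.0.5\t40000\t192.168.1.10\t21\ttcp\tOTH",
    "2.0\tC4\t10.0.0.6\t40001\t192.168.1.10\t80\ttcp\tSF",
]

per_flow, states = tally_conn_log(SAMPLE)
for flow, count in per_flow.most_common(3):
    print(count, flow)
print(dict(states))
```

To run it against a real log, pass the open file instead of SAMPLE, e.g. `with open("conn.log") as f: per_flow, states = tally_conn_log(f)`. Many entries for the same 5-tuple, mostly in a state like OTH, would fit the "one connection per packet" symptom.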