Hi,
I have managed to get TNAPI/PF_RING configured and working with PF_RING-aware libpcap. http://www.ntop.org/TNAPI.html
Looks like this will be very well suited to the Multiprocessing version of Bro.
- At the device-driver level, RSS (and Intel's Flow Director) multiplexes incoming packets across multiple receive queues (RX_Queues) on an I/OAT-capable network card, and can steer all packets belonging to a particular connection to the same RX_Queue.
- With TNAPI, these RX_Queues are polled concurrently (one kernel thread per queue), and the packets are handed to PF_RING together with the ID of the queue they arrived on.
- PF_RING provides a user-space API that lets applications like Bro read directly from the individual RX_Queues of a network interface, using notation like eth0@1, eth0@2, etc. for RX_Queues 1 and 2 of interface eth0. By assigning one thread per RX_Queue, we ensure that all packets of a given connection are processed by the same core (a minimal sketch of this pattern follows below).
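To make the eth0@N notation concrete, here is a minimal sketch of the one-thread-per-queue pattern using a PF_RING-aware libpcap. The interface name "eth0" and the queue count of 4 are assumptions; a real reader would do Bro-style per-connection work inside the callback.

/*
 * Sketch: one reader thread per RX queue, using the eth0@N per-queue
 * device names of a PF_RING-aware libpcap. Build with -lpcap -lpthread.
 */
#include <pcap.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_QUEUES 4  /* assumed number of RX queues on the NIC */

static void handle_packet(u_char *user, const struct pcap_pkthdr *hdr,
                          const u_char *bytes)
{
    /* All packets of a given connection arrive on the same queue,
     * so per-connection state can stay local to this thread. */
    (void)user; (void)hdr; (void)bytes;
}

static void *queue_reader(void *arg)
{
    int queue = (int)(long)arg;
    char dev[32], errbuf[PCAP_ERRBUF_SIZE];

    snprintf(dev, sizeof(dev), "eth0@%d", queue);  /* per-queue device */

    pcap_t *p = pcap_open_live(dev, 65535, 1, 500, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live(%s): %s\n", dev, errbuf);
        return NULL;
    }

    pcap_loop(p, -1, handle_packet, NULL);  /* read until interrupted */
    pcap_close(p);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_QUEUES];

    for (int q = 0; q < NUM_QUEUES; q++)
        pthread_create(&threads[q], NULL, queue_reader, (void *)(long)q);

    for (int q = 0; q < NUM_QUEUES; q++)
        pthread_join(threads[q], NULL);

    return 0;
}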
PF_RING and TNAPI can drastically improve the performance of any multiprocessing packet-processing application, but they need to be properly tuned and used by the application. The gain comes from three things: for Bro, packets can bypass the kernel's network stack altogether; TNAPI dedicates one polling kernel thread to each RX_Queue; and PF_RING avoids an extra copy from kernel space to user space because payloads go straight from the RX_Queue into rings that are memory-mapped into the application. Part of that tuning is keeping each queue-reader thread on its own core, as sketched below.
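As one example of the tuning step, the application can pin each queue-reader thread to a dedicated core on Linux so that a connection's packets are always handled by the same CPU. This is a rough sketch only; mapping queue N to core N is an assumption, and a real setup would follow the NIC's queue/IRQ-to-core layout.

/*
 * Sketch: pin a queue-reader thread to a specific core (Linux).
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_thread_to_core(pthread_t thread, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Usage, e.g. right after pthread_create() in the per-queue loop above:
 *     pin_thread_to_core(threads[q], q);
 */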
Configuration-wise, it took a bit of work to change Bro's configure scripts to use the PF_RING-aware libpcap instead of the libpcap that Bro ships with. With TNAPI and PF_RING running, there is a clear improvement in the kernel's ability to receive packets at high packet rates (results are on the TNAPI website; I verified them as well). Using PF_RING with the existing Bro, however, degrades performance: Bro runs as a single user thread, so once packets reach user space on different threads they still all have to be processed by the one core running Bro. From what I know of TNAPI/PF_RING, though, a multi-threaded Bro could be adapted to PF_RING and should see large performance gains.
Here is a summary of the results of a brief experiment I performed on an 8-core Intel Xeon with 32 GB RAM running Linux, with an Intel 82598EB 10 Gbps Ethernet card:
Goal: Compare a conventional Bro installation against Bro with TNAPI and PF_RING (which I call Bro-Ring).
Conclusion: Bro-Ring shows a performance drop.
Observations: For each packet rate, the table shows how many packets were accepted by the machine running Bro (the rest were lost).
Packets/sec | Bro-Ring | Bro
----------- | -------- | -------
34000       | 1368791  | 1368003
50000       | 1368546  | 1367707
65000       | 1368614  |
120000      |          | 1224761
130000      |          | 1168734
166000      | 596667   |
170000      | 561702   |
171000      | 681104   |
173000      | 618100   |
175000      | 740137   |
178000      | 864706   |
210000      |          | 753700
215000      |          | 728450
230000      | 494637   |
240000      |          | 636287
(Note: the packet rate requested via tcpreplay's input parameter differed from the packet rate actually achieved, so the packet-rate values above are not exact.)
Sunjeet Singh