Issue: load balancer PF_RING drops 25% of incoming packets

Hey All,

I am running Bro with PF_RING load balancing configured.

Strangely, only 16 million of my 21 million input packets pass through the PF_RING kernel module. Those that do are then distributed correctly across the Bro processes.

How can I avoid this loss of 5 million packets, and how can I verify that PF_RING is configured correctly?

I use Intel Corporation I350 Gigabit Network Connection NICs. They use the igb driver.

The input rate is 0.5 Gb/s, i.e. 60k to 80k packets/s, and currently I am working without the ZeroCopy drivers.

I have verified that all of my 21 million packets are received by the NIC's driver.

The PF_RING kernel module is loaded and Bro is running with load balancing.

Looking forward to your response; I hope we can solve this problem together. Below you will find more detailed information about my system.

If you need anything else, let me know.

Best,

Enno

Additional information:

One interesting fact: I cannot run "make" in "PF_RING/userland/examples", because

gcc: error: ../libpcap/libpcap.a: No such file or directory

PF_RING/userland looks like this. Indeed "libpcap" is missing:

c++ examples examples_zc fast_bpf go lib libpcap-1.7.4 Makefile snort tcpdump-4.7.4

PF_RING module

[root@slinky-3-4 ~]# cat /proc/net/pf_ring/info
PF_RING Version : 6.5.0 (dev:9e221bc0b91040afee98f3e3c22ce83226f63f3e)
Total rings : 0

Standard (non ZC) Options
Ring slots : 32768
Slot version : 16
Capture TX : No [RX only]
IP Defragment : No
Socket Mode : Standard
Total plugins : 0
Cluster Fragment Queue : 0
Cluster Fragment Discard : 0

Used NICs

[rosinger@slinky-3-4 ~]$ lspci | egrep -i --color 'network|ethernet'
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

NIC drivers

[rosinger@slinky-3-4 ~]# ethtool -i eno2
driver: igb
version: 5.2.15-k
firmware-version: 1.61, 0x80000cd5, 1.1067.0
bus-info: 0000:02:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Enno Rosinger
Student DualStudy Business Informatics - Application Management

+49 617 22685124 Office
Hewlett-Packard-Straße 1| 61352 Bad Homburg | Germany
enno.rosinger@hpe.com


> Strangely, only 16 million of my 21 million input packets pass through the PF_RING kernel module. Those that do are then distributed correctly across the Bro processes.
> How can I avoid this loss of 5 million packets, and how can I verify that PF_RING is configured correctly?

What are you using to measure the difference in packet counts? Where are the 21 million and 16 million figures coming from?

Can you add this to your local.bro and see what it logs to capture_loss.log after 30 minutes or so?

    @load misc/capture-loss
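
If it helps, something like this is all it should take (the paths assume a default /usr/local/bro prefix; adjust for your install):

    echo '@load misc/capture-loss' >> /usr/local/bro/share/bro/site/local.bro
    broctl install      # push the updated config to the workers
    broctl restart
    # after ~30 minutes of traffic:
    cat /usr/local/bro/logs/current/capture_loss.log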

> One interesting fact: I cannot run "make" in "PF_RING/userland/examples", because
> gcc: error: ../libpcap/libpcap.a: No such file or directory
>
> PF_RING/userland looks like this. Indeed "libpcap" is missing:
> c++ examples examples_zc fast_bpf go lib libpcap-1.7.4 Makefile snort tcpdump-4.7.4

This should fix your build issue:

    cd PF_RING/userland
    ln -s libpcap-1.7.4 libpcap
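
Once that builds, pfcount from the examples directory is also a handy sanity check of what pf_ring itself sees on the interface, independent of Bro (the interface name is just an example; stop it with Ctrl-C to get the final stats):

    cd PF_RING/userland/examples
    make
    ./pfcount -i eno2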

Hi Justin,

Thank you for the fast reply.

21 million received packets: Bro receives its traffic on an isolated network (the traffic is generated on another server by tcpreplay). I manually take the NIC's received-packet stats before and after a replay by issuing "ifconfig eno2" (the interface name).
16 million handled packets: In broctl I issue the command "netstats" to see the number of packets received by each worker process. Summing those gives 16 million (NOTE: now 18 million, as I have upgraded to the Zero Copy drivers since the last mail).

###Ifconfig on Bro system###
###Before replaying###
[root@slinky-3-4 kernel]# ifconfig eno2
[...]
        RX packets 25758824 bytes 20353552393 (18.9 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 182 bytes 36558 (35.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [...]

###After replaying###
[root@slinky-3-4 kernel]# ifconfig eno2
[...]
        RX packets 47447181 bytes 37400251832 (34.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 268 bytes 54486 (53.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [...]

That makes 47447181 - 25758824 = 21,688,357 received packets.

###netstats in broctl on Bro system###
### after replaying ###
[BroControl] > netstats
worker-1-1: 1469577816.953862 recvd=5088052 dropped=0 link=5088052
worker-1-2: 1469577817.153796 recvd=4205599 dropped=0 link=4205599
worker-1-3: 1469577817.353889 recvd=4562288 dropped=0 link=4562288
worker-1-4: 1469577817.554795 recvd=4546975 dropped=0 link=4546975

The sum of these is 18,402,914 packets, which Bro sees as being "on the link".
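
For completeness, this is roughly how I sum those counters (just a quick shell sketch over the broctl netstats output; adjust if your output format differs):

    broctl netstats | awk -F'recvd=' '{ split($2, f, " "); sum += f[1] } END { print sum }'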

Thanks to your help on the build issue, I can also support this number with the stats from pfcount (NOTE: this is another run, so the numbers are slightly different):

###pfcount result###
Absolute Stats: [18'416'555 pkts total][0 pkts dropped][0.0% dropped]
[18'416'555 pkts rcvd][17'225'248'719 bytes rcvd][58'886.73 pkt/sec][440.62 Mbit/sec]
=========================
Actual Stats: [0 pkts rcvd][722.14 ms][0.00 pps][0.00 Gbps]

As requested, the capture_loss stats are below. I currently do not understand what the issue is with them.
I hope you can help me track down the cause of these numbers ...
###first capture_loss file###
#path capture_loss
#open 2016-07-26-16-48-14
#fields ts ts_delta peer gaps acks percent_lost
#types time interval string count count double
1469576894.926898 900.000078 worker-1-4 1156978 1683888 68.708726
1469576894.926602 900.000073 worker-1-1 1396713 1911004 73.087916
1469576894.977632 900.000080 worker-1-2 1055436 1544723 68.32526
1469576895.027647 900.000080 worker-1-3 1218489 1710519 71.235046
#close 2016-07-26-17-00-0

###second capture_loss file###
#open 2016-07-26-17-03-37
#fields ts ts_delta peer gaps acks percent_lost
#types time interval string count count double
1469577794.926695 900.000093 worker-1-1 0 0 0.0
1469577794.977721 900.000089 worker-1-2 0 0 0.0
1469577795.027754 900.000107 worker-1-3 0 0 0.0
1469577794.927012 900.000114 worker-1-4 0 0 0.0
#close 2016-07-26-17-05-030

Looking forward to your response. It already helps me a lot to have more support on this issue.

Best,
Enno

Ah, do I understand that to mean that pfcount is also showing only 18 million packets received? If that is the case, you should probably reach out to the pf_ring people and see if they have any ideas.

If pfcount and Bro both agree on the number of packets received, the problem is probably not within Bro.

The one thing I can think of is that you have not disabled NIC offloading, with something like:

    for i in rx tx sg tso ufo gso gro lro; do ethtool -K eno2 $i off; done

so that while the system is receiving 21 million packets on the wire, receive offloads (e.g. GRO/LRO) are reassembling them into only 18 million larger packets before Bro sees them.
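
You can check what the driver currently has enabled with something like this (the exact flag names vary a bit between driver versions):

    ethtool -k eno2 | egrep -i 'segmentation|offload'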

Hey Justin,

Thanks for the good advice.
I'll test whether turning off the offloading helps, because it could indeed be that the packets are being reassembled.
Do you think the PF_RING packet difference could also be caused by badly configured IRQ affinity? Philosnef suggested that in a separate mail.
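
In case it matters, my understanding is that the current IRQ distribution can be inspected roughly like this (the IRQ numbers are system-specific, so <N> is just a placeholder):

    grep eno2 /proc/interrupts          # IRQs used by the NIC queues
    cat /proc/irq/<N>/smp_affinity      # CPU mask a given IRQ <N> is pinned to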

And I agree that Bro seems to be working fine, since the numbers match pretty well.
I am looking forward to the pf_ring people's response to see what their opinion is.

Thanks again for the support. I'll let you know once I have fixed it.

Best,
Enno

Hey Everyone,

With your hint to disable the offloading features, I can now see all my packets as expected and Bro is distributing them evenly.
I consider this issue fixed for now. Thanks for all the help and advice - I appreciate it.
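
One follow-up note for anyone hitting the same problem: the ethtool offload settings do not survive a reboot, so they need to be re-applied at boot, e.g. with a small script along these lines (a sketch; hook it into rc.local, a systemd unit, or your distro's network scripts as appropriate):

    #!/bin/sh
    # disable NIC offloading on the capture interface (assumed to be eno2)
    for i in rx tx sg tso ufo gso gro lro; do
        ethtool -K eno2 "$i" off
    done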

Best,
Enno