How can I reduce my packet loss - bro version 2.6.3

Hello,

Packet loss on my workers is pretty high. I have tried things like CPU pinning, but it's still high. Can you please help me get this under 1%? Below are some of my settings.

cat capture_loss.log (percent_lost ranges from about 2.7 to 67.69)

#fields ts ts_delta peer gaps acks percent_lost

#types time interval string count count double

1569863104.831317 900.000030 worker-1-3 114384 3799935 3.010157

1569863104.851327 900.000002 worker-1-1 162671 3677320 4.423629

1569863104.841705 900.000062 worker-1-9 100444 3393374 2.960004

1569863104.843460 900.000058 worker-1-11 148576 4171807 3.56143

1569863104.855034 900.000116 worker-1-16 165242 3769560 4.383589

1569863104.937666 900.000094 worker-1-23 124377 3891351 3.196242

1569863104.811991 900.000040 worker-1-12 339309 3176448 10.682026

1569863104.853635 900.000036 worker-1-7 304519 3266968 9.32115

1569863105.107706 900.000013 worker-1-8 296921 3475658 8.542872

1569863117.781723 900.739890 worker-1-15 635032 1385280 45.841418

1569863114.375886 900.000010 worker-1-17 631085 2596009 24.309816

1569863118.295945 900.001869 worker-1-6 369130 545290 67.694254

1569863160.141238 900.000229 worker-1-22 774134 1785146 43.365305

1569864004.845052 900.001592 worker-1-11 108257 3871564 2.796208

1569864004.860404 900.000040 worker-1-14 111798 3327087 3.360237

1569864004.937679 900.000013 worker-1-23 148738 3568913 4.167599

1569864004.951235 900.000218 worker-1-19 96672 3509661 2.754454

1569864004.976291 900.000009 worker-1-21 152430 3736550 4.079432

1569864005.211316 900.000025 worker-1-4 176180 3226673 5.460113

1569864005.193565 900.000005 worker-1-10 148535 3986455 3.725992

1569864004.811996 900.000005 worker-1-12 289760 3487997 8.307347

1569864005.107761 900.000055 worker-1-8 270405 3340848 8.093903

1569864014.375894 900.000008 worker-1-17 517282 2462898 21.002981

1569864018.295953 900.000008 worker-1-6 362966 571162 63.548695

1569864060.141335 900.000097 worker-1-22 601802 1520920 39.568288
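
To see at a glance which workers are dropping the most, the rows above can be aggregated per peer. The following is a minimal Python sketch, not part of Bro itself. It assumes whitespace-separated rows in the field order shown in the `#fields` header, and it recomputes the percentage as gaps / acks * 100, which matches the percent_lost values in the log. The function names `summarize_loss` and `worst_first` are made up for illustration, and `SAMPLE` holds just four rows copied from the output above.

```python
from collections import defaultdict

# Four rows copied from the capture_loss.log output in this post.
SAMPLE = """\
1569863104.831317 900.000030 worker-1-3 114384 3799935 3.010157
1569863118.295945 900.001869 worker-1-6 369130 545290 67.694254
1569864018.295953 900.000008 worker-1-6 362966 571162 63.548695
1569864004.845052 900.001592 worker-1-11 108257 3871564 2.796208
"""


def summarize_loss(text):
    """Return {peer: (total_gaps, total_acks, percent_lost)} over all rows."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        # Skip blank lines and Bro header lines such as #fields / #types.
        if not line.strip() or line.startswith("#"):
            continue
        _ts, _ts_delta, peer, gaps, acks, _pct = line.split()
        totals[peer][0] += int(gaps)
        totals[peer][1] += int(acks)
    return {
        peer: (g, a, 100.0 * g / a if a else 0.0)
        for peer, (g, a) in totals.items()
    }


def worst_first(summary):
    """Sort peers by aggregated percent_lost, highest first."""
    return sorted(summary.items(), key=lambda kv: kv[1][2], reverse=True)


if __name__ == "__main__":
    for peer, (gaps, acks, pct) in worst_first(summarize_loss(SAMPLE)):
        print(f"{peer:12s} gaps={gaps:8d} acks={acks:8d} lost={pct:6.2f}%")
```

To run it against the real file, read capture_loss.log into a string and pass it in place of `SAMPLE`. A handful of workers far above the rest usually points at uneven load rather than a uniformly undersized box.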

cat node.cfg (below is the worker section of node.cfg; as you can see, I pinned 23 CPUs)

[worker-1]

type=worker

host=localhost

interface=af_packet::ens2f0

lb_method=custom

#lb_method=pf_ring

lb_procs=23

pin_cpus=5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27

af_packet_fanout_id=25

af_packet_fanout_mode=AF_Packet::FANOUT_HASH

This server has 32 CPUs. The CPU model name is AMD Opteron™ Processor 6386 SE.

CPU MHz: 2800.000

CPU max MHz: 2800.0000

CPU min MHz: 1400.0000

Please assist. Thanks.

Thanks,

People have been having issues with older Opterons like that for a long time. They have a lot of cores, but single-core performance is about half that of a more recent CPU.

With 32 real cores (assuming this is a dual-socket system), I’d try running closer to 28 workers, which gives you roughly 20% more capacity than 23.
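
As a rough check on that arithmetic, here is a small Python sketch that computes the relative increase in worker count and prints matching `lb_procs` and `pin_cpus` lines for node.cfg. It assumes capacity scales linearly with the number of workers, which is a simplification. It also assumes the cores are numbered 0 to 31 and that cores 0 to 3 are left free for the manager, proxy, logger, and OS; that split is an illustrative choice, not something stated in the thread. The helper names `capacity_gain` and `pin_list` are hypothetical.

```python
def capacity_gain(old_workers, new_workers):
    """Percentage increase in worker count (assumes linear scaling)."""
    return (new_workers - old_workers) / old_workers * 100.0


def pin_list(first_cpu, count):
    """Comma-separated run of consecutive CPU ids for pin_cpus."""
    return ",".join(str(cpu) for cpu in range(first_cpu, first_cpu + count))


if __name__ == "__main__":
    print(f"{capacity_gain(23, 28):.1f}% more workers than before")
    # Assumption: cores 0-3 stay unpinned, workers take cores 4-31.
    print("lb_procs=28")
    print("pin_cpus=" + pin_list(4, 28))
```

`pin_list(5, 23)` reproduces the `pin_cpus` line from the current config, so the same helper can be used to compare layouts.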

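After that, you need to look at conn.log to determine where your capture loss is coming from by looking at the missed_bytes column. You may have a few elephant flows (very large, high-volume connections) that account for the majority of that loss. Because FANOUT_HASH keeps all packets of a flow on one worker, a single such flow can overwhelm that worker, which would fit the much higher loss on worker-1-6, worker-1-15, worker-1-17, and worker-1-22.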
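
One way to do that check is to rank connections by missed_bytes. The Python sketch below assumes conn.log is in the default tab-separated format with a `#fields` header line and `-` for unset values. The helper names `read_bro_tsv` and `top_missed` are hypothetical. The sample rows are made up purely to exercise the code (documentation IP ranges, fake uids, only a subset of the real columns) and are not taken from this system.

```python
import io

# Synthetic example only: a cut-down conn.log with made-up rows.
FIELDS = ["ts", "uid", "id.orig_h", "id.orig_p",
          "id.resp_h", "id.resp_p", "proto", "missed_bytes"]
ROWS = [
    ["1569863000.000000", "C1", "192.0.2.10", "40000", "198.51.100.5", "443", "tcp", "0"],
    ["1569863001.000000", "C2", "192.0.2.11", "40001", "198.51.100.6", "873", "tcp", "9000000"],
    ["1569863002.000000", "C3", "192.0.2.12", "40002", "198.51.100.7", "80", "tcp", "1000000"],
    ["1569863003.000000", "C4", "192.0.2.13", "53000", "198.51.100.8", "53", "udp", "-"],
]
SAMPLE = "#fields\t" + "\t".join(FIELDS) + "\n" + "".join(
    "\t".join(row) + "\n" for row in ROWS
)


def read_bro_tsv(handle):
    """Yield one dict per record, keyed by the names on the #fields line."""
    fields = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif not line or line.startswith("#"):
            continue  # other header/footer lines such as #types or #close
        elif fields is not None:
            yield dict(zip(fields, line.split("\t")))


def top_missed(rows, n=10):
    """Return (total_missed_bytes, n largest flows by missed_bytes)."""
    total = 0
    flows = []
    for row in rows:
        raw = row.get("missed_bytes", "-")
        missed = int(raw) if raw.isdigit() else 0  # "-" means unset
        total += missed
        if missed:
            flows.append((missed, row["id.orig_h"], row["id.resp_h"],
                          row["id.resp_p"], row["uid"]))
    flows.sort(reverse=True)
    return total, flows[:n]


if __name__ == "__main__":
    total, flows = top_missed(read_bro_tsv(io.StringIO(SAMPLE)), n=3)
    for missed, orig, resp, port, uid in flows:
        share = 100.0 * missed / total if total else 0.0
        print(f"{uid} {orig} -> {resp}:{port} missed={missed} ({share:.1f}% of total)")
```

For a real log, pass `open("conn.log")` instead of the `io.StringIO` wrapper. If a few host pairs account for most of the total, those are the flows worth looking at first.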