Netmap plugin issue

Hello,

I am trying to build the netmap plugin with Bro 2.5 and I am getting the following error:

[root@bro netmap]# ./configure --bro-dist=/home/bro/bro-2.5 --install-root=/var/bro --with-netmap=/home/bro/netmap-corelight_updates

Build Directory : build
Bro Source Directory : /home/bro/bro-2.5
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /bin/cc
-- Check for working C compiler: /bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /bin/c++
-- Check for working CXX compiler: /bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Performing Test cxx11_header_works
-- Performing Test cxx11_header_works - Success
-- Bro executable : /home/bro/bro-2.5/build/src/bro
-- Bro source : /home/bro/bro-2.5
-- Bro build : /home/bro/bro-2.5/build
-- Bro install prefix : /var/bro
-- Bro plugin directory: /var/bro
-- Bro debug mode : false
-- Could NOT find Netmap (missing: NETMAP_INCLUDE_DIR)
CMake Error at CMakeLists.txt:20 (message):
  Netmap headers not found.
-- Configuring incomplete, errors occurred!
See also "/home/bro/bro-2.5/aux/plugins/netmap/build/CMakeFiles/CMakeOutput.log".

Any suggestions?

Thanks,

Andy

Unfortunately in the plugin that comes with 2.5, you need netmap installed on the system you're building on. We're going to be making changes for the 2.6 release so that nothing is required from netmap at build time and it is just built by default and included in Bro.

If you install netmap, does the problem go away?
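
(For reference, a rough sketch of what installing netmap looks like on Linux, assuming the checkout at /home/bro/netmap-corelight_updates; the exact configure flags and install/load steps depend on the netmap tree you have:)

cd /home/bro/netmap-corelight_updates/LINUX
./configure            # optionally: --drivers=ixgbe,e1000e to build netmap-patched NIC drivers
make && make install   # installs netmap.ko and the userspace headers
modprobe netmap        # load the module

# then re-run the plugin configure against the same tree
cd /home/bro/bro-2.5/aux/plugins/netmap
./configure --bro-dist=/home/bro/bro-2.5 --install-root=/var/bro --with-netmap=/home/bro/netmap-corelight_updates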

  .Seth

Yes, that resolved the issue and I was able to install the plugin.
However, we have two 10 Gb NICs on the Bro worker node, and netmap cannot allocate memory for the second interface.

[root@sec-bro04 ]# lb -i em1 -B 1024 -p broem1:10
881.534306 main [637] interface is em1
881.534384 main [658] requested 1024 extra buffers
883.371747 main [772] successfully opened netmap:em1 (tx rings: 512)
883.371818 main [783] obtained 1024 extra buffers

[root@sec-bro04 ]# lb -i em2 -B 1024 -p broem2:10
014.454468 main [637] interface is em2
014.454555 main [658] requested 1024 extra buffers
014.501811 nm_open [920] NIOCREGIF failed: Cannot allocate memory em2
014.501828 main [768] cannot open netmap:em2

In my case only 5 instances are running per NIC; I cannot run 10 per NIC because it crashes. I modified lb_procs to 5 in node.cfg.

However, I am not seeing any packets forwarded or dropped. Do you see that on your running instances?

I can get 10 instances to run on the x520 if I comment out the IGB worker in node.cfg. I’m wondering if the issue has to do with memory allocations done when the netmap kernel module loads. Do we need to tweak them in modprobe.d to account for two instances requesting X number of buffers?

"data_forward_rate_Mbps": 330.0390,
"data_drop_rate_Mbps": 0.0000,
"packet_forward_rate_kpps": 71.5200,
"packet_drop_rate_kpps": 0.0000,

-Dave

I did tweak it by putting this in /etc/modprobe.d/netmap.conf (a check that the values actually took effect follows the config below):

options netmap default_pipes=1000
options netmap ring_num=1024
options netmap buf_num=655360
options netmap if_num=1024
options netmap ring_size=100000
options netmap buf_size=4096
options netmap if_size=4096
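
(As a sanity check, assuming netmap is built as a loadable module, the values the module actually picked up can be verified under /sys/module/netmap/parameters after reloading it:)

# reload netmap so the new modprobe.d options take effect (stop lb/Bro first)
rmmod netmap && modprobe netmap

# confirm the parameters the module is actually running with
grep -H . /sys/module/netmap/parameters/buf_num /sys/module/netmap/parameters/ring_num /sys/module/netmap/parameters/if_num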

However, when I run:

[sec-bro04 ~]$ lb -i em1 -B 1000 -p broem1:5 &

[sec-bro04 ~]$ lb -i em2 -B 1000 -p broem2:5 &

Ring stats:

Feb 10 16:03:54 sec-bro04 lb: {"ts":1486771434.006337,"input_interface":"netmap:em1","output_interface":"netmap:broem1{0/xT@1","packets_forwarded":0,"packets_dropped":0,"data_forward_rate_Mbps":0.0000,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.0000,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 16:03:54 sec-bro04 lb: {"ts":1486771434.668234,"input_interface":"netmap:em2","output_interface":"netmap:broem2{0/xT@1","packets_forwarded":0,"packets_dropped":0,"data_forward_rate_Mbps":0.0000,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.0000,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}

Netmap doesn't currently mark interfaces as promiscuous when it connects. If you manually mark the interface promisc, do you get packets?
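
(If it helps, a minimal example, assuming the sniffed interfaces are em1 and em2:)

# put the capture interfaces into promiscuous mode
ip link set em1 promisc on
ip link set em2 promisc on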

  .Seth

Yes!! I totally forgot about that...
But the packet drop is very high! Is it because of the tweaks in modprobe.d?

Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"input_interface":"netmap:em1","output_interface":"netmap:broem1{0/xT@1","packets_forwarded":66870,"packets_dropped":112000,"data_forward_rate_Mbps":9.5147,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.9570,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"input_interface":"netmap:em1","output_interface":"netmap:broem1{1/xT@1","packets_forwarded":56280,"packets_dropped":90700,"data_forward_rate_Mbps":11.9930,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":1.5820,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"input_interface":"netmap:em1","output_interface":"netmap:broem1{2/xT@1","packets_forwarded":51795,"packets_dropped":83500,"data_forward_rate_Mbps":10.2431,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.8180,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"input_interface":"netmap:em1","output_interface":"netmap:broem1{3/xT@1","packets_forwarded":535398,"packets_dropped":924500,"data_forward_rate_Mbps":175.7472,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":13.4440,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"input_interface":"netmap:em1","output_interface":"netmap:broem1{4/xT@1","packets_forwarded":228411,"packets_dropped":370900,"data_forward_rate_Mbps":13.0369,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":2.8260,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.253931,"interface":"netmap:em1","packets_received":2522287,"packets_forwarded":938754,"packets_dropped":1581600,"non_ip_packets":54442,"data_forward_rate_Mbps":220.5349,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":19.6260,"packet_drop_rate_kpps":0.0000,"free_buffer_slots":1000}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"input_interface":"netmap:em2","output_interface":"netmap:broem2{0/xT@1","packets_forwarded":200176,"packets_dropped":297700,"data_forward_rate_Mbps":3.0338,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":2.2230,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"input_interface":"netmap:em2","output_interface":"netmap:broem2{1/xT@1","packets_forwarded":18104,"packets_dropped":46000,"data_forward_rate_Mbps":0.1089,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.0960,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"input_interface":"netmap:em2","output_interface":"netmap:broem2{2/xT@1","packets_forwarded":18740,"packets_dropped":20400,"data_forward_rate_Mbps":0.5375,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.4710,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"input_interface":"netmap:em2","output_interface":"netmap:broem2{3/xT@1","packets_forwarded":26775,"packets_dropped":31500,"data_forward_rate_Mbps":0.0908,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":0.0700,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"input_interface":"netmap:em2","output_interface":"netmap:broem2{4/xT@1","packets_forwarded":235348,"packets_dropped":358200,"data_forward_rate_Mbps":3.4327,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":2.6540,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}
Feb 10 19:04:59 sec-bro04 lb: {"ts":1486782299.255037,"interface":"netmap:em2","packets_received":1255048,"packets_forwarded":499143,"packets_dropped":753800,"non_ip_packets":47454,"data_forward_rate_Mbps":7.2036,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":5.5130,"packet_drop_rate_kpps":0.0000,"free_buffer_slots":1000}

I have the same observations, Andy.

{"ts":1486786393.408004,"interface":"netmap:eth6","packets_received":3816916,"packets_forwarded":2495606,"packets_dropped":1213100,"non_ip_packets":2014,"data_forward_rate_Mbps":93.6190,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":29.4030,"packet_drop_rate_kpps":0.0000,"free_buffer_slots":100000}

Wouldn’t a 30% packet loss result in a high number of weird.log messages as well as a high capture_loss? Bro is reporting under 1% for that worker.

1486786916.930127 600.000012 MID_INT-9 21 120263 0.017462
1486786916.899102 600.000025 MID_INT-2 26 113792 0.022849
1486786916.913207 600.000046 MID_INT-5 17 114020 0.01491
1486786917.062056 600.000040 MID_INT-1 37 122988 0.030084
1486786916.899164 600.000046 MID_INT-6 20 117978 0.016952
1486786916.898184 600.000043 MID_INT-8 10 117535 0.008508
1486786916.899106 600.000023 MID_INT-10 31 135819 0.022824
1486786916.899611 600.000023 MID_INT-3 19 130912 0.014514
1486786916.902911 600.000014 MID_INT-7 24 144454 0.016614
1486786916.897984 600.000029 MID_INT-4 25 106400 0.023496

-Dave

Hmm, that’s interesting. For me, Bro is reporting capture loss that roughly matches the overall netmap stats, and it is very high:

1486814690.289371 900.000065 sec-bro04-1-1 4983 28153 17.699712
1486814690.296848 900.000132 sec-bro04-1-5 29050 69353 41.887157
1486814690.283136 900.000080 sec-bro04-1-4 221424 242109 91.456328
1486814690.315410 900.000052 sec-bro04-1-2 26613 65599 40.569216
1486814690.300392 900.000025 sec-bro04-1-3 7591 34491 22.00864
1486815590.289398 900.000027 sec-bro04-1-1 15437 42078 36.68663
1486815590.315530 900.000120 sec-bro04-1-2 913 9650 9.46114
1486815590.283290 900.000154 sec-bro04-1-4 40906 49390 82.822434

What's more interesting to me here is that the packet_drop_rate_kpps is 0. What could have happened is that the packets were lost because the Bro processes hadn't been started yet. It seems to me from this line that ~100 Mbps of traffic is flowing with no packets being lost.

  .Seth

During the time where the data was collected for this capture-loss log, what was the output of lb showing? Did it show any bursts of loss?

  .Seth

Capture_loss event:
1486803890.282458 900.000048 sec-bro04-1-4 28622 35953 79.60949

Lb logs:
Feb 11 01:04:50 sec-bro04 lb: {"ts":1486803890.238885,"input_interface":"netmap:em1","output_interface":"netmap:broem1{4/xT@1","packets_forwarded":6664358,"packets_dropped":177568,"data_forward_rate_Mbps":3.1411,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":2.0600,"packet_drop_rate_kpps":0.0000,"overflow_queue_size":0}

On other interfaces it does show some loss but nothing substantial.

My LB stats are similar, but Bro isn’t reflecting a loss:

Feb 11 04:58:02 mid-csignsm-01 lb[3144]: {"ts":1486807082.815681,"interface":"netmap:eth6","packets_received":739258266,"packets_forwarded":737718676,"packets_dropped":1213100,"non_ip_packets":371283,"data_forward_rate_Mbps":758.6328,"data_drop_rate_Mbps":0.0000,"packet_forward_rate_kpps":102.7250,"packet_drop_rate_kpps":0.0000,"free_buffer_slots":100000}

The next cycle of capture_loss shows less than 1% loss for the 10 workers:

2017-02-11T05:01:56-0500 600.000119 MID_INT-2 5 102927 0.004858
2017-02-11T05:01:56-0500 600.000015 MID_INT-4 7 107577 0.006507
2017-02-11T05:01:56-0500 600.000053 MID_INT-9 6 101979 0.005884
2017-02-11T05:01:56-0500 600.000016 MID_INT-3 1 304887 0.000328
2017-02-11T05:01:56-0500 600.000543 MID_INT-10 5 186552 0.00268
2017-02-11T05:01:56-0500 600.000009 MID_INT-5 6 101433 0.005915
2017-02-11T05:01:56-0500 600.000005 MID_INT-6 5 110256 0.004535
2017-02-11T05:01:56-0500 600.000085 MID_INT-7 1 98164 0.001019
2017-02-11T05:01:56-0500 600.000041 MID_INT-8 4 99979 0.004001
2017-02-11T05:01:57-0500 600.000047 MID_INT-1 3 90591 0.003312

I also noticed that Andy’s LB output is slightly different: his displays the free buffers as "overflow_queue_size", whereas mine shows "free_buffer_slots".

Also, Andy, your overflow queue size is "0"; did you define one and has it been depleted? Creating one, or increasing its size, may help with the packets Bro is dropping.
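
(Something along these lines might be worth trying; -B is the number of extra buffers lb requests at startup, as in the earlier output, and 4096 here is just an example value:)

# ask lb for a larger pool of extra buffers to absorb bursts
lb -i em1 -B 4096 -p broem1:5 &
lb -i em2 -B 4096 -p broem2:5 &

# note: netmap's buf_num in modprobe.d has to be large enough to cover the extra buffers on both interfaces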

-Dave

[ n00b. well i ran bro over a decade ago. ]

a ganeti cluster running ganeti 2.15 on deb8 and ubuntu16

i run bro in a vm on one of the nodes. as it is on the bridged lan, it
sees all the traffic to all vms whose primary is on the same node.
this is sweet. but i want to see the traffic to the vms whose primaries
are on the other nodes.

so what is the minimal hack i can run on the other nodes to stream pcaps
to that bro instance, so that the whole cluster is feeding one bro
instance? i would prefer a simple hack to run on the host os, but
could create more guest vms if i had to.

the cluster has a second inter-node lan i could use to avoid pcapping
the pcap transport.

[ no, i prefer not to mirror off the switch ]

randy

Continuing to see impressive performance with Bro+Netmap:

data_forward_rate_Mbps":1484.1698"
data_drop_rate_Mbps":0.0000"

And of the 10 workers, the greatest capture_loss reported by Bro is well under 1%:

2017-02-13T09:01:56-0500 600.000013 MID_INT-8 1221 3368533 0.036247

-Dave

Yay, that's great! At least now you can feel more certain that the capture loss is either misreported or you have a SPAN port or packet broker that is having trouble. You could also check the interface hardware counters here and there to see if you are having any loss on the NIC. (ethtool -S)
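
(For example, something like this, assuming eth6 is the capture interface; the exact counter names vary by driver, so the grep pattern is only a rough filter:)

# check the NIC's own drop/error counters on the capture interface
ethtool -S eth6 | grep -iE 'drop|miss|err'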

  .Seth

Those are different lb log lines. The lines with overflow_queue_size are about the output pipes that send packets off to the Bro (or other) processes. The line with free_buffer_slots is about the interface being sniffed; those are buffers (each buffer holds a single packet) that can be used if a pipe isn't being flushed quickly enough. If you have free buffers and packets begin to get backed up, the free_buffer_slots number on the physical interface will begin to go down and the overflow_queue_size on the pipe or pipes getting backed up will begin to rise.
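
(If it's useful, a crude way to watch both counters over time, assuming the lb JSON lines land in /var/log/messages as in the snippets above:)

# follow the lb log and pull out just the back-pressure fields
tail -f /var/log/messages | grep --line-buffered 'lb:' | grep -oE '"(free_buffer_slots|overflow_queue_size)":[0-9]+'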

I'm planning on writing a more extensive guide on all of this soon.

  .Seth

Seth, so are you thinking that LB is mis-reporting packet loss? These are the ethtool stats for the capture NIC:

NIC statistics:
rx_packets: 55681830
tx_packets: 0
rx_bytes: 35600025423
tx_bytes: 0
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 43367780
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 193958
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 18460871971
tx_pkts_nic: 0
rx_bytes_nic: 10234659126400
tx_bytes_nic: 0
lsc_int: 9
tx_busy: 0
non_eop_descs: 0
broadcast: 277894
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 0
rx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0

This number isn't really that high compared to how many packets came into the NIC, but you may want to play with pinning the lb process to a core and changing its nice level. That might reduce how many packets are lost on the NIC.
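
(Concretely, a sketch; the core number and nice value are arbitrary examples, and this assumes a single lb instance per interface:)

# pin the running lb process for em1 to one core and raise its scheduling priority
taskset -pc 2 $(pgrep -f 'lb -i em1')
renice -n -10 -p $(pgrep -f 'lb -i em1')

# or start it that way from the beginning
taskset -c 2 nice -n -10 lb -i em1 -B 1000 -p broem1:5 &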

lb shouldn't be misreporting the number of packets that it is accepting and forwarding along to the netmap pipes.

  .Seth