Hello,
I have been using Bro 2.4 to test the performance of the SFC driver, and I have run into the following issue, which is blocking any further analysis:
There seems to be a memory leak somewhere: at times Bro runs out of memory very quickly, and in those runs packet drops also appear early, even at very low packet rates.
Once Bro is started, free memory keeps falling until the server becomes extremely sluggish and packets start being dropped.
An instance of Bro running out of memory (16 workers, no CPU pinning, after sending 155K pps for 7-8 minutes):
[root@dellr620c skathare]# free -m
             total       used       free     shared    buffers     cached
Mem:         32129      31917        211          3          1        376
-/+ buffers/cache:      31539        589
Swap:         1907       1687        219
Note on the Mem line: only 211 MB is free, which is very low considering the system started with some 26 GB of free memory (and this drop happens within the first 2 minutes of running the traffic). The system becomes very slow at this point and, of course, it has already started dropping packets.
[root@dellr620c skathare]#
Swap: 1907 1764 142
[root@dellr620c skathare]# cat /proc/meminfo
MemTotal: 32900200 kB
MemFree: 193384 kB
MemAvailable: 480956 kB
Buffers: 2464 kB
Cached: 471260 kB
SwapCached: 74860 kB
Active: 23439908 kB
Inactive: 3120012 kB
Active(anon): 23179296 kB
Inactive(anon): 2914628 kB
Active(file): 260612 kB
Inactive(file): 205384 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1953076 kB
SwapFree: 152264 kB
Dirty: 22548 kB
Writeback: 8 kB
AnonPages: 26017216 kB
Mapped: 15200 kB
Shmem: 4692 kB
Slab: 190556 kB
SReclaimable: 93648 kB
SUnreclaim: 96908 kB
KernelStack: 4288 kB
PageTables: 71380 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15638376 kB
Committed_AS: 29119672 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 374920 kB
VmallocChunk: 34342144308 kB
HardwareCorrupted: 0 kB
AnonHugePages: 243712 kB
HugePages_Total: 2700
HugePages_Free: 46
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 345024 kB
DirectMap2M: 18483200 kB
DirectMap1G: 16777216 kB
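To make the dump above easier to read, here is a minimal Python sketch (not part of the original measurements) that pulls out the anonymous memory and the size of the hugepage pool. The function names `parse_meminfo` and `hugepage_pool_kb` are hypothetical helpers introduced only for this illustration; the sample values are copied from the output above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Values are in kB, except the HugePages_* fields, which are page counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def hugepage_pool_kb(fields):
    """Memory reserved for the hugepage pool: page count times page size (kB)."""
    return fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0)


# Values copied from the /proc/meminfo dump above.
sample = """\
MemTotal: 32900200 kB
MemFree: 193384 kB
AnonPages: 26017216 kB
HugePages_Total: 2700
HugePages_Free: 46
Hugepagesize: 2048 kB
"""

info = parse_meminfo(sample)
print("hugepage pool (kB):", hugepage_pool_kb(info))
print("anonymous pages (kB):", info["AnonPages"])
```

With the numbers above, the hugepage pool is 2700 x 2048 kB = 5529600 kB, and AnonPages is 26017216 kB.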
[root@dellr620c skathare]# top
top - 22:48:06 up 70 days, 19:43, 5 users, load average: 18.25, 13.49, 10.04
Tasks: 17 total, 1 running, 16 sleeping, 0 stopped, 0 zombie
%Cpu(s): 31.5 us, 2.9 sy, 0.7 ni, 43.7 id, 21.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 32900200 total, 32731868 used, 168332 free, 336 buffers
KiB Swap: 1953076 total, 1262956 used, 690120 free. 14248 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P
20504 root 20 0 404644 32724 1464 D 0.0 0.1 6:32.62 bro 31
20552 root 20 0 404692 50108 1496 D 52.1 0.2 6:37.18 bro 29
20564 root 20 0 404824 50960 1508 D 49.8 0.2 6:37.16 bro 27
20574 root 20 0 404652 48748 1476 D 52.0 0.1 6:36.52 bro 24
20567 root 20 0 404684 49948 1456 D 0.0 0.2 6:33.32 bro 23
20561 root 20 0 421440 66672 1412 D 51.9 0.2 6:37.14 bro 18
20569 root 20 0 404708 31904 1508 D 41.1 0.1 6:34.77 bro 16
20495 root 20 0 404620 49936 1408 D 27.1 0.2 6:34.91 bro 13
20515 root 20 0 404684 46324 1500 D 21.9 0.1 6:33.25 bro 13
20548 root 20 0 404704 50188 1504 D 43.6 0.2 6:35.16 bro 13
20474 root 20 0 404736 32704 1508 D 0.0 0.1 6:32.79 bro 12
20502 root 20 0 404636 29300 1464 D 52.0 0.1 6:36.13 bro 12
20539 root 20 0 404748 32784 1484 D 44.7 0.1 6:34.08 bro 11
20537 root 20 0 404668 29284 1464 D 0.0 0.1 6:32.03 bro 4
20559 root 20 0 404684 32644 1444 R 54.6 0.1 6:38.12 bro 3
20542 root 20 0 404728 32704 1504 D 25.1 0.1 6:33.84 bro 1
20289 root 20 0 196768 412 412 S 0.0 0.0 0:00.03 solar_clusterd 0
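As a cross-check on the top output above, the sketch below sums the RES column for the bro rows. It is an illustration rather than part of the original test setup: the function name `sum_res_kb` is hypothetical, and it assumes the column order shown in the header above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P) with RES printed as a plain integer in kB.

```python
def sum_res_kb(top_text, command="bro"):
    """Return (row_count, total_res_kb) for rows whose COMMAND equals `command`.

    Assumes top's default column order, so RES is column 5 and COMMAND is
    column 11 (0-based). Rows where RES has a unit suffix (e.g. "1.2g")
    are skipped rather than guessed at.
    """
    count = 0
    total = 0
    for line in top_text.splitlines():
        cols = line.split()
        if len(cols) < 12 or not cols[0].isdigit():
            continue  # header or summary line
        if cols[11] != command or not cols[5].isdigit():
            continue
        count += 1
        total += int(cols[5])
    return count, total


# Three rows copied from the top output above.
sample = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P
20504 root 20 0 404644 32724 1464 D 0.0 0.1 6:32.62 bro 31
20552 root 20 0 404692 50108 1496 D 52.1 0.2 6:37.18 bro 29
20289 root 20 0 196768 412 412 S 0.0 0.0 0:00.03 solar_clusterd 0
"""

rows, res_kb = sum_res_kb(sample)
print(rows, "bro rows,", res_kb, "kB resident")
```

Applied to all 16 bro rows listed above, the RES values add up to 666932 kB (roughly 650 MiB), so the processes visible in this filtered view account for only a small part of the ~31 GB reported as used.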
After stopping the Bro workers (and especially once the manager is stopped or killed), memory recovers:
top - 22:53:01 up 70 days, 19:48, 5 users, load average: 3.06, 8.83, 9.20
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.3 sy, 2.2 ni, 96.6 id, 0.8 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 32900200 total, 6702252 used, 26197948 free, 8308 buffers   <-- this is almost what the system originally started with (26 GB)
KiB Swap: 1953076 total, 237216 used, 1715860 free. 528032 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P
20289 root 20 0 196768 1028 484 S 0.0 0.0 0:00.03 solar_clusterd
At very high packet rates, free memory falls very quickly and packet drops begin. At lower packet rates, free memory falls more slowly, but it still falls and packets are eventually dropped. When the Bro workers are stopped, free memory recovers. In the few runs where I was able to reach 150 Kpps without any packet drops, free memory stayed constant at about 23 GB for the entire test (more than an hour).
The data above is a few days old. When I ran Bro again today, free memory dropped from 18 GB to 4 GB within a few seconds of starting Bro (16 workers, each pinned to a CPU). Is it possible that Bro is accumulating per-flow state and not freeing it? If so, is there any tuning that should be done to avoid this?
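If it helps anyone reproduce this, the memory trend can be captured with timestamps so it can be lined up against the packet rate. Below is a minimal Python sketch for that; it is not something I ran for the numbers above. The function names `read_meminfo_kb` and `log_memory`, the chosen fields, and the sampling interval are all assumptions made for illustration.

```python
import time


def read_meminfo_kb(keys, path="/proc/meminfo"):
    """Read the requested fields (in kB) from a /proc/meminfo-style file."""
    wanted = set(keys)
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name in wanted:
                values[name] = int(rest.split()[0])
    return values


def log_memory(interval_s=5.0, samples=12,
               keys=("MemFree", "AnonPages", "SwapFree"),
               path="/proc/meminfo"):
    """Print one timestamped line per sample and return the collected rows."""
    rows = []
    for i in range(samples):
        row = read_meminfo_kb(keys, path)
        rows.append(row)
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{k}={row.get(k)}" for k in keys))
        if i + 1 < samples:
            time.sleep(interval_s)  # no sleep after the final sample
    return rows
```

Calling `log_memory(interval_s=5, samples=120)` just before starting Bro would give ten minutes of samples showing how quickly MemFree falls and whether AnonPages grows by the same amount.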
Appreciate any help on this!