Memory leaks on git eds2245

I'm finding that Bro leaks memory heavily whenever it drops packets,
such that if overwhelmed, it will consume all memory on a beefy box in
minutes. Even under a slight load, there appears to be memory
leakage. Here's the tail end of a valgrind run:

==4477== 479,944 (128 direct, 479,816 indirect) bytes in 1 blocks are
definitely lost in loss record 5,304 of 5,306
==4477== at 0x4C274A8: malloc (vg_replace_malloc.c:236)
==4477== by 0x554FF21: CRYPTO_malloc (in /lib/libcrypto.so.0.9.8)
==4477== by 0x55E3D96: X509_STORE_new (in /lib/libcrypto.so.0.9.8)
==4477== by 0x5AAED4: BifFunc::bro_x509_verify(Frame*, ValPList*)
(bro.bif:3449)
==4477== by 0x59B88E: BuiltinFunc::Call(ValPList*, Frame*) const
(Func.cc:463)
==4477== by 0x5888BD: CallExpr::Eval(Frame*) const (Expr.cc:4649)
==4477== by 0x578169: AssignExpr::Eval(Frame*) const (Expr.cc:2598)
==4477== by 0x63AFAF: ExprStmt::Exec(Frame*, stmt_flow_type&) const
(Stmt.cc:369)
==4477== by 0x633E00: StmtList::Exec(Frame*, stmt_flow_type&) const
(Stmt.cc:1404)
==4477== by 0x59C9A0: BroFunc::Call(ValPList*, Frame*) const (Func.cc:320)
==4477== by 0x553A45: EventHandler::Call(ValPList*, bool)
(EventHandler.cc:73)
==4477== by 0x5531B4: EventMgr::Dispatch() (Event.h:46)
==4477==
==4477== 974,513 (5,760 direct, 968,753 indirect) bytes in 72 blocks
are definitely lost in loss record 5,306 of 5,306
==4477== at 0x4C27CC1: operator new(unsigned long) (vg_replace_malloc.c:261)
==4477== by 0x4FD9AD: yyparse() (parse.y:610)
==4477== by 0x50D437: main (main.cc:745)
==4477==
==4477== LEAK SUMMARY:
==4477== definitely lost: 285,262 bytes in 14,318 blocks
==4477== indirectly lost: 2,282,276 bytes in 39,468 blocks
==4477== possibly lost: 139,163 bytes in 2,955 blocks
==4477== still reachable: 9,741,975 bytes in 133,386 blocks
==4477== suppressed: 0 bytes in 0 blocks
==4477== Reachable blocks (those to which a pointer was found) are not shown.
==4477== To see them, rerun with: --leak-check=full --show-reachable=yes
==4477==
==4477== For counts of detected and suppressed errors, rerun with: -v
==4477== Use --track-origins=yes to see where uninitialised values come from
==4477== ERROR SUMMARY: 222225 errors from 373 contexts (suppressed: 4 from 4)

Seth/Gregor, could this still be SSL-related? Martin, any chance you
could try without SSL to see if it shows the same behaviour?

Robin

I would guess so. The valgrind output definitely hints at that. I just don't know SSL or its analyzer at all, so I'm afraid I might not be much help :(

cu
Gregor

(BTW, the memory problems I have/had weren't "real" leaks. Once an SSL connection was done, Bro would free the memory for it again. The problem is that many SSL connections can live for days, and thus they ultimately consume memory like a "real" leak would.)

Yesterday I implemented the code to stop analyzing connections with the skip_further_processing bif, and on the tracefile I was testing with, Bro's peak memory use was higher with it than without it. One thing the SSL scripts currently don't do, and that I probably need to change: after writing the SSL log entry, I should probably do "delete c$ssl". The certificate and certificate chain are stored in there. Actually, the more I think about it, that's probably most of the problem.
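
To make the expected effect of "delete c$ssl" concrete, here is a toy Python model. It is not Bro code. It assumes that each open connection holds a record with its certificate chain until the connection ends; all class and function names and the byte sizes are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SSLInfo:
    """Hypothetical stand-in for the per-connection c$ssl record."""
    cert_chain: List[bytes] = field(default_factory=list)

    def size(self) -> int:
        # Total bytes of certificate data held by this record.
        return sum(len(cert) for cert in self.cert_chain)


@dataclass
class Connection:
    """Hypothetical stand-in for a long-lived connection."""
    uid: str
    ssl: Optional[SSLInfo] = None


def log_ssl(conn: Connection, drop_after_logging: bool) -> int:
    """Model writing the log entry; optionally drop the record afterwards."""
    logged = conn.ssl.size() if conn.ssl is not None else 0
    if drop_after_logging:
        conn.ssl = None  # analogous in spirit to "delete c$ssl"
    return logged


def retained_bytes(conns: Dict[str, Connection]) -> int:
    """Certificate bytes still referenced by open connections."""
    return sum(c.ssl.size() for c in conns.values() if c.ssl is not None)


def simulate(n_conns: int, chain: List[bytes], drop_after_logging: bool) -> int:
    """Open n_conns connections that never close, log each once,
    and report how many certificate bytes remain referenced."""
    conns: Dict[str, Connection] = {}
    for i in range(n_conns):
        conn = Connection(uid=f"C{i}", ssl=SSLInfo(cert_chain=list(chain)))
        conns[conn.uid] = conn
        log_ssl(conn, drop_after_logging)
    return retained_bytes(conns)
```

In this model the retained data grows with the number of open connections when the record is kept, and stays at zero when it is dropped right after logging. How much a real Bro instance saves depends on what else still references the certificates.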

We may eventually want to look into the implications of calling the skip_further_processing bif on real traffic, too. I was pretty disheartened to see more memory used when calling it than when not calling it. Perhaps it takes more memory to remember which connections to ignore? I suppose I wasn't checking completion time, which is probably where most of the savings should come from.

  .Seth

I implemented this and started running it on live traffic on a cluster, and so far it seems to be holding up much better than before. I'll have a better feel for it tomorrow, but the initial indications are that this fixes most of the problem.

  .Seth

> I implemented the code yesterday to stop analyzing connections with the skip_further_processing bif and it caused Bro to peak using more memory on the tracefile I was using it with than not stopping analysis of connections. One thing the SSL scripts are currently doing that I probably need to change is after logging the SSL log, I should probably do "delete c$ssl". The certificate and certificate chain are stored in there. Actually, as I think about it more that's probably most of the problem.

skip_further_processing() only sets the skip flag, which means that no further data is delivered to the analyzers; the analyzers themselves aren't removed.
The disable_analyzer() bif does remove the analyzer, but it needs an analyzer_id, so the SSL analyzer would somehow need to add its analyzer_id to one of its events.
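
The difference between the two bifs can be sketched with a toy Python model. This is not Bro's implementation: the classes, the integer analyzer ids, and the idea that an analyzer simply buffers whatever it receives are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Analyzer:
    """Hypothetical analyzer that buffers everything it is given."""
    buffered: bytearray = field(default_factory=bytearray)

    def deliver(self, data: bytes) -> None:
        self.buffered.extend(data)


@dataclass
class Connection:
    """Hypothetical connection with a skip flag and attached analyzers."""
    skip: bool = False
    analyzers: Dict[int, Analyzer] = field(default_factory=dict)

    def deliver(self, data: bytes) -> None:
        # With the skip flag set, nothing reaches the analyzers any more.
        if self.skip:
            return
        for analyzer in self.analyzers.values():
            analyzer.deliver(data)

    def skip_further_processing(self) -> None:
        # Only flips the flag; analyzers and their state stay attached.
        self.skip = True

    def disable_analyzer(self, aid: int) -> bool:
        # Detaches the analyzer, so its state can be reclaimed.
        return self.analyzers.pop(aid, None) is not None

    def state_bytes(self) -> int:
        # Bytes of analyzer state still held by this connection.
        return sum(len(a.buffered) for a in self.analyzers.values())
```

In this model, setting the skip flag stops state from growing but keeps whatever was already buffered, while removing the analyzer releases it. That matches the description above only in outline.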

> We may want to look into the real traffic implications of calling the skip_further_processing bif eventually too though. I was pretty disheartened to see more memory used from calling that than not calling it. Perhaps it results in more memory use to remember which connections to ignore? I suppose I wasn't checking completion time which is probably where the savings should mostly come from.

How much difference in memory usage did you see? And how much memory usage do you see in general? My Bros usually need about 100-300MB, and with SSL I will eventually get to 1GB or more (not on all nodes, and it will often take hours until it starts to get there).
(Disabling SSL altogether will reduce the "baseline" usage by about 25% to 50%.)

However, it still puzzles me that skip_further_processing didn't help. In my case memory consumption was ramping up over several minutes, so either data is still being delivered to the analyzer during this ramp-up, or there are additional reasons for the memory consumption.

cu
Gregor