Returning local variables and garbage collection

Hi,

   I'm debugging a custom policy script that was causing Bro to use too
   much memory. The script has been stripped down to do just some
   logging and nothing more, but the memory usage is still pretty high,
   i.e., way higher than with conn.bro loaded, for instance.

   The only notable thing the script currently does is return local
   variables. Basically, I'm treating them like pointers in C (but I
   might have misunderstood their semantics). For instance, I have
   something like the following:

      type custom_conn_t: record {
         id: conn_id;
         # custom type involved here!
         ...
      };

      function conn_init(c: connection): custom_conn_t
      {
         local __c: custom_conn_t;

         __c$id = c$id;
         # assign other __c's fields from c

         return __c;
      }

      function foobar(c: connection)
      {

         local __c: custom_conn_t;

         __c = conn_init(c);

         do_log(__c);
      }

      event X(c: connection)
      {
         foobar(c);
      }

   I'm monitoring live traffic, so it's pretty hard to provide a
   representative trace. However, conn.bro produces a low memory
   footprint (~50MB over 3/4 hrs) and it stabilizes pretty quickly. The
   aforementioned script reached 200MB in less time and it keeps
   growing.

   I'm just wondering what happens when I return __c in conn_init().
   I'd expect a new object to be created and the local one declared
   in conn_init to be destroyed. Then, whenever the newly created __c
   is not needed anymore (say, after do_log, or at the latest after
   event X returns), I'd expect it to be freed by the garbage
   collector.
   
   Or is the object the same, with just the internal refcounts being
   increased or decreased? If so, it shouldn't really make any
   difference, as the refcount should go to 0 after X finishes.

   However, I'm seeing an (almost linear) increase in memory
   consumption, and that's weird (bug?). Any ideas?

TIA, bye
Lorenzo

   Or is the object the same, with just the internal refcounts being
   increased or decreased? If so, it shouldn't really make any
   difference, as the refcount should go to 0 after X finishes.

This is indeed what happens. Non-atomic objects are passed around as
references, with reference counts adjusted as necessary.
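
To illustrate the reference semantics just described, here is a
minimal C++ sketch. It is not Bro's actual implementation:
std::shared_ptr stands in for Bro's internal reference counting, and
CustomConn, ConnRef, and the function bodies are hypothetical
analogues of the script excerpt.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a script-level record value.
struct CustomConn {
    std::string id;
};

// Assumption: shared_ptr models "passed around as a reference".
using ConnRef = std::shared_ptr<CustomConn>;

// Analogue of conn_init(): creates the record and returns a reference
// to that same object. No deep copy is made on return.
ConnRef conn_init(const std::string& id) {
    auto c = std::make_shared<CustomConn>();
    c->id = id;
    return c;
}

// Analogue of foobar(): holds the record while it is in use and returns
// only a weak observer, so the caller can check whether it was released.
std::weak_ptr<CustomConn> foobar(const std::string& id) {
    ConnRef c = conn_init(id);
    assert(c.use_count() == 1);       // only the local reference exists

    ConnRef alias = c;                // e.g. handing it to a logger
    assert(alias.get() == c.get());   // same object, not a copy
    assert(c.use_count() == 2);       // the count went up instead

    return std::weak_ptr<CustomConn>(c);
}   // c and alias go out of scope here: count reaches 0, object is freed
```

In this model the record is gone as soon as foobar() returns, which
matches the expectation that the refcount drops to 0 once the event
handler finishes.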

   However, I'm seeing an (almost linear) increase in memory
   consumption, and that's weird (bug?). Any ideas?

Not sure right now, the code excerpts you showed look ok. One thing
to try is running with profiling.bro loaded, which makes Bro generate
a file prof.log with various memory statistics. Feel free to send me
the output if it's too cryptic.

If that doesn't help, some leak checking/profiling could help
illuminate what's going on, see

Robin

P.S.: Are you creating any cyclic reference structures?
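
The reason for asking: with plain reference counting, objects that
refer to each other can keep each other's counts above zero forever.
The following is a minimal C++ sketch of that effect, again using
std::shared_ptr as an assumed stand-in rather than Bro's own code;
Node and build() are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical value that holds a strong reference to another value.
struct Node {
    std::shared_ptr<Node> peer;
};

// Builds two nodes where a refers to b. If make_cycle is true, b also
// refers back to a. Returns a weak observer of a so the caller can see
// whether a was released after all local references went away.
std::weak_ptr<Node> build(bool make_cycle) {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->peer = b;
    if (make_cycle)
        b->peer = a;   // a -> b -> a: each keeps the other alive
    return std::weak_ptr<Node>(a);
}
```

Without the back reference, both nodes are released when build()
returns. With it, each node still holds one reference to the other, so
neither count reaches 0 until one of the links is cleared explicitly.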

Robin,

> Or is the object the same, with just the internal refcounts being
> increased or decreased? If so, it shouldn't really make any
> difference, as the refcount should go to 0 after X finishes.

This is indeed what happens. Non-atomic objects are passed around as
references, with reference counts adjusted as necessary.

   This is what I was thinking, so there shouldn't be any problem.

One thing to try is running with profiling.bro loaded, which makes
Bro generate a file prof.log with various memory statistics. Feel
free to send me the output if it's too cryptic.

   Perfect, thanks. I'm actually returning "custom" types, i.e., ones
   that Bro doesn't know anything about internally. That is, there is
   no corresponding RecordVal declaration nor any "initialization" by
   means of internal_type("...")->AsRecordType(). Could Bro mess things
   up if those are missing?

   In addition, the memory consumption went down when I removed the
   handler for connection_timeout (my code is called when a bunch of
   connection_* events are triggered). However, I suppose this happens
   just because fewer events of this type are raised.

P.S.: Are you creating any cyclic reference structures?

   I don't think so, but I'll double check.

TIA, bye
Lorenzo

Robin,
   Perfect, thanks. I'm actually returning "custom" types, i.e., ones
   that Bro doesn't know anything about internally.

By "custom", you mean declared as a record at the script level,
right? That's fine, Bro knows how to handle them when passing them
around. The internal stuff ("internal_type()" etc.) is only needed
for types which the C++ code accesses itself in some way (e.g.,
because it wants to modify an instance of that type).

   In addition, the memory consumption went down when I removed the
   handler for connection_timeout

The prof.log output should indicate whether a lot of connections are
hanging around in memory for some reason, which could potentially be
the problem.

Robin