How to count concurrent connections

Hi,

I am currently trying to count concurrent connections. I'd like to use
a script like this:

    redef ignore_checksums = T;
    redef capture_filters += { ["tcp-setup"] = "tcp" };

    global conncounter_file = open_log_file ("conncounter");
    global total_conn_count = 0;
    global concurrent_conn_count = 0;
    
    event connection_established (c: connection) {
            ++total_conn_count;
            ++concurrent_conn_count;
            if (total_conn_count % 1000 == 0) {
                    print conncounter_file, fmt ("%.06f total: %08d concurrent: %d",
                          network_time(), total_conn_count, concurrent_conn_count);
            }
    }
    
    event connection_state_remove (c: connection) {
            --concurrent_conn_count;
    }

However, the counter soon goes negative, or rather, I get a runtime
error (counter negative). A quick check showed me that
connection_state_remove is raised up to four times per connection
within only the first few minutes of my trace.

I then tried replacing connection_state_remove() with
connection_reset() and connection_finished(). However, I am not
convinced this is enough, because even after more than 90 minutes of
trace time concurrent_conn_count is still increasing significantly
(~1300 per minute on a 1 Gbit/s uplink).

So my question now is: which events are thrown when exactly? Do I have
to track the established connections in the scripting layer? Is there
a way to just query for the size of the bro-internal connection
tracker?

BTW: I am using a header trace. In my opinion this shouldn't make a
difference, but maybe ...

Thanks for help!
  Bernhard

> So my question now is: which events are thrown when exactly?

The event you're looking for is new_connection(). That one is raised
for all connections for which Bro instantiates internal state, i.e.,
it's the counterpart of connection_state_remove().

The other connection events are only raised for a subset of all
connections: connection_established() for those with a full three-way
handshake, connection_finished() for regular tear-downs,
connection_reset() for connections aborted with a reset, etc.
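
A minimal sketch of counting all connections this way, assuming
new_connection() and connection_state_remove() are each raised once per
connection as described above. The counter name is made up for
illustration, and the guard is only there to avoid the count underflow
error:

    global all_conn_count = 0;    # hypothetical name, not from the thread

    event new_connection (c: connection) {
            ++all_conn_count;
    }

    event connection_state_remove (c: connection) {
            # A count cannot drop below zero, so guard the decrement.
            if (all_conn_count > 0)
                    --all_conn_count;
    }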

> Is there
> a way to just query for the size of the bro-internal connection
> tracker?

Actually there is: the built-in resource_usage() returns a record
which, among other stuff, contains the numbers of TCP, UDP, ICMP
connections in memory. Caveat: I'm just realizing that this
reporting doesn't take the connection-compressor into account, which
means that by default the values will be too small for TCP
connections. Turning off the compressor with
use_connection_compressor=F will fix that at the cost of some
performance (both CPU and memory).
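
A rough sketch of how this could be queried from a script, assuming the
record fields are named num_TCP_conns and max_TCP_conns and that the
compressor option is set with a redef. The sample counter is a made-up
name used only to print every 1000th established connection:

    # Assumption: disabling the compressor makes the TCP numbers complete.
    redef use_connection_compressor = F;

    global sample_count = 0;    # hypothetical name, not from the thread

    event connection_established (c: connection) {
            ++sample_count;
            if (sample_count % 1000 == 0) {
                    local res = resource_usage();
                    print fmt ("%.06f num_TCP_conns: %d max_TCP_conns: %d",
                          network_time(), res$num_TCP_conns, res$max_TCP_conns);
            }
    }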

> BTW: I am using a header trace. In my opinion this shouldn't make a
> difference, but maybe ...

No, it shouldn't.

Robin

> The event you're looking for is new_connection(). That one is raised
> for all connections for which Bro instantiates internal state, i.e.,
> it's the counterpart of connection_state_remove().

No, it is not :-) I only want fully established tcp connections. I
tried out new_connection() however, and it gives me about 8 times more
connections than there are fully-established tcp-connections (450k vs.
60k). By the way, I got my numbers now by using
connection_established() to detect new connections,
connection_state_remove() for decreasing the counter and a set of
conn_id to ensure that a connection is removed only once. The price -
of course - is the memory consumption of the extra table.
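
A minimal sketch of that approach, assuming the set is keyed on c$id.
The set name is made up for illustration. Only connections that were
counted on establishment are ever subtracted, so a repeated
connection_state_remove() for the same connection is ignored:

    global established_conns: set[conn_id];    # hypothetical name
    global concurrent_conn_count = 0;

    event connection_established (c: connection) {
            if (c$id !in established_conns) {
                    add established_conns[c$id];
                    ++concurrent_conn_count;
            }
    }

    event connection_state_remove (c: connection) {
            # Decrement at most once, and only for tracked connections.
            if (c$id in established_conns) {
                    delete established_conns[c$id];
                    --concurrent_conn_count;
            }
    }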

> Actually there is: the built-in resource_usage() returns a record
> which, among other stuff, contains the numbers of TCP, UDP, ICMP
> connections in memory.

I tried out the built-in resource_usage() as well, it gives pretty
much the same results as the new_connection() approach:

1184669769.879156 total: 00116000 concurrent: 63310 max_TCP_conns: 63311 num_TCP_conns: 63310
1184669770.121984 total: 00117000 concurrent: 63796 max_TCP_conns: 63797 num_TCP_conns: 63796
1184669770.398366 total: 00118000 concurrent: 64256 max_TCP_conns: 64256 num_TCP_conns: 64256

However, sometimes, odd things happen. Like here, where
resource_usage()$max_TCP_conns almost doubles for a short period of
time (this is still in the startup phase):

1184669770.658614 total: 00119000 concurrent: 64683 max_TCP_conns: 64684 num_TCP_conns: 64683
1184669770.969641 total: 00120000 concurrent: 65106 max_TCP_conns: 73977 num_TCP_conns: 65106
1184669771.274491 total: 00121000 concurrent: 65511 max_TCP_conns: 83514 num_TCP_conns: 65511
1184669771.570219 total: 00122000 concurrent: 65973 max_TCP_conns: 93163 num_TCP_conns: 65973
1184669771.870853 total: 00123000 concurrent: 66452 max_TCP_conns: 102929 num_TCP_conns: 66452
1184669772.109635 total: 00124000 concurrent: 66873 max_TCP_conns: 112785 num_TCP_conns: 66873
1184669772.382840 total: 00125000 concurrent: 67299 max_TCP_conns: 122752 num_TCP_conns: 67299
1184669772.672518 total: 00126000 concurrent: 67767 max_TCP_conns: 67768 num_TCP_conns: 67767

After looking into the code this seems to happen exactly when the
underlying PDict object does a table resize.

Bye,
  Bernhard

> No, it is not :-) I only want fully established tcp connections. I

(Ok, sorry, I misunderstood you and thought that you wanted to
count all connections.)

> connection_established() to detect new connections,
> connection_state_remove() for decreasing the counter and a set of
> conn_id to ensure that a connection is removed only once.

Yeah, that's actually the best approach then.

> I tried out the built-in resource_usage() as well, it gives pretty
> much the same results as the new_connection() approach:

Right, that makes sense because it also counts all currently active
connections, independent of their state.

> 1184669772.382840 total: 00125000 concurrent: 67299 max_TCP_conns: 122752 num_TCP_conns: 67299
> 1184669772.672518 total: 00126000 concurrent: 67767 max_TCP_conns: 67768 num_TCP_conns: 67767
>
> After looking into the code this seems to happen exactly when the
> underlying PDict object does a table resize.

Yepp, that indeed looks like a bug in the accounting code. While the
resize is under way, there are actually two tables kept internally,
and it seems the code calculates the max size incorrectly during this
time.

Robin