bloomfilter_counting_init parameterization?

I am trying to keep a count of unique IPs seen within a subnet, and instead of relying on a table or a set, I was toying with the idea of using bloomfilter_counting_init.

However, I am not clear on the parameterization below:

global bloomfilter_counting_init: function(k: count, cells: count, max: count, name: string &default=""): opaque of bloomfilter;

How many cells do I need to store 65536 IPs?

Is k=3 a good value, or do I need something else? Could someone elaborate on how to choose these parameters?

I looked at /btest/bifs/bloomfilter.bro, but it is still not quite clear to me.

thanks,
Aashish
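
For reference, the textbook sizing rules for a plain Bloom filter give a starting point for the questions above. The sketch below is not taken from the Bro sources: the helper names bloom_parameters and false_positive_rate are made up for illustration, and it assumes that "cells" plays the role of the bit count m in the usual formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2, with the hash functions behaving as if independent.

```python
import math

def bloom_parameters(n, fp):
    """Textbook sizing: cell count m and hash count k for n items at target rate fp."""
    m = math.ceil(-n * math.log(fp) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k

def false_positive_rate(n, m, k):
    """Approximate false-positive rate after inserting n distinct items."""
    return (1.0 - math.exp(-k * n / m)) ** k

n = 65536  # number of IPs from the question
for fp in (0.01, 0.001):
    m, k = bloom_parameters(n, fp)
    print(f"target fp={fp}: cells={m}, k={k}, "
          f"predicted fp={false_positive_rate(n, m, k):.4f}")

# The k=3 from the question, evaluated at the cell count sized for a 1% target.
m, _ = bloom_parameters(n, 0.01)
print(f"k=3, cells={m}: predicted fp={false_positive_rate(n, m, 3):.4f}")
```

Under these assumptions, the cell count scales linearly with the number of items, and k follows from the cells-per-item ratio rather than being picked on its own. Keep in mind that a counting filter spends several bits per cell, so the memory cost is the cell count times the counter width.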

global bloomfilter_counting_init: function(k: count , cells: count ,
max: count , name: string &default=""): opaque of bloomfilter ;

The counting Bloom filter is very similar to a regular Bloom filter,
except that the underlying bit vector now consists of "cells," i.e.,
sequences of bits that represent a counter. With 4 bits per cell, you
can count from 0 to 2^4-1 = 15. Consider this scenario:

Matthias,

I am running into some big tables in my scan-detection heuristics, which grow because of scanners.

So I was thinking about using counting Bloom filters instead of tables and sets. After all, what we are looking for is the cardinality of those tables and sets in order to identify scanners.

for example:

1) global distinct_peers: table[addr] of set[addr]

then ....
.....

  if ( orig !in distinct_peers )
    distinct_peers[orig] = set() &mergeable;

  if ( resp !in distinct_peers[orig] )
    add distinct_peers[orig][resp];

  local n = |distinct_peers[orig]|;

and if n > N, it's a scanner!

So I was wondering whether the following can somehow be represented as combinations of counting Bloom filters:

  1) global distinct_peers: table[addr] of set[addr]

  and/or
  
  2) global distinct_backscatter_peers: table[addr] of table[port] of set[addr]

Aashish

Here is an example proof-of-concept policy of what I am trying to explore:

======================= bloom-scan.bro ==========

module Scan;

global src: opaque of bloomfilter ;
global dst_port: opaque of bloomfilter ;

event bro_init()
{

        src = bloomfilter_counting_init(3, 128, 100000000);
        dst_port = bloomfilter_counting_init(3, 128, 100000000);
}

function check_bloom (c: connection)
{

        local orig = c$id$orig_h;
        local resp = c$id$resp_h ;
        local resp_p = c$id$resp_p ;

        if (resp_p == 40884/tcp || resp_p == 40876/tcp)
                return ;

        bloomfilter_add (src, orig);
        # Separate address and port so that distinct pairs cannot format to the same key.
        bloomfilter_add (dst_port, fmt("%s|%s", resp, resp_p));

        local src_counts = bloomfilter_lookup(src, orig) ;
        local dst_counts = bloomfilter_lookup(dst_port, fmt("%s|%s", resp, resp_p)) ;

        # The idea is that a remote scanner hits a lot of local hosts, so its
        # footprint (connection count) is disproportionate to the footprint
        # of the local host+port it touches.

        if (src_counts > 30 && dst_counts < 5)
                print fmt ("possible_scanner: %s -> %s on %s ( counts: %s, %s)", orig, resp, resp_p, src_counts, dst_counts);

}

event partial_connection(c: connection)
       {
       Scan::check_bloom(c);
       }

event connection_attempt(c: connection)
       {
       Scan::check_bloom(c);
       }

event connection_half_finished(c: connection)
       {
       # Half connections never were "established", so do scan-checking here.
       Scan::check_bloom(c);
       }

event connection_rejected(c: connection)
       {
       Scan::check_bloom(c);
       }

event connection_reset(c: connection)
       {
       Scan::check_bloom(c);
       }

event connection_pending(c: connection)
       {
       if ( c$orig$state == TCP_PARTIAL && c$resp$state == TCP_INACTIVE )
               Scan::check_bloom(c);
       }
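
A quick sanity check of the parameters used in the script (k=3, cells=128, max=100000000) can be sketched with the same textbook approximation. Two things here are assumptions rather than facts from this thread: that the counter width is derived as ceil(log2(max)), which is worth checking against the BIF documentation, and that the standard false-positive formula is a fair proxy for how often unrelated keys share cells. The helper names are made up for illustration.

```python
import math

def cell_width_bits(max_count):
    """Bits per cell, ASSUMING width = ceil(log2(max)); verify against the BIF docs."""
    return math.ceil(math.log2(max_count))

def false_positive_rate(n, m, k):
    """Approximate chance that all k cells of an unseen key are already non-zero."""
    return (1.0 - math.exp(-k * n / m)) ** k

K, CELLS, MAX = 3, 128, 100000000  # values from the proof-of-concept script

print("bits per cell:", cell_width_bits(MAX))
for n in (10, 50, 200, 1000):
    rate = false_positive_rate(n, CELLS, K)
    print(f"{n} distinct keys -> collision estimate {rate:.3f}")
```

If these assumptions hold, 128 cells fill up after a small number of distinct keys, at which point a lookup for almost any key returns a count inflated by other keys. That would make thresholds such as src_counts > 30 unreliable on real traffic, so a much larger cell count and a much smaller max look like the first things to try.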

Never mind my email!

I found: src/probabilistic/cardinality-counter.bif

Thanks,
Aashish