I'm the process of devising Bro's new Bloom filter interface and would
like to have your feedback on the BiFs. This is what I came up with so
far:
(1) bloomfilter_init(fp: double, capacity: count, max: count): opaque
of bloomfilter
(2) bloomfilter_add(bf: opaque of bloomfilter, x: any)
(3) bloomfilter_lookup(bf: opaque of bloomfilter, x: any) : count
(4) bloomfilter_merge(bf1: opaque of bloomfilter, bf2: opaque of
bloomfilter) : bool
Initialization involves (1), where "fp" represents the desired
false-positive rate and "capacity" the maximum number of unique
elements the bloom filter supports until "fp" can no longer be
guaranteed. The "max" argument will default to 1 and represents the
maximum counter value one can attach to each element.
To add an element, one uses (2). The first use of this function
"locks" the Bloom filter type to the value type of "x". Subsequent
invocations of this function require that the type of "x" matches the
value from the very first invocation.
Lookup using (3) works analogous to (2). The return value represents
the counter associated with "x". For basic membership queries (max=1)
the return value takes on only 0 and 1.
One can also merge related Bloom filters using (4). This requires that
"bf1" and "bf2" have the internal types as well as parameterization,
as the union of two Bloom filters needs to equi-sized underlying bit
vectors.
Matthias