Bloom filter interface

I'm the process of devising Bro's new Bloom filter interface and would
like to have your feedback on the BiFs. This is what I came up with so
far:

(1) bloomfilter_init(fp: double, capacity: count, max: count): opaque
of bloomfilter
(2) bloomfilter_add(bf: opaque of bloomfilter, x: any)
(3) bloomfilter_lookup(bf: opaque of bloomfilter, x: any) : count
(4) bloomfilter_merge(bf1: opaque of bloomfilter, bf2: opaque of
bloomfilter) : bool

Initialization involves (1), where "fp" represents the desired
false-positive rate and "capacity" the maximum number of unique
elements the bloom filter supports until "fp" can no longer be
guaranteed. The "max" argument will default to 1 and represents the
maximum counter value one can attach to each element.

To add an element, one uses (2). The first use of this function
"locks" the Bloom filter type to the value type of "x". Subsequent
invocations of this function require that the type of "x" matches the
value from the very first invocation.

Lookup using (3) works analogous to (2). The return value represents
the counter associated with "x". For basic membership queries (max=1)
the return value takes on only 0 and 1.

One can also merge related Bloom filters using (4). This requires that
"bf1" and "bf2" have the internal types as well as parameterization,
as the union of two Bloom filters needs to equi-sized underlying bit
vectors.

    Matthias

I already talked to Matthias but for the record, this looks good to
me.

One more question:


Could we just make the existing copy bif work with it? Now that I know how that's implemented internally though I'm already skeptical of the idea. :slight_smile:

  .Seth

Is this merging bf2 into bf1? Would it make sense to return a new
"opaque of bloomfilter" instead?

I was aiming for non-mutable semantics: the function leaves both bf1
and bf2 untouched, but merely returns bf1 + bf2, where "+" means union
of the underlying bit vectors. If this turns out to be a memory
problem after profiling, we could still add a void-returning in-place
merge function.

That said, we could also use the name "bloomfilter_unify" or
"bloomfilter_union", as we may end up with intersection at some point
(although I do not have a use case for it yet).

     Matthias

So it sounds like having the merge function returning bool was an accident? Seems like we're all on the same page. :slight_smile:

  .Seth

So it sounds like having the merge function returning bool was an accident?

Exactly, well-spotted!

    Matthias