Questions about TM cutoff implementation

I'm in the process of adding flow cutoff functionality to VAST, in the
same vein as what we have in the TM. Several questions arose during the
implementation, and the TM paper did not help me find answers:

    - Scope: what exactly is the flow size? Because TM uses the
      connection 5-tuple, I'd go for the cumulative transport-layer
      bytes. That is, for TCP/UDP the full payload size, and for ICMP
      the variable-size data portion. However, one could also imagine
      counting bytes starting at the network layer.

    - Directionality: is there a single cutoff value for both directions
      of the flow? Or does the cutoff mean N bytes per direction, so
      that the trace ends up with O(2N) bytes?

    - Eviction: how does TM evict old/stale entries? Naturally, it makes
      sense to age out old entries upon connection termination. For TCP,
      that would be after seeing a FIN or RST, and for ICMP perhaps after
      seeing the response to a corresponding request. For UDP, there seems
      to be a need for a timer-based approach.

      Moreover, the connection state table must not grow arbitrarily and
      must be capped at some size. What is a typical number of concurrent
      flows at, say, UCB? In the case of scanning, adversarial activity,
      or more generally any sudden increase in the flow count due to
      numerous (small) connections, we need to evict existing flow
      counters. Choosing a random entry would likely end up evicting one
      of the smaller flows, which seems like a robust stateless design.
      Alternatively, one could come up with a scheme that walks the flow
      table periodically (say, every 100k packets) once it has reached its
      maximum size and evicts all entries below a given byte threshold. A
      rough sketch of this idea follows below.
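
For concreteness, here is roughly what I have in mind for the
timer-plus-threshold variant (a hypothetical sketch with made-up names,
not code from TM or VAST):

#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct FlowState {
    uint64_t bytes = 0;       // cumulative flow size seen so far
    uint64_t last_active = 0; // timestamp of the last packet (seconds)
};

class FlowTable {
public:
    FlowTable(size_t max_entries, uint64_t idle_timeout, uint64_t byte_threshold)
        : max_entries_(max_entries), idle_timeout_(idle_timeout),
          byte_threshold_(byte_threshold) {}

    // Record a packet for the flow identified by `key` (a hashed 5-tuple).
    void update(uint64_t key, uint64_t bytes, uint64_t now) {
        auto& state = table_[key];
        state.bytes += bytes;
        state.last_active = now;
    }

    // Called periodically (e.g., every 100k packets): age out idle entries
    // (the UDP case) and, while the table exceeds its cap, drop small flows.
    void maintain(uint64_t now) {
        for (auto it = table_.begin(); it != table_.end();) {
            bool idle = now - it->second.last_active > idle_timeout_;
            bool small = table_.size() > max_entries_
                         && it->second.bytes < byte_threshold_;
            if (idle || small)
                it = table_.erase(it);
            else
                ++it;
        }
    }

private:
    std::unordered_map<uint64_t, FlowState> table_;
    size_t max_entries_;
    uint64_t idle_timeout_;
    uint64_t byte_threshold_;
};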

Any thoughts on the above points would be much appreciated.

     Matthias

[Forwarding on behalf of Naoki - to the greater list as well]

Hi Matthias,

For at least the first point, the parts of the code to look at are
Storage.cc and Fifo.cc. In Storage.cc, search for c->getSuspendCutoff(),
and in Fifo.cc, search for the Fifo::addPkt method. I wrote some comments
about those parts of the code a while back in my code base. To answer
your first question, it looks like TM does the connection cutoff by
packet bytes. Specifically, it uses the caplen field of pcap_pkthdr (look
at Connection.cc or FifoMem.cc and search for
tot_pktbytes+=header->caplen), i.e., the amount of data available during
capture.
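
In rough pseudocode, the effect is something like the following (an
illustrative paraphrase on my part, not the actual Storage.cc/Fifo.cc
code):

#include <pcap.h>
#include <cstdint>

struct ConnCounter {
    uint64_t tot_pktbytes = 0; // mirrors the per-connection byte counter
};

// Returns true while packets of this connection should still be stored,
// false once the connection has gone past its cutoff.
bool add_pkt(ConnCounter& c, const pcap_pkthdr* header, uint64_t cutoff) {
    c.tot_pktbytes += header->caplen; // caplen = bytes available from capture
    return c.tot_pktbytes <= cutoff;
}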

Best,
Naoki

    Specifically, it uses the caplen field of pcap_pkthdr (look at
    Connection.cc or FifoMem.cc and search for
    tot_pktbytes+=header->caplen), i.e., the amount of data available
    during capture.

Thanks for providing the entry points. It looks like the TM definition
of "flow cutoff" includes the full packet size (header->caplen) as
opposed to just the transport-layer payload bytes. It's probably a
matter of taste, but I find it more intuitive to think about the cutoff
in terms of transport-layer bytes, because the cutoff fundamentally has
to do with a connection, not with packets. On the other hand, artifacts
such as TCP retransmissions do make it more complicated to reason about
it that way.
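
For illustration, counting transport-layer bytes could look roughly like
this (a hypothetical helper, not anything that exists in TM or VAST):

#include <cstddef>
#include <cstdint>

// Given a captured IPv4 packet, return the number of transport-layer
// payload bytes it carries (0 if truncated or malformed).
uint64_t transport_payload_bytes(const uint8_t* ip, size_t caplen) {
    if (caplen < 20)
        return 0;
    size_t ihl = (ip[0] & 0x0f) * 4;             // IPv4 header length
    size_t total = (size_t(ip[2]) << 8) | ip[3]; // IPv4 total length
    uint8_t proto = ip[9];
    if (total > caplen || ihl >= total)
        return 0;
    const uint8_t* l4 = ip + ihl;
    if (proto == 6) {                            // TCP: skip header via data offset
        if (total < ihl + 20)
            return 0;
        size_t thl = (l4[12] >> 4) * 4;
        return total > ihl + thl ? total - ihl - thl : 0;
    }
    if (proto == 17)                             // UDP: fixed 8-byte header
        return total > ihl + 8 ? total - ihl - 8 : 0;
    return total - ihl;                          // ICMP etc.: bytes past the IP header
}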

     Matthias

    - Scope: what exactly is the flow size? Because TM uses the
      connection 5-tuple, I'd go for the cumulative transport-layer
      bytes. That is, for TCP/UDP the full payload size, and for ICMP
      the variable-size data portion. However, one could also imagine
      counting bytes starting at the network layer.

The nice thing about heavy tails is that this sort of thing doesn't tend
to matter. Just pick a definition that's convenient and you'll get pretty
much the same results as for other (reasonable) definitions.

    - Directionality: is there a single cutoff value for both directions
      of the flow? Or does the cutoff mean N bytes per direction, so
      that the trace ends up with O(2N) bytes?

It would be good to do this per-direction, since it can often happen that
one direction exhibits a heavy tail but the other doesn't. In that case,
you can retain more information about the connection (in particular,
control messages) by managing the directions separately.
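
Concretely, that just means one counter and one cutoff check per
direction, e.g. (schematic sketch, not actual TM code):

#include <cstdint>

struct DirectionalCutoff {
    uint64_t bytes[2] = {0, 0}; // 0 = originator->responder, 1 = reverse

    // Returns true while packets in direction `dir` should still be
    // recorded; the other direction is unaffected.
    bool record(int dir, uint64_t pkt_bytes, uint64_t cutoff) {
        bytes[dir] += pkt_bytes;
        return bytes[dir] <= cutoff;
    }
};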

    - Eviction: how does TM evict old/stale entries? Naturally, it makes
      sense to age out old entries upon connection termination. For TCP,
      that would be after seeing a FIN or RST, and for ICMP perhaps after
      seeing the response to a corresponding request. For UDP, there seems
      to be a need for a timer-based approach.

You should say a bit more about what's meant by eviction or old/stale
entries. Is this regarding evicting in-memory state, or on-disk? For the
former, the heavy-tailed premise behind the Time Machine says it's fine
to flush something that hasn't done anything for a while.

      Moreover, the connection state table must not grow arbitrarily and
      must be capped at some size. What is a typical number of concurrent
      flows at, say, UCB?

Here you need to tease apart pending flows from active flows, since
there can be many more of the former due to scanning and backscatter.
That's what motivated the "connection compressor", per
http://www.icir.org/vern/papers/high-volume-ccs04.pdf.
I believe the number of concurrent established/productive flows isn't
very high (since most flows tend to be quite short-lived), but I don't
have an order-of-magnitude figure handy.

      In the case of scanning, adversarial activity,
      or more generally any sudden increase in the flow count due to
      numerous (small) connections, we need to evict existing flow
      counters.

Why do you need to evict these early? What's the bottleneck stressor
that emerges if you don't? Is this about managing the flow table itself?

    Vern

It would be good to do this per-direction, since it can often happen that
one direction exhibits a heavy tail but the other doesn't.

Asymmetry is a very good point; I've switched to per-direction cutoffs.

You should say a bit more about what's meant by eviction or old/stale
entries.

Here, I mean the hash table that tracks per-connection state for
deciding when the cutoff has been reached. The state encompasses (i) the
last time a flow has been active and (ii) its cumulative size. An entry
occupies 8 + 8 bytes of state plus 8 + 8 + 2 + 2 + 1 bytes for the
5-tuple, roughly 37 bytes per entry, modulo padding by the compiler.
I've set the default maximum table size to 1M entries.
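
Schematically, an entry looks like this (field names are made up; the
actual code may differ):

#include <cstdint>

struct FlowEntry {
    uint64_t last_active; // (i) last time the flow was active, 8 bytes
    uint64_t bytes;       // (ii) cumulative size, 8 bytes
};

struct FlowKey {          // the 5-tuple: 8 + 8 + 2 + 2 + 1 = 21 bytes
    uint64_t src_addr;
    uint64_t dst_addr;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};                        // entry + key = 37 bytes, modulo compiler padding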

> In the case of scanning, adversarial activity,
> or more generally any sudden increase in the flow count due to
> numerous (small) connections, we need to evict existing flow
> counters.

Why do you need to evict these early? What's the bottleneck stressor
that emerges if you don't? Is this about managing the flow table itself?

The reason why I thought they should be evicted early is that in the
case of scanning, the corresponding flow table entries don't constitute
active connections and just make the table reach its maximum size
faster. Once the maximum has been reached, I evict random entries. So I
was thinking it's perhaps worthwhile to reduce the probability of
evicting active flows by evicting inactive ones earlier.

But since you state that the number of active connections is much lower
than the number of pending ones, the approach I currently take already
works and doesn't need any change: randomly evicting flow table entries
will hit the inactive flows with higher probability. (Even if we evict
an active flow, the worst case is that we'll record twice the specified
cutoff.)
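
For reference, the eviction step amounts to something like the following
(an illustrative sketch, not the actual VAST code):

#include <cstddef>
#include <random>
#include <unordered_map>

// Drop a (roughly) random entry once the table is at capacity. With a
// heavy-tailed size distribution, the victim is most likely a small or
// inactive flow.
template <class Key, class Value>
void evict_random(std::unordered_map<Key, Value>& table, std::mt19937& rng) {
    if (table.empty())
        return;
    // Pick a random non-empty bucket, then take its first entry. Not
    // perfectly uniform, but cheap and good enough for this purpose.
    std::uniform_int_distribution<size_t> pick(0, table.bucket_count() - 1);
    size_t b = pick(rng);
    while (table.bucket_size(b) == 0)
        b = (b + 1) % table.bucket_count();
    table.erase(table.begin(b)->first);
}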

     Matthias