$history extensions - zero windows, logarithmic counts

I'm working on two enhancements to the $history tracking for connections
and thought I'd tee them up for comments.

(1) A new history element, 'W'/'w', which means that a TCP receiver
    advertised a zero window, indicating that the corresponding process
    was unable to keep up with the incoming data. (This element is omitted
    in cases where zero windows aren't problematic: initial SYNs, and after
    FINs or RSTs.)

(2) A notion of "logarithmic counts" for history events: for certain
    events ('C' = checksum, 'T' = retransmission, and 'W' = zero window)
    the count is repeated on the 10th/100th/1000th/etc. occurrence. So a
    history value of 'ttt' means that the responder sent somewhere between
    100 and 999 retransmissions. This is useful because for large
    connections, a single checksum error, retransmission, or zero window
    is much less significant for analyzing performance issues than a whole
    bunch of these.
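
To make the repetition rule in (2) concrete, here is a minimal Python sketch, not the actual implementation. The helper names (`should_record`, `history_letters`) are hypothetical. It assumes the first occurrence also produces a letter, so that n copies of a letter correspond to between 10^(n-1) and 10^n - 1 occurrences.

```python
def should_record(occurrence: int) -> bool:
    """True on the 1st, 10th, 100th, ... occurrence (exact powers of ten)."""
    if occurrence < 1:
        return False
    # Strip trailing zeros; a power of ten reduces to exactly 1.
    while occurrence % 10 == 0:
        occurrence //= 10
    return occurrence == 1


def history_letters(letter: str, total: int) -> str:
    """Letters that one event type contributes after `total` occurrences.

    Hypothetical helper: in a real history string these letters would be
    interleaved with other events in order of arrival.
    """
    return "".join(letter for n in range(1, total + 1) if should_record(n))
```

Under this reading, `history_letters("t", 250)` gives `ttt`, which matches the 100 to 999 range described above.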

Comments?

    Vern

I really like those ideas, especially the logarithmic count.

How much would it cost to have an event fired when those thresholds are crossed?

> I really like those ideas, especially the logarithmic count.

Cool :-).

> How much would it cost to have an event fired when those thresholds are crossed?

Nice thought. I think it would be too expensive if done for the first
instance, but for each of the backed-off instances it ought to be rare
enough that it's not a problem. So maybe something like:

  ## Generated each time a reporting threshold (10, 100, 1000, ...)
  ## is crossed, starting with 10.
  event multiple_tcp_zero_windows(c: connection, is_orig: bool,
          threshold: count);
  event multiple_tcp_checksum_errors(c: connection, is_orig: bool,
          threshold: count);
  event multiple_tcp_retransmissions(c: connection, is_orig: bool,
          threshold: count);

?
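
A rough Python sketch of when such events would fire, assuming one cumulative counter per event type and direction. `crossed_threshold` and `fired_thresholds` are made-up names for illustration only and are not part of any existing API.

```python
from typing import List, Optional


def crossed_threshold(count: int) -> Optional[int]:
    """Return `count` if it is a reporting threshold (10, 100, 1000, ...).

    The first occurrence (count == 1) is deliberately skipped, since
    reporting starts with 10.
    """
    if count < 10:
        return None
    n = count
    while n % 10 == 0:
        n //= 10
    return count if n == 1 else None


def fired_thresholds(total: int) -> List[int]:
    """Thresholds reported while a cumulative counter climbs from 1 to `total`."""
    fired = []
    for count in range(1, total + 1):
        threshold = crossed_threshold(count)
        if threshold is not None:
            fired.append(threshold)  # stand-in for generating the event
    return fired
```

With this scheme a connection with 1000 retransmissions in one direction would generate only three events for that direction.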

    Vern

I think I like these, the only small concern I have is...

> (2) A notion of "logarithmic counts" for history events: for certain
>     events ('C' = checksum, 'T' = retransmission, and 'W' = zero window)
>     the count is repeated on the 10th/100th/1000th/etc. occurrence. So a
>     history value of 'ttt' means that the responder sent somewhere between
>     100 and 999 retransmissions. This is useful because for large
>     connections, a single checksum error, retransmission, or zero window
>     is much less significant for analyzing performance issues than a whole
>     bunch of these.

Here we will now have cases where some repetitions are logarithmic, and
some (like for R) are not. I guess that makes sense, but I can see it
potentially being confusing.

Johanna

> Here we will now have cases where some repetitions are logarithmic, and
> some (like for R) are not. I guess that makes sense, but I can see it
> potentially being confusing.

Yeah, I chewed on that too, but I don't see a better solution. The semantics
of repeated R are different, too (per the recent $history thread, it entails
differing sequence numbers), so I think once that's the case, then it's
not all that much more confusing if the significance of a repetition has
different semantics too.

    Vern

I think this is a useful feature. I’m a bit unclear on the logarithmic counts. Take, for instance, SaDtTtT. If I’m reading this correctly, I think that means 10-99 retransmissions from orig, followed by 10-99 from resp, then more retransmissions from orig (enough to reach a total of 100-999), and similarly more from resp. However, I could also interpret it as 10-99 from orig, 10-99 from resp, 10-99 from orig, 10-99 from resp.

Another question I had was that most of these are TCP-specific. Would checksum apply to UDP as well?

One downside of the logarithmic approach is that it makes histories hard to search, since a pattern like ‘t.*t’ means one thing for small conns and another for large conns. As you say, if what I care about is the overall number compared to the number of packets, that feels more like a percentage. To me, it’d seem more natural to use something like “0t” means “of the total number of packets from the originator, 0-9% were retransmissions,” “1t” means 10-19%, etc.
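
As a rough illustration of the percentage idea, here is a small Python sketch. The function name `percent_bucket` is hypothetical, and folding 100% into the top bucket is my own assumption, since a single digit can't represent an eleventh bucket.

```python
def percent_bucket(events: int, total_packets: int, letter: str = "t") -> str:
    """Encode events/total_packets as a decile digit followed by the letter.

    "0t" covers 0-9%, "1t" covers 10-19%, and so on. Exactly 100% is
    clamped into "9t" (an assumption, not something specified above).
    """
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    if not 0 <= events <= total_packets:
        raise ValueError("events must be between 0 and total_packets")
    decile = min(9, (10 * events) // total_packets)
    return f"{decile}{letter}"
```

For instance, 15 retransmissions out of 100 packets would come out as `1t`.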

What I’m left debating is whether adding numerical data to history is the right approach, though. missed_bytes is a separate field, but it feels similar. If we did something like the log approach for that, we’d lose exact counts, but we’d have granularity on the direction. Maybe we add the new letters, but don’t repeat them and also add new fields for exact bytecounts?

–Vlad

> I'm a bit unclear on the logarithmic
> counts. Take, for instance SaDtTtT. If I'm reading this correctly, I think
> that means 10-99 retransmissions from orig, followed by 10-99 from resp,
> then more retransmissions from orig (enough to reach a total of 100-999),
> and similarly more from resp.

Correct in principle. (1) These would be 1-9 followed by enough to
get to 10-99, since a single retransmission is already a 't' / 'T', and
(2) lower letters are responders rather than originators.

> However, I could also interpret it as 10-99
> from orig, 10-99 from resp, 10-99 from orig, 10-99 from resp.

Nope. The counter doesn't reset at any point, it's cumulative.
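
The cumulative reading can be sketched as a small Python decoder. This is only an illustration: `count_ranges` is a hypothetical helper, and it assumes that only 'c', 't' and 'w' (in either case) use log counts, with uppercase for the originator and lowercase for the responder.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumption: only these letters use logarithmic repetition.
LOG_LETTERS = "ctw"


def count_ranges(history: str) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Map (side, letter) to the inclusive range of cumulative occurrences.

    Position in the string is ignored, because the counter never resets:
    n copies of a letter mean between 10**(n-1) and 10**n - 1 occurrences.
    """
    seen = Counter(ch for ch in history if ch.lower() in LOG_LETTERS)
    ranges = {}
    for ch, n in seen.items():
        side = "orig" if ch.isupper() else "resp"
        ranges[(side, ch.lower())] = (10 ** (n - 1), 10 ** n - 1)
    return ranges
```

Applied to `SaDtTtT`, this reports 10 to 99 retransmissions for each side, regardless of how the letters are interleaved.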

> Another question I had was that most of these are TCP-specific. Would
> checksum apply to UDP as well?

Right, it would apply to UDP too, just like is done presently for
the boolean indicator.

> As you say, if what I care about is the overall
> number compared to the number of packets, that feels more like a
> percentage.

Well, I think this is yes-and-no. For one, the overall percentage might
be quite small and still have a large impact on what's supposed to be a
high-speed transfer - particularly if it means that a connection entered
an extended timeout-and-back-off - so I don't know if there would be a
natural point of inspection for it. (It could also be quite large but no big
deal because the connection is a runt.)

> To me, it'd seem more natural to use something like "0t" means
> "of the total number of packets from the originator, 0-9% were
> retransmissions," "1t" means 10-19%, etc.

I'm inclined to wait on refinements like this. Let's first see whether
having log-counter-style histories leaves people wanting more before
qualitatively changing the nature of the history field, or adding new
fields.

> Maybe we add the
> new letters, but don't repeat them and also add new fields for exact
> bytecounts?

I'm not following this. If we add new letters that don't repeat *and* we
add new fields, why do we need the letters given that the fields are there?

    Vern

>> I'm a bit unclear on the logarithmic
>> counts. Take, for instance SaDtTtT. If I'm reading this correctly, I think
>> that means 10-99 retransmissions from orig, followed by 10-99 from resp,
>> then more retransmissions from orig (enough to reach a total of 100-999),
>> and similarly more from resp.

> Correct in principle. (1) These would be 1-9 followed by enough to
> get to 10-99, since a single retransmission is already a 't' / 'T', and
> (2) lower letters are responders rather than originators.

Ah, right. Thanks for clearing that up.

>> Maybe we add the
>> new letters, but don't repeat them and also add new fields for exact
>> bytecounts?

> I'm not following this. If we add new letters that don't repeat *and* we
> add new fields, why do we need the letters given that the fields are there?

My thought for this was simply if it mattered *where* in the state history
the trouble occurred. For instance, if I'm seeing retransmissions at the
very end of a connection, that might indicate that one side abruptly
terminated the connection (I'd see this with things like fail2ban inserting
an iptables rule to block a brute-forcer). Similarly, if I see a zero
window at the start of a connection, that would tell me that the buffer was
full due to another connection or connections, as opposed to filling up due
to the connection I'm looking at.

I'm having a tough time thinking up additional use-cases without having
some sample data, so perhaps the best course is to add what you proposed,
and then revisit it if we feel like anything's missing.

  --Vlad

> My thought for this was simply if it mattered *where* in the state history
> the trouble occurred.

I agree that it could ... but I think for at least some situations where
it does, for the logs to help in diagnosing them will require something
well beyond indicator flags. It's interesting to consider what these might
look like, but for now I'd like to get this simpler additional functionality
implemented, as I think it'll already be handy - not pointwise for diagnosing
specific connections, but as manifested more in aggregate, such as "gee when
we talk with a.b.0.0/16 we sure do rack up the checksum errors" or such.

> I'm having a tough time thinking up additional use-cases without having
> some sample data, so perhaps the best course is to add what you proposed,
> and then revisit it if we feel like anything's missing.

Sounds good. I'll aim to have a branch that people can try out ready
in a bit.

    Vern