connection_established behavior

This is related to this ticket: http://tracker.icir.org/bro/ticket/285

I am fairly lost on this, hopefully someone will have insight into the behavior. In the following snippet of code, there's a strange behavior (maybe intentional?) where a packet with SYN/ACK flags set and no other packets related to the connection are seen, the connection_established event will be generated.

====CODE====
      if ( peer->state == TCP_ENDPOINT_SYN_SENT )
        peer->SetState(TCP_ENDPOINT_ESTABLISHED);
      else if ( peer->state == TCP_ENDPOINT_INACTIVE )
        {
        // If we were to ignore SYNs and
        // only instantiate state on SYN
        // acks, then we'd do:
        // peer->SetState(TCP_ENDPOINT_ESTABLISHED);
        // here.
        Weird("unsolicited_SYN_response");
        }

      endpoint->SetState(TCP_ENDPOINT_ESTABLISHED);
====END CODE====

This only seems to be true when the connection compressor is disabled though. The connection compressor *seems* to prevent this effect. I'll include steps to reproduce the problem here and I'll attach the example tracefile to this email...

===POLICY SCRIPT (test.bro)===
@load conn
event connection_established(c: connection)
  {
  print fmt("gah! there shouldn't be a connection established (%s)", id_string(c$id));
  }
===END POLICY SCRIPT===

Here's the output with a connection established that shouldn't happen:
[seth@Blake build (master)]$ ./src/bro -C -f "ip" -r connection_established-problem.trace test use_connection_compressor=F
1285716061.336160 weird: unsolicited_SYN_response
gah! there shouldn't be a connection established (128.146.242.61/3072 > 121.254.178.6/http)

Here's the (lack of) output with the connection compressor enabled:
[seth@Blake build (master)]$ ./src/bro -C -f "ip" -r connection_established-problem.trace test use_connection_compressor=T
[seth@Blake build (master)]$

I also disabled the checksum validation because the TCP header has an invalid checksum in the packet for some reason.

Sorry for the long email, but any thoughts on the behavior of the sans connection compressor behavior?

  .Seth

connection_established-problem.trace (100 Bytes)

The reason is why the behavior differs is, that the "connection" has
only one packet and thus the ConnectionCompressor will not instantiate
the full Connection. If there were a packet from the other endpoint
after the SYN/ACK, then ConnectionCompressor would instantiate a
connection and you should see the same behavior when the packets are
passed to the TCP Analyzer (i.e., you would the Weird and the
connection_established event)

cu
Gregor

I guess then I'm interested to hear everyone's take on this due to spoofed syn packets (remote machines at other sites responding to spoofed traffic appearing to come from your address space). If there is nothing listening at the site's address that a spurious syn-ack is directed, then you could see a lot of connection_established events generated if you aren't running the connection compressor due to no rst packet being sent back. I would expect most sites would be running the connection compressor when running on live traffic, but they seem to be having this issue at OSU and as best I can tell, they're running the connection compressor. I'm still trying to replicate the problem exactly as they're seeing it though. Hopefully I have some more information on this soon.

The problem with this for the guys at OSU is if the connection_established event ends up being generated and the responder is on a list of known-bad sites then they get an alert about a connection to a botnet or something when there wasn't really a connection at all, just a spurious syn-ack.

  .Seth

I haven't yet read this thread closely, so this may be off the mark,
but a couple of quick comments (before I disappear for a while):

  - Instantiating on SYN ACK came about due to coping with Bro
    deployments with split routing, such that they never saw
    initial SYNs for some connections.

  - One way to deal with backscatter would be to inspect $history
    to see that no initial SYN was ever seen.

  - A more principled way to deal with it would be to fix Bro's
    basic notion of connection states. When I designed it, I didn't
    realize that all sorts of non-compliant states could arise (e.g.,
    connections for which only one side is seen). In principle,
    Bro should stop trying to follow the RFC 793 notion of TCP states,
    and instead go with an empirical set. $history allows this but
    in an implicit fashion, rather than with explicit states. The
    latter would be better, though it's not clear to me that it's
    really worth the work.

- Vern

  - Instantiating on SYN ACK came about due to coping with Bro
    deployments with split routing, such that they never saw
    initial SYNs for some connections.

Ah, ok.

  In principle,
    Bro should stop trying to follow the RFC 793 notion of TCP states,
    and instead go with an empirical set.

Thanks for the backstory on this, it makes this much clearer for me. Perhaps I'll file a ticket for someone to look into doing this for 1.7.

The only thing that is still nagging me is that the behavior is different with the connection compressor than it is without it. Does it make sense to do anything to make the connection_compressor and non connection_compressor scenarios end up with the same result?

.Seth

  - Instantiating on SYN ACK came about due to coping with Bro
    deployments with split routing, such that they never saw
    initial SYNs for some connections.

Isn't this controlled by this option:

    # If true, instantiate connection state when a SYN ack is seen
    # but not the initial SYN (even if partial_connection_ok is false).
    const tcp_SYN_ack_ok = T &redef;

Perhaps we should just change the default?

    and instead go with an empirical set. $history allows this but
    in an implicit fashion, rather than with explicit states. The
    latter would be better, though it's not clear to me that it's
    really worth the work.

Not for the time being I would say, as this would be quite a majro
change. However, at some point, we should definitlu look more
closely at the TCP code.

Robin

Ah! I should have looked. I didn't even know that about that option. It would say that it's worthwhile to change the default because it would at least make the output the same with and without connection compressor (and not changing any other settings).

  .Seth

> const tcp_SYN_ack_ok = T &redef;
>
> Perhaps we should just change the default?

Ah! I should have looked. I didn't even know that about that option.

Yeah, I'd forgotten about it too. FWIW, I believe that was added for Bro
deployments with high packet loss rates, since they will sometimes miss
the initial SYN but not the SYN-ACK.

    Vern

  - Instantiating on SYN ACK came about due to coping with Bro
    deployments with split routing, such that they never saw
    initial SYNs for some connections.

Isn't this controlled by this option:

    # If true, instantiate connection state when a SYN ack is seen
    # but not the initial SYN (even if partial_connection_ok is false).
    const tcp_SYN_ack_ok = T &redef;

Perhaps we should just change the default?

Then you would only instantiate on connections with a full handshake.
So, I would keep it as it is!
(It only takes effect if partial_connection_ok=F)

Changing this option wouldn't necessarily change event generation
either, I think. We just have to unify the way the
connection_established event is generated in TCP.cc and Connection
Compressor.

    and instead go with an empirical set. $history allows this but
    in an implicit fashion, rather than with explicit states. The
    latter would be better, though it's not clear to me that it's
    really worth the work.

Not for the time being I would say, as this would be quite a majro
change. However, at some point, we should definitlu look more
closely at the TCP code.

Yeah. That's basically a rewrite of most of the TCP Analyzer. Probably
something to consider at some point though.

cu
gregor

> Perhaps we should just change the default?

Then you would only instantiate on connections with a full handshake.

Right, that was the intent of the original question, wasn't it?

Changing this option wouldn't necessarily change event generation
either, I think.

I think it would. If there's no connection state, there's no
connection_established.

We just have to unify the way the connection_established event is
generated in TCP.cc and Connection Compressor.

Note that the CC doesn't generate connection_established itself. It
acts as a filter: once it lets packets through, TCP.cc will generate
the event.

Also, I'm thinking there might actually be a bug in the CC here (or
at least semantics not originally intended): it has an option
cc_handle_only_syns that, if on, tells it to take care only of
initial SYN packets and forward everything else (e.g., a stray data
packet) to normal processing. The motivation for this was to have
less difference to normal TCP processing when the cc is on, and it's
per default enabled (bro.init calls this mode "connection compressor
light").

However, it turns out that a SYN/ACK counts here and *is* processed
by the CC, rather than being passed on. If we would change that so
only pure SYNs were handled, we shouldn't see a difference between
cc on vs. off anymore in this case.

Robin

Perhaps we should just change the default?

Then you would only instantiate on connections with a full handshake.

Right, that was the intent of the original question, wasn't it?

No, it was about the connection established event, IIRC. In case the
conn compressor is disabled, TCP.cc would generate the
connection_established event on a pure SYN/ACK while the conn compressor
would not do this. (If the "connection" has other packets in it, the
conn compressor would instantiate a connection, pass it to TCP and thus
the connection_established event would be generated.)

The tcp_SYN_ack_ok and partial_connection_ok will prevent any connection
without full handshake to be processed by bro. I don't think that we
want this as default. I think that the current default, namely that bro
tries to parse partial connections is better.

Changing this option wouldn't necessarily change event generation
either, I think.

I think it would. If there's no connection state, there's no
connection_established.

Actually that's true, if there's no Connection instance, there's no TCP
Analyzer so no event can be genreated.

We just have to unify the way the connection_established event is
generated in TCP.cc and Connection Compressor.

Note that the CC doesn't generate connection_established itself. It
acts as a filter: once it lets packets through, TCP.cc will generate
the event.

Also, I'm thinking there might actually be a bug in the CC here (or
at least semantics not originally intended): it has an option
cc_handle_only_syns that, if on, tells it to take care only of
initial SYN packets and forward everything else (e.g., a stray data
packet) to normal processing. The motivation for this was to have
less difference to normal TCP processing when the cc is on, and it's
per default enabled (bro.init calls this mode "connection compressor
light").

Hmm. My intuition would have been that SYN/ACK also count here...

However, it turns out that a SYN/ACK counts here and *is* processed
by the CC, rather than being passed on. If we would change that so
only pure SYNs were handled, we shouldn't see a difference between
cc on vs. off anymore in this case.

Correct. However, one can also argue that the "instantiate if we see a
packet from each endpoint" makes sense in order to not instantiate on
scans or floods, but instantiate if we "just" miss part of the handshake.

I guess the questions are:

* what is the semantic of the connection_established even? If we've
  defined that, we should unify CC and non-CC behavior.

* what to do with partial connections or connections that start with a
  SYN/ACK (and that have further data packets).

* what to do with connections that only have a SYN/ACK.

cu
Gregor