Decapsulating "payload" tunnels

Jon and I have been working on the 2.1 tunnel decapsulation recently, and we've run into some major architectural questions. We seem to have the groundwork laid for IP encapsulation tunnels (AYIYA, Teredo, 6to4), but I also want to support tunnels like SOCKS and HTTP CONNECT, which are essentially session payload tunnels since they carry reassembled TCP streams.

This creates a problem if we want logs that are forensically useful, because right now any connection through a SOCKS proxy looks like the client is sending all of its traffic to the proxy. The HTTP logs will show the client making HTTP requests to the proxy even though the proxy is really forwarding them on to other hosts. In environments with pervasive proxying, this makes the logs much less useful.

Robin, Jon, and I discussed this for a while yesterday, and we came up with a proposal: we would extract the payload from the proxy connection and mock up IP headers for the Sessions::DoNextPacket method so that it looks like the client is connecting to the host it actually asked the proxy to reach. We would need to extend the DoNextPacket method with a short circuit that skips TCP reassembly and analysis, since the bytes immediately after the fake IP header would already be reassembled payload. This would result in two connections showing up in conn.log when there was *really* only one.
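
To make that concrete, here's a rough sketch of the kind of header mock-up I have in mind. This is only illustrative: the helper name and the flag for skipping reassembly are made up, and the real DoNextPacket signature would still need to be worked out.

    #include <arpa/inet.h>
    #include <cstring>
    #include <netinet/in.h>
    #include <netinet/ip.h>

    // Illustration only: build a fake IPv4 header that makes the tunneled
    // payload look like "client -> requested destination". The proxy
    // analyzer would prepend this (plus a minimal TCP header) to the
    // reassembled payload and hand it to Sessions::DoNextPacket(), along
    // with a yet-to-be-added flag telling it to skip TCP reassembly.
    // len_after_ip covers everything following the IP header.
    static struct ip MakeFakeIPHdr(in_addr_t client, in_addr_t dest, int len_after_ip)
        {
        struct ip hdr;
        memset(&hdr, 0, sizeof(hdr));
        hdr.ip_v = 4;
        hdr.ip_hl = sizeof(hdr) / 4;
        hdr.ip_ttl = 64;
        hdr.ip_p = IPPROTO_TCP;
        hdr.ip_len = htons(sizeof(hdr) + len_after_ip);
        hdr.ip_src.s_addr = client; // the real client behind the proxy
        hdr.ip_dst.s_addr = dest;   // the host the client asked the proxy for
        return hdr;
        }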

There is one other niggle in this. It seems that most proxy protocols (SOCKS and HTTP at least) support requesting a proxy connection by name instead of by IP address. I fully expect to be beaten up over this, but I think it would be great to support doing a name lookup to create the fake IP header.
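
Just to illustrate what I mean, something like the following would get us an address to put into the fake header. A blocking call like getaddrinfo() obviously wouldn't fly in the packet path, so the real thing would have to go through Bro's asynchronous DNS handling; this is only meant to show the idea.

    #include <cstring>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    // Illustration only: resolve the hostname from the proxy request so we
    // have a destination address for the fake IP header.
    static bool ResolveProxyTarget(const char* name, in_addr_t* out)
        {
        struct addrinfo hints;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;

        struct addrinfo* res = 0;
        if ( getaddrinfo(name, 0, &hints, &res) != 0 || ! res )
            return false;

        *out = ((struct sockaddr_in*) res->ai_addr)->sin_addr.s_addr;
        freeaddrinfo(res);
        return true;
        }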

I'm sure we'll end up sticking configuration options all over the place to turn things off and we'll definitely figure out a good set of things to turn on by default.

Does anyone have reservations about this design? It definitely seems nasty on some levels, and Robin pointed out yesterday that it would probably be much better to pass data around with abstracted metadata instead of packets, but packets are what we deal with internally for now, so that's what we would have to fake without doing a major redesign.

Robin, Jon: please follow up if there are any points that I didn't make clear enough. :slight_smile:

.Seth


Thinking about this some more, below is an idea for how we could
structure it. It's messy, but we don't have much of a way around that
without doing some major restructuring. This would at least encapsulate
the messiness somewhat. Note that I haven't fully thought it through,
so there might be more stumbling blocks; there are often internal
dependencies that are hard to spot before starting to work
on the code ...

That said, how about this:

We create a new class TunnelConnection that encapsulates all the
messy stuff. Its interface could look something like this:

    class TunnelConnection {
        // Associate a (fake) conn ID with the tunnel.
        TunnelConnection(ConnID id, Connection *parent, <whatever else we need>);

        // Feed data in for parsing.
        void NextStream(<payload data>); // See below.

        [... probably more methods ...]

    private:
        Connection* fake_conn;
        Connection* parent;
    };

The TunnelConnection internally creates a new (fake) Connection
object, stores it, and uses it for all the parsing when it needs a
Connection object. But we don't store that Connection in the normal
session tables.

Instead, NetSessions gets a new method:

    TunnelConnection* NewTunnelConnection(ConnID id, <give it what it needs>);

The higher-level analyzers that decapsulate the tunnel use
NewTunnelConnection() to get a tunnel and then feed data in via
NextStream(). That method does whatever's necessary to pass data to
the parsers, faking IP packets if necessary (but see below).
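
From the analyzer's side, usage could then look roughly like this.
Everything here is a sketch against the hypothetical interface above:
the 'tunnel' member, HandshakeDone(), and the parameters of
NewTunnelConnection()/NextStream() are placeholders, not existing API.

    // Hypothetical SOCKS analyzer using the new interface once it has
    // parsed the proxy handshake. 'tunnel' would be a TunnelConnection*
    // member of the analyzer.
    void SOCKS_Analyzer::HandshakeDone(const ConnID& requested_id)
        {
        // Ask NetSessions for a tunnel keyed by the (fake) conn ID
        // "client -> requested destination", with this connection as parent.
        tunnel = sessions->NewTunnelConnection(requested_id, Conn());
        }

    void SOCKS_Analyzer::DeliverStream(int len, const u_char* data, bool orig)
        {
        // Everything after the handshake is payload of the tunneled
        // connection; just pass it through.
        if ( tunnel )
            tunnel->NextStream(len, data, orig);
        }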

NetSessions tracks all TunnelConnections in their own dictionary
(similar to tcp_conns, udp_conns, icmp_conns) and handles state
management (i.e., it removes them if the parent connection goes away).
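
In other words, something along these lines inside NetSessions; the
member name, container type, and Parent() accessor are made up here,
just following the existing tcp_conns/udp_conns pattern:

    // Parallel to tcp_conns/udp_conns/icmp_conns (illustrative only).
    std::map<ConnID, TunnelConnection*> tunnel_conns;

    TunnelConnection* NetSessions::NewTunnelConnection(ConnID id, Connection* parent)
        {
        TunnelConnection* t = new TunnelConnection(id, parent /* , ... */);
        tunnel_conns[id] = t;
        return t;
        }

    void NetSessions::RemoveTunnelsFor(Connection* parent)
        {
        // Called when the parent (proxy) connection goes away, so tunnel
        // state doesn't outlive it.
        std::map<ConnID, TunnelConnection*>::iterator i = tunnel_conns.begin();
        while ( i != tunnel_conns.end() )
            {
            if ( i->second->Parent() == parent )
                {
                delete i->second;
                tunnel_conns.erase(i++);
                }
            else
                ++i;
            }
        }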

As Seth suggested, we should short-circuit the tunnel analysis to skip
the transport layer where we don't have one. I'm not totally sure how
best to do that, but one option would be to internally add a new TUNNEL
transport layer alongside the standard TCP/UDP/ICMP ones (drawback:
there are a number of places in the code that currently don't expect to
see any transport layer other than the current set).
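
Concretely, that would mean extending the transport protocol enum,
something like the following (assuming the existing values stay as they
are):

    // Extending the existing TransportProto enum with a pseudo transport
    // for decapsulated payload. Every switch/comparison on TransportProto
    // in the code base would then need to learn about the new value, which
    // is the drawback mentioned above.
    enum TransportProto {
        TRANSPORT_UNKNOWN,
        TRANSPORT_TCP,
        TRANSPORT_UDP,
        TRANSPORT_ICMP,
        TRANSPORT_TUNNEL  // new: payload tunnels without a real transport layer
    };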

About faking IP packets: we may not be able to avoid that---but we can
try. :slight_smile: The stream-based Analyzer interface doesn't need a packet,
just data chunks. We might be able to feed data in there directly.[1]
(That's why the method above is called NextStream(). :)
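
Sketched out, NextStream() might then not be much more than this; the
method names for getting at the fake connection's analyzer tree and
delivering a chunk are placeholders for whatever the analyzer API ends
up offering:

    // Rough idea only: TunnelConnection::NextStream() bypasses the packet
    // path entirely and pushes the already-reassembled chunk straight into
    // the fake connection's analyzer tree.
    void TunnelConnection::NextStream(int len, const u_char* data, bool is_orig)
        {
        // fake_conn's root analyzer would be whatever we set up to stand in
        // for the missing transport layer (see TRANSPORT_TUNNEL above).
        fake_conn->GetRootAnalyzer()->DeliverStream(len, data, is_orig);
        }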

Robin

[1] Without further work this would break signature matching though.

We create a new class TunnelConnection that encapsulates all the
messy stuff. Its interface could look something like this:

I think that makes a lot of sense.

one option would be to internally add a new TUNNEL
transport layer alongside the standard TCP/UDP/ICMP ones (drawback:
there are a number of places in the code that currently don't expect to
see any transport layer other than the current set).

I think we'll need to do this before too long for SCTP anyway, so becoming familiar with how painful it could be might not be such a bad thing in the long run.

The stream-based Analyzer interface doesn't need a packet,
just data chunks. We might be able to feed data in there directly.[1]
(That's why the method above is called NextStream(). :)

Hah! It's as if someone had been thinking about this eventuality from the beginning. :slight_smile:

[1] Without further work this would break signature matching though.

Oh, good point. We really need signatures so that DPD would work on the proxied data. Are you thinking that it would mostly break the TCP semantics of the signatures? I suspect that we'd be able to statically set some flags for "established" and "tcp".

Another approach to consider might be to back away from using ip-proto in signatures. If SCTP ever gains traction, it would greatly complicate many signatures that rely on a specific transport protocol. We could instead just indicate connection-oriented or packet-oriented signatures.

  .Seth

Hah! It's as if someone had been thinking about this eventuality from
the beginning. :slight_smile:

Who might that have been? :slight_smile:

Oh, good point. We really need signatures so that DPD would work on
the proxied data. Are you thinking that it would mostly break the TCP
semantics of the signatures?

The signature engine uses the initial packet of a connection to
initialize state. Can't tell off the top of my head if we can easily
get around that. In the worst case, we'd need to fake a packet just
for that.

Another approach to consider might be to back away from using ip-proto
in signatures. If SCTP ever gains traction, it would greatly
complicate many signatures that rely on a specific transport protocol.
We could instead just indicate connection-oriented or packet-oriented
signatures.

I'd prefer to avoid the latter, as it's not the signature that
determines whether matching is packet- or stream-oriented (but rather
the transport protocol in use itself). The ip-proto condition doesn't
do anything other than match the corresponding IP field, and using it
is primarily an optimization to avoid payload matching where possible.
So just skipping it is fine, I'd think.

Robin

I can't say that I like this idea of conflating tunneling and proxies, which IMO are two very different things. Sometimes abstracting and refactoring makes sense, especially when it gets us two things for less work. I don't think that is the case here, though. It seems to confuse things further and force us into uncomfortable hacks.

Also, I'm not seeing how this would work in the default case of someone tapping at their border, between the proxy and the target HTTP server. I don't think there is enough information there without access to the internal state of the proxy server. If you were tapping both internal and WAN traffic in different spots, as we do, you might have the information. However, I would rather do something in scriptland on the cluster to correlate connections from internal hosts to the proxy with connections from the proxy to external web servers.

If tunnels and proxies were more semantically similar, I think I would be more on board. But right now I think we should separate them and just work on handling AYIYA, 6to4, and Teredo tunnels well. In any case, I don't want to make 2.1 depend on solving these problems for proxies. They are a lower priority in my mind, and I think we want to avoid rushing how we handle proxies.

:Adam