Help regarding using Bro on an application-level byte stream

Hi,

I have a question regarding running Bro on an application-level TCP byte stream, and was wondering which implementation option to choose. Any help is much appreciated! Details below.

I have access to an application-level byte stream (e.g., an HTTP session consisting of HTTP PUT and GET requests) that I would like to run Bro on in an online fashion (I specifically plan to use its trace anonymization capabilities). I do not have access to the corresponding TCP byte stream / IP byte stream, but I do have the TCP state information required (source/dest addr, source/dest port). I am wondering how to have Bro process these packets. From reading the various docs I can think of the following options, but I am not sure whether I have missed anything.

  1. Cook up fake link-layer, TCP, and IP headers, and feed Bro via a FIFO.

  2. Use Broccoli to send really low-level events (events being “so and so bytes seen on so and so conn”). These events have to be low-level because I am trying to minimize any application-specific parsing before sending to Bro.

  3. Use the Bro source code directly, and somehow instantiate an analyzer directly on the byte stream. Any state needed (such as connection endpoints) has to be cooked up.

After reading the source code and various docs, I am leaning towards (3), since it won’t have the performance hit of a FIFO or Broccoli, but I am wondering whether the state is separable enough for me to do this.

Thanks in advance, and if anything is not clear, please let me know,

Jayanth

capabilities). I do not have access to the corresponding TCP byte stream /
IP byte stream, but I do have the TCP state information required
(source/dest addr, source/dest port). I am wondering how to have Bro process
these packets.

Uh, that's a tricky situation!

1. Cook up fake link-layer, TCP, and IP headers, and feed Bro via a FIFO.

That seems to be the easiest option for an implementation as you
wouldn't need to dive into Bro but could write the conversion
completely externally. Also, with tools like tcpdump etc. you could
quickly see if things look like they're supposed to. However, I'm
not sure I fully understand which format your input is in
exactly, so I'm not sure how easy it would be to turn it into fake
packets (e.g., is it already reassembled or still packetized?).

2. Use Broccoli to send really low-level events (events being "so and so
bytes seen on so and so conn").

Won't really work because Bro doesn't have any events which are that
low-level. All its events come out of the packet/payload
analysis; there aren't any which provide input for it. (You could add
some of your own to feed your data into Bro's protocol processing via
Broccoli, but that wouldn't be too different from faking packets as
in (1).)

3. Use the Bro source code directly, and somehow instantiate an analyzer
directly on the byte stream. Any state needed (such as connection endpoints)
has to be cooked up.

That's an interesting thought. I don't have an immediate opinion on
how difficult this would be. My guess is that you'd quickly
run into lots of subtle problems with lacking the state you need
to keep the analysis going, state which is hard to cook up. That said,
if you're game to dive into Bro's internals for such a solution, you
could just give it a try. However, I wouldn't spend too much time on
it if it turns out to get problematic (and again, a lot of this
depends on what *exactly* your input looks like).

One other thought: which applications are you interested in? If it's
only a few and there happen to be binpac analyzers for them, you
could write a standalone program feeding your data into these binpac
analyzers.

Final note: you mention that you want to rewrite the content. I'm
not very familiar with that part of Bro but I'm guessing it also has
quite a few dependencies on having packets as input.

Robin

Hi Robin,

Thanks for the quick reply!

Just for more context: what I have is an application-level byte stream in both directions which is already reassembled and sequenced. I would like to use the trace anonymization (by Ruoming Pang et al.), which strips out user-sensitive information from a given trace according to a user-provided script. I also need to do this in an online fashion.

  1. Cook up fake link-layer, TCP, and IP headers, and feed Bro via a FIFO.

That seems to be the easiest option for an implementation as you
wouldn’t need to dive into Bro but could write the conversion
completely externally. Also, with tools like tcpdump etc. you could
quickly see if things look like they’re supposed to. However, I’m
not sure I fully understand which format your input is in
exactly, so I’m not sure how easy it would be to turn it into fake
packets (e.g., is it already reassembled or still packetized?).

Well, actually cooking up the fake headers should be simple, since my data stream is already reassembled and only needs to be wrapped in the appropriate TCP and IP headers, along with some fake SYNs, SYN-ACKs, and FINs. I didn’t really like the idea of cooking up fake stuff, since I don’t really want Bro to do analysis on these fake headers. But, as you say, this is probably the simplest option for me.
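
To make the wrapping idea concrete, here is a minimal Python sketch of how it could look. It is only an illustration under assumptions: the function names, the MAC addresses, the initial sequence numbers, and the 1460-byte segment size are all made up, and it emits classic pcap records over Ethernet/IPv4 with a bare ACK after every data segment. Whether Bro is happy with exactly this sequence of packets is something I have not verified.

```python
import socket
import struct
import time

SYN, ACK, FIN, PSH = 0x02, 0x10, 0x01, 0x08

# Classic pcap global header: magic, version 2.4, zone, sigfigs,
# snaplen, link type 1 (Ethernet).
PCAP_HEADER = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)


def checksum(data):
    """Internet checksum (RFC 1071): one's complement of the 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fake_frame(src, dst, sport, dport, seq, ack, flags, payload=b""):
    """Wrap payload in made-up Ethernet, IPv4, and TCP headers."""
    s, d = socket.inet_aton(src), socket.inet_aton(dst)
    tcp_len = 20 + len(payload)
    # Build the TCP header with a zero checksum, then patch it in.
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                      5 << 4, flags, 65535, 0, 0)
    pseudo = s + d + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len)
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp + payload)) + tcp[18:]
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + tcp_len, 0, 0x4000,
                     64, socket.IPPROTO_TCP, 0, s, d)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    # Arbitrary locally administered MAC addresses, EtherType IPv4.
    eth = b"\x02\x00\x00\x00\x00\x02" + b"\x02\x00\x00\x00\x00\x01" + b"\x08\x00"
    return eth + ip + tcp + payload


def fake_session(client, server, chunks, mss=1460):
    """Yield frames for a fake handshake, the data, and a fake teardown.

    client and server are (ip, port) tuples; chunks is an iterable of
    (from_client, data) pairs in stream order.
    """
    seq = {True: 1000, False: 5000}  # arbitrary initial sequence numbers
    ends = {True: (client, server), False: (server, client)}

    def pkt(from_client, flags, payload=b""):
        (sip, sp), (dip, dp) = ends[from_client]
        ack = seq[not from_client] if flags & ACK else 0
        frame = fake_frame(sip, dip, sp, dp, seq[from_client], ack, flags, payload)
        # SYN and FIN each consume one sequence number.
        used = len(payload) + (1 if flags & (SYN | FIN) else 0)
        seq[from_client] = (seq[from_client] + used) & 0xFFFFFFFF
        return frame

    yield pkt(True, SYN)
    yield pkt(False, SYN | ACK)
    yield pkt(True, ACK)
    for from_client, data in chunks:
        for i in range(0, len(data), mss):
            yield pkt(from_client, PSH | ACK, data[i:i + mss])
            yield pkt(not from_client, ACK)  # bare ACK from the receiver
    yield pkt(True, FIN | ACK)
    yield pkt(False, FIN | ACK)
    yield pkt(True, ACK)


def pcap_record(frame, ts):
    """Prefix a frame with a pcap record header (seconds, microseconds, lengths)."""
    return struct.pack("<IIII", int(ts), int(ts % 1 * 1_000_000),
                       len(frame), len(frame)) + frame


def write_pcap(out, frames):
    """Write frames to a binary file object, flushing after each one."""
    out.write(PCAP_HEADER)
    for frame in frames:
        out.write(pcap_record(frame, time.time()))
        out.flush()
```

For the online case, `out` would be a FIFO opened for binary writing (created beforehand with mkfifo), with Bro reading from the other end; for a first check, writing to a regular file and looking at it with tcpdump should show whether the handshake and sequence numbers look sane.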

  2. Use Broccoli to send really low-level events (events being “so and so
    bytes seen on so and so conn”).

Won’t really work because Bro doesn’t have any events which are that
low-level. All its events come out of the packet/payload
analysis; there aren’t any which provide input for it. (You could add
some of your own to feed your data into Bro’s protocol processing via
Broccoli, but that wouldn’t be too different from faking packets as
in (1).)

Oh, I see.

  3. Use the Bro source code directly, and somehow instantiate an analyzer
    directly on the byte stream. Any state needed (such as connection endpoints)
    has to be cooked up.

That’s an interesting thought. I don’t have an immediate opinion on
how difficult this would be. My guess is that you’d quickly
run into lots of subtle problems with lacking the state you need
to keep the analysis going, state which is hard to cook up. That said,
if you’re game to dive into Bro’s internals for such a solution, you
could just give it a try. However, I wouldn’t spend too much time on
it if it turns out to get problematic (and again, a lot of this
depends on what exactly your input looks like).

Oh, I see. I have been nosing around the source code to figure this out, and the new DPD framework seems fairly subtle to get right. As you say, I will probably keep at this a while longer and then fall back to the fake header option.

One other thought: which applications are you interested in? If it’s
only a few and there happen to be binpac analyzers for them, you
could write a standalone program feeding your data into these binpac
analyzers.

Well, I would like it to be as general as possible (the application-level stream comes from a decrypted SSL connection, which may be in use by any application), which is why I thought of leveraging Bro’s broad protocol support rather than binpac alone. Also, the anonymization script (by Pang et al.) relies on Bro’s event processing, so again, I need to run the trace through Bro to get those events.

Final note: you mention that you want to rewrite the content. I’m
not very familiar with that part of Bro but I’m guessing it also has
quite a few dependencies on having packets as input.

Yes, Pang’s scripts maintain a lot of application-level state in doing the anonymization, which is why I need to run them through Bro.

Once again, thanks for the quick reply.

Thanks,
Jayanth