handle out of order and retransmitted packets in offline trace

Hello, All

I am trying to use the policy script http-rewriter.bro in Bro-1.5.1 to anonymize the HTTP message bodies of all HTTP packets in a large dumped trace (more than 100GB). As I understand it, http-rewriter.bro deletes every HTTP message body and adds a new header field named X-Actual-Data-Length, right?

I am not sure whether Bro itself and http-rewriter.bro are able to reorder all TCP packets and delete retransmitted TCP packets in every connection of the dumped trace.

If they cannot, could I first reorder the packets and delete the retransmitted ones in every connection using some other tool, and then run http-rewriter.bro? Is this approach reasonable? Which tools would you suggest?

Besides, I want to test whether special HTTP packets exist. By a special packet I mean one that contains more than one HTTP construct (headers + message body). When I ran http-rewriter.bro on several special packets I created, it seemed to delete the message bodies correctly in almost all cases, as long as the packets in the connection were in order and complete. Does http-rewriter.bro handle these special cases correctly, as my tests suggest?

I look forward to your answer. Thank you very much.

Song Zhao

Hi,

I forgot the details, but here is the basic idea. The rewritten packets will not reproduce the original TCP segment ordering and retransmissions; however, the timestamps are preserved by creating one output packet for every input packet timestamp. So if you remove a big chunk of body, you will see a bunch of empty packets (which compress quite well).

Ruoming

Hi,

Thanks for your explanation. But I am still a little confused. Can http-rewriter.bro rewrite all HTTP packets in a TCP connection that contains out-of-order and retransmitted packets?

Song Zhao

http-rewriter.bro sits above the TCP layer and does not see TCP retransmissions or out-of-order packets.

Can Bro itself differentiate these retransmitted and out-of-order packets? If so, does http-rewriter.bro use that mechanism of Bro's?

Besides, can http-rewriter.bro handle a special HTTP packet that, for example, includes two or more requests or responses, or even one and a half requests or responses?

Can Bro itself differentiate these retransmitted and out-of-order packets?

It's not clear what you mean by differentiate. Bro reassembles the
TCP bytestream, correctly accounting for retransmitted and out-of-order
packets.

Besides, can http-rewriter.bro handle a special HTTP packet that, for
example, includes two or more requests or responses, or even one and a half
requests or responses?

Per Ruoming's earlier comment, http-rewriter.bro does *not* operate on
individual packets, it operates on the reassembled bytestream. It then
constructs new packets from that bytestream. The timing of these packets
reflects the timing of the original packets, but the *sequencing* of the
packets does not.

    Vern

Is that function of reassembling the TCP bytestream embedded in the event engine and enabled by default when using http-rewriter.bro, or is there a policy script we need to load to sort out the TCP packets? Thanks.

Is that function of reassembling TCP bytestream embedded in event engine and
enabled by default when using http-rewriter.bro

It's fundamental to how the event engine works.

    Vern

Hi, all

Sorry to bother you guys again. I still have some very basic questions about Bro and http-rewriter.bro.

  1. Is the following the right command for running http-rewriter.bro on a captured offline trace?
    ./bro -r 'the name of the trace file we want to process' http-rewriter.bro -w 'the name of the trace file where we want to write the resulting packets'

  2. If so, will this command invoke the event engine to reassemble the TCP bytestream (reordering out-of-order packets and deleting retransmitted packets) in the captured trace, and will the event engine then pass the reassembled bytestream up to the level where http-rewriter.bro can rewrite it?

  3. Can http-rewriter.bro and the event engine correctly handle a big trace (about 400GB) that was merged from several traces?

I look forward to your answer. Thank you very much.

Song

One more question:

  4. If I use the command from question 1, which packets would be filtered? Only TCP packets, right? If so, on which ports?
    According to the code in http.bro, the global http_ports are 80, 81, 631, 1080, 3138, 8000, 8080 and 8888.
    However, when I checked the big trace rewritten by the command from question 1, the majority of packets were using port 20480. Is port 20480 also an HTTP port? Besides, a small portion still have port numbers different from all of the above. So I am confused about the filtering done by http-rewriter.bro.

Thanks for your help.

Song Zhao

1. Is the following the right command for running http-rewriter.bro on a
captured offline trace?
     ./bro -r 'the name of the trace file we want to process' http-rewriter.bro
-w 'the name of the trace file where we want to write the resulting packets'

It's -A, not -w.

2. If so, will this command invoke the event engine to reassemble the TCP
bytestream (reordering out-of-order packets and deleting retransmitted
packets) in the captured trace, and will the event engine then pass the
reassembled bytestream up to the level where http-rewriter.bro can
rewrite it?

Yep.

3. Can http-rewriter.bro and the event engine correctly handle a big
trace (about 400GB) that was merged from several traces?

It should be able to, though that code hasn't been stressed all that
much and might wind up having a memory leak (or simply memory that
doesn't get reclaimed), which could cause it to blow up on a really
big input.

4. If I use the command from question 1, which packets would be
filtered? Only TCP packets, right? If so, on which ports?

http-rewriter loads http-reply.bro, which specifies the filter as:

  tcp src port 80 or tcp src port 8080 or tcp src port 8000

According to the code in http.bro, the global http_ports are
80, 81, 631, 1080, 3138, 8000, 8080 and 8888.

Note that this list is used only if you turn on DPD.

However, when I checked the big trace rewritten by the command from question 1,
the majority of packets were using port 20480. Is port 20480 also an HTTP port?

Well, other than 80, none of them is a standardized HTTP port. But you
can add 20480 to the list in http-reply.bro to ensure it's captured.

Besides, a small portion still have port numbers different from all of
the above. So I am confused about the filtering done by http-rewriter.bro.

Then in principle you should use DPD. However, I don't know whether
it's integrated with the rewriting framework.

    Vern

  1. Is the following the right command for running http-rewriter.bro on a
    captured offline trace?
    ./bro -r 'the name of the trace file we want to process' http-rewriter.bro
    -w 'the name of the trace file where we want to write the resulting packets'

It’s -A, not -w.

Is there any difference between -A and -w when using http-rewriter.bro? I just used -A to rewrite some examples, and the resulting files seem to be the same as those produced with -w.

According to the code in http.bro, the global http_ports are
80, 81, 631, 1080, 3138, 8000, 8080 and 8888.

Note that this list is used only if you turn on DPD.

Besides, a small portion still have port numbers different from all of
the above. So I am confused about the filtering done by http-rewriter.bro.

Then in principle you should use DPD. However, I don’t know whether
it’s integrated with the rewriting framework.

The command I used is just "./bro -r readfile http-rewriter.bro -w writerfile". I don't know whether DPD is turned on. Actually, http.bro is loaded by http-request.bro, which in turn is loaded by http-reply.bro. In http.bro, I think the following code is about DPD:

    # DPM configuration.
    global http_ports = {
        80/tcp, 81/tcp, 631/tcp, 1080/tcp, 3138/tcp,
        8000/tcp, 8080/tcp, 8888/tcp,
    };
    redef dpd_config += { [ANALYZER_HTTP] = [$ports = http_ports] };
    redef dpd_config += { [ANALYZER_HTTP_BINPAC] = [$ports = http_ports] };

Does this mean DPD is integrated with the rewriting framework? And is that the reason why most of the rewritten trace I got is from port 20480, with some from ports other than 80, 8000 and 8080?

Thanks a lot.

Song

The command I used is just "./bro -r readfile http-rewriter.bro -w writerfile".

I'm not sure whether it still matters, but you used to need to specify all options before the arguments, so try:

./bro -r readfile -A writerfile http-rewriter.bro

Hi, Ruoming

I also tried ./bro -r readfile -A writerfile http-rewriter.bro, and the results seem to be the same as those of ./bro -r readfile http-rewriter.bro -A writerfile. Also, is there any difference in the resulting trace between using -A and -w with http-rewriter.bro? I tried some examples and the results seem the same.

Does http-rewriter.bro use DPD by default to find HTTP streams instead of port numbers?
After rewriting a big trace that consists of all kinds of streams (TCP and UDP) with http-rewriter.bro, the ports in the resulting trace vary widely, including 80, 8000, 8080, 631, 1080 and so forth. Interestingly, the majority of them are port 20480. Is that because DPD is in use?

Thanks.

Song Zhao

I also tried ./bro -r readfile -A writerfile http-rewriter.bro, and the
results seem to be the same as those of ./bro -r readfile http-rewriter.bro
-A writerfile. Also, is there any difference in the resulting trace between
using -A and -w with http-rewriter.bro?

If you specify both, then you get the untransformed trace in the -w file
and the transformed one in -A. If you specify just one, then that's the
transformed file.

Does http-rewriter.bro use DPD by default to find HTTP streams instead of
port numbers?

I don't know. But you can avoid this question by just wiring in the
ports of interest into the initialization of capture_filters in http-reply.bro.

Interestingly, majority of
them are port 20480.

Note that 20480 is 80 read as little-endian. This suggests a bug either in how
you're viewing the port numbers, or in how Bro is displaying (or possibly
processing) them.

    Vern

Hi,

In the 12G rewritten trace, the port numbers vary widely. http-rewriter.bro loads http-reply.bro, which loads http-request.bro, which loads http.bro. The filtering code in these policy scripts is as follows:

In http-request.bro:

    redef capture_filters += {
        ["http-request"] = "tcp dst port 80 or tcp dst port 8080 or tcp dst port 8000"
    };

In http-reply.bro:

    redef capture_filters += {
        ["http-reply"] = "tcp src port 80 or tcp src port 8080 or tcp src port 8000"
    };

In http.bro:

    # DPM configuration.
    global http_ports = {
        80/tcp, 81/tcp, 631/tcp, 1080/tcp, 3138/tcp,
        8000/tcp, 8080/tcp, 8888/tcp,
    };
    redef dpd_config += { [ANALYZER_HTTP] = [$ports = http_ports] };
    redef dpd_config += { [ANALYZER_HTTP_BINPAC] = [$ports = http_ports] };

Does any of them turn DPD on? If not, why do the port numbers in the rewritten trace vary so widely, much more widely than the range of the global http_ports?
I didn't load dpd.bro anywhere. After roughly checking the payloads, as far as I can tell, they all contain HTTP requests or responses. I mean they really are “HTTP streams”, whatever the port number is.

Thanks
Song Zhao

Hi, all

I found that the reason the majority of port numbers in the rewritten trace are 20480 instead of 80 is that, in <netinet/tcp.h>, the fields for the source and destination ports (th_sport and th_dport) don't store the real port numbers as I expected. It stores port 80 as 20480, and it stores other port numbers differently from what they are supposed to be as well. Does anyone know the reason? Is it a kind of one-to-one mapping? Or did I make a mistake in using it?

Thanks
Song

It stores
port 80 as 20480, and it stores other port numbers differently from what they
are supposed to be as well. Does anyone know the reason? Is it a kind of one-to-one mapping?

As I already told you via private email, you are looking at the little-endian
version of 80 rather than the big-endian.

    Vern