Data truncated in signature_match and misc questions

Hi all,

   I'm playing a little bit with Bro and I ran into some issues and I
   don't know whether these are either bugs or things I don't do in the
   proper way. Maybe you guys can help me out :-)

   1. Basically, I'm trying to do something (apparently :-)) very
      simple: matching any stream that carries a sequence of bytes
      of length X. For simplicity, let's say that I just want to match
      any stream which contains at least AAAA.
      
      Stream reassembly is very important for me, but I suppose Bro
      takes care of it when matching against signatures.

      I'm aware the data argument passed to the signature_match event
      handler should contain the part of the data that matched... and
      that's where things got weird (I would have preferred to leverage
      signature_match events rather than digging into the
      policy).

      Consider this signature:

         signature test-AAAA
         {
            event "sig-AAAA"
            payload /.*AAAA/
         }

      and this policy file for the signature_match event handler:

         @load signatures
         #@load print-filter

         event signature_match(state: signature_state, msg: string, data: string)
         {

            print fmt("[+] signature_match(%s) called", msg);
            print fmt("payload length: %d", byte_len(data));
            print fmt("payload (first 400 bytes): %s", sub_bytes(data, 0, 400));
         }

      The output I got is the following:

      [+] signature_match(sig-AAAA) called
      payload length: 153
      payload (first 400 bytes): HTTP/1.1 404 Not Found^M^JDate: Wed, 28
      May 2008 22:07:28 GMT^M^JServer: Apache^M^JContent-Length:
      270^M^JConnection: close^M^JContent-Type: text/html...

      I don't see any AAAA in there... even if that's the payload which
      triggered the signature of course (as shown by tcpdump as well -- not
      included here).

      The point is that I'd like to extract any matching pattern from the
      payload which triggered the signature. Once the pattern is extracted
      I'd have to iterate over each element of the string and do something.

      This was a dead end to me (but I'm surely missing some point,
      tho).

      I also tried with a payload of /.*A{4}/ and /.*[A]{4}/ as I wanted
      to check whether the metacharacters {} worked properly or not. It
      turned out they are ok here (signatures) but they don't work, for
      instance, with gsub.

   2. Does tcp_contents reassemble flows (I don't think so)? I'd use
      tcp_contents right away, but I just want to be sure I don't get
      a split matching payload (e.g., AA in one TCP segment and the
      next AA in the second one). That's why I wanted to go with the
      signature thing as this should be automatically taken care of by
      Bro. If the signature approach doesn't work out, tho, I'll have to
      reassemble packets myself, but that seems like a waste of
      time as Bro surely does it (or not?).

   3. I'm not able to see packets that are generated by the same host
      Bro is running on. Is this a normal behavior (performance tuning)?
      If so, is there a way to disable it just for testing purposes?

      I double-checked that the filters were right, of course :-). I ran
      Bro with -f 'tcp' (I'm not concerned about UDP right now, even tho
      I'll consider it later on). Also, I played with capture_filters
      and restrict_filters variables either by refining or redefining
      them.

      Just to be sure I loaded print-filter to re-check the capture
      filter was indeed the one I intended. It was (tcp). Still, I'm
      not able to get traffic that's sent by the host Bro is
      running on (I have a very basic configuration: only one interface,
      eth0, and localnets is set properly with just one local net addr,
      having just one physical net device).

   4. Regex behaves oddly. It seems that the {} notation, especially when
      used in conjunction with [^], sometimes works and sometimes doesn't. For
      instance, it doesn't work with gsub (if I didn't screw anything
      up, of course). Any ideas? For instance, something like:

         local tmp = gsub(payload, /[^A]{4}/, " ");

      doesn't work while the {} metachars worked for signature matching.

   I know, lots of questions :-)

TIA, bye
Lorenzo

A quick comment: when sending along questions/puzzles like these, it
*really* helps to include either full working scripts and/or traces that
you used to demonstrate the problems. Otherwise we wind up just guessing
what might be going on if a similar test case works fine for us.

    Vern

(I think I lost track which of these are already solved (if any) so
I'll just respond briefly. Please ask again for the points which
aren't yet clear. Also, when having several independent questions,
it's usually easier to keep track if you mail them to list
separately.)

      Stream reassembly is very important for me, but I suppose Bro
      takes care of it when matching against signatures.

That's right.

      that's where things got weird (I would have preferred to leverage
      signature_match events rather than digging into the
      policy).

(Not sure what you're referring to here.)

      [+] signature_match(sig-AAAA) called
      payload length: 153
      payload (first 400 bytes): HTTP/1.1 404 Not Found^M^JDate: Wed, 28
      May 2008 22:07:28 GMT^M^JServer: Apache^M^JContent-Length:
      270^M^JConnection: close^M^JContent-Type: text/html...

      I don't see any AAAA in there... even if that's the payload which

Where in the packet stream is the AAAA (according to tcpdump)?

Generally, the data passed to the signature_match event is the last
chunk of bytes which triggered the signature match. This might
include only parts of the text which matches the pattern and in some
cases even none at all (when other conditions are involved as well;
I wouldn't expect this with your signature).
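
To illustrate how the final chunk can hold only part of the matching
text, here is a minimal Python sketch. It is not Bro's matcher: the
function name chunk_completing_run and the two hand-made segments are
assumptions for illustration, and it only tracks a run of one repeated
byte, carrying the run length across chunks the way a streaming matcher
carries its state.

```python
def chunk_completing_run(chunks, byte=ord("A"), run_needed=4):
    """Return (index, chunk) for the chunk in which a run of `run_needed`
    consecutive `byte` values is first completed, or None if it never is."""
    run = 0  # current run length; deliberately NOT reset between chunks
    for index, chunk in enumerate(chunks):
        for b in chunk:
            run = run + 1 if b == byte else 0
            if run == run_needed:
                return index, chunk
    return None


# Hypothetical segmentation of the request: the AAAA straddles two segments.
segments = [b"GET /AA", b"AA HTTP/1.0\r\n\r\n"]
index, last_chunk = chunk_completing_run(segments)

print(index)                  # index of the chunk that completed the match
print(b"AAAA" in last_chunk)  # whether that chunk alone contains AAAA
```

Here the run of four A's completes in the second segment, which by
itself contains only AA, so a handler that is given just that chunk
never sees the full pattern.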

As Vern wrote, it's easiest to track down the specifics of what
you're seeing if you could send a small trace file along with the
signature.

      The point is that I'd like to extract any matching pattern from the
      payload which triggered the signature. Once the pattern is extracted
      I'd have to iterate over each element of the string and do something.

That's not really possible without building further script-level
infrastructure yourself. The problem here is that Bro does not
buffer the connection's payload internally so when you get a
signature match, you don't have access to any earlier data. You'd
need to do this buffering yourself but it depends on the specifics
of your application whether that is feasible.
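
A rough sketch of what such script-level buffering could look like,
written in Python rather than Bro script so it can be run standalone.
The class StreamScanner and the connection ids are hypothetical. The
idea is just to keep the last len(needle)-1 bytes per connection so a
fixed pattern split across chunks is still found. Extracting the full
match of an open-ended pattern such as A{4,} would need more state than
this.

```python
class StreamScanner:
    """Find a fixed byte pattern in per-connection streams delivered in chunks."""

    def __init__(self, needle: bytes):
        self.needle = needle
        self.tails = {}  # connection id -> trailing bytes kept from earlier chunks

    def feed(self, conn_id, chunk: bytes) -> bool:
        """Feed the next in-order chunk; return True if the needle is now visible."""
        window = self.tails.get(conn_id, b"") + chunk
        found = self.needle in window
        keep = len(self.needle) - 1
        # Keep only as many bytes as could still start a match in the next chunk.
        self.tails[conn_id] = window[-keep:] if keep else b""
        return found


scanner = StreamScanner(b"AAAA")
first = scanner.feed("conn-1", b"GET /AA")       # only half of the pattern so far
second = scanner.feed("conn-1", b"AA HTTP/1.0")  # pattern completed across chunks
other = scanner.feed("conn-2", b"AA")            # a different connection
print(first, second, other)
```

Memory stays bounded at len(needle)-1 bytes per connection, which is
the trade-off against buffering the whole payload.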

      to check whether the metacharacters {} worked properly or not. It
      turned out they are ok here (signatures) but they don't work, for
      instance, with gsub.

Again, a trace and sample script would be good which demonstrates
the gsub problem.

   2. Does tcp_contents reassemble flows (I don't think so)? I'd use

It does (though see the options dpd_* in policy/bro.init for
specifics of when Bro reassembles streams).

   3. I'm not able to see packets that are generated by the same host
      Bro is running on. Is this a normal behavior (performance tuning)?
      If so, is there a way to disable it just for testing purposes?

That's an OS issue. Iirc, you indeed don't see packets generated
locally on some OSs, though I don't remember the details here. You
can check with tcpdump whether libpcap applications like Bro see the
packets.

Robin

Hi Robin,

(I think I lost track which of these are already solved (if any) so
I'll just respond briefly. Please ask again for the points which
aren't yet clear. Also, when having several independent questions,
it's usually easier to keep track if you mail them to list
separately.)

   Yes, you are right... that would have been much better, my bad :-\
   I suppose it's too late for this "thread", tho (If not, let me know
   and I'll split the questions in different emails ;-))

   This is again a long email but I'm trying to give as much information
   as possible to allow you to reproduce what I got...

> Stream reassembly is very important for me, but I suppose Bro
> takes care of it when matching against signatures.

That's right.

> that's where things got weird (I would have preferred to leverage
> signature_match events rather than digging into the
> policy).

(Not sure what you're referring to here.)

   I meant I would have preferred to go for a signature-matching
   approach that reassembles streams for me, rather than a solution
   which requires me to keep track of streams myself (with tcp_contents,
   for instance). However, as I read at the end of this email, maybe
   tcp_contents would do as well without requiring any extra work...

> [+] signature_match(sig-AAAA) called
> payload length: 153
> payload (first 400 bytes): HTTP/1.1 404 Not Found^M^JDate: Wed, 28
> May 2008 22:07:28 GMT^M^JServer: Apache^M^JContent-Length:
> 270^M^JConnection: close^M^JContent-Type: text/html...
>
> I don't see any AAAA in there... even if that's the payload which

Where in the packet stream is the AAAA (according to tcpdump)?

Generally, the data passed to the signature_match event is the last
chunk of bytes which triggered the signature match. This might
include only parts of the text which matches the pattern and in some
cases even none at all (when other conditions are involved as well;
I wouldn't expect this with your signature).

As Vern wrote, it's easiest to track down the specifics of what
you're seeing if you could send a small trace file along with the
signature.

   Here's a summary:

      + HTTP request (I know, nc would have probably been cleaner,
         telnet output omitted)

         $ telnet security.dico.unimi.it 80
         GET /AAAA HTTP/1.0
         <nl>
         <nl>
         
      + packet trace, herein attached (trace.out) has been obtained as

         # tcpdump -s 1500 -w trace.out -ni eth0 tcp and host security.dico.unimi.it

      + the trace contains 13 TCP segments and the pattern AAAA appears
         in the 4th,5th segments (request) and in the 11th (response).

      + the Bro signature is stored in test.sig and (as in the previous
         email), looks like the following:

            signature test-AAAA
            {
               event "sig-AAAA"
               payload /.*A{4}/
            }

         pls, note that I also tried the pattern /.*AAAA/ -- The
         reason for this is given further on, where I provide an example
         of a regex pattern which doesn't work as expected with a
         Bro built-in function (e.g., gsub).

      + Bro (v1.2.1) has been launched as:

            # bro -r trace.out -s test.sig hostname

         where hostname is the file which loads all the relevant Bro
         policy scripts. It loads test-sig.bro as well which implements
         the signature_match event handler:

            @load alarm
            @load conn
            @load adu
            @load signatures
            #@load print-filter

            redef capture_filters = { };
            redef restrict_filters = { ["idea"] = "host security.dico.unimi.it" };
            redef restrict_filters += { ["tcp"] = "tcp" };

            event signature_match(state: signature_state, msg: string, data: string)
            {

               print fmt("[+] signature_match(%s) called", msg);
               print fmt("[+] payload length: %d", byte_len(data));
               print fmt("[+] payload (retr w/ sub_bytes(data, 0, 400)):\n%s", sub_bytes(data, 0, 400));
            }

      + Bro gives the following output:

         [+] signature_match(sig-AAAA) called
         [+] payload length: 148
         [+] payload (retr w/ sub_bytes(data, 0, 400)):
         HTTP/1.1 404 Not Found^M^JDate: Fri, 30 May 2008 22:20:38 GMT^M^JServer: Apache/1.3.34 (Debian) PHP/4.4.4-8+etch4 mod_ssl/2.8.25 OpenSSL/0.9.8c^M...

   So, at the end:

      + The data returned by the matched signature doesn't contain the
         signature itself.

      + The signature is not triggered for the request I made. The
         request has been issued by the host Bro is running on. Of
         course, I'm able to get and see everything with tcpdump and
         wireshark. Just for completeness, I'm running a GNU/Debian
         Linux testing system, libpcap version 0.9.8-3 (debian package),
         tcpdump version 3.9.8 (debian package), wireshark version 1.0.0
         (debian package).

> The point is that I'd like to extract any matching pattern from the
> payload which triggered the signature. Once the pattern is extracted
> I'd have to iterate over each element of the string and do something.

That's not really possible without building further script-level
infrastructure yourself. The problem here is that Bro does not
buffer the connection's payload internally so when you get a
signature match, you don't have access to any earlier data. You'd
need to do this buffering yourself but it depends on the specifics
of your application whether that is feasible.

   I know that I cannot automatically extract a pattern by using the
   signature-matching approach; I have to do it on my own by writing
   a Bro policy script. I did that and it works. It means that I'm able
   to extract the pattern I want from a given payload Bro passes me. Of
   course, if Bro doesn't give me this pattern, no extraction is
   possible (see previous points).

> to check whether the metacharacters {} worked properly or not. It
> turned out they are ok here (signatures) but they don't work, for
> instance, with gsub.

Again, a trace and sample script would be good which demonstrates
the gsub problem.

   Same scenario as above. Signature mechanism turned off. I'm loading
   the following Bro policy script which uses ADU (note: if I got it
   right, ADU should work only with HTTP-based communication. Although
   this is not my final goal -- and I'll go with tcp_contents or with
   the signature approach -- the example has been tested on an
   HTTP-based communication). Script follows:

      @load adu
      #@load print-filter

      redef tcp_content_deliver_all_orig = T;
      redef tcp_content_deliver_all_resp = T;
      redef adu::adu_max_size = 20000;

      redef capture_filters = { };
      redef restrict_filters = { ["idea"] = "host security.dico.unimi.it" };
      
      function do_handle_adu(ename: string, c: connection, a: adu::adu_state)
      {

         local candidate = /(.|\n|\r)*A{4,}/;

         if ( |a$adu| > 0 ) {

            local substrtmp: string;
            local tmp: string;

            if ( candidate in a$adu ) {

               print fmt("[+] %s: candidate in a$adu", ename);
               print fmt("[+] orig payload:\n%s", a$adu);

               tmp = gsub(a$adu, /[^A]{4,}/, " ");
               print fmt("[+] tmp ([^A]{4,} regex): %s\n", tmp);

               tmp = gsub(a$adu, /[^A]+/, " ");
               print fmt("[+] tmp: ([^A]+ regex) %s\n", tmp);

               tmp = gsub(a$adu, /[^AAAA]/, " ");
               print fmt("[+] tmp: ([^AAAA] regex) %s\n", tmp);
            }
         }
      }

      event adu_tx(c: connection, a: adu::adu_state)
      {

         do_handle_adu("adu_tx", c, a);
      }

      event adu_rx(c: connection, a: adu::adu_state)
      {

         do_handle_adu("adu_rx", c, a);
      }

   The output is attached in the file test-adu.output. While I do
   understand the last output ([^AAAA] regex) and the second one
   ([^A]+), I don't quite understand the first one ([^A]{4,}) as I would
   expect everything but the four consecutive A to be substituted with
   spaces. I might have messed up with regex, tho...
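
For comparison, this is how the three patterns behave under
conventional regex semantics, using Python's re module as a stand-in.
Bro's gsub may well behave differently, which is the open question
here, and the sample string is made up. Under these semantics [^A]{4,}
only replaces runs of at least four non-A characters, so shorter runs
are left alone rather than blanked.

```python
import re

# Made-up payload: non-A runs of length 2, 5 and 2 around the A's.
payload = "xxAAAAyyyyyAzz"

# Runs of 4 or more non-A characters collapse to one space; the
# 2-character runs "xx" and "zz" are too short to match and stay as-is.
result_min4 = re.sub(r"[^A]{4,}", " ", payload)

# Every run of non-A characters, whatever its length, collapses to one space.
result_plus = re.sub(r"[^A]+", " ", payload)

# A character class ignores repeats, so [^AAAA] is the same as [^A]:
# each single non-A character becomes its own space.
result_class = re.sub(r"[^AAAA]", " ", payload)

print(repr(result_min4))   # 'xxAAAA Azz'
print(repr(result_plus))   # ' AAAA A '
print(repr(result_class))
```

If everything except the A's should be blanked, [^A]+ (one space per
run) or [^A] (one space per character) is the pattern to use under
these semantics; {4,} sets a minimum run length, not a target.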

> 2. Does tcp_contents reassemble flows (I don't think so)? I'd use

It does (though see the options dpd_* in policy/bro.init for
specifics of when Bro reassembles streams).

   Cool, thanks. So, would handling tcp_contents be enough (given the
   right dpd_* tuning) if I'm just interested in looking for any sequence
   of, say, AAAA in a TCP stream?

> 3. I'm not able to see packets that are generated by the same host
> Bro is running on. Is this a normal behavior (performance tuning)?
> If so, is there a way to disable it just for testing purposes?

That's an OS issue. Iirc, you indeed don't see packets generated
locally on some OSs, though I don't remember the details here. You
can check with tcpdump whether libpcap applications like Bro see the
packets.

   I have no issues at all when using other libpcap-based applications on
   my system (GNU/Debian testing Linux -- details about versions are
   given above).

TIA, bye
Lorenzo

trace.out (1.66 KB)

test-adu.output (1.24 KB)