new IPv6 code

In topic/v6-addr, I finished switching Bro to enable IPv6 support by default and changed the internal address representation to use classes. I did some initial rough benchmarking to compare it against master and against master with --enable-brov6. Tests were run on a 16GB pcap (no IPv6 traffic) with local.bro loaded.

master
    120574kb vsize, 207.65s user, 20.36s system

master --enable-brov6
    126458kb vsize, 307.38s user, 20.87s system
    +5% mem, +44% cpu

topic/v6-addr
    121142kb vsize, 309.33s user, 20.62s system
    +0.5% mem, +45% cpu

About 65% of the time difference in either of the IPv6-supporting configurations looks to be due to the size of Conn::Key exceeding UHASH_KEY_SIZE, so creating a HashKey of the data falls back on the slower HMAC/MD5 path, which happens for every packet.
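
To illustrate what I mean, this is roughly the shape of the fallback (the names and the size constant below are made up for the sketch, not Bro's actual HashKey internals):

    #include <stddef.h>
    #include <stdint.h>

    static const size_t kMaxFastKeySize = 36;   // stand-in for UHASH_KEY_SIZE

    // Toy stand-in for the fast universal hash (H3 in the real code).
    static uint32_t fast_universal_hash(const uint8_t* p, size_t n)
        {
        uint32_t h = 2166136261u;
        for ( size_t i = 0; i < n; ++i )
            h = (h ^ p[i]) * 16777619u;
        return h;
        }

    // Placeholder for the expensive keyed-digest path (HMAC/MD5 in the real
    // code); the point is only that it runs over the whole key every packet.
    static uint32_t slow_keyed_digest(const uint8_t* p, size_t n)
        {
        uint32_t h = 0;
        for ( size_t i = 0; i < n; ++i )
            h = h * 31 + p[i];
        return h;
        }

    uint32_t hash_key_bytes(const uint8_t* p, size_t n)
        {
        // A Conn::Key carrying two 16-byte IPv6 addresses no longer fits,
        // so it takes the slow branch on every packet.
        return n <= kMaxFastKeySize ? fast_universal_hash(p, n)
                                    : slow_keyed_digest(p, n);
        }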

The negligible memory usage difference surprised me since it differs from the previous results here: http://tracker.bro-ids.org/bro/ticket/68. One possibility is that the scripts have changed so much since those benchmarks were run that there's less state being tracked that involves addrs. Or the pcap I used may simply not have been as representative.

Thoughts on how to improve the hashing or other plans to investigate memory usage?

And does it seem like a good idea to merge topic/v6-addr into master at this point so that we get more users exercising that code and so we can start working in separate branches for more IPv6 enhancements? All the unit tests pass, but the external baselines need another set of eyes to sanity check them.

+Jon

topic/v6-addr
   121142kb vsize, 309.33s user, 20.62s system
   +0.5% mem, +45% cpu

The memory looks awesome, but that CPU increase really bothers me.

About 65% of the time difference in either of the IPv6-supporting configurations looks to be due to the size of Conn::Key exceeding UHASH_KEY_SIZE, so creating a HashKey of the data falls back on the slower HMAC/MD5 path, which happens for every packet.

Can we increase the size of the H3 hash keys? I don't know enough about it to judge the ramifications or whether it's even feasible (though from a quick read of h3.h it seems like it should be possible).
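
For what it's worth, my rough understanding of how an H3-style hasher is structured (sketch only, not the actual h3.h) is that the key size is just a template parameter, so widening it mostly means a bigger precomputed table and a few more XORs per key:

    #include <stdlib.h>
    #include <stdint.h>
    #include <stddef.h>

    // Sketch of an H3-style universal hash: T is the hash value type, N the
    // key size in bytes. Each key byte indexes a row of precomputed random
    // values, which get XORed together. Supporting a larger key would mean
    // a larger table (N x 256 entries of T) and more XORs per hash, but no
    // algorithmic obstacle that I can see.
    template <class T, int N>
    class H3Sketch {
    public:
        H3Sketch()
            {
            for ( int i = 0; i < N; ++i )
                for ( int j = 0; j < 256; ++j )
                    table[i][j] = (T) rand() ^ ((T) rand() << 16);
            }

        T operator()(const void* data, size_t size) const
            {
            const uint8_t* p = (const uint8_t*) data;
            T h = 0;
            for ( size_t i = 0; i < size && i < (size_t) N; ++i )
                h ^= table[i][p[i]];
            return h;
            }

    private:
        T table[N][256];
    };

So bumping N to whatever Conn::Key needs now would "just" cost a somewhat larger table, if I'm reading it right.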

One possibility is that the scripts have changed so much since those benchmarks were run that there's less state being tracked that involves addrs.

I suspect that's the case.

And does it seem like a good idea to merge topic/v6-addr into master

I'd like to get it in there asap as long as there aren't any huge problems (the cpu utilization does still bother me a bit though).

  .Seth

The negligible memory usage difference surprised me since it differs from the previous results here: http://tracker.bro-ids.org/bro/ticket/68. One possibility is that the scripts have changed so much since those benchmarks were run that there's less state being tracked that involves addrs. Or the pcap I used may simply not have been as representative.

I think it's quite likely that we just keep more script-level state in general, so that the addresses don't have as much of an effect.

If you want to verify that memory usage is indeed going to stay low, I would run master and topic/v6-addr on the same live traffic for a day or so.

cu
Gregor

Very interesting, thanks!

I need to look more closely but a quick note: most of the IPAddr
methods should probably be inlined to avoid the function call
overhead. That might help a bit with the CPU.
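
I.e., roughly like this -- just a sketch, the member and method
names here are illustrative rather than what IPAddr.h actually has:

    #include <stdint.h>
    #include <string.h>

    // Illustrative only. The point is that the small per-packet accessors
    // are defined in the header (inside the class body or marked 'inline'),
    // so the compiler can inline them instead of emitting a call for every
    // packet.
    class IPAddrSketch {
    public:
        IPAddrSketch()  { memset(bytes, 0, sizeof(bytes)); }

        enum Family { IPv4, IPv6 };

        Family family() const
            {
            // Hypothetical layout: v4 addresses stored as v4-mapped v6.
            static const uint8_t v4_prefix[12] =
                { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
            return memcmp(bytes, v4_prefix, 12) == 0 ? IPv4 : IPv6;
            }

    private:
        uint8_t bytes[16];
    };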

Robin

Another quick note: I'd also like to merge this in asap, but if you
want to start working on other IPv6 code already, just branch from the
topic branch. That should then merge into master easily later as well.

Robin

I need to look more closely but a quick note: most of the IPAddr
methods should probably be inlined to avoid the function call
overhead. That might help a bit with the CPU.

Yes, that closed the difference between --enable-brov6 and topic/v6-addr.

Cool, that sounds much better. Let me play with it a bit next week and
if I see similar results on our traffic here, I'll merge it in and we
can then see if we can tune it further.

Robin

I've now run it on about 9GB of traffic. The surprising thing is that
I don't see much of a difference at all between non-v6 master and the
branch:

master wo/ --enable-brov6

4m43.56s real 4m27.10s user 15.32s sys
243020 maximum resident set size

master w/ --enable-brov6

4m55.97s real 4m42.04s user 12.71s sys
249376 maximum resident set size

topic/v6-addr

4m46.32s real 4m30.37s user 14.77s sys
244148 maximum resident set size

This is on an x86_64 Linux. Jon, what platform did you run your
comparison on?

Robin

Did you load the local.bro script? Jon said that he did, and that should make a big difference in memory usage and performance. Also, we need to check whether you both populated the Site::local_nets variable, since that should have an impact as well.

  .Seth

Did you load the local.bro script?

No, I didn't. I'll try that.

(But still, if the base config doesn't change at all, that's good!
Means that at least internally we won't have much of a problem.)

  usage and performance. Also, we need to check whether you both
  populated the Site::local_nets variable, since that should have an
  impact as well.

Didn't do that either.

Robin

This is on an x86_64 Linux. Jon, what platform did you run your
comparison on?

Also x86_64 Linux, with local.bro loaded and nothing added to Site::local_nets.

+Jon

Even with local.bro and setting local networks I don't see much of a
difference in terms of CPU time and memory. That sounds almost too
good to be true ...

Robin

So should I go ahead and merge it into master?

Or, Seth, do you want to give it a try first as well?

Robin

I have been playing with it quietly and haven't seen any trouble. Unfortunately I haven't run it on high-volume live traffic yet.

I say merge away.

  .Seth

I'm going through the v6-addr branch. Good job with that!

A few questions:

- in BroValUnion, 'addr_val' is a pointer of type IPAddr*. I suppose
  that's to keep the union size as small as possible, which makes
  sense. I'd still be curious, though, how allocating the addresses
  dynamically compares against storing the instance directly inside
  the union. Have you tried that? I realize that's not an easy change
  to make, so fine if not.

- The DNS binpac analyzer generates both dns_a6_reply and
  dns_aaaa_reply, but the standard analyzer only the latter (for both
  resource record types). What is correct?

- In the RPC code, there's this:

    is_mapped_dce_rpc_endpoint(const ConnID* id, TransportProto proto)
        {
        if ( id->dst_addr.family() == IPAddr::IPv6 )
                return false;
        ...
        }

   Does the protocol not support IPv6 at all or is that a "todo"?

- in DPM.cc, the ExpectedConn class: should it now store IPAddr
  instead of uint32[4] for orig and resp?

- Expr.cc, BinaryExpr::AddrFold: why still the uint32[4] here? If it's
  just to make the macro work, I'd just remove that and use
  comparison operators instead.

- IP.h, IP_Hdr: Do we need the new src_addr/dst_addr attributes? Why
  not construct on the fly out of ip4/ip6?

- PktSrc.cc:

    - protocol = (data[3] << 24) + (data[2] << 16) + (data[1] << 8) + data[0];
    + protocol = (data[0] << 24) + (data[1] << 16) + (data[2] << 8) + data[3];

  Does that mean this was buggy before?

Robin

(I did that.) It seems like it was, but Jon pointed out that it may still be wrong if that 32-bit int is supposed to be in a defined byte order. It was broken for me on Mac OS X, at least with some IPv6 packets I had that used the loopback interface.
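
For reference, as far as I can tell from the libpcap documentation, the DLT_NULL link-layer header is a 4-byte address-family value in the byte order of the host that captured the packets, so a trace written on a machine with the other endianness looks byte-swapped no matter which fixed order the reader assumes. Something like the following heuristic (sketch only, hypothetical helper, not what's in PktSrc.cc) is how other tools seem to cope:

    #include <stdint.h>
    #include <string.h>

    // Sketch only. DLT_NULL's link-layer header is a 4-byte address-family
    // value in the capturing host's byte order; since AF values are small,
    // a value with non-zero high bytes is assumed to be byte-swapped.
    static uint32_t dlt_null_family(const unsigned char* data)
        {
        uint32_t family;
        memcpy(&family, data, 4);

        if ( (family & 0xffff0000) != 0 )
            family = ((family & 0xff000000) >> 24) |
                     ((family & 0x00ff0000) >> 8)  |
                     ((family & 0x0000ff00) << 8)  |
                     ((family & 0x000000ff) << 24);

        return family;
        }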

Jon did also point out that it probably shouldn't have been fixed in the v6 branch, which made sense to me after I had already committed it. :-)

  .Seth

- in BroValUnion, 'addr_val' is a pointer of type IPAddr*. I suppose
that's to keep the union size as small as possible, which makes
sense. I'd still be curious, though, how allocating the addresses
dynamically compares against storing the instance directly inside
the union. Have you tried that? I realize that's not an easy change
to make, so fine if not.

You mean having an IPAddr versus an IPAddr* member in the union? I had started out that way, until the compiler told me union members cannot have non-trivial constructors. And I figured going with a pointer is better for size anyway, as you mentioned, although the largest member in the old --enable-brov6 code was the subnet with five uint32's, and that did not seem to increase memory in the tests as much as I expected.
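
Roughly this situation, i.e. (names are made up for the example, not the real declarations):

    #include <stdint.h>

    // Illustrative only -- not the actual BroValUnion. With the C++ rules
    // we compile under (pre-C++11), a union member may not have a
    // non-trivial constructor, so holding the class by value doesn't build
    // and a pointer is the straightforward alternative.
    class AddrSketch {
    public:
        AddrSketch() : a(0), b(0), c(0), d(0) { }    // non-trivial ctor
    private:
        uint32_t a, b, c, d;
    };

    union ValSketch {
        int int_val;
        double double_val;
        // AddrSketch addr_val;    // error: member has non-trivial constructor
        AddrSketch* addr_val;      // OK: the union only stores a pointer
    };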

- The DNS binpac analyzer generates both dns_a6_reply and
dns_aaaa_reply, but the standard analyzer only the latter (for both
resource record types). What is correct?

My thought was to generate both depending on the type, but I guess I forgot to do it for the non-binpac one. I'll fix it.

- In the RPC code, there's this:

   is_mapped_dce_rpc_endpoint(const ConnID* id, TransportProto proto)
       {
       if ( id->dst_addr.family() == IPAddr::IPv6 )
               return false;
       ...
       }

  Does the protocol not support IPv6 at all or is that a "todo"?

I'm not sure -- the old --enable-brov6 switch also bailed out for IPv6. So yeah, I suppose the todo is to determine how much more of a todo there is.

- in DPM.cc, the ExpectedConn class: should it now store IPAddr
instead of uint32[4] for orig and resp?

Yes, seems like it could do that.

- Expr.cc, BinaryExpr::AddrFold: why still the uint32[4] here? If it's
just to make the macro work, I'd just remove that and use
comparison operators instead.

Ok, will do that.

- IP.h, IP_Hdr: Do we need the new src_addr/dst_addr attributes? Why
not construct on the fly out of ip4/ip6?

I guess not. Hadn't thought of that, as I was just replicating the fact that the old --enable-brov6 code also added those. Will fix.

+Jon

Ah, good point. Then it's moot anyway.

Thanks,

Robin

Yeah. :-)

Sounds like I should leave this out when merging? Not much help to go
from broken to broken differently ...

Robin

Yep, it's pretty low priority anyway since it's only going to come into play when sniffing loopback.

  .Seth