Hardware recommends

And on the heels of the NIC question, how about hardware experiences? I'm looking at the PCIe2 NICs from both Myricom and Netronome... any recommendations for the server hardware to wrap around these cards? The plan is to have this machine monitor a corporate LAN... lots of traffic. Guessing the team will want to go Dell, if that helps. Thanks for the advice, all.

James

We’ve been doing the following:

Dell R630
2x Intel® Xeon® E5-2687W v3, 3.1GHz, 25M cache, 9.60GT/s QPI, Turbo, HT, 10C/20T (160W)
128GB RAM
With whatever disk fits your needs. Our worker boxes use a mirrored pair of 120GB SSDs. The manager node has slightly larger disks to handle 12h of storage. A Splunk forwarder ingests from the manager box for retention/analysis.

Most of this is in ‘dev’ right now, but we’ll be running around 7x 100GB sets by the end of the year, following the Berkeley model. Post-shunting, we’ll be running Suricata on the traffic as well.

As a general rule, faster proc > more procs (Seth, correct me here if this has changed!)

Of note: we ran into some problems with an AMD setup. Others can chime in with more detail, but the synopsis was that the AMD cores run at a slower clock speed and just can’t keep up with the amount of data coming in. For reference, it was a 64-core AMD Opteron system with 128GB of RAM. I’m in the process of horse-trading the AMD box for a 32-core Intel Xeon system that should be better able to keep up with the data coming in. We were pushing ~1.2Gb/s and the manager couldn’t keep up and would slowly consume all memory.

-Paul

Thanks for the great information, all... it really does help.

James

The Bro architecture documents still seem to suggest you can only process 80Mb/s or so of traffic per core, but even at 2.6-2.7 GHz you end up getting closer to 250-300Mb/s+. 3.1GHz will boost this a bit and allow you to handle slightly larger flows per core, but you may be able to get many more cores on a single host at 2.6GHz for similar or less money. I'd just be wary of optimizing too heavily on core count and ending up with 1.8GHz clocks. If you are doing good flow shunting up front, I think you are likely to end up with more, smaller flows, which probably lends itself better to having more moderately clocked cores than fewer, slightly higher-clocked cores.
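
To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch in Python. The per-core throughput figures are just the estimates mentioned in this thread (not measurements), and the 2Gb/s peak is a made-up example, so treat all of them as assumptions to adjust for your own traffic mix:

    # Back-of-the-envelope worker sizing. The per-core Mb/s figures below are
    # rough assumptions taken from this discussion, not measured values.
    import math

    ASSUMED_MBPS_PER_CORE = {
        2.6: 250,   # ~250-300 Mb/s per core claimed around 2.6-2.7 GHz
        3.1: 300,   # a bit more headroom at higher clocks
    }

    def workers_needed(peak_mbps, clock_ghz):
        """Estimate worker cores needed for a given peak traffic rate."""
        return math.ceil(peak_mbps / ASSUMED_MBPS_PER_CORE[clock_ghz])

    peak_mbps = 2000  # example: ~2 Gb/s of post-shunt traffic
    for clock in sorted(ASSUMED_MBPS_PER_CORE):
        print(f"{clock} GHz: ~{workers_needed(peak_mbps, clock)} worker cores "
              f"for {peak_mbps} Mb/s")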

Also, unless you are considering ludicrous core counts per machine, you might find you are oversubscribing your server long before you can take advantage of 40Gbps or 100Gbps NICs over a 10Gbps NIC. I've been fairly happy with Intel 10Gbps NICs and PF_RING DNA/ZC, but some prefer Myricom to avoid dealing with third-party Intel NIC drivers. Be wary of artificial worker limits imposed by RSS or vendor-provided host-based load balancing (Myricom comes to mind). There are cases where folks have been unable to take full advantage of their server hardware without running additional NICs or doing other workarounds, because they had more cores than queues/rings on the cards.
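
As a quick illustration of that cores-versus-queues constraint, here is a hypothetical sketch; the queue counts are placeholders, since the real per-port ring/queue limits depend on the card and driver you end up with:

    # Hypothetical illustration of the queue/ring limit described above.
    # queues_per_port is a placeholder; check your NIC and driver docs for
    # the real RSS / host-based load-balancer limits.
    def usable_workers(planned_workers, queues_per_port, ports):
        """Workers that can each get their own hardware queue."""
        return min(planned_workers, queues_per_port * ports)

    planned = 40          # cores you intended to dedicate to workers
    queues_per_port = 32  # placeholder per-port queue limit
    ports = 1

    usable = usable_workers(planned, queues_per_port, ports)
    if usable < planned:
        print(f"Only {usable} of {planned} planned workers get a queue; "
              "you'd need another port/NIC or a software load balancer.")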

On a side note: I found out a lot of interesting things about how my sensors were performing, as well as my upstream load balancer, by using Justin's statsd plugin (assuming your upstream shunting doesn't throw off the output) to send the capture-loss script output to a time-series DB and graphing it. For example, I discovered that a port going to an unrelated tool was becoming oversubscribed over the lunch hour, causing back-pressure on the load balancer that translated to every worker on my Bro cluster reporting 25-50% loss, even though Bro should have been seeing relatively little traffic and was itself not oversubscribed. In that case I found it is sometimes desirable to have an extra 10G NIC in each server so that the tool, not the load balancer, gets oversubscribed until I can add more capacity to the tool and better spread the load.
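
If you want to do something similar, the statsd wire format itself is simple enough to hand-roll. This is not Justin's plugin, just an illustrative sketch that pushes a capture-loss percentage to a statsd daemon as a gauge; the host, port, and metric name are placeholders:

    # Illustrative only: send a capture-loss percentage to statsd as a gauge.
    # Host, port, and metric name are placeholders, not from the plugin.
    import socket

    STATSD_HOST = "127.0.0.1"
    STATSD_PORT = 8125  # conventional statsd UDP port

    def send_gauge(metric, value):
        """Emit one statsd gauge line, e.g. 'bro.worker-1.capture_loss_pct:2.5|g'."""
        payload = f"{metric}:{value}|g".encode()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (STATSD_HOST, STATSD_PORT))

    send_gauge("bro.worker-1.capture_loss_pct", 2.5)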

Hah! Not really, it's pretty much still that, as long as you still have a lot of cores. Like most things in life, you're searching for balance here. :-)

If I could order systems with 400 Xeon cores though, I'd be all over that!

.Seth

I'm working on improving the stats output of Bro now too so the 2.5 version will have lots of internal details in the stats.log that should provide a much better picture for people to see what's going on in their clusters (a couple of people have already sent me data and graphs which has been super exciting!). Capture loss is cool, but there is so much more data available that can really help you get a deeper understanding of what Bro is doing while it's running.

  .Seth

Which document? We should update that.

You should look carefully at the Intel manuals and spec sheets. I've had great success here with the E5-2697 v3 (and older) Xeons. They seem slower, at 2.6GHz, but it's the only Intel CPU that can overclock itself this much on all cores for an arbitrary length of time.

When you read about turbo mode, the number you get (for example 3.6GHz) assumes a single-core workload only, while all remaining cores stay idle.

What matters is how high the CPU can go on all cores at the same time.

For example, the sensors here are happy at 3GHz!

You can check it yourself in this Intel document:

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf

Page 9

As for capture cards, Myricom does the job nicely. People have had success with Intel cards, with or without PF_RING ZC. Note that Bro now supports AF_PACKET natively, without going through libpcap, using the most recent TPACKET_V3 version.

And you need RAM. Lots. I have 128GB per sensor and 64GB on the manager. Fortunately the next version of Bro will have much lower memory requirements for the manager. It's not just Bro that uses a lot of memory here; my sensors also run with giant Myricom buffers, which is why.

Turbostat results attached

Package Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI  CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt RAMWatt PKG_%  RAM_%
-       -    -    683    22.06  3096    2597    0    77.94  0.00   0.00   0.00   76      77     0.00    0.00    0.00    0.00    199.22  69.00   199.78 0.00
0       0    0    1232   39.78  3097    2597    3229 60.22  0.00   0.00   0.00   51      54     0.00    0.00    0.00    0.00    97.59   35.13   99.90  0.00
1       0    14   929    30.00  3096    2597    3223 70.00  0.00   0.00   0.00   74      77     0.00    0.00    0.00    0.00    101.63  33.87   99.88  0.00
(remaining per-CPU rows trimmed for readability; every core in both packages shows a Bzy_MHz of 3096-3097 while busy)
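
If you want to sanity-check the same thing on your own box, a minimal sketch that parses saved turbostat output and reports the busy-frequency range across CPUs could look like this (the file name is a placeholder for wherever you saved the output):

    # Minimal sketch: parse saved turbostat output (like the table above) and
    # report the Bzy_MHz range across CPUs to confirm all-core sustained turbo.
    # "turbostat.txt" is a placeholder path.
    def busy_mhz_range(path="turbostat.txt"):
        with open(path) as f:
            header = f.readline().split()
            idx = header.index("Bzy_MHz")
            values = []
            for line in f:
                fields = line.split()
                if len(fields) <= idx:
                    continue
                try:
                    values.append(float(fields[idx]))
                except ValueError:
                    continue  # skip repeated headers or non-numeric fields
        return min(values), max(values)

    lo, hi = busy_mhz_range()
    print(f"Bzy_MHz across CPUs: {lo:.0f}-{hi:.0f}")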

Thanks all SO much for the great information...I'll be able to make a good decision on the hardware going forward. Thanks again.

James

I reached this page via Google, but the breadcrumbs show it as being part of the 2.4.1 documentation, which I believe is considered current.

https://www.bro.org/sphinx/cluster/index.html

Under Workers section:

“The rule of thumb we have followed recently is to allocate approximately 1 core for every 80Mbps of traffic that is being analyzed. However, this estimate could be extremely traffic mix-specific. It has generally worked for mixed traffic with many users and servers. For example, if your traffic peaks around 2Gbps (combined) and you want to handle traffic at peak load, you may want to have 26 cores available (2048 / 80 == 25.6). If the 80Mbps estimate works for your traffic, this could be handled by 3 physical hosts dedicated to being workers with each one containing dual 6-core processors.”

Thanks for the link reminder. Unfortunately that was current as of about 6 or 7 years ago; it's just been a very long time since it was updated. :-)

  .Seth