Software frontend

Hi,

Does anyone have a Click or other software frontend implementation that splits traffic across different nodes (and not just cores)?

Thank you,
Sunjeet Singh

Sorry, forgot to add: Linux-based.

Sunjeet

Is this what you are looking for?

http://www.bro-ids.org/wiki/index.php/ClusterFrontendClickModularRouter

Tyler

Thanks, Tyler. From my understanding, this would be used to split traffic across cores on the same worker machine.

Can this be extended to get what I want: splitting traffic from the frontend (which will be running this Click daemon) to workers running on different machines?

Thanks,
Sunjeet

You are correct, this only splits traffic across workers on the same machine. I've investigated, but haven't had time to test splitting traffic across workers on different machines. You should be able to tweak the linked config a little by removing the tapX lines and redirecting the my_switch outputs to the various physical interfaces. For example:

my_switch[0] -> Queue -> ToDevice(eth1);   // (repeat for eth2 ... ethX)
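For reference, a minimal (untested) sketch of what the tweaked config might look like, with placeholder interface names (eth0 capturing, eth1/eth2 facing two workers):

my_switch :: HashSwitch(26, 8);   // hash the IPv4 source/destination address pair

FromDevice(eth0, PROMISC true) -> my_switch;

my_switch[0] -> Queue -> ToDevice(eth1);   // toward worker 1
my_switch[1] -> Queue -> ToDevice(eth2);   // toward worker 2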

I haven't tried this, but it should work. This software-based load balancing will only work for smallish amounts of traffic; if you are trying to feed it upwards of 1 Gbps, user-mode Click will probably choke. I started to investigate using kernel-mode Click with the RouteBricks code to improve performance, but got stuck at a kernel panic and didn't have time to pursue it further. For that, you need a multi-core Nehalem server with Intel 10 Gbps 82598EB cards. The best solution is probably to buy a hardware load balancer like the cPacket cFlow device. Currently they have a 10 Gbps version, but I heard they are working on a 40 Gbps version. Other people have used Cisco routers or other hardware load balancers.

It would be nice to find a low-cost and effective software-based load balancer, but I haven't seen anything yet. Right now I am using Click and dropping a significant fraction of our traffic to cope with the limitations of running the software load balancer and workers on one multi-core mid-range server.

Tyler

Thanks again for your reply, Tyler.

Right now, software-based load balancing is my only option, and I'm not too worried about the 1 Gbps performance limit. Why, you might ask, do I want to deploy a cluster in the first place? Here's why:

I am trying to implement the Bro cluster on the Amazon EC2 cloud, just for fun and in an effort to learn about cloud computing. I am interested in seeing what challenges arise from porting a cluster-based implementation to a cloud. I have the manager, proxy, and worker set up, but I need a frontend to make use of more than one worker machine. So I'm not worried about the frontend bottleneck; I just want to get the architecture running.

Now, considering the Click implementation that you described:

> You are correct, this only splits traffic across workers on the same machine. I've investigated, but haven't had time to test splitting traffic across workers on different machines. You should be able to tweak the linked config a little by removing the tapX lines and redirecting the my_switch outputs to the various physical interfaces. For example:
>
> my_switch[0] -> Queue -> ToDevice(eth1);   // (repeat for eth2 ... ethX)

Even if this does forward the packet to the eth1 interface (sending it out of eth1, I assume), we haven't done the part where the packet goes from the interface to the right worker machine (which is done by rewriting the MAC address on the packet, I suppose).

> It would be nice to find a low-cost and effective software-based load balancer, but I haven't seen anything yet. Right now I am using Click and dropping a significant fraction of our traffic to cope with the limitations of running the software load balancer and workers on one multi-core mid-range server.

Thank you for sharing your experience; it helps a great deal.

Sunjeet

> Even if this does forward the packet to the eth1 interface (sending it
> out of eth1, I assume), we haven't done the part where the packet goes
> from the interface to the right worker machine (which is done by
> rewriting the MAC address on the packet, I suppose).

I had written a config to do that, but never tested it. Here are the basics.

AddressInfo(mymac   10.0.0.1/8 1:1:1:1:1:1);
AddressInfo(worker1 10.0.0.2/8 2:2:2:2:2:2);
AddressInfo(worker2 10.0.0.3/8 3:3:3:3:3:3);

// Hash 8 bytes starting at frame offset 26, i.e. the source and
// destination IPv4 addresses behind the 14-byte Ethernet header.
my_switch :: HashSwitch(26, 8);

FromDevice(eth1, PROMISC true, BURST 8) -> my_switch;
todevice1 :: ToDevice(eth2, ALLOW_NONEXISTENT true);
todevice2 :: ToDevice(eth3, ALLOW_NONEXISTENT true);

// example: my_switch[0] -> EtherEncap(0x0800, 1:1:1:1:1:1, 2:2:2:2:2:2) -> Queue -> ToDevice(eth2, ALLOW_NONEXISTENT true);
// (Note that EtherEncap prepends a new Ethernet header; since FromDevice
// delivers full Ethernet frames, the original header stays underneath.
// A Strip(14) in front would replace it instead of stacking headers.)
my_switch[0] -> EtherEncap(0x0800, mymac, worker1) -> Queue -> todevice1;
my_switch[1] -> EtherEncap(0x0800, mymac, worker2) -> Queue -> todevice2;

or if you just want to bypass the MAC rewrite to test that traffic is being load balanced:

my_switch[0] -> Queue -> todevice1;

mymac would be the MAC of the interface receiving the traffic; worker1 and worker2 need to be set to the MACs of the worker machines. I was hoping this would take in the traffic, load-balance it, rewrite the MACs, and send it out over several interfaces. I think this is where I was getting a kernel crash and didn't have time to upgrade the kernel. I think it was on CentOS 5.3, and I saw patch references to the kernel error I was getting.

Tyler

Hi Tyler,

Can you please help me troubleshoot here? I did what you said (on Linux, so some option parameters are gone), and here's what my Click script looks like (currently testing with one frontend machine and one worker machine):

AddressInfo(mymac <IP add. of frontend/8> <mac add of eth0>);
AddressInfo(worker1 <IP add. of worker1/8> <mac add of worker eth0>);
AddressInfo(worker2 <IP add. of worker1/8> <mac add of eth1>);
AddressInfo(worker3 <IP add. of worker1/8> <mac add of eth2>);

my_switch :: HashSwitch(26, 8);

FromDevice(eth4, PROMISC true) -> my_switch;
todevice1 :: ToDevice(eth0);
todevice2 :: ToDevice(eth1);
todevice3 :: ToDevice(eth2);

my_switch[0] -> EtherEncap(0x0800, mymac, worker1) -> Queue -> todevice1;
my_switch[1] -> EtherEncap(0x0800, mymac, worker2) -> Queue -> todevice2;
my_switch[2] -> EtherEncap(0x0800, mymac, worker3) -> Queue -> todevice3;

When I run the script with the command "sudo click try.click", it starts executing and gives no messages. To test it:
1. I used tcpdump to see whether any interface on worker1 was receiving traffic -> No.
2. I checked whether any traffic was going out of eth0, eth1, or eth2 on the frontend -> No.
3. I checked whether eth4 was receiving the packets I sent through tcpreplay -> Yes.

How can I go about debugging this?

Thanks,
Sunjeet

Hi Sunjeet,

I have hardly worked with Click, and don't have experience with troubleshooting it. Your best bet is to sign up for their mailing list, linked at the bottom of the following page: http://read.cs.ucla.edu/click/

If you get it working, please follow up with the working configuration.

Tyler

The good news is that I got it to work for a simple case: one frontend node and one worker node. So the job is to use Click to simply relay all traffic entering the frontend to a worker node. Here are the steps:

1. Enterprise traffic comes in at the frontend's interface eth0.
2. The packet's source MAC address is changed to the MAC address of the frontend's eth3.
3. The packet's destination MAC address is changed to the MAC address of the worker's eth3.
4. The packet is sent out of the frontend's eth3 for delivery.

The Click script looks like this:

FromDevice(eth0, PROMISC true)
    -> StoreEtherAddress(an:on:ny:mi:ze:d1, OFFSET src)
    -> StoreEtherAddress(an:on:ny:mi:ze:d2, OFFSET dst)
    -> Queue
    -> ToDevice(eth3);

To debug, I simply used tcpdump on the frontend's eth0 and eth3, and on the worker's eth3 (with the -e flag to display MAC addresses), to check whether the packets were getting through.

For multiple workers, one can use a hash function (as described earlier in this thread) to pick the destination worker's MAC address; see the sketch below.
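An untested sketch of that combination, reusing the HashSwitch element from earlier in the thread; the interface names and the angle-bracket addresses are placeholders:

my_switch :: HashSwitch(26, 8);   // hash the IPv4 source/destination address pair

FromDevice(eth0, PROMISC true) -> my_switch;

// One branch per worker: rewrite the MAC addresses in place, then forward.
my_switch[0]
    -> StoreEtherAddress(<MAC of frontend's eth3>, OFFSET src)
    -> StoreEtherAddress(<MAC of worker1's eth3>, OFFSET dst)
    -> Queue -> ToDevice(eth3);
my_switch[1]
    -> StoreEtherAddress(<MAC of frontend's eth4>, OFFSET src)
    -> StoreEtherAddress(<MAC of worker2's eth3>, OFFSET dst)
    -> Queue -> ToDevice(eth4);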

Sunjeet

Sorry for coming late to the party, but we actually have Click code
for this (including hashing):

    http://www.icir.org/robin/tmp/cluster-click.tgz

There's no documentation and it hasn't been used in a while, but it
should be pretty straightforward to figure out.

Robin

OSU has been running this code for years and it has been working well; they have three Click boxes there now. One big problem is that it doesn't support IPv6 traffic, but you can deal with that in your Click configuration by redirecting IPv6 traffic "manually" if you need to (see the sketch below).
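A minimal, untested sketch of such a manual redirect; eth5 and the dedicated IPv6 worker are assumptions, and the IPv4 side reuses the hash split from earlier in the thread:

my_switch :: HashSwitch(26, 8);

// Split on the Ethernet type field at frame offset 12 (0x86DD = IPv6).
c :: Classifier(12/86DD, -);

FromDevice(eth0, PROMISC true) -> c;
c[0] -> Queue -> ToDevice(eth5);          // all IPv6 to one dedicated worker
c[1] -> my_switch;                        // everything else into the IPv4 hash
my_switch[0] -> Queue -> ToDevice(eth1);  // toward worker 1
my_switch[1] -> Queue -> ToDevice(eth2);  // toward worker 2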

  .Seth

Oh, cool, thanks for sharing!

A new challenge when deploying the cluster onto a cloud:
Just rewriting the MAC addresses does not deliver the packet to its destination, because two machines in the cloud are not necessarily on the same physical network; network devices between the two discard the packet.

The solution? Encapsulate the packet in a new packet, which implies extra overhead.
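A rough, untested sketch of that idea in Click; the addresses, the port, and the choice of a plain UDP envelope are all made up for illustration:

// Frontend: wrap each captured frame in a fresh UDP/IP/Ethernet envelope.
FromDevice(eth0, PROMISC true)
    -> UDPIPEncap(10.0.0.1, 9000, 10.0.0.2, 9000)
    -> EtherEncap(0x0800, <frontend MAC>, <next-hop MAC>)
    -> Queue -> ToDevice(eth3);

// Worker: strip the outer Ethernet (14) + IP (20) + UDP (8) bytes to
// recover the original frame, here just dumped to a trace for the sniffer.
FromDevice(eth0, PROMISC true) -> Strip(42) -> ToDump(inner.pcap);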

=> This points to the bigger problem of software load balancing in a cloud environment.

This just-for-fun experiment has turned out to be more fun than I thought it would.

Sunjeet

I know this is really just an experiment, but I'm interested in where you are going with this. Are you thinking that you would use this for processing really large trace files on a cloud infrastructure, or are you thinking of live traffic? If live traffic, what traffic would you be sniffing in that context?

Definitely an interesting idea though.

  .Seth

> Definitely an interesting idea though.

Thank you. As you mention, there are really multiple directions in which this can go.

For passive analysis and for live traffic, where you're sending traffic from your enterprise into the cloud for analysis, there would be a significant cost involved if all packets were sent as-is. One can imagine a more optimal setting where event analysis is done locally and only batched events are sent to the event-handler stage running on the cloud.

Another idea, and the one that I have in mind, is that everything runs on the cloud, even your enterprise. This makes much more sense. A cloud provider can have a Bro Instance (like the existing Snort instance: http://www.snort.org/news/2010/07/07/snort-now-available-on-the-amazon-cloud/ ) sitting in front of their cloud network, or simply in front of a cloud web server.

Sunjeet

> One can imagine a more optimal setting where event analysis is done locally and only batched events are sent to the event-handler stage running on the cloud.

This is likely to just cause more overhead than it's worth.

> A cloud provider can have a Bro Instance

I can imagine doing this. I may look into it at some point too.

  .Seth

Well, if people are going to be looking at this space, I'd quickly like to summarize the information that I partly gained from this forum and partly learned from the challenges I ran into:

The frontend load balancer remains the bottleneck:
In a cloud setting, the frontend remains the non-scalable part of the existing cluster architecture. With the option of hardware load balancing gone in the cloud, software load balancing will have to incur some overhead. You can't modify only the MAC address (the packet will get dropped before reaching the worker), and you can't modify both MAC and IP (you need the original IPs (duh)). You need to either encapsulate the packet yourself (at user level or kernel level; still processing overhead, and it requires decapsulation code on the receiver side) or configure a cluster Virtual Private Cloud and subnet, in which case the cloud is doing the encapsulation for you. How well does this scale?

I would go this way if the cloud providers offer it. The more you move towards being able to control the layer-2 aspects of the network, the closer you are to being able to do load balancing easily. If you are given control over VLANs within your cloud, you could still use the cluster-click package that Robin sent the link to earlier, because it has the capability of distributing packets by appending VLAN headers to them; a sketch of the idea follows.
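An untested sketch of the general shape using stock Click's VLANEncap (cluster-click has its own elements for this; the VLAN IDs are made up):

my_switch :: HashSwitch(26, 8);
trunk :: Queue -> ToDevice(eth3);        // one trunk port toward the switch

FromDevice(eth0, PROMISC true) -> my_switch;

// Tag each hash bucket with its own VLAN; the switch then delivers
// each VLAN to its worker.
my_switch[0] -> VLANEncap(VLAN_ID 101) -> trunk;
my_switch[1] -> VLANEncap(VLAN_ID 102) -> trunk;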

  .Seth

Amazon's EC2 cloud has ec2-create-vpc and ec2-create-subnet commands in its API.
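For example (made-up CIDR blocks; the exact argument syntax is in the EC2 API tools documentation):

ec2-create-vpc 10.0.0.0/16     # returns a vpc-... identifier
ec2-create-subnet ...          # attach, say, a 10.0.0.0/24 subnet to that VPC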

Sunjeet