
Introduction

The programmability of virtual switches in the hypervisor, combined with the ability to control data-path flows with OpenFlow, opens up possibilities where L2-L4 services typically provided by virtual/physical appliances are pushed to the edge switches in the hypervisors. In the current VPC model in CloudStack, the VPC VR provides many L3-L7 services. One of the services provided by the VPC VR is routing of inter-tier traffic: the entire VPC's inter-tier traffic has to be routed by the VPC VR. As the size of the VPC increases, the VPC VR can easily become a choke point. The VPC VR is also a single point of failure in the current VPC model. There is also the traffic trombone problem [1], where routing by the VPC VR becomes inefficient if the source and destination VMs are placed far from the VPC VR (e.g. in a different pod/zone). The traffic trombone could become a serious problem in the case of a region-level VPC [2].

The network ACL and routing services that CloudStack currently provides for east-west (inter-tier) traffic can instead be orchestrated on the virtual switches in the hypervisors. The goal of this proposal is to add distributed routing and firewall functionality to the native SDN controller, leveraging Open vSwitch capabilities to provide inter-tier routing and network ACLs at the hypervisor level in a distributed fashion. This enables a scale-out model and avoids the VPC VR becoming a choke point. The traffic trombone problem is also eliminated, as traffic gets routed directly from the source hypervisor to the destination hypervisor.

References

[1] http://blog.ipspace.net/2011/02/traffic-trombone-what-it-is-and-how-you.html

[2] https://cwiki.apache.org/confluence/display/CLOUDSTACK/Region+level+VPC+and+guest+network+spanning+multiple+zones

[3] http://blog.scottlowe.org/2012/11/27/connecting-ovs-bridges-with-patch-ports/

[4] https://cwiki.apache.org/confluence/display/CLOUDSTACK/OVS+Tunnel+Manager+for+CloudStack

Glossary & Conventions

Bridge: a bridge in this document refers to an Open vSwitch (OVS) bridge on XenServer/KVM

Host: a host in this document refers to a hypervisor host and can be XenServer/KVM

Logical router: the term 'logical router' refers to an OVS bridge set up on the hypervisor that is used to interconnect the tiers of a VPC

Full mesh: refers to how tunnels are established between the hosts in a full-mesh topology to create an overlay network. Refer to [4] for further details.

Flow rules: OpenFlow rules that are configured on an Open vSwitch bridge

Conceptual model 

This section describes conceptually how distributed routing and network ACLs are achieved in an example VPC deployment with three tiers and VMs spanning three hosts. The sections that follow build on the concepts and design principles introduced here to elaborate the architecture and design of how CloudStack and the OVS plug-in can orchestrate setting up VPCs with distributed routing and network ACLs.

Here is an example VPC deployment with three tiers, with VMs spanning three hypervisor hosts. A VPC VR still needs to be deployed for north-south traffic; in this example the VPC VR is deployed on host 3. A logical router, which is simply an OVS bridge, is provisioned on the rest of the hosts that the VPC spans (excluding the host running the VPC VR). On the host on which the VPC VR is running there is no need for a logical router (bridge). Irrespective of whether a host has VMs belonging to a tier or not, a bridge is set up for each tier on every host the VPC spans. For example, host 1 does not have any tier 2 VMs, yet a bridge is still created for tier 2 and placed in full-mesh topology with the tier 2 bridges on hosts 2 and 3. On each host the logical router is connected with patch ports [3] to the bridges corresponding to the tiers. This setup of the logical router emulates a VPC VR with NICs connected to the bridges corresponding to each tier.
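To make the wiring concrete, below is a minimal sketch in Python that drives ovs-vsctl to create the per-tier bridges and the logical-router bridge on one host and connect them with patch-port pairs. The bridge and port names (vpc1-t1, vpc1-lr, etc.) are hypothetical placeholders; the OVS plug-in would derive the real names from the VPC/tier identifiers. This is an illustration of the topology, not the plug-in's implementation.

```python
# Sketch only: wire one host of the example VPC. Assumes Open vSwitch is
# installed on the host; bridge and port names are hypothetical.
import subprocess

def vsctl(*args):
    """Run a single ovs-vsctl invocation."""
    subprocess.check_call(["ovs-vsctl"] + list(args))

tiers = ["vpc1-t1", "vpc1-t2", "vpc1-t3"]   # one bridge per tier
logical_router = "vpc1-lr"                  # logical router bridge

vsctl("--may-exist", "add-br", logical_router)
for tier in tiers:
    vsctl("--may-exist", "add-br", tier)
    # Patch-port pair: one end on the tier bridge, peer end on the logical router.
    tier_end = tier + "-p"
    lr_end = "lr-" + tier
    vsctl("--", "--may-exist", "add-port", tier, tier_end,
          "--", "set", "interface", tier_end, "type=patch",
          "options:peer=" + lr_end,
          "--", "--may-exist", "add-port", logical_router, lr_end,
          "--", "set", "interface", lr_end, "type=patch",
          "options:peer=" + tier_end)
```

The --may-exist flags keep the calls idempotent, so re-running the setup on a host that already has some of the bridges does not fail.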

With an understanding of how the bridges and the logical router are interconnected with patch ports, let us see how the flow rules are set up. Assume tier 1, tier 2 and tier 3 have subnets 10.1.1.0/24, 10.1.2.0/24 and 10.1.3.0/24 respectively. There are three different flow configurations on the different bridges:

  • bridge connected to the logical router with a patch port
  • bridge connected to the VPC VR (hence no patch port)
  • bridge corresponding to the logical router

 

Flow rules for a bridge connected to the VPC VR: no additional flow rules are added to such bridges beyond what the OVS tunnel manager adds today. The bridge simply acts as a MAC-learning L2 switch with rules to handle broadcast/multicast traffic. To recap from [4], the flow rules are listed below (a sketch of the corresponding commands follows the list); there is a single table 0 for all the rules.

  • priority 1200: allow all incoming broadcast (dl_dst=ff:ff:ff:ff:ff:ff) and multicast (nw_dst=224.0.0.0/24) traffic from the VIFs that are connected to the VMs
  • priority 1100: permit broadcast (dl_dst=ff:ff:ff:ff:ff:ff) and multicast (nw_dst=224.0.0.0/24) traffic to be sent out ONLY on the VIFs that are connected to VMs (i.e. excluding the tunnel interfaces)
  • priority 1000: suppress all broadcast/multicast ingress traffic on the GRE tunnels
  • priority 0: do NORMAL processing on the rest of the flows; this rule ensures (due to NORMAL processing) that a new MAC address seen on an interface is learned
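For illustration, the sketch below translates the rules above into ovs-ofctl add-flow invocations driven from Python. The bridge name and the OpenFlow port numbers of the VIFs and GRE tunnels are hypothetical placeholders (in practice they are discovered on the host), and the exact flow specifications used by the OVS tunnel manager may differ in detail.

```python
# Sketch only: approximate the four rules above with ovs-ofctl.
# Bridge name and OpenFlow port numbers are hypothetical.
import subprocess

def add_flow(bridge, flow):
    subprocess.check_call(["ovs-ofctl", "add-flow", bridge, flow])

bridge = "vpc1-t1"       # tier bridge on the host running the VPC VR
vif_ports = [1, 2]       # OpenFlow port numbers of the VM VIFs
tunnel_ports = [3, 4]    # OpenFlow port numbers of the GRE tunnel interfaces

vif_out = ",".join("output:%d" % p for p in vif_ports)

for p in vif_ports:
    # priority 1200: broadcast/multicast arriving from VM VIFs is allowed.
    add_flow(bridge, "table=0,priority=1200,in_port=%d,dl_dst=ff:ff:ff:ff:ff:ff,actions=NORMAL" % p)
    add_flow(bridge, "table=0,priority=1200,in_port=%d,ip,nw_dst=224.0.0.0/24,actions=NORMAL" % p)

# priority 1100: broadcast/multicast is sent out only on the VM VIFs.
add_flow(bridge, "table=0,priority=1100,dl_dst=ff:ff:ff:ff:ff:ff,actions=%s" % vif_out)
add_flow(bridge, "table=0,priority=1100,ip,nw_dst=224.0.0.0/24,actions=%s" % vif_out)

for p in tunnel_ports:
    # priority 1000: drop broadcast/multicast coming in on the GRE tunnels.
    add_flow(bridge, "table=0,priority=1000,in_port=%d,dl_dst=ff:ff:ff:ff:ff:ff,actions=drop" % p)
    add_flow(bridge, "table=0,priority=1000,in_port=%d,ip,nw_dst=224.0.0.0/24,actions=drop" % p)

# priority 0: everything else gets NORMAL L2 processing (MAC learning).
add_flow(bridge, "table=0,priority=0,actions=NORMAL")
```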

 

Flow rules for a bridge connected to the logical router with a patch port need additional rules to deal with the patch port and ensure that:

  • MAC learning is done explicitly only on the VIFs connected to the VMs and on the tunnel interfaces; MAC learning on the patch port is excluded (to avoid learning the gateway MAC address of the subnet corresponding to the tier)
  • packets for an unknown MAC address are flooded only on the VIFs connected to the VMs and on the tunnel interfaces
  • on the patch port, only traffic destined to the other subnets of the VPC is permitted

Below are the flow rules (a sketch of the patch-port-specific rules follows the list):

 

  • priority 1200: allow all incoming broadcast (dl_dst=ff:ff:ff:ff:ff:ff) and multicast (nw_dst=224.0.0.0/24) traffic from the VIFs that are connected to the VMs
  • priority 1100: permit broadcast (dl_dst=ff:ff:ff:ff:ff:ff) and multicast (nw_dst=224.0.0.0/24) traffic to be sent out ONLY on the VIFs that are connected to VMs (i.e. excluding the tunnel and patch interfaces)
  • priority 1000: suppress all broadcast/multicast ingress traffic on the GRE tunnels
  • priority 1: flood all packets that come in on the patch port
  • priority 0: do NORMAL processing on the rest of the flows; this rule ensures (due to NORMAL processing) that a new MAC address seen on an interface is learned
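As a rough sketch of how the patch-port handling could look, the rules that differ from the previous bridge type are shown below; the broadcast/multicast rules at priorities 1200/1100/1000 stay as in the earlier sketch, with the output set continuing to exclude the patch port. The bridge name and port numbers are again hypothetical. Keeping the patch port out of the flood set used by NORMAL processing for unknown destinations (for example with ovs-ofctl mod-port on the patch port) would be a separate configuration step, not shown here.

```python
# Sketch only: patch-port specific rules for a tier bridge connected to the
# logical router. Bridge name and OpenFlow port numbers are hypothetical.
import subprocess

def add_flow(bridge, flow):
    subprocess.check_call(["ovs-ofctl", "add-flow", bridge, flow])

bridge = "vpc1-t2"       # tier bridge with a patch port to the logical router
vif_ports = [1]          # OpenFlow port numbers of the VM VIFs on this bridge
tunnel_ports = [2, 3]    # OpenFlow port numbers of the GRE tunnel interfaces
patch_port = 4           # OpenFlow port number of the patch port

# priority 1: packets arriving on the patch port are flooded explicitly to the
# VIFs and tunnels (never handed to NORMAL), so the bridge does not learn MAC
# addresses, such as the tier gateway MAC, on the patch port.
flood_out = ",".join("output:%d" % p for p in vif_ports + tunnel_ports)
add_flow(bridge, "table=0,priority=1,in_port=%d,actions=%s" % (patch_port, flood_out))

# priority 0: traffic from VIFs and tunnels still gets NORMAL processing,
# so MAC learning happens only on those ports.
add_flow(bridge, "table=0,priority=0,actions=NORMAL")
```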

 

Fall-back approach:

Achieving distributed routing and network ACLs would need distributed configuration. Given the scale of changes that would involve
