Introduction
Purpose
This is a draft specification for using Q-in-Q to provide scalable isolated networks in a CloudStack environment.
Q-in-Q, also referred to as "double-tagged" VLANs, is the concept of nesting tagged VLANs inside one another. Each of the 4096 available VLANs can therefore host another 4096 VLANs. This provides a way to scale isolated networks that is more compatible with standard network equipment, and with lower overhead than GRE or other technologies that create point-to-point tunnels.
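On Linux, the nesting can be illustrated with stacked 802.1Q interfaces. A minimal sketch, assuming illustrative names (eth0 as the physical NIC, outer VLAN 100, inner VLAN 200):

```shell
# Outer tag: VLAN 100 directly on the physical NIC
ip link add link eth0 name eth0.100 type vlan id 100

# Inner tag: VLAN 200 nested inside VLAN 100 (the "double tag")
ip link add link eth0.100 name eth0.100.200 type vlan id 200

ip link set eth0.100 up
ip link set eth0.100.200 up
```

Traffic leaving eth0.100.200 carries both tags on the wire, which is why each outer VLAN can host its own full set of inner VLANs.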
This has been designed and tested on the KVM platform. There is no reason to believe it wouldn't work on other platforms, but the actual implementation would likely vary, and more time and expertise would be required to support them.
Document History
Glossary
Feature Specifications
Use cases
Administrators of businesses who want to use the VPC feature in a scalable way will be looking for a way to deploy potentially thousands of what CloudStack refers to as "Isolated Networks". Each VPC requires several, so even a company expecting 1,000 customers could reasonably expect to exhaust the 4000-VLAN limit of a standard network deployment. Q-in-Q allows 4096 nested VLANs inside each VLAN of the standard network, for a potential 16 million isolated networks.
Architecture and Design description
A traditional CloudStack advanced network environment might look like the chart below, with management, storage, and public networks created by the admin, and multiple VLANs provisioned by CloudStack as needed:
This functional spec extends that design by allowing tagged interfaces to be used at the physical interface level:
The admin simply needs to create the desired 'vlan#' devices, and CloudStack uses them as physical devices.
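As a sketch of that admin-side setup, assuming illustrative names (eth0 as the NIC, vlan100 as the outer-tag device, cloudbr1 as the bridge that would be named in the traffic label):

```shell
# Create the outer-tag device that CloudStack will treat as physical
ip link add link eth0 name vlan100 type vlan id 100
ip link set vlan100 up

# Bridge it; this bridge name is what the traffic label points at
brctl addbr cloudbr1
brctl addif cloudbr1 vlan100
ip link set cloudbr1 up
```

CloudStack would then create its per-network tagged interfaces on vlan100, producing double-tagged frames on the wire.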
CloudStack doesn't support defining the actual physical devices to be used; instead it uses "traffic labels" that let the admin specify which bridges to use for which traffic, and CloudStack determines the physical devices from those bridges. It then uses those physical devices to dynamically create subsequent tagged interfaces/bridges. Normally, if CloudStack finds that a bridge is on a tagged interface, it looks up the parent of that interface. A small patch simply stops CloudStack from looking up the parent when the device is a vlan# device, so the vlan# device is instead treated as a physical device and subsequent tagged networks are created on it.
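The patched decision can be sketched as shell logic. Here `resolve_parent` is a naive name-based stand-in for the agent's real parent lookup (which inspects the interface itself), and the vlan# naming convention is the one described above:

```shell
# Returns success if the device follows the "vlan#" naming convention
# from this spec (e.g. vlan100) and should be treated as physical.
is_vlan_phys_dev() {
    case "$1" in
        vlan[0-9]*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Naive stand-in: derive "eth0" from "eth0.100" by name.
# (The real agent inspects the interface, not just its name.)
resolve_parent() {
    echo "${1%%.*}"
}

# Patched lookup: skip the parent lookup for vlan# devices so
# they are treated as physical; otherwise behave as before.
get_phys_dev() {
    dev="$1"
    if is_vlan_phys_dev "$dev"; then
        echo "$dev"
    else
        resolve_parent "$dev"
    fi
}
```

With this change, a bridge on vlan100 anchors new tagged networks to vlan100 itself, while a bridge on eth0.100 still resolves to eth0 as before.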
This implementation was chosen because it leverages much of the existing CloudStack code. In other words, it's a simple hack. Alternative ways of implementing this feature would include adding the ability for CloudStack to track/configure individual physical devices (my impression is that avoiding this was a conscious decision), which would make the feature independent of the "vlan#" device naming. This could be as simple as a 'force-phys-dev' list that we check against.
MTU
Actual MTU values, and what is included in them, vary slightly between manufacturers and operating systems, but in general Q-in-Q requires extra space for the additional VLAN tag, so switches that provide physical connections between KVM hosts need to accommodate it. The table below shows which MTU combinations have been tested to work. Generally speaking, an admin should increase the MTU on the applicable switch hardware to remain compatible with the existing MTUs of CloudStack system VMs and guest instances.
| Instance MTU | Switch MTU 1500 | Switch MTU 1532 | Switch MTU 9000 | Switch MTU 9032 |
|---|---|---|---|---|
| 1468* | Y | Y | Y | Y |
| 1500 | N | Y | Y | Y |
| 8968* | N | N | Y | Y |
| 9000* | N | N | N | Y |
* Special consideration is required for any virtual routers when the MTU != 1500. The ability to set the MTU in the VR will need to be added, perhaps in the same way that the SSVM MTU is set via the global settings.
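As a sketch of the host-side adjustment implied by the table, every interface in the path (NIC, vlan# device, bridge) needs the larger MTU, not just the switch. Device names and the 1532-byte value are illustrative, taken from the "Instance MTU 1500" row above:

```shell
# Allow a 1500-byte instance MTU to survive the extra Q-in-Q tag
# by raising the host-side path to 1532 bytes.
ip link set dev eth0 mtu 1532
ip link set dev vlan100 mtu 1532
ip link set dev cloudbr1 mtu 1532
```

The corresponding switch ports must be raised to at least the same value, per the table.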
Appendix
Appendix A:
Appendix B: