Scalable and Secured Multicloud Networking
The pandemic has completely changed the way we live and work. Globally, as business agility and continuity become the new priority, business leaders have accelerated the shift towards digitization. Businesses are increasing their pace of multicloud adoption as significantly more workloads are migrated out of private data centers to the public clouds. However, most businesses are likely to keep their sensitive corporate data and consumer PII on-premises to retain complete control. As this balance of on-premise, hybrid and multicloud environments becomes the norm, IT managers are challenged to build a scalable and secured multicloud networking infrastructure over the internet and over private wide area networks.
In this blog, I am using the example of a typical enterprise that has decided on a multicloud strategy. This enterprise is planning to migrate its tiered applications, databases, and serverless workloads to public cloud providers AWS and Azure. In addition, it has a mirrored deployment in its on-premise private data center for testing and staging. It is also keeping consumer PII in a datastore deployed on-premise. The enterprise has corporate users in multiple locations such as campuses, branch offices and remote locations. These users access the business applications on the multicloud and also SaaS applications such as Office 365 and Salesforce.
The diagram above shows the multicloud network of the enterprise.
Let me assume this enterprise has recently migrated its on-premise SQL server workloads to Azure SQL managed instances deployed across availability zones in a Virtual Network (VNET) for high availability. These databases are regularly read by tiered applications in Azure VM instances deployed across different VNETs. These VNETs are peered with a Hub VNET in a Hub and Spoke design. In the Hub VNET, Azure Hub VPN Gateways are provisioned to securely connect to on-premise locations over the Internet, and over Azure ExpressRoute peering in the private data center. As new VNETs or remote locations are added, the IT team creates new peering between VNETs on Azure and also provisions new redundant VPN connections towards the on-premise locations.
Over in the AWS environment, let me assume the enterprise has re-architected its legacy frontend and sales applications into microservices which are now deployed onto Elastic Kubernetes Service (EKS) clusters. The EKS clusters are deployed in different VPCs owned by different application teams. There are additional VPCs for CI/CD and administrative tooling. The VPCs peer with each other within the AWS region. The EC2 worker nodes in the EKS clusters are provisioned across different availability zones for high availability. In addition, Elastic Block Store (EBS) volumes are deployed to provide persistent volumes to the applications running on the EKS clusters. A mirrored Kubernetes setup is maintained in the private data center for testing and staging purposes. AWS VPN Gateways are deployed in a transit VPC to provide secured VPN connections to the on-premise locations. Site-to-site VPN connections are established over the internet and over the Direct Connect colocated in the private data center. Similarly, as new sites are added, the IT team provisions and manages the new VPN connections.
As there are imminent requirements for the serverless workloads in AWS to access the Azure SQL database, the IT team has to establish site-to-site VPN connections between their AWS and Azure cloud regions. This is to avoid hairpinning the inter-cloud traffic via the private data center. Security policies and processes are assumed to be properly designed and enforced in the multicloud environment. It is also assumed that the addressing space has been planned across the multicloud environment using non-overlapping private address space.
Finally on the user side, corporate users who have returned to work in the office rely on the VPN connections established over the Internet or a Service Provider managed private wide area network (MPLS) to access cloud resources. Corporate users working in home offices or remote locations use SSL VPN to connect to the SSL VPN headend deployed in the private data center. These users also regularly access Software as a Service (SaaS) applications such as Office 365 and Salesforce via their direct Internet connection. User experience is heavily dependent on the quality of the last mile fixed or mobile connection.
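The non-overlapping address plan assumed above is easy to verify before any tunnel is built. Here is a minimal sketch using Python's standard `ipaddress` module. The site names and prefixes in `ADDRESS_PLAN` are hypothetical and chosen only for illustration; they are not taken from the enterprise in this example.

```python
from ipaddress import ip_network
from itertools import combinations

# Hypothetical address plan: names and prefixes are illustrative only.
ADDRESS_PLAN = {
    "aws-eks-vpc": "10.10.0.0/16",
    "aws-transit-vpc": "10.11.0.0/24",
    "azure-sql-vnet": "10.20.0.0/16",
    "azure-hub-vnet": "10.21.0.0/24",
    "onprem-dc": "10.100.0.0/14",
}


def find_overlaps(plan):
    """Return every pair of site names whose prefixes overlap."""
    nets = {name: ip_network(cidr) for name, cidr in plan.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    clashes = find_overlaps(ADDRESS_PLAN)
    print("overlapping prefixes:", clashes or "none")
```

A check like this could run in a pipeline whenever a new VPC, VNET or branch prefix is proposed, so that an overlap is caught before routes are exchanged between clouds.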
The above diagram shows the overall logical site-to-site VPN tunnels provisioned by the IT team across the multicloud environment. Assuming redundancy (two gateways per site both on-premise and in cloud) is a mandatory requirement, the total number of VPN tunnels across this multicloud environment is well above 30.
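To get a feel for how quickly the tunnel count grows, here is a small back-of-the-envelope sketch. It assumes that every gateway at one site builds a tunnel to every gateway at the peer site, which is my simplification and may not match the exact topology in the diagram. The site counts are hypothetical.

```python
from math import comb


def tunnels_per_site_pair(gateways_per_site=2):
    """Tunnels needed so each gateway at one site reaches each gateway at the other."""
    return gateways_per_site ** 2


def full_mesh_tunnels(sites, gateways_per_site=2):
    """Every site is connected directly to every other site."""
    return comb(sites, 2) * tunnels_per_site_pair(gateways_per_site)


def hub_and_spoke_tunnels(spokes, hubs, gateways_per_site=2):
    """Each spoke site connects to each hub site; spokes do not connect to each other."""
    return spokes * hubs * tunnels_per_site_pair(gateways_per_site)


if __name__ == "__main__":
    # Hypothetical site counts, with two gateways per site.
    for sites in (4, 6, 8, 10):
        print(f"{sites} sites, full mesh: {full_mesh_tunnels(sites)} tunnels")
    print(f"3 spokes to 2 hubs: {hub_and_spoke_tunnels(3, 2)} tunnels")
```

Under these assumptions the full mesh grows quadratically with the number of sites, which is why each additional location adds noticeably more provisioning work than the one before it.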
Next, let’s take a closer look at VPC/VNET design and cloud networking within the enterprise’s AWS and Azure environments.
Within the AWS environment, a common approach to scale inter-VPC peering is to use a Hub and Spoke design by creating a transit VPC to deploy a pair of software appliances such as Cisco Cloud Services Router 1000v (CSR1000v) to terminate the VPN tunnels. This approach is manageable for small setups but it can become hard to manage once the number of VPCs increases. There are also caveats documented by AWS, such as a limit of up to 1.25 Gbps of throughput per VPN tunnel.
Similarly in the Azure environment, a Hub and Spoke design with transit VNET and VNET peering approach can be used to establish VPN connectivity from its Azure region to the on-premise locations.
Here are the key lessons I have learned:
- The overall complexity in terms of provisioning, managing and troubleshooting the VPN connections across multicloud will increase dramatically as the number of locations increases.
- The VPN tunnels are essentially point-to-point connections over the Internet or over private peering links. There is no easy way to identify optimum paths or distribute traffic optimally even by running routing protocols over the tunnels.
- The cloud providers offer many tools such as AWS CloudFormation, Azure Resource Manager, SDKs and APIs which you can use to automate provisioning. However, you need to remember which cloud provider you are working on, and baking the tools into your Infrastructure as Code and CI/CD pipeline requires working knowledge of these different cloud environments.
- It is challenging to operationalize the management of a complex multicloud networking environment. A third-party tool is often required to collect and analyze operational metrics and telemetry from the multicloud network.
Simplifying Multicloud Networking with Software Defined WAN Overlay
In this section, I would like to discuss how Software Defined WAN (SD-WAN) can simplify the complexities of multicloud networking which we have seen earlier in the example. SD-WAN abstracts the network from the hardware by creating a virtualized network overlay, commonly known as the SD-WAN fabric. It separates the network into two parts: the control plane and the forwarding plane. The control plane is typically deployed in a centralized location like the private data center, hosted on the public cloud, or consumed as a managed service from Service Providers.
Now, recall the time when the majority of enterprise WAN networks in the world were built with a maze of IPsec/GRE tunnels over the Internet. At that time, the introduction of MPLS VPN technology by Service Providers completely revolutionized enterprise WAN networking. Enterprises worked with Service Providers to replace the point-to-point connectivity model of IPsec/GRE tunnels with the any-to-any connectivity model of MPLS VPN. Now, SD-WAN has the potential to simplify the multicloud networking landscape the way MPLS VPN did. SD-WAN creates a virtualized software overlay which is managed by centralized controllers. It eliminates the need for a complex mesh of point-to-point VPN tunnels to create a multicloud networking infrastructure.
SD-WAN brings the benefits of automation and integration with the public cloud infrastructure. With SD-WAN, we can now choose to deploy physical or virtual appliances in the on-premise locations, and virtual appliances on the public cloud.
Cisco SD-WAN for Multicloud
While there are many SD-WAN solutions in the market today, the Cisco SD-WAN platform can help enterprises to simplify complex multicloud networking by delivering advanced application optimization, multi-layered security, and multicloud integration.
The Cisco SD-WAN platform requires a one-time setup of software controllers which can be hosted on the public cloud, in an on-premise data center, or by Service Providers as a managed service.
The SD-WAN software controllers are the vBond orchestrator for authentication and secured zero-touch provisioning via bootstrap, vSmart for control plane policies, and vManage for the centralized management dashboard. Together, they create an SD-WAN fabric based on the Overlay Management Protocol (OMP).
The WAN Edge routers, either physical or virtual appliances, provide the forwarding plane function and can be deployed anywhere on the multicloud infrastructure. On the public clouds, the WAN Edge virtual cloud router can be onboarded from both AWS marketplace and Azure marketplace. Readers who are interested in understanding the details of deploying and setting up the Cisco SD-WAN controllers can refer here.
Using the vManage centralized management dashboard, organizations can use a GUI to easily monitor, configure, and maintain all Cisco SD-WAN devices and links (connections) in the underlay and overlay network. We can use vManage to provision WAN Edge routers to connect all private data centers, core and campus locations, WAN branches, colocation facilities, public cloud infrastructure, and remote workers. vManage also provides graphical dashboards for monitoring network performance and software life cycle management for all devices.
SD-WAN provides intelligent management of multiple active-active links towards the cloud. It is aware of the application traffic riding on those links and can dynamically route that traffic over the best path for a better user experience. Moreover, SD-WAN can carry out end-to-end traffic segmentation, and integrate with cloud based cybersecurity solutions to secure both ends of the multicloud network.
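To make the idea of application-aware routing over active-active links more concrete, here is a simplified sketch. It is a conceptual model and not how Cisco SD-WAN implements it: the `PathStats` and `AppSla` types, the thresholds and the fallback rule are all my own assumptions. Every link that currently meets the application's loss, latency and jitter targets stays in use; if none does, the least bad link is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    """Measured quality of one WAN link (hypothetical values)."""
    name: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass(frozen=True)
class AppSla:
    """Targets an application class expects from the network (hypothetical)."""
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def select_paths(paths, sla):
    """Return all SLA-compliant paths; fall back to the lowest-loss, lowest-latency path."""
    compliant = [
        p for p in paths
        if p.loss_pct <= sla.max_loss_pct
        and p.latency_ms <= sla.max_latency_ms
        and p.jitter_ms <= sla.max_jitter_ms
    ]
    if compliant:
        return compliant  # active-active: traffic can be shared across these links
    return [min(paths, key=lambda p: (p.loss_pct, p.latency_ms))]


if __name__ == "__main__":
    links = [
        PathStats("internet", loss_pct=0.5, latency_ms=40, jitter_ms=5),
        PathStats("mpls", loss_pct=0.1, latency_ms=25, jitter_ms=2),
        PathStats("lte", loss_pct=3.0, latency_ms=90, jitter_ms=20),
    ]
    voice = AppSla(max_loss_pct=1.0, max_latency_ms=150, max_jitter_ms=30)
    print([p.name for p in select_paths(links, voice)])
```

The point of the sketch is that the decision is driven by measured link quality per application class, rather than by a static routing metric on a point-to-point tunnel.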
Cisco SD-WAN now supports the latest cloud native networking innovations from AWS and Azure — AWS Transit Gateway (TGW) and Azure Virtual WAN (vWAN).
AWS Transit Gateway Integration
AWS Transit Gateway is a service announced by AWS in November 2018 which provides a more scalable way to interconnect VPCs and VPNs on AWS. Once a Transit Gateway is provisioned, user VPCs can connect to it using VPC attachments. Users can now scale connectivity across thousands of VPCs, AWS accounts, and on-premise networks through a single Transit Gateway. Please refer to AWS TGW documentation for more details.
The Cisco SD-WAN solution can now be deployed with AWS Transit Gateway to provide the combined benefits of both solutions.
The Cisco SD-WAN solution supports a feature known as Cloud OnRamp for Infrastructure as a Service (IaaS). With your AWS IAM credentials, the vManage centralized dashboard can fully automate the deployment of a transit SD-WAN VPC in your AWS environment, provision two WAN Edge cloud virtual routers in the newly created transit VPC, and finally establish interconnections from the WAN Edge cloud virtual routers towards the on-premise WAN Edge routers. This Cloud OnRamp is typically a one-time deployment per AWS region for high availability. In addition, creating SD-WAN deployments in multiple AWS regions allows us to use the AWS global backbone to route inter-region SD-WAN traffic between branch locations.
AWS Transit Gateway removes the need for a dedicated transit VPC architecture where multiple VPN connections are set up from the WAN Edge cloud virtual routers (in the transit SD-WAN VPC) to peer with the VGW in each user VPC. Instead, the WAN Edge cloud virtual routers establish standard IKE-based IPsec tunnels directly to the TGW, and Border Gateway Protocol (BGP) is configured between the WAN Edge cloud virtual routers and the TGW. The WAN Edge virtual routers and the TGW can now exchange BGP routes to learn VPC networks and redistribute these routes into Overlay Management Protocol (OMP). OMP will dynamically propagate the routes to the rest of the SD-WAN fabric. Other SD-WAN locations in the multicloud environment will now learn these public cloud routes via OMP. Standard redistribution filtering mechanisms can be used for more granular and flexible redistribution.
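The redistribution filtering mentioned above can be pictured with a small conceptual model. This is not Cisco configuration syntax. It only illustrates the logic of a prefix filter sitting between the BGP routes learned from the TGW and the routes advertised into OMP. The function name, the prefixes and the permitted aggregate are hypothetical.

```python
from ipaddress import ip_network


def redistribute(bgp_routes, permit_within):
    """Return the BGP-learned prefixes that fall inside any permitted aggregate."""
    permitted = [ip_network(p) for p in permit_within]
    selected = []
    for prefix in bgp_routes:
        net = ip_network(prefix)
        # Keep the route only if it is contained in one of the permitted aggregates.
        if any(net.subnet_of(aggregate) for aggregate in permitted):
            selected.append(prefix)
    return selected


if __name__ == "__main__":
    # Hypothetical VPC prefixes learned from the TGW over BGP (IPv4 only).
    learned = ["10.10.0.0/16", "10.11.0.0/24", "172.31.0.0/16"]
    # Only corporate 10.0.0.0/8 space is allowed into the SD-WAN fabric.
    print(redistribute(learned, permit_within=["10.0.0.0/8"]))
```

In this example the default VPC range 172.31.0.0/16 is learned over BGP but never reaches the other SD-WAN sites, because it does not match the permitted aggregate.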
Azure Virtual WAN Integration
Azure Virtual WAN is a cloud networking architecture on Azure which uses a Hub and Spoke design to interconnect VNETs, branches, users, and ExpressRoute circuits. Full details on Azure Virtual WAN can be found here.
The Cisco SD-WAN Cloud OnRamp feature for Azure is similar in concept to the AWS Transit Gateway integration described earlier. The vManage management dashboard handles the creation of the transit VNET, the provisioning of a pair of WAN Edge virtual cloud routers in the VNET, and finally the establishment of interconnections towards the on-premise sites. Standard IKE-based IPsec tunnels are established to the Azure virtual hub and BGP is run over the IPsec tunnels. The WAN Edge virtual cloud routers exchange routes via BGP to learn routes in the Azure VNETs. Similarly, these routes are redistributed into OMP and propagated to the rest of the SD-WAN locations in the SD-WAN fabric.
Unified Multicloud Networking with SD-WAN
With an SD-WAN solution like Cisco SD-WAN, enterprises can now build a secured and scalable multicloud networking infrastructure spanning different public cloud providers, on-premise data centers and remote locations.
The SD-WAN fabric provides benefits of centralized provisioning, monitoring, and troubleshooting of multicloud infrastructure networking, site inventory, visibility and analytics. Security is built into the SD-WAN platform with authentication, secured onboarding, encryption and segmentation.
Now, on top of the capabilities I have mentioned so far, SD-WAN platforms such as Cisco SD-WAN have evolved with newer capabilities to handle SaaS applications. This is gaining importance as more enterprises are shifting towards the SaaS model for business critical applications such as email, productivity and collaboration applications. With Cloud OnRamp for SaaS, the SD-WAN fabric can continuously measure the performance of a designated SaaS application through all permissible paths from a branch.
Other benefits of Cloud OnRamp for SaaS include:
- Improved branch-office user experience for SaaS applications by using the best-performing network path
- Increased SaaS application resiliency with multiple network path selections and active monitoring
- Visibility into SaaS application performance using probes that measure real-time data
- Modification of path selection depending on the application performance without any required administrator action
- Operational simplicity and consistency through centralized control and management of SaaS application policies
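To illustrate how probe measurements could drive path selection without administrator action, here is a minimal sketch. The scoring formula, the 0 to 10 scale and the switching margin are hypothetical choices of mine and are not the algorithm used by Cloud OnRamp for SaaS. Each path gets a score from its recent loss and latency probes, and traffic only moves to another path when that path is clearly better, which avoids flapping between two similar paths.

```python
from statistics import mean


def path_score(probes):
    """Turn (loss_pct, latency_ms) probe samples into a 0..10 score; higher is better.

    The weights below are hypothetical and chosen only for illustration.
    """
    avg_loss = mean(loss for loss, _ in probes)
    avg_latency = mean(latency for _, latency in probes)
    loss_penalty = min(avg_loss, 10.0) * 0.5            # up to 5 points at 10% loss or more
    latency_penalty = min(avg_latency, 500.0) / 100.0   # up to 5 points at 500 ms or more
    return round(10.0 - loss_penalty - latency_penalty, 2)


def choose_path(current, scores, margin=1.0):
    """Stay on the current path unless another path beats it by at least `margin`."""
    best = max(scores, key=scores.get)
    if current in scores and scores[best] - scores[current] < margin:
        return current
    return best


if __name__ == "__main__":
    # Hypothetical probe samples towards one SaaS application from a branch.
    scores = {
        "direct-internet": path_score([(2, 200), (4, 300)]),
        "via-datacenter": path_score([(0, 100), (0, 100)]),
    }
    print(scores, "->", choose_path("direct-internet", scores))
```

With the sample values above, the branch would move this application from its direct Internet exit to the path through the data center, and would move back only once the direct path scored clearly better again.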
Automating it all with Cisco SD-WAN Infrastructure as Code
Finally, I would like to wrap this up by looking at the various options available to developers to leverage DevOps practices such as Infrastructure as Code and CI/CD to automate the provisioning and management of their multicloud networking infrastructure. Infrastructure as Code and automation tools like Terraform and Ansible are well supported by Cisco SD-WAN, and developers can easily integrate them with their DevOps framework.
Here are some links on SD-WAN Terraform providers for provisioning on AWS and Azure, and Ansible modules to create automation for SD-WAN. For example, developers can build SD-WAN provisioning into their pipeline on CI platforms such as Jenkins.