DEV Community

SEB for AWS Community Builders

Posted on • Originally published at faun.pub

AWS Landing Zone: Hybrid networking

Previously, I fleshed out the core aspects of an AWS Control Tower managed landing zone and explained how to approach account baselining to maintain consistency and elevate the security level across the estate.

AWS Landing Zone #1: Expanding Control Tower managed estate

AWS Landing Zone #2: Control Tower Account Factory and baselining

This one is more of a try-out of how hybrid networking, including hybrid DNS, could potentially be set up. Why so? Because it really depends on the individual requirements of an organisation. Such requirements revolve around aspects like security, scalability, performance, etc., so the final architecture should be carefully considered to make sure the correct model is applied. Otherwise, one may end up with a configuration that won't fit in the long term, while modifications to such a fundamental layer can turn out to be very costly in many ways.
I decided to define objectives that could fit a wide range of use cases, and to set things up myself to gain even more experience with recent AWS network services, as well as share my thoughts, as usual.

Hybrid networking

Hybrid networking is nothing more than connecting on-premises networks with those in the cloud in a secure and performant way.

[Diagram: hybrid network connectivity overview]

To establish hybrid network connectivity, three elements are required:

  1. AWS hybrid connectivity service (Virtual Private Gateway, Transit Gateway, Direct Connect Gateway)
  2. Hybrid network connection (AWS managed or software VPN, Direct Connect)
  3. On-prem customer gateway

Objectives

As for my on-prem network, I simply chose my home network, so there was only one set of building blocks I could use, and these were, respectively:

  1. AWS Transit Gateway (TGW)
  2. AWS Managed Site-to-Site VPN (S2S VPN)
  3. StrongSwan @ RaspberryPi
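On the strongSwan side, one of the two tunnels of an AWS Site-to-Site VPN can be sketched with a minimal ipsec.conf connection entry. All IPs, subnets, and the PSK below are placeholders; the real values come from the VPN configuration file downloadable from the AWS console, and the proposal strings assume the AWS default IKE parameters:

```
# /etc/ipsec.conf — one of the two tunnels of an AWS Site-to-Site VPN.
# All addresses, CIDRs, and the PSK are placeholders.
conn aws-tgw-tunnel-1
    auto=start
    type=tunnel
    keyexchange=ikev2
    authby=secret
    left=%defaultroute
    leftid=203.0.113.10          # on-prem public IP
    leftsubnet=192.168.1.0/24    # home network CIDR
    right=198.51.100.20          # AWS tunnel outside IP
    rightsubnet=10.0.0.0/8       # VPC CIDRs reachable via the TGW
    ike=aes128-sha1-modp1024!
    esp=aes128-sha1-modp1024!
    dpdaction=restart

# /etc/ipsec.secrets
# 203.0.113.10 198.51.100.20 : PSK "placeholder-pre-shared-key"
```

The second tunnel gets an analogous `conn` entry pointing at the other AWS tunnel endpoint.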

The functional objectives were to get:

  • centralised TGW in the Networking account shared across the Organisation with the use of AWS Resource Access Manager (RAM)
  • centralised egress routing via NAT Gateway (NGW) and Internet Gateway (IGW) living in the Networking account
  • hybrid DNS

Hybrid DNS

DNS is a critical component of every network. I wanted to make sure that I could resolve my local/home DNS domain (sebolabs.home), hosted on my Synology NAS, from AWS accounts across my Organization, and at the same time be able to resolve Route53 (R53) Private Hosted Zones' records configured in those accounts.
The challenge here was the decision itself: how to set this up in the least complicated way while keeping the associated costs in mind, as R53 Resolver endpoints are not the cheapest services out there. They cannot be avoided, though, because the VPC-native R53 resolver (the .2 address) is not reachable from outside AWS. The concept of using inbound/outbound resolver endpoints is pretty straightforward per se; however, things get more complicated in a multi-account set-up.

One other objective I set was to provide flexibility and autonomy to manage the R53 Private Hosted Zones (PHZ) within individual accounts, but under one condition: those hosted zones must overlap with the root hosted zone living in the Networking account along with the resolvers, i.e. they must represent its subdomains:

  • Networking AWS account root R53 PHZ: sebolabs.aws
  • Sandbox AWS account R53 PHZ: sandbox.sebolabs.aws
  • another AWS account R53 PHZ: any.sebolabs.aws

Apart from the overlapping domain namespaces, one other requirement is that all the R53 PHZs across the Organization accounts that want to benefit from the hybrid DNS must be associated with the VPC that the root PHZ is associated with and where the R53 resolver endpoints live. At the same time, the R53 Resolver forwarding rules (which point at the outbound endpoint; the endpoints themselves cannot be shared) must be shared through RAM and associated with all other VPCs. The alternative of centralising multiple hosted zones in a shared AWS account didn't feel appealing to me, and so this was the idea I went with.
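The subdomain condition above is simple to state precisely. A minimal sketch, using the zone names from the examples (the helper is illustrative only, not an AWS API):

```python
# Sketch: validating that each account-managed PHZ is a subdomain of the
# root PHZ hosted in the Networking account.

ROOT_PHZ = "sebolabs.aws"

def is_subdomain(zone, root):
    """True if `zone` equals `root` or is nested underneath it."""
    zone, root = zone.rstrip(".").lower(), root.rstrip(".").lower()
    return zone == root or zone.endswith("." + root)

for zone in ["sandbox.sebolabs.aws", "any.sebolabs.aws"]:
    print(zone, "->", is_subdomain(zone, ROOT_PHZ))   # both True

print(is_subdomain("sebolabs.home", ROOT_PHZ))        # False: not a subdomain
```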

The solution

Just to make it crystal clear: the solution presented below is a sort of MVP that in a real-world scenario would have to be expanded, at least to introduce sufficient resiliency and performance. I will touch slightly upon that matter later in this section. Bear with me…

High-level design

For my PoC, I set everything up just as explained above, leveraging my AWS Organization's Networking account and a Sandbox one.
My on-prem network, on the other hand, is represented by a single Raspberry Pi 4 running Ubuntu 20.04 and StrongSwan 1.9.4, as well as a Synology DS218+.

To keep the diagram below as clean as possible and highlight the concept, the traffic lines were drawn flowing through the TGW, while the associated NICs are shown just to indicate that they physically exist, as in fact all that traffic is handled by them. For the same reason, no availability zones are visualised even though the entire set-up is multi-AZ, and local routes in the route tables were omitted.

[Diagram: high-level design]

Transit Gateway

As you can see, the Transit Gateway routing has been simplified, as the use case above is not complex. Normally, there would be multiple TGW route tables assigned to attachments, depending on the individual connectivity requirements of a particular VPC.

Centralised egress to the Internet is, apart from a way of reducing the costs of running NAT and Internet gateways in every VPC that requires them, an opportunity to introduce security appliances ("bump-in-the-wire") combined with AWS Gateway Load Balancer for traffic inspection, or to make use of AWS Network Firewall.
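TGW route evaluation itself follows the most-specific-route principle: the longest matching prefix wins. A minimal sketch of that behaviour, with purely illustrative attachment names and CIDRs (not the actual environment):

```python
# Sketch of TGW route evaluation: longest-prefix match picks the target
# attachment. Routes and attachment names are illustrative only.
import ipaddress

TGW_ROUTES = {
    "10.0.0.0/16":    "sandbox-vpc-attachment",
    "10.1.0.0/16":    "networking-vpc-attachment",
    "192.168.1.0/24": "s2s-vpn-attachment",          # on-prem network
    "0.0.0.0/0":      "networking-vpc-attachment",   # centralised egress
}

def tgw_lookup(dst):
    """Return the attachment for `dst` using longest-prefix match."""
    ip = ipaddress.ip_address(dst)
    best = None
    for cidr, target in TGW_ROUTES.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None

print(tgw_lookup("192.168.1.50"))  # -> s2s-vpn-attachment
print(tgw_lookup("8.8.8.8"))       # -> networking-vpc-attachment (egress)
```

Note how Internet-bound traffic falls through to the default route pointing at the egress VPC's attachment, which is exactly what makes the centralised NAT/IGW model work.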

[Diagram: Transit Gateway routing]

DNS resolution

The support for overlapping domain names, which is the core concept of the proposed set-up, was introduced in late 2019 and made it easy to distribute permissions for managing private hosted zones across the organisation.
At the same time, it allows the R53 resolver to route traffic based on the most specific match. If no hosted zone exactly matches the domain name in the request, the R53 resolver checks for a hosted zone whose name is the parent of the requested domain name.
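The most-specific-match behaviour can be sketched as a longest-suffix lookup over the zone names. Zone names below are the examples used earlier in this post; the function is an illustration of the matching logic, not an AWS API:

```python
# Sketch of most-specific-match resolution over overlapping PHZs: the
# resolver picks the hosted zone whose name is the longest suffix of
# the queried record name.

ZONES = {"sebolabs.aws", "sandbox.sebolabs.aws", "any.sebolabs.aws"}

def matching_zone(qname):
    """Return the most specific hosted zone for `qname`, or None."""
    labels = qname.rstrip(".").split(".")
    # Walk from the full name towards the TLD; the first hit is the
    # most specific zone.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ZONES:
            return candidate
    return None

print(matching_zone("db.sandbox.sebolabs.aws"))  # -> sandbox.sebolabs.aws
print(matching_zone("vpn.sebolabs.aws"))         # -> sebolabs.aws
```

So a query for a record under sandbox.sebolabs.aws is answered from the Sandbox account's PHZ, while anything else under sebolabs.aws falls back to the root PHZ in the Networking account.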

As PHZs are global constructs, not regional ones, they are also a perfect means to support DR scenarios in a multi-region solution. The same goes for R53 inbound/outbound resolver endpoints: another pair can be configured in a second region to fail over to in case of a primary-region failure.

Caveats

The above is obviously just a foundation. Things get complicated when you start considering how your workloads will run across your Organisation's managed accounts and how the services they host will be exposed.
Now, when there's centralised egress, what about exposing your services to the Internet? Wait, wasn't one of the main ideas behind centralising egress to disallow the creation of Internet Gateways in managed accounts through SCPs? In such a case, you probably either centralise your ingress as well, or perhaps also disallow the creation of NAT Gateways and the association of public IP addresses.
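For illustration, such an SCP could be as simple as the minimal sketch below; a real guardrail would typically be attached to an OU that excludes the Networking account:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyIgwInManagedAccounts",
      "Effect": "Deny",
      "Action": [
        "ec2:CreateInternetGateway",
        "ec2:AttachInternetGateway"
      ],
      "Resource": "*"
    }
  ]
}
```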
Here I just wanted to emphasise how one decision can drive another and eventually shape the target solution. In the end, every organisation wants to end up with patterns and procedures for doing things, doesn't it?

Final thoughts

Going back to my initial statement, there are multiple ways such a hybrid network can be designed and implemented depending on requirements. Each individual functionality must be carefully thought through.
Related aspects that I came across when working for companies in their cloud enablement phase were, among other things, considerations around:

  • centralised VPC with subnets sharing with RAM along with centrally managed R53 PHZs per share
  • single, centralised ingress ALB/NLB with multiple rules passing traffic to internal ALBs/NLBs, with a firewall in between

Not all those ideas turned out to be good choices; therefore, thinking globally and making your target solution as flexible as possible is the way to go. That, of course, requires a lot of experience within the team.

Now that AWS offers an enormous range of services and options to deliver very complex solutions, and organisations are deciding to migrate to the cloud, we're back to centralising things, just in a different place. The reason is that with hundreds of AWS accounts running a large number of workloads, organisations want to retain some control and elevate the level of security, which is probably the most important factor for them when deciding to migrate to the cloud. The times when individual projects were treated independently seem to be over. We're back doing the same work around networking, but in someone else's data centre :)

As all these things are not always easy to comprehend, especially as AWS services evolve, I strongly suggest following the AWS Networking & Content Delivery Blog, where you can find many useful clues and solutions, or at least get your head around what's going on in that world.
