Mark Zlamal

CockroachDB: Multi-Region ROSA using Cloud WAN

what is this?

This blog dives into a distributed CockroachDB solution hosted on ROSA (Red Hat OpenShift Service on AWS), a fully managed OpenShift platform.

A ROSA environment is provisioned in each region, and the environments are connected together using AWS Cloud WAN (a managed software-defined WAN, or SD-WAN).

To support multiple regions in a cloud ecosystem, the solution is tightly coupled across IaaS, PaaS, and SaaS.


IaaS, PaaS, and SaaS

There is a natural harmony and integration across these layers. This blog will highlight the capabilities, the value-add, and the required work-effort when building enterprise-ready production environments.

  • AWS Cloud (IaaS): AWS is the cloud vendor that hosts our global environments, facilitating all the infrastructure and platform services.
  • ROSA (PaaS): This is the Red Hat managed OpenShift platform that sits on the AWS ecosystem.
  • CockroachDB (SaaS): This is the distributed database that's deployed across multiple OpenShift (ROSA) clusters on AWS.

background: CockroachDB deployment options

Cockroach Labs offers CockroachDB as a self-hosted solution and as a dedicated, as-a-service offering (including serverless), each with specific benefits tied to particular use cases and requirements.

self hosted

This relies on the expertise of customers to stand up and operate the entire ecosystem: specialists making decisions about infrastructure, and specialists who provision and connect it all together. While you retain complete control across all layers, it’s no surprise that this approach is highly involved and complex, from design to creation to maintenance of the entire stack.

dedicated and as-a-service offerings

This turnkey option lets customers focus on their data and apps, offering many advantages that promote fast-to-market strategies, along with an evolving set of capabilities including hybrid connectivity.


where does ROSA fit in the CockroachDB landscape?

ROSA is the bridge between self-hosted and dedicated offerings.

ROSA is a balanced sweet-spot that offers the flexibility of self-hosted while automating everything else. It serves as the best of both worlds by abstracting away the annoyances of infrastructure decisions and countless choices of service types. This is all done through a prescriptive creation process using internal Terraform scripts that are highly optimized for the cloud of choice (AWS in this case). The result is a rapidly provisioned, ready-to-use, globally available Kubernetes platform.

This middle ground simplifies day-2 operations so you remain focused on the databases, applications, and business logic, while still inheriting full access to the underlying AWS resources to explore the environment.

Like any containerization environment, ROSA lets you push applications, services, and software such as CockroachDB onto the platform. The key advantage that specifically benefits CockroachDB is the ease of scaling the system. By scaling I mean true resizing of the cloud environment, ranging from the physical or virtual hardware all the way to the workloads themselves. This is possible because ROSA, being a cloud-native offering, is tightly integrated with AWS. You can start small and operate a cost-effective database solution; should there be demand for more storage or performance, this end-to-end scaling can be accomplished in short order.
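
For reference, a ROSA cluster can also be provisioned and later resized from the rosa CLI. The sketch below is illustrative only: the cluster name, region, and replica counts are placeholders, and it assumes the CLI is installed and 'rosa login' has already run.

# Provision a multi-AZ ROSA cluster in one region (repeat per region);
# names and values are placeholders.
rosa create cluster \
  --cluster-name crdb-east \
  --region us-east-2 \
  --multi-az \
  --sts --mode auto --yes

# Later, scale the worker machine pool up or down (the default pool name may differ):
rosa edit machinepool worker \
  --cluster crdb-east \
  --replicas 6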


...to the AWS environment

When provisioning a ROSA Kubernetes cluster, something very exciting happens...

Detailed ROSA Architecture

Architecture: in less than 1 hour, ROSA becomes a complete, ready-to-use, scalable, highly available solution ready for containerized apps such as CockroachDB (see the Red Hat OpenShift provisioning console).

Through built-in Terraform services, over two dozen services and resources are provisioned within AWS. Everything is secured, pre-configured, and inter-connected: worker nodes, infrastructure nodes, and master nodes. These server nodes are all accessible through load balancers and network routing tables, gateways are established, IP addresses are defined, and access controls are mapped out through defined security groups. The best part? It’s all laid out in the AWS cloud console, providing full visibility and control over this entire ecosystem.

...so what does this mean for CockroachDB?

The good news is that we already provide extensive documentation and guides on Kubernetes and OpenShift deployment using Operators and Helm charts, including collections of YAML specs and fragments. This makes ROSA a sweet-spot for rapidly standing up a CockroachDB database in the cloud (a minimal Helm sketch follows the links below).
  • Deploy CockroachDB on Red Hat OpenShift
  • Orchestrate CockroachDB Across Multiple Kubernetes Clusters
  • YAML Fragments on GitHub
  • Scaling using operators, manual configs, helm charts
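
As a rough illustration of how quickly a single cluster can be stood up, here is a minimal sketch using the public CockroachDB Helm chart; the release name and namespace are placeholders, and a true multi-region deployment would follow the orchestration guide linked above instead.

# Minimal single-cluster sketch; release name and namespace are placeholders.
helm repo add cockroachdb https://charts.cockroachdb.com/
helm repo update

helm install crdb cockroachdb/cockroachdb \
  --namespace cockroachdb --create-namespace \
  --set statefulset.replicas=3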


...to the NETWORKING!

Networking challenges for multi-region deployments aren’t specific to ROSA; they’re a general challenge in cloud computing under any environment. I chose ROSA because it’s the path of least resistance: full-featured, highly modernized, easy to deploy, scalable, everything I’ve already mentioned. The clusters sit on VPCs, VPCs are hosted in single regions, and regions belong to the global AWS Cloud ecosystem; this is a common theme in all cases.

legacy approaches: Transit Gateways & VPC Peering

In the following diagrams you’ll see the use of transit gateways or VPC peering to establish connectivity, which is now considered a legacy approach. These examples highlight the growing challenges, complexity, and risk as a cluster extends across regions.

legacy: networking across 2 regions

2 cluster networking

This diagram represents the simplest solution where transit gateways or VPC peering is used to establish connectivity.


legacy: networking across 3 regions

3 cluster networking

When a 3rd region is added, each routing table must be updated to reflect the new IP range and destination. The networking challenges continue, since all regions need explicit tables defined to see the other regions. It’s a game of continuous maintenance of routing tables across all the clusters, whether via VPC peering or transit gateways.


legacy: networking across 4 regions

4 cluster networking

By adding a 4th region you quickly see the complexity growing (roughly quadratically, since each cluster must have explicit access to every other cluster). Every peering connection governs its own IP range; that range points to the transit gateway, which in turn connects to a network of other transit gateways distributed across regions. Everything must be perfectly defined, and it becomes a tedious, highly error-prone process: a single typo could take out the entire ecosystem. To make matters worse, adding the new cluster requires you to visit all operational/production/live clusters and update their routing tables with the new VPC connection. Even in the AWS portal, navigating across the resources to find every field becomes a complicated and unmanageable mess that I can’t fully depict here.


AWS Recommendation: Cloud WAN

I had a meeting with AWS staff, and they recommended the use of the Cloud WAN service instead of transit gateways or VPC peering. Transit gateways are still the fundamental building block for communication between VPCs (in fact, the underlying physical AWS WAN architecture continues to use transit gateways), but Cloud WAN adds intelligence and UI tools that make it very easy to use and economical as you expand.

Cloud WAN

The same solution as above, using the AWS software-defined WAN. Immediately you see that it’s a flat network connecting all the clusters over the global AWS backbone. This picture shows the ENTIRETY of the implementation and network configuration. You only need a single route that points to a core network, and the SD-WAN manages all the connectivity across VPCs, edges, regions, and partitions. That’s it. The AWS management console is a single pane-of-glass graphical UI providing visibility across the entire AWS network.

The best part is that when you add a new cluster, you don’t have to make any changes to the existing clusters or infrastructure. The only rule that applies to all implementations: each VPC must have a unique IP address range (CIDR block) for it to be propagated across the global network.

option 1: public Postgres endpoints

Here is an end product for a traditional 3-region CockroachDB cluster, allowing users to pick and choose, or be assigned to, the closest CockroachDB edge location with the lowest latency. All inter-node communication is hidden behind the SD-WAN, while direct access to the database ports is handled by regional load balancers.

CockroachDB with Public Load Balancers
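
As a per-cluster sketch of option 1: assuming the standard cockroachdb-public service from the Cockroach Labs configs is deployed in a namespace called cockroachdb (both names are assumptions here), each regional cluster can expose the SQL port through its own load balancer.

# Assumes the standard cockroachdb-public service exists in a "cockroachdb" namespace;
# run this against each regional ROSA cluster.
oc patch service cockroachdb-public \
  --namespace cockroachdb \
  --patch '{"spec": {"type": "LoadBalancer"}}'

# Retrieve the regional load-balancer hostname that clients will connect to:
oc get service cockroachdb-public --namespace cockroachdb \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'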

option 2: private Postgres endpoints, public-facing colocated apps

This option is for customers who do not want to expose the actual CockroachDB Postgres connection interfaces; instead, they publicize application endpoints that can be firewalled and protected. These visible connections point to APIs, UIs, and mobile-app services, while the data processors and apps all sit behind the firewall, on the same subnet as the multi-region CockroachDB instance. VPC architectures like this are often referred to as secure landing zones.

CockroachDB in a private environment

Secure Landing Zones

These secure landing zones are entire platforms that operate under the umbrella of AWS compliance controls, security, and encryption. You inherit all the governance rules for networking, user permissions, and app permissions to protect these workloads. Data-hungry apps such as analytics/OLAP or CDC workloads are prime candidates as colocated apps in these environments. As mentioned, these workloads are on the same subnet, so effectively all services are zero network hops away from CockroachDB. Data can be consumed, perhaps using follower reads or ranges pinned to the physical nodes in the subnet. You’re literally benefiting from the performance of a local area network with no ingress or egress charges, limitations, or packet loss due to unreliable networks. At the risk of being controversial, we can enhance CockroachDB performance by running the entire cluster in insecure mode because nothing is accessible outside the secure landing zone.


connect our VPCs together: Cloud WAN on AWS

The first step is to create a global network. This defines the environment that will manage all the connections, policies, routes, and data metrics.

In my project I created a "core network", selecting 2 regions that will be connected (us-east-2, us-west-2). You can always add and remove regions across the entire AWS ecosystem.
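
For completeness, the same steps can be sketched with the AWS CLI (I used the console; the description and policy file name below are placeholders):

# Create the global network container:
aws networkmanager create-global-network \
  --description "CockroachDB multi-region backbone"

# Create the core network against it, supplying the policy JSON (shown in the next section):
aws networkmanager create-core-network \
  --global-network-id <global-network-id> \
  --policy-document file://core-network-policy.json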

AWS network backbone: us-west-2 to us-east-2

my Cloud WAN policy

The next step is to establish a network policy for this new core network. The policy governs the segments, edge locations (regions), routes, and the conditions under which attachments are accepted. I left most of this as default; an example policy JSON that works for me is below:

{
  "version": "2021.12",
  "core-network-configuration": {
    "vpn-ecmp-support": true,
    "asn-ranges": [
      "64512-65534"
    ],
    "edge-locations": [
      {
        "location": "us-east-2"
      },
      {
        "location": "us-west-2"
      }
    ]
  },
  "segments": [
    {
      "name": "PrimarySegment",
      "edge-locations": [
        "us-east-2",
        "us-west-2"
      ],
      "require-attachment-acceptance": false
    }
  ],
  "attachment-policies": [
    {
      "rule-number": 100,
      "condition-logic": "and",
      "conditions": [
        {
          "type": "any"
        }
      ],
      "action": {
        "association-method": "constant",
        "segment": "PrimarySegment"
      }
    }
  ]
}

establish Cloud WAN connections

This final step is where the actual interfacing to the VPCs takes place (and you'll see that VPN, Transit Gateways, and other site-to-site options are provided).

Cloud WAN connections

Select the region where your VPC is provisioned (along with ROSA on that VPC), then pick the VPC and the private subnet that ROSA created; it's the private subnet that all the workers and servers are attached to.
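
The equivalent CLI call would look roughly like this; the core network ID, account ID, and VPC/subnet ARNs are all placeholders:

# Attach the ROSA VPC's private subnet to the core network:
aws networkmanager create-vpc-attachment \
  --core-network-id <core-network-id> \
  --vpc-arn arn:aws:ec2:us-east-2:<account-id>:vpc/<vpc-id> \
  --subnet-arns arn:aws:ec2:us-east-2:<account-id>:subnet/<private-subnet-id>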

back at the regional VPCs...

Find your ROSA VPC (region-specific) and go into the private subnet. This subnet has a link to the route table that's associated with it. Here is where you establish the network route between this subnet and the global network.

Subnet routing tables

All traffic in the 10.x.x.x range will enter the SD-WAN, except for the local traffic.
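
The same route, expressed as a hedged CLI sketch; the route table ID and core network ARN are placeholders:

# Send all 10.x.x.x traffic (other than local) to the Cloud WAN core network:
aws ec2 create-route \
  --route-table-id <private-subnet-route-table-id> \
  --destination-cidr-block 10.0.0.0/8 \
  --core-network-arn arn:aws:networkmanager::<account-id>:core-network/<core-network-id>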

This process needs to be done for each ROSA VPC in every region. Note that all ROSA VPCs must have unique CIDR ranges. In my demo I defined 3 ROSA clusters with local CIDR ranges:

  • 10.100.0.0 (cluster 1, above example screenshot)
  • 10.110.0.0 (cluster 2)
  • 10.120.0.0 (cluster 3)

back at the Cloud WAN portal...

You can now verify the routes across each edge location to validate that the network propagation is complete and active.

Routes with all the CIDR destinations.

The topology trees and maps can be explored for a graphical representation of your networks:

Network topology tree down to the VPCs

Network topology map showing the connections.

test connectivity between regions

The key to success is to ensure that every worker node in every ROSA cluster can talk to every other worker node.
I create a dummy pod in each cluster/region just to get access to a terminal window so I can run curl commands:

apiVersion: v1
kind: Pod
metadata:
  name: dummy-curl-pod
spec:
  containers:
    - name: dummy-curl-pod
      image: curlimages/curl
      command: [ "sh", "-c"]
      args:
      - while true; do
          echo -en '\n';
          printenv MY_NODE_NAME MY_HOST_IP MY_POD_NAME;
          sleep 20;
        done;
      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: MY_HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

Issue a curl command to a compute node in a different cluster

/ $ curl 10.120.3.5
curl: (7) Failed to connect to 10.120.3.5 port 80 after 0 ms: Connection refused

success! This error message tells us that while there are no services listening on port 80 on that node, the server is reachable!
This one-time test needs to be run at all edges/clusters to ensure that CockroachDB nodes can properly communicate (with at least a single node). This ensures database integrity and data replication across the entire platform.
Delete these pods after testing is done (the commands are sketched below).
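
For reference, the test loop looks roughly like this, assuming the pod spec above is saved as dummy-curl-pod.yaml and you are logged in to the target cluster with oc:

# Create the test pod in this cluster:
oc apply -f dummy-curl-pod.yaml

# Probe a worker-node IP that belongs to a different region; "Connection refused"
# means the node is reachable, while a timeout means the routing isn't in place yet:
oc exec dummy-curl-pod -- curl -sS --connect-timeout 5 http://10.120.3.5

# Clean up once connectivity is confirmed:
oc delete pod dummy-curl-pod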


from the perspective of CockroachDB...

This blog does not dive into the installation process, but it's a typical Kubernetes-focused methodology of creating deployments, services, routes, etc. (see the caveats below).

When CockroachDB is live, the admin console provides visibility to the entire cluster (all regions), along with latency-charts and maps of node-deployments to monitor the overall health of the database.

In this example, we have 3 ROSA clusters (each hosting 3 worker nodes with CockroachDB installed), totalling 9 worker nodes. One ROSA cluster is on the west coast (AWS Oregon), and the other two ROSA clusters are in Ohio.

Map of the ROSA-deployed CockroachDB service across 2 regions


Investigating the latency, there is approximately 52 ms between the regional edges, while ROSA clusters in the same data center see sub-millisecond latency.

Latency chart across all 9 worker nodes.


conclusion

I can't cover every detail of the architecture, design, and deployment, since any of these topics and sub-topics could be a discipline on its own, ideally delivered in an in-person working session with Q&A and conversation rather than a blog.
This solution is merely a proof of concept that allows me to leverage Red Hat and AWS resources to the maximum extent in a reliable and repeatable environment.


caveats

  • Kubernetes services, deployments, persistent volume claims, secrets, and load balancers need to be used across each ROSA cluster. These are found on GitHub
  • The cockroach start --join syntax must specify proper server (worker-node) IPs that are part of the ROSA clusters (see the sketch after this list).
  • CockroachDB certificates must be created for secure environments.
  • ROSA EC2 instances (e.g., worker nodes) are inaccessible by default, governed by Access-Control-List inbound and outbound rules. You will need to adjust them to allow traffic from the other CIDR ranges. For convenience, I've been known to allow 0.0.0.0/0, all ports, both ways in VPCs, since the IP ranges are virtual and inaccessible from the outside. I would love to know your thoughts and concerns on this.
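
To illustrate the join caveat, here is a hedged sketch of the relevant cockroach start flags for one node; the IP addresses are invented from the demo CIDR ranges, and a secured cluster would use --certs-dir instead of --insecure:

# IPs are made up from the demo CIDR ranges; one entry per region in --join is enough
# for discovery, and --locality should reflect the node's actual region.
cockroach start \
  --insecure \
  --advertise-addr=10.100.3.5 \
  --join=10.100.3.5,10.110.3.5,10.120.3.5 \
  --locality=region=us-east-2 \
  --http-addr=0.0.0.0:8080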

references

Red Hat OpenShift provisioning console
Migration to SD-WAN from TGW
Cloud Wan Product Overview
VPC Peering vs Transit Gateways
MULTUS CNI in OpenShift
MULTUS CNI GitHub Quickstart
Red Hat Advanced Cluster Management for Kubernetes
AWS Architecture on ROSA (MZRs)
7 Advantages Of OpenShift Over Kubernetes
For the tech-savvy and initiated (Cloud WAN pdf)
Cloud App Trends
Containerization Market
Containerization Trends
Growth in managed services: ROSA
Forrester: Red Hat partnerships for OpenShift
