Chabane R. for Stack Labs

Posted on Nov 12, 2021

Mayday, mayday! I need a scalable infrastructure to migrate on Scaleway Elements! Part 1 - Networking & Security

#scaleway #architecture #security #networking

That's it! We are going to migrate our on-premise applications to Scaleway Elements!

We start by deploying a first web application to please the business team, set up auto scaling and backups to satisfy the ops team, and encrypt the storage and the database with a key and add a firewall to reassure the security team, all at a lower price.

Everything has been thought of, the migration of the first application is a real success, and the business agrees to migrate other applications... But wait... no long-term strategy has been established to facilitate the migration of new applications 😱

Moving to the Public Cloud represents a major change within a company. A phase of adoption and acculturation is necessary for a successful transition.

A lack of a long-term vision can be costly during migrations to the Cloud and in particular to Scaleway Elements (SCW).

Organization of projects
Network topology
Centralization of security and monitoring
The DevOps platform
A plan for every migration strategy

All of these must be defined in advance, during the onboarding phase!

Did you also find it difficult to go further in your migration? Or are you thinking of migrating to Scaleway Elements?

Before discovering how to plan a migration in Scaleway Elements, let's see the common traps.

Traps to avoid

POC to production

Some companies start by developing a proof of concept (POC), then put that POC into production.

A POC is disposable: it should be dropped once the concept has been proved.

So if, during the POC, you created an SCW organization, used a CI/CD tool, and deployed the resources using infrastructure as code, it's not a POC but an MVP (Minimum Viable Product), which is a different thing!

I have seen many companies deploy their POC to production and then ask a partner to perform a complete refactoring: redesigning the network topology, reorganizing the cloud projects, and implementing CI/CD and infrastructure as code. In the end, it cost more than they expected to save.

On Premise mindset

Some companies moving to the cloud continue to follow their on-premises practices.

Only the "business" workloads should be moved to the cloud as they are; this is called "lift & shift". All existing applications for security, networking, monitoring, etc. should be replaced either by the cloud provider's equivalent service or by the cloud version of those applications.

Here too, I have seen many companies that wanted to move IT software to the cloud even though it was not able to interact with cloud services.

Avoiding vendor lock-in at an extreme level

Some companies moving to the cloud do not use managed services even when it is possible.

Avoiding vendor lock-in makes sense when you are looking for portability of your business application and you don't want to depend on cloud SDKs. But non-business components such as databases and storage should be hosted in managed services.

There are many other "don'ts" that could be listed in a dedicated post.

So before any move to the cloud, strong sponsorship is needed to adopt a cloud mindset in the organization.

Choose a scalable strategy

When a customer asks to migrate to the cloud, I always recommend that the decision makers think about the long-term vision and the value they expect from the selected cloud provider:

Why is the current approach and computing environment insufficient?
What are the primary metrics that you want to optimize by using the public cloud?
How long do you plan to use this cloud provider? Do you consider this solution permanent?

When the vision is defined, the strategy to adopt is clearer:

Identifying candidate workloads
Identifying applicable patterns
Identifying candidate topologies
Prioritizing workloads
Selecting the initial workload to put in the public cloud
Setting up the Scaleway Elements organization, projects, and policies
Implementing the network topology
Defining the disaster recovery plan
Setting up the DevOps platform
Starting the workload migration

Organization

When you have created the Scaleway Elements organization for your company, the first step is to define the project roles. What I recommend to customers is to separate the business workloads from the operational workloads.

These roles help:

to apply granular permissions
to facilitate infrastructure as code automation
to centralize IT operations
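
As a minimal sketch of this separation, the project roles can be written down as data before any resource is created. The Security and DevOps projects come from this article; the environment names (development, staging, production) and the `Role` and `Project` types are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    BUSINESS = "business"        # hosts application workloads
    OPERATIONAL = "operational"  # hosts shared IT tooling


@dataclass(frozen=True)
class Project:
    name: str
    role: Role


# Environment project names are hypothetical, chosen for illustration only.
PROJECTS = [
    Project("security", Role.OPERATIONAL),
    Project("devops", Role.OPERATIONAL),
    Project("development", Role.BUSINESS),
    Project("staging", Role.BUSINESS),
    Project("production", Role.BUSINESS),
]


def projects_with_role(role):
    """Return the names of all projects that carry the given role."""
    return [project.name for project in PROJECTS if project.role is role]
```

A list like this can then drive infrastructure as code, for example by looping over `projects_with_role(Role.BUSINESS)` to generate one environment per business project.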

Network

Many types of network topology exist to ensure communication between network nodes.

As we have connections to establish between different projects, we need to ensure secure connectivity between networks.

1 - Create a private network for each project

2 - Create a public gateway with a reserved IP for each project and attach it to the private network.

3 - For each Kubernetes Kapsule cluster of each environment project, create an HTTPS Load Balancer (LB) and whitelist the Public Gateway IPs of the Security and DevOps projects using LB ACLs.

At the time of writing this article, a Kubernetes Kapsule cluster cannot be attached to the private network and the public gateway. It's coming soon.
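
The allowlist from step 3 can be sketched with the Python standard library alone. This is only a model of the rule, not a call to the Scaleway API: the IP addresses are placeholders from the documentation range 203.0.113.0/24, and the function names are assumptions.

```python
import ipaddress

# Reserved Public Gateway IP of each project (placeholder addresses).
GATEWAY_IPS = {
    "security": "203.0.113.10",
    "devops": "203.0.113.20",
    "production": "203.0.113.30",
}

# Only the operational projects may reach the cluster load balancers.
ALLOWED_PROJECTS = ("security", "devops")


def lb_allowlist(gateway_ips, allowed_projects):
    """Return the source networks an LB ACL should allow, as /32 CIDR strings."""
    return sorted(
        str(ipaddress.ip_network(gateway_ips[name]))
        for name in allowed_projects
    )


def is_allowed(source_ip, allowlist):
    """Check whether a client IP matches one of the allowed networks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowlist)
```

Keeping the gateway IPs reserved matters here: if an IP changed on every gateway re-creation, every LB ACL that references it would have to be updated.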

SecOps

Identity and Access Management

When you have an LDAP directory in your organization, it is common to connect it to your internal tools. There are many solutions to help you set up identity and access management; one of the most popular tools is Keycloak.

Keycloak supports multiple protocols:

OpenID Connect
OAuth 2.0
SAML 2.0

It also provides Single Sign-On and Single Sign-Out, an Admin Console, etc.

The tool could be hosted in a dedicated project and used to centralize authentication for each application in your organization. The following diagram illustrates IAM management:

Key management

There are three very sensitive resources in SCW, as in any cloud provider, that we need to protect at all costs:

API Keys
Cryptographic keys
Secrets

Virtual instance images could also be critical for some organizations.

API Keys

There are some important best practices to secure these sensitive credentials:

Create single-purpose API keys
Don’t embed API keys in code
For easier visibility and auditing, store API keys centrally in a solution like Vault, in a dedicated project.
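
To illustrate "don't embed API keys in code", here is a minimal sketch that reads the key pair from the environment at runtime. `SCW_ACCESS_KEY` and `SCW_SECRET_KEY` are the variable names read by the Scaleway CLI; the helper functions and the error class are assumptions. In a CI job, the variables would typically be injected from the central store rather than typed by hand.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def load_api_key(env=None):
    """Read the Scaleway API key pair from the environment, not from source code."""
    env = os.environ if env is None else env
    try:
        return {
            "access_key": env["SCW_ACCESS_KEY"],
            "secret_key": env["SCW_SECRET_KEY"],
        }
    except KeyError as missing:
        raise MissingCredentialError(f"{missing.args[0]} is not set") from None


def redact(secret, visible=4):
    """Mask a secret for logs, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Failing fast with an explicit error keeps a misconfigured pipeline from silently running with no credentials, and `redact` avoids printing a full key when a job logs its configuration.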

I wrote a complete tutorial on Securing access to Scaleway Elements API Keys from Gitlab CI

Cryptographic keys

If you need to encrypt your data using a cryptographic key, there are some important best practices to apply:

Hosting encryption keys in a separate project,
For critical projects, some enterprises save the keys in a separate organization,
Least privilege and separation of duties.

Secrets

In any IT project, you will need two types of secrets:

Secrets for running your business applications like DB credentials.
Secrets to access shared applications like Elasticsearch, ArgoCD, Git repositories, etc.

In Scaleway Elements, you can host shared secrets in the Security project and keep secrets used by business applications in a Kubernetes secret.

There are some important best practices to manage secrets:

Use the replication feature when creating secrets in Vault.
Reference secrets by their version number rather than using the latest alias.
Disable secret versions before destroying them or deleting secrets.
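
The last two practices can be sketched with a tiny in-memory store. This is not the Vault API: the `SecretStore` class and its methods are hypothetical and only model the rules, namely that readers pin an explicit version number and that a version must be disabled before it can be destroyed.

```python
class SecretStore:
    """In-memory stand-in for a versioned secret store (illustration only)."""

    def __init__(self):
        self._versions = {}  # name -> {version: [value, enabled]}
        self._next = {}      # name -> next version number to assign

    def add_version(self, name, value):
        version = self._next.get(name, 1)
        self._next[name] = version + 1
        self._versions.setdefault(name, {})[version] = [value, True]
        return version

    def disable(self, name, version):
        self._versions[name][version][1] = False

    def destroy(self, name, version):
        # Destroying is irreversible, so require the version to be disabled first.
        if self._versions[name][version][1]:
            raise ValueError("disable the version before destroying it")
        del self._versions[name][version]

    def read(self, name, version):
        # Callers must pin an explicit integer version; "latest" is rejected.
        if not isinstance(version, int):
            raise ValueError("reference secrets by version number")
        value, enabled = self._versions[name][version]
        if not enabled:
            raise PermissionError("secret version is disabled")
        return value
```

Disabling first gives a safe observation window: if an application still depends on that version, it fails with a recoverable error and the version can be re-enabled, whereas a destroyed version is gone.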

The following diagram illustrates the key management:

Disaster recovery

Disaster recovery (DR) involves a set of policies, tools and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.

In Scaleway Elements, there are managed services to help you implement a disaster recovery plan, such as backup services for your databases and virtual instances. But in the event of a zone/region failure or a human error that deletes a project, you will need to react very quickly to restore your data and applications.

Let's take the example of a Kubernetes cluster and an RDB database that we want to back up and restore in the event of an outage:

Backup

Velero is an open source disaster recovery tool used to back up an entire Kubernetes cluster and save the backup in a bucket. A Scaleway Serverless function could be used to download and upload backups periodically to a bucket:
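
As a rough sketch, a scheduled job could assemble the Velero command for each backup as follows. `velero backup create`, `--include-namespaces` and `--ttl` are Velero CLI options; the naming scheme, the default retention and the function name are assumptions. The command is built as an argument list and not executed here.

```python
from datetime import datetime, timezone


def velero_backup_command(cluster, namespaces, ttl="720h", now=None):
    """Build a `velero backup create` command with a timestamped backup name."""
    now = now or datetime.now(timezone.utc)
    # Backup names must be valid Kubernetes object names, so keep them lowercase.
    name = f"{cluster}-{now:%Y%m%d-%H%M%S}"
    return [
        "velero", "backup", "create", name,
        "--include-namespaces", ",".join(namespaces),
        "--ttl", ttl,  # how long Velero keeps the backup before expiring it
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the Velero CLI is installed and configured for the cluster. A timestamped name makes it easy to pick a specific backup at restore time.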

Restore after a zone failure

In the event of a zone failure, a CI/CD pipeline could be run to restore the Kubernetes cluster with Velero and restore the databases using the Scaleway CLI.

In the case of a re-creation of the RDB instance, you will also need to update the host and port information.

Restore after a region failure

In the event of a region failure or a project being deleted, CI/CD pipelines could be run to recreate the Kubernetes cluster and the RDB instances with Terraform, restore the Kubernetes cluster with Velero and restore the database using the Scaleway CLI.
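
The difference between the two restore scenarios is mostly a matter of ordering, which the following sketch makes explicit. `terraform apply -auto-approve` and `velero restore create --from-backup` are real CLI invocations; `./restore-database.sh` is a hypothetical script standing in for the Scaleway CLI database restore, and the function name is an assumption.

```python
def restore_plan(failure, backup_name):
    """Return the ordered commands a pipeline would run for a given failure type."""
    if failure not in ("zone", "region"):
        raise ValueError(f"unknown failure type: {failure}")
    steps = []
    if failure == "region":
        # Region failure or deleted project: the cluster and the RDB instances
        # no longer exist, so the infrastructure is recreated first.
        steps.append(["terraform", "apply", "-auto-approve"])
    # Restore the Kubernetes resources from the Velero backup.
    steps.append(["velero", "restore", "create", "--from-backup", backup_name])
    # Hypothetical wrapper script around the Scaleway CLI database restore.
    steps.append(["./restore-database.sh"])
    return steps
```

For example, `restore_plan("zone", "prod-20211112-083000")` yields two steps, while the `"region"` plan starts with the Terraform run. Keeping the plan as data makes it easy to review and to rehearse before a real outage.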

Conclusion

In this part we saw how easy it is to build a scalable SCW organization, implement a network topology, define a disaster recovery plan, and centralize IAM and key management.

What's next?

In the second part we will see how to deploy a DevOps platform using GitLab and following GitOps practices. We will also see how to centralize monitoring and logging using open source tools. We will finish with an example of migrating Docker applications to Kubernetes Kapsule, along with some cost-saving tips.

DEV Community