At G Adventures, we deploy 20+ times a day to many different services that handle millions of requests and millions of dollars in transactions, and we do it by keeping things simple.
One of my guiding principles for developing and deploying software has always been to keep it as simple as possible for as long as possible. Premature optimization wastes valuable developer time on problems that may never materialize. One of the areas where this is apparent at G Adventures is how we deploy code. Today I’m going to talk about the common ways we deploy code, the pros and cons of each, and what the future may look like for us.
At G, we pride ourselves on using the right tool for the job. While we’re primarily a Python shop, using Django for our websites, we also have a few services written in Go and have started building out our front ends using React. Each environment comes with different challenges and best practices when it comes to deployment, which I’ll describe below.
We use GitHub to host our code, Jenkins and Travis CI to automate our tests, and right now most of our applications and code live in VMs in a Colo just outside of Toronto. We’re in the process of transitioning to AWS but I’ll leave that for another post.
Other common technologies we use are Postgres (primary DBs), Redis (cache and key/value store), RabbitMQ (messaging broker) and Celery (task queue). In future blog posts, we’ll break down the individual teams and stacks.
Historically, each team developed its own processes around code reviews, migrations and deployments. As we’ve grown, we’ve made an effort to standardize our processes to make it easier for teams to share developers and resources. That history is still apparent in how we deploy, with each team doing it slightly differently.
The Git Pull method is our simplest and oldest way of deploying our applications. It uses Fabric, a command-line tool for simplifying SSH connections, together with git to update the code on our servers and deploy a new version of our applications.
The above gist is pretty close to what we do in production for our Django applications. We do have some code that determines which environment to deploy to as well as posting the status of deploys to Slack.
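As a rough illustration of the git-pull flow, a minimal sketch might look like the following. It is not the production fabfile: it uses Python's standard library and the `ssh` command in place of Fabric, and the host names, the `/srv/app` path, the `venv` location and the `app` service name are all hypothetical.

```python
import shlex
import subprocess

# Hypothetical values, for illustration only.
HOSTS = ["app1.example.com", "app2.example.com"]
APP_DIR = "/srv/app"


def git_pull_steps(app_dir, branch="master"):
    """Return the shell commands one server runs for a git-pull deploy."""
    d = shlex.quote(app_dir)
    return [
        f"cd {d} && git pull origin {shlex.quote(branch)}",
        f"cd {d} && venv/bin/pip install -r requirements.txt",
        f"cd {d} && venv/bin/python manage.py migrate --noinput",
        f"cd {d} && venv/bin/python manage.py collectstatic --noinput",
        # Assumed service name; restart whatever serves the application.
        "sudo systemctl restart app",
    ]


def ssh_run(host, command):
    """Run one command on a host over ssh, raising if it fails."""
    subprocess.run(["ssh", host, command], check=True)


def deploy(hosts, steps, run=ssh_run):
    """Run every step on every host, one host after another."""
    for host in hosts:
        for step in steps:
            run(host, step)
```

Calling `deploy(HOSTS, git_pull_steps(APP_DIR))` would update the hosts in order. Because `check=True` raises on the first failing command, a failure midway stops the run with some hosts updated and others not.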
As you can see, this deploy strategy is extremely simple and has worked across multiple teams for many years. The simplicity allows us to deploy new applications quickly and confidently, and to get new employees up to speed and deploying their own changes within their first week.
There are some downsides to such a simple deploy process. Server, GitHub and PyPI connection issues can leave the application in an inconsistent state. In theory, running multiple deploys at the same time could cause migration issues. In practice, we rarely run into connection issues, and because we post the status of deploys to a Slack channel, all a developer has to do is check that other deploys have finished before starting their own.
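Posting deploy status to Slack can be sketched with an incoming webhook, which accepts a JSON body with a `text` field. This is only a sketch under assumptions: the message wording and the function names are hypothetical, and the webhook URL would be one you create in your own Slack workspace.

```python
import json
import urllib.request


def deploy_message(app, environment, status, user):
    """Build the JSON payload for a deploy status message (format is assumed)."""
    return {"text": f"[{environment}] {app}: deploy {status} by {user}"}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Sending one message with status `started` and another with `finished` or `failed` gives other developers what they need to see whether a deploy is still in flight.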
An approach we’ve taken more recently is to build out a new environment entirely on each of the machines and then update the symlinks Nginx uses to serve the content. This allows us to deploy and verify that the application is in a consistent state before switching the symlinks and restarting the application servers.
Below is an example of how we deploy one of our React applications.
This deploy is almost as simple as the Git Pull method, but adds the step of cloning into a new folder (e.g. app-2018010410224) and only switching the symlink once the new version has been deployed across all of our servers.
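The two phases, build everywhere first and switch symlinks only afterwards, can be sketched locally with the standard library. This is an illustration under assumptions rather than the actual deploy script: each "server" is modelled as a directory, the `releases/` and `current` names and the timestamp format are hypothetical, and `build` stands in for cloning the repo and building the application.

```python
import os
from datetime import datetime


def release_name(now, prefix="app"):
    """Timestamped folder name, e.g. app-20180104102204 (format is assumed)."""
    return f"{prefix}-{now:%Y%m%d%H%M%S}"


def switch_symlink(link_path, target_dir):
    """Point link_path at target_dir by renaming a temporary link over it."""
    tmp_link = f"{link_path}.tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_dir, tmp_link)
    os.replace(tmp_link, link_path)  # atomic rename on POSIX


def deploy_all(roots, build, now=None):
    """Build a new release under every root, then switch every symlink."""
    name = release_name(now or datetime.now())
    prepared = []
    # Phase 1: build everywhere. An exception here stops the deploy
    # before any symlink has been touched.
    for root in roots:
        release_dir = os.path.join(root, "releases", name)
        os.makedirs(release_dir)
        build(release_dir)
        prepared.append((os.path.join(root, "current"), release_dir))
    # Phase 2: every build succeeded, so switch all the symlinks.
    for link_path, release_dir in prepared:
        switch_symlink(link_path, release_dir)
    return name
```

Renaming a temporary link over the old one means there is never a moment when `current` is missing, so Nginx keeps serving the previous release until the switch happens.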
The main benefit of this strategy is that if any step fails (connecting to a server, cloning the repo, building the application) on any of the servers, the entire deploy halts and production isn’t updated. This has been extremely useful as we’ve started moving to AWS and noticed an increase in connection timeouts. Rolling back deploys is also easier with this strategy: you simply update the symlinks to point to a previous working deploy and restart Nginx.
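A rollback along these lines might be sketched as follows. It assumes release folders have timestamped names that sort chronologically, and that a `current` symlink points at the live one; both are assumptions for illustration, and restarting Nginx is left out.

```python
import os


def previous_release(releases_dir, current_link):
    """Return the path of the newest release older than the live one, or None."""
    current = os.path.basename(os.path.realpath(current_link))
    names = sorted(
        name for name in os.listdir(releases_dir)
        if os.path.isdir(os.path.join(releases_dir, name))
    )
    older = [name for name in names if name < current]
    return os.path.join(releases_dir, older[-1]) if older else None


def rollback(releases_dir, current_link):
    """Re-point the symlink at the previous release and return its path."""
    target = previous_release(releases_dir, current_link)
    if target is None:
        raise RuntimeError("no earlier release to roll back to")
    tmp_link = f"{current_link}.tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename on POSIX
    return target
```

Because the old release folder is still on disk with its dependencies installed, nothing has to be cloned or rebuilt to roll back.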
There are still a couple of downsides to this deploy strategy. Servers can still end up with different dependencies if you don’t lock down your pip or node requirements. You also need to manage how many past deployments you keep on each server, or you can end up using the entire disk. For a React application, you’re also still installing all of your node modules on each server, which not only takes a long time but also uses a lot of disk space. This can be fixed by building locally and shipping the built application to each server.
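Managing how many past deployments stay on disk can be as small as a pruning step run after each deploy. The sketch below assumes timestamped release folders that sort chronologically and a `current` symlink; the function name and the default of five releases are hypothetical.

```python
import os
import shutil


def prune_releases(releases_dir, current_link, keep=5):
    """Delete all but the newest `keep` releases, never the live one."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    live = os.path.realpath(current_link)
    names = sorted(
        name for name in os.listdir(releases_dir)
        if os.path.isdir(os.path.join(releases_dir, name))
    )
    removed = []
    for name in names[:-keep]:
        path = os.path.join(releases_dir, name)
        if os.path.realpath(path) == live:
            continue  # still being served, e.g. after a rollback
        shutil.rmtree(path)
        removed.append(name)
    return removed
```

Keeping a handful of releases preserves the ability to roll back while putting a ceiling on disk usage.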
As you can see from the above examples, we’ve been able to deploy tens of thousands of times using simple deployment strategies. As we start to push past 50 developers, we’re looking at ways to improve how we deploy code.
The biggest thing we’re looking at right now is continuous deployment after all of our tests are run. We already have Continuous Integration for all of our applications so the next logical step is to bundle up the environment, requirements and code and ship it off after the tests pass successfully. We’re also looking at containerizing all of our applications to help us manage deployment (we’ve just started looking at this).
How do you test and deploy your code? What strategies are you most comfortable with? We’d love to hear from you down in the comments.
Want to join our growing team at G Adventures and travel the world? Check out all our jobs and apply today.
Originally posted on tech.gadventures.com