Cloud instances are useful when you need compute power but don't want to buy and maintain physical machines, or simply don't need them long enough to justify buying hardware. The idea is to rent a remote machine for as long as you need it and pay exactly for what you used.
The concept is simple, but it can be very hard to implement: cloud providers support hundreds, if not thousands, of combinations and use cases. Our focus in this article is the most basic use case of all: you rent a VM and pay for how much time and space it uses.
This article focuses on Google Cloud Platform quirks; my specific project is a NixOS VPS for cheap GPU access in the cloud.
The strategy
As of writing (6/7/2022), some resources are free forever:
- 30GB of total balanced storage
- An e2-micro instance (0.5 vCPU, burstable to 2 vCPUs) for the whole month
- 1GB of egress network in almost all continents
- Free egress network for Google services
The idea is to use Terraform to migrate one instance to TURBO MODE for the beefy stuff, and use the same system on an e2-micro for configuration, preparation, uploads, maintenance, and so on.
The turbo instance can be a spot instance for extra savings. Spot instances are cheaper because they can be interrupted at any time.
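This tiering strategy can be sketched in Terraform roughly like this. It is a hypothetical sketch, not the repo's actual file: only the `modo_turbo` variable name comes from the post's deploy command, while the machine types, zone, image path, and GPU type are assumptions.

```hcl
# Toggle one instance between a cheap always-on tier and a beefy
# spot "turbo" tier with a single variable.
variable "modo_turbo" {
  type    = bool
  default = false
}

resource "google_compute_instance" "vps" {
  name         = "vps"
  zone         = "us-central1-a"                              # assumed zone
  machine_type = var.modo_turbo ? "n1-standard-4" : "e2-micro"

  boot_disk {
    initialize_params {
      image = "projects/my-project/global/images/nixos"       # assumed image
      size  = 30                                              # free-tier storage
    }
  }

  # Attach a GPU only in turbo mode.
  guest_accelerator {
    type  = "nvidia-tesla-k80"
    count = var.modo_turbo ? 1 : 0
  }

  scheduling {
    # Run the turbo tier as a spot instance for the lower price.
    provisioning_model  = var.modo_turbo ? "SPOT" : "STANDARD"
    preemptible         = var.modo_turbo
    automatic_restart   = !var.modo_turbo
    on_host_maintenance = "TERMINATE"
  }

  network_interface {
    network = "default"
    access_config {}
  }
}
```

Changing `modo_turbo` makes Terraform recreate the instance with the new machine type, while the boot disk and NixOS configuration stay the same.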
My OS of choice is NixOS because its configuration is code-based and, in my opinion, it is more convenient to set up and replicate. I can replicate the same setup as many times as I want, with specific customizations for one particular deployment if needed.
The code
The code I used in this experiment is stored in my dotfiles repo.
The Terraform file I am using for this experiment is stored in infra/gcp.tf. The Terraform state is stored in Terraform Cloud.
The NixOS machine that I run in GCP is defined in nodes/vps.
There is some CUDA-specific stuff in the file nvidia.nix. The Tesla K80 doesn't support the latest driver, so I had to use the legacy 470 one. I tested it with some Blender renders: CUDA works fine and does what I need.
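A minimal sketch of what such an nvidia.nix module could look like, assuming a standard NixOS setup; this is illustrative, not the repo's actual file:

```nix
{ config, ... }:
{
  # The NVIDIA driver is unfree software.
  nixpkgs.config.allowUnfree = true;

  services.xserver.videoDrivers = [ "nvidia" ];

  # The Tesla K80 is only supported by the legacy 470 driver branch.
  hardware.nvidia.package =
    config.boot.kernelPackages.nvidiaPackages.legacy_470;

  # Without this, CUDA applications cannot find the driver libraries.
  hardware.opengl.enable = true;
}
```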
How to deploy
To deploy the instance in normal mode, I run the following command in the infra folder:

```
terraform apply
```
And if I want to migrate it to the turbo instance, I run the following command in the infra folder:

```
terraform apply -var 'modo_turbo=true'
```
Final thoughts
It was a pretty challenging project, and I thought I would spend much more money on it. I did this tiering stuff in two days. It took me a long time to figure out how to configure CUDA, until I realized that hardware.opengl.enable must be enabled for software to find the CUDA driver 🤡.
Future work
I need to find a way to automatically turn off the turbo machine when the job is over. I might do some wizardry with systemd units. I will probably not update this post. The code that will change for that is in the VPS folder or in some common module that I could share across all my machines.
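One possible shape for that systemd wizardry, as a hypothetical NixOS sketch: a oneshot unit that runs the job and powers the machine off when it stops. The unit name, the Blender invocation, and the scene path are all placeholders.

```nix
{ pkgs, ... }:
{
  systemd.services.render-and-poweroff = {
    description = "Run the render job, then shut the instance down";
    serviceConfig.Type = "oneshot";
    script = ''
      ${pkgs.blender}/bin/blender --background /data/scene.blend --render-anim
    '';
    # Runs after the service stops, whether the render succeeded or failed.
    postStop = "${pkgs.systemd}/bin/systemctl poweroff";
    # Started manually on the turbo instance:
    #   systemctl start render-and-poweroff
  };
}
```

Since the turbo instance is billed per second, powering off right after the job ends keeps the cost close to the actual render time.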