Hyunho Richard Lee for Meadowrun

Originally published at Medium.

Running pytest in the Cloud for Fun and Profit

Introducing pytest-cloudist on Meadowrun

Nobody likes waiting, but it seems to be part of a developer’s life: we wait for builds, tests, code review, and deployments. This is especially annoying given that computers are more powerful than ever and the cloud promises access to infinite compute resources.


A python in the clouds. Generated by Stable Diffusion.

In this post, we introduce pytest-cloudist, a plugin for the Python testing library pytest. pytest-cloudist leverages Meadowrun to run pytest on any number of cloud machines with a minimal amount of fuss. The name is a riff on the venerable pytest-xdist, which is mainly used for running tests locally in parallel. pytest-xdist does support distributed runs using SSH, but pytest-cloudist develops this capability further by provisioning cloud compute on-demand and synchronizing our code and libraries across machines seamlessly.

We’ll introduce pytest-cloudist through a case study of running the pandas unit tests on AWS EC2 virtual machines.

Running the pandas tests locally

First, we’ll get the pandas tests running locally to establish a baseline. For all of the local runs, we’ll be using a laptop which has an Intel Core i7 with 12 vCPUs and 16 GiB RAM.

The pandas development documentation describes a few options for setting up a development environment. We went with a low-tech option: a virtualenv with libraries installed by pip. The requirements-dev.txt in the pandas repo is quite large, so we trimmed it down to the essentials, which means we’ll skip a few of the integration tests that involve other packages like geopandas.

We’ll also skip tests marked as slow, network, or db, following the logic in test_fast.sh, and we explicitly disabled a handful of flaky tests. This still leaves over 100,000 tests to run.
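
As an aside, one simple way to disable specific flaky tests is a hook in conftest.py that skips them by name. Here’s a minimal sketch; the test names are hypothetical placeholders, not the tests we actually disabled:

# conftest.py: skip known-flaky tests by name.
# The names below are hypothetical placeholders.
import pytest

FLAKY_TESTS = {
    "test_sometimes_times_out",
    "test_depends_on_wall_clock",
}

def pytest_collection_modifyitems(config, items):
    skip_marker = pytest.mark.skip(reason="known to be flaky")
    for item in items:
        if item.name in FLAKY_TESTS:
            item.add_marker(skip_marker)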

Running vanilla pytest

One last bit of configuration: we used a pytest hook to turn off printing a dot to the terminal for each completed test. This was adding a minute to the total runtime and wasn’t providing much value when running more than 100,000 tests. (Don’t get me started on how my laptop runs Fortnite at 60fps but printing dots to the terminal takes ages.)
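
A hook along these lines does the trick; this is a minimal sketch rather than our exact configuration. Returning an empty progress character from pytest_report_teststatus suppresses the per-test dot for passing tests:

# conftest.py: suppress the progress dot pytest prints for each passing test.
def pytest_report_teststatus(report, config):
    if report.when == "call" and report.passed:
        # The return value is (category, progress character, verbose word);
        # an empty progress character means nothing is printed per test.
        return report.outcome, "", "PASSED"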

On my laptop, running pytest on the pandas tests takes about 13.5 minutes:

> time pytest --skip-slow --skip-network --skip-db -m "not single_cpu" pandas

============= 145386 passed, 21573 skipped, 1230 xfailed, 1 xpassed, 22 warnings in 810.80s (0:13:30) ==============

real 13m34.698s
user 12m25.510s
sys 0m22.408s

Running in parallel with pytest-xdist

Pandas recommends running pytest-xdist with 4 workers in test_fast.sh, but on my laptop that actually results in a slowdown because the worker processes run out of memory. I did manage to get good results with 2 worker processes, though:

> time pytest --skip-slow --skip-network --skip-db -m "not single_cpu" -n 2 pandas

============= 145386 passed, 21573 skipped, 1230 xfailed, 1 xpassed, 22 warnings in 447.80s (0:07:27) ==============

real 7m32.799s
user 16m18.856s
sys 0m56.269s

Nice — almost a 2x speedup.

Amdahl’s Law

Why don’t we get the full 2x speedup, i.e. just under 7 minutes? Because of Amdahl’s law — not all the work is parallelizable.

First, pytest collects tests on a single thread, which takes over a minute for the pandas tests:

> time pytest --skip-slow --skip-network --skip-db -m "not single_cpu" --collect-only pandas

=== 168158/169338 tests collected (1180 deselected) in 72.39s (0:01:12) ===

real 1m19.560s
user 0m59.860s
sys 0m7.425s

There’s also time spent aggregating the results as tests complete, which is harder to isolate and measure.

Back of the envelope, we have 13.5 minutes of work total, of which 1 minute is test collection time, and about another minute is test aggregation and reporting time. That leaves 11.5 minutes of embarrassingly parallel work, i.e. running the tests, which lines up with the roughly 7.5 minute runtime (2 + 11.5 / 2) that we see in practice for two workers.
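
To make the diminishing returns concrete, here’s the same back-of-the-envelope model in a few lines of Python:

# Amdahl's law with the rough numbers above (all times in minutes).
serial = 2.0       # test collection plus aggregation/reporting
parallel = 11.5    # embarrassingly parallel test execution
for workers in (1, 2, 4, 8, 16):
    print(f"{workers:>2} workers: ~{serial + parallel / workers:.1f} minutes")
# -> 13.5, 7.8, 4.9, 3.4, 2.7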

I’d like to try more workers, but sadly, two is close to what my laptop can handle — I had trouble doing anything else while the test was running as it consumed almost all of my laptop’s memory.

The current state of the art, then, puts me between a rock and a hard place: I can run tests slowly and sequentially and use my laptop for something else, or I can run tests quickly in parallel but make my laptop unusable.

Running pandas’ tests in the cloud

With pytest-cloudist, there’s now a third option: I can run the tests in parallel on AWS spot instances. Pytest-cloudist is a fairly thin wrapper around Meadowrun, which does the heavy lifting of creating EC2 instances, starting workers and deploying the environment and local code. In theory this is a seamless experience (Meadowrun maintainer bias warning!).

Getting started with pytest-cloudist

Installing pytest-cloudist is like installing any other pytest plugin: install it using pip or poetry alongside pytest, and it’s automagically available. If you haven’t set up Meadowrun in your AWS account before, there’s an additional setup step, which we won’t repeat here.

By default, cloud distribution is not enabled; it only kicks in if you pass the --cloudist test or --cloudist file argument to pytest. The former distributes each individual test as a separate task, while the latter distributes each test file as a task. Since pandas has over 100,000 tests, making each test an individual task introduces far too much overhead, so we’ll use per-file distribution exclusively in this post. There are about 850 test files, so there’s still plenty of opportunity for parallelization.

Further pytest-cloudist options all start with --cd. There are options to control the number of workers and how much CPU and memory each worker needs. This information is passed straight to Meadowrun which takes care of creating machines of the right size.

There’s also an option --cd-tasks-per-worker-target for combining tests or files into bigger tasks to maximize performance. Workers invoke pytest once per task, but there is some overhead for each invocation of pytest. To reduce this overhead, we can ask cloudist to combine multiple files into tasks so each task runs multiple files in one go. For example, if there are 850 test files, a run with --cloudist file --cd-num-workers 4 --cd-tasks-per-worker-target 10 tries to create 40 tasks in total (4 workers × 10 tasks/worker), with each task consisting of roughly 21 test files (850 files // 40 tasks).
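
As a quick sanity check, the arithmetic is just:

# How --cloudist file combines test files into tasks.
num_files = 850
num_workers = 4
tasks_per_worker_target = 10

num_tasks = num_workers * tasks_per_worker_target  # 40
files_per_task = num_files // num_tasks            # roughly 21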

--cd-tasks-per-worker-target 1 minimizes the pytest invocation overhead, but also introduces a problem: a single slow task could take significantly longer than the other tasks, resulting in a longer overall runtime. We find empirically that a tasks per worker target between 5 and 20 works well.
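
A toy example with made-up task times shows why:

# Toy example with made-up numbers: 4 workers, one straggler task.
task_times = [2, 2, 2, 8]            # minutes per task

# With --cd-tasks-per-worker-target 1 there's one task per worker,
# so the slowest task sets the overall runtime:
print(max(task_times))               # 8 minutes

# With finer-grained tasks the same work can be rebalanced, and the
# ideal runtime approaches total work divided by the number of workers:
print(sum(task_times) / 4)           # 3.5 minutes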

Configuration for the pandas tests

The first issue we ran into when trying to run the pandas tests using pytest-cloudist is that pandas has Cython extensions, which Meadowrun currently doesn’t build automatically. To work around this, we compiled them locally and then used cloudist’s --cd-extra-files argument to sync the resulting .so and .pxd files to the remote machines. The --cd-extra-files argument is similar to pytest-xdist’s --rsyncdir.

The second issue is that some of the tests rely on data files. We can use the same --cd-extra-files argument to make sure these files are copied to the remote machines.

Here’s the full command line:

> time pytest \
    --skip-slow --skip-network --skip-db -m "not single_cpu" \
    --cloudist file \
    --cd-extra-files 'pandas/_libs/**/*.pxd' \
    --cd-extra-files 'pandas/_libs/**/*.so' \
    --cd-extra-files 'pandas/io/sas/*.so' \
    --cd-extra-files 'pandas/tests/**/data/**' \
    --cd-num-workers 2 \
    --cd-cpu-per-worker 2 \
    --cd-memory-per-worker 6 \
    --cd-tasks-per-worker-target 20 \
    pandas

Note that we didn’t have to explicitly specify which Python files to make available or what the environment should be. pytest-cloudist, via Meadowrun, figures that out by itself.

Also, we’ve given each worker two CPUs instead of the default of one, as this seems to benefit some of the tests.

We have lift off

When starting a run, pytest collects tests as normal, but then pytest-cloudist (or Meadowrun, really) kicks in to create the necessary EC2 instances and synchronize the current Python environment, code, and extra files:

Mirroring current pip environment
0/2 workers allocated to existing instances: 
The current run_map's id is 6610ce7a-08c6-4995-8a29-59e3c66dac68
Launched 1 new instance(s) (total $0.0411/hr) for the remaining 2 workers:
        ec2-18-219-20-201.us-east-2.compute.amazonaws.com: m5.xlarge (4.0 CPU, 16.0 GB), spot ($0.0411/hr, 2.5% eviction rate), will run 2 workers

First, Meadowrun detected that we’re running in a pip virtual environment. Meadowrun recreates virtual environments on the remote machine by building a container. This can take some time, but the resulting container is cached in ECR.

Then, local code and extra files are zipped and uploaded to S3. The zip file is not re-uploaded if its contents haven’t changed.
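
Conceptually this is just content-addressed caching. As an illustrative sketch only (not Meadowrun’s actual implementation), naming the uploaded object after a hash of its contents turns “has it changed?” into a key-existence check:

# Illustrative sketch only, not Meadowrun's actual implementation.
import hashlib

def upload_key(zip_bytes: bytes) -> str:
    # Name the S3 object after its content hash so that an unchanged
    # zip file maps to a key that already exists.
    return hashlib.sha256(zip_bytes).hexdigest() + ".zip"

def should_upload(zip_bytes: bytes, existing_keys: set) -> bool:
    return upload_key(zip_bytes) not in existing_keys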

As a final step before running the actual tests, EC2 virtual machines are created, or reused from previous jobs if any are available. Meadowrun keeps machines around for a limited amount of time after they go idle to save on startup times. Meadowrun also tries to pack more than one worker onto the same machine if that’s cost-effective, so the number of workers is likely greater than the number of machines. Here we’ve asked for 2 workers with 2 vCPUs and 6 GiB of memory each, which will all run on a single m5.xlarge machine. pytest-cloudist uses cheap spot instances by default.

With a lukewarm start (defined shortly) and two workers, running the tests takes about 11.5 minutes:

==== 145376 passed, 21594 skipped, 1180 deselected, 1225 xfailed, 1 xpassed, 63 warnings in 685.60s (0:11:25) ===

real 11m36.304s
user 1m59.359s
sys 0m6.353s

A lukewarm start means:

  • The container with the virtual environment has been built and cached in ECR, which saves about 2 minutes of container building.
  • The code has been uploaded to S3. For pandas, the zip file is about 230 MB, and the upload takes about 1 min 30 sec.
  • No EC2 instances have been created or warmed up yet. This means an EC2 instance needs to be created and booted, and the instance also needs to pull the cached Docker container image from ECR, which takes about 15 seconds.

A cold start takes about 3 minutes longer than the lukewarm start, although there can be a good amount of variation on creating and launching an EC2 instance.

A warm start means that suitable EC2 instances are already running and have the necessary Docker container available locally. Running the tests in this case takes almost 9 minutes.

==== 145376 passed, 21594 skipped, 1180 deselected, 1224 xfailed, 3 xpassed, 63 warnings in 522.77s (0:08:42) ====

real 8m50.467s
user 1m10.620s
sys 0m2.983s

So in the best case of a warm start, pytest-cloudist is about 1 min 20 sec slower than pytest-xdist when we use two workers.

We’re obsessed with Meadowrun performance, and we hope to close this gap over time. The lowest-hanging fruit is making code upload smarter. That said, there’s not much we can do about how long it takes pip to build the virtualenv or AWS to launch an EC2 instance.

Turn it to eleven

Unlike pytest-xdist, however, we can easily add more workers now. Just by changing --cd-num-workers we can speed up our tests even more:

[Table: runtime by number of workers, with “Lukewarm”, “Warm”, and “Lukewarm-Warm” columns]

We decreased the tasks per worker target (--cd-tasks-per-worker-target) as we added more workers, because finer-grained tasks come with more overhead. The “Lukewarm” and “Warm” columns give the runtime under those conditions, and “Lukewarm-Warm” shows the difference between the two; this difference mostly represents EC2 startup overhead.

There are clearly diminishing returns as more workers are added, but it’s still cool that we can run all of these tests in about 3 minutes, especially considering that the non-parallelizable portion takes up about 2 of those 3 minutes. On top of that, having my laptop available while I was running these tests in the cloud was great — a much better experience than running them locally.

Conclusion

This post introduced pytest-cloudist, a pytest plugin that distributes tests to EC2 virtual machines using Meadowrun. As a case study, we used it to distribute a subset of the pandas tests. Developing this plugin drove a number of performance and feature enhancements in Meadowrun in the recent 0.2 releases, and has given us ideas for a few more.

We hope you are inspired to give pytest-cloudist and Meadowrun a try. Do get in touch for feedback, questions or just to hang out!

To stay updated, star us on GitHub or follow us on Twitter!
