Author's note: this blog post ISN'T a beginner's guide to Python or DevOps.
Basic knowledge of Python, DevOps, Kubernetes, and Helm is assumed.
0 Background
0.1 Why Python
Programming languages rise and fall over time.
TIOBE, a Dutch software quality assurance company, has been tracking the popularity of programming languages. According to its programming community index (and its CEO, Paul Jansen, for that matter), Python ranks No.1 now: "for the first time in more than 20 years we have a new leader of the pack: the Python programming language. The long-standing hegemony of Java and C is over."
0.2 Why DevOps with Python
To quote Real Python:
Python is one of the primary technologies used by teams practicing DevOps. Its flexibility and accessibility make Python a great fit for this job, enabling the whole team to build web applications, data visualizations, and to improve their workflow with custom utilities.
On top of that, Ansible and other popular DevOps tools are written in Python or can be controlled via Python.
Plus, I'm a big fan of Python's easy-to-read, no-bracket code style.
One might wonder why easy-to-read is so essential. The 'puter has no problem executing code with ambiguous variable names, lengthy functions, a single file with a thousand (if not thousands of) lines of code, or all of the above. It will run properly, right?
Well, yes. But to quote Knuth:
Programs are meant to be read by humans and only incidentally for computers to execute.
All the methodologies and ideas, like refactoring, clean code, naming conventions, code smells, etc., were invented so that we humans can read code better, not so that computers can run it better.
0.3 Why Concurrency
OK, this one is easy:
Because we can.
Jokes aside, the reason is, of course, performance:
Concurrent code is faster (usually).
For instance, if you have multiple Helm charts installed in one namespace of a Kubernetes cluster and you want to purge all the Helm releases, you can of course uninstall them one by one: wait for the first release to be uninstalled, then start uninstalling the second, and so on.
For some applications, the Helm uninstall part can be slow.
Even for a few simple charts, uninstalling them concurrently can still drastically cut the time.
Based on a local test, uninstalling three Helm charts (Nginx, Redis, and MySQL) one by one takes ca. 0.8 seconds, while doing it concurrently takes 0.48 s, a whopping 40% reduction.
If the scale of the problem grows, say tens of charts to uninstall across multiple namespaces, the time saved becomes too significant to ignore.
Next, let's deal with this particular example using Python.
1 The Task
You have multiple teams and developers who share the same Kubernetes cluster as the dev environment.
To achieve resource segregation, each developer is assigned one namespace. Each developer runs a few Helm installs to get their apps and dependencies up and running so that they can develop and test.
Now, since there are many namespaces, many Helm releases in each namespace, and many pods occupying many nodes, you might wanna optimize the cost by deleting all those pods at the end of working hours so that the cluster can scale down and save some dollars in VM cost.
You want some form of automation that uninstalls all releases in some namespaces.
Let's tackle this issue in Python. For demonstration purposes, we will install nginx, redis, and mysql into the default namespace, then write some automagic stuff to delete 'em.
Let's go.
2 Preparing the Environment for Testing Our Automation
Not that we are doing test-driven development, but before writing any code, let's create a local environment as a mock of the issue at hand so that we have something to test our automation script against.
Here, we use Docker, minikube, and Helm. If you haven't installed them yet, check out the official websites:
- https://docs.docker.com/get-docker/
- https://minikube.sigs.k8s.io/docs/start/
- https://helm.sh/docs/intro/install/
Start your local Kubernetes cluster by running:
minikube start
Then, install some Helm charts in the default namespace, which we will use automation to delete later:
# make sure we select the default namespace
kubectl config set-context --current --namespace=default
# add some helm repos and update
helm repo add nginx-stable https://helm.nginx.com/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# install three applications that we will use automation to delete
helm install nginx nginx-stable/nginx-ingress
helm install redis bitnami/redis
helm install mysql bitnami/mysql
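Optionally, sanity-check that the three releases are indeed deployed before we automate their removal (a quick verification, not part of the original steps):
# expect to see nginx, redis, and mysql in the output
helm list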
Local testing mock done.
3 Non-Concurrent Version
First, let's write some single-thread, non-concurrent code to solve this issue.
We will use the subprocess module to run helm list to get all the releases, run a simple loop over the releases, and helm uninstall each of them. Nothing fancy here; we only use Python to run some CLI commands.
Talk is cheap; show me the code:
https://gist.github.com/IronCore864/ca1e74a65f4a97937d93c63c094e9d32
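In case you'd rather not click through, here is a minimal sketch of what such a script can look like (function names and details are my assumptions, not necessarily identical to the gist):
import subprocess


def get_releases(namespace):
    # "helm list -q" prints only the release names, one per line
    result = subprocess.run(
        ["helm", "list", "-q", "-n", namespace],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def uninstall_release(release, namespace):
    subprocess.run(["helm", "uninstall", release, "-n", namespace], check=True)


if __name__ == "__main__":
    namespace = "default"
    # uninstall the releases one by one, each blocking until it finishes
    for release in get_releases(namespace):
        uninstall_release(release, namespace)
        print(f"uninstalled: {release}")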
4 Introducing concurrent.futures
4.1 multiprocessing and threading
Before jumping right into concurrent.futures (as advertised in the title of this blog), let's talk about multiprocessing and threading for a bit:
- The threading module lets you work with multiple threads (also called lightweight processes or tasks): multiple threads of control sharing their global data space.
- The multiprocessing package supports spawning processes; it works around the Global Interpreter Lock (GIL) by using subprocesses instead of threads (see the sketch below).
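To make the distinction concrete, here's a minimal sketch showing that the two modules expose near-identical APIs; only the execution model differs:
import threading
import multiprocessing


def work(n):
    print(n * n)


if __name__ == "__main__":
    # a thread shares the interpreter (and its global data space)
    t = threading.Thread(target=work, args=(3,))
    t.start()
    t.join()

    # a process gets its own interpreter, hence its own GIL
    p = multiprocessing.Process(target=work, args=(3,))
    p.start()
    p.join()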
When choosing between the two, simply put (might not be 100% precise, but that's the gist):
- If your task is CPU-intensive, go for multiprocessing (which bypasses the GIL by utilizing multiple processes instead of threads).
- If your task is I/O-intensive, the threading module should work.
4.2 What is concurrent.futures, anyway?
Now that we've got these two modules out of the way, what's concurrent.futures?
It is a higher-level interface for starting async tasks, an abstraction layer on top of threading and multiprocessing. It's the preferred tool when you just want to run a piece of code concurrently and don't need the extra functionality provided by the threading or multiprocessing module's API.
4.3 Learn with an Example
OK, enough theory. Let's get our hands dirty and learn from an example, taken from the official documentation:
import concurrent.futures
import urllib.request

URLS = ['http://www.cnn.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            # the made-up domain raises here instead of crashing the loop
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Some observations:
- An executor is required, which can be either a ThreadPoolExecutor or a ProcessPoolExecutor, according to the doc.
- The Executor.submit() method "submits" (or, in plain English, "schedules") the function calls (with parameters) and returns a Future object.
- The concurrent.futures.as_completed() function returns an iterator over the given Future instances, yielding each one as it completes.
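To see what a Future object actually gives you, here's a minimal standalone sketch (adapted from the pow() example in the official docs, not part of the original post):
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() schedules pow(2, 10) and immediately returns a Future
    future = executor.submit(pow, 2, 10)
    print(future.done())    # may print False while the call is still running
    print(future.result())  # blocks until the call finishes; prints 1024
    print(future.done())    # True by now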
5 The Concurrent Code to Solve the Task
Once we understand the syntax and get a basic understanding of how it actually works, by copying and pasting from the example and being a little creative, it's easy to convert our non-concurrent version from the previous section into something concurrent. To put it all together:
See the code below:
https://gist.github.com/IronCore864/ad21130aa796d407624805c5342201db
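Again, a minimal sketch of the concurrent version in case you'd rather not click through (names are my assumptions, not necessarily the gist's):
import concurrent.futures
import subprocess


def get_releases(namespace):
    # "helm list -q" prints only the release names, one per line
    result = subprocess.run(
        ["helm", "list", "-q", "-n", namespace],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def uninstall_release(release, namespace):
    subprocess.run(["helm", "uninstall", release, "-n", namespace], check=True)
    return release


if __name__ == "__main__":
    namespace = "default"
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # schedule one uninstall per release; map each future to its release
        future_to_release = {
            executor.submit(uninstall_release, release, namespace): release
            for release in get_releases(namespace)
        }
        for future in concurrent.futures.as_completed(future_to_release):
            print(f"uninstalled: {future.result()}")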
Voila!
Note that the concurrent.futures part has precisely the same structure as the official concurrent.futures example.
Summary
concurrent.futures cheat sheet (placeholders, not runnable as-is):
from concurrent import futures

# or, with futures.ProcessPoolExecutor()
with futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_objects = {
        executor.submit(some_func, param1, param2, ...): param1 for param1 in xxx
    }
    for f in futures.as_completed(future_objects):
        res = future_objects[f]
        do_something()
Rule of thumb: use ThreadPoolExecutor for I/O-intensive workloads and ProcessPoolExecutor for CPU-intensive workloads.
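To round off the rule of thumb, here is a CPU-bound counterpart, adapted from the ProcessPoolExecutor example in the official docs:
import concurrent.futures
import math

NUMBERS = [
    112272535095293,
    115280095190773,
    1099726899285419,
]

def is_prime(n):
    if n < 2:
        return False
    # trial division up to the integer square root
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

if __name__ == "__main__":
    # each primality check runs in its own process, side-stepping the GIL
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(NUMBERS, executor.map(is_prime, NUMBERS)):
            print(f"{number} is prime: {prime}")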
If you enjoyed this article, please like, comment, and subscribe. See you in the next piece.