Terminal/Commandline Trick: Multiprocessing Progress Bar — Python atpbar
In this blog, we will discuss a simple yet very useful library for Python terminal/command-line use cases. We often run command-line Python scripts that process a large volume of data or files, and we want a way to track how many items each process has handled and how fast it is going.
Command-line progress bars come to the rescue. There are several progress bar libraries available, and you can read more about open source Python command-line progress bars.
How to create a progress bar on the command line in Python?
In this blog, we will discuss atpbar, a progress bar for the terminal that supports Python multiprocessing. atpbar provides the following features:
- Easy to install.
- Minimalistic progress bar without any fancy UX, and thus quite simple to implement.
- Compatible with multiprocessing and multithreading.
- Each thread or subprocess can be given a name that labels its progress bar.
- Progress bars grow simultaneously to show the progress of loops running in threading or multiprocessing tasks.
- Compatible with Jupyter Notebook.
- On devices where a progress bar cannot be drawn (for example, non-TTY output), it shows the status as numbers instead of a bar.
- The atpbar object is an iterable that wraps another iterable, and it can show progress bars for both outer and inner loops.
- If a loop ends early with break or an exception, its progress bar stops at that point.
How to install python atpbar for commandline progress bar?
Create a virtualenv, if one is not already present, and activate it using the following commands:
virtualenv -p python3.9 venv
source venv/bin/activate
python3 --version
Now install atpbar, the multiprocessing-capable Python terminal progress bar, using the command below.
pip install -U atpbar
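To confirm that the install landed in the active virtualenv, a quick check along these lines may help. It is a minimal sketch that uses only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and not part of atpbar.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("atpbar")
if version is None:
    print("atpbar is not installed in this environment")
else:
    print("atpbar", version)
```

If the first message appears, the virtualenv was probably not activated before running pip.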
How to use atpbar?
You can find more details on the exact implementation on the Python foundation website or on the GitHub page of atpbar. In this article, I will explain the functionality in brief.
One loop
import time, random
from atpbar import atpbar
n = random.randint(1000, 10000)
for i in atpbar(range(n)):
    time.sleep(0.0001)
A python terminal progress bar will look something like this
In order for atpbar to show a progress bar, the wrapped iterable needs to have a length. If the length cannot be obtained by len(), atpbar won't show a progress bar.
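The length requirement can be checked before wrapping an iterable. The snippet below is a small sketch that uses only built-ins; the helper has_length is a hypothetical name and not part of atpbar. It shows that a range has a length, a generator does not, and that materializing the generator into a list gives it one.

```python
def has_length(iterable):
    """Return True if len() works on the object, False otherwise."""
    try:
        len(iterable)
    except TypeError:
        return False
    return True


squares = (i * i for i in range(1000))  # a generator has no len()
print(has_length(range(1000)))
print(has_length(squares))

items = list(squares)  # materializing the generator gives it a length
print(has_length(items))
```

Passing items rather than squares to atpbar would give it a length to work with, at the cost of holding all the values in memory.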
Nested loops
atpbar can show progress bars for nested loops as shown in the below example.
for i in atpbar(range(4), name='outer'):
    n = random.randint(1000, 10000)
    for j in atpbar(range(n), name='inner {}'.format(i)):
        time.sleep(0.0001)
In this example, the outer loop iterates 4 times, and each pass runs an inner loop with its own named progress bar.
Threading
atpbar can show multiple progress bars for loops concurrently iterating in different threads.
from atpbar import flush
import threading

def run_with_threading():
    nthreads = 5

    def task(n, name):
        # Each thread runs its own loop with a named progress bar.
        for i in atpbar(range(n), name=name):
            time.sleep(0.0001)

    threads = []
    for i in range(nthreads):
        name = 'thread {}'.format(i)
        n = random.randint(5, 100000)
        t = threading.Thread(target=task, args=(n, name))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Wait for the progress bars to finish updating.
    flush()

run_with_threading()
As shown in the screenshot below, the tasks run concurrently, and the progress bars show the status of each task simultaneously.
One important thing to notice here is the flush() function, which is called after all threads have been joined. It returns once the progress bars have finished updating.
As a task completes, the progress bar for the task moves up. The progress bars for active tasks are at the bottom.
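The same idea can also be sketched with a thread pool from the standard library. The snippet below is only a sketch and is not taken from the atpbar documentation. It assumes atpbar behaves in pool threads the same way it does in hand-made threads. The names run_with_pool and 'job N' are hypothetical, and the no-op fallback exists only so the snippet still runs where atpbar is not installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

try:
    from atpbar import atpbar, flush
except ImportError:
    # Fallback for illustration only: no progress bars, same control flow.
    def atpbar(iterable, name=None):
        return iterable

    def flush():
        pass


def task(n, name):
    """Iterate n times behind a named progress bar and return the count."""
    count = 0
    for _ in atpbar(range(n), name=name):
        time.sleep(0.0001)
        count += 1
    return count


def run_with_pool(sizes, max_workers=3):
    """Run one task per entry in `sizes` on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(task, n, 'job {}'.format(i))
            for i, n in enumerate(sizes)
        ]
        results = [f.result() for f in futures]
    # As in the threading example, call flush() once all loops are done.
    flush()
    return results


print(run_with_pool([50, 120, 80, 30]))
```

The pool caps how many loops run at once, which is convenient when there are many more tasks than threads.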
Multiprocessing
import multiprocessing
# The nested functions below are passed to subprocesses, which relies on the 'fork' start method.
multiprocessing.set_start_method('fork', force=True)
from atpbar import register_reporter, find_reporter, flush

def run_with_multiprocessing():
    def task(n, name):
        for i in atpbar(range(n), name=name):
            time.sleep(0.0001)

    def worker(reporter, task, queue):
        # Register the reporter from the main process in this subprocess.
        register_reporter(reporter)
        while True:
            args = queue.get()
            if args is None:
                # A None entry tells this worker to stop.
                queue.task_done()
                break
            task(*args)
            queue.task_done()

    nprocesses = 4
    ntasks = 10
    # Get the reporter in the main process and pass it to each worker.
    reporter = find_reporter()
    queue = multiprocessing.JoinableQueue()
    for i in range(nprocesses):
        p = multiprocessing.Process(target=worker, args=(reporter, task, queue))
        p.start()
    for i in range(ntasks):
        name = 'task {}'.format(i)
        n = random.randint(5, 100000)
        queue.put((n, name))
    # Queue one None per worker so that every worker exits.
    for i in range(nprocesses):
        queue.put(None)
    queue.join()
    flush()

run_with_multiprocessing()
When atpbar is used with multiprocessing, two more functions come into play:
- find_reporter(): This function must be called in the main process. It returns a reporter object, which is then passed to each subprocess.
- register_reporter(): This function must be called inside every new subprocess with the reporter from the main process. Progress from atpbar loops in that subprocess is then reported back to the main process, which shows a progress bar for each loop.
Simultaneously growing python terminal-based progress bars will look something like this.
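The worker loop in the multiprocessing example relies on a sentinel pattern: each worker keeps pulling tasks from a shared queue until it receives None, and the main program queues one None per worker. The sketch below isolates that pattern. It uses threads and queue.Queue from the standard library in place of processes and JoinableQueue, and it leaves atpbar out entirely. The function names and the toy workload are hypothetical.

```python
import queue
import threading


def worker(task_queue, results):
    """Process items until a None sentinel arrives."""
    while True:
        n = task_queue.get()
        if n is None:
            task_queue.task_done()
            break
        results.append(sum(range(n)))  # toy workload standing in for a real task
        task_queue.task_done()


def run(nworkers, sizes):
    """Fan `sizes` out to `nworkers` workers and collect the sorted results."""
    task_queue = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(task_queue, results))
        for _ in range(nworkers)
    ]
    for t in threads:
        t.start()
    for n in sizes:
        task_queue.put(n)
    # One sentinel per worker, so every worker sees exactly one and exits.
    for _ in range(nworkers):
        task_queue.put(None)
    task_queue.join()  # returns once every item, sentinels included, is marked done
    for t in threads:
        t.join()
    return sorted(results)


print(run(3, [10, 100, 1000]))
```

Queueing fewer sentinels than workers would leave some workers blocked on get() forever, which is why the loop count matches the number of workers.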
[AUTHOR’S CORNER]
This article is part one of the progress bar in Python series. Stay tuned for more such articles on singlequote.blog. If you find this exercise helpful, motivate me to write more such posts for you by sharing it with your friends, family, and colleagues.
Originally published at https://singlequote.blog on September 2, 2023.