Do you know how to keep track of the last five data points during some kind of processing instead of managing huge lists? How to append and pop items from the opposite ends of a data row faster and easier than with lists?
Today, I'm going to share with you all the how-tos of using the deque
from the collections
standard library module. First, we'll cover some basics, and then move on to solving problems. Hope you'll enjoy my post! If you do, please don't forget to like it :)
WHAT IS A DEQUE?
A list is one of the main workhorses in almost every Python script, yet, in some cases, opting for deques can lead to much faster performance. A deque is short for a double-ended queue, which is why it's called so. Double-ended means that it supports adding and removing elements from both ends.
In fact, deques are not entirely different from stacks and queues, they sort of integrate their features. This image may help you understand the distinction:
As you can see, deques are sort of a generalization of stacks and queues, but to see what deques really are in practice, we need to compare them with lists, so we won't be talking about stacks and queues anymore. To get the contrast between the two 'rivals', we're going to look more closely into how they're implemented under the hood.
Internally, deque is a representation of a doubly-linked list. Doubly-linked means that it stores at least two more integers (pointers) with each item, which is why such lists take up more memory space.
In contrast, lists in Python are implemented with fixed size memory blocks (arrays) and hence, they use less memory space than deques, but lists have to reallocate memory when a new item is inserted (except when appending).
DEQUE vs. LIST
This leads to the following distinctions in performance:
First, reallocation allows lists to access elements almost immediately by indexation. Using the Big-O notation, it can be measured as O(1).
If you're not familiar with the Big O notation, I'll just briefly inform you that it tells you how fast an algorithm (operation) works. It doesn't exactly measure time, but rather the possible maximum number of operations needed to perform a task.
O(1) means that it takes constant time to access data no matter how many elements are there. O(n) means that the more elements a structure holds, the longer it takes to find it. So, an O(1) algorithm is faster than O(n).
In our situation, a deque doesn't perform reallocation and to find an element by an index, internally Python will need to iterate over the deque. So, if there are n elements in the deque, the interpreter will have to carry out up to n operations in the worst-case scenario (if the element is at the very end). For lists, it's always O(1).
So, for accessing elements, lists are always a better choice, it's not at all what deques were designed for.
Second, because deques are implemented as doubly-ended arrays, they have the advantage when appending or popping from both the right and the left side of a deque (measured as O(1)).
It happens because internally only pointers should be corrected to insert a new node on a given position. It works just for the end sides, because anywhere else the interpreter will have to iterate over the deque, and it'll be O(n) time instead of O(1).
In contrast, lists don't have that luxury. Due to the reallocation, doing the same can take as many as O(n) operations. Well, actually, lists do support fast append and pop operations (measuread as O(1)), but only at the end.
Moreover, changes in the list trigger the reallocation process, which takes additional time on its own, while deque's append and pop performance is consistent, because it never uses reallocation.
So, when you need to add or remove elements from the 'head' or
'tail', use deques.
If your code doesn't process much data, this advantage of deques can be neglected, but it can have a significant performance boost if you process large amounts of data.
Third, deques are well suited for tracing the last occurrences or last elements of something (e.g., five latest client's transactions), because they allow to set the maximum length. Of course, it can be done using lists, but with deques, you already have the interface for that, and it's much faster.
It should also be mentioned that deques are thread-safe for appends and pops from opposite sides, because the append()
, appendleft()
, pop()
, popleft()
methods are atomic in CPython.
By the way, if you don't know much about threading, I recommend you to read my article where I explain the difference between threading and asyncio (so, you may shoot two birds with one stone and grasp both concepts if you don't know asyncio yet.)
A little cheat sheet to help you remember the performance difference between deques and lists:
To summarize the above,
Python lists are much better for random access and fixed-length operations, including slicing, while deques are much more useful for pushing and popping things off the ends, with indexing (but not slicing, interestingly) being possible but slower than with lists. (source)
PROBLEM SOLVING
Now, let's go through a couple of examples to see how it all works. I wouldn't explain the deque interface in detail, cause it's quite similar to what you've been doing with lists, and it's laid out very well in the collections documentation.
TRACING LAST TRANSACTIONS
In many apps, you've probably seen something like 'Last visits', 'Previous songs', 'Latest articles'. Now, let's see how we can easily keep a history of the x last user transactions with a deque.
To see if it's working, we'll also store all transactions in a list so we can make sure that the last of them actually match those we'll put in our deque:
from collections import deque
from random import randint
ts_all = [] # to keep all transactions
ts_history = deque(maxlen=10) # to keep only the latest
for i in range(30):
ts_sum = randint(1, 10000) # random transaction sum
record = f'Transaction sum: {ts_sum} dollars.'
ts_all.append(record) # store the record in a list
ts_history.append(record) # store the record in a deque
# Let's print our transactions from the deque:
for ts in ts_history:
print(ts)
# Now, make sure if our deque transactions match
# ten last transactions in the list:
assert list(ts_history) == ts_all[-10:]
The interpreter din't throw an error, so we can see that we got it right.
FIND A TEXT PATTERN WITH SOME PREVIOUS CONTEXT
Suppose we need to find a pattern in a text file, and provide it with some previous context (a couple of lines will do). In my code, I used 'The Adventures of Sherlock Holmes' in a .txt format, which you can download here for your own experiments. Somewhere in the text I put a phrase 'Harry Potter was there!' which is the pattern we're going to look for :)
We're all set to go:
from collections import deque
def search(text: 'IO', pattern: str, maxlen= 4):
previous_lines = deque(maxlen=maxlen)
for line in text:
if pattern in line:
yield line, previous_lines
else:
previous_lines.append(line)
if __name__ == '__main__':
with open('Sherlock Holmes.txt', 'r') as file:
pattern = 'Harry Potter was there!'
for line, previous_lines in search(file, pattern):
for item in previous_lines:
print(item, end='') # suppress extra new lines
print(line)
After running the script, we'll get:
somewhere. I lay listening with all my ears. Suddenly, to my
horror, there was a distinct sound of footsteps moving softly in
the next room. I slipped out of bed, all palpitating with fear,
and peeped round the corner of my dressing-room door.
...Harry Potter was there!
I found a nice place to put my Harry Potter phrase and spoil the atmosphere of horror Conan Doyle worked so hard on :)
RETURN N LAST LINES OF A FILE
Now, as in the last example, let's see how to retrieve the last n
lines from a file. I'll be torturing our Sherlock Holmes file again :)
from collections import deque
def tail(filename, n=6):
'Return the last n lines of a file'
with open(filename) as f:
return deque(f, n)
if __name__ == '__main__':
filename = 'Sherlock Holmes.txt'
last_lines = tail(filename)
for line in last_lines:
print(line, end='')
After executing, we'll get three lines if we use poor Sherlock Holmes. Yes, we asked for six, but we've suppressed the empty lines with print(line, end='')
:
interest in her when once she had ceased to be the centre of one
of his problems, and she is now the head of a private school at
Walsall, where I believe that she has met with considerable success.
SUMMARY
To sum it all up, in most use cases lists are fine. However, depending on specific tasks, there can be significant performance and thread safety advantages to a deque.
I hope you now understand how deques work. If you feel like learning some more interesting stuff about Python, I suggest you read my posts about starred expressions: the basics and problem solving parts.
Top comments (1)
Thank you for your thorough analysis. I've provided a time comparison in this post: dev.to/wnleao/python-deque-vs-list...