When I had to compare two dictionaries for the first time, I struggled a lot!
For simple dictionaries, comparing them is usually straightforward. You can use the == operator, and it will work.
However, when you have specific needs, things become harder. The reason is, Python has no built-in feature allowing us to:
- compare two dictionaries and check how many pairs are equal
- assert nested dictionaries are equal (deep equality comparison)
- find the difference between two dicts (dict diff)
- compare dicts that have floating-point numbers as values
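To make the first item concrete, here is a minimal hand-written sketch for counting matching pairs. The helper name count_equal_pairs is my own (it is not a built-in), and it only compares top-level values with ==.

```python
def count_equal_pairs(a, b):
    """Count the keys present in both dicts whose values compare equal."""
    return sum(1 for key in a if key in b and a[key] == b[key])


a = {'number': 1, 'list': ['one', 'two'], 'name': 'x'}
b = {'number': 2, 'list': ['one', 'two'], 'name': 'x'}

print(count_equal_pairs(a, b))  # 2
```

It works for this small case, but it says nothing about where the remaining pairs differ.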
In this article, I will show how you can do those operations and many more, so let’s go.
Why You Need a Robust Way to Compare Dictionaries
Let's imagine the following scenario: you have two simple dictionaries. How can we check whether they match? Easy, right?
Yeah! You could use the == operator, of course!
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1
}
>>> a == b
True
That's expected, since the dictionaries are the same. But what if some value is different? The result will be False, but can we tell where they differ?
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 2
}
>>> a == b
False
Hmm... Just False doesn't tell us much...
What about the strs inside the list? Let's say that we want to ignore their case.
>>> a = {
'number': 1,
'list': ['ONE', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1
}
>>> a == b
False
Oops...
What if the number was a float and we consider two floats to be the same if they agree to 3 digits? Put another way, we only want the first 3 digits after the decimal point to match.
>>> a = {
'number': 1,
'list': ['one', 'two']
}
>>> b = {
'list': ['one', 'two'],
'number': 1.00001
}
>>> a == b
False
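Doing this by hand is possible, but it already takes some code. Below is a rough sketch, assuming flat dictionaries only; the helper equal_within_digits is a name I made up for illustration.

```python
from numbers import Real


def equal_within_digits(a, b, digits=3):
    """Compare two flat dicts, rounding numeric values to `digits` decimals."""
    if a.keys() != b.keys():
        return False
    for key in a:
        x, y = a[key], b[key]
        if isinstance(x, Real) and isinstance(y, Real):
            # Round both sides before comparing, so 1 and 1.00001 match.
            if round(x, digits) != round(y, digits):
                return False
        elif x != y:
            return False
    return True


a = {'number': 1, 'list': ['one', 'two']}
b = {'list': ['one', 'two'], 'number': 1.00001}

print(equal_within_digits(a, b))  # True
```

Nested dictionaries or lists of floats would need extra handling that this sketch doesn't have.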
You might also want to exclude some fields from the comparison. As an example, we might want to leave the list key-value pair out of the check. Unless we create a new dictionary without it, there's no built-in way to do that for you.
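For reference, the "create a new dictionary without it" workaround could look like the sketch below. The helper without_keys is hypothetical, and it only filters top-level keys.

```python
def without_keys(d, excluded):
    """Return a copy of `d` that leaves out the keys listed in `excluded`."""
    return {key: value for key, value in d.items() if key not in excluded}


a = {'number': 1, 'list': ['ONE', 'two']}
b = {'list': ['one', 'two'], 'number': 1}

print(a == b)  # False
print(without_keys(a, {'list'}) == without_keys(b, {'list'}))  # True
```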
Can it get any worse?
Yes, what if a value is a numpy array?
>>> import numpy as np
>>> a = {
'number': 1,
'list': ['one', 'two'],
'array': np.ones(3)
}
>>> b = {
'list': ['one', 'two'],
'number': 1,
'array': np.ones(3)
}
>>> a == b
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-eeadcaeab874> in <module>
----> 1 a == b
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Oh no, it raises an exception right in our faces!
Damn it, what can we do then?
Using the Right Tool for the Job
Since dicts cannot perform advanced comparisons, there are only two ways of achieving that. You can either implement the functionality yourself or use a third-party library. At some point in your life you've probably heard about not reinventing the wheel. So that's precisely what we're going to avoid in this tutorial.
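To get a feel for what rolling your own would involve, here is a minimal sketch of a hand-written diff. The function shallow_diff and its output layout are my own invention for illustration, and it only looks at top-level keys.

```python
def shallow_diff(old, new):
    """Return added, removed, and changed top-level keys between two dicts."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: {'old_value': old[k], 'new_value': new[k]}
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {'added': added, 'removed': removed, 'changed': changed}


a = {'number': 1, 'list': ['one', 'two']}
b = {'list': ['one', 'two'], 'number': 2}

print(shallow_diff(a, b))
# {'added': {}, 'removed': {}, 'changed': {'number': {'old_value': 1, 'new_value': 2}}}
```

Nesting, float tolerance, string case, and numpy arrays would each need more code on top of this, which is why a library is the more practical route.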
We'll adopt a library called deepdiff, from zepworks. deepdiff can pick up the difference between dictionaries, iterables, strings and other objects. It accomplishes that by searching for changes recursively.
deepdiff is not the only kid on the block. There's also Dictdiffer, developed by the folks at CERN. Dictdiffer is also cool but lacks a lot of the features that make deepdiff so interesting. In any case, I encourage you to look at both and determine which one works best for you.
deepdiff is so cool that it not only works with dictionaries, but also with other iterables, strings and even custom objects. For example, you can even mix and match and take the difference between two lists of dicts.
Getting a Simple Difference
In this section, we'll solve the first problem I showed you. We want to find the key whose value differs between the two dicts. Consider the following code snippet, this time using deepdiff.
In [1]: from deepdiff import DeepDiff
In [2]: a = {
...: 'number': 1,
...: 'list': ['one', 'two']
...: }
In [3]: b = {
...: 'list': ['one', 'two'],
...: 'number': 2
...: }
In [4]: diff = DeepDiff(a, b)
In [5]: diff
Out[5]: {'values_changed': {"root['number']": {'new_value': 2, 'old_value': 1}}}
Awesome! It tells us that the key 'number' had the value 1 in a, but the new dict, b, has a new value, 2.
Ignoring String Case
In our second example, one element of the list was in uppercase, but we didn't care about that. We wanted to ignore the case and treat "one" as "ONE".
You can solve that by setting ignore_string_case=True.
In [10]: a = {
...: 'number': 1,
...: 'list': ['ONE', 'two']
...: }
...:
In [11]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1
...: }
In [12]: diff = DeepDiff(a, b, ignore_string_case=True)
In [13]: diff
Out[13]: {}
If we don't set it, DeepDiff reports exactly which element differs.
In [14]: diff = DeepDiff(a, b)
In [15]: diff
Out[15]:
{'values_changed': {"root['list'][0]": {'new_value': 'one',
'old_value': 'ONE'}}}
Comparing Float Values
We also saw a case where we had a float and only wanted to check that the first 3 digits were equal. With DeepDiff, the significant_digits parameter sets the exact number of digits AFTER the decimal point to compare. Also, since floats differ from ints, we might want to ignore the type comparison as well. We can solve that by setting ignore_numeric_type_changes=True.
In [16]: a = {
...: 'number': 1,
...: 'list': ['one', 'two']
...: }
In [17]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1.00001
...: }
In [18]: diff = DeepDiff(a, b)
In [19]: diff
Out[19]:
{'type_changes': {"root['number']": {'old_type': int,
'new_type': float,
'old_value': 1,
'new_value': 1.00001}}}
In [24]: diff = DeepDiff(a, b, significant_digits=3, ignore_numeric_type_changes=True)
In [25]: diff
Out[25]: {}
Comparing numpy Values
When we tried comparing two dictionaries with a numpy array in them, we failed miserably. Fortunately, DeepDiff has our backs here. It supports numpy objects by default!
In [27]: import numpy as np
In [28]: a = {
...: 'number': 1,
...: 'list': ['one', 'two'],
...: 'array': np.ones(3)
...: }
In [29]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1,
...: 'array': np.ones(3)
...: }
In [30]: diff = DeepDiff(a, b)
In [31]: diff
Out[31]: {}
What if the arrays are different?
No problem!
In [28]: a = {
...: 'number': 1,
...: 'list': ['one', 'two'],
...: 'array': np.ones(3)
...: }
In [32]: b = {
...: 'list': ['one', 'two'],
...: 'number': 1,
...: 'array': np.array([1, 2, 3])
...: }
In [33]: diff = DeepDiff(a, b)
In [34]: diff
Out[34]:
{'type_changes': {"root['array']": {'old_type': numpy.float64,
'new_type': numpy.int64,
'old_value': array([1., 1., 1.]),
'new_value': array([1, 2, 3])}}}
It shows that not only the values are different but also the types!
Comparing Dictionaries With datetime Objects
Another common use case is comparing datetime objects. This kind of object has the following signature:
class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)
In case we have a dict with datetime objects, DeepDiff allows us to compare only certain parts of them. For instance, if we only care about the year, month, and day, then we can truncate the rest with truncate_datetime='day'.
In [1]: import datetime
In [2]: from deepdiff import DeepDiff
In [3]: a = {
'list': ['one', 'two'],
'number': 1,
'date': datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
}
In [4]: b = {
'list': ['one', 'two'],
'number': 1,
'date': datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)
}
In [5]: diff = DeepDiff(a, b, truncate_datetime='day')
In [6]: diff
Out[6]: {}
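Conceptually, truncating to the day means zeroing everything below it before comparing. The sketch below shows that idea with the standard library only. It is my own illustration, not DeepDiff's internal code, and truncate_to_day is a hypothetical helper.

```python
import datetime


def truncate_to_day(dt):
    """Drop the time-of-day part, keeping only year, month, and day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


a = datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
b = datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)

print(a == b)                                    # False
print(truncate_to_day(a) == truncate_to_day(b))  # True
```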
Comparing String Values
We've looked at interesting examples so far, and it's a common use case to use dicts to store string values. Having a better way of contrasting them can help us a lot! In this section, I'm going to show you another lovely feature, the str diff.
In [13]: from pprint import pprint
In [17]: b = {
...: 'number': 1,
...: 'text': 'hi,\n my awesome world!'
...: }
In [18]: a = {
...: 'number': 1,
...: 'text': 'hello, my\n dear\n world!'
...: }
In [20]: ddiff = DeepDiff(a, b, verbose_level=2)
In [21]: pprint(ddiff, indent=2)
{ 'values_changed': { "root['text']": { 'diff': '--- \n'
'+++ \n'
'@@ -1,3 +1,2 @@\n'
'-hello, my\n'
'- dear\n'
'- world!\n'
'+hi,\n'
'+ my awesome world!',
'new_value': 'hi,\n my awesome world!',
'old_value': 'hello, my\n'
' dear\n'
' world!'}}}
That's nice! We can see the exact lines where the two strings differ.
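The 'diff' entry is in unified diff format, the same layout that Python's difflib module produces. As a point of comparison, this standard-library sketch builds a similar report for the two strings on its own.

```python
import difflib

old = 'hello, my\n dear\n world!'
new = 'hi,\n my awesome world!'

# Compare line by line; lineterm='' avoids doubled newlines when joining.
lines = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm='')
report = '\n'.join(lines)
print(report)
```

Lines starting with - come from the old string and lines starting with + come from the new one.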
Excluding Fields
In this last example, I'll show you yet another common use case, excluding a field. We might want to exclude one or more items from the comparison. For instance, using the previous example, we might want to leave out the text field.
In [17]: b = {
...: 'number': 1,
...: 'text': 'hi,\n my awesome world!'
...: }
In [18]: a = {
...: 'number': 1,
...: 'text': 'hello, my\n dear\n world!'
...: }
In [26]: ddiff = DeepDiff(a, b, verbose_level=2, exclude_paths=["root['text']"])
...:
In [27]: ddiff
Out[27]: {}
If you want even more advanced exclusions, DeepDiff also allows you to pass a regular expression. Check this out: https://zepworks.com/deepdiff/current/exclude_paths.html#exclude-regex-paths.
Conclusion
That's it for today, folks! I really hope you've learned something new and useful. Comparing dicts is a common use case since they can be used to store almost any kind of data. As a result, having a proper tool to ease this effort is indispensable. DeepDiff has many features and can do reasonably advanced comparisons. If you ever need to compare dicts, go check it out.
See you next time!
This post was originally published at https://miguendes.me