Image Hashing
The problem
Say you have an API that allows customers to upload photos and you want to ban some set of nuisance photos. You might immediately think to use one of the hash functions you already know to check new uploads against a deny list. Hash functions like MD5, SHA1, or some other cryptographic hash function may come to mind. And this would work, assuming that the photo never changes by even a single bit.
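As a rough sketch of that naive approach (the deny-list contents and file path here are just for illustration, not from any real system), checking an upload against a SHA1 deny list might look like:

```python
# Minimal sketch of a cryptographic-hash deny list; the entry shown is the
# wedding.jpeg hash from below and is only an example.
import hashlib

DENY_LIST = {
    "bb3ff9bdf224e8a058d3c4c3c5fb0dd45034519b",  # e.g. a banned photo
}

def is_denied(path: str) -> bool:
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return digest in DENY_LIST
```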
Unfortunately for us and our hypothetical API, it is very easy to create two images that a human perceives as exactly the same but whose cryptographic hashes are totally different. Take for instance the following two photos:
The left-hand photo has a SHA1 sum of
bb3ff9bdf224e8a058d3c4c3c5fb0dd45034519b  wedding.jpeg
and the right-hand photo has a SHA1 sum of
ed14e6d3eb42ad66d5a5f49e554b9e0f765da0cd  wedding-compressed.jpeg
A human would perceive these two photos to be the same image. The only difference is that the right-hand photo has been compressed by 1%. The same effect can be achieved by cropping a single row or column of pixels, or by changing one pixel in the image from grey to a slightly darker or lighter grey.
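If you want to see this for yourself, a small sketch with Pillow is enough to show how fragile the cryptographic hash is. The quality setting here is an arbitrary choice of mine, and the digests you get will differ from the ones above, but the two files will never match each other:

```python
# Re-save an image at a slightly lower JPEG quality and compare SHA1 digests.
# The input file name and quality value are illustrative.
import hashlib
from PIL import Image

Image.open("wedding.jpeg").save("wedding-compressed.jpeg", quality=99)

for path in ("wedding.jpeg", "wedding-compressed.jpeg"):
    with open(path, "rb") as f:
        print(hashlib.sha1(f.read()).hexdigest(), path)
```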
Other hashing methods
So, clearly cryptographic hashes aren't going to work for us here. There are just too many ways to modify an image without losing a human's ability to perceive the copies as the same. One solution to this problem is a class of hashes known as perceptual hashes; some examples are aHash, pHash, and dHash. The one we'll discuss in detail here, and provide a collision and collision generator for, is known as difference hashing (dHash).
As far as I can tell, the definitive source on these hashes is the blog written by Dr. Neal Krawetz: http://www.hackerfactor.com/blog/. For dHash specifically, you should read this post. If you read that post, feel free to skip the next section, as Dr. Krawetz goes into more depth than I will.
How dHash works
At a very high level the dHash algorithm is:
- Create an 8x8 array of booleans that will be the hash output
- Convert the image to greyscale
- Resize the image to 9x8 pixels
- For each row, compare each pixel to the pixel on its right (9 pixels per row give 8 comparisons, which is why the resize is 9x8 but the hash is 8x8):
  - If the right pixel is brighter than the left, store True in the hash array at the corresponding index
  - Otherwise, store False
- Flatten the hash array
- Convert the booleans to bits and turn them into a string
- Return that string as the hash
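To make those steps concrete, here is a minimal sketch using Pillow and NumPy. This is not the proof-of-concept repo's code; the function name and the row-major, most-significant-bit-first packing are assumptions of mine, and other dHash implementations may order the bits differently.

```python
# A minimal dHash sketch following the steps above, assuming Pillow and NumPy.
import numpy as np
from PIL import Image

def dhash(image: Image.Image) -> str:
    # greyscale, then shrink to 9x8 so each row yields 8 left/right comparisons
    small = np.asarray(image.convert("L").resize((9, 8)), dtype=int)
    # True wherever the right-hand pixel is brighter than its left neighbour
    bits = small[:, 1:] > small[:, :-1]          # shape (8, 8)
    value = 0
    for bit in bits.flatten():                   # flatten and pack into an int
        value = (value << 1) | int(bit)
    return f"{value:016x}"                       # 64 bits -> 16 hex characters

# e.g. print(dhash(Image.open("wedding.jpeg")))
```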
So, for example, our wedding photo from above, when converted to greyscale and shrunk down, turns into:
And results in the following hash array:
[[ True False False False False True False False]
[ True False True False False True False False]
[ True True False True False True True False]
[ True False False True True False True False]
[ True False False True True False False True]
[ True False False True True False True True]
[ True True False False False False True True]
[ True True True False False True False False]]
The first item in the hash matrix, (0,0), is True because the pixel at (0,0) in the resized image is slightly darker than the pixel at (0,1). When this hash array is flattened and turned into a string, it results in the dHash value 84a4d69a999bc3e4.
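If you want to check that value yourself, packing the matrix above row by row, most significant bit first (my assumed ordering), reproduces it:

```python
# Pack the 8x8 boolean matrix shown above into the 64-bit hex hash.
import numpy as np

hash_matrix = np.array([
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
], dtype=bool)

value = 0
for bit in hash_matrix.flatten():
    value = (value << 1) | int(bit)
print(f"{value:016x}")  # -> 84a4d69a999bc3e4
```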
According to Dr. Krawetz, who has a larger data set than I do, this method has a relatively low false-positive and false-negative rate and is significantly faster than the other perceptual hashes tested.
Creating Collisions
So, dHash is pretty cool! But what if I want to intentionally create a collision? What if I have two images and I want them to have the same dHash value while being perceived by a human to be totally different images? That was the question I asked myself and was surprised to find that no one had answered (as far as I could tell). To be fair, even if someone had answered it, I probably would have tried to do it myself anyway.
I will call the image whose hash we want to match the collision target and the image we're going to modify the mod image. My algorithm for creating a collision (sketched in code after the list) is:
1. Get the hash of the collision target
2. Divide the mod image into a 9x8 grid of sub-images (one sub-image per sampled pixel)
3. Iterate over the hash of the collision target:
   - Calculate the current hash of the mod image
   - If the target bit is True: check that the sub-image to the right of the current one is brighter than the current one. If it's not, brighten it!
   - If the target bit is False: check that the sub-image to the right of the current one is darker than the current one. If it's not, darken it
4. Rebuild the mod image from the sub-images
5. Calculate the current hash of the rebuilt mod image
6. Check the hashes for equality and re-iterate as necessary (repeat step 3) to make sure that the sampling used during resizing doesn't break the collision
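A rough sketch of that loop, assuming Pillow and NumPy, might look like the following. This is not the proof-of-concept repo's code: for simplicity it works on a greyscale copy of the mod image, and the helper names, brightness step, and iteration cap are all illustrative choices of mine.

```python
# Sketch of the collision loop described above (illustrative, greyscale-only).
import numpy as np
from PIL import Image

GRID_COLS, GRID_ROWS = 9, 8  # matches the 9x8 dHash sample grid

def dhash_bits(img: Image.Image) -> np.ndarray:
    # 8x8 matrix of booleans: True where the right pixel is brighter than the left
    small = np.asarray(img.convert("L").resize((GRID_COLS, GRID_ROWS)), dtype=int)
    return small[:, 1:] > small[:, :-1]

def force_collision(target: Image.Image, mod: Image.Image,
                    step: int = 8, max_rounds: int = 100) -> Image.Image:
    want = dhash_bits(target)                          # step 1
    work = np.asarray(mod.convert("L"), dtype=int)     # greyscale working copy
    h, w = work.shape
    # step 2: block (r, c) of the working image feeds resized pixel (r, c)
    row_edges = np.linspace(0, h, GRID_ROWS + 1, dtype=int)
    col_edges = np.linspace(0, w, GRID_COLS + 1, dtype=int)

    for _ in range(max_rounds):                        # steps 3-6, repeated
        have = dhash_bits(Image.fromarray(work.astype(np.uint8)))
        if np.array_equal(have, want):
            break
        for r in range(GRID_ROWS):
            for c in range(GRID_COLS - 1):
                if have[r, c] == want[r, c]:
                    continue
                # brighten the block to the right if the bit should be True,
                # darken it if the bit should be False
                delta = step if want[r, c] else -step
                block = (slice(row_edges[r], row_edges[r + 1]),
                         slice(col_edges[c + 1], col_edges[c + 2]))
                work[block] = np.clip(work[block] + delta, 0, 255)
    return Image.fromarray(work.astype(np.uint8))
```

Calling force_collision(Image.open("target.jpg"), Image.open("mod.jpg")) (hypothetical file names) would return a modified copy of the mod image whose dHash bits match the target's, assuming the loop converges within the round limit. The resampling used by resize() blurs neighbouring blocks together, which is exactly why the outer loop keeps re-checking the hash instead of trusting a single pass.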
Example
Given the following two images, where the left-hand image is the collision target and the right-hand image is the mod image:
We can run the above algorithm and produce the following image, which is a dHash collision with the collision target:
Both of these images have a dHash value of 0193210ed49192c8.
Proof of concept code
Proof of concept code can be found here: https://github.com/LivingInSyn/image_collsion_gen
Currently, this code will not work on very dark images, because brightening black just results in...more black. I'll spend some time trying to make it work in those situations, though.
Conclusion
I'm not sure where exactly attacker-generated collisions would have the greatest impact. They could allow a malicious actor to post a picture that causes other legitimate pictures to get deny-listed, resulting in a denial-of-service-type condition. They could also allow a malicious actor to post images that are supposed to be denied by making them collide with a picture on an allow list. Ultimately, it is simply proof that you should not expect these hashes to be perfect or collision resistant.
This was fun code to write. It nerd-sniped me super hard for a few days, and there are a lot of ways to go from here. Expanding this code to generate collisions across multiple hash types would be a logical next step, as would finding ways to make the image less perceptibly modified. I'd love to hear your ideas if you have them :)