How to find an impostor binary search implementation in Python! :-)

Anurag Pandey
・4 min read

Recently I have been reimplementing the C++ STL algorithms in Python (here). Along the way I ran into a typical problem: how do you test an implementation of the binary search algorithm? Let us write some tests first.
You can write the tests with any Python testing framework, such as pytest or unittest. Here I am using unittest, which is part of the Python standard library.

import random
import unittest

from binary_search import binary_search

class BinarySearchTestCase(unittest.TestCase):

    def test_empty(self):
        arr = []
        self.assertFalse(binary_search(arr, 5))

    def test_true(self):
        arr = [1,2,3,4,5]
        self.assertTrue(binary_search(arr, 4))

    def test_false(self):
        arr = [1,2,3,4,5]
        self.assertFalse(binary_search(arr, 99))

    def test_on_random_list_false(self):
        arr = [random.randint(-500, 500) for _ in range(500)]
        arr.sort()
        self.assertFalse(binary_search(arr, 999))

if __name__ == '__main__':
    unittest.main()

The test cases fall into three groups:

  • Searching for any element in an empty list should return False.
  • Searching for an element that is present in the list should return True.
  • Searching for an element that is not present in the list should return False.

The above test cases seem reasonable. To make them more robust we can use the hypothesis library, a property-based testing library for Python inspired by Haskell's QuickCheck. You can install it with pip install hypothesis.
The tests using hypothesis are shown below:

import random
import unittest

from hypothesis import given
import hypothesis.strategies as st

from binary_search import binary_search


class BinarySearchTestCase(unittest.TestCase):

    @given(st.integers())
    def test_empty(self, target):
        arr = []
        arr.sort()
        self.assertFalse(binary_search(arr, target))

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_true(self, arr):
        arr.sort()
        target = random.choice(arr)
        self.assertTrue(binary_search(arr, target))

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_false(self, arr):
        arr.sort()
        target = arr[-1] + 1
        self.assertFalse(binary_search(arr, target))

if __name__ == '__main__':
    unittest.main()
test.py

Hypothesis automatically generates many different test cases from the given specification (a "strategy"), which in this case is a list of integers.

Now the fun part is the binary search code:

def binary_search(arr, target):
    return target in arr
binary_search.py

Let us run the test now.

$ python test.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.380s

OK

The above code is nowhere near a binary search implementation, yet it passes all the tests! A linear search passes the binary search test cases. So how can we rule out this impostor code?

The problem with these tests is that they do not use any property specific to binary search. They only check properties that any correct search algorithm has.

One property we know about binary search is that it examines at most floor(log2(n)) + 1 elements, because it discards half of the search space at every iteration.
Here n is the total number of elements in the array.
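
To make the bound concrete, here is a minimal sketch that counts the midpoints a halving search inspects in the worst case, where the target is never found early. The helper name worst_case_probes is hypothetical and is not part of the original tests.

```python
import math


def worst_case_probes(n):
    """Count the midpoints inspected on n elements in the worst case.

    Hypothetical helper for illustration: it simulates a halving search
    that never finds the target early and always continues to the right
    of the midpoint, which is never the smaller side.
    """
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1      # one element examined per iteration
        lo = mid + 1     # discard the midpoint and everything to its left
    return probes


# Print n, the simulated worst case, and the bound used in the article.
for n in (1, 2, 3, 4, 7, 8, 1000):
    bound = int(math.log2(n)) + 1
    print(n, worst_case_probes(n), bound)
```

For every n in the loop the last two columns agree. For n = 3 both are 2, which is why a three-element list is already enough to expose a linear scan that has to look at all three elements.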

So we write a class that behaves like a list by implementing the __iter__, __getitem__ and __len__ special methods.

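class Node:
    """List-like wrapper that counts how many elements are accessed."""

    def __init__(self, arr):
        self.arr = arr
        self.count = 0  # number of element accesses so far

    def __iter__(self):
        # Iteration (used by `in` and for loops) counts each element yielded.
        for x in self.arr:
            self.count += 1
            yield x

    def __getitem__(self, key):
        # Each index lookup counts as one access.
        self.count += 1
        return self.arr[key]

    def __len__(self):
        # Asking for the length does not touch any element.
        return len(self.arr)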

We now have a Node class that behaves like a list but additionally has a count attribute, which is incremented every time an element is accessed, either by iteration or by indexing. This lets us keep track of how many elements the binary search code examines.
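
As a quick check of the counting behaviour, the sketch below repeats the Node class from above and compares a membership test, which walks the elements through __iter__, with a single index lookup through __getitem__. The variable names are only for illustration.

```python
class Node:
    """Same counting wrapper as in the article."""

    def __init__(self, arr):
        self.arr = arr
        self.count = 0

    def __iter__(self):
        for x in self.arr:
            self.count += 1
            yield x

    def __getitem__(self, key):
        self.count += 1
        return self.arr[key]

    def __len__(self):
        return len(self.arr)


scanned = Node([1, 2, 3, 4, 5])
found = 5 in scanned                 # no __contains__, so Python falls back to __iter__
print(found, scanned.count)          # True 5

indexed = Node([1, 2, 3, 4, 5])
middle = indexed[len(indexed) // 2]  # len() is not counted, the lookup costs 1
print(middle, indexed.count)         # 3 1
```

The membership test touches all five elements before it finds the last one, while the index lookup touches exactly one. This difference is what the extra test case relies on.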

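In Python there is a saying: if something walks like a duck and quacks like a duck, it is a duck. A search function that only iterates over the sequence, indexes into it, or asks for its length works on a Node just as it does on a list.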

We add this extra testcase using the above Node class.

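import math

# This method goes inside BinarySearchTestCase.
@given(st.lists(st.integers(), min_size=1))
def test_binary_search_with_node(self, arr):
    arr.sort()
    target = arr[-1]  # the largest element, at the end of the sorted list
    # Upper bound on the number of elements a binary search may examine.
    max_count = int(math.log2(len(arr))) + 1
    arr = Node(arr)  # wrap the list so that accesses are counted
    ans = binary_search(arr, target)
    self.assertTrue(ans)
    self.assertTrue(arr.count <= max_count)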

Let us run the tests again now:

$ python test.py
..Falsifying example: test_binary_search_with_node(
    self=<__main__.BinarySearchTestCase testMethod=test_binary_search_with_node>,
    arr=[0, 0, 1],
)
F.
======================================================================
FAIL: test_binary_search_with_node (__main__.BinarySearchTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "code.py", line 48, in test_binary_search_with_node
    def test_binary_search_with_node(self, arr):
  File "/home/tmp/venv/lib/python3.6/site-packages/hypothesis/core.py", line 1162, in wrapped_test
    raise the_error_hypothesis_found
  File "code.py", line 54, in test_binary_search_with_node
    self.assertTrue(arr.count <= math.log2(len(arr)) + 1)
AssertionError: False is not true

---------------------------------------------------------------------------
Ran 4 tests in 0.435s

FAILED (failures=1)

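This test fails because the linear search examines the elements one by one until it reaches the target, which here is the last element, so every element is checked once. A binary search would instead discard half of the search space at every iteration. Hypothesis also shrinks the failure to a minimal example, which in this case is a list of size 3: the impostor examines 3 elements, while the bound is int(log2(3)) + 1 = 2.
Impostor code found!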
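
For contrast, here is a minimal sketch of a genuine binary search that stays within the bound. It is one possible implementation, not necessarily the one the author has in mind, and it assumes the input is sorted in ascending order. The helper within_bound is hypothetical. Note that the search reads arr[mid] only once per iteration; reading it twice would double the count recorded by Node and break the bound, even though the algorithm would still be logarithmic.

```python
import math


class Node:
    """Same counting wrapper as in the article."""

    def __init__(self, arr):
        self.arr = arr
        self.count = 0

    def __iter__(self):
        for x in self.arr:
            self.count += 1
            yield x

    def __getitem__(self, key):
        self.count += 1
        return self.arr[key]

    def __len__(self):
        return len(self.arr)


def binary_search(arr, target):
    """Return True if target occurs in the ascending sequence arr."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        value = arr[mid]      # single counted access per iteration
        if value == target:
            return True
        if value < target:
            lo = mid + 1      # target can only be to the right
        else:
            hi = mid - 1      # target can only be to the left
    return False


def within_bound(values, target):
    """Hypothetical helper: search a fresh Node and check the access count."""
    node = Node(values)
    binary_search(node, target)
    return node.count <= int(math.log2(len(values))) + 1


# Exhaustive check on small sorted lists of even numbers, with targets
# that are present (even) and absent (odd or out of range).
all_ok = all(
    within_bound(list(range(0, 2 * n, 2)), target)
    for n in range(1, 65)
    for target in range(-1, 2 * n + 1)
)
print(all_ok)  # True
```

With the binary_search function above saved in binary_search.py in place of the impostor, the counting test should no longer fail, because the function never touches more than int(log2(n)) + 1 elements.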


Complete test code

import random
import math
import unittest

from hypothesis import given
import hypothesis.strategies as st

from binary_search import binary_search

class Node:
    def __init__(self, arr):
        self.arr = arr
        self.count = 0

    def __iter__(self):
        for x in self.arr:
            self.count += 1
            yield x

    def __getitem__(self, key):
        self.count += 1
        return self.arr[key]

    def __len__(self):
        return len(self.arr)

class BinarySearchTestCase(unittest.TestCase):

    @given(st.integers())
    def test_empty(self, target):
        arr = []
        arr.sort()
        self.assertFalse(binary_search(arr, target))

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_true(self, arr):
        arr.sort()
        target = random.choice(arr)
        self.assertTrue(binary_search(arr, target))

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_false(self, arr):
        arr.sort()
        target = arr[-1] + 1
        self.assertFalse(binary_search(arr, target))

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_with_node(self, arr):
        arr.sort()
        target = arr[-1]
        arr = Node(arr)
        max_count = int(math.log2(len(arr))) + 1
        ans = binary_search(arr, target)
        self.assertTrue(ans)
        self.assertTrue(arr.count <= max_count)

if __name__ == '__main__':
    unittest.main()

test.py

Where to go from here?

  • Check out this awesome talk by John Hughes, Testing the hard stuff and staying sane, where he talks about how he used QuickCheck to find and fix bugs for different companies.
  • Check out this talk on hypothesis, the QuickCheck-inspired library for Python, by Zac Hatfield-Dodds.
  • Read more on the unittest framework here.

Happy learning!
