Saurabh Paryani

Posted on

# Finding the duplicate number in constant space (Python)

LeetCode’s problem number 287. Find the Duplicate Number poses a great challenge to anybody familiar with problem-solving using data structures like Linked Lists or Arrays. The problem might look easy to solve in the first glance but given the constraints of the problem, becomes a real task to do.

### Problem Statement

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.
There is only one repeated number in nums, return this repeated number.
You must solve the problem without modifying the array nums and uses only constant extra space.
All the integers in nums appear only once except for precisely one integer which appears two or more times.

### Understanding The Problem

#### Examining The Examples

##### Example 1:
``````Input: nums = [1,3,4,2,2]
Output: 2

# 2 is the repeated number.
``````
##### Example 2:
``````Input: nums = [3,1,3,4,2]
Output: 3

# 3 is the repeated number.
``````
##### Example 3:
``````Input: nums = [3,3,3,3,3]
Output: 3

# The only repeated number is 3.
``````

### Approaching The Problem

#### ❌ Approach 1: Brute Force

The obvious approach that comes to mind is to use two pointers and run two loops. If `nums[i] == nums[j]` then we can deduce that there has been a repeated number, and we can return that.

Sadly, this costs too much time. The time complexity of this approach becomes O(N²), N being the length of the array, resulting in a TLE. (Time Limit Exceeded)

#### ❌ Approach 2: Trying out sorting, hashing and negating

Sorting the array, `[1,3,4,2,2] → [1,2,2,3,4]` and checking adjacent elements would have been a valid approach, but we have a constraint to not modify the given array.

Using a set to store the visited numbers would also have been a valid approach, had the question not ask us to do this in constant space. Using a set data structure to check if a number has been visited to find out the duplicate number takes O(N) space, which isn’t allowed in this problem.

❌ Another smart approach that fails is to negate every number we visit, acknowledging the fact that the numbers in `nums` are a valid index. This particular statement will be useful later on, but let’s focus on why this approach fails. If I’m traversing through the array, my approach is to make `nums[|nums[i]|]` negative. If I find out that `nums[|nums[i]|]` is already negative, then I can be sure that there’s a duplicate.

This modifies the array which is not allowed.

### ✅ Approach 3: Binary Search

The idea of a binary search comes from the fact that we have been given an inclusive range of `[1,n]` We know the numbers ranging from 1 to n are in an increasing order, hence sorted, which gives us the freedom to apply Binary Search on this range. Our `low` gets initialized to 0 and `high` gets initialized to `len(nums)-1` as our two pointers. We use a while loop and find out the middle index for every check `low < high`

``````low, high = 1, len(nums)-1
while low < high:
mid = (low + high) // 2
.....
``````

Initially, we have `low = 1, high = 4` because remember, we’re doing binary search on [1,2,3,4] and not on our given array, [1,3,4,2,2].

For each mid value, we will count the number of elements in the array that are less than or equal to this mid value. So our mid is now 2. We will use the given array `[1,3,4,2,2]` and traverse through it, to find out how many such numbers exist that are less than or equal to mid. There are 3 such numbers (1, 2 and 2) which are lesser or equal to mid (2).
Now, you see, try to visualize this.

The range array [1,2,3,4] is sorted. Say our given array was `[3,1,3,4,2]` From the range array, we know our mid is at 2. Traversing our `nums` array, there are only 2 elements (1 and 2) lesser than or equal to mid (2). Meaning, both 1 and 2 are such elements which have no duplicate.
If our given array was `[1,3,4,2]` i.e. with no duplicates, the count of elements lesser than or equal to mid (2) would have been 2 (1 and 2). But since we have 3 elements lesser than or equal to mid, we can deduce that 2 has been repeated.

We conclude by saying that if the count is greater than mid, then the duplicate lies in the left half of the range; otherwise, it lies in the right half.
This can be better understood with further narrowing down of the binary search. Since we have now deduced that the duplicate has to lie on the left half of the range array, we shift our `high` to `mid` meaning our new `high` is now equal to 2.

`mid` is equal to 1. We will traverse `nums = [1,3,4,2,2]` again and count the number of elements which are lesser or equal to 1. It’s only 1 (1).
This count isn’t greater than mid, that means our duplicate has to lie on the right half of `[1]`
We shift our `low` to `mid+1` making `low = 2 = high`.
The while loop ends with low pointing at 2, which indeed is our answer. 2 is the repeated number because if `low = high = 2` then our `mid = 2` too.
And if we traverse `nums = [1,3,4,2,2]` again, we find out that there are 3 elements (1, 2 and 2) that are higher than mid. And since our while loop has been exhausted, we can conclude — 2 is our repeated number.

``````low, high = 1, len(nums)-1
while low < high:
mid = (low + high)//2
count = 0
for n in nums:
if n <= mid:
count += 1

if count > mid: # duplicate lies in the left half
high = mid
else: # duplicate will lie in the right half
low = mid + 1

return low
``````

Since we’re applying Binary Search on our range array `[1,n]` it will cost us O(NlogN) time complexity, N being the length of the given array and space complexity being O(1) — taking no extra space.
Amazing.
But, can we do better?

### ✅ Approach 4: Hare-Tortoise

How can we prove that at least one duplicate number must exist in nums?
Can you solve the problem in linear runtime complexity?

The goal of this approach is to solve this in linear runtime. Let’s solve the first follow up question — proving that at least one duplicate number must exist in `nums` and why is this important.
Say n = 5. Meaning, our array should have a size of n+1 or 6 elements.
So, `[4,3,1,2,5,?]` But we know the array will only include elements `[1,5]` inclusive, the `?` should be a repeated number.

The trick of this approach, like I mentioned in Approach 3, is acknowledging the fact that the numbers in `nums` are a valid index.

The “Hare-Tortoise” approach, also known as, slow-fast pointer approach is used here. Problems having:
a) Duplicate elements,
b) The ability to traverse through indices — can be done by the slow-fast pointer approach. A very popular question called Linked List Cycle II uses the same approach, where we first detect a cycle, and if it does, return the node where the cycle begins.

This problem is also very similar. We’re trying to find a cycle and then the starting number to that cycle. If we’re stuck in a cycle, we know there is a repetition. Walking through the dry-run will give more clarity.

There will be 2 phases to this problem.
Phase 1: Detecting a cycle. Here, the slow pointer (tortoise) s moves 1 move at a time and the fast pointer (hare) f moves 2 moves at a time.
Initially, both s and f are at 0.

Let’s not worry about the looping constraint for now, and focus on the movement of s and f.
We define s’s movement as `slow = nums[slow]`
and f’s movement as `fast = nums[nums[fast]]`

``````slow = nums[slow]
= nums[0]
= 1

fast = nums[nums[fast]]
= nums[nums[0]]
= nums[1]
= 3
``````

``````slow = nums[slow]
= nums[1]
= 3

fast = nums[nums[fast]]
= nums[nums[3]]
= nums[2]
= 4
``````

``````slow = nums[slow]
= nums[3]
= 2

fast = nums[nums[fast]]
= nums[nums[4]]
= nums[2]
= 4
``````

``````slow = nums[slow]
= nums[2]
= 4

fast = nums[nums[fast]]
= nums[nums[4]]
= nums[2]
= 4
``````

``````slow = nums[slow]
= nums[4]
= 2

fast = nums[nums[fast]]
= nums[nums[4]]
= nums[2]
= 4
``````

There clearly is a cycle. When `slow == fast` there is a cycle beginning. Now, we only need to find the starting point of the cycle — which begins our Phase 2.

Phase 2: Reset slow to 0, and move both slow and fast at the same pace, both by 1 move.

Movement of the pointers: `slow = nums[slow]`, `fast = nums[fast]`

``````slow = nums[slow]
= nums[0]
= 0

fast = nums[fast]
= nums[4]
= 2
``````

``````slow = nums[slow]
= nums[1]
= 3

fast = nums[fast]
= nums[2]
= 4
``````

``````slow = nums[slow]
= nums[3]
= 2

fast = nums[fast]
= nums[4]
= 2
``````

``````slow = nums[slow]
= nums[2]
= 4

fast = nums[fast]
= nums[2]
= 4
``````

And this keeps happening. So the moment `slow == fast` we know there is a cycle happening. And from Phase 2, we know that `slow == fast` happened at `slow = 2` Meaning 2 has to be the starting point of the cycle.
Which concludes to the fact that 2 has to be the number that has to be repeated first.
Thus, finally, we return `slow` (or `fast`, as they’re equal) as our final answer.

``````slow = fast = nums[0]
slow = nums[slow]
fast = nums[nums[fast]]

while slow != fast:
slow = nums[slow]
fast = nums[nums[fast]]

slow = nums[0]

while slow != fast:
slow = nums[slow]
fast = nums[fast]

return slow
``````

The Time Complexity is O(N) and we’ve solved this problem in constant space so Space Complexity is O(1).

If you’ve reached this far, remember — coming up with a solution like this super hard at first, but by recognizing patterns, this can be done!