A clear explanation of detecting duplicates in an array using a hash set and sorting.
Problem Restatement
We are given an integer array nums.
We need to determine whether any value appears at least twice.
Return:
| Return value | Meaning |
|---|---|
True | Some value appears more than once |
False | Every value is distinct |
The official statement asks us to return true if any value appears at least twice in the array, and false if every element is distinct. (leetcode.com)
Input and Output
| Item | Meaning |
|---|---|
| Input | Integer array nums |
| Output | Boolean |
True | Duplicate exists |
False | All elements are unique |
Example function shape:
def containsDuplicate(nums: list[int]) -> bool:
...Examples
Example 1:
nums = [1, 2, 3, 1]The value 1 appears twice.
Answer:
TrueExample 2:
nums = [1, 2, 3, 4]All values are distinct.
Answer:
FalseExample 3:
nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]Several values repeat.
Answer:
TrueFirst Thought
We could compare every pair of elements.
For each index i, check every later index j.
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return TrueThis works, but it is slow.
Problem With Brute Force
For an array of length n, the nested loops may compare:
n * (n - 1) / 2pairs.
That gives:
O(n^2)time complexity.
We can do much better using a hash set.
Key Insight
A set supports fast membership checking.
While scanning the array:
- if we see a number for the first time, add it to the set
- if the number already exists in the set, we found a duplicate
This lets us detect duplicates in one pass.
Algorithm
- Create an empty set
seen. - Scan the array from left to right.
- For each number:
- If it already exists in
seen, returnTrue. - Otherwise add it to
seen.
- If it already exists in
- If the scan finishes, return
False.
Walkthrough
Use:
nums = [1, 2, 3, 1]Start:
seen = set()Process 1:
1 not in seenAdd it:
seen = {1}Process 2:
seen = {1, 2}Process 3:
seen = {1, 2, 3}Process 1 again:
1 already exists in seenReturn:
TrueCorrectness
The set seen always contains exactly the distinct values encountered earlier in the array.
When processing a number:
- if the number already exists in
seen, then the current occurrence and the earlier occurrence form a duplicate pair, so returningTrueis correct - otherwise the number has not appeared before, so adding it to
seenpreserves the invariant
If the algorithm finishes scanning the array without finding a repeated value, then every element appeared exactly once, so returning False is correct.
Therefore the algorithm correctly determines whether the array contains duplicates.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) average | Each set lookup and insertion is average O(1) |
| Space | O(n) | The set may store all elements |
Hash Set Implementation
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return FalseCode Explanation
Create the set:
seen = set()Scan the array:
for num in nums:Check whether the value already appeared:
if num in seen:
return TrueOtherwise store it:
seen.add(num)If no duplicate is found:
return FalseShorter Python Version
Python sets automatically remove duplicates.
So:
len(set(nums))gives the number of distinct elements.
If duplicates exist, the set becomes smaller than the original array.
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
return len(nums) != len(set(nums))This is concise and commonly used in interviews.
Sorting Solution
Another approach is sorting.
After sorting, duplicates become adjacent.
Example:
nums = [3, 1, 2, 1]Sorted:
[1, 1, 2, 3]Now duplicates are easy to detect.
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
nums.sort()
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
return True
return FalseSorting Complexity
| Metric | Value |
|---|---|
| Time | O(n log n) |
| Extra space | Depends on sorting implementation |
The hash set solution is usually preferred because it is linear time.
Testing
def run_tests():
s = Solution()
assert s.containsDuplicate([1, 2, 3, 1]) is True
assert s.containsDuplicate([1, 2, 3, 4]) is False
assert s.containsDuplicate([1, 1, 1, 3, 3, 4, 3, 2, 4, 2]) is True
assert s.containsDuplicate([]) is False
assert s.containsDuplicate([5]) is False
assert s.containsDuplicate([-1, -2, -3, -1]) is True
print("all tests passed")
run_tests()| Test | Why |
|---|---|
[1,2,3,1] | Simple duplicate |
[1,2,3,4] | All unique |
| Multiple repeated values | Many duplicates |
[] | Empty array |
[5] | Single element |
| Negative values | Works for all integers |