Skip to content

LeetCode 217: Contains Duplicate

A clear explanation of detecting duplicates in an array using a hash set and sorting.

Problem Restatement

We are given an integer array nums.

We need to determine whether any value appears at least twice.

Return:

Return valueMeaning
TrueSome value appears more than once
FalseEvery value is distinct

The official statement asks us to return true if any value appears at least twice in the array, and false if every element is distinct. (leetcode.com)

Input and Output

ItemMeaning
InputInteger array nums
OutputBoolean
TrueDuplicate exists
FalseAll elements are unique

Example function shape:

def containsDuplicate(nums: list[int]) -> bool:
    ...

Examples

Example 1:

nums = [1, 2, 3, 1]

The value 1 appears twice.

Answer:

True

Example 2:

nums = [1, 2, 3, 4]

All values are distinct.

Answer:

False

Example 3:

nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]

Several values repeat.

Answer:

True

First Thought

We could compare every pair of elements.

For each index i, check every later index j.

for i in range(len(nums)):
    for j in range(i + 1, len(nums)):
        if nums[i] == nums[j]:
            return True

This works, but it is slow.

Problem With Brute Force

For an array of length n, the nested loops may compare:

n * (n - 1) / 2

pairs.

That gives:

O(n^2)

time complexity.

We can do much better using a hash set.

Key Insight

A set supports fast membership checking.

While scanning the array:

  • if we see a number for the first time, add it to the set
  • if the number already exists in the set, we found a duplicate

This lets us detect duplicates in one pass.

Algorithm

  1. Create an empty set seen.
  2. Scan the array from left to right.
  3. For each number:
    • If it already exists in seen, return True.
    • Otherwise add it to seen.
  4. If the scan finishes, return False.

Walkthrough

Use:

nums = [1, 2, 3, 1]

Start:

seen = set()

Process 1:

1 not in seen

Add it:

seen = {1}

Process 2:

seen = {1, 2}

Process 3:

seen = {1, 2, 3}

Process 1 again:

1 already exists in seen

Return:

True

Correctness

The set seen always contains exactly the distinct values encountered earlier in the array.

When processing a number:

  • if the number already exists in seen, then the current occurrence and the earlier occurrence form a duplicate pair, so returning True is correct
  • otherwise the number has not appeared before, so adding it to seen preserves the invariant

If the algorithm finishes scanning the array without finding a repeated value, then every element appeared exactly once, so returning False is correct.

Therefore the algorithm correctly determines whether the array contains duplicates.

Complexity

MetricValueWhy
TimeO(n) averageEach set lookup and insertion is average O(1)
SpaceO(n)The set may store all elements

Hash Set Implementation

class Solution:
    def containsDuplicate(self, nums: list[int]) -> bool:
        seen = set()

        for num in nums:
            if num in seen:
                return True

            seen.add(num)

        return False

Code Explanation

Create the set:

seen = set()

Scan the array:

for num in nums:

Check whether the value already appeared:

if num in seen:
    return True

Otherwise store it:

seen.add(num)

If no duplicate is found:

return False

Shorter Python Version

Python sets automatically remove duplicates.

So:

len(set(nums))

gives the number of distinct elements.

If duplicates exist, the set becomes smaller than the original array.

class Solution:
    def containsDuplicate(self, nums: list[int]) -> bool:
        return len(nums) != len(set(nums))

This is concise and commonly used in interviews.

Sorting Solution

Another approach is sorting.

After sorting, duplicates become adjacent.

Example:

nums = [3, 1, 2, 1]

Sorted:

[1, 1, 2, 3]

Now duplicates are easy to detect.

class Solution:
    def containsDuplicate(self, nums: list[int]) -> bool:
        nums.sort()

        for i in range(1, len(nums)):
            if nums[i] == nums[i - 1]:
                return True

        return False

Sorting Complexity

MetricValue
TimeO(n log n)
Extra spaceDepends on sorting implementation

The hash set solution is usually preferred because it is linear time.

Testing

def run_tests():
    s = Solution()

    assert s.containsDuplicate([1, 2, 3, 1]) is True
    assert s.containsDuplicate([1, 2, 3, 4]) is False
    assert s.containsDuplicate([1, 1, 1, 3, 3, 4, 3, 2, 4, 2]) is True
    assert s.containsDuplicate([]) is False
    assert s.containsDuplicate([5]) is False
    assert s.containsDuplicate([-1, -2, -3, -1]) is True

    print("all tests passed")

run_tests()
TestWhy
[1,2,3,1]Simple duplicate
[1,2,3,4]All unique
Multiple repeated valuesMany duplicates
[]Empty array
[5]Single element
Negative valuesWorks for all integers