Skip to content

Array Deduplication

Remove duplicate values from an array while keeping one representative of each value.

Array deduplication removes repeated values from an array. The usual method keeps a set of values already seen and writes each new value once.

You use it when the array may contain repeated elements and downstream work needs only distinct values.

Problem

Given an array AA of length nn, produce an array containing each distinct value once.

For stable deduplication, preserve the first occurrence order.

Algorithm

Use a set and a write pointer.

deduplicate(A):
    seen = empty set
    k = 0

    for i from 0 to length(A) - 1:
        if A[i] not in seen:
            add A[i] to seen
            A[k] = A[i]
            k += 1

    return k

Example

Let

A=[8,3,5,3,8,1] A = [8, 3, 5, 3, 8, 1]
stepvalueseen after steparray prefix
18{8}[8]
23{8, 3}[8, 3]
35{8, 3, 5}[8, 3, 5]
43{8, 3, 5}[8, 3, 5]
58{8, 3, 5}[8, 3, 5]
61{8, 3, 5, 1}[8, 3, 5, 1]

Return new length 44.

The deduplicated prefix is:

[8,3,5,1] [8, 3, 5, 1]

Correctness

The set seen contains exactly the values already written into the output prefix. When the scan reaches a new value, the algorithm writes it once and records it. When the scan reaches a value already in seen, the value has already been written, so skipping it preserves uniqueness.

Since the scan proceeds left to right, the first occurrence order is preserved.

Complexity

operationexpected time
deduplicateO(n)O(n)

Space usage:

O(m) O(m)

where mm is the number of distinct values.

When to Use

Array deduplication is appropriate when:

  • repeated values should be removed
  • first occurrence order matters
  • extra memory for a set is acceptable
  • expected linear time is preferred

It is less suitable when:

  • values cannot be hashed
  • strict constant extra space is required

For sorted arrays, deduplication can be done in-place with constant extra space.

Implementation

def deduplicate(a):
    seen = set()
    k = 0

    for x in a:
        if x not in seen:
            seen.add(x)
            a[k] = x
            k += 1

    return k
func Deduplicate[T comparable](a []T) int {
    seen := make(map[T]struct{})
    k := 0

    for _, x := range a {
        if _, ok := seen[x]; ok {
            continue
        }

        seen[x] = struct{}{}
        a[k] = x
        k++
    }

    return k
}