Remove duplicate values from an array while keeping one representative of each value.
Array deduplication removes repeated values from an array. The usual method keeps a set of values already seen and writes each new value once.
You use it when the array may contain repeated elements and downstream work needs only distinct values.
Problem
Given an array of length , produce an array containing each distinct value once.
For stable deduplication, preserve the first occurrence order.
Algorithm
Use a set and a write pointer.
deduplicate(A):
seen = empty set
k = 0
for i from 0 to length(A) - 1:
if A[i] not in seen:
add A[i] to seen
A[k] = A[i]
k += 1
return kExample
Let
| step | value | seen after step | array prefix |
|---|---|---|---|
| 1 | 8 | {8} | [8] |
| 2 | 3 | {8, 3} | [8, 3] |
| 3 | 5 | {8, 3, 5} | [8, 3, 5] |
| 4 | 3 | {8, 3, 5} | [8, 3, 5] |
| 5 | 8 | {8, 3, 5} | [8, 3, 5] |
| 6 | 1 | {8, 3, 5, 1} | [8, 3, 5, 1] |
Return new length .
The deduplicated prefix is:
Correctness
The set seen contains exactly the values already written into the output prefix. When the scan reaches a new value, the algorithm writes it once and records it. When the scan reaches a value already in seen, the value has already been written, so skipping it preserves uniqueness.
Since the scan proceeds left to right, the first occurrence order is preserved.
Complexity
| operation | expected time |
|---|---|
| deduplicate |
Space usage:
where is the number of distinct values.
When to Use
Array deduplication is appropriate when:
- repeated values should be removed
- first occurrence order matters
- extra memory for a set is acceptable
- expected linear time is preferred
It is less suitable when:
- values cannot be hashed
- strict constant extra space is required
For sorted arrays, deduplication can be done in-place with constant extra space.
Implementation
def deduplicate(a):
seen = set()
k = 0
for x in a:
if x not in seen:
seen.add(x)
a[k] = x
k += 1
return kfunc Deduplicate[T comparable](a []T) int {
seen := make(map[T]struct{})
k := 0
for _, x := range a {
if _, ok := seen[x]; ok {
continue
}
seen[x] = struct{}{}
a[k] = x
k++
}
return k
}