Array union combines elements from two arrays and returns all distinct values. It can also be defined as a multiset union that preserves counts.
You use it when merging datasets while eliminating duplicates.
Problem
Given arrays and , compute:
Two common variants:
- distinct union: each value appears once
- multiset union: each value appears times
Algorithm
For distinct union, use a set.
union_distinct(A, B):
S = empty set
for x in A:
add x to S
for x in B:
add x to S
return elements of SFor multiset union, use counts.
union_multiset(A, B):
countA = frequency map of A
countB = frequency map of B
R = empty array
for each key in union of keys:
k = max(countA[key], countB[key])
append key k times to R
return RExample
Let
and
Distinct union:
| step | set |
|---|---|
| add A | {8, 3, 5} |
| add B | {8, 3, 5, 1} |
Result:
Multiset union:
| value | count in A | count in B | max |
|---|---|---|---|
| 8 | 1 | 1 | 1 |
| 3 | 2 | 2 | 2 |
| 5 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 |
Result:
Correctness
For the distinct version, the set contains all unique elements from both arrays. Adding elements from both arrays ensures that every value present in either array appears exactly once.
For the multiset version, taking the maximum count for each value ensures that the resulting frequency matches the definition of multiset union.
Complexity
| variant | time | space |
|---|---|---|
| distinct | ||
| multiset |
When to Use
Array union is appropriate when:
- combining datasets
- removing duplicates
- computing coverage of values
If arrays are sorted, a two-pointer method can compute the union in:
time with:
extra space.
Implementation
def union_distinct(a, b):
return list(set(a) | set(b))
def union_multiset(a, b):
from collections import Counter
ca = Counter(a)
cb = Counter(b)
result = []
keys = set(ca) | set(cb)
for k in keys:
count = max(ca[k], cb[k])
result.extend([k] * count)
return resultfunc UnionDistinct(a, b []int) []int {
s := make(map[int]struct{})
for _, x := range a {
s[x] = struct{}{}
}
for _, x := range b {
s[x] = struct{}{}
}
result := make([]int, 0, len(s))
for k := range s {
result = append(result, k)
}
return result
}
func UnionMultiset(a, b []int) []int {
countA := make(map[int]int)
countB := make(map[int]int)
for _, x := range a {
countA[x]++
}
for _, x := range b {
countB[x]++
}
result := []int{}
keys := make(map[int]struct{})
for k := range countA {
keys[k] = struct{}{}
}
for k := range countB {
keys[k] = struct{}{}
}
for k := range keys {
c := countA[k]
if countB[k] > c {
c = countB[k]
}
for i := 0; i < c; i++ {
result = append(result, k)
}
}
return result
}