Skip to content

Array Union

Combine elements from two arrays into one collection without duplicates.

Array union combines elements from two arrays and returns all distinct values. It can also be defined as a multiset union that preserves counts.

You use it when merging datasets while eliminating duplicates.

Problem

Given arrays AA and BB, compute:

AB A \cup B

Two common variants:

  • distinct union: each value appears once
  • multiset union: each value appears max(countA,countB)\max(count_A, count_B) times

Algorithm

For distinct union, use a set.

union_distinct(A, B):
    S = empty set

    for x in A:
        add x to S

    for x in B:
        add x to S

    return elements of S

For multiset union, use counts.

union_multiset(A, B):
    countA = frequency map of A
    countB = frequency map of B

    R = empty array

    for each key in union of keys:
        k = max(countA[key], countB[key])
        append key k times to R

    return R

Example

Let

A=[8,3,5,3] A = [8, 3, 5, 3]

and

B=[3,3,8,1] B = [3, 3, 8, 1]

Distinct union:

stepset
add A{8, 3, 5}
add B{8, 3, 5, 1}

Result:

[8,3,5,1] [8, 3, 5, 1]

Multiset union:

valuecount in Acount in Bmax
8111
3222
5101
1011

Result:

[8,3,3,5,1] [8, 3, 3, 5, 1]

Correctness

For the distinct version, the set contains all unique elements from both arrays. Adding elements from both arrays ensures that every value present in either array appears exactly once.

For the multiset version, taking the maximum count for each value ensures that the resulting frequency matches the definition of multiset union.

Complexity

varianttimespace
distinctO(n+m)O(n + m)O(n+m)O(n + m)
multisetO(n+m)O(n + m)O(n+m)O(n + m)

When to Use

Array union is appropriate when:

  • combining datasets
  • removing duplicates
  • computing coverage of values

If arrays are sorted, a two-pointer method can compute the union in:

O(n+m) O(n + m)

time with:

O(1) O(1)

extra space.

Implementation

def union_distinct(a, b):
    return list(set(a) | set(b))

def union_multiset(a, b):
    from collections import Counter
    ca = Counter(a)
    cb = Counter(b)

    result = []
    keys = set(ca) | set(cb)

    for k in keys:
        count = max(ca[k], cb[k])
        result.extend([k] * count)

    return result
func UnionDistinct(a, b []int) []int {
    s := make(map[int]struct{})

    for _, x := range a {
        s[x] = struct{}{}
    }

    for _, x := range b {
        s[x] = struct{}{}
    }

    result := make([]int, 0, len(s))
    for k := range s {
        result = append(result, k)
    }

    return result
}

func UnionMultiset(a, b []int) []int {
    countA := make(map[int]int)
    countB := make(map[int]int)

    for _, x := range a {
        countA[x]++
    }

    for _, x := range b {
        countB[x]++
    }

    result := []int{}
    keys := make(map[int]struct{})

    for k := range countA {
        keys[k] = struct{}{}
    }
    for k := range countB {
        keys[k] = struct{}{}
    }

    for k := range keys {
        c := countA[k]
        if countB[k] > c {
            c = countB[k]
        }

        for i := 0; i < c; i++ {
            result = append(result, k)
        }
    }

    return result
}