Parallel Merge Sort

Divide the input, sort subarrays in parallel, then merge them using parallel merge procedures.

Parallel merge sort extends merge sort by executing recursive subproblems concurrently. The structure remains divide and conquer, but the recursion tree becomes a parallel computation graph.

You split the array into halves, sort each half in parallel, then merge the results. The merge step can also be parallelized using partitioning techniques.

Problem

Given an array A of size n, produce a sorted array in nondecreasing order using parallel computation.

Algorithm

The recursion forks at each level. Each subproblem runs independently.

parallel_merge_sort(A):
    if length(A) <= 1:
        return A

    split A into L and R

    spawn task:
        parallel_merge_sort(L)

    spawn task:
        parallel_merge_sort(R)

    wait for both tasks to finish

    return parallel_merge(L, R)

The key difference from sequential merge sort is the use of parallel tasks instead of sequential recursive calls.

Parallel Merge

A naive merge remains sequential and becomes the bottleneck. A parallel merge divides work by partitioning one array and locating corresponding positions in the other.

parallel_merge(A, B):
    if total size is below a cutoff:
        return sequential_merge(A, B)

    pick the median element of A (choose A as the larger array so the split makes progress)
    find its position in B using binary search

    split into two independent merges:
        left: A_left + B_left
        right: A_right + B_right

    execute both merges in parallel
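The scheme above can be sketched in Go. This is a minimal illustration, not the article's reference implementation: `parallelMerge`, `seqMerge`, and the `cutoff` constant are illustrative names and an assumed tuning value.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// seqMerge merges sorted slices a and b into out, where
// len(out) == len(a) + len(b).
func seqMerge(a, b, out []int) {
	i, j, k := 0, 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out[k] = a[i]
			i++
		} else {
			out[k] = b[j]
			j++
		}
		k++
	}
	k += copy(out[k:], a[i:])
	copy(out[k:], b[j:])
}

// parallelMerge splits at the median of the longer slice, finds the
// matching split point in the shorter slice by binary search, and merges
// the two independent halves concurrently.
func parallelMerge(a, b, out []int) {
	const cutoff = 1024 // assumed tuning constant; below this, merge sequentially
	if len(a) < len(b) {
		a, b = b, a // make a the longer slice so the split makes progress
	}
	if len(a)+len(b) <= cutoff {
		seqMerge(a, b, out)
		return
	}
	i := len(a) / 2
	j := sort.SearchInts(b, a[i]) // b[:j] < a[i] <= b[j:]

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		parallelMerge(a[:i], b[:j], out[:i+j]) // left halves merge independently
	}()
	parallelMerge(a[i:], b[j:], out[i+j:]) // right halves on the current goroutine
	wg.Wait()
}

func main() {
	a := []int{1, 4, 7, 10}
	b := []int{2, 3, 8}
	out := make([]int, len(a)+len(b))
	parallelMerge(a, b, out)
	fmt.Println(out) // prints [1 2 3 4 7 8 10]
}
```

Splitting at the median of the longer slice guarantees each recursive call shrinks the problem, so the recursion terminates even when the two inputs are very unequal in length.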

Complexity

Let T_1(n) be total work and T_∞(n) be span.

measure                  value
work T_1(n)              O(n log n)
span T_∞(n)              O(log² n)
parallelism T_1 / T_∞    O(n / log n)
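These bounds follow from the usual fork-join recurrences. A sketch, under the assumption that the merge step contributes O(n) work and O(log n) span:

```latex
% Work: two half-size subproblems plus a linear-work merge
T_1(n) = 2\,T_1(n/2) + O(n) = O(n \log n)

% Span: the two halves run in parallel, so only one recursive term
% remains, plus the span of the merge (assumed O(\log n) here)
T_\infty(n) = T_\infty(n/2) + O(\log n) = O(\log^2 n)

% Parallelism is the ratio of work to span
\frac{T_1(n)}{T_\infty(n)} = O\!\left(\frac{n \log n}{\log^2 n}\right) = O(n / \log n)
```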

With an optimized parallel merge, the span can be reduced closer to O(log n).

Cost Model

Assume p processors.

processors         time
p = 1              O(n log n)
p ≤ n / log n      near-linear speedup
large p            limited by span

Speedup depends on merge efficiency and scheduling overhead.
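The near-linear regime in the table can be read off the standard work-span (Brent-type) bound for greedy schedulers:

```latex
% Greedy scheduling bound on p processors
T_p \le \frac{T_1(n)}{p} + T_\infty(n)

% While p \le T_1 / T_\infty = O(n / \log n), the work term dominates:
T_p = O\!\left(\frac{n \log n}{p}\right) \quad \text{(near-linear speedup)}
```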

Correctness

Each recursive call sorts its subarray correctly. The merge step combines two sorted arrays into a sorted result. Parallel execution does not change data dependencies, so correctness follows from the sequential version.

Practical Considerations

  • Task granularity matters. Very small tasks should run sequentially to avoid overhead.
  • Memory bandwidth becomes the dominant cost for large arrays.
  • Parallel merge requires extra care to avoid contention and false sharing.
  • Work stealing schedulers improve load balancing.

When to Use

Use parallel merge sort when:

  • the dataset is large
  • multiple cores or distributed workers are available
  • stable sorting is required
  • predictable O(n log n) behavior is preferred

For in-memory, single-threaded workloads, optimized sequential sorts such as introsort or timsort are usually faster.

Implementation (Go, simplified)

// Below this size, recurse sequentially: spawning a goroutine per tiny
// subarray costs more than the parallelism it buys.
const seqThreshold = 1 << 12

func ParallelMergeSort(a []int) []int {
	if len(a) <= 1 {
		return a
	}

	mid := len(a) / 2

	if len(a) < seqThreshold {
		return merge(ParallelMergeSort(a[:mid]), ParallelMergeSort(a[mid:]))
	}

	var left []int
	done := make(chan struct{})

	// Sort the left half in a new goroutine; the channel receive below
	// also synchronizes the write to left with its read.
	go func() {
		left = ParallelMergeSort(a[:mid])
		done <- struct{}{}
	}()

	// Sort the right half on the current goroutine rather than spawning
	// a second task.
	right := ParallelMergeSort(a[mid:])

	<-done

	return merge(left, right)
}

// merge combines two sorted slices; taking a[i] on ties keeps the sort stable.
func merge(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0

	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}

	res = append(res, a[i:]...)
	res = append(res, b[j:]...)
	return res
}