Parallel Merge Sort

Divide the input, sort subarrays in parallel, then merge them using parallel merge procedures.

Parallel merge sort extends merge sort by executing recursive subproblems concurrently. The structure remains divide and conquer, but the recursion tree becomes a parallel computation graph.

You split the array into halves, sort each half in parallel, then merge the results. The merge step can also be parallelized using partitioning techniques.

Problem

Given an array A of size n, produce a sorted array in nondecreasing order using parallel computation.

Algorithm

The recursion forks at each level. Each subproblem runs independently.

parallel_merge_sort(A):
    if length(A) <= 1:
        return A

    split A into L and R

    spawn task:
        parallel_merge_sort(L)

    spawn task:
        parallel_merge_sort(R)

    wait for both tasks to finish

    return parallel_merge(L, R)

The key difference from sequential merge sort is the use of parallel tasks instead of sequential recursive calls.

Parallel Merge

A naive merge remains sequential and becomes the bottleneck. A parallel merge divides work by partitioning one array and locating corresponding positions in the other.

parallel_merge(A, B):
    if total size is below a cutoff:
        return sequential_merge(A, B)

    pick the median element of A (choose A as the larger array so the split makes progress)
    find its position in B using binary search

    split into two independent merges:
        left: A_left + B_left
        right: A_right + B_right

    execute both merges in parallel
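The scheme above can be sketched in Go. This is a minimal illustration, not the article's reference implementation: `parallelMerge`, `seqMerge`, and the `cutoff` constant are illustrative names and an assumed tuning value.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// seqMerge merges sorted slices a and b into out, where
// len(out) == len(a) + len(b).
func seqMerge(a, b, out []int) {
	i, j, k := 0, 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out[k] = a[i]
			i++
		} else {
			out[k] = b[j]
			j++
		}
		k++
	}
	k += copy(out[k:], a[i:])
	copy(out[k:], b[j:])
}

// parallelMerge splits at the median of the longer slice, finds the
// matching split point in the shorter slice by binary search, and merges
// the two independent halves concurrently.
func parallelMerge(a, b, out []int) {
	const cutoff = 1024 // assumed tuning constant; below this, merge sequentially
	if len(a) < len(b) {
		a, b = b, a // make a the longer slice so the split makes progress
	}
	if len(a)+len(b) <= cutoff {
		seqMerge(a, b, out)
		return
	}
	i := len(a) / 2
	j := sort.SearchInts(b, a[i]) // b[:j] < a[i] <= b[j:]

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		parallelMerge(a[:i], b[:j], out[:i+j]) // left halves merge independently
	}()
	parallelMerge(a[i:], b[j:], out[i+j:]) // right halves on the current goroutine
	wg.Wait()
}

func main() {
	a := []int{1, 4, 7, 10}
	b := []int{2, 3, 8}
	out := make([]int, len(a)+len(b))
	parallelMerge(a, b, out)
	fmt.Println(out) // prints [1 2 3 4 7 8 10]
}
```

Splitting at the median of the longer slice guarantees each recursive call shrinks the problem, so the recursion terminates even when the two inputs are very unequal in length.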

Complexity

Let T_1(n) be total work and T_∞(n) be span.

measure                  value
work T_1(n)              O(n log n)
span T_∞(n)              O(log² n)
parallelism T_1 / T_∞    O(n / log n)
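These bounds follow from the usual fork-join recurrences. A sketch, under the assumption that the merge step contributes O(n) work and O(log n) span:

```latex
% Work: two half-size subproblems plus a linear-work merge
T_1(n) = 2\,T_1(n/2) + O(n) = O(n \log n)

% Span: the two halves run in parallel, so only one recursive term
% remains, plus the span of the merge (assumed O(\log n) here)
T_\infty(n) = T_\infty(n/2) + O(\log n) = O(\log^2 n)

% Parallelism is the ratio of work to span
\frac{T_1(n)}{T_\infty(n)} = O\!\left(\frac{n \log n}{\log^2 n}\right) = O(n / \log n)
```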

With an optimized parallel merge, the span can be reduced closer to O(log n).

Cost Model

Assume p processors.

processors         time
p = 1              O(n log n)
p ≤ n / log n      near-linear speedup
large p            limited by span

Speedup depends on merge efficiency and scheduling overhead.
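The near-linear regime in the table can be read off the standard work-span (Brent-type) bound for greedy schedulers:

```latex
% Greedy scheduling bound on p processors
T_p \le \frac{T_1(n)}{p} + T_\infty(n)

% While p \le T_1 / T_\infty = O(n / \log n), the work term dominates:
T_p = O\!\left(\frac{n \log n}{p}\right) \quad \text{(near-linear speedup)}
```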

Correctness

Each recursive call sorts its subarray correctly. The merge step combines two sorted arrays into a sorted result. Parallel execution does not change data dependencies, so correctness follows from the sequential version.

Practical Considerations

  • Task granularity matters. Very small tasks should run sequentially to avoid overhead.
  • Memory bandwidth becomes the dominant cost for large arrays.
  • Parallel merge requires extra care to avoid contention and false sharing.
  • Work stealing schedulers improve load balancing.

When to Use

Use parallel merge sort when:

  • the dataset is large
  • multiple cores or distributed workers are available
  • stable sorting is required
  • predictable O(n log n) behavior is preferred

For in-memory, single-threaded workloads, optimized sequential sorts such as introsort or timsort are usually faster.

Implementation (Go, simplified)

// Below this size, recurse sequentially: spawning a goroutine per tiny
// subarray costs more than the parallelism it buys.
const seqThreshold = 1 << 12

func ParallelMergeSort(a []int) []int {
	if len(a) <= 1 {
		return a
	}

	mid := len(a) / 2

	if len(a) < seqThreshold {
		return merge(ParallelMergeSort(a[:mid]), ParallelMergeSort(a[mid:]))
	}

	var left []int
	done := make(chan struct{})

	// Sort the left half in a new goroutine; the channel receive below
	// also synchronizes the write to left with its read.
	go func() {
		left = ParallelMergeSort(a[:mid])
		done <- struct{}{}
	}()

	// Sort the right half on the current goroutine rather than spawning
	// a second task.
	right := ParallelMergeSort(a[mid:])

	<-done

	return merge(left, right)
}

// merge combines two sorted slices; taking a[i] on ties keeps the sort stable.
func merge(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0

	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}

	res = append(res, a[i:]...)
	res = append(res, b[j:]...)
	return res
}