Divide the input, sort subarrays in parallel, then merge them using parallel merge procedures.
Parallel merge sort extends merge sort by executing recursive subproblems concurrently. The structure remains divide and conquer, but the recursion tree becomes a parallel computation graph.
You split the array into halves, sort each half in parallel, then merge the results. The merge step can also be parallelized using partitioning techniques.
Problem
Given an array of size n, produce an array of the same elements in nondecreasing order using parallel computation.
Algorithm
The recursion forks at each level. Each subproblem runs independently.
```
parallel_merge_sort(A):
    if length(A) <= 1:
        return A
    split A into halves L and R
    spawn task: L = parallel_merge_sort(L)
    spawn task: R = parallel_merge_sort(R)
    wait for both tasks to finish
    return parallel_merge(L, R)
```

The key difference from sequential merge sort is that the recursive calls run as parallel tasks instead of one after the other.
Parallel Merge
A naive merge remains sequential and becomes the bottleneck. A parallel merge divides work by partitioning one array and locating corresponding positions in the other.
```
parallel_merge(A, B):
    if the inputs are small:
        return sequential_merge(A, B)
    pick the median element of A
    find its position in B using binary search
    split into two independent merges:
        left:  A_left  + B_left
        right: A_right + B_right
    execute both merges in parallel
```

Complexity
Let T1 be the total work (operations across all processors) and T∞ be the span (the longest chain of dependent operations).
| measure | value |
|---|---|
| work T1 | Θ(n log n) |
| span T∞ | Θ(n), dominated by a sequential merge |
| parallelism T1/T∞ | Θ(log n) |
With the parallel merge above, the span shrinks to Θ(log³ n), raising parallelism to Θ(n / log² n); more advanced schemes such as Cole's merge sort reach a Θ(log n) span.
Cost Model
Assume p processors. A greedy or work-stealing scheduler achieves Brent's bound, Tp ≤ T1/p + T∞.

| processors | time |
|---|---|
| small p | Tp ≈ T1/p (near-linear speedup) |
| large p | Tp → T∞ (limited by span) |
Speedup depends on merge efficiency and scheduling overhead.
Correctness
Each recursive call sorts its subarray correctly. The merge step combines two sorted arrays into a sorted result. Parallel execution does not change data dependencies, so correctness follows from the sequential version.
Practical Considerations
- Task granularity matters. Very small tasks should run sequentially to avoid overhead.
- Memory bandwidth becomes the dominant cost for large arrays.
- Parallel merge requires extra care to avoid contention and false sharing.
- Work stealing schedulers improve load balancing.
When to Use
Use parallel merge sort when:
- the dataset is large
- multiple cores or distributed workers are available
- stable sorting is required
- predictable behavior is preferred
For in-memory, single-threaded workloads, optimized sequential sorts such as introsort or Timsort are usually faster.
Implementation (Go, simplified)
```go
// ParallelMergeSort sorts each half in its own goroutine and merges
// the results. The channel receives establish the happens-before
// edges that make the goroutines' writes to left and right visible.
func ParallelMergeSort(a []int) []int {
	if len(a) <= 1 {
		return a
	}
	mid := len(a) / 2
	var left, right []int
	done := make(chan struct{}, 2)
	go func() {
		left = ParallelMergeSort(a[:mid])
		done <- struct{}{}
	}()
	go func() {
		right = ParallelMergeSort(a[mid:])
		done <- struct{}{}
	}()
	<-done
	<-done
	return merge(left, right)
}

// merge combines two sorted slices; taking from a on ties (<=) keeps
// the sort stable.
func merge(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}
	// At most one of these tails is non-empty.
	res = append(res, a[i:]...)
	res = append(res, b[j:]...)
	return res
}
```