# Parallel Merge Sort

Parallel merge sort extends merge sort by executing recursive subproblems concurrently. The structure remains divide and conquer, but the recursion tree becomes a parallel computation graph.

You split the array into halves, sort each half in parallel, then merge the results. The merge step can also be parallelized using partitioning techniques.

## Problem

Given an array $A$ of size $n$, produce a sorted array in nondecreasing order using parallel computation.

## Algorithm

The recursion forks at each level. Each subproblem runs independently.

```text
parallel_merge_sort(A):
    if length(A) <= 1:
        return A

    split A into halves L and R

    spawn task:
        L ← parallel_merge_sort(L)

    spawn task:
        R ← parallel_merge_sort(R)

    wait for both tasks to finish

    return parallel_merge(L, R)
```

The key difference from sequential merge sort is the use of parallel tasks instead of sequential recursive calls.

## Parallel Merge

A naive merge remains sequential and becomes the bottleneck. A parallel merge divides work by partitioning one array and locating corresponding positions in the other.

```text
parallel_merge(A, B):
    if the inputs are small:
        return sequential_merge(A, B)

    pick the median element of A (the longer input)
    find its position in B using binary search

    split into two independent merges:
        left:  A_left  with B_left
        right: A_right with B_right

    execute both merges in parallel
    return concatenation of left and right
```
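The partitioning step above can be sketched in Go. This is an illustrative version (the names `parallelMerge` and `mergeCutoff` are ours, not from any library): it splits the longer slice at its midpoint, binary-searches that element's position in the other slice, and merges the two halves concurrently; the sequential fallback is the same two-pointer merge used in the implementation section below.

```go
import "sort"

// mergeCutoff is the combined size below which we merge sequentially.
const mergeCutoff = 1 << 8

// parallelMerge merges two sorted slices by partitioning.
func parallelMerge(a, b []int) []int {
	if len(a)+len(b) <= mergeCutoff {
		return sequentialMerge(a, b)
	}
	if len(a) < len(b) {
		a, b = b, a // always split the longer slice
	}

	i := len(a) / 2
	j := sort.SearchInts(b, a[i]) // first index with b[j] >= a[i]

	// Everything in a[:i] and b[:j] is <= everything in a[i:] and b[j:],
	// so the two halves can be merged independently.
	var left []int
	done := make(chan struct{})
	go func() {
		left = parallelMerge(a[:i], b[:j])
		close(done)
	}()
	right := parallelMerge(a[i:], b[j:])
	<-done

	return append(left, right...)
}

// sequentialMerge is the usual two-pointer merge of two sorted slices.
func sequentialMerge(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}
	res = append(res, a[i:]...)
	return append(res, b[j:]...)
}
```

Note that swapping `a` and `b` to split the longer slice sacrifices stability for equal keys; a stable variant would track which input each slice came from.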

## Complexity

Let $T_1(n)$ be total work and $T_\infty(n)$ be span.

| measure                      | value           |
| ---------------------------- | --------------- |
| work $T_1(n)$                | $O(n \log n)$   |
| span $T_\infty(n)$           | $O(\log^2 n)$   |
| parallelism $T_1 / T_\infty$ | $O(n / \log n)$ |
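These bounds follow from the fork–join recurrences, under the assumption that the merge step contributes linear work and $O(\log n)$ span:

$$
T_1(n) = 2\,T_1(n/2) + O(n) = O(n \log n)
$$

$$
T_\infty(n) = T_\infty(n/2) + O(\log n) = O(\log^2 n)
$$

The two sorted halves proceed in parallel, so only one of them appears in the span recurrence; the merge cost is paid at every level.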

With an asymptotically optimal merging scheme, such as Cole's pipelined merge sort, the span can be reduced to:

$$
O(\log n)
$$

## Cost Model

Assume $p$ processors.

| processors         | time                |
| ------------------ | ------------------- |
| $p = 1$            | $O(n \log n)$       |
| $p \le n / \log n$ | near linear speedup |
| large $p$          | limited by span     |

Speedup depends on merge efficiency and scheduling overhead.
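The rows of the table are instances of a standard bound for greedy schedulers (Brent's theorem), which relates $p$-processor time to work and span:

$$
T_p \le \frac{T_1}{p} + T_\infty
$$

For $p \le n / \log n$ the work term dominates, giving near-linear speedup; for larger $p$ the span term becomes the floor and extra processors sit idle.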

## Correctness

Each recursive call sorts its subarray correctly. The merge step combines two sorted arrays into a sorted result. Parallel execution does not change data dependencies, so correctness follows from the sequential version.

## Practical Considerations

* Task granularity matters. Very small tasks should run sequentially to avoid overhead.
* Memory bandwidth becomes the dominant cost for large arrays.
* Parallel merge requires extra care to avoid contention and false sharing.
* Work stealing schedulers improve load balancing.
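One common way to control granularity is depth-limited spawning: fork new goroutines only while a fork budget remains, so the task count tracks the core count rather than the input size. A sketch, with illustrative names (`parallelSortDepth`, `initialDepth` are not from any library; `mergeSeq` mirrors the merge from the implementation section):

```go
import "runtime"

// parallelSortDepth forks new goroutines only while depth > 0, so the
// number of concurrent tasks stays near 2^initialDepth regardless of n.
func parallelSortDepth(a []int, depth int) []int {
	if len(a) <= 1 {
		return a
	}
	mid := len(a) / 2
	if depth <= 0 {
		// Fork budget exhausted: plain sequential recursion.
		return mergeSeq(parallelSortDepth(a[:mid], 0), parallelSortDepth(a[mid:], 0))
	}
	var left []int
	done := make(chan struct{})
	go func() {
		left = parallelSortDepth(a[:mid], depth-1)
		close(done)
	}()
	right := parallelSortDepth(a[mid:], depth-1)
	<-done
	return mergeSeq(left, right)
}

// initialDepth picks a fork budget so tasks roughly match available cores.
func initialDepth() int {
	d := 0
	for p := runtime.GOMAXPROCS(0); p > 1; p >>= 1 {
		d++
	}
	return d + 1 // mild oversubscription helps work stealing balance load
}

// mergeSeq is the standard two-pointer merge of two sorted slices.
func mergeSeq(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}
	return append(append(res, a[i:]...), b[j:]...)
}
```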

## When to Use

Use parallel merge sort when:

* the dataset is large
* multiple cores or distributed workers are available
* stable sorting is required
* predictable $O(n \log n)$ behavior is preferred

For in-memory, single-threaded workloads, optimized sequential sorts such as introsort or timsort are usually faster.

## Implementation (Go, simplified)

```go
// sequentialCutoff is the size below which we recurse sequentially;
// spawning goroutines for tiny slices costs more than it saves.
const sequentialCutoff = 1 << 11

func ParallelMergeSort(a []int) []int {
	if len(a) <= 1 {
		return a
	}

	mid := len(a) / 2

	if len(a) < sequentialCutoff {
		return merge(ParallelMergeSort(a[:mid]), ParallelMergeSort(a[mid:]))
	}

	var left []int
	done := make(chan struct{})

	// Sort the left half in a new goroutine...
	go func() {
		left = ParallelMergeSort(a[:mid])
		close(done)
	}()

	// ...and the right half on the current goroutine, so each call
	// spawns one task instead of two.
	right := ParallelMergeSort(a[mid:])

	<-done

	return merge(left, right)
}

// merge combines two sorted slices; ties prefer a, which keeps the sort stable.
func merge(a, b []int) []int {
	res := make([]int, 0, len(a)+len(b))
	i, j := 0, 0

	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			res = append(res, a[i])
			i++
		} else {
			res = append(res, b[j])
			j++
		}
	}

	res = append(res, a[i:]...)
	res = append(res, b[j:]...)
	return res
}
```

