# Parallel Quicksort

# Parallel Quicksort

Parallel quicksort follows the same partition based divide and conquer structure as quicksort, but executes recursive calls concurrently. The key idea is that once the partition step finishes, the left and right subarrays become independent and can be processed in parallel.

Compared to parallel merge sort, this method avoids the cost of merging but introduces challenges in load balancing and pivot quality.

## Problem

Given an array $A$ of size $n$, sort it in nondecreasing order using parallel execution.

## Algorithm

Partition the array, then sort both sides concurrently.

```text id="n4kp9p"
parallel_quicksort(A, lo, hi):
    if lo >= hi:
        return

    p = partition(A, lo, hi)

    spawn task:
        parallel_quicksort(A, lo, p - 1)

    spawn task:
        parallel_quicksort(A, p + 1, hi)

    wait for both tasks
```

The partition step rearranges elements such that:

$$
A[lo..p-1] \le A[p] \le A[p+1..hi]
$$

## Partition

Partition remains sequential in most implementations, which can limit scalability.

```text id="v8p2ls"
partition(A, lo, hi):
    pivot = A[hi]
    i = lo

    for j from lo to hi - 1:
        if A[j] <= pivot:
            swap A[i], A[j]
            i += 1

    swap A[i], A[hi]
    return i
```

## Complexity

| measure            | value                 |
| ------------------ | --------------------- |
| work $T_1(n)$      | $O(n \log n)$ average |
| span $T_\infty(n)$ | $O(\log n)$ average   |
| worst case work    | $O(n^2)$              |

Parallelism depends heavily on balanced partitions.

## Cost Model

| condition           | behavior               |
| ------------------- | ---------------------- |
| balanced partitions | high parallelism       |
| skewed partitions   | low parallelism        |
| poor pivot choice   | degrades to sequential |

Expected parallel time with $p$ processors:

$$
O\left(\frac{n \log n}{p} + \log n\right)
$$

## Correctness

Each partition step places the pivot in its final sorted position. Recursive calls sort independent subarrays. Since no cross dependencies exist after partitioning, parallel execution preserves correctness.

## Practical Considerations

* Pivot selection is critical. Randomized or median of three improves balance.
* Small subarrays should switch to sequential sort to reduce overhead.
* In place partitioning gives good cache locality.
* Work stealing schedulers help mitigate imbalance.

## When to Use

Use parallel quicksort when:

* in place sorting is required
* memory usage must remain low
* average case performance is sufficient
* input distribution is not adversarial

For guaranteed worst case bounds or stability, parallel merge sort is more predictable.

## Implementation (Go, simplified)

```go id="h9p4rt"
func ParallelQuicksort(a []int, lo, hi int) {
	if lo >= hi {
		return
	}

	p := partition(a, lo, hi)

	done := make(chan struct{}, 2)

	go func() {
		ParallelQuicksort(a, lo, p-1)
		done <- struct{}{}
	}()

	go func() {
		ParallelQuicksort(a, p+1, hi)
		done <- struct{}{}
	}()

	<-done
	<-done
}

func partition(a []int, lo, hi int) int {
	pivot := a[hi]
	i := lo

	for j := lo; j < hi; j++ {
		if a[j] <= pivot {
			a[i], a[j] = a[j], a[i]
			i++
		}
	}

	a[i], a[hi] = a[hi], a[i]
	return i
}
```

