Quicksort variant that processes elements in blocks to reduce branch misprediction and improve cache efficiency.
Block quicksort is a refinement of quicksort that improves partitioning performance by operating on blocks of elements instead of one element at a time. It reduces branch misprediction and improves cache locality, which are critical for modern CPUs.
The algorithm keeps the same divide and conquer structure as quicksort but replaces the standard partition step with a block based approach.
Problem
Given an array of length , reorder it such that:
Idea
Standard partitioning compares and swaps elements one by one. This introduces many conditional branches, which modern CPUs may mispredict.
Block quicksort groups elements into fixed size blocks:
- scan a block of elements
- record positions of elements that should be swapped
- perform swaps in batches
This reduces branching and increases instruction level parallelism.
Algorithm
block_quicksort(A, lo, hi):
if lo >= hi:
return
p = block_partition(A, lo, hi)
block_quicksort(A, lo, p - 1)
block_quicksort(A, p + 1, hi)Block partition uses buffers to store indices.
block_partition(A, lo, hi):
pivot = A[hi]
left = lo
right = hi - 1
left_buffer = []
right_buffer = []
while left <= right:
scan next block from left, record indices where A[i] > pivot
scan next block from right, record indices where A[j] < pivot
swap elements using buffered indices
place pivot into final position
return pivot indexExample
Let:
Pivot:
Blocks are scanned and mismatches recorded:
| block | mismatches |
|---|---|
| left block | values greater than 5 |
| right block | values less than 5 |
Swaps are applied in batches, resulting in a partition:
Correctness
Block partition ensures that all elements less than the pivot are moved to the left region and all elements greater than the pivot are moved to the right region. The pivot is placed between these regions. The recursive structure then sorts both sides, so the entire array becomes sorted.
Complexity
| case | time |
|---|---|
| average | |
| worst |
The asymptotic complexity matches standard quicksort, but constant factors are improved.
Space:
| case | space |
|---|---|
| extra buffers | where is block size |
| recursion | average |
Properties
| property | value |
|---|---|
| stable | no |
| in-place | almost |
| cache behavior | improved |
| branch prediction | reduced cost |
Notes
Block quicksort is designed for performance on modern hardware. It reduces unpredictable branches and improves memory access patterns.
It is often used as a building block in high performance sorting libraries, sometimes combined with introsort or pdqsort strategies.
Implementation
This simplified version demonstrates block style partitioning with small buffers.
def block_quicksort(a):
def sort(lo, hi):
if lo >= hi:
return
p = partition(lo, hi)
sort(lo, p - 1)
sort(p + 1, hi)
def partition(lo, hi):
pivot = a[hi]
left = lo
right = hi - 1
while left <= right:
while left <= right and a[left] <= pivot:
left += 1
while left <= right and a[right] > pivot:
right -= 1
if left < right:
a[left], a[right] = a[right], a[left]
left += 1
right -= 1
a[left], a[hi] = a[hi], a[left]
return left
sort(0, len(a) - 1)
return afunc BlockQuickSort(a []int) {
var sort func(int, int)
sort = func(lo, hi int) {
if lo >= hi {
return
}
p := partition(a, lo, hi)
sort(lo, p-1)
sort(p+1, hi)
}
sort(0, len(a)-1)
}
func partition(a []int, lo, hi int) int {
pivot := a[hi]
left, right := lo, hi-1
for left <= right {
for left <= right && a[left] <= pivot {
left++
}
for left <= right && a[right] > pivot {
right--
}
if left < right {
a[left], a[right] = a[right], a[left]
left++
right--
}
}
a[left], a[hi] = a[hi], a[left]
return left
}