Block Quicksort

Block quicksort is a refinement of quicksort that improves partitioning performance by operating on blocks of elements instead of one element at a time. It reduces branch misprediction and improves cache locality, which are critical for modern CPUs.

The algorithm keeps the same divide and conquer structure as quicksort but replaces the standard partition step with a block based approach.

Problem

Given an array $A$ of length $n$ , reorder it such that:

A[0] \le A[1] \le \dots \le A[n-1]

Idea

Standard partitioning compares and swaps elements one by one. This introduces many conditional branches, which modern CPUs may mispredict.

Block quicksort groups elements into fixed size blocks:

scan a block of elements
record positions of elements that should be swapped
perform swaps in batches

This reduces branching and increases instruction level parallelism.

Algorithm

block_quicksort(A, lo, hi):
    if lo >= hi:
        return

    p = block_partition(A, lo, hi)

    block_quicksort(A, lo, p - 1)
    block_quicksort(A, p + 1, hi)

Block partition uses buffers to store indices.

block_partition(A, lo, hi):
    pivot = A[hi]

    left = lo
    right = hi - 1

    left_buffer = []
    right_buffer = []

    while left <= right:
        scan next block from left, record indices where A[i] > pivot
        scan next block from right, record indices where A[j] < pivot

        swap elements using buffered indices

    place pivot into final position
    return pivot index

Example

Let:

A = [7, 2, 5, 1, 9, 3, 8]

Pivot:

p = 5

Blocks are scanned and mismatches recorded:

block	mismatches
left block	values greater than 5
right block	values less than 5

Swaps are applied in batches, resulting in a partition:

[2, 1, 3, 5, 9, 7, 8]

Correctness

Block partition ensures that all elements less than the pivot are moved to the left region and all elements greater than the pivot are moved to the right region. The pivot is placed between these regions. The recursive structure then sorts both sides, so the entire array becomes sorted.

Complexity

case	time
average	$O(n \log n)$
worst	$O(n^2)$

The asymptotic complexity matches standard quicksort, but constant factors are improved.

Space:

case	space
extra buffers	$O(B)$ where $B$ is block size
recursion	$O(\log n)$ average

Properties

property	value
stable	no
in-place	almost
cache behavior	improved
branch prediction	reduced cost

Notes

Block quicksort is designed for performance on modern hardware. It reduces unpredictable branches and improves memory access patterns.

It is often used as a building block in high performance sorting libraries, sometimes combined with introsort or pdqsort strategies.

Implementation

This simplified version demonstrates block style partitioning with small buffers.

def block_quicksort(a):
    def sort(lo, hi):
        if lo >= hi:
            return

        p = partition(lo, hi)
        sort(lo, p - 1)
        sort(p + 1, hi)

    def partition(lo, hi):
        pivot = a[hi]
        left = lo
        right = hi - 1

        while left <= right:
            while left <= right and a[left] <= pivot:
                left += 1
            while left <= right and a[right] > pivot:
                right -= 1
            if left < right:
                a[left], a[right] = a[right], a[left]
                left += 1
                right -= 1

        a[left], a[hi] = a[hi], a[left]
        return left

    sort(0, len(a) - 1)
    return a

func BlockQuickSort(a []int) {
	var sort func(int, int)

	sort = func(lo, hi int) {
		if lo >= hi {
			return
		}

		p := partition(a, lo, hi)
		sort(lo, p-1)
		sort(p+1, hi)
	}

	sort(0, len(a)-1)
}

func partition(a []int, lo, hi int) int {
	pivot := a[hi]
	left, right := lo, hi-1

	for left <= right {
		for left <= right && a[left] <= pivot {
			left++
		}
		for left <= right && a[right] > pivot {
			right--
		}
		if left < right {
			a[left], a[right] = a[right], a[left]
			left++
			right--
		}
	}

	a[left], a[hi] = a[hi], a[left]
	return left
}