6.9 Bucket Sort

Bucket sort distributes elements into groups, sorts each group, and then concatenates the results. It relies on an assumption about the input distribution. When elements are spread roughly uniformly over a known range, bucket sort can run in linear time.

Problem

You have an array A[0..n-1] of real numbers in a known interval, typically [0, 1). You want a sorting algorithm that exploits this range and expected uniform distribution.

Solution

Divide the interval into k buckets. Map each element to its bucket, sort each bucket individually, and concatenate the buckets in order.

bucket_sort(A):
  n = length(A)
  buckets = array of n empty lists

  for each x in A:
    index = floor(n * x)
    append x to buckets[index]

  for i = 0 to n - 1:
    insertion_sort(buckets[i])

  return concatenate(buckets[0], ..., buckets[n-1])

The number of buckets is often chosen equal to n, but other choices are possible.

Example

Input:

[0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

Assign to buckets:

bucket 0: [0.12]
bucket 1: [0.17]
bucket 2: [0.21, 0.23, 0.26]
bucket 3: [0.39]
bucket 6: [0.68]
bucket 7: [0.72, 0.78]
bucket 9: [0.94]

Sort each bucket:

bucket 2 → [0.21, 0.23, 0.26]
bucket 7 → [0.72, 0.78]

Concatenate:

[0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]

Bucket Invariant

Each bucket corresponds to a subinterval of the domain. For n buckets over [0,1), bucket i covers:

[i/n, (i+1)/n)

Every element placed in bucket i is less than every element placed in bucket j for i < j. Therefore, once each bucket is sorted internally, concatenating buckets in order produces a globally sorted array.

Correctness

Bucket sort satisfies the permutation condition because each element is placed into exactly one bucket and then copied into the output.

It satisfies the order condition because:

Within each bucket, elements are sorted
Across buckets, all elements in earlier buckets are smaller than all elements in later buckets

This depends on the correctness of the bucket mapping function.

Complexity

Let n be the number of elements and assume uniform distribution.

Distributing elements into buckets:

O(n)

Sorting each bucket:

If elements are evenly distributed, each bucket has expected size O(1), so sorting each bucket takes constant time on average.

Total time:

O(n)

Worst case occurs when all elements fall into one bucket:

O(n^2)

This reduces to the complexity of the internal sorting algorithm.

Space usage is:

O(n)

for storing buckets.

Choosing Buckets

The number and size of buckets determine performance.

more buckets → smaller bucket size, faster sorting inside buckets
fewer buckets → larger buckets, more work per bucket

Choosing n buckets is a common heuristic when input is uniformly distributed.

For general ranges [a, b), map values using:

index = floor(n * (x - a) / (b - a))

Stability

Bucket sort can be stable if:

elements are appended to buckets in input order
the sorting algorithm inside each bucket is stable

Insertion sort is commonly used because it is stable and efficient for small inputs.

When to Use

Bucket sort is effective when:

input is uniformly distributed
range is known and bounded
elements can be mapped to buckets efficiently

It is often used for floating-point numbers, probabilistic data, or pre-normalized values.

Limitations

Bucket sort depends heavily on input distribution. If the distribution is skewed, some buckets become large and performance degrades.

It also requires a mapping function that distributes values evenly. Poor mapping leads to imbalance and loss of performance.

For arbitrary data without distribution guarantees, comparison-based sorting is safer.

Implementation Notes

Buckets are often implemented as dynamic arrays or linked lists.

Using insertion sort inside buckets is efficient because bucket sizes are small on average.

For large datasets, bucket sort can be parallelized. Each bucket can be processed independently after distribution.

Common Bugs

A common bug is computing the bucket index incorrectly, especially at the upper boundary. Values equal to 1 in [0,1) assumptions must be handled carefully.

Another bug is allocating too few buckets, which causes uneven distribution and large bucket sizes.

A third bug is forgetting to sort individual buckets before concatenation.

Takeaway

Bucket sort achieves linear expected time by distributing work across many small groups. Its performance depends on uniform distribution and proper bucket mapping.