Distribute elements into buckets by value range, sort each bucket, then concatenate — achieving linear expected time on uniformly distributed input.
Bucket sort distributes elements into groups, sorts each group, and then concatenates the results. It relies on an assumption about the input distribution. When elements are spread roughly uniformly over a known range, bucket sort can run in linear time.
Problem
You have an array A[0..n-1] of real numbers in a known interval, typically [0, 1). You want a sorting algorithm that exploits this range and expected uniform distribution.
Solution
Divide the interval into k buckets. Map each element to its bucket, sort each bucket individually, and concatenate the buckets in order.
bucket_sort(A):
n = length(A)
buckets = array of n empty lists
for each x in A:
index = floor(n * x)
append x to buckets[index]
for i = 0 to n - 1:
insertion_sort(buckets[i])
return concatenate(buckets[0], ..., buckets[n-1])The number of buckets is often chosen equal to n, but other choices are possible.
Example
Input:
[0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]Assign to buckets:
bucket 0: [0.12]
bucket 1: [0.17]
bucket 2: [0.21, 0.23, 0.26]
bucket 3: [0.39]
bucket 6: [0.68]
bucket 7: [0.72, 0.78]
bucket 9: [0.94]Sort each bucket:
bucket 2 → [0.21, 0.23, 0.26]
bucket 7 → [0.72, 0.78]Concatenate:
[0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]Bucket Invariant
Each bucket corresponds to a subinterval of the domain. For n buckets over [0,1), bucket i covers:
[i/n, (i+1)/n)Every element placed in bucket i is less than every element placed in bucket j for i < j. Therefore, once each bucket is sorted internally, concatenating buckets in order produces a globally sorted array.
Correctness
Bucket sort satisfies the permutation condition because each element is placed into exactly one bucket and then copied into the output.
It satisfies the order condition because:
- Within each bucket, elements are sorted
- Across buckets, all elements in earlier buckets are smaller than all elements in later buckets
This depends on the correctness of the bucket mapping function.
Complexity
Let n be the number of elements and assume uniform distribution.
- Distributing elements into buckets:
O(n)- Sorting each bucket:
If elements are evenly distributed, each bucket has expected size O(1), so sorting each bucket takes constant time on average.
Total time:
O(n)Worst case occurs when all elements fall into one bucket:
O(n^2)This reduces to the complexity of the internal sorting algorithm.
Space usage is:
O(n)for storing buckets.
Choosing Buckets
The number and size of buckets determine performance.
more buckets → smaller bucket size, faster sorting inside buckets
fewer buckets → larger buckets, more work per bucketChoosing n buckets is a common heuristic when input is uniformly distributed.
For general ranges [a, b), map values using:
index = floor(n * (x - a) / (b - a))Stability
Bucket sort can be stable if:
- elements are appended to buckets in input order
- the sorting algorithm inside each bucket is stable
Insertion sort is commonly used because it is stable and efficient for small inputs.
When to Use
Bucket sort is effective when:
input is uniformly distributed
range is known and bounded
elements can be mapped to buckets efficientlyIt is often used for floating-point numbers, probabilistic data, or pre-normalized values.
Limitations
Bucket sort depends heavily on input distribution. If the distribution is skewed, some buckets become large and performance degrades.
It also requires a mapping function that distributes values evenly. Poor mapping leads to imbalance and loss of performance.
For arbitrary data without distribution guarantees, comparison-based sorting is safer.
Implementation Notes
Buckets are often implemented as dynamic arrays or linked lists.
Using insertion sort inside buckets is efficient because bucket sizes are small on average.
For large datasets, bucket sort can be parallelized. Each bucket can be processed independently after distribution.
Common Bugs
A common bug is computing the bucket index incorrectly, especially at the upper boundary. Values equal to 1 in [0,1) assumptions must be handled carefully.
Another bug is allocating too few buckets, which causes uneven distribution and large bucket sizes.
A third bug is forgetting to sort individual buckets before concatenation.
Takeaway
Bucket sort achieves linear expected time by distributing work across many small groups. Its performance depends on uniform distribution and proper bucket mapping.