Distribution sorting algorithm that approximates uniform distribution and then refines with insertion sort.
Flashsort is a hybrid distribution sorting algorithm that first classifies elements into buckets using an approximate linear mapping, then refines the partially ordered array using insertion sort.
It is designed for datasets that are approximately uniformly distributed, where it can achieve near linear performance.
Problem
Given an array of elements, sort the array efficiently by exploiting approximate distribution information.
Idea
Flashsort has two main phases:
Classification phase Map each element into one of classes using a linear transformation.
Permutation phase Rearrange elements so that each class occupies a contiguous region.
Final refinement Use insertion sort to finish ordering within each region.
Mapping Function
Let:
Each element is mapped to class:
This distributes elements approximately uniformly if the input is uniform.
Algorithm
flashsort(A):
n = length(A)
if n <= 1:
return A
m = int(0.45 * n)
L = [0] * m
min_val = min(A)
max_val = max(A)
if min_val == max_val:
return A
for x in A:
k = int((m - 1) * (x - min_val) / (max_val - min_val))
L[k] += 1
for i in range(1, m):
L[i] += L[i - 1]
i = 0
j = 0
k = m - 1
while i < n:
while i >= L[k]:
i += 1
k = int((m - 1) * (A[i] - min_val) / (max_val - min_val))
flash = A[i]
while i < L[k]:
k = int((m - 1) * (flash - min_val) / (max_val - min_val))
j = L[k] - 1
A[j], flash = flash, A[j]
L[k] -= 1
insertion_sort(A)
return AExample
Sort:
Step 1: classify into buckets using linear mapping Step 2: permute elements into approximate order Step 3: apply insertion sort
Final result:
Correctness
The classification phase partitions the value range into intervals. The permutation phase moves elements into approximate positions so that most elements are close to their final location. The final insertion sort completes ordering.
Complexity
Let:
- be number of elements
Average time:
Worst case:
due to insertion sort if distribution is poor.
Space:
where
Properties
| property | value |
|---|---|
| stable | no |
| in place | partly |
| comparison based | partially |
| distribution sensitive | yes |
When to Use
Use flashsort when:
- data is approximately uniformly distributed
- high performance on large arrays is required
- hybrid approach is acceptable
Avoid when:
- distribution is highly skewed
- worst case guarantees are required
- stability is needed
Notes
- Choice of affects performance
- Typically to
- Final insertion sort works well because array is nearly sorted