11. Parallel, Distributed, and GPU Sorting
High-throughput sorting: parallel merge and quicksort, GPU bitonic and radix sort, SIMD sorting networks, and distributed partition-based sorts.
13 notes
Sort a two-dimensional mesh by alternating row sorts and column sorts until the grid is globally ordered.
Sort keys distributed across hypercube processors using dimension-based compare-exchange stages.
Sort data in parallel by local sorting, regular sampling, global splitter selection, redistribution, and final local merge.
Count key frequencies in parallel, compute prefix sums, then scatter elements into sorted positions.
Sort an array by alternately comparing (and, if needed, swapping) odd-indexed and even-indexed adjacent pairs in parallel.
Sort data by building and merging bitonic sequences through a regular compare-exchange network.
Sort integer keys by processing fixed-width digit groups in parallel using counting and prefix sums.
Choose splitters from samples, partition the input into buckets, sort buckets independently, then concatenate the sorted buckets.
Partition the array around a pivot, then sort partitions concurrently using parallel recursion.
Divide the input, sort subarrays in parallel, then merge them using parallel merge procedures.
Sort data too large for main memory by performing run generation and merging concurrently across multiple processors and disks.
Distribute sorting work across multiple processors to reduce wall-clock time, with analysis of total work, span, communication, and synchronization.
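The mesh sort described above (alternating row and column sorts) is commonly known as shearsort. The sketch below simulates it sequentially in Python; the function name `shearsort` and the snake-like target order (even rows ascending, odd rows descending) are assumptions for illustration. Each row phase and each column phase consists of independent sorts, which is what a real mesh would run in parallel. It uses ceil(log2 r) row-plus-column iterations followed by one final row phase.

```python
import math

def shearsort(grid):
    """Sort an r x c mesh into snake-like order (row-major, odd rows reversed)."""
    g = [list(row) for row in grid]
    rows = len(g)
    if rows == 0:
        return g
    cols = len(g[0])

    def row_phase():
        # Even rows ascending, odd rows descending; each row is independent.
        for r in range(rows):
            g[r].sort(reverse=(r % 2 == 1))

    def column_phase():
        # Every column sorted ascending top to bottom; each column is independent.
        for c in range(cols):
            col = sorted(g[r][c] for r in range(rows))
            for r in range(rows):
                g[r][c] = col[r]

    # Each iteration at least halves the number of unsorted ("dirty") rows.
    for _ in range(math.ceil(math.log2(rows))):
        row_phase()
        column_phase()
    row_phase()  # a final row phase cleans up the last dirty row
    return g
```

Reading the result in snake order (reversing every odd row) yields the sorted sequence.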
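The regular-sampling sort listed above (local sort, regular sampling, splitter selection, redistribution, final merge) can be sketched as a sequential simulation with `p` hypothetical workers. The name `psrs` and the exact sample and splitter positions are assumptions; splitter choice only affects load balance, not correctness.

```python
import bisect
import heapq

def psrs(data, p):
    """Parallel Sorting by Regular Sampling, simulated sequentially with p workers."""
    n = len(data)
    if n == 0:
        return []
    # Step 1: block-partition the input; each worker sorts its block locally.
    blocks = [sorted(data[w * n // p:(w + 1) * n // p]) for w in range(p)]
    # Step 2: each worker takes p regularly spaced samples from its sorted block.
    samples = []
    for b in blocks:
        if b:
            samples.extend(b[s * len(b) // p] for s in range(p))
    # Step 3: sort the gathered samples and pick p - 1 evenly spaced splitters.
    samples.sort()
    splitters = [samples[i * len(samples) // p] for i in range(1, p)]
    # Step 4: each worker cuts its block at the splitters by binary search;
    # piece j of every block is sent to worker j.
    incoming = [[] for _ in range(p)]
    for b in blocks:
        cuts = [0] + [bisect.bisect_right(b, s) for s in splitters] + [len(b)]
        for j in range(p):
            incoming[j].append(b[cuts[j]:cuts[j + 1]])
    # Step 5: each worker merges the sorted pieces it received; concatenating
    # the workers' outputs in order gives the globally sorted result.
    result = []
    for pieces in incoming:
        result.extend(heapq.merge(*pieces))
    return result
```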
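For the parallel counting sort above, a minimal sketch assuming integer keys in `range(k)` and `p` simulated workers, each owning a contiguous chunk. The function name is hypothetical. The key point is the prefix sum taken in key-major, worker-minor order: it gives every worker a private output range per key, so the scatter phase needs no locking and the sort stays stable.

```python
def parallel_counting_sort(keys, k, p):
    """Stable counting sort of integer keys in range(k), simulating p workers."""
    n = len(keys)
    chunks = [(w * n // p, (w + 1) * n // p) for w in range(p)]
    # Phase 1: each worker counts key frequencies in its own chunk.
    hist = [[0] * k for _ in range(p)]
    for w, (lo, hi) in enumerate(chunks):
        for x in keys[lo:hi]:
            hist[w][x] += 1
    # Phase 2: exclusive prefix sum over (key, worker) pairs gives each
    # worker its first output slot for every key value.
    offset = [[0] * k for _ in range(p)]
    total = 0
    for v in range(k):
        for w in range(p):
            offset[w][v] = total
            total += hist[w][v]
    # Phase 3: each worker scatters its elements into disjoint output slots.
    out = [None] * n
    for w, (lo, hi) in enumerate(chunks):
        pos = offset[w][:]
        for x in keys[lo:hi]:
            out[pos[x]] = x
            pos[x] += 1
    return out
```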
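The odd-even transposition sort listed above can be simulated as follows; the function name is an assumption. Within one phase every compare-exchange touches a disjoint pair, so on a linear array of processors all of them run at once, and n phases are enough to sort n elements.

```python
def odd_even_transposition_sort(data):
    """Sequential simulation of odd-even transposition sort."""
    a = list(data)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        start = phase % 2
        # The pairs in one phase are disjoint, so these compare-exchanges are independent.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```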
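The bitonic network described above is often written as two nested loops over stage size `k` and comparison distance `j`. This is a sketch assuming a power-of-two input length (the usual GPU formulation pads otherwise); the name `bitonic_sort` is hypothetical. Every iteration of the innermost loop over `i` is an independent compare-exchange, which is what a GPU kernel would assign to one thread each.

```python
def bitonic_sort(data):
    """Bitonic sorting network for power-of-two lengths, simulated sequentially."""
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(data)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # compare-exchange distance within this stage
            for i in range(n): # all i are independent (one GPU thread each)
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this block's direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```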
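The radix sort above can be sketched as a least-significant-digit sort where each pass is a stable parallel counting sort on one digit group. This assumes non-negative integers below `2**key_bits`; the parameters `digit_bits`, `key_bits`, and `p` (simulated workers) are illustrative choices, not values from the notes. Stability of each pass is what makes the passes compose correctly.

```python
def parallel_radix_sort(keys, digit_bits=8, key_bits=32, p=4):
    """LSD radix sort of non-negative ints < 2**key_bits, simulating p workers per pass."""
    radix = 1 << digit_bits
    mask = radix - 1
    a = list(keys)
    n = len(a)
    chunks = [(w * n // p, (w + 1) * n // p) for w in range(p)]
    for shift in range(0, key_bits, digit_bits):
        # Count: each worker builds a histogram of the current digit in its chunk.
        hist = [[0] * radix for _ in range(p)]
        for w, (lo, hi) in enumerate(chunks):
            for x in a[lo:hi]:
                hist[w][(x >> shift) & mask] += 1
        # Prefix sums in digit-major, worker-minor order keep each pass stable.
        offset = [[0] * radix for _ in range(p)]
        total = 0
        for d in range(radix):
            for w in range(p):
                offset[w][d] = total
                total += hist[w][d]
        # Scatter: each worker writes its elements to disjoint output slots.
        out = [0] * n
        for w, (lo, hi) in enumerate(chunks):
            pos = offset[w]
            for x in a[lo:hi]:
                d = (x >> shift) & mask
                out[pos[d]] = x
                pos[d] += 1
        a = out
    return a
```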
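The sample sort above differs from the regular-sampling variant mainly in how splitters are drawn. This sketch assumes random oversampling; the `oversample` factor, the fixed `seed`, and the function name are hypothetical choices. Each bucket is sorted independently, so in a real system every bucket would go to a different processor.

```python
import bisect
import random

def sample_sort(data, p, oversample=8, seed=0):
    """Sample sort with p buckets, simulated sequentially."""
    if len(data) <= p:
        return sorted(data)
    rng = random.Random(seed)
    # Draw p * oversample random samples and take every oversample-th as a splitter.
    sample = sorted(rng.choice(data) for _ in range(p * oversample))
    splitters = [sample[i * oversample] for i in range(1, p)]
    # Partition: bucket j holds keys x with splitters[j-1] <= x < splitters[j].
    buckets = [[] for _ in range(p)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # Sort each bucket independently, then concatenate in bucket order.
    result = []
    for b in buckets:
        result.extend(sorted(b))
    return result
```

Oversampling makes the splitters track the key distribution more closely, which keeps bucket sizes balanced with high probability.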
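The parallel quicksort above follows a fork-join pattern: partition, then sort the two sides concurrently. This sketch uses `threading.Thread` to show that structure, with a hypothetical `max_depth` cap on how many levels fork new threads, and a three-way partition so repeated keys do not degrade recursion. In CPython the global interpreter lock prevents these threads from speeding up pure-Python work, so treat this as a structural illustration rather than a performance recipe.

```python
import threading

def _partition3(a, lo, hi):
    """Three-way partition of a[lo..hi] around the middle element."""
    pivot = a[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt  # a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot

def _quicksort(a, lo, hi, depth):
    if lo >= hi:
        return
    lt, gt = _partition3(a, lo, hi)
    if depth > 0:
        # Fork: sort the left side in a new thread while this thread sorts the right.
        t = threading.Thread(target=_quicksort, args=(a, lo, lt - 1, depth - 1))
        t.start()
        _quicksort(a, gt + 1, hi, depth - 1)
        t.join()  # Join: wait for the sibling task before returning.
    else:
        _quicksort(a, lo, lt - 1, 0)
        _quicksort(a, gt + 1, hi, 0)

def parallel_quicksort(data, max_depth=3):
    a = list(data)
    _quicksort(a, 0, len(a) - 1, max_depth)
    return a
```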
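For the parallel merge sort above, the merge itself can be parallelized by cutting the output into `p` equal segments and using binary search ("co-ranking") to find which slices of the two inputs feed each segment. The helper names `co_rank` and `parallel_merge` are hypothetical, and segments are merged one after another here; each segment, and the two recursive sorts, are independent and could run on separate workers.

```python
import heapq

def co_rank(k, a, b):
    """Return (i, j), i + j = k, such that a[:i] and b[:j] form the first k merged outputs."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        # If b[j-1] < a[i], a[i] is not among the first k outputs: i is large enough.
        if j == 0 or i == len(a) or b[j - 1] < a[i]:
            hi = i
        else:
            lo = i + 1
    return lo, k - lo

def parallel_merge(a, b, p):
    n = len(a) + len(b)
    bounds = [co_rank(t * n // p, a, b) for t in range(p + 1)]
    out = []
    for t in range(p):  # each output segment is an independent task
        (i0, j0), (i1, j1) = bounds[t], bounds[t + 1]
        out.extend(heapq.merge(a[i0:i1], b[j0:j1]))
    return out

def parallel_merge_sort(data, p=4):
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    # The two recursive calls are independent and could run concurrently.
    left = parallel_merge_sort(data[:mid], p)
    right = parallel_merge_sort(data[mid:], p)
    return parallel_merge(left, right, p)
```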
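The work and span analysis mentioned in the last note is commonly summarized by the greedy-scheduling (Brent) bound below, where W is total work, S is span (critical-path length), and T_p is running time on p processors. Communication and synchronization costs are not modeled here and must be added separately for distributed sorts. As a worked case, odd-even transposition sort with one element per processor has n phases of constant cost, so S = Θ(n) and W = Θ(n²), which is not work-efficient compared with sequential Θ(n log n) sorting.

```latex
T_1 = W, \qquad T_\infty = S, \qquad
\max\!\left(\frac{W}{p},\, S\right) \;\le\; T_p \;\le\; \frac{W}{p} + S,
\qquad \text{parallelism} = \frac{W}{S}

\text{Odd-even transposition: } S = \Theta(n), \quad W = n \cdot \Theta(n) = \Theta(n^2)
```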