11. Parallel, Distributed, and GPU Sorting
High-throughput sorting: parallel merge and quicksort, GPU bitonic and radix sort, SIMD sorting networks, and distributed partition-based sorts.
7 notes
Sort a block-sized tile on the GPU using shared memory and cooperating threads.
Sort a small set of values inside one CUDA warp using shuffle operations and compare-exchange steps.
Sort data on a GPU by selecting splitters, partitioning into buckets in parallel, then sorting buckets independently.
Sort data on a GPU by recursively merging sorted runs using parallel merge primitives.
Sort fixed-width keys on a GPU using digit passes, parallel histograms, prefix sums, and scatter operations.
Sort data on a GPU using a regular bitonic compare-and-exchange network.
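
As a rough illustration of the bitonic compare-and-exchange network named in the last note, here is a minimal CPU sketch in Python. It is not GPU code and is not taken from the notes themselves. The function name `bitonic_sort` and the power-of-two length requirement are assumptions made for this example. The innermost loop over `i` stands in for what a GPU would run as one thread per element, with a synchronization between stages.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic network.

    Assumption for this sketch: len(values) is a power of two (or zero).
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2  # size of the bitonic sequences being merged in this phase
    while k <= n:
        j = k // 2  # distance between compared elements in this stage
        while j >= 1:
            # On a GPU, each index i would be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:  # visit each pair once
                    # Blocks alternate direction so merged runs stay bitonic.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The sequence of comparisons depends only on the input length, never on the data, which is what makes the network regular and well suited to lockstep execution.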