Bitonic Sort on Hypercube

Sort distributed keys across hypercube processors using dimension based compare and exchange stages.

Bitonic sort on a hypercube maps the bitonic sorting network onto processors connected as a hypercube. Each processor communicates with neighbors that differ in one binary address bit. The fixed communication pattern matches the regular compare and exchange stages of bitonic sort.

This model is useful for parallel sorting theory and for systems whose communication topology resembles a hypercube.

Problem

Given n keys distributed across p = 2^d processors arranged as a d dimensional hypercube, sort all keys in nondecreasing order.

Each processor may hold one key or a block of keys.

Hypercube Model

Each processor has a binary address of length d.

Two processors are neighbors when their addresses differ in exactly one bit.

For processor address i, the neighbor in dimension k is:

i ⊕ 2^k
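Flipping each address bit in turn enumerates all d neighbors. A small sketch (the function name is illustrative):

```python
def hypercube_neighbors(i, d):
    """Return the d neighbors of processor i in a d-dimensional hypercube."""
    return [i ^ (1 << k) for k in range(d)]

# In a 3-dimensional hypercube, processor 5 (binary 101) has neighbors
# 4 (100), 7 (111), and 1 (001), one per dimension.
print(hypercube_neighbors(5, 3))  # → [4, 7, 1]
```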

Algorithm

The algorithm follows the bitonic sorting network. At each stage, processors exchange data with a neighbor along one hypercube dimension.

hypercube_bitonic_sort(P):
    d = log2(number_of_processors)

    for size = 2 to 2^d doubling:
        for stride = size / 2 down to 1 halving:
            parallel for each processor i:
                j = i xor stride

                ascending = (i & size) == 0

                exchange data with processor j

                if should_keep_lower(i, j, ascending):
                    keep smaller half
                else:
                    keep larger half

When each processor holds one key, each exchange is a simple compare and exchange. When each processor holds a block, the two processors merge their blocks and split the merged result.
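The one key per processor case can be simulated sequentially, with list index i standing in for processor i and each swap standing in for a compare and exchange across one hypercube dimension. A sketch under those assumptions:

```python
def bitonic_sort_hypercube(keys):
    """Simulate bitonic sort with one key per hypercube processor.

    keys[i] is the key held by processor i; len(keys) must be a power of two.
    Each swap models the compare-and-exchange that a processor pair
    (i, i ^ stride) performs along one hypercube dimension.
    """
    p = len(keys)
    assert p & (p - 1) == 0, "processor count must be a power of two"

    size = 2
    while size <= p:
        stride = size // 2
        while stride >= 1:
            for i in range(p):
                j = i ^ stride
                if i < j:  # let the lower-addressed processor act once per pair
                    ascending = (i & size) == 0
                    # Swap when the pair violates the current direction.
                    if (keys[i] > keys[j]) == ascending:
                        keys[i], keys[j] = keys[j], keys[i]
            stride //= 2
        size *= 2
    return keys

print(bitonic_sort_hypercube([3, 7, 4, 8, 6, 2, 1, 5]))
```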

Compare Split

For block based sorting, processors exchange sorted local blocks.

compare_split(local, remote, keep_lower):
    merged = merge(local, remote)

    if keep_lower:
        return first half of merged

    return second half of merged

This preserves sorted local blocks after every exchange.

Direction Rule

For a compare between processors i and j:

ascending = (i & size) == 0

if ascending:
    lower processor keeps smaller keys
    higher processor keeps larger keys
else:
    lower processor keeps larger keys
    higher processor keeps smaller keys

This builds bitonic sequences across processor groups, then merges them.
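The direction rule can be tabulated directly. For example, with p = 8 processors during the stage with size = 4, the lower half of the cube sorts ascending and the upper half descending, which leaves a bitonic sequence for the final merge (a small illustrative check):

```python
# Each processor's direction depends only on its address and the stage size.
p, size = 8, 4
directions = ["asc" if (i & size) == 0 else "desc" for i in range(p)]

# Processors 0-3 sort ascending, processors 4-7 descending.
print(directions)
```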

Complexity

Let p = 2^d be the number of processors, with each processor holding m = n/p keys.

measure                               value
communication stages                  O(log^2 p)
local merge per stage                 O(m)
total local work per processor        O(m log^2 p)
communication volume per processor    O(m log^2 p)
depth with one key per processor      O(log^2 p)
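The stage count follows from the two nested loops of the algorithm: for p = 2^d processors there are d values of size, and the stage with size = 2^k performs k exchanges, giving d(d+1)/2 = O(log^2 p) stages in total. A quick check (the helper name is illustrative):

```python
def stage_count(p):
    """Number of compare-and-exchange stages of bitonic sort on p = 2^d processors."""
    d = p.bit_length() - 1  # d = log2(p) for a power of two
    return d * (d + 1) // 2

# Triangular growth: 1, 3, 6, 10 stages for 2, 4, 8, 16 processors.
print([stage_count(p) for p in (2, 4, 8, 16)])  # → [1, 3, 6, 10]
```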

If each processor starts with an unsorted block, add local sorting cost:

O(m log m)

Correctness

The hypercube communication pattern implements the same comparator pairs as the bitonic sorting network. Each compare split places the smaller half of a combined pair on the lower side and the larger half on the upper side, according to the current direction.

The bitonic network is correct, and the block based compare split preserves the same ordering relation at partition granularity. After all stages, processor blocks are internally sorted and globally ordered by processor address.

Practical Considerations

  • Processor count is usually assumed to be a power of two.
  • Communication follows fixed dimensions, which simplifies scheduling.
  • Block based compare split is more practical than one key per processor.
  • Load balance is good when every processor holds the same number of keys.
  • The algorithm performs more communication stages than sample sort.
  • It is mostly used in parallel algorithm theory and topology aware systems.

When to Use

Use bitonic sort on a hypercube when:

  • the machine topology is hypercube like
  • predictable communication is required
  • processor count is a power of two
  • regular synchronization is acceptable
  • the goal is theoretical or educational clarity

For large distributed clusters, sample sort or range partition sort usually communicates less.

Implementation Sketch

def processor_step(i, local, stride, size):
    j = i ^ stride

    # Exchange sorted blocks with the neighbor along this dimension.
    remote = exchange_with(j, local)

    ascending = (i & size) == 0
    lower_processor = i < j

    if ascending:
        keep_lower = lower_processor
    else:
        keep_lower = not lower_processor

    return compare_split(local, remote, keep_lower)

def compare_split(local, remote, keep_lower):
    merged = sorted(local + remote)
    half = len(merged) // 2

    if keep_lower:
        return merged[:half]

    return merged[half:]
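A minimal sequential driver can simulate the full block based algorithm, replacing the network exchange with direct list access. The function name and the in-memory model of the hypercube are illustrative assumptions, not part of the sketch above:

```python
def simulate_block_bitonic(blocks):
    """Simulate block-based bitonic sort on a hypercube.

    blocks[i] is processor i's block of keys; all blocks must have
    equal length and the number of blocks must be a power of two.
    """
    p = len(blocks)
    assert p & (p - 1) == 0, "processor count must be a power of two"
    blocks = [sorted(b) for b in blocks]  # local sort before the network

    size = 2
    while size <= p:
        stride = size // 2
        while stride >= 1:
            new_blocks = list(blocks)
            for i in range(p):
                j = i ^ stride
                ascending = (i & size) == 0
                keep_lower = (i < j) == ascending  # direction rule
                # Compare-split: merge both blocks, keep one half.
                merged = sorted(blocks[i] + blocks[j])
                half = len(merged) // 2
                new_blocks[i] = merged[:half] if keep_lower else merged[half:]
            blocks = new_blocks
            stride //= 2
        size *= 2
    return [k for b in blocks for k in b]

print(simulate_block_bitonic([[5, 1], [9, 3], [2, 8], [7, 4]]))
```

After the final stage the concatenation of blocks in processor address order is fully sorted, matching the correctness argument above.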