Qequ Blog


2026-03-23

There Are a Thousand Ways to Cook an Egg (And Even More Ways to Sort an Array)

Taking a meme sorting algorithm seriously until it accidentally becomes Python's sort.


Eggs and Arrays

They say there are a thousand ways to cook an egg. Scrambled, poached, boiled, fried, baked, sous vide, coddled, shirred — each one a different technique arriving at roughly the same result: cooked protein.

Sorting algorithms are the eggs of computer science. Sorting is the first non-trivial problem most CS students encounter, and like eggs, there's an absurd number of ways to do it. Bubble sort, insertion sort, selection sort, merge sort, quicksort, heapsort, radix sort, counting sort, timsort, introsort, shellsort, tree sort, patience sort, strand sort, bogosort, and the one that started this whole thing: Stalin Sort.

Stalin Sort is a joke. It scans the array left-to-right, keeps elements that are in non-decreasing order, and discards everything else. It "sorts" in O(n) by throwing away your data.

STALIN-SORT(A)
  B ← ⟨A[1]⟩
  for i ← 2 to |A|
      if A[i] >= B[|B|]
          APPEND(B, A[i])
      // else: discard. goodbye comrade.
  return B
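
The joke fits in a few lines of Python (a sketch mirroring the pseudocode; the function name is mine):

```python
def stalin_sort(a):
    """Keep only elements that maintain non-decreasing order; discard the rest."""
    if not a:
        return []
    kept = [a[0]]
    for x in a[1:]:
        if x >= kept[-1]:
            kept.append(x)
        # else: discard. goodbye comrade.
    return kept
```

Linear time, constant fun, unbounded data loss.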

Funny, right? But what if you didn't throw away the data?

The idea

What if instead of sending the comrade numbers to the gulag, we relocated them? Collected them into a new list and applied the filter again? And again on the leftovers? Each pass extracts a sorted subsequence, and the rejects get another chance. No element left behind. Eventually every number belongs to a sorted block. Then you merge them all back together.

A surprisingly simple twist on the joke that actually works as a sorting algorithm.

Sieve Sort

The first version came together quickly. Each pass "sieves" a sorted subsequence out of the remaining elements, like a physical sieve. Apply it repeatedly until nothing's left:

Input:  [38, 27, 43, 3, 9, 82, 10]

Pass 1: keep [38, 43, 82]       reject [27, 3, 9, 10]
Pass 2: keep [27]               reject [3, 9, 10]
Pass 3: keep [3, 9, 10]         reject []

Blocks: [[38, 43, 82], [27], [3, 9, 10]]

Now you have sorted blocks. To merge them, I used a quicksort-style partitioning: pick a pivot block, blocks entirely below go left, entirely above go right, and overlapping blocks get merged into the pivot. Recurse.

SIEVE-SORT(A)
  blocks ← STALIN-DECOMPOSE(A)
  blocks ← QUICKSORT-BLOCKS(blocks)
  return CONCATENATE(blocks)

STALIN-DECOMPOSE(A)
  blocks ← ⟨⟩
  remaining ← A
  while |remaining| > 0
      B ← ⟨remaining[1]⟩
      R ← ⟨⟩
      for i ← 2 to |remaining|
          if remaining[i] >= B[|B|]
              APPEND(B, remaining[i])
          else
              APPEND(R, remaining[i])
      APPEND(blocks, B)
      remaining ← R
  return blocks

QUICKSORT-BLOCKS(blocks)
  if |blocks| <= 1 return blocks
  pivot ← blocks[|blocks|]
  left ← ⟨⟩, right ← ⟨⟩
  for each B in blocks[1..|blocks|-1]
      if OVERLAPS(B, pivot)
          pivot ← MERGE(B, pivot)
      else if B.max <= pivot.min
          APPEND(left, B)
      else
          APPEND(right, B)
  return QUICKSORT-BLOCKS(left)
       + ⟨pivot⟩
       + QUICKSORT-BLOCKS(right)
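
Here's the whole thing as runnable Python (a sketch that follows the pseudocode; the helper names are mine, and OVERLAPS is inlined as a range-intersection check on sorted blocks):

```python
def stalin_decompose(a):
    """Repeatedly sieve a sorted subsequence out of the remaining elements."""
    blocks, remaining = [], list(a)
    while remaining:
        block, rejects = [remaining[0]], []
        for x in remaining[1:]:
            (block if x >= block[-1] else rejects).append(x)
        blocks.append(block)
        remaining = rejects
    return blocks

def merge_two(b1, b2):
    """Standard two-way merge of sorted lists."""
    out, i, j = [], 0, 0
    while i < len(b1) and j < len(b2):
        if b1[i] <= b2[j]:
            out.append(b1[i]); i += 1
        else:
            out.append(b2[j]); j += 1
    return out + b1[i:] + b2[j:]

def quicksort_blocks(blocks):
    """Partition blocks around a pivot block; absorb overlapping blocks into it."""
    if len(blocks) <= 1:
        return blocks
    pivot = blocks[-1]
    left, right = [], []
    for b in blocks[:-1]:
        if b[-1] >= pivot[0] and pivot[-1] >= b[0]:   # value ranges overlap
            pivot = merge_two(b, pivot)
        elif b[-1] <= pivot[0]:                       # entirely below the pivot
            left.append(b)
        else:                                         # entirely above the pivot
            right.append(b)
    return quicksort_blocks(left) + [pivot] + quicksort_blocks(right)

def sieve_sort(a):
    return [x for block in quicksort_blocks(stalin_decompose(a)) for x in block]
```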

Here it is in action:

Sieve Sort animation

It works. It's correct. It's also not great. The multi-pass decomposition is O(n × k) where k is the number of blocks. For random data, k ≈ √n, giving O(n√n). For reverse-sorted data, every element gets its own block: O(n²). The quicksort-blocks merge can also degenerate.

But it's a real sorting algorithm, derived from a joke. That was satisfying enough to keep going.

The optimization treadmill

I started optimizing.

The merge was the bottleneck. Quicksort-blocks with many overlapping blocks degenerates to O(n²). I profiled and confirmed: on random data with 1M elements, the merge phase was eating 98% of the time. Take an array like [5, 1, 8, 3, 9, 6]. The decomposition produces blocks [5, 8, 9], [1, 3, 6]. Their ranges overlap (1-6 vs 5-9), so the quicksort-blocks merge has to merge them into a single block. With random data, almost every block overlaps with every other block, and the pivot keeps absorbing everything.

I replaced it with a smarter approach: sort blocks by their minimum value, group non-overlapping blocks into "chains" (just concatenate them for free), and k-way merge overlapping blocks within each chain using a min-heap.

CHAIN-GROUPED-MERGE(blocks)
  SORT blocks by min value
  chains ← group overlapping blocks
  result ← ⟨⟩
  for each chain C
      if |C| = 1
          APPEND(result, C[1])       // no merge needed
      else
          APPEND(result, K-WAY-MERGE(C))
  return result

For example, given blocks [1,3,5], [2,4,6], [10,12,14], [11,13,15], the grouping detects two chains: {[1,3,5], [2,4,6]} (overlapping ranges 1-5 and 2-6) and {[10,12,14], [11,13,15]} (overlapping ranges 10-14 and 11-15). The two chains don't overlap with each other, so after merging within each chain, you just concatenate them. No wasted comparisons between blocks that can't possibly interact.
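
In Python the chain grouping is short, with heapq.merge doing the k-way work (a sketch; heapq.merge streams a min-heap merge of already-sorted inputs):

```python
import heapq

def chain_grouped_merge(blocks):
    """Sort blocks by min value, group overlapping blocks into chains,
    k-way merge within each chain, concatenate the chains."""
    blocks = sorted(blocks, key=lambda b: b[0])
    result = []
    chain, chain_max = [], None
    for b in blocks:
        if chain and b[0] > chain_max:   # no overlap: flush the current chain
            result.extend(chain[0] if len(chain) == 1 else heapq.merge(*chain))
            chain, chain_max = [], None
        chain.append(b)
        chain_max = b[-1] if chain_max is None else max(chain_max, b[-1])
    if chain:
        result.extend(chain[0] if len(chain) == 1 else heapq.merge(*chain))
    return result
```

Single-block chains are concatenated for free; only genuinely overlapping blocks pay for the heap.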

The multi-pass decomposition was too slow. Scanning the remaining array k times is wasteful. Consider [5, 3, 7, 1, 8, 2]: Pass 1 scans all 6 elements to extract [5, 7, 8]. Pass 2 scans the remaining 3 to extract [3]. Pass 3 scans the remaining 2 to extract [1, 2]. Three full scans. Instead, a single left-to-right pass can detect natural runs: [5] (ascending), [3, 7] (ascending), [1, 8] (ascending), [2]. One scan, four blocks, every element visited exactly once. O(n).

Short runs were creating too many blocks. For random data, natural runs average ~2 elements, creating n/2 blocks. That's 500K blocks for a million elements, and the merge phase has to deal with all of them. I added a minimum run length and extended short runs with insertion sort:

COMPUTE-MINRUN(n)
  // Top 6 bits of n, +1 if lower bits set. Result in [32, 64].
  r ← 0
  while n >= 64
      r ← r OR (n AND 1)
      n ← n / 2
  return n + r

So for n=1M, minrun=62. Any natural run shorter than 62 elements gets padded with the next few elements and sorted with insertion sort. Instead of 500K tiny blocks, you get ~16K blocks of ~62 elements each. Much more manageable.
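
Translated directly to Python:

```python
def compute_minrun(n):
    """Take the top 6 bits of n; add 1 if any lower bit was set.
    Result is in [32, 64] for n >= 64."""
    r = 0
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

For n = 1,000,000 this returns 62: the top 6 bits of one million are 111101 (61), and the shifted-off bits are non-zero, so it rounds up.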

And then I looked at what I'd built and realized something uncomfortable.

Convergence to Timsort

Step by step, I had replaced every component:

Original                   Replaced with
Multi-pass stalin filter   Single-pass consecutive run detection
Quicksort-blocks merge     K-way merge with chain grouping
No minimum block size      Minrun + insertion sort extension
Ascending only             Ascending + descending run detection

I had independently reinvented Timsort, the sorting algorithm behind Python's sorted() and Java's Arrays.sort(). Not exactly, but close enough that the differences were cosmetic.

This isn't a failure. It's actually a cool finding: any algorithm that starts from "detect existing structure and merge" faces the same optimization pressures, and Timsort is a local optimum in that design space. Cache locality pushes you toward consecutive runs. Merge degeneracy pushes you toward balanced merging. Short runs push you toward minrun. Reverse data pushes you toward descending detection. The design space funnels you there.

But there was one idea from my experiments that Timsort doesn't have.

The compression coefficient

I kept thinking about blocks of numbers and what makes a "good" block. Take [1, 2, 3, 4, 5] and [1, 50, 51, 52, 100]. Both are sorted. Both have 5 elements. But the first one is solid. The second one has holes in it, like Swiss cheese. If I'm building blocks for sorting, I want blocks of cheddar, not Swiss. Dense, packed, no air gaps. A block full of closely spaced values is going to cover a narrow range and be less likely to overlap with other blocks during the merge phase.

So I introduced a metric I called the compression coefficient, measuring how evenly distributed the gaps are:

COMPRESSION-COEFFICIENT(B)
  // For sorted block B = [b1, b2, ..., bn]
  avg_gap ← (bn - b1) / (n - 1)
  max_gap ← max of all consecutive gaps
  return avg_gap / max_gap

For [1, 2, 3, 4, 5], all gaps are 1, so avg_gap / max_gap = 1.0. A perfect block of cheddar. For [0, 100, 1000], gaps are [100, 900], so avg_gap = 500, max_gap = 900, giving 0.556. Swiss cheese. For [1, 2, 3, 100], the jump from 3 to 100 dominates: 0.340. Mostly air.
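
As Python (a direct transcription; the guards for single-element and all-equal blocks are my additions):

```python
def compression_coefficient(b):
    """avg_gap / max_gap for a sorted block; 1.0 means perfectly uniform gaps."""
    if len(b) < 2:
        return 1.0                      # a lone element is trivially dense
    gaps = [b[i + 1] - b[i] for i in range(len(b) - 1)]
    max_gap = max(gaps)
    if max_gap == 0:
        return 1.0                      # all elements equal
    avg_gap = (b[-1] - b[0]) / (len(b) - 1)
    return avg_gap / max_gap
```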

When building blocks, don't just check if an element maintains ascending order; also check if it would make the block too holey. [1, 2, 3] should reject 500 because the gap is wildly different from existing gaps. But it should accept 4 because the gaps stay uniform.

The compression check is O(1) incremental. You just track the running max_gap:

COMPRESSION-OK(block, x, threshold)
  gap ← x - block.last
  new_max_gap ← max(block.max_gap, gap)
  new_range ← x - block.first
  new_avg_gap ← new_range / |block|
  return new_avg_gap / new_max_gap >= threshold
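
In Python, with the running max gap passed alongside the block (a sketch; the state layout is mine):

```python
def compression_ok(block, max_gap, x, threshold):
    """O(1) check: would appending x keep the block's gaps uniform enough?

    block is a non-empty sorted list with x >= block[-1]; max_gap is its
    current largest consecutive gap (0 for a single-element block)."""
    gap = x - block[-1]
    new_max_gap = max(max_gap, gap)
    if new_max_gap == 0:
        return True                               # all elements equal so far
    new_avg_gap = (x - block[0]) / len(block)     # len(block) gaps after appending
    return new_avg_gap / new_max_gap >= threshold
```

With a threshold of 0.5, [1, 2, 3] accepts 4 (ratio 1.0) and rejects 500 (ratio ≈ 0.33).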

I looked through the sorting literature to see if anyone had done this before. There's a whole field of "measures of presortedness" (Mannila 1985, Estivill-Castro 1992), formal metrics for how disordered a sequence is: number of inversions, number of runs, oscillation, maximum displacement, encroaching lists. But all of them measure ordering disorder: are elements in the right relative position? Nobody measures value distribution: are the values close together in magnitude?

I also tested alternative metrics to see if something better existed. The inverse coefficient of variation (1 - std_dev/mean of gaps), the Gini coefficient of the gap distribution, Shannon entropy normalized over the gaps, and a simple min_gap / max_gap ratio. They all correlate highly. The Gini coefficient gives almost identical results to avg_gap / max_gap but costs O(n²) to compute (pairwise gap comparisons). Shannon entropy is more theoretically grounded but requires log operations per gap and can't be updated incrementally. The avg_gap / max_gap ratio won out: it's O(1) incremental (just track the running max), requires a single division, and discriminates well enough on the cases that matter.

The connection to information theory is direct. A block with uniform gaps has low entropy: given the first element and the gap pattern, you can predict the next element with high confidence. A block with wildly varying gaps has high entropy: each next element is a surprise. The compression coefficient is a cheap proxy for gap entropy. A "compressed" block in our sense is literally a low-entropy block, one that would compress well under delta encoding.

Gap Sort: the compression-aware algorithm

With that metric as the foundation, I built a new algorithm that preserves the original stalin-sort spirit while being actually competitive:

GAP-SORT(A)
  threshold ← ADAPTIVE-THRESHOLD(A)
  blocks ← DETECT-RUNS-AND-REASSIGN(A, threshold)
  return CHAIN-GROUPED-MERGE(blocks)

ADAPTIVE-THRESHOLD(A)
  // Estimate sortedness, map to threshold via bell curve
  s ← sample adjacent pairs, count fraction in order
  return 0.15 + 0.35 * 4 * s * (1 - s)

DETECT-RUNS-AND-REASSIGN(A, threshold)
  minrun ← COMPUTE-MINRUN(|A|)
  blocks ← ⟨⟩
  growable ← ⟨⟩     // completed blocks that can accept elements

  while elements remain
      // 1. Detect natural ascending or descending run
      run ← extend from current position
      if descending: REVERSE(run)

      // 2. Extend short runs to minrun
      if |run| < minrun
          for each overflow element x:
              // Try to REASSIGN to an existing block
              best ← binary search growable for largest last <= x
              if best exists and COMPRESSION-OK(best, x, threshold)
                  append x to best block  // the stalin-sort move
              else
                  keep x in current run
          INSERTION-SORT(run with kept elements)

      // 3. Register run as growable block
      APPEND(blocks, run)
      APPEND(growable, run)

  return blocks

The key move is in step 2: when an element breaks a run, instead of just starting a new block, we try to find it a home in an existing block, guided by the compression coefficient. This is the original stalin-sort idea surviving: rejected elements get reassigned, not wasted.
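
The reassignment lookup stays cheap if growable is kept sorted by each block's last element. Conveniently, appending x to the block with the largest last <= x preserves that order: the new last is still <= x, and every later block's last is > x. A sketch of just that step (the data layout with parallel lasts and max_gaps lists is mine, not from the original code):

```python
import bisect

def try_reassign(growable, lasts, x, max_gaps, threshold):
    """Append x to the growable block with the largest last element <= x,
    if the compression check passes. growable is sorted by last element;
    lasts mirrors those last elements for bisecting."""
    i = bisect.bisect_right(lasts, x) - 1
    if i < 0:
        return False                         # every block's last exceeds x
    block = growable[i]
    gap = x - block[-1]
    new_max_gap = max(max_gaps[i], gap)
    avg_gap = (x - block[0]) / len(block)
    if new_max_gap == 0 or avg_gap / new_max_gap >= threshold:
        block.append(x)                      # the stalin-sort move, with a home
        lasts[i] = x
        max_gaps[i] = new_max_gap
        return True
    return False                             # x stays in the current run
```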

Gap Sort animation

Is it any good?

Benchmarked in C against qsort on 1M elements:

Scenario         Gap Sort vs qsort
Random           1.3x slower
Nearly sorted    1.5x slower
Reverse          5x faster
Sorted blocks    2.5x faster

It beats qsort on structured data and stays close on random data. It's not going to replace Timsort or pdqsort as a general-purpose sort. Those have years of engineering polish (galloping merge, cache-aware access patterns, branch-free comparisons) that Gap Sort lacks. But it's a real O(n log n) algorithm with a genuinely different approach to block quality.

What I learned

Starting from a joke and following the optimization pressure wherever it leads teaches you something textbooks don't: why algorithms are shaped the way they are. Timsort's design choices aren't arbitrary. They're the inevitable result of optimizing "detect structure and merge." You can arrive at them independently by just trying to make a bad algorithm less bad.

The compression coefficient is the one piece that didn't converge. Whether it's actually useful beyond this algorithm is an open question, but the idea that sorting could benefit from value-distribution statistics, not just ordering statistics, seems worth exploring.

The code is at github.com/qequ/sorting — Sieve Sort and Gap Sort in C, Go, and Python.