Skip to content

2.16 Run-Length Encoding

Run-Length Encoding (RLE) compresses sequences by replacing consecutive equal values with a pair `(value, count)`.

Run-Length Encoding (RLE) compresses sequences by replacing consecutive equal values with a pair (value, count). It is effective when the input contains long runs of the same element and serves as a simple pre-processing step for more complex compression or for speeding up certain algorithms.

Problem

Given a sequence, produce its run-length encoding.

Example:

[1, 1, 1, 2, 2, 3, 1, 1]

Encoding:

[(1, 3), (2, 2), (3, 1), (1, 2)]

Encoding

Scan the array once, tracking the current run.

type Run struct {
	Value int
	Count int
}

func RLEncode(a []int) []Run {
	if len(a) == 0 {
		return nil
	}

	var out []Run
	cur := a[0]
	cnt := 1

	for i := 1; i < len(a); i++ {
		if a[i] == cur {
			cnt++
		} else {
			out = append(out, Run{cur, cnt})
			cur = a[i]
			cnt = 1
		}
	}

	out = append(out, Run{cur, cnt})
	return out
}

Invariant

Before each iteration:

cur and cnt describe the run of equal values ending at index i-1
out contains all completed runs before that

When a new value appears, the current run is closed and appended.

Decoding

Reconstruct the original sequence.

func RLDecode(runs []Run) []int {
	var out []int

	for _, r := range runs {
		for i := 0; i < r.Count; i++ {
			out = append(out, r.Value)
		}
	}

	return out
}

Invariant:

After processing runs[0:k], out contains exactly the expansion of those runs

String Encoding

For strings, store (character, count) pairs.

type CharRun struct {
	Ch    byte
	Count int
}

func RLEncodeString(s string) []CharRun {
	if len(s) == 0 {
		return nil
	}

	var out []CharRun
	cur := s[0]
	cnt := 1

	for i := 1; i < len(s); i++ {
		if s[i] == cur {
			cnt++
		} else {
			out = append(out, CharRun{cur, cnt})
			cur = s[i]
			cnt = 1
		}
	}

	out = append(out, CharRun{cur, cnt})
	return out
}

For Unicode-aware encoding, operate on []rune instead of bytes.

Compressed String Form

Sometimes runs are stored as a string.

Example:

"aaabbc" -> "a3b2c1"
func RLECompress(s string) string {
	if len(s) == 0 {
		return ""
	}

	var out []byte
	cur := s[0]
	cnt := 1

	for i := 1; i < len(s); i++ {
		if s[i] == cur {
			cnt++
		} else {
			out = append(out, cur)
			out = append(out, []byte(strconv.Itoa(cnt))...)
			cur = s[i]
			cnt = 1
		}
	}

	out = append(out, cur)
	out = append(out, []byte(strconv.Itoa(cnt))...)

	return string(out)
}

Requires:

import "strconv"

Complexity

Time:  O(n)
Space: O(k)

where k is the number of runs.

In the worst case (all elements different), k = n, so no compression is achieved.

When RLE Is Effective

RLE works well when:

long runs of identical values exist

Examples:

binary images
monochrome data
repeated logs
sorted arrays with duplicates

It performs poorly on random data:

[1, 2, 3, 4, 5] -> [(1,1),(2,1),(3,1),(4,1),(5,1)]

No compression, overhead increases.

Using RLE in Algorithms

RLE can simplify problems by reducing input size.

Example: Counting Groups

Instead of scanning all elements, count runs:

func CountRuns(a []int) int {
	if len(a) == 0 {
		return 0
	}

	count := 1
	for i := 1; i < len(a); i++ {
		if a[i] != a[i-1] {
			count++
		}
	}
	return count
}

Equivalent to len(RLEncode(a)).

Example: Maximum Run Length

func MaxRun(a []int) int {
	if len(a) == 0 {
		return 0
	}

	best := 1
	cur := 1

	for i := 1; i < len(a); i++ {
		if a[i] == a[i-1] {
			cur++
			if cur > best {
				best = cur
			}
		} else {
			cur = 1
		}
	}

	return best
}

In-Place RLE (Overwrite Input)

Some problems require writing compressed output into the same array.

Example: compress a byte slice.

func CompressInPlace(a []byte) int {
	write := 0
	read := 0

	for read < len(a) {
		start := read

		for read < len(a) && a[read] == a[start] {
			read++
		}

		a[write] = a[start]
		write++

		count := read - start
		if count > 1 {
			for _, c := range strconv.Itoa(count) {
				a[write] = byte(c)
				write++
			}
		}
	}

	return write
}

Return value is the new length.

Common Pitfalls

Forgetting to append the final run after the loop.

Mixing byte-based and rune-based processing for strings.

Using RLE when input has no repetition increases size instead of reducing it.

Incorrectly handling single-element runs when encoding to compact string form.

Overflow of count when runs are extremely long.

Design Pattern

RLE uses a simple pattern:

  1. Track current value
  2. Count consecutive occurrences
  3. Emit when value changes

The invariant always describes the current run.

Takeaway

Run-length encoding compresses consecutive duplicates into compact (value, count) pairs. It is simple, linear, and effective when data contains repetition. Even when not used for compression, it provides a useful abstraction for reasoning about contiguous groups.