Run-Length Encoding (RLE) compresses sequences by replacing consecutive equal values with a pair `(value, count)`.
Run-Length Encoding (RLE) compresses sequences by replacing consecutive equal values with a pair (value, count). It is effective when the input contains long runs of the same element and serves as a simple pre-processing step for more complex compression or for speeding up certain algorithms.
Problem
Given a sequence, produce its run-length encoding.
Example:
[1, 1, 1, 2, 2, 3, 1, 1]Encoding:
[(1, 3), (2, 2), (3, 1), (1, 2)]Encoding
Scan the array once, tracking the current run.
type Run struct {
Value int
Count int
}
func RLEncode(a []int) []Run {
if len(a) == 0 {
return nil
}
var out []Run
cur := a[0]
cnt := 1
for i := 1; i < len(a); i++ {
if a[i] == cur {
cnt++
} else {
out = append(out, Run{cur, cnt})
cur = a[i]
cnt = 1
}
}
out = append(out, Run{cur, cnt})
return out
}Invariant
Before each iteration:
cur and cnt describe the run of equal values ending at index i-1
out contains all completed runs before thatWhen a new value appears, the current run is closed and appended.
Decoding
Reconstruct the original sequence.
func RLDecode(runs []Run) []int {
var out []int
for _, r := range runs {
for i := 0; i < r.Count; i++ {
out = append(out, r.Value)
}
}
return out
}Invariant:
After processing runs[0:k], out contains exactly the expansion of those runsString Encoding
For strings, store (character, count) pairs.
type CharRun struct {
Ch byte
Count int
}
func RLEncodeString(s string) []CharRun {
if len(s) == 0 {
return nil
}
var out []CharRun
cur := s[0]
cnt := 1
for i := 1; i < len(s); i++ {
if s[i] == cur {
cnt++
} else {
out = append(out, CharRun{cur, cnt})
cur = s[i]
cnt = 1
}
}
out = append(out, CharRun{cur, cnt})
return out
}For Unicode-aware encoding, operate on []rune instead of bytes.
Compressed String Form
Sometimes runs are stored as a string.
Example:
"aaabbc" -> "a3b2c1"func RLECompress(s string) string {
if len(s) == 0 {
return ""
}
var out []byte
cur := s[0]
cnt := 1
for i := 1; i < len(s); i++ {
if s[i] == cur {
cnt++
} else {
out = append(out, cur)
out = append(out, []byte(strconv.Itoa(cnt))...)
cur = s[i]
cnt = 1
}
}
out = append(out, cur)
out = append(out, []byte(strconv.Itoa(cnt))...)
return string(out)
}Requires:
import "strconv"Complexity
Time: O(n)
Space: O(k)where k is the number of runs.
In the worst case (all elements different), k = n, so no compression is achieved.
When RLE Is Effective
RLE works well when:
long runs of identical values existExamples:
binary images
monochrome data
repeated logs
sorted arrays with duplicatesIt performs poorly on random data:
[1, 2, 3, 4, 5] -> [(1,1),(2,1),(3,1),(4,1),(5,1)]No compression, overhead increases.
Using RLE in Algorithms
RLE can simplify problems by reducing input size.
Example: Counting Groups
Instead of scanning all elements, count runs:
func CountRuns(a []int) int {
if len(a) == 0 {
return 0
}
count := 1
for i := 1; i < len(a); i++ {
if a[i] != a[i-1] {
count++
}
}
return count
}Equivalent to len(RLEncode(a)).
Example: Maximum Run Length
func MaxRun(a []int) int {
if len(a) == 0 {
return 0
}
best := 1
cur := 1
for i := 1; i < len(a); i++ {
if a[i] == a[i-1] {
cur++
if cur > best {
best = cur
}
} else {
cur = 1
}
}
return best
}In-Place RLE (Overwrite Input)
Some problems require writing compressed output into the same array.
Example: compress a byte slice.
func CompressInPlace(a []byte) int {
write := 0
read := 0
for read < len(a) {
start := read
for read < len(a) && a[read] == a[start] {
read++
}
a[write] = a[start]
write++
count := read - start
if count > 1 {
for _, c := range strconv.Itoa(count) {
a[write] = byte(c)
write++
}
}
}
return write
}Return value is the new length.
Common Pitfalls
Forgetting to append the final run after the loop.
Mixing byte-based and rune-based processing for strings.
Using RLE when input has no repetition increases size instead of reducing it.
Incorrectly handling single-element runs when encoding to compact string form.
Overflow of count when runs are extremely long.
Design Pattern
RLE uses a simple pattern:
- Track current value
- Count consecutive occurrences
- Emit when value changes
The invariant always describes the current run.
Takeaway
Run-length encoding compresses consecutive duplicates into compact (value, count) pairs. It is simple, linear, and effective when data contains repetition. Even when not used for compression, it provides a useful abstraction for reasoning about contiguous groups.