Skip to content

Chapter 2. Arrays and Strings

This chapter studies linear data as the primary substrate for algorithm design.

This chapter studies linear data as the primary substrate for algorithm design. Arrays and strings appear simple, but most high performance solutions reduce to controlled traversal, local state maintenance, and careful boundary handling on these structures. The objective here is to make these operations predictable and composable.

An array is treated as a contiguous block of memory with constant time indexed access. A string is treated as a sequence with additional structural constraints, often over a fixed alphabet. The distinction matters when operations depend on encoding, comparison cost, or normalization rules. The chapter begins by fixing a precise model for indexing, slicing, and mutation, since many errors originate from implicit assumptions about these operations.

A central theme is transforming global problems into local invariants. Techniques such as prefix sums, difference arrays, and sliding windows convert repeated recomputation into incremental updates. These transformations reduce time complexity by preserving a compact summary of past computation. The chapter emphasizes how to define and maintain these summaries so that correctness follows from simple invariants.

Two pointer techniques and partitioning schemes are introduced as controlled ways to scan data with minimal overhead. These methods rely on monotonic movement of indices and clear loop invariants. When applied correctly, they avoid nested loops and reduce quadratic behavior to linear time. The discussion focuses on when such monotonicity exists and how to enforce it.

String processing introduces additional structure. Substring search, pattern matching, and hashing rely on representing segments of a string in a form that supports fast comparison. Rolling hash techniques provide constant time substring comparison after linear preprocessing, at the cost of probabilistic correctness. The chapter makes these trade offs explicit and shows how to control collision risk.

Another recurring concern is in-place transformation. Many array problems can be solved without additional memory by reusing the input structure. This requires careful ordering of operations and explicit reasoning about overwritten data. The chapter presents standard patterns for safe in-place updates, including stable partitioning and cyclic replacement.

Edge cases receive systematic treatment. Empty inputs, single element ranges, duplicate values, and boundary indices are handled as part of the algorithm design rather than as afterthoughts. Each technique is accompanied by a minimal set of conditions that must hold for correctness, and these conditions are checked explicitly.

Performance considerations are tied to memory behavior. Sequential access patterns exploit cache locality, while scattered access patterns degrade performance. The chapter highlights how algorithm design interacts with hardware constraints, especially for large inputs.

By the end of this chapter, you should be able to recognize when a problem over arrays or strings can be reduced to a small set of patterns: prefix accumulation, window maintenance, pointer scanning, or hashing. You should also be able to implement these patterns with clear invariants, correct boundary handling, and predictable performance.

2.1 Array TraversalArray traversal is the base operation for all algorithms over linear data.
4 min
2.2 Prefix SumsPrefix sums are a preprocessing technique for answering repeated range sum queries on an array.
5 min
2.3 Difference ArraysA difference array is the inverse pattern of a prefix sum.
6 min
2.4 Two PointersThe two pointers technique uses two indices that move through an array or string in a controlled way.
7 min
2.5 Sliding WindowsSliding windows maintain a contiguous subarray `[l, r)` while both endpoints move forward.
5 min
2.6 PartitioningPartitioning rearranges an array so that elements are grouped by a predicate.
6 min
2.7 RotationArray rotation moves elements by a fixed offset while preserving their relative circular order.
5 min
2.8 In-Place ModificationIn-place modification changes an array without allocating another array of the same size.
5 min
2.9 DeduplicationDeduplication removes repeated values while preserving a chosen notion of identity and, optionally, order.
4 min
2.10 String ScanningString scanning is the basic operation behind parsing, tokenization, validation, search, and text normalization.
6 min
2.11 TokenizationTokenization converts a string into a sequence of meaningful units called tokens.
6 min
2.12 Substring SearchSubstring search locates occurrences of a pattern `p` inside a text `s`.
4 min
2.13 PalindromesA palindrome is a sequence that reads the same forward and backward.
5 min
2.14 AnagramsAnagrams are strings or sequences that contain the same elements with the same multiplicities, possibly in different order.
5 min
2.15 Frequency TablesA frequency table records how many times each value appears in an array, string, or stream.
5 min
2.16 Run-Length EncodingRun-Length Encoding (RLE) compresses sequences by replacing consecutive equal values with a pair `(value, count)`.
4 min
2.17 Rolling HashesRolling hashes assign numeric fingerprints to substrings so that many substring comparisons can be done quickly.
5 min
2.18 String ComparisonString comparison determines the ordering or equality of two strings.
5 min
2.19 Parsing ExpressionsParsing expressions converts a sequence of tokens into a structured form that reflects operator precedence and associativity.
5 min
2.20 TriesA trie is a tree structure for storing a set of strings so that common prefixes are shared.
4 min
2.21 Matrix TraversalMatrix traversal processes a two-dimensional array in a defined order.
5 min
2.22 Spiral TraversalSpiral traversal visits a matrix layer by layer, moving right across the top row, down the right column, left across the bottom row, and up the left column,...
3 min
2.23 Flood FillFlood fill explores a connected region in a grid starting from a seed cell and marks or transforms all cells that belong to the same region.
4 min
2.24 Common PatternsArray and string problems often look different on the surface, but many reduce to a small number of reusable patterns.
5 min
2.25 Boundary ConditionsBoundary conditions define the valid domain of indices, ranges, and states in an algorithm.
4 min