Coding theory studies how to represent information so that errors can be detected or corrected.
A message may be transmitted through a noisy channel, stored on an unreliable medium, or processed by a system where bits may be lost or changed. Coding theory adds controlled redundancy to the message so that the original information can still be recovered.
Linear algebra enters coding theory through vector spaces over finite fields. A linear code is a subspace. A generator matrix encodes messages. A parity-check matrix detects errors. Syndrome decoding turns error correction into a problem about cosets and linear equations.
123.1 Messages and Codewords
A message is the original information to be protected.
A codeword is the encoded form of that message.
For example, a binary message may be a short vector such as m = (1, 0, 1).
A code may encode it as a longer vector, such as c = (1, 0, 1, 0, 1, 1), obtained by appending check symbols.
The extra coordinates are redundant. They are added so that errors can be detected or corrected.
The general pattern is:

    message → encode → codeword → noisy channel → received word → decode → recovered message

If the codeword is changed during transmission, the receiver tries to infer the original codeword.
123.2 Alphabets and Finite Fields
Coding theory usually works over a finite alphabet.
The most common case is the binary alphabet {0, 1}.
Binary arithmetic is performed in the finite field F_2 = {0, 1}.
In F_2,

    1 + 1 = 0.

This means addition is the same as exclusive-or.
More generally, codes may be defined over a finite field F_q,
where q is a prime power.
A word of length n over F_q is a vector in F_q^n.
Thus coding theory studies selected subsets or subspaces of finite-dimensional vector spaces over finite fields.
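The arithmetic above can be sketched in a few lines of Python; addition in F_2 is just addition mod 2, which coincides with exclusive-or:

```python
# Addition in GF(2) is addition mod 2, which is the same as XOR.
def gf2_add(a: int, b: int) -> int:
    return (a + b) % 2

# Vector addition over GF(2): componentwise mod-2 addition.
def gf2_vec_add(x, y):
    return [(a + b) % 2 for a, b in zip(x, y)]

print(gf2_add(1, 1))                        # 0, since 1 + 1 = 0 in GF(2)
print(gf2_vec_add([1, 0, 1], [1, 1, 0]))    # [0, 1, 1]
```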
123.3 Block Codes
A block code of length n over F_q is a subset

    C ⊆ F_q^n.

The elements of C are codewords.
If the code contains M codewords, then it can represent M possible messages.
A code is often described by parameters such as (n, M, d).
Here:

| Symbol | Meaning |
|---|---|
| n | Codeword length |
| M | Number of codewords |
| d | Minimum distance |
When the code is linear, the parameters are usually written as [n, k, d].
Here k is the dimension of the code as a vector space over F_q.
123.4 Hamming Distance
The Hamming distance between two words is the number of positions in which they differ.
For x = (x_1, ..., x_n) and y = (y_1, ..., y_n),
the Hamming distance is

    d(x, y) = |{ i : x_i ≠ y_i }|.

For example, over F_2, take x = (1, 0, 1, 1, 0) and y = (1, 1, 1, 0, 0).
They differ in positions 2 and 4, so d(x, y) = 2.
Hamming distance measures how many symbol errors are needed to change one word into another.
123.5 Hamming Weight
The Hamming weight of a word is the number of nonzero coordinates.
It is written as wt(x).
For binary vectors, this is the number of ones.
For example, x = (0, 1, 1, 0, 1)
has weight wt(x) = 3.
Over a field,

    d(x, y) = wt(x − y).

Thus distance is the weight of a difference vector.
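A small Python sketch of distance and weight, using illustrative vectors, also checks the identity d(x, y) = wt(x − y):

```python
def hamming_distance(x, y):
    # Number of coordinates where the two words differ.
    return sum(1 for a, b in zip(x, y) if a != b)

def hamming_weight(x):
    # Number of nonzero coordinates.
    return sum(1 for a in x if a != 0)

x = [1, 0, 1, 1, 0]
y = [1, 1, 1, 0, 0]
diff = [(a - b) % 2 for a, b in zip(x, y)]   # x - y over GF(2)
print(hamming_distance(x, y))                # 2
print(hamming_weight(diff))                  # 2: distance = weight of difference
```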
123.6 Minimum Distance
The minimum distance of a code C is

    d(C) = min { d(x, y) : x, y ∈ C, x ≠ y }.
It is the smallest distance between distinct codewords.
Minimum distance controls error detection and correction.
If codewords are far apart, then a received word with a small number of errors is still closer to the correct codeword than to any other codeword.
A code with minimum distance d can detect up to

    d − 1

errors.
It can correct up to

    t = ⌊(d − 1)/2⌋

errors by nearest-codeword decoding.
123.7 Linear Codes
A code C is linear if it is a subspace of F_q^n.
This means:

    x + y ∈ C   and   λx ∈ C

for all x, y ∈ C and all λ ∈ F_q.
If C has dimension k, then

    |C| = q^k.

A linear code of length n, dimension k, and minimum distance d is written as [n, k, d].
Linear codes are useful because their structure can be described by matrices. This makes encoding, checking, and decoding computationally efficient.
123.8 Minimum Distance of a Linear Code
For a linear code, the minimum distance is equal to the minimum nonzero weight:

    d(C) = min { wt(c) : c ∈ C, c ≠ 0 }.

The proof is short.
Since C is linear, if x, y ∈ C with x ≠ y, then x − y ∈ C and x − y ≠ 0.
Therefore

    d(x, y) = wt(x − y)

is the weight of a nonzero codeword.
Thus the closest pair of distinct codewords is determined by the lightest nonzero codeword.
This is a major simplification.
123.9 Generator Matrices
A generator matrix for a linear [n, k] code is a k × n matrix G whose rows form a basis of the code.
A message vector

    m ∈ F_q^k

is encoded by

    c = mG.

The resulting vector is a codeword.
The set of all codewords is

    C = { mG : m ∈ F_q^k }.

Thus the code is the row space of G.
A generator matrix is not unique. Any row-equivalent matrix with the same row space generates the same code.
123.10 Example of Encoding
Let

    G = [ 1 0 1 ]
        [ 0 1 1 ]

over F_2.
Let the message be m = (1, 1).
Then

    c = mG.

Compute over F_2:

    c = (1·1 + 1·0, 1·0 + 1·1, 1·1 + 1·1) = (1, 1, 0).

The third coordinate is the parity sum 1 + 1 = 0 in F_2.
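The encoding step c = mG can be sketched in Python; the 2 × 3 generator matrix below is an illustrative choice, not fixed by the text:

```python
def encode(m, G):
    # c = m G over GF(2): each codeword coordinate is a mod-2 dot product
    # of the message with a column of G.
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(m))) % 2 for j in range(n)]

# Illustrative generator matrix of a [3, 2] binary code.
G = [[1, 0, 1],
     [0, 1, 1]]

print(encode([1, 1], G))   # [1, 1, 0]
```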
123.11 Systematic Form
A generator matrix is in systematic form if it has the shape

    G = [ I_k | A ].

Here I_k is the k × k identity matrix.
If

    m ∈ F_q^k,

then

    mG = (m, mA).

The original message appears directly in the first k coordinates. The remaining n − k coordinates are parity symbols.
Systematic form is convenient because encoding is transparent.
The codeword contains the message plus redundancy.
123.12 Parity-Check Matrices
A parity-check matrix for a linear code C is a matrix H such that

    C = { x ∈ F_q^n : H x^T = 0 }.

Equivalently, a word is a codeword exactly when it satisfies all parity-check equations.
If C has dimension k, then H has n − k independent rows.
The rows of H define linear constraints on codewords: each row is one parity-check equation satisfied by every codeword.
123.13 Generator and Parity-Check Matrices
If a generator matrix is in systematic form

    G = [ I_k | A ],

then a parity-check matrix is

    H = [ −A^T | I_{n−k} ].

Over F_2, the minus sign disappears because −1 = 1.
The relation between G and H is

    G H^T = 0.

This means every row of G is orthogonal to every row of H.
Equivalently, every generated codeword satisfies all parity checks.
123.14 Example of a Parity-Check Matrix
Let

    C = { mG : m ∈ F_2^2 },

where

    G = [ I_2 | A ] = [ 1 0 1 ]
                      [ 0 1 1 ]

Over F_2,

    H = [ A^T | I_1 ] = [ 1 1 1 ].

Check that

    G H^T = 0:

each row of G shares an even number of ones with (1, 1, 1).
Thus H defines the same code by parity constraints.
A vector x = (x_1, x_2, x_3) is a codeword if and only if

    x_1 + x_2 + x_3 = 0.
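Membership testing by parity checks can be sketched as follows; the one-row H here, enforcing a single parity equation on three bits, is an illustrative assumption:

```python
def syndrome(H, x):
    # Computes H x^T over GF(2); the result is the zero vector
    # exactly when x satisfies every parity-check equation.
    return [sum(h * a for h, a in zip(row, x)) % 2 for row in H]

H = [[1, 1, 1]]                  # single parity check: x1 + x2 + x3 = 0
print(syndrome(H, [1, 1, 0]))    # [0]: a codeword
print(syndrome(H, [1, 1, 1]))    # [1]: not a codeword
```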
123.15 Error Detection
Suppose a codeword c is transmitted.
During transmission, an error vector e is added:

    r = c + e.

The receiver observes r.
If r is not a codeword, then an error is detected.
Using a parity-check matrix,

    H r^T = H c^T + H e^T.

Since H c^T = 0,

    H r^T = H e^T.

Therefore the check depends only on the error.
If

    H r^T ≠ 0,

then an error has definitely occurred.
If

    H r^T = 0,

then r is a codeword. Either no error occurred, or the error changed one codeword into another.
123.16 Syndrome
The syndrome of a received word r is

    s = H r^T.

The syndrome is zero exactly when r satisfies all parity checks.
Since

    r = c + e,

we have

    s = H r^T = H c^T + H e^T = H e^T.

Thus the syndrome depends only on the error vector, not on the transmitted codeword.
This is the key fact behind syndrome decoding: a large coset search is replaced by computing the syndrome and matching it to a stored error representative.
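A quick Python check of this fact, with an illustrative one-row parity-check matrix:

```python
def syndrome(H, x):
    # H x^T over GF(2).
    return [sum(h * a for h, a in zip(row, x)) % 2 for row in H]

H = [[1, 1, 1]]                            # illustrative parity check
c = [1, 1, 0]                              # a codeword: 1 + 1 + 0 = 0
e = [0, 0, 1]                              # single-bit error
r = [(a + b) % 2 for a, b in zip(c, e)]    # received word r = c + e

# The syndrome of r equals the syndrome of the error alone.
print(syndrome(H, r) == syndrome(H, e))    # True
```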
123.17 Syndrome Decoding
Syndrome decoding uses a table that maps syndromes to likely error vectors.
The procedure is:
| Step | Operation |
|---|---|
| 1 | Receive r |
| 2 | Compute the syndrome s = H r^T |
| 3 | Look up a likely error e with H e^T = s |
| 4 | Decode as c = r − e |
Over F_2, subtraction equals addition, so r − e = r + e.
Usually the table stores the lowest-weight error vector for each syndrome.
This gives nearest-codeword decoding when the number of errors is within the correction capability of the code.
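The four steps above can be sketched for the binary repetition code of length 3, a small example with only four syndromes:

```python
def syndrome(H, x):
    return tuple(sum(h * a for h, a in zip(row, x)) % 2 for row in H)

# Parity-check matrix of the length-3 repetition code:
# checks x1 + x2 = 0 and x2 + x3 = 0.
H = [[1, 1, 0],
     [0, 1, 1]]

# Step 3's table: each syndrome maps to its lowest-weight error pattern.
table = {(0, 0): [0, 0, 0],
         (1, 0): [1, 0, 0],
         (1, 1): [0, 1, 0],
         (0, 1): [0, 0, 1]}

def decode(r):
    e = table[syndrome(H, r)]
    return [(a + b) % 2 for a, b in zip(r, e)]   # r - e = r + e over GF(2)

print(decode([1, 0, 1]))   # [1, 1, 1]
```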
123.18 Cosets
Let C be a linear code in F_q^n.
For any vector x ∈ F_q^n, the set

    x + C = { x + c : c ∈ C }

is a coset of C.
All vectors in the same coset have the same syndrome.
Indeed, if y = x + c with c ∈ C,
then

    H y^T = H x^T + H c^T = H x^T,

so x and y have the same syndrome.
Thus syndromes label cosets.
Syndrome decoding chooses a likely error representative, often called a coset leader, for each coset.
123.19 Coset Leaders
A coset leader is a vector of minimum weight in a coset.
In decoding, the coset of the received word represents all possible error patterns with the same syndrome.
The coset leader is chosen as the most likely error when errors are assumed to be rare and independent.
If the received word is r = c + e,
and the true error e is the coset leader of its syndrome class, then syndrome decoding recovers c.
This is the algebraic form of nearest-neighbor decoding.
123.20 The Repetition Code
The binary repetition code of length n encodes one bit by repeating it n times.
For n = 3,

    0 ↦ (0, 0, 0),   1 ↦ (1, 1, 1).

The code is

    C = { (0, 0, 0), (1, 1, 1) }.

It is a linear [3, 1, 3] code.
A generator matrix is

    G = [ 1 1 1 ].

A parity-check matrix is

    H = [ 1 1 0 ]
        [ 0 1 1 ]

The minimum distance is 3, so the code can correct one error.
For example, if (1, 0, 1) is received, majority vote decodes it as (1, 1, 1).
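Majority-vote decoding is a one-liner; this sketch assumes an odd code length so that ties cannot occur:

```python
def decode_repetition(r):
    # Majority vote: decode to the bit that appears in more than half
    # the positions (assumes odd length, so no ties).
    return 1 if sum(r) > len(r) / 2 else 0

print(decode_repetition([1, 0, 1]))   # 1
print(decode_repetition([0, 0, 1]))   # 0
```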
123.21 Single Parity-Check Code
The binary single parity-check code of length n consists of all vectors with even weight:

    C = { x ∈ F_2^n : x_1 + x_2 + ⋯ + x_n = 0 }.

A parity-check matrix is

    H = [ 1 1 ⋯ 1 ].

This is an [n, n − 1, 2] code.
It can detect one error, because a single bit flip changes parity.
It cannot correct one error, because the syndrome only says that parity is wrong. It does not identify which bit changed.
123.22 Hamming Codes
Hamming codes are classical linear error-correcting codes.
A binary Hamming code with r parity checks has length

    n = 2^r − 1

and dimension

    k = 2^r − 1 − r.

Its parity-check matrix contains all nonzero binary column vectors of length r.
For r = 3, the code is a [7, 4, 3] code.
Its minimum distance is 3, so it corrects one error.
The syndrome of a received word identifies the column of H corresponding to the error position.
This makes decoding especially simple.
123.23 The [7, 4] Hamming Code
One parity-check matrix for the [7, 4] Hamming code is

    H = [ 0 0 0 1 1 1 1 ]
        [ 0 1 1 0 0 1 1 ]
        [ 1 0 1 0 1 0 1 ]

The columns are the nonzero binary triples; column i is the binary representation of i.
If a single error occurs in position i, then e
is the unit vector with a 1 in position i, and

    H e^T

is column i of H.
Thus the syndrome directly gives the error location.
After locating the error, flip that bit.
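A sketch of this decoder; it assumes the column ordering in which column i of H is the binary representation of i, so the syndrome read as a binary number is directly the error position:

```python
# Parity-check matrix of the [7, 4] Hamming code, with column i equal to
# the binary representation of i (an assumed but common ordering).
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def correct_single_error(r):
    # Syndrome bits, most significant first.
    s = [sum(h * a for h, a in zip(row, r)) % 2 for row in H]
    pos = 4 * s[0] + 2 * s[1] + s[2]   # syndrome as a number = error position
    r = r[:]
    if pos:                            # nonzero syndrome: flip the located bit
        r[pos - 1] ^= 1
    return r

c = [1, 1, 1, 0, 0, 0, 0]              # a codeword: H c^T = 0
r = [1, 1, 1, 0, 1, 0, 0]              # c with an error in position 5
print(correct_single_error(r))         # [1, 1, 1, 0, 0, 0, 0]
```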
123.24 Dual Codes
The dual code of a linear code C ⊆ F_q^n is

    C⊥ = { x ∈ F_q^n : x · c = 0 for all c ∈ C }.

The dual code is also a linear code.
If C has dimension k, then
C⊥ has dimension n − k.
A parity-check matrix for C is a generator matrix for C⊥, and vice versa.
123.25 Standard Array
A standard array organizes all vectors in F_q^n into cosets of C.
The first row is the code C.
Each later row is a coset

    e_i + C,

where e_i is chosen as a coset leader.
Decoding finds the received word in the array and subtracts the row leader.
The standard array is useful conceptually but often too large for practical decoding.
Syndrome decoding stores only the syndrome and coset leader information, which is more compact.
123.26 Bounds
Coding theory studies limits on possible code parameters.
Important bounds include:
| Bound | Main idea |
|---|---|
| Singleton bound | Limits d by the redundancy n − k |
| Hamming bound | Sphere-packing argument |
| Gilbert-Varshamov bound | Existence by covering argument |
| Plotkin bound | Strong restriction for large distance |
| Griesmer bound | Bound for linear codes |
The Singleton bound says that an [n, k, d] code satisfies

    d ≤ n − k + 1.
Codes meeting this bound are called maximum distance separable, or MDS codes.
Reed-Solomon codes are important examples of MDS codes.
123.27 Reed-Solomon Codes
Reed-Solomon codes are nonbinary linear codes defined by evaluating polynomials over finite fields.
Choose n distinct field elements

    α_1, α_2, ..., α_n ∈ F_q.

A message (m_0, m_1, ..., m_{k−1}) determines a polynomial

    f(x) = m_0 + m_1 x + ⋯ + m_{k−1} x^{k−1}

of degree less than k.
The codeword is

    c = (f(α_1), f(α_2), ..., f(α_n)).

Since two distinct polynomials of degree less than k can agree in at most k − 1 points, Reed-Solomon codes have minimum distance

    d = n − k + 1.

Thus they meet the Singleton bound.
They are used in storage systems, communication, QR codes, optical media, and distributed systems.
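Encoding by polynomial evaluation can be sketched over the small prime field GF(7); the field and the evaluation points here are illustrative choices:

```python
# Reed-Solomon encoding over GF(7) (a prime field, chosen for illustration):
# evaluate the message polynomial at n distinct field elements.
q = 7
alphas = [1, 2, 3, 4, 5, 6]          # n = 6 distinct evaluation points

def rs_encode(m):
    # m = (m0, ..., m_{k-1}) gives f(x) = m0 + m1 x + ... of degree < k;
    # the codeword is (f(a) for each evaluation point a), computed mod q.
    return [sum(mi * pow(a, i, q) for i, mi in enumerate(m)) % q
            for a in alphas]

# k = 2, so the minimum distance is d = n - k + 1 = 5.
print(rs_encode([2, 5]))   # evaluations of f(x) = 2 + 5x at 1..6
```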
123.28 Erasures
An erasure is an error whose position is known.
For example, a receiver may know that symbol i was lost but not know its value.
Erasures are easier than unknown errors.
A code with minimum distance d can correct up to

    d − 1

erasures.
This is larger than the number of arbitrary errors it can correct, which is ⌊(d − 1)/2⌋.
The reason is that erasures do not require discovering error positions.
Linear algebraic decoding often solves a system of equations for the missing symbols.
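As a minimal illustration, recovering a single erasure in the binary single parity-check code amounts to solving its one parity equation for the missing bit:

```python
# Erasure recovery sketch for the single parity-check code over GF(2):
# the erased bit is determined by the parity equation x1 + ... + xn = 0.
def recover_erasure(r, pos):
    # r has a known erasure at index `pos` (its value there is ignored);
    # the missing bit must make the total parity even.
    return sum(b for i, b in enumerate(r) if i != pos) % 2

r = [1, 0, None, 1]            # a codeword had even weight; bit 2 erased
print(recover_erasure(r, 2))   # 0
```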
123.29 Applications
Coding theory is used wherever information must survive noise or loss.
| Area | Use |
|---|---|
| Communication | Correct channel errors |
| Data storage | Recover corrupted bits |
| Distributed storage | Reconstruct missing fragments |
| QR codes | Correct damaged symbols |
| Deep-space communication | Handle weak noisy signals |
| Memory systems | ECC RAM |
| Network coding | Combine packets algebraically |
| Cryptography | Code-based cryptosystems |
| DNA storage | Correct synthesis and sequencing errors |
| Quantum computing | Quantum error correction |
Modern coding theory includes classical, algebraic, probabilistic, and quantum codes.
123.30 Coding Theory and Linear Algebra
The dictionary is direct.
| Coding theory | Linear algebra |
|---|---|
| Word | Vector in F_q^n |
| Linear code | Subspace of F_q^n |
| Code dimension | Vector space dimension k |
| Generator matrix | Basis matrix G for the code |
| Encoding | Matrix multiplication c = mG |
| Parity-check matrix | Linear constraint matrix H |
| Syndrome | Image H r^T under a linear map |
| Error pattern | Vector e |
| Coset | Affine subspace x + C |
| Dual code | Orthogonal complement C⊥ |
| Minimum distance | Minimum nonzero weight |
| Decoding | Nearest-codeword or coset-leader problem |
Coding theory is finite-field linear algebra with a metric.
The metric is Hamming distance rather than Euclidean distance.
123.31 Summary
Coding theory protects information by adding redundancy.
A block code is a set of allowed words. A linear code is a vector subspace of F_q^n. Its generator matrix G encodes messages by

    c = mG.

Its parity-check matrix H detects valid codewords by

    H x^T = 0.

If a received word is

    r = c + e,

then its syndrome is

    s = H r^T = H e^T.
The syndrome depends only on the error. Syndrome decoding uses this fact to identify a likely error pattern and recover the transmitted codeword.
The central principle is that redundancy becomes algebraic structure. By choosing a subspace with large Hamming distance, one can detect and correct errors using matrix operations over finite fields.