# Chapter 123. Coding Theory

Coding theory studies how to represent information so that errors can be detected or corrected.

A message may be transmitted through a noisy channel, stored on an unreliable medium, or processed by a system where bits may be lost or changed. Coding theory adds controlled redundancy to the message so that the original information can still be recovered.

Linear algebra enters coding theory through vector spaces over finite fields. A linear code is a subspace, a generator matrix encodes messages, a parity-check matrix detects errors, and syndrome decoding turns error correction into a problem about cosets and linear equations.

## 123.1 Messages and Codewords

A message is the original information to be protected.

A codeword is the encoded form of that message.

For example, a binary message may be

$$
m =
\begin{bmatrix}
1 & 0 & 1
\end{bmatrix}.
$$

A code may encode it as a longer vector

$$
c =
\begin{bmatrix}
1 & 0 & 1 & 1 & 0 & 1
\end{bmatrix}.
$$

The extra coordinates are redundant. They are added so that errors can be detected or corrected.

The general pattern is:

$$
\text{message} \longmapsto \text{codeword}.
$$

If the codeword is changed during transmission, the receiver tries to infer the original codeword.

## 123.2 Alphabets and Finite Fields

Coding theory usually works over a finite alphabet.

The most common case is the binary alphabet

$$
\{0,1\}.
$$

Binary arithmetic is performed in the finite field

$$
\mathbb{F}_2.
$$

In \(\mathbb{F}_2\),

$$
1+1=0.
$$

This means addition is the same as exclusive-or.
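
This identification of \(\mathbb{F}_2\) addition with exclusive-or translates directly into code; a minimal sketch in Python (the helper name `f2_add` is ours):

```python
# Addition in F_2: reduce the integer sum mod 2.
def f2_add(a: int, b: int) -> int:
    return (a + b) % 2

# This agrees with bitwise XOR on every pair of bits.
assert all(f2_add(a, b) == a ^ b for a in (0, 1) for b in (0, 1))
print(f2_add(1, 1))  # 0
```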

More generally, codes may be defined over a finite field

$$
\mathbb{F}_q,
$$

where \(q\) is a prime power.

A word of length \(n\) over \(\mathbb{F}_q\) is a vector in

$$
\mathbb{F}_q^n.
$$

Thus coding theory studies selected subsets or subspaces of finite-dimensional vector spaces over finite fields.

## 123.3 Block Codes

A block code of length \(n\) over \(\mathbb{F}_q\) is a subset

$$
C \subseteq \mathbb{F}_q^n.
$$

The elements of \(C\) are codewords.

If the code contains \(M\) codewords, then it can represent \(M\) possible messages.

A code is often described by parameters such as

$$
(n,M,d).
$$

Here:

| Symbol | Meaning |
|---|---|
| \(n\) | Codeword length |
| \(M\) | Number of codewords |
| \(d\) | Minimum distance |

When the code is linear, the parameters are usually written as

$$
[n,k,d]_q.
$$

Here \(k\) is the dimension of the code as a vector space over \(\mathbb{F}_q\).

## 123.4 Hamming Distance

The Hamming distance between two words is the number of positions in which they differ.

For

$$
x,y\in\mathbb{F}_q^n,
$$

the Hamming distance is

$$
d(x,y)=|\{i:x_i\neq y_i\}|.
$$

For example, over \(\mathbb{F}_2\),

$$
x=101101,
\qquad
y=100111.
$$

They differ in positions \(3\) and \(5\), so

$$
d(x,y)=2.
$$

Hamming distance measures how many symbol errors are needed to change one word into another.
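
The definition is a one-line count; a sketch in Python (the helper name `hamming_distance` is ours):

```python
def hamming_distance(x, y):
    """Count the positions in which two equal-length words differ."""
    assert len(x) == len(y)
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# The example from the text: x = 101101 and y = 100111 differ twice.
print(hamming_distance([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))  # 2
```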

## 123.5 Hamming Weight

The Hamming weight of a word is the number of nonzero coordinates.

It is written as

$$
\operatorname{wt}(x)=|\{i:x_i\neq 0\}|.
$$

For binary vectors, this is the number of ones.

For example,

$$
x=101101
$$

has weight

$$
\operatorname{wt}(x)=4.
$$

Over a field,

$$
d(x,y)=\operatorname{wt}(x-y).
$$

Thus distance is weight of a difference vector.
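
The identity \(d(x,y)=\operatorname{wt}(x-y)\) can be checked numerically; a small sketch over \(\mathbb{F}_2\), with helper names of our choosing:

```python
def weight(x):
    """Hamming weight: the number of nonzero coordinates."""
    return sum(1 for xi in x if xi != 0)

def f2_diff(x, y):
    """Coordinatewise difference over F_2 (the same as addition)."""
    return [(xi - yi) % 2 for xi, yi in zip(x, y)]

x = [1, 0, 1, 1, 0, 1]
y = [1, 0, 0, 1, 1, 1]
print(weight(x))              # 4
print(weight(f2_diff(x, y)))  # 2, matching d(x, y) from the previous section
```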

## 123.6 Minimum Distance

The minimum distance of a code \(C\) is

$$
d_{\min} =
\min\{d(c,c') : c,c'\in C,\; c\neq c'\}.
$$

It is the smallest distance between distinct codewords.

Minimum distance controls error detection and correction.

If codewords are far apart, then a received word with a small number of errors is still closer to the correct codeword than to any other codeword.

A code with minimum distance \(d\) can detect up to

$$
d-1
$$

errors.

It can correct up to

$$
\left\lfloor \frac{d-1}{2} \right\rfloor
$$

errors by nearest-codeword decoding.

## 123.7 Linear Codes

A code \(C\subseteq \mathbb{F}_q^n\) is linear if it is a subspace of \(\mathbb{F}_q^n\).

This means:

$$
c_1,c_2\in C
\quad\Rightarrow\quad
c_1+c_2\in C,
$$

and

$$
a\in\mathbb{F}_q,\;c\in C
\quad\Rightarrow\quad
ac\in C.
$$

If \(C\) has dimension \(k\), then

$$
|C|=q^k.
$$

A linear code of length \(n\), dimension \(k\), and minimum distance \(d\) is written as

$$
[n,k,d]_q.
$$

Linear codes are useful because their structure can be described by matrices. This makes encoding, checking, and decoding computationally efficient.

## 123.8 Minimum Distance of a Linear Code

For a linear code, the minimum distance is equal to the minimum nonzero weight:

$$
d_{\min} =
\min\{\operatorname{wt}(c):c\in C,\;c\neq 0\}.
$$

The proof is short.

Since \(C\) is linear, if \(c,c'\in C\), then

$$
c-c'\in C.
$$

Therefore

$$
d(c,c')=\operatorname{wt}(c-c')
$$

is the weight of a nonzero codeword.

Thus the closest pair of distinct codewords is determined by the lightest nonzero codeword.

This is a major simplification.
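
For a small linear code, the simplification can be verified by brute force: enumerate all codewords and take the minimum nonzero weight. A sketch using an illustrative \(2\times 4\) generator matrix over \(\mathbb{F}_2\):

```python
from itertools import product

# An illustrative 2x4 generator matrix over F_2.
G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]

def encode(m, G):
    """c = mG over F_2."""
    return [sum(mi * G[i][j] for i, mi in enumerate(m)) % 2
            for j in range(len(G[0]))]

codewords = [encode(list(m), G) for m in product([0, 1], repeat=len(G))]

# Minimum distance = minimum weight over the nonzero codewords.
d_min = min(sum(c) for c in codewords if any(c))
print(d_min)  # 2
```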

## 123.9 Generator Matrices

A generator matrix for a linear \([n,k]_q\) code is a \(k\times n\) matrix \(G\) whose rows form a basis of the code.

A message vector

$$
m\in\mathbb{F}_q^k
$$

is encoded by

$$
c=mG.
$$

The resulting vector \(c\in\mathbb{F}_q^n\) is a codeword.

The set of all codewords is

$$
C=\{mG:m\in\mathbb{F}_q^k\}.
$$

Thus the code is the row space of \(G\).

A generator matrix is not unique: any matrix with the same row space generates the same code.

## 123.10 Example of Encoding

Let

$$
G=
\begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
$$

over \(\mathbb{F}_2\).

Let the message be

$$
m=
\begin{bmatrix}
1 & 1
\end{bmatrix}.
$$

Then

$$
c=mG =
\begin{bmatrix}
1 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}.
$$

Compute over \(\mathbb{F}_2\):

$$
c=
\begin{bmatrix}
1 & 1 & 0 & 1
\end{bmatrix}.
$$

The third coordinate is \(1+1=0\) in \(\mathbb{F}_2\).
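
The same computation in code; a sketch assuming the matrix and message above:

```python
G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]

def encode(m, G):
    """Encode message m as c = mG, reducing each coordinate mod 2."""
    return [sum(mi * G[i][j] for i, mi in enumerate(m)) % 2
            for j in range(len(G[0]))]

print(encode([1, 1], G))  # [1, 1, 0, 1]
```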

## 123.11 Systematic Form

A generator matrix is in systematic form if it has the shape

$$
G=
\begin{bmatrix}
I_k & P
\end{bmatrix}.
$$

Here \(I_k\) is the \(k\times k\) identity matrix.

If

$$
m=
\begin{bmatrix}
m_1 & \cdots & m_k
\end{bmatrix},
$$

then

$$
mG=
\begin{bmatrix}
m & mP
\end{bmatrix}.
$$

The original message appears directly in the first \(k\) coordinates. The remaining \(n-k\) coordinates are parity symbols.

Systematic form is convenient because encoding is transparent.

The codeword contains the message plus redundancy.

## 123.12 Parity-Check Matrices

A parity-check matrix for a linear code \(C\) is a matrix \(H\) such that

$$
C=\{x\in\mathbb{F}_q^n:Hx^T=0\}.
$$

Equivalently, a word is a codeword exactly when it satisfies all parity-check equations.

If \(C\) has dimension \(k\), then \(H\) has \(n-k\) independent rows.

The rows of \(H\) define the linear constraints, or parity-check equations, that every codeword must satisfy.

## 123.13 Generator and Parity-Check Matrices

If a generator matrix is in systematic form

$$
G=
\begin{bmatrix}
I_k & P
\end{bmatrix},
$$

then a parity-check matrix is

$$
H=
\begin{bmatrix}
-P^T & I_{n-k}
\end{bmatrix}.
$$

Over \(\mathbb{F}_2\), the minus sign disappears because

$$
-1=1.
$$

The relation between \(G\) and \(H\) is

$$
GH^T=0.
$$

This means every row of \(G\) is orthogonal to every row of \(H\).

Equivalently, every generated codeword satisfies all parity checks.
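
The construction \(H=\begin{bmatrix}-P^T & I_{n-k}\end{bmatrix}\) and the relation \(GH^T=0\) can both be checked mechanically over \(\mathbb{F}_2\); a sketch (the helper name is ours):

```python
def check_matrix_from_systematic(G):
    """Given G = [I_k | P] over F_2, build H = [P^T | I_{n-k}]."""
    k, n = len(G), len(G[0])
    P = [row[k:] for row in G]
    return [[P[i][j] for i in range(k)] +
            [1 if t == j else 0 for t in range(n - k)]
            for j in range(n - k)]

G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
H = check_matrix_from_systematic(G)
print(H)  # [[1, 1, 1, 0], [1, 0, 0, 1]]

# Every row of G is orthogonal to every row of H: GH^T = 0.
assert all(sum(a * b for a, b in zip(g, h)) % 2 == 0 for g in G for h in H)
```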

## 123.14 Example of a Parity-Check Matrix

Let

$$
G=
\begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix} =
\begin{bmatrix}
I_2 & P
\end{bmatrix},
$$

where

$$
P=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}.
$$

Over \(\mathbb{F}_2\),

$$
H=
\begin{bmatrix}
P^T & I_2
\end{bmatrix} =
\begin{bmatrix}
1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}.
$$

Check that

$$
GH^T=0.
$$

Thus \(H\) defines the same code by parity constraints.

A vector \(x\in\mathbb{F}_2^4\) is a codeword if and only if

$$
Hx^T=0.
$$
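
Codeword membership can be tested directly from \(H\); a sketch using the matrix above:

```python
H = [[1, 1, 1, 0],
     [1, 0, 0, 1]]

def syndrome(H, x):
    """Compute Hx^T over F_2, one parity check per row of H."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

print(syndrome(H, [1, 1, 0, 1]))  # [0, 0]: a codeword
print(syndrome(H, [1, 1, 1, 1]))  # [1, 0]: not a codeword
```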

## 123.15 Error Detection

Suppose a codeword \(c\) is transmitted.

During transmission, an error vector \(e\) is added:

$$
r=c+e.
$$

The receiver observes \(r\).

If \(r\) is not a codeword, then an error is detected.

Using a parity-check matrix,

$$
Hr^T=H(c+e)^T.
$$

Since \(c\in C\),

$$
Hc^T=0.
$$

Therefore

$$
Hr^T=He^T.
$$

If

$$
Hr^T\neq 0,
$$

then an error has definitely occurred.

If

$$
Hr^T=0,
$$

then \(r\) is a codeword. Either no error occurred, or the error changed one codeword into another.

## 123.16 Syndrome

The syndrome of a received word \(r\) is

$$
s=Hr^T.
$$

The syndrome is zero exactly when \(r\) satisfies all parity checks.

Since

$$
r=c+e,
$$

we have

$$
s=H(c+e)^T=Hc^T+He^T=He^T.
$$

Thus the syndrome depends only on the error vector, not on the transmitted codeword.

This is the key fact behind syndrome decoding: instead of searching through cosets directly, the decoder computes the syndrome and matches it to a stored error representative.

## 123.17 Syndrome Decoding

Syndrome decoding uses a table that maps syndromes to likely error vectors.

The procedure is:

| Step | Operation |
|---|---|
| 1 | Receive \(r\) |
| 2 | Compute syndrome \(s=Hr^T\) |
| 3 | Look up likely error \(e\) with \(He^T=s\) |
| 4 | Decode as \(c=r-e\) |

Over \(\mathbb{F}_2\), subtraction equals addition, so

$$
c=r+e.
$$

Usually the table stores the lowest-weight error vector for each syndrome.

This gives nearest-codeword decoding when the number of errors is within the correction capability of the code.
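
The four steps can be sketched end to end. To keep the table honest, the example below uses the \([3,1,3]\) repetition code \(C=\{000,111\}\), whose minimum distance is large enough to correct the single error we introduce:

```python
from itertools import product

# Parity-check matrix of the [3,1,3] binary repetition code.
H = [[1, 1, 0],
     [1, 0, 1]]
n = 3

def syndrome(x):
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

# Step 3's lookup table: for each syndrome, keep a lowest-weight error.
table = {}
for e in sorted(product([0, 1], repeat=n), key=sum):
    table.setdefault(syndrome(e), list(e))

def decode(r):
    """Steps 2-4: syndrome, table lookup, then c = r + e (over F_2)."""
    e = table[syndrome(r)]
    return [(ri + ei) % 2 for ri, ei in zip(r, e)]

print(decode([1, 0, 1]))  # [1, 1, 1]
```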

## 123.18 Cosets

Let \(C\) be a linear code in \(\mathbb{F}_q^n\).

For any vector \(a\in\mathbb{F}_q^n\), the set

$$
a+C=\{a+c:c\in C\}
$$

is a coset of \(C\).

All vectors in the same coset have the same syndrome.

Indeed, if

$$
r_1-r_2\in C,
$$

then

$$
H(r_1-r_2)^T=0,
$$

so

$$
Hr_1^T=Hr_2^T.
$$

Thus syndromes label cosets.

Syndrome decoding chooses a likely error representative, often called a coset leader, for each coset.

## 123.19 Coset Leaders

A coset leader is a vector of minimum weight in a coset.

In decoding, the coset of the received word represents all possible error patterns with the same syndrome.

The coset leader is chosen as the most likely error when errors are assumed to be rare and independent.

If the received word is

$$
r=c+e,
$$

and the true error \(e\) is the coset leader of its syndrome class, then syndrome decoding recovers \(c\).

This is the algebraic form of nearest-neighbor decoding.

## 123.20 The Repetition Code

The binary repetition code of length \(n\) encodes one bit by repeating it \(n\) times.

For \(n=3\),

$$
0\mapsto 000,
\qquad
1\mapsto 111.
$$

The code is

$$
C=\{000,111\}.
$$

It is a linear \([3,1,3]_2\) code.

A generator matrix is

$$
G=
\begin{bmatrix}
1 & 1 & 1
\end{bmatrix}.
$$

A parity-check matrix is

$$
H=
\begin{bmatrix}
1 & 1 & 0 \\
1 & 0 & 1
\end{bmatrix}.
$$

The minimum distance is \(3\), so the code can correct one error.

For example, if \(101\) is received, majority vote decodes it as \(111\).

## 123.21 Single Parity-Check Code

The binary single parity-check code of length \(n\) consists of all vectors with even weight:

$$
C=\{x\in\mathbb{F}_2^n:x_1+x_2+\cdots+x_n=0\}.
$$

A parity-check matrix is

$$
H=
\begin{bmatrix}
1 & 1 & \cdots & 1
\end{bmatrix}.
$$

This is an \([n,n-1,2]_2\) code.

It can detect one error, because a single bit flip changes parity.

It cannot correct one error, because the syndrome only says that parity is wrong. It does not identify which bit changed.

## 123.22 Hamming Codes

Hamming codes are classical linear error-correcting codes.

A binary Hamming code with \(r\) parity checks has length

$$
n=2^r-1
$$

and dimension

$$
k=2^r-r-1.
$$

Its parity-check matrix \(H\) contains all nonzero binary column vectors of length \(r\).

For \(r=3\), the code is a \([7,4,3]_2\) code.

Its minimum distance is \(3\), so it corrects one error.

The syndrome of a received word identifies the column of \(H\) corresponding to the error position.

This makes decoding especially simple.

## 123.23 The \([7,4,3]\) Hamming Code

One parity-check matrix for the \([7,4,3]_2\) Hamming code is

$$
H=
\begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}.
$$

The columns are the nonzero binary triples.

If a single error occurs in position \(j\), then

$$
e_j
$$

is the unit vector with a \(1\) in position \(j\), and

$$
He_j^T
$$

is column \(j\) of \(H\).

Thus the syndrome directly gives the error location.

After locating the error, flip that bit.
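
Decoding fits in a few lines; a sketch using the matrix above, for which row 1 of the syndrome is the least significant bit of the error position:

```python
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def correct_single_error(r):
    """Read the error position from the syndrome and flip that bit."""
    s = [sum(h * ri for h, ri in zip(row, r)) % 2 for row in H]
    pos = s[0] + 2 * s[1] + 4 * s[2]   # 0 means no error detected
    r = r[:]
    if pos:
        r[pos - 1] ^= 1
    return r

# Flip one bit of a codeword (the zero word) and recover it.
r = [0, 0, 0, 0, 1, 0, 0]
print(correct_single_error(r))  # [0, 0, 0, 0, 0, 0, 0]
```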

## 123.24 Dual Codes

The dual code of a linear code \(C\subseteq\mathbb{F}_q^n\) is

$$
C^\perp =
\{y\in\mathbb{F}_q^n : x\cdot y=0 \text{ for all } x\in C\}.
$$

The dual code is also a linear code.

If \(C\) has dimension \(k\), then

$$
C^\perp
$$

has dimension

$$
n-k.
$$

A parity-check matrix for \(C\) is a generator matrix for \(C^\perp\): the parity-check equations of a code are exactly the codewords of its dual.

## 123.25 Standard Array

A standard array organizes all vectors in \(\mathbb{F}_q^n\) into cosets of \(C\).

The first row is the code \(C\).

Each later row is a coset

$$
a+C,
$$

where \(a\) is chosen as a coset leader.

Decoding finds the received word in the array and subtracts the row leader.

The standard array is useful conceptually but often too large for practical decoding.

Syndrome decoding stores only the syndrome and coset leader information, which is more compact.

## 123.26 Bounds

Coding theory studies limits on possible code parameters.

Important bounds include:

| Bound | Main idea |
|---|---|
| Singleton bound | Limits \(d\) by redundancy |
| Hamming bound | Sphere-packing argument |
| Gilbert-Varshamov bound | Existence by a counting argument |
| Plotkin bound | Strong restriction for large distance |
| Griesmer bound | Bound for linear codes |

The Singleton bound says that an \([n,k,d]_q\) code satisfies

$$
d\leq n-k+1.
$$

Codes meeting this bound are called maximum distance separable, or MDS codes.

Reed-Solomon codes are important examples of MDS codes.

## 123.27 Reed-Solomon Codes

Reed-Solomon codes are nonbinary linear codes defined by evaluating polynomials over finite fields.

Choose distinct field elements

$$
a_1,\ldots,a_n\in\mathbb{F}_q.
$$

A message determines a polynomial \(f(x)\) of degree less than \(k\).

The codeword is

$$
(f(a_1),f(a_2),\ldots,f(a_n)).
$$

Since two distinct polynomials of degree less than \(k\) can agree in at most \(k-1\) points, Reed-Solomon codes have minimum distance

$$
d=n-k+1.
$$

Thus they meet the Singleton bound.

They are used in storage systems, communication, QR codes, optical media, and distributed systems.
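
Encoding is nothing more than polynomial evaluation; a toy sketch over the prime field \(\mathbb{F}_7\) (an illustrative choice of field and parameters, not a production layout):

```python
q = 7                                  # toy field F_7
points = [0, 1, 2, 3, 4, 5, 6]         # n = 7 distinct evaluation points
k = 3                                  # messages = polynomials of degree < 3

def rs_encode(message):
    """Evaluate f(x) = m0 + m1*x + m2*x^2 at every point, mod q."""
    assert len(message) == k
    return [sum(m * pow(a, i, q) for i, m in enumerate(message)) % q
            for a in points]

print(rs_encode([2, 0, 1]))  # f(x) = 2 + x^2 gives [2, 3, 6, 4, 4, 6, 3]
```

Here \(n=7\) and \(k=3\), so the resulting code has minimum distance \(n-k+1=5\), meeting the Singleton bound.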

## 123.28 Erasures

An erasure is an error whose position is known.

For example, a receiver may know that symbol \(i\) was lost but not know its value.

Erasures are easier than unknown errors.

A code with minimum distance \(d\) can correct up to

$$
d-1
$$

erasures.

This is larger than the number of arbitrary errors it can correct:

$$
\left\lfloor \frac{d-1}{2} \right\rfloor.
$$

The reason is that erasures do not require discovering error positions.

Linear algebraic decoding often solves a system of equations for the missing symbols.
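
For the single parity-check code, solving for an erased symbol is one equation in one unknown; a minimal sketch (the `None` convention for marking an erasure is ours):

```python
def recover_erasure(word, erased_index):
    """Fill in the one erased bit so that the total parity is even."""
    known = [b for i, b in enumerate(word) if i != erased_index]
    word = word[:]
    word[erased_index] = sum(known) % 2
    return word

# Bit 1 of an even-weight word was erased; parity determines its value.
print(recover_erasure([1, None, 0, 1], 1))  # [1, 0, 0, 1]
```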

## 123.29 Applications

Coding theory is used wherever information must survive noise or loss.

| Area | Use |
|---|---|
| Communication | Correct channel errors |
| Data storage | Recover corrupted bits |
| Distributed storage | Reconstruct missing fragments |
| QR codes | Correct damaged symbols |
| Deep-space communication | Handle weak noisy signals |
| Memory systems | ECC RAM |
| Network coding | Combine packets algebraically |
| Cryptography | Code-based cryptosystems |
| DNA storage | Correct synthesis and sequencing errors |
| Quantum computing | Quantum error correction |

Modern coding theory includes classical, algebraic, probabilistic, and quantum codes.

## 123.30 Coding Theory and Linear Algebra

The dictionary is direct.

| Coding theory | Linear algebra |
|---|---|
| Word | Vector over \(\mathbb{F}_q\) |
| Linear code | Subspace |
| Code dimension | Vector space dimension |
| Generator matrix | Basis matrix for code |
| Encoding | Matrix multiplication |
| Parity-check matrix | Linear constraint matrix |
| Syndrome | Image under \(H\) |
| Error pattern | Vector |
| Coset | Affine subspace |
| Dual code | Orthogonal complement |
| Minimum distance | Minimum nonzero weight |
| Decoding | Nearest subspace or coset problem |

Coding theory is finite-field linear algebra with a metric.

The metric is Hamming distance rather than Euclidean distance.

## 123.31 Summary

Coding theory protects information by adding redundancy.

A block code is a set of allowed words. A linear code is a vector subspace of \(\mathbb{F}_q^n\). Its generator matrix encodes messages by

$$
c=mG.
$$

Its parity-check matrix detects valid codewords by

$$
Hx^T=0.
$$

If a received word is

$$
r=c+e,
$$

then its syndrome is

$$
s=Hr^T=He^T.
$$

The syndrome depends only on the error. Syndrome decoding uses this fact to identify a likely error pattern and recover the transmitted codeword.

The central principle is that redundancy becomes algebraic structure. By choosing a subspace with large Hamming distance, one can detect and correct errors using matrix operations over finite fields.
