Array Memory Layout

Array memory layout describes how elements are arranged in physical memory. The layout determines how efficiently the CPU can access data due to cache behavior and alignment.

You use this knowledge to write code that is fast in practice, not only in asymptotic terms.

Problem

Given an array $A$ of length $n$ , understand how its storage layout affects:

access time
cache utilization
traversal performance

Structure

A standard array uses contiguous memory:

A = [a_0, a_1, a_2, \dots, a_{n-1}]

Each element occupies a fixed number of bytes. The address of element $i$ is:

address(A[i]) = base(A) + i \cdot size(element)

Layout Patterns

1. Contiguous layout

Elements are stored sequentially.

excellent cache locality
predictable access

2. Row-major layout (for 2D arrays)

Rows are stored consecutively:

offset = i \cdot columns + j

3. Column-major layout

Columns are stored consecutively:

offset = j \cdot rows + i

4. Strided access

Access pattern skips elements:

A[0], A[2], A[4], \dots

This reduces cache efficiency.

Algorithm

Cache-friendly traversal:

scan(A):
    for i from 0 to n - 1:
        process A[i]

Cache-unfriendly traversal (example for 2D):

scan(matrix):
    for j from 0 to columns - 1:
        for i from 0 to rows - 1:
            process matrix[i][j]

Example

Consider a 2D array:

A = \begin{bmatrix} 8 & 3 & 5 \ 1 & 9 & 4 \end{bmatrix}

Row-major storage:

[8, 3, 5, 1, 9, 4]

Row-wise traversal:

step	value
1	8
2	3
3	5
4	1
5	9
6	4

Column-wise traversal jumps across memory, causing more cache misses.

Correctness

The mapping from index to memory location follows a deterministic formula. As long as indexing respects this formula, every element is accessed correctly.

Performance differences arise from how memory is accessed, not from correctness.

Complexity

operation	time
indexed access	$O(1)$
sequential scan	$O(n)$

However, constant factors vary due to cache behavior.

When to Use

Memory layout considerations are important when:

working with large arrays
performance is critical
cache locality affects runtime
processing matrices or tensors

Guidelines:

iterate in memory order
avoid large strides
group related data together
prefer contiguous storage

Implementation

def row_major_scan(matrix):
    for row in matrix:
        for value in row:
            process(value)

func RowMajorScan(matrix [][]int) {
    for i := 0; i < len(matrix); i++ {
        for j := 0; j < len(matrix[i]); j++ {
            process(matrix[i][j])
        }
    }
}

func process(x int) {
    // placeholder
}