Array Tiling

Array tiling divides a multidimensional array into small rectangular regions called tiles. Each tile is processed before moving to the next tile.

You use it when matrix or grid operations touch nearby elements repeatedly and cache locality affects performance.

Problem

Given a matrix $A$ with $r$ rows and $c$ columns, process all elements in tiles of size:

t_r \times t_c

where $t_r$ is the tile height and $t_c$ is the tile width.

Algorithm

Scan a matrix by tiles.

tile_scan(A, rows, cols, tile_rows, tile_cols):
    if tile_rows <= 0 or tile_cols <= 0:
        error "tile size must be positive"

    for row_start from 0 to rows - 1 step tile_rows:
        row_end = min(row_start + tile_rows, rows)

        for col_start from 0 to cols - 1 step tile_cols:
            col_end = min(col_start + tile_cols, cols)

            for i from row_start to row_end - 1:
                for j from col_start to col_end - 1:
                    process A[i][j]

Example

Let a matrix have $4$ rows and $5$ columns. Use tiles of size $2 \times 3$ .

tile	row range	column range
1	$[0, 2)$	$[0, 3)$
2	$[0, 2)$	$[3, 5)$
3	$[2, 4)$	$[0, 3)$
4	$[2, 4)$	$[3, 5)$

Each tile is processed as a small local region.

Correctness

The outer row loop partitions the row range into non-overlapping intervals. The outer column loop partitions the column range in the same way. Their Cartesian product forms non-overlapping tiles that cover the whole matrix.

Inside each tile, the nested loops visit every element in that tile exactly once. Therefore, every matrix element is processed exactly once.

Complexity

operation	time
tile scan	$O(rc)$

Space usage:

O(1)

for direct tile scanning.

If an algorithm copies each tile into a temporary buffer, extra space is:

O(t_r t_c)

When to Use

Array tiling is appropriate when:

processing matrices or grids
repeated access to nearby cells matters
cache locality affects runtime
tiles can be processed independently

It is less suitable when:

the array is small
each element is touched only once
access patterns are already contiguous and cache friendly

Implementation

def tile_scan(matrix, tile_rows, tile_cols, process):
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile size must be positive")

    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0

    for row_start in range(0, rows, tile_rows):
        row_end = min(row_start + tile_rows, rows)

        for col_start in range(0, cols, tile_cols):
            col_end = min(col_start + tile_cols, cols)

            for i in range(row_start, row_end):
                for j in range(col_start, col_end):
                    process(matrix[i][j])

func TileScan[T any](matrix [][]T, tileRows, tileCols int, process func(T)) bool {
    if tileRows <= 0 || tileCols <= 0 {
        return false
    }

    rows := len(matrix)
    if rows == 0 {
        return true
    }

    cols := len(matrix[0])

    for rowStart := 0; rowStart < rows; rowStart += tileRows {
        rowEnd := rowStart + tileRows
        if rowEnd > rows {
            rowEnd = rows
        }

        for colStart := 0; colStart < cols; colStart += tileCols {
            colEnd := colStart + tileCols
            if colEnd > cols {
                colEnd = cols
            }

            for i := rowStart; i < rowEnd; i++ {
                for j := colStart; j < colEnd; j++ {
                    process(matrix[i][j])
                }
            }
        }
    }

    return true
}