# Strings and UTF-8

A string is text.

```zig
const name = "Zig";
```

The text `"Zig"` is a string literal. A string literal is text written directly in the source code.

```zig
const message = "Hello";
const language = "Zig";
const path = "/usr/bin";
```

Strings look simple, but in Zig they are more explicit than in many beginner languages. Zig does not hide the fact that text is stored as bytes.

#### Strings are bytes

A Zig string literal is a sequence of bytes.

For ordinary English text, this is easy to see:

```zig
const text = "abc";
```

The letters have byte values:

| Character | Byte value |
|---|---:|
| `a` | 97 |
| `b` | 98 |
| `c` | 99 |

So the visible text of `"abc"` is stored as three bytes:

```text
97 98 99
```

A string literal also ends with a sentinel zero byte, which helps with C interoperability. The sentinel is not counted in `.len`. For now, the important point is: the visible text is stored as bytes.
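To make the sentinel concrete, here is a minimal sketch that prints the literal's type, its content bytes, and the byte stored just past the end. The variable names are only illustrative.

```zig
const std = @import("std");

pub fn main() void {
    const text = "abc";

    // The type records the length (3) and the zero sentinel (:0).
    std.debug.print("type: {s}\n", .{@typeName(@TypeOf(text))});

    // Print each content byte as a decimal number.
    for (text) |byte| {
        std.debug.print("{d} ", .{byte});
    }
    std.debug.print("\n", .{});

    // The sentinel sits at index text.len and is not counted by .len.
    std.debug.print("len = {d}, sentinel = {d}\n", .{ text.len, text[text.len] });
}
```

On recent Zig versions the first line should show the type as `*const [3:0]u8`: a pointer to three bytes followed by a zero sentinel.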

#### Printing strings

To print a string with `std.debug.print`, use `{s}`.

```zig
const std = @import("std");

pub fn main() void {
    const language = "Zig";

    std.debug.print("Language: {s}\n", .{language});
}
```

Output:

```text
Language: Zig
```

The format specifier `{s}` means “print this as a string.”

This is different from `{}`, which is used for many ordinary values:

```zig
std.debug.print("Number: {}\n", .{123});
```

For strings, use `{s}`.

#### String length

A string has a length.

```zig
const text = "hello";
```

You can ask for the length:

```zig
const len = text.len;
```

Complete example:

```zig
const std = @import("std");

pub fn main() void {
    const text = "hello";

    std.debug.print("text = {s}\n", .{text});
    std.debug.print("length = {}\n", .{text.len});
}
```

Output:

```text
text = hello
length = 5
```

Here, `text.len` is `5` because the visible text has five bytes.

#### Length means bytes, not always characters

For plain ASCII text, each character is exactly one byte.

```zig
const text = "hello";
```

This has 5 characters and 5 bytes.

But many writing systems use characters that need more than one byte in UTF-8.

```zig
const text = "é";
```

The visible text is one character, but UTF-8 stores it in two bytes, `0xC3 0xA9` (assuming the precomposed code point U+00E9).

So `text.len` is `2`. It is the number of bytes, not the number of human-visible characters.

This is very important. Zig treats strings as bytes. Unicode text is built on top of those bytes.
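The difference between bytes and characters can be checked directly. The sketch below compares `.len` with the number of Unicode code points, using `std.unicode.utf8CountCodepoints` from the standard library. The character is written as the escape `\u{e9}` so that the result does not depend on how an editor saved the file.

```zig
const std = @import("std");

pub fn main() !void {
    // "\u{e9}" is the precomposed code point U+00E9 (é).
    const text = "\u{e9}";

    // Counting code points requires decoding, which fails on invalid UTF-8.
    const code_points = try std.unicode.utf8CountCodepoints(text);

    std.debug.print("bytes = {d}\n", .{text.len}); // bytes = 2
    std.debug.print("code points = {d}\n", .{code_points}); // code points = 1
}
```

A code point is still not always the same as a human-visible character, because some characters are built from several code points. For beginner code, it is enough to remember that `.len` counts neither.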

#### What is UTF-8

UTF-8 is a way to store Unicode text as bytes.

Unicode is a large system for representing text from many languages and symbol sets. UTF-8 is the common byte encoding used on the web, in source code, in JSON, and in many operating systems.

ASCII text uses one byte per character:

```text
A B C
```

Many non-ASCII characters use multiple bytes:

```text
é
字
🙂
```

Zig string literals are UTF-8 encoded. That means this is allowed:

```zig
const greeting = "こんにちは";
const icon = "✓";
```

But Zig does not pretend that every visible character is one byte. It keeps the byte representation visible.
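As a rough illustration of how UTF-8 sizes vary, this sketch prints the byte count and the hexadecimal bytes of four sample strings. It assumes the source file stores `é` as the single code point U+00E9; the `samples` array is just a name chosen for the example.

```zig
const std = @import("std");

pub fn main() void {
    // One sample each for 1-, 2-, 3-, and 4-byte UTF-8 sequences.
    const samples = [_][]const u8{ "A", "é", "字", "🙂" };

    for (samples) |sample| {
        std.debug.print("{s}: {d} byte(s):", .{ sample, sample.len });

        // Show the raw bytes in hexadecimal.
        for (sample) |byte| {
            std.debug.print(" {x}", .{byte});
        }
        std.debug.print("\n", .{});
    }
}
```

The four samples should report 1, 2, 3, and 4 bytes. Only the ASCII letter fits in a single byte.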

#### Indexing a string

You can index a string to get a byte.

```zig
const text = "abc";
const first = text[0];
```

`first` is the byte for `a`.

Complete example:

```zig
const std = @import("std");

pub fn main() void {
    const text = "abc";

    std.debug.print("{}\n", .{text[0]});
    std.debug.print("{c}\n", .{text[0]});
}
```

Output:

```text
97
a
```

The first print uses `{}` and shows the byte value.

The second print uses `{c}` and shows it as a character.

#### String indexes start at zero

Indexes start at zero.

For this string:

```zig
const text = "abc";
```

The indexes are:

| Index | Byte | Character |
|---:|---:|---|
| `0` | 97 | `a` |
| `1` | 98 | `b` |
| `2` | 99 | `c` |

So:

```zig
text[0] // 'a'
text[1] // 'b'
text[2] // 'c'
```

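Index `3` is one past the last visible byte. For a string literal, that position holds the sentinel zero byte, so `text[3]` is `0`. Anything beyond it is invalid:

```zig
text[4] // compile error: index out of bounds
```

For a slice such as `[]const u8`, which carries no sentinel, index `len` is already out of bounds. The compiler rejects it when the index is known at compile time; otherwise the runtime safety check fails in safe build modes.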

#### Slicing a string

A slice refers to part of a string.

```zig
const text = "hello";
const part = text[0..2];
```

The slice `text[0..2]` contains bytes from index `0` up to but not including index `2`.

So it contains:

```text
he
```

Complete example:

```zig
const std = @import("std");

pub fn main() void {
    const text = "hello";

    const first_two = text[0..2];
    const rest = text[2..];

    std.debug.print("{s}\n", .{first_two});
    std.debug.print("{s}\n", .{rest});
}
```

Output:

```text
he
llo
```

The range rule is:

```text
start included
end excluded
```

So `0..2` means indexes `0` and `1`.
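Slice bounds do not have to be literal numbers. The following sketch computes the end index at runtime; `firstWord` is a hypothetical helper written for this example, not a standard library function.

```zig
const std = @import("std");

// Hypothetical helper: returns the bytes before the first space,
// or the whole input when there is no space.
fn firstWord(text: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, text, ' ') orelse text.len;
    return text[0..end];
}

pub fn main() void {
    std.debug.print("{s}\n", .{firstWord("hello world")}); // hello
    std.debug.print("{s}\n", .{firstWord("hello")}); // hello
}
```

The returned slice points into the original bytes. Nothing is copied, which is why slicing is cheap.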

#### Be careful slicing UTF-8

Because strings are bytes, slicing can cut through the middle of a UTF-8 character.

That creates invalid UTF-8.

For ASCII text, this is fine:

```zig
const text = "hello";
const part = text[0..2]; // "he"
```

For non-ASCII text, you must be more careful:

```zig
const text = "é";
```

This character uses two bytes in UTF-8. A slice that takes only the first byte, such as `text[0..1]`, is not valid UTF-8.

Zig does not automatically protect you from every Unicode mistake. It gives you byte-level control. When you need proper Unicode processing, use UTF-8 aware logic.
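One possible form of UTF-8 aware logic uses the `std.unicode` module. This sketch checks whether two slices are valid UTF-8 and then walks the text one code point at a time. The string and variable names are chosen only for the example.

```zig
const std = @import("std");

pub fn main() !void {
    // "héllo": the é (U+00E9) occupies bytes 1 and 2.
    const text = "h\u{e9}llo";

    const good = text[0..3]; // "hé", ends on a character boundary
    const bad = text[0..2]; // "h" plus only the first byte of é

    std.debug.print("good: {}\n", .{std.unicode.utf8ValidateSlice(good)}); // good: true
    std.debug.print("bad: {}\n", .{std.unicode.utf8ValidateSlice(bad)}); // bad: false

    // Iterate by code point instead of by byte.
    const view = try std.unicode.Utf8View.init(text);
    var it = view.iterator();
    while (it.nextCodepointSlice()) |code_point| {
        std.debug.print("{s} ({d} byte(s))\n", .{ code_point, code_point.len });
    }
}
```

Each slice yielded by the iterator starts and ends on a code point boundary, so it is safe to use those positions as slice bounds.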

#### Strings are usually immutable

A string literal is read-only.

```zig
const text = "hello";
```

The compiler enforces this: the literal's bytes are `const`, so an assignment such as `text[0] = 'H'` is a compile error.

If you need mutable text, use an array or a buffer.

```zig
var buffer = [_]u8{ 'h', 'e', 'l', 'l', 'o' };
buffer[0] = 'H';
```

Complete example:

```zig
const std = @import("std");

pub fn main() void {
    var buffer = [_]u8{ 'h', 'e', 'l', 'l', 'o' };

    buffer[0] = 'H';

    std.debug.print("{s}\n", .{buffer[0..]});
}
```

Output:

```text
Hello
```

Here, `buffer` is a mutable array of bytes. The slice `buffer[0..]` can be printed as a string because it contains text bytes.
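Typing out every byte gets tedious for longer text. A shorter way to get the same kind of mutable array, sketched here, is to dereference a string literal with `.*`, which copies its bytes into a local array.

```zig
const std = @import("std");

pub fn main() void {
    // Dereferencing the literal copies its bytes into a mutable array.
    var buffer = "hello".*;

    // This changes the copy, not the literal.
    buffer[0] = 'H';

    std.debug.print("{s}\n", .{&buffer}); // Hello
}
```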

#### Escape sequences

A string can contain special escape sequences.

| Escape | Meaning |
|---|---|
| `\n` | newline |
| `\t` | tab |
| `\"` | double quote |
| `\\` | backslash |

Example:

```zig
const std = @import("std");

pub fn main() void {
    std.debug.print("one\ntwo\n", .{});
}
```

Output:

```text
one
two
```

The `\n` creates a new line.

To include quotes inside a string:

```zig
const text = "She said \"hello\"";
```

To include a backslash:

```zig
const path = "C:\\Users\\Ada";
```
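An escape sequence takes two characters in the source code but becomes a single byte in the string. A small sketch that checks this with `.len`:

```zig
const std = @import("std");

pub fn main() void {
    const tabbed = "a\tb"; // 'a', tab, 'b'
    const quoted = "\"hi\""; // '"', 'h', 'i', '"'
    const path = "C:\\Users\\Ada"; // each \\ is one backslash byte

    std.debug.print("{d}\n", .{tabbed.len}); // 3
    std.debug.print("{d}\n", .{quoted.len}); // 4
    std.debug.print("{d}\n", .{path.len}); // 12
    std.debug.print("{s}\n", .{path}); // C:\Users\Ada
}
```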

#### Multiline strings

Zig supports multiline string literals using lines that begin with `\\`.

```zig
const text =
    \\first line
    \\second line
    \\third line
;
```

Complete example:

```zig
const std = @import("std");

pub fn main() void {
    const text =
        \\first line
        \\second line
        \\third line
    ;

    std.debug.print("{s}\n", .{text});
}
```

Output:

```text
first line
second line
third line
```

This is useful for embedded text, templates, generated code, SQL, JSON examples, and help messages.
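Two details of multiline literals are easy to miss: escape sequences are not processed inside them, and the last line does not get a trailing newline. This sketch shows both, with names chosen only for illustration.

```zig
const std = @import("std");

pub fn main() void {
    // Backslashes are ordinary bytes here, so no doubling is needed.
    const path =
        \\C:\Users\Ada
    ;

    const two_lines =
        \\a
        \\b
    ;

    std.debug.print("{s}\n", .{path}); // C:\Users\Ada

    // 'a', newline, 'b': there is no newline after the last line.
    std.debug.print("{d}\n", .{two_lines.len}); // 3
}
```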

#### Comparing strings

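Do not compare strings with `==` when you mean “same text.” On slices, `==` does not compile. On pointers to string literals, it compares addresses rather than contents.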

Use `std.mem.eql`.

```zig
const std = @import("std");

pub fn main() void {
    const a = "zig";
    const b = "zig";

    if (std.mem.eql(u8, a, b)) {
        std.debug.print("same\n", .{});
    }
}
```

Output:

```text
same
```

The call:

```zig
std.mem.eql(u8, a, b)
```

means: compare these two sequences of `u8` bytes.

Zig is explicit because strings are byte slices.
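The standard library has a few related byte comparison helpers that follow the same shape. Here is a short sketch using `std.mem.startsWith`, `std.mem.endsWith`, and `std.ascii.eqlIgnoreCase`; the file name is just sample data.

```zig
const std = @import("std");

pub fn main() void {
    const file = "main.zig";

    // Exact comparison is case-sensitive because it compares bytes.
    std.debug.print("{}\n", .{std.mem.eql(u8, file, "main.zig")}); // true
    std.debug.print("{}\n", .{std.mem.eql(u8, file, "Main.zig")}); // false

    // Prefix and suffix checks.
    std.debug.print("{}\n", .{std.mem.startsWith(u8, file, "main")}); // true
    std.debug.print("{}\n", .{std.mem.endsWith(u8, file, ".zig")}); // true

    // Case-insensitive comparison, for ASCII letters only.
    std.debug.print("{}\n", .{std.ascii.eqlIgnoreCase(file, "MAIN.ZIG")}); // true
}
```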

#### Building strings

String literals are fixed. If you need to build a string at runtime, you usually use an allocator or a buffer.

A simple buffer example:

```zig
const std = @import("std");

pub fn main() void {
    var buffer: [64]u8 = undefined;

    const name = "Zig";
    const message = std.fmt.bufPrint(&buffer, "Hello, {s}!", .{name}) catch return;

    std.debug.print("{s}\n", .{message});
}
```

Output:

```text
Hello, Zig!
```

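Here, `buffer` provides storage. `std.fmt.bufPrint` writes formatted text into that storage and returns a slice of the buffer that covers only the written text. If the text does not fit, `bufPrint` returns an error, and `catch return` makes this program exit without printing anything.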

This pattern is common in Zig: caller provides memory, function writes into it.
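When the final size is not known in advance, the allocator route mentioned above is the usual alternative. This is a minimal sketch using `std.fmt.allocPrint`; `std.heap.page_allocator` is chosen only to keep the example short, and real programs often pick a different allocator.

```zig
const std = @import("std");

pub fn main() !void {
    // Assumption for this sketch: the page allocator is good enough.
    const allocator = std.heap.page_allocator;

    const name = "Zig";

    // allocPrint allocates exactly enough memory for the formatted text.
    const message = try std.fmt.allocPrint(allocator, "Hello, {s}!", .{name});

    // The caller owns the memory and must free it.
    defer allocator.free(message);

    std.debug.print("{s} ({d} bytes)\n", .{ message, message.len }); // Hello, Zig! (11 bytes)
}
```

The allocation is visible in the code: the allocator is passed in explicitly, and the `defer` line shows where the memory is released.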

#### A complete example

```zig
const std = @import("std");

pub fn main() void {
    const language = "Zig";
    const description = "systems programming";

    std.debug.print("Language: {s}\n", .{language});
    std.debug.print("Length in bytes: {}\n", .{language.len});

    const first = language[0];
    std.debug.print("First byte: {}\n", .{first});
    std.debug.print("First character: {c}\n", .{first});

    const short = description[0..7];
    std.debug.print("Short description: {s}\n", .{short});

    if (std.mem.eql(u8, language, "Zig")) {
        std.debug.print("The language is Zig.\n", .{});
    }
}
```

Output:

```text
Language: Zig
Length in bytes: 3
First byte: 90
First character: Z
Short description: systems
The language is Zig.
```

This example shows the central facts: strings are byte sequences, `.len` counts bytes, indexing gets bytes, slicing gets byte ranges, and string comparison uses a memory comparison function.

#### The main idea

In Zig, strings are not magical objects. They are byte sequences, usually encoded as UTF-8.

That design gives you control. You can inspect bytes, slice text, pass strings to C, store raw data, and avoid hidden allocation. The tradeoff is that you must remember the difference between bytes and human characters.

For beginner Zig code, use string literals for fixed text, `{s}` for printing, `.len` for byte length, slicing for substrings, and `std.mem.eql(u8, a, b)` for string comparison.

