# Characters and Bytes

Zig does not have a separate character type for ordinary strings.

A string is a sequence of bytes.

```zig
const message = "hello";
```

The bytes are:

```text
104 101 108 108 111
```

These are the ASCII byte values for `h`, `e`, `l`, `l`, and `o`.
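One way to check this is to compare the string with the same values written as a byte array. This is a minimal sketch; the name `bytes` is introduced here only for illustration.

```zig
const std = @import("std");

const message = "hello";
// The same five bytes written out as numbers (illustrative name).
const bytes = [_]u8{ 104, 101, 108, 108, 111 };

pub fn main() void {
    // std.mem.eql compares two slices element by element.
    if (std.mem.eql(u8, message, &bytes)) {
        std.debug.print("same bytes\n", .{});
    }
}
```

The comparison succeeds because the string literal and the array hold identical `u8` values.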

A byte has type `u8`.

```zig
const c: u8 = 'A';
```

The value of `c` is `65`.

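A character literal is a `comptime_int` holding the character's code point. For an ASCII character the value fits in a `u8`, so it coerces to `u8` without a cast.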

```zig
const a: u8 = 'a';
const newline: u8 = '\n';
const tab: u8 = '\t';
```
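Character literals are not limited to ASCII. The sketch below assumes a recent Zig compiler, where a character literal is a compile-time integer holding a Unicode code point; the constant names are illustrative only.

```zig
const std = @import("std");

const upper_a: u8 = 'A'; // 65 fits in a u8
const next: u8 = 'a' + 1; // arithmetic on the value: 98, which is 'b'
const shi: u21 = '世'; // U+4E16 does not fit in a u8, so use u21

pub fn main() void {
    std.debug.print("{d} {c} {d}\n", .{ upper_a, next, shi });
}
```

A `u21` is wide enough for any Unicode code point. Note that a code point stored this way is a number, not the UTF-8 bytes that a string would contain.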

Escape sequences are used for bytes that are hard to write directly.

| Escape | Meaning |
|---|---|
| `\n` | newline |
| `\t` | tab |
| `\r` | carriage return |
| `\\` | backslash |
| `\"` | double quote |
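| `\'` | single quote |
| `\xNN` | the byte with hexadecimal value `NN` |
| `\u{NNNNNN}` | the Unicode code point with hexadecimal value `NNNNNN` (stored as UTF-8 in a string) |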

A string literal may contain escape sequences.

```zig
const text = "one\ntwo\n";
```

This contains two newline bytes.

Printing it gives:

```text
one
two
```

A string literal has type `*const [N:0]u8`: a pointer to a constant array of `N` bytes followed by a `0` sentinel. In ordinary code, it usually coerces to a slice of constant bytes.

```zig
const name: []const u8 = "zig";
```

Read this as: `name` is a slice of constant `u8` values.

The elements cannot be changed through `name`.

```zig
name[0] = 'Z'; // compile error: cannot assign to constant
```
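The type of the literal itself, before it coerces to a slice, can be inspected with `@TypeOf`. This is a small sketch; the name `lit` is introduced here for illustration.

```zig
const std = @import("std");

const lit = "zig"; // a pointer to a sentinel-terminated array
const name: []const u8 = lit; // coerces to a slice; the sentinel is not counted

pub fn main() void {
    // @typeName turns a type into a printable string.
    std.debug.print("{s}\n", .{@typeName(@TypeOf(lit))});
    std.debug.print("{s}\n", .{@typeName(@TypeOf(name))});
    // The sentinel sits just past the last byte, at index lit.len.
    std.debug.print("{d} {d}\n", .{ lit.len, lit[lit.len] });
}
```

The length is `3` in both cases. The terminating `0` byte exists in memory but is not part of the string's length.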

A mutable byte array can be changed.

```zig
var name = [_]u8{ 'z', 'i', 'g' };

name[0] = 'Z';
```

Now the array contains:

```text
Zig
```
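Writing each byte by hand gets tedious for longer strings. One alternative, sketched here, is to dereference a string literal with `.*`, which copies its bytes into a local array.

```zig
const std = @import("std");

pub fn main() void {
    // `.*` dereferences the literal's pointer and copies the array.
    var name = "zig".*; // type: [3:0]u8
    name[0] = 'Z';

    // View the array as a read-only slice for printing.
    const view: []const u8 = &name;
    std.debug.print("{s}\n", .{view});
}
```

The literal itself is untouched; only the local copy changes.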

A string is not the same thing as text in the full human sense. Zig strings are bytes. Text encoding is a separate matter.

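Most modern text uses UTF-8. UTF-8 stores some characters in one byte and others in several bytes. Zig source files are UTF-8, so a non-ASCII character in a string literal is stored as its UTF-8 bytes.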

```zig
const s = "é";
```

This looks like one character, but in UTF-8 it uses two bytes.

```text
195 169
```

So this is not a good way to count human characters:

```zig
const s = "é";
const n = s.len; // 2
```

`len` counts bytes, not Unicode characters.
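To count Unicode code points instead, the standard library offers `std.unicode.utf8CountCodepoints`. The sketch below assumes the source file stores `é` as the single code point U+00E9. Code points are closer to what a reader sees than bytes are, but they still do not always match visible characters one to one.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "é";
    // Returns an error if `s` is not valid UTF-8.
    const n = try std.unicode.utf8CountCodepoints(s);
    std.debug.print("bytes={d} code points={d}\n", .{ s.len, n });
}
```

The function must walk the whole string, so it costs more than reading `len`.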

For ASCII text, each byte is exactly one character.

```zig
const s = "abc";
```

Here `s.len` is `3`.

For UTF-8 text, byte length and character count may differ.

```zig
const s = "hello 世界";
```

The visible text has 8 characters, but `s.len` is `12`. Each ASCII character takes one byte, while `世` and `界` take three bytes each.

This is deliberate. Zig keeps the low-level representation clear. A string is bytes. If the program needs Unicode rules, it must use code that understands Unicode.
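As one example of code that understands Unicode, `std.unicode.Utf8View` can walk a string one code point at a time. This is a minimal sketch rather than a full text-handling solution; it does not deal with combining marks or other multi-code-point characters.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "hello 世界";

    // Utf8View.init validates the bytes and fails on invalid UTF-8.
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();

    var count: usize = 0;
    // Each step yields the bytes of one code point (1 to 4 bytes).
    while (it.nextCodepointSlice()) |cp_bytes| {
        std.debug.print("{s} ({d} bytes)\n", .{ cp_bytes, cp_bytes.len });
        count += 1;
    }

    std.debug.print("{d} bytes, {d} code points\n", .{ s.len, count });
}
```

The decoding is explicit: the program asks for it, and the string itself remains a plain sequence of bytes.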

A byte can be printed as a character with `{c}`.

```zig
const std = @import("std");

pub fn main() void {
    const c: u8 = 'A';
    std.debug.print("{c}\n", .{c});
}
```

The output is:

```text
A
```

The same byte can be printed as a number with `{d}`.

```zig
// `c` is the same `u8` declared in the previous example.
std.debug.print("{d}\n", .{c});
```

The output is:

```text
65
```

This is often useful when inspecting data.

A simple loop over a string visits bytes.

```zig
const std = @import("std");

pub fn main() void {
    const s = "abc";

    for (s) |b| {
        std.debug.print("{c} {d}\n", .{ b, b });
    }
}
```

The output is:

```text
a 97
b 98
c 99
```

Each value `b` has type `u8`.
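The same kind of loop is handy for inspecting bytes that are not printable. The sketch below prints each byte as two hexadecimal digits, using the `{x}` specifier with zero padding; the sample string is chosen only for illustration.

```zig
const std = @import("std");

pub fn main() void {
    const data = "é\n";

    for (data) |b| {
        // `x` prints lowercase hex; `0>2` pads to width 2 with zeros.
        std.debug.print("{x:0>2} ", .{b});
    }
    std.debug.print("\n", .{});
}
```

Hex output makes multi-byte UTF-8 sequences and control bytes such as newline easy to spot.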

Use `u8` when you mean a byte. Use `[]const u8` when you mean a read-only byte string. Treat Unicode as an encoding problem, not as a hidden language feature.

Exercises:

1. Declare a `u8` with value `'A'` and print it with `{c}` and `{d}`.

2. Write a string containing a newline and print it.

3. Create a mutable array containing the bytes for `cat`, then change it to `bat`.

4. Loop over `"zig"` and print each byte as both a character and a number.

5. Check the `.len` of `"é"` and explain why it is not `1`.

