Skip to content

LeetCode 271: Encode and Decode Strings

A clear explanation of the Encode and Decode Strings problem using length-prefix encoding.

Problem Restatement

We need to design two methods:

encode(strs)
decode(s)

The encode method receives a list of strings and converts it into one string.

The decode method receives that encoded string and reconstructs the original list.

We cannot use serialization helpers such as eval.

The strings may contain any of the 256 valid ASCII characters, including delimiters, digits, spaces, and empty strings. The constraints are 1 <= strs.length <= 200 and 0 <= strs[i].length <= 200.

Input and Output

MethodInputOutput
encodeList of stringsOne encoded string
decodeOne encoded stringOriginal list of strings

Example class shape:

class Codec:

    def encode(self, strs: list[str]) -> str:
        ...

    def decode(self, s: str) -> list[str]:
        ...

Examples

Example 1:

strs = ["Hello", "World"]

One valid encoding is:

"5#Hello5#World"

Decode process:

5#Hello -> "Hello"
5#World -> "World"

Answer after decoding:

["Hello", "World"]

Example 2:

strs = [""]

The empty string has length 0.

Encoding:

"0#"

Decoding returns:

[""]

Example 3:

strs = ["a#b", "12", ""]

A delimiter-only solution would be risky because strings may contain #.

Length-prefix encoding handles this safely:

"3#a#b2#120#"

Decode result:

["a#b", "12", ""]

First Thought: Join With a Delimiter

A simple idea is to join strings with a separator:

"#".join(strs)

For example:

["Hello", "World"]

becomes:

"Hello#World"

But this fails if a string itself contains #.

For example:

["a#b", "c"]

would become:

"a#b#c"

When decoding, we cannot know whether this means:

["a", "b", "c"]

or:

["a#b", "c"]

So a plain delimiter is not enough.

Key Insight

Each encoded string should say how long the next original string is.

Use this format:

length + "#" + string

For example:

"Hello"

has length 5, so it becomes:

"5#Hello"

The decoder reads digits until #, converts them to a length, then reads exactly that many characters.

This works even if the string contains #, digits, or spaces, because the decoder does not search inside the content. It trusts the length.

Algorithm

For encoding:

  1. Create an empty list parts.
  2. For each string word:
    • Append str(len(word)).
    • Append "#".
    • Append word.
  3. Join all parts into one string.

For decoding:

  1. Start at index i = 0.
  2. Read characters until #. These characters form the length.
  3. Convert the length to an integer.
  4. Read exactly that many characters after #.
  5. Append that substring to the answer.
  6. Continue until the encoded string is fully consumed.

Walkthrough

Use:

strs = ["a#b", "12", ""]

Encoding:

StringLengthEncoded piece
"a#b"3"3#a#b"
"12"2"2#12"
""0"0#"

Full encoded string:

"3#a#b2#120#"

Decoding starts at index 0.

Read length:

3

Skip #.

Read next 3 characters:

"a#b"

Next index points to:

"2#120#"

Read length 2, then read:

"12"

Next read length 0, then read no characters.

Result:

["a#b", "12", ""]

Correctness

Each encoded item contains the exact length of the corresponding original string before the string content.

During decoding, the algorithm first reads this length prefix. Then it reads exactly that many characters as the next string.

Because the number of characters to read is known, characters inside the string content cannot confuse the decoder. The content may contain #, digits, spaces, or empty text, but it is read by length rather than by delimiter search.

Encoding writes the strings in order, and decoding consumes them in the same order. Therefore each decoded string equals the corresponding original string.

Thus decode(encode(strs)) always returns strs.

Complexity

Let m be the sum of the lengths of all strings, and let n be the number of strings.

MethodTimeSpace
encodeO(m + n)O(m + n)
decodeO(m + n)O(m + n)

The encoded string stores every original character plus length metadata.

Implementation

class Codec:

    def encode(self, strs: list[str]) -> str:
        parts = []

        for word in strs:
            parts.append(str(len(word)))
            parts.append("#")
            parts.append(word)

        return "".join(parts)

    def decode(self, s: str) -> list[str]:
        ans = []
        i = 0

        while i < len(s):
            j = i

            while s[j] != "#":
                j += 1

            length = int(s[i:j])
            start = j + 1
            end = start + length

            ans.append(s[start:end])
            i = end

        return ans

Code Explanation

During encoding, each string is written as:

str(len(word)) + "#" + word

Using a list and "".join(parts) avoids repeated string concatenation.

During decoding, i points to the beginning of the next length prefix.

j = i
while s[j] != "#":
    j += 1

This finds the end of the length prefix.

Then:

length = int(s[i:j])

converts the prefix to an integer.

The actual string starts after #:

start = j + 1
end = start + length

The decoder reads exactly that slice:

ans.append(s[start:end])

Then i moves to the next encoded item.

Testing

def run_tests():
    codec = Codec()

    cases = [
        ["Hello", "World"],
        [""],
        ["a#b", "12", ""],
        ["#", "##", "###"],
        ["0", "10#abc", "spaces are ok"],
        ["", "", ""],
    ]

    for strs in cases:
        assert codec.decode(codec.encode(strs)) == strs

    print("all tests passed")

run_tests()

Test meaning:

TestWhy
["Hello", "World"]Basic case
[""]Single empty string
["a#b", "12", ""]Delimiter and empty content
Strings containing only #Confirms delimiter is safe inside content
Digits and spacesConfirms length parsing is separate from content
Multiple empty stringsConfirms zero-length entries are preserved