A clear explanation of the Encode and Decode Strings problem using length-prefix encoding.
Problem Restatement
We need to design two methods:
encode(strs)
decode(s)The encode method receives a list of strings and converts it into one string.
The decode method receives that encoded string and reconstructs the original list.
We cannot use serialization helpers such as eval.
The strings may contain any of the 256 valid ASCII characters, including delimiters, digits, spaces, and empty strings. The constraints are 1 <= strs.length <= 200 and 0 <= strs[i].length <= 200.
Input and Output
| Method | Input | Output |
|---|---|---|
encode | List of strings | One encoded string |
decode | One encoded string | Original list of strings |
Example class shape:
class Codec:
def encode(self, strs: list[str]) -> str:
...
def decode(self, s: str) -> list[str]:
...Examples
Example 1:
strs = ["Hello", "World"]One valid encoding is:
"5#Hello5#World"Decode process:
5#Hello -> "Hello"
5#World -> "World"Answer after decoding:
["Hello", "World"]Example 2:
strs = [""]The empty string has length 0.
Encoding:
"0#"Decoding returns:
[""]Example 3:
strs = ["a#b", "12", ""]A delimiter-only solution would be risky because strings may contain #.
Length-prefix encoding handles this safely:
"3#a#b2#120#"Decode result:
["a#b", "12", ""]First Thought: Join With a Delimiter
A simple idea is to join strings with a separator:
"#".join(strs)For example:
["Hello", "World"]becomes:
"Hello#World"But this fails if a string itself contains #.
For example:
["a#b", "c"]would become:
"a#b#c"When decoding, we cannot know whether this means:
["a", "b", "c"]or:
["a#b", "c"]So a plain delimiter is not enough.
Key Insight
Each encoded string should say how long the next original string is.
Use this format:
length + "#" + stringFor example:
"Hello"has length 5, so it becomes:
"5#Hello"The decoder reads digits until #, converts them to a length, then reads exactly that many characters.
This works even if the string contains #, digits, or spaces, because the decoder does not search inside the content. It trusts the length.
Algorithm
For encoding:
- Create an empty list
parts. - For each string
word:- Append
str(len(word)). - Append
"#". - Append
word.
- Append
- Join all parts into one string.
For decoding:
- Start at index
i = 0. - Read characters until
#. These characters form the length. - Convert the length to an integer.
- Read exactly that many characters after
#. - Append that substring to the answer.
- Continue until the encoded string is fully consumed.
Walkthrough
Use:
strs = ["a#b", "12", ""]Encoding:
| String | Length | Encoded piece |
|---|---|---|
"a#b" | 3 | "3#a#b" |
"12" | 2 | "2#12" |
"" | 0 | "0#" |
Full encoded string:
"3#a#b2#120#"Decoding starts at index 0.
Read length:
3Skip #.
Read next 3 characters:
"a#b"Next index points to:
"2#120#"Read length 2, then read:
"12"Next read length 0, then read no characters.
Result:
["a#b", "12", ""]Correctness
Each encoded item contains the exact length of the corresponding original string before the string content.
During decoding, the algorithm first reads this length prefix. Then it reads exactly that many characters as the next string.
Because the number of characters to read is known, characters inside the string content cannot confuse the decoder. The content may contain #, digits, spaces, or empty text, but it is read by length rather than by delimiter search.
Encoding writes the strings in order, and decoding consumes them in the same order. Therefore each decoded string equals the corresponding original string.
Thus decode(encode(strs)) always returns strs.
Complexity
Let m be the sum of the lengths of all strings, and let n be the number of strings.
| Method | Time | Space |
|---|---|---|
encode | O(m + n) | O(m + n) |
decode | O(m + n) | O(m + n) |
The encoded string stores every original character plus length metadata.
Implementation
class Codec:
def encode(self, strs: list[str]) -> str:
parts = []
for word in strs:
parts.append(str(len(word)))
parts.append("#")
parts.append(word)
return "".join(parts)
def decode(self, s: str) -> list[str]:
ans = []
i = 0
while i < len(s):
j = i
while s[j] != "#":
j += 1
length = int(s[i:j])
start = j + 1
end = start + length
ans.append(s[start:end])
i = end
return ansCode Explanation
During encoding, each string is written as:
str(len(word)) + "#" + wordUsing a list and "".join(parts) avoids repeated string concatenation.
During decoding, i points to the beginning of the next length prefix.
j = i
while s[j] != "#":
j += 1This finds the end of the length prefix.
Then:
length = int(s[i:j])converts the prefix to an integer.
The actual string starts after #:
start = j + 1
end = start + lengthThe decoder reads exactly that slice:
ans.append(s[start:end])Then i moves to the next encoded item.
Testing
def run_tests():
codec = Codec()
cases = [
["Hello", "World"],
[""],
["a#b", "12", ""],
["#", "##", "###"],
["0", "10#abc", "spaces are ok"],
["", "", ""],
]
for strs in cases:
assert codec.decode(codec.encode(strs)) == strs
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
["Hello", "World"] | Basic case |
[""] | Single empty string |
["a#b", "12", ""] | Delimiter and empty content |
Strings containing only # | Confirms delimiter is safe inside content |
| Digits and spaces | Confirms length parsing is separate from content |
| Multiple empty strings | Confirms zero-length entries are preserved |