A clear explanation of finding the extra character added to a shuffled string using counting and XOR.
Problem Restatement
We are given two strings s and t.
String t is created by shuffling s and adding one extra letter at a random position.
We need to return the added letter.
For example:
s = "abcd"
t = "abcde"The added letter is:
"e"The strings contain only lowercase English letters. Also, t always has exactly one more character than s.
Input and Output
| Item | Meaning |
|---|---|
| Input | Two strings s and t |
| Output | The extra character added to t |
| Constraint | 0 <= s.length <= 1000 |
| Constraint | t.length == s.length + 1 |
| Constraint | s and t contain lowercase English letters |
Example function shape:
def findTheDifference(s: str, t: str) -> str:
...Examples
Example 1:
s = "abcd"
t = "abcde"The string t contains all letters from s, plus one extra e.
So the answer is:
"e"Example 2:
s = ""
t = "y"The original string is empty.
The only letter in t is the added letter.
So the answer is:
"y"Example 3:
s = "a"
t = "aa"The added letter is another a.
So the answer is:
"a"First Thought: Sort Both Strings
Since t is a shuffled version of s with one added character, we can sort both strings.
Then compare them character by character.
Example:
s = "abcd"
t = "abcde"Sorted:
s = "abcd"
t = "abcde"Compare each position. The first mismatch gives the added character.
If there is no mismatch, the added character is the last character of t.
This works, but sorting costs more than we need.
Key Insight
Only one character appears one extra time in t.
So we can count how many times each character appears in s, then subtract characters from t.
The first character whose count becomes negative must be the added character.
Because the strings contain only lowercase English letters, a fixed array of length 26 is enough.
Algorithm
Create a count array:
count = [0] * 26First, scan s.
For each character, add one to its count.
Then scan t.
For each character:
- Subtract one from its count.
- If the count becomes negative, return that character.
Why negative?
Because t has one extra occurrence of some character. Once we consume more of that character than s had, we found the added letter.
Correctness
After scanning s, the array count stores the frequency of each character in s.
When scanning t, each decrement matches one character from t against the supply from s.
For every character that appears the same number of times in both strings, the count never needs to go below zero before all matching copies are consumed.
The added character appears once more in t than in s. When that extra occurrence is processed, its count becomes negative.
Since exactly one extra character exists, the first character that makes a count negative is exactly the added character.
Therefore the algorithm returns the correct answer.
Complexity
Let n = len(t).
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | We scan s and t once |
| Space | O(1) | The count array always has length 26 |
Implementation
class Solution:
def findTheDifference(self, s: str, t: str) -> str:
count = [0] * 26
for ch in s:
idx = ord(ch) - ord("a")
count[idx] += 1
for ch in t:
idx = ord(ch) - ord("a")
count[idx] -= 1
if count[idx] < 0:
return ch
return ""Code Explanation
We create a fixed-size count array:
count = [0] * 26Then count all characters in s:
for ch in s:
idx = ord(ch) - ord("a")
count[idx] += 1The expression:
ord(ch) - ord("a")maps lowercase letters into array indices:
'a' -> 0
'b' -> 1
...
'z' -> 25Then we subtract characters from t:
for ch in t:
idx = ord(ch) - ord("a")
count[idx] -= 1If a count becomes negative:
if count[idx] < 0:
return chthen t contains one more copy of that character than s.
The final return is only for safety. Valid input always has an answer.
Alternative: XOR
There is also a compact bit manipulation solution.
XOR has two useful properties:
x ^ x = 0
x ^ 0 = xIf we XOR every character code from both strings, all matching characters cancel out. The only value left is the added character.
class Solution:
def findTheDifference(self, s: str, t: str) -> str:
code = 0
for ch in s:
code ^= ord(ch)
for ch in t:
code ^= ord(ch)
return chr(code)This also runs in O(n) time and uses O(1) space.
Testing
def test_solution():
s = Solution()
assert s.findTheDifference("abcd", "abcde") == "e"
assert s.findTheDifference("", "y") == "y"
assert s.findTheDifference("a", "aa") == "a"
assert s.findTheDifference("ae", "aea") == "a"
assert s.findTheDifference("xyz", "xzyz") == "z"
print("all tests passed")
test_solution()Test meaning:
| Test | Why |
|---|---|
"abcd", "abcde" | Basic added character |
"", "y" | Empty original string |
"a", "aa" | Added duplicate character |
"ae", "aea" | Extra character appears before the end |
"xyz", "xzyz" | Shuffled order with repeated extra character |