LeetCode 389: Find the Difference

Problem Restatement

We are given two strings s and t.

String t is created by shuffling s and adding one extra letter at a random position.

We need to return the added letter.

For example:

s = "abcd"
t = "abcde"

The added letter is:

"e"

The strings contain only lowercase English letters. Also, t always has exactly one more character than s.

Input and Output

Item	Meaning
Input	Two strings `s` and `t`
Output	The extra character added to `t`
Constraint	`0 <= s.length <= 1000`
Constraint	`t.length == s.length + 1`
Constraint	`s` and `t` contain lowercase English letters

Example function shape:

def findTheDifference(s: str, t: str) -> str:
    ...

Examples

Example 1:

s = "abcd"
t = "abcde"

The string t contains all letters from s, plus one extra e.

So the answer is:

"e"

Example 2:

s = ""
t = "y"

The original string is empty.

The only letter in t is the added letter.

So the answer is:

"y"

Example 3:

s = "a"
t = "aa"

The added letter is another a.

So the answer is:

"a"

First Thought: Sort Both Strings

Since t is a shuffled version of s with one added character, we can sort both strings.

Then compare them character by character.

Example:

s = "abcd"
t = "abcde"

Sorted:

s = "abcd"
t = "abcde"

Compare each position. The first mismatch gives the added character.

If there is no mismatch, the added character is the last character of t.

This works, but sorting costs more than we need.

Key Insight

Only one character appears one extra time in t.

So we can count how many times each character appears in s, then subtract characters from t.

The first character whose count becomes negative must be the added character.

Because the strings contain only lowercase English letters, a fixed array of length 26 is enough.

Algorithm

Create a count array:

count = [0] * 26

First, scan s.

For each character, add one to its count.

Then scan t.

For each character:

Subtract one from its count.
If the count becomes negative, return that character.

Why negative?

Because t has one extra occurrence of some character. Once we consume more of that character than s had, we found the added letter.

Correctness

After scanning s, the array count stores the frequency of each character in s.

When scanning t, each decrement matches one character from t against the supply from s.

For every character that appears the same number of times in both strings, the count never needs to go below zero before all matching copies are consumed.

The added character appears once more in t than in s. When that extra occurrence is processed, its count becomes negative.

Since exactly one extra character exists, the first character that makes a count negative is exactly the added character.

Therefore the algorithm returns the correct answer.

Complexity

Let n = len(t).

Metric	Value	Why
Time	`O(n)`	We scan `s` and `t` once
Space	`O(1)`	The count array always has length `26`

Implementation

class Solution:
    def findTheDifference(self, s: str, t: str) -> str:
        count = [0] * 26

        for ch in s:
            idx = ord(ch) - ord("a")
            count[idx] += 1

        for ch in t:
            idx = ord(ch) - ord("a")
            count[idx] -= 1

            if count[idx] < 0:
                return ch

        return ""

Code Explanation

We create a fixed-size count array:

count = [0] * 26

Then count all characters in s:

for ch in s:
    idx = ord(ch) - ord("a")
    count[idx] += 1

The expression:

ord(ch) - ord("a")

maps lowercase letters into array indices:

'a' -> 0
'b' -> 1
...
'z' -> 25

Then we subtract characters from t:

for ch in t:
    idx = ord(ch) - ord("a")
    count[idx] -= 1

If a count becomes negative:

if count[idx] < 0:
    return ch

then t contains one more copy of that character than s.

The final return is only for safety. Valid input always has an answer.

Alternative: XOR

There is also a compact bit manipulation solution.

XOR has two useful properties:

x ^ x = 0
x ^ 0 = x

If we XOR every character code from both strings, all matching characters cancel out. The only value left is the added character.

class Solution:
    def findTheDifference(self, s: str, t: str) -> str:
        code = 0

        for ch in s:
            code ^= ord(ch)

        for ch in t:
            code ^= ord(ch)

        return chr(code)

This also runs in O(n) time and uses O(1) space.

Testing

def test_solution():
    s = Solution()

    assert s.findTheDifference("abcd", "abcde") == "e"
    assert s.findTheDifference("", "y") == "y"
    assert s.findTheDifference("a", "aa") == "a"
    assert s.findTheDifference("ae", "aea") == "a"
    assert s.findTheDifference("xyz", "xzyz") == "z"

    print("all tests passed")

test_solution()

Test meaning:

Test	Why
`"abcd"`, `"abcde"`	Basic added character
`""`, `"y"`	Empty original string
`"a"`, `"aa"`	Added duplicate character
`"ae"`, `"aea"`	Extra character appears before the end
`"xyz"`, `"xzyz"`	Shuffled order with repeated extra character