LeetCode 583: Delete Operation for Two Strings

Problem Restatement

We are given two strings, word1 and word2.

In one step, we may delete exactly one character from either string.

We need to return the minimum number of deletion steps required to make the two strings equal. The strings contain only lowercase English letters, and each length is between 1 and 500.

Input and Output

Item	Meaning
`word1`	First lowercase English string
`word2`	Second lowercase English string
Output	Minimum number of deletions needed to make both strings equal

Example function shape:

def minDistance(word1: str, word2: str) -> int:
    ...

Examples

Example 1:

word1 = "sea"
word2 = "eat"

Output:

One optimal result is to make both strings equal to "ea".

Delete "s" from "sea":

"sea" -> "ea"

Delete "t" from "eat":

"eat" -> "ea"

So the answer is 2.

Example 2:

word1 = "leetcode"
word2 = "etco"

Output:

One longest sequence that can be kept in both strings is "etco". We delete the other four characters from "leetcode".

First Thought: Try All Deletions

A brute force solution could try every possible way to delete characters from both strings.

For each character, we might keep it or delete it. This creates many possible subsequences.

Then we could find a common remaining string with the fewest deletions.

This is too slow because the number of subsequences grows exponentially with string length.

Key Insight

After deleting characters from both strings, the final equal string must be a common subsequence of word1 and word2.

To minimize deletions, we should keep the longest possible common subsequence.

Let:

L = length of the longest common subsequence

Then:

deletions from word1 = len(word1) - L
deletions from word2 = len(word2) - L

So the answer is:

len(word1) + len(word2) - 2 * L

This means the problem can be solved by computing the Longest Common Subsequence length.

Dynamic Programming Definition

Let:

dp[i][j]

be the length of the longest common subsequence between:

word1[0:i]
word2[0:j]

That means dp[i][j] only looks at the first i characters of word1 and the first j characters of word2.

The answer for the LCS length is:

dp[m][n]

where:

m = len(word1)
n = len(word2)

Recurrence

If the last characters match:

word1[i - 1] == word2[j - 1]

then we can keep that character in both strings:

dp[i][j] = dp[i - 1][j - 1] + 1

If the last characters do not match, then the LCS must skip one of them:

dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

The base cases are:

dp[0][j] = 0
dp[i][0] = 0

An empty string has no common subsequence with any prefix except the empty subsequence.

Algorithm

Let m = len(word1) and n = len(word2).
Create a (m + 1) x (n + 1) DP table filled with 0.
For each i from 1 to m:
- For each j from 1 to n:
  - If word1[i - 1] == word2[j - 1], set dp[i][j] = dp[i - 1][j - 1] + 1.
  - Otherwise, set dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]).
Let lcs = dp[m][n].
Return m + n - 2 * lcs.

Correctness

After all deletions, both strings must become the same string. Since this final string is obtained by deleting characters without changing the order of the remaining characters, it must be a subsequence of word1 and also a subsequence of word2. Therefore, it must be a common subsequence.

To minimize deletions, we must maximize the number of characters kept. The maximum number of characters that can be kept in both strings is exactly the length of the longest common subsequence.

The DP table correctly computes this length. For each prefix pair, if the last characters match, any common subsequence can include that shared character after an optimal subsequence of the smaller prefixes. If the last characters do not match, an optimal subsequence must skip the last character of word1 or the last character of word2, so taking the maximum of those two cases is correct.

Once the LCS length is lcs, word1 must delete m - lcs characters and word2 must delete n - lcs characters. Thus the total number of deletions is m + n - 2 * lcs.

Therefore, the algorithm returns the minimum number of deletions needed to make the strings equal.

Complexity

Metric	Value	Why
Time	`O(m * n)`	We fill one DP cell for each pair of prefixes
Space	`O(m * n)`	The full DP table stores `(m + 1) * (n + 1)` values

Implementation

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        m = len(word1)
        n = len(word2)

        dp = [[0] * (n + 1) for _ in range(m + 1)]

        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if word1[i - 1] == word2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

        lcs = dp[m][n]

        return m + n - 2 * lcs

Code Explanation

We first store the string lengths:

m = len(word1)
n = len(word2)

Then we create the DP table:

dp = [[0] * (n + 1) for _ in range(m + 1)]

The extra row and column represent empty prefixes.

This loop fills the table:

for i in range(1, m + 1):
    for j in range(1, n + 1):

The current characters are:

word1[i - 1]
word2[j - 1]

If they match, we extend the LCS from the smaller prefixes:

dp[i][j] = dp[i - 1][j - 1] + 1

If they do not match, we skip one side:

dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

Finally:

lcs = dp[m][n]
return m + n - 2 * lcs

The formula deletes everything outside the longest common subsequence.

Space-Optimized Implementation

We only need the previous DP row and the current DP row.

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        m = len(word1)
        n = len(word2)

        prev = [0] * (n + 1)

        for i in range(1, m + 1):
            curr = [0] * (n + 1)

            for j in range(1, n + 1):
                if word1[i - 1] == word2[j - 1]:
                    curr[j] = prev[j - 1] + 1
                else:
                    curr[j] = max(prev[j], curr[j - 1])

            prev = curr

        lcs = prev[n]

        return m + n - 2 * lcs

Space complexity becomes O(n).

Testing

def run_tests():
    s = Solution()

    assert s.minDistance("sea", "eat") == 2
    assert s.minDistance("leetcode", "etco") == 4
    assert s.minDistance("a", "a") == 0
    assert s.minDistance("a", "b") == 2
    assert s.minDistance("abc", "def") == 6
    assert s.minDistance("abcde", "ace") == 2

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"sea"`, `"eat"`	Main sample
`"leetcode"`, `"etco"`	Main sample with longer first string
`"a"`, `"a"`	Already equal
`"a"`, `"b"`	Both characters must be deleted
`"abc"`, `"def"`	No common characters
`"abcde"`, `"ace"`	One string already contains the LCS