A clear dynamic programming solution for finding the minimum deletions needed to make two strings equal.
Problem Restatement
We are given two strings, word1 and word2.
In one step, we may delete exactly one character from either string.
We need to return the minimum number of deletion steps required to make the two strings equal. The strings contain only lowercase English letters, and each length is between 1 and 500.
Input and Output
| Item | Meaning |
|---|---|
word1 | First lowercase English string |
word2 | Second lowercase English string |
| Output | Minimum number of deletions needed to make both strings equal |
Example function shape:
def minDistance(word1: str, word2: str) -> int:
...Examples
Example 1:
word1 = "sea"
word2 = "eat"Output:
2One optimal result is to make both strings equal to "ea".
Delete "s" from "sea":
"sea" -> "ea"Delete "t" from "eat":
"eat" -> "ea"So the answer is 2.
Example 2:
word1 = "leetcode"
word2 = "etco"Output:
4One longest sequence that can be kept in both strings is "etco". We delete the other four characters from "leetcode".
First Thought: Try All Deletions
A brute force solution could try every possible way to delete characters from both strings.
For each character, we might keep it or delete it. This creates many possible subsequences.
Then we could find a common remaining string with the fewest deletions.
This is too slow because the number of subsequences grows exponentially with string length.
Key Insight
After deleting characters from both strings, the final equal string must be a common subsequence of word1 and word2.
To minimize deletions, we should keep the longest possible common subsequence.
Let:
L = length of the longest common subsequenceThen:
deletions from word1 = len(word1) - L
deletions from word2 = len(word2) - LSo the answer is:
len(word1) + len(word2) - 2 * LThis means the problem can be solved by computing the Longest Common Subsequence length.
Dynamic Programming Definition
Let:
dp[i][j]be the length of the longest common subsequence between:
word1[0:i]
word2[0:j]That means dp[i][j] only looks at the first i characters of word1 and the first j characters of word2.
The answer for the LCS length is:
dp[m][n]where:
m = len(word1)
n = len(word2)Recurrence
If the last characters match:
word1[i - 1] == word2[j - 1]then we can keep that character in both strings:
dp[i][j] = dp[i - 1][j - 1] + 1If the last characters do not match, then the LCS must skip one of them:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])The base cases are:
dp[0][j] = 0
dp[i][0] = 0An empty string has no common subsequence with any prefix except the empty subsequence.
Algorithm
- Let
m = len(word1)andn = len(word2). - Create a
(m + 1) x (n + 1)DP table filled with0. - For each
ifrom1tom:- For each
jfrom1ton:- If
word1[i - 1] == word2[j - 1], setdp[i][j] = dp[i - 1][j - 1] + 1. - Otherwise, set
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]).
- If
- For each
- Let
lcs = dp[m][n]. - Return
m + n - 2 * lcs.
Correctness
After all deletions, both strings must become the same string. Since this final string is obtained by deleting characters without changing the order of the remaining characters, it must be a subsequence of word1 and also a subsequence of word2. Therefore, it must be a common subsequence.
To minimize deletions, we must maximize the number of characters kept. The maximum number of characters that can be kept in both strings is exactly the length of the longest common subsequence.
The DP table correctly computes this length. For each prefix pair, if the last characters match, any common subsequence can include that shared character after an optimal subsequence of the smaller prefixes. If the last characters do not match, an optimal subsequence must skip the last character of word1 or the last character of word2, so taking the maximum of those two cases is correct.
Once the LCS length is lcs, word1 must delete m - lcs characters and word2 must delete n - lcs characters. Thus the total number of deletions is m + n - 2 * lcs.
Therefore, the algorithm returns the minimum number of deletions needed to make the strings equal.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(m * n) | We fill one DP cell for each pair of prefixes |
| Space | O(m * n) | The full DP table stores (m + 1) * (n + 1) values |
Implementation
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
m = len(word1)
n = len(word2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if word1[i - 1] == word2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
lcs = dp[m][n]
return m + n - 2 * lcsCode Explanation
We first store the string lengths:
m = len(word1)
n = len(word2)Then we create the DP table:
dp = [[0] * (n + 1) for _ in range(m + 1)]The extra row and column represent empty prefixes.
This loop fills the table:
for i in range(1, m + 1):
for j in range(1, n + 1):The current characters are:
word1[i - 1]
word2[j - 1]If they match, we extend the LCS from the smaller prefixes:
dp[i][j] = dp[i - 1][j - 1] + 1If they do not match, we skip one side:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])Finally:
lcs = dp[m][n]
return m + n - 2 * lcsThe formula deletes everything outside the longest common subsequence.
Space-Optimized Implementation
We only need the previous DP row and the current DP row.
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
m = len(word1)
n = len(word2)
prev = [0] * (n + 1)
for i in range(1, m + 1):
curr = [0] * (n + 1)
for j in range(1, n + 1):
if word1[i - 1] == word2[j - 1]:
curr[j] = prev[j - 1] + 1
else:
curr[j] = max(prev[j], curr[j - 1])
prev = curr
lcs = prev[n]
return m + n - 2 * lcsSpace complexity becomes O(n).
Testing
def run_tests():
s = Solution()
assert s.minDistance("sea", "eat") == 2
assert s.minDistance("leetcode", "etco") == 4
assert s.minDistance("a", "a") == 0
assert s.minDistance("a", "b") == 2
assert s.minDistance("abc", "def") == 6
assert s.minDistance("abcde", "ace") == 2
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"sea", "eat" | Main sample |
"leetcode", "etco" | Main sample with longer first string |
"a", "a" | Already equal |
"a", "b" | Both characters must be deleted |
"abc", "def" | No common characters |
"abcde", "ace" | One string already contains the LCS |