A clear explanation of finding the shortest abbreviation that does not conflict with any dictionary word using bit masks.
Problem Restatement
We are given a target string and a dictionary of strings.
We need to find the shortest abbreviation of target that does not conflict with any abbreviation of any word in the dictionary.
An abbreviation may replace substrings with their lengths. Each number or letter in the abbreviation counts as length 1. For example, "a32bc" has abbreviation length 4. If multiple shortest answers exist, any one may be returned. The constraints include target.length <= 21, dictionary.length <= 1000, and log2(n) + m <= 20, where m is the target length and n is the dictionary size.
Input and Output
| Item | Meaning |
|---|---|
| Input | target: str, dictionary: list[str] |
| Output | The shortest valid unique abbreviation of target |
| Conflict | An abbreviation conflicts if it can also abbreviate a dictionary word |
| Tie rule | Return any shortest valid answer |
| Abbreviation length | Each letter or number group counts as 1 |
Example function shape:
def minAbbreviation(target: str, dictionary: list[str]) -> str:
...Examples
Example:
target = "apple"
dictionary = ["blade"]The answer can be:
"a4"Why not "5"?
Because "5" abbreviates any 5-letter word, including both "apple" and "blade".
Why not "4e"?
Because "4e" also matches "blade".
But "a4" requires the first character to be "a". The word "blade" starts with "b", so "a4" does not conflict.
First Thought: Generate All Abbreviations
Since target.length <= 21, we can represent every abbreviation pattern with a bit mask.
For each character position:
| Bit | Meaning |
|---|---|
1 | Keep this character |
0 | Abbreviate this character as part of a number |
For:
target = "apple"A mask can describe whether we keep or abbreviate each character.
For example:
10000means:
keep 'a', abbreviate the next 4 characterswhich gives:
"a4"So one direct solution is:
- Generate every mask.
- Convert the mask into an abbreviation.
- Check if the abbreviation conflicts with dictionary words.
- Keep the shortest valid abbreviation.
Key Insight
Only dictionary words with the same length as target matter.
If a dictionary word has a different length, it cannot share an abbreviation with target that consumes the full word length.
So we filter:
word for word in dictionary if len(word) == len(target)Now we need to know whether an abbreviation mask distinguishes target from a dictionary word.
For each dictionary word, build a difference mask.
A bit is 1 where the dictionary word differs from target.
Example:
target = "apple"
word = "blade"Compare each position:
a vs b -> different
p vs l -> different
p vs a -> different
l vs d -> different
e vs e -> sameDifference mask:
11110For an abbreviation to avoid conflict with this word, it must keep at least one position where target and the dictionary word differ.
In mask terms:
candidate_mask & diff_mask != 0If this is true for every same-length dictionary word, the abbreviation is unique.
Algorithm
- Let
m = len(target). - Build
diff_masksfor dictionary words with lengthm. - If
diff_masksis empty, returnstr(m). - Enumerate all masks from
0to2^m - 1. - For each mask:
- Check whether it distinguishes
targetfrom every same-length dictionary word. - Compute its abbreviation length.
- Track the best shortest valid mask.
- Check whether it distinguishes
- Convert the best mask into an abbreviation string.
- Return it.
Correctness
For a dictionary word with the same length as target, an abbreviation conflicts if every kept letter position matches between target and that word.
A candidate abbreviation keeps exactly the positions marked 1 in its mask.
The dictionary word differs from target exactly at positions marked 1 in its difference mask.
Therefore, the abbreviation avoids conflict with that word exactly when:
candidate_mask & diff_mask != 0This means the abbreviation keeps at least one character position where target and the dictionary word differ.
The algorithm checks this condition for every same-length dictionary word. So every accepted candidate is unique.
The algorithm enumerates every possible abbreviation mask for target, so it considers every possible abbreviation pattern.
Among all valid masks, it chooses one with the smallest abbreviation length.
Therefore, the returned abbreviation is a shortest unique abbreviation of target.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(2^m * n * m) | We enumerate masks, check dictionary masks, and compute abbreviation lengths |
| Space | O(n) | We store difference masks |
Here:
| Symbol | Meaning |
|---|---|
m | Length of target |
n | Number of same-length dictionary words |
Since the problem bounds m and n so that exhaustive bit-mask search is feasible, this approach fits the intended constraints.
Implementation
from typing import List
class Solution:
def minAbbreviation(self, target: str, dictionary: List[str]) -> str:
m = len(target)
diff_masks = []
for word in dictionary:
if len(word) != m:
continue
mask = 0
for i, (a, b) in enumerate(zip(target, word)):
if a != b:
mask |= 1 << i
diff_masks.append(mask)
if not diff_masks:
return str(m)
def abbr_len(mask: int) -> int:
length = 0
count = 0
for i in range(m):
if mask & (1 << i):
if count > 0:
length += 1
count = 0
length += 1
else:
count += 1
if count > 0:
length += 1
return length
def build_abbr(mask: int) -> str:
parts = []
count = 0
for i in range(m):
if mask & (1 << i):
if count > 0:
parts.append(str(count))
count = 0
parts.append(target[i])
else:
count += 1
if count > 0:
parts.append(str(count))
return "".join(parts)
best_mask = 0
best_len = float("inf")
for mask in range(1 << m):
valid = True
for diff in diff_masks:
if mask & diff == 0:
valid = False
break
if not valid:
continue
length = abbr_len(mask)
if length < best_len:
best_len = length
best_mask = mask
return build_abbr(best_mask)Code Explanation
We first keep only dictionary words with the same length as target:
if len(word) != m:
continueThen we build a difference mask:
if a != b:
mask |= 1 << iA 1 bit means this position can distinguish target from that dictionary word.
If there are no same-length dictionary words, the shortest abbreviation is simply the whole length:
return str(m)For example, if target = "apple", then "5" cannot conflict with any word of a different length.
The helper abbr_len computes abbreviation length without constructing the abbreviation.
Consecutive abbreviated characters count as one number group:
count += 1When we meet a kept character, any pending number group contributes 1 to the abbreviation length:
if count > 0:
length += 1
count = 0Then the kept character also contributes 1.
The main loop checks every mask:
for mask in range(1 << m):A mask is valid only if it has at least one kept differing position for every dictionary word:
if mask & diff == 0:
valid = FalseAfter finding the shortest valid mask, we convert it into the final abbreviation string:
return build_abbr(best_mask)Testing
def valid_abbr(word: str, abbr: str) -> bool:
i = 0
j = 0
while i < len(word) and j < len(abbr):
if abbr[j].isdigit():
if abbr[j] == "0":
return False
value = 0
while j < len(abbr) and abbr[j].isdigit():
value = value * 10 + int(abbr[j])
j += 1
i += value
else:
if word[i] != abbr[j]:
return False
i += 1
j += 1
return i == len(word) and j == len(abbr)
def test_min_abbreviation():
s = Solution()
ans1 = s.minAbbreviation("apple", ["blade"])
assert len(ans1) == 2
assert valid_abbr("apple", ans1)
assert not valid_abbr("blade", ans1)
ans2 = s.minAbbreviation("apple", ["plain", "amber", "blade"])
assert valid_abbr("apple", ans2)
assert all(not valid_abbr(w, ans2) for w in ["plain", "amber", "blade"])
assert s.minAbbreviation("apple", ["banana"]) == "5"
ans4 = s.minAbbreviation("abcdef", ["abqdef"])
assert valid_abbr("abcdef", ans4)
assert not valid_abbr("abqdef", ans4)
print("all tests passed")Test Notes
| Test | Why |
|---|---|
"apple", ["blade"] | Standard case where "a4" is one valid shortest answer |
| Multiple same-length words | Checks uniqueness against every dictionary word |
| Different-length dictionary word | Shortest answer can be the full length |
| One-position difference | Forces the abbreviation to keep a distinguishing character |