A clear explanation of the Alien Dictionary problem using graph construction and topological sorting.
Problem Restatement
We are given a list of words sorted according to an unknown alien alphabet.
We need to infer one possible order of the alien alphabet.
Return a string containing all unique letters in a valid order.
If no valid order exists, return:
""If multiple valid orders exist, any one of them is accepted.
The constraints are 1 <= words.length <= 100, 1 <= words[i].length <= 100, and each word contains only lowercase English letters. The official examples include ["wrt","wrf","er","ett","rftt"] -> "wertf", ["z","x"] -> "zx", and ["z","x","z"] -> "".
Input and Output
| Item | Meaning |
|---|---|
| Input | A list of words sorted by an alien alphabet |
| Output | One valid character order |
| Invalid case | Return "" |
| Characters | Unique lowercase letters that appear in words |
Example function shape:
def alienOrder(words: list[str]) -> str:
...Examples
Example 1:
words = ["wrt", "wrf", "er", "ett", "rftt"]Compare adjacent words.
| Pair | First difference | Rule |
|---|---|---|
"wrt", "wrf" | t vs f | t before f |
"wrf", "er" | w vs e | w before e |
"er", "ett" | r vs t | r before t |
"ett", "rftt" | e vs r | e before r |
These rules form:
w -> e -> r -> t -> fAnswer:
"wertf"Example 2:
words = ["z", "x"]The first word comes before the second word, so:
z -> xAnswer:
"zx"Example 3:
words = ["z", "x", "z"]From "z" before "x":
z -> xFrom "x" before "z":
x -> zThis creates a cycle.
No valid alphabet order exists.
Answer:
""First Thought: Compare All Words
A tempting idea is to compare every pair of words and collect ordering rules.
But only adjacent words are needed.
The input is already sorted. In a sorted list, each adjacent pair gives the strongest local constraint. Comparing non-adjacent pairs can add redundant information and can make reasoning harder.
So we compare:
words[i]
words[i + 1]for every adjacent pair.
Key Insight
When two sorted words differ, the first differing character determines the alphabet order.
For example:
"wrt"
"wrf"The first two characters are the same:
w == w
r == rThe first difference is:
t != fSince "wrt" appears before "wrf", we know:
t -> fThis is a directed graph edge.
After collecting all such edges, the problem becomes:
Return a topological ordering of the graph.
If the graph has a cycle, no valid ordering exists.
Invalid Prefix Case
There is one special invalid case.
If a longer word appears before its own prefix, the order is impossible.
Example:
words = ["abc", "ab"]No alphabet order can make "abc" come before "ab" when "ab" is its prefix.
So we must return:
""This case happens when:
len(first) > len(second)and no differing character is found.
Algorithm
- Create a graph node for every unique character in
words. - Compare every adjacent word pair.
- For each pair:
- Find the first differing character.
- Add a directed edge from the first word’s character to the second word’s character.
- If no difference exists and the first word is longer, return
"".
- Compute indegrees for every character.
- Put all characters with indegree
0into a queue. - Repeatedly:
- Pop a character from the queue.
- Append it to the answer.
- Decrease indegree of its neighbors.
- If a neighbor reaches indegree
0, push it into the queue.
- If the answer length equals the number of unique characters, return it.
- Otherwise, return
""because a cycle exists.
Walkthrough
Use:
words = ["wrt", "wrf", "er", "ett", "rftt"]Unique characters:
w, r, t, f, eBuild edges:
t -> f
w -> e
r -> t
e -> rIndegrees:
| Character | Indegree |
|---|---|
w | 0 |
e | 1 |
r | 1 |
t | 1 |
f | 1 |
Queue starts with:
["w"]Process w.
Answer:
"w"Decrease indegree of e to 0, so push e.
Process e.
Answer:
"we"Decrease indegree of r to 0.
Process r.
Answer:
"wer"Decrease indegree of t to 0.
Process t.
Answer:
"wert"Decrease indegree of f to 0.
Process f.
Answer:
"wertf"All characters are processed.
Return:
"wertf"Correctness
The graph contains one node for every character that appears in the input. Therefore every possible output character is represented.
For every adjacent pair of words, the first differing character is the only character comparison that determines their relative order. Adding an edge from the first word’s character to the second word’s character captures exactly that required constraint.
If a longer word appears before its own prefix, no lexicographic alphabet can satisfy the input order, so returning "" is correct.
After the graph is built, any valid alien alphabet must place every edge source before its destination. This is exactly the definition of a topological ordering.
Kahn’s algorithm repeatedly selects characters with no remaining prerequisites. When a character is removed, all of its outgoing constraints are satisfied, so reducing its neighbors’ indegrees preserves the remaining dependency structure.
If all characters are processed, the produced order satisfies every edge and is a valid alien alphabet order.
If some characters remain unprocessed, every remaining character depends on another remaining character. That means a cycle exists, so no valid order exists.
Therefore the algorithm returns a valid order exactly when one exists.
Complexity
Let C be the total number of characters across all words, and let U be the number of unique letters.
| Metric | Value | Why |
|---|---|---|
| Time | O(C + U) | We scan words, build edges, and topologically sort characters |
| Space | O(U + E) | Graph stores unique characters and ordering edges |
Since the alphabet is lowercase English letters, U <= 26.
Implementation
from collections import defaultdict, deque
from typing import List
class Solution:
def alienOrder(self, words: List[str]) -> str:
graph = {ch: set() for word in words for ch in word}
indegree = {ch: 0 for ch in graph}
for first, second in zip(words, words[1:]):
min_len = min(len(first), len(second))
found_difference = False
for i in range(min_len):
if first[i] != second[i]:
a = first[i]
b = second[i]
if b not in graph[a]:
graph[a].add(b)
indegree[b] += 1
found_difference = True
break
if not found_difference and len(first) > len(second):
return ""
queue = deque(ch for ch in graph if indegree[ch] == 0)
order = []
while queue:
ch = queue.popleft()
order.append(ch)
for nei in graph[ch]:
indegree[nei] -= 1
if indegree[nei] == 0:
queue.append(nei)
if len(order) != len(graph):
return ""
return "".join(order)Code Explanation
Create one graph node for every character:
graph = {ch: set() for word in words for ch in word}Initialize every indegree to zero:
indegree = {ch: 0 for ch in graph}Compare adjacent words:
for first, second in zip(words, words[1:]):Only the first differing character matters:
if first[i] != second[i]:Add an edge:
graph[a].add(b)
indegree[b] += 1The set prevents duplicate edges from increasing indegree multiple times.
If no difference exists and the first word is longer:
if not found_difference and len(first) > len(second):
return ""the input violates lexicographic order.
The queue starts with all characters that have no prerequisites:
queue = deque(ch for ch in graph if indegree[ch] == 0)Each popped character is appended to the answer.
When a neighbor’s indegree becomes zero, it is ready to be processed.
Finally:
if len(order) != len(graph):
return ""detects cycles.
Testing
def is_valid_order(order: str, words: list[str]) -> bool:
if order == "":
return False
rank = {ch: i for i, ch in enumerate(order)}
for first, second in zip(words, words[1:]):
min_len = min(len(first), len(second))
valid_pair = False
for i in range(min_len):
if first[i] != second[i]:
if rank[first[i]] > rank[second[i]]:
return False
valid_pair = True
break
if not valid_pair and len(first) > len(second):
return False
return True
def run_tests():
s = Solution()
words = ["wrt", "wrf", "er", "ett", "rftt"]
assert is_valid_order(s.alienOrder(words), words)
assert s.alienOrder(["z", "x"]) == "zx"
assert s.alienOrder(["z", "x", "z"]) == ""
assert s.alienOrder(["abc", "ab"]) == ""
words = ["ab", "adc"]
assert is_valid_order(s.alienOrder(words), words)
words = ["a"]
assert s.alienOrder(words) == "a"
words = ["za", "zb", "ca", "cb"]
assert is_valid_order(s.alienOrder(words), words)
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Standard example | Builds a full dependency chain |
["z", "x"] | Simple one-edge graph |
["z", "x", "z"] | Cycle detection |
["abc", "ab"] | Invalid prefix case |
["ab", "adc"] | Partial order with free characters |
| Single word | Any order of its characters is valid |
| Multiple valid orders | Confirms checker accepts any valid topological order |