A clear stack-based parser for validating nested XML-like tags with CDATA sections.
Problem Restatement
We are given a string called code.
We must determine whether it is a valid code snippet according to several XML-like rules.
The string may contain:
- Start tags:
<TAG>- End tags:
</TAG>- CDATA sections:
<![CDATA[ ... ]]>Rules:
- Tag names must contain only uppercase English letters.
- Tag name length must be between
1and9. - Tags must be properly nested.
- The entire code must be wrapped inside one valid closed tag.
- CDATA content is treated as plain text and ignored by the parser until
]]>.
Return True if the code is valid, otherwise return False. The official statement defines this as a parsing and stack problem with explicit rules for tags and CDATA handling. (leetcode.com, github.com)
Input and Output
| Item | Meaning |
|---|---|
code | XML-like string |
| Output | True if valid, otherwise False |
Example function shape:
def isValid(code: str) -> bool:
...Examples
Example 1:
code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"Output:
TrueThe outer tag is DIV.
The CDATA section:
<![CDATA[<div>]]>is treated as plain text, so <div> inside it does not need validation.
Example 2:
code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"Output:
TrueThe CDATA section safely contains symbols that would otherwise look invalid.
Example 3:
code = "<A> <B> </A> </B>"Output:
FalseThe tags are not properly nested.
<B> is opened inside <A>, but </A> closes first.
First Thought: Match Tags Like Parentheses
A natural idea is to treat tags like parentheses.
For example:
<A><B></B></A>behaves like:
( ( ) )This suggests using a stack.
When we see a start tag, push it.
When we see an end tag, it must match the top of the stack.
But there are extra complications:
- Tag names must be validated.
- CDATA must be skipped entirely.
- The whole string must be inside exactly one root tag.
So we need a careful parser.
Key Insight
We process the string from left to right.
At each position, there are four possibilities:
- CDATA section
- End tag
- Start tag
- Normal text
The stack stores currently open tags.
For example:
<DIV><A></A></DIV>processing becomes:
push DIV
push A
pop A
pop DIVAt the end, the stack must be empty.
Also, after the root tag closes, no extra characters are allowed.
Tag Name Validation
A valid tag name:
- Has length between
1and9. - Contains only uppercase English letters.
So:
<A>
<DIV>
<ABCDEFGH>are valid.
But:
<>
<a>
<LONGTAGNAME>
<A1>are invalid.
We can validate using:
name.isupper() and name.isalpha()and length checks.
CDATA Handling
CDATA begins with:
<![CDATA[and ends with:
]]>Inside CDATA, every character is ignored.
For example:
<![CDATA[</INVALID><TAG>]]>is still valid CDATA.
The parser should skip directly to the closing ]]>.
Important rule:
CDATA is only allowed inside an open tag.
So:
<![CDATA[x]]>alone is invalid.
Algorithm
Use a stack and scan the string with index i.
While i < len(code):
If we see
<![CDATA[:- The stack must not be empty.
- Find the next
]]>. - If missing, return
False. - Skip the entire CDATA section.
Else if we see
</:- Find the next
>. - Extract the tag name.
- Validate the tag name.
- The stack top must match.
- Pop the stack.
- If the stack becomes empty before the end of the string, return
False.
- Find the next
Else if we see
<:- Find the next
>. - Extract the tag name.
- Validate the tag name.
- Push the tag name onto the stack.
- Find the next
Else:
- Normal text is allowed only inside an open tag.
- If the stack is empty, return
False.
At the end, return whether the stack is empty.
Correctness
The stack always stores exactly the currently open tags that have not yet been closed.
When a valid start tag is encountered, its tag name is pushed onto the stack. Therefore, the stack correctly tracks nesting depth.
When an end tag is encountered, the algorithm checks whether the tag name matches the top of the stack. If it does not match, the nesting structure is invalid, so returning False is correct. If it matches, popping the stack correctly closes that tag.
CDATA sections are skipped entirely between <![CDATA[ and ]]>, so any characters inside CDATA are ignored exactly as required by the problem statement.
Text outside tags is only allowed when some tag is currently open. Therefore, if the stack is empty while processing plain text, the code is invalid.
The requirement that the entire string must be wrapped in exactly one root tag is enforced by checking that once the stack becomes empty, the parser must already be at the end of the string.
At the end of parsing, the stack is empty exactly when every opened tag has been properly closed.
Therefore, the algorithm returns True exactly for valid code snippets.
Complexity
Let:
n = len(code)| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character is processed at most a constant number of times |
| Space | O(n) | The stack may contain nested tags |
Implementation
class Solution:
def isValid(self, code: str) -> bool:
stack = []
n = len(code)
i = 0
while i < n:
if i > 0 and not stack:
return False
if code.startswith("<![CDATA[", i):
if not stack:
return False
j = code.find("]]>", i)
if j == -1:
return False
i = j + 3
elif code.startswith("</", i):
j = code.find(">", i)
if j == -1:
return False
tag = code[i + 2:j]
if (
len(tag) < 1
or len(tag) > 9
or not tag.isupper()
or not tag.isalpha()
):
return False
if not stack or stack[-1] != tag:
return False
stack.pop()
i = j + 1
elif code[i] == "<":
j = code.find(">", i)
if j == -1:
return False
tag = code[i + 1:j]
if (
len(tag) < 1
or len(tag) > 9
or not tag.isupper()
or not tag.isalpha()
):
return False
stack.append(tag)
i = j + 1
else:
if not stack:
return False
i += 1
return not stackCode Explanation
The stack stores currently open tags:
stack = []This rule enforces one outer root tag:
if i > 0 and not stack:
return FalseIf the stack becomes empty before the end of the string, extra content exists outside the root tag.
CDATA detection:
if code.startswith("<![CDATA[", i):CDATA is only valid inside a tag:
if not stack:
return FalseThen we skip directly to the next ]]>:
j = code.find("]]>", i)End tag parsing:
elif code.startswith("</", i):The tag name is extracted:
tag = code[i + 2:j]This validates the tag format:
len(tag) < 1
or len(tag) > 9
or not tag.isupper()
or not tag.isalpha()Then the top stack tag must match:
if not stack or stack[-1] != tag:
return FalseStart tags work similarly:
elif code[i] == "<":Valid tags are pushed:
stack.append(tag)Finally:
return not stackAll tags must be properly closed.
Testing
def run_tests():
s = Solution()
assert s.isValid(
"<DIV>This is the first line <![CDATA[<div>]]></DIV>"
) is True
assert s.isValid(
"<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
) is True
assert s.isValid(
"<A> <B> </A> </B>"
) is False
assert s.isValid(
"<DIV><A></A></DIV>"
) is True
assert s.isValid(
"<DIV><A></DIV></A>"
) is False
assert s.isValid(
"<![CDATA[test]]>"
) is False
assert s.isValid(
"<A></A><B></B>"
) is False
assert s.isValid(
"<LONGTAG></LONGTAG>"
) is False
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Valid CDATA example | Confirms CDATA skipping |
| Complex valid symbols | Confirms parser ignores CDATA content |
| Incorrect nesting | Detects mismatched close order |
| Proper nesting | Standard valid case |
| Wrong closing tag | Detects mismatch |
| CDATA outside root | Invalid |
| Multiple root tags | Invalid |
| Tag name longer than 9 | Invalid |