Skip to content

LeetCode 591: Tag Validator

A clear stack-based parser for validating nested XML-like tags with CDATA sections.

Problem Restatement

We are given a string called code.

We must determine whether it is a valid code snippet according to several XML-like rules.

The string may contain:

  1. Start tags:
<TAG>
  1. End tags:
</TAG>
  1. CDATA sections:
<![CDATA[ ... ]]>

Rules:

  • Tag names must contain only uppercase English letters.
  • Tag name length must be between 1 and 9.
  • Tags must be properly nested.
  • The entire code must be wrapped inside one valid closed tag.
  • CDATA content is treated as plain text and ignored by the parser until ]]>.

Return True if the code is valid, otherwise return False. The official statement defines this as a parsing and stack problem with explicit rules for tags and CDATA handling. (leetcode.com, github.com)

Input and Output

ItemMeaning
codeXML-like string
OutputTrue if valid, otherwise False

Example function shape:

def isValid(code: str) -> bool:
    ...

Examples

Example 1:

code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"

Output:

True

The outer tag is DIV.

The CDATA section:

<![CDATA[<div>]]>

is treated as plain text, so <div> inside it does not need validation.

Example 2:

code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"

Output:

True

The CDATA section safely contains symbols that would otherwise look invalid.

Example 3:

code = "<A>  <B> </A>   </B>"

Output:

False

The tags are not properly nested.

<B> is opened inside <A>, but </A> closes first.

First Thought: Match Tags Like Parentheses

A natural idea is to treat tags like parentheses.

For example:

<A><B></B></A>

behaves like:

( ( ) )

This suggests using a stack.

When we see a start tag, push it.

When we see an end tag, it must match the top of the stack.

But there are extra complications:

  1. Tag names must be validated.
  2. CDATA must be skipped entirely.
  3. The whole string must be inside exactly one root tag.

So we need a careful parser.

Key Insight

We process the string from left to right.

At each position, there are four possibilities:

  1. CDATA section
  2. End tag
  3. Start tag
  4. Normal text

The stack stores currently open tags.

For example:

<DIV><A></A></DIV>

processing becomes:

push DIV
push A
pop A
pop DIV

At the end, the stack must be empty.

Also, after the root tag closes, no extra characters are allowed.

Tag Name Validation

A valid tag name:

  1. Has length between 1 and 9.
  2. Contains only uppercase English letters.

So:

<A>
<DIV>
<ABCDEFGH>

are valid.

But:

<>
<a>
<LONGTAGNAME>
<A1>

are invalid.

We can validate using:

name.isupper() and name.isalpha()

and length checks.

CDATA Handling

CDATA begins with:

<![CDATA[

and ends with:

]]>

Inside CDATA, every character is ignored.

For example:

<![CDATA[</INVALID><TAG>]]>

is still valid CDATA.

The parser should skip directly to the closing ]]>.

Important rule:

CDATA is only allowed inside an open tag.

So:

<![CDATA[x]]>

alone is invalid.

Algorithm

Use a stack and scan the string with index i.

While i < len(code):

  1. If we see <![CDATA[:

    • The stack must not be empty.
    • Find the next ]]>.
    • If missing, return False.
    • Skip the entire CDATA section.
  2. Else if we see </:

    • Find the next >.
    • Extract the tag name.
    • Validate the tag name.
    • The stack top must match.
    • Pop the stack.
    • If the stack becomes empty before the end of the string, return False.
  3. Else if we see <:

    • Find the next >.
    • Extract the tag name.
    • Validate the tag name.
    • Push the tag name onto the stack.
  4. Else:

    • Normal text is allowed only inside an open tag.
    • If the stack is empty, return False.

At the end, return whether the stack is empty.

Correctness

The stack always stores exactly the currently open tags that have not yet been closed.

When a valid start tag is encountered, its tag name is pushed onto the stack. Therefore, the stack correctly tracks nesting depth.

When an end tag is encountered, the algorithm checks whether the tag name matches the top of the stack. If it does not match, the nesting structure is invalid, so returning False is correct. If it matches, popping the stack correctly closes that tag.

CDATA sections are skipped entirely between <![CDATA[ and ]]>, so any characters inside CDATA are ignored exactly as required by the problem statement.

Text outside tags is only allowed when some tag is currently open. Therefore, if the stack is empty while processing plain text, the code is invalid.

The requirement that the entire string must be wrapped in exactly one root tag is enforced by checking that once the stack becomes empty, the parser must already be at the end of the string.

At the end of parsing, the stack is empty exactly when every opened tag has been properly closed.

Therefore, the algorithm returns True exactly for valid code snippets.

Complexity

Let:

n = len(code)
MetricValueWhy
TimeO(n)Each character is processed at most a constant number of times
SpaceO(n)The stack may contain nested tags

Implementation

class Solution:
    def isValid(self, code: str) -> bool:
        stack = []
        n = len(code)
        i = 0

        while i < n:
            if i > 0 and not stack:
                return False

            if code.startswith("<![CDATA[", i):
                if not stack:
                    return False

                j = code.find("]]>", i)

                if j == -1:
                    return False

                i = j + 3

            elif code.startswith("</", i):
                j = code.find(">", i)

                if j == -1:
                    return False

                tag = code[i + 2:j]

                if (
                    len(tag) < 1
                    or len(tag) > 9
                    or not tag.isupper()
                    or not tag.isalpha()
                ):
                    return False

                if not stack or stack[-1] != tag:
                    return False

                stack.pop()

                i = j + 1

            elif code[i] == "<":
                j = code.find(">", i)

                if j == -1:
                    return False

                tag = code[i + 1:j]

                if (
                    len(tag) < 1
                    or len(tag) > 9
                    or not tag.isupper()
                    or not tag.isalpha()
                ):
                    return False

                stack.append(tag)

                i = j + 1

            else:
                if not stack:
                    return False

                i += 1

        return not stack

Code Explanation

The stack stores currently open tags:

stack = []

This rule enforces one outer root tag:

if i > 0 and not stack:
    return False

If the stack becomes empty before the end of the string, extra content exists outside the root tag.

CDATA detection:

if code.startswith("<![CDATA[", i):

CDATA is only valid inside a tag:

if not stack:
    return False

Then we skip directly to the next ]]>:

j = code.find("]]>", i)

End tag parsing:

elif code.startswith("</", i):

The tag name is extracted:

tag = code[i + 2:j]

This validates the tag format:

len(tag) < 1
or len(tag) > 9
or not tag.isupper()
or not tag.isalpha()

Then the top stack tag must match:

if not stack or stack[-1] != tag:
    return False

Start tags work similarly:

elif code[i] == "<":

Valid tags are pushed:

stack.append(tag)

Finally:

return not stack

All tags must be properly closed.

Testing

def run_tests():
    s = Solution()

    assert s.isValid(
        "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
    ) is True

    assert s.isValid(
        "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
    ) is True

    assert s.isValid(
        "<A>  <B> </A>   </B>"
    ) is False

    assert s.isValid(
        "<DIV><A></A></DIV>"
    ) is True

    assert s.isValid(
        "<DIV><A></DIV></A>"
    ) is False

    assert s.isValid(
        "<![CDATA[test]]>"
    ) is False

    assert s.isValid(
        "<A></A><B></B>"
    ) is False

    assert s.isValid(
        "<LONGTAG></LONGTAG>"
    ) is False

    print("all tests passed")

run_tests()

Test meaning:

TestWhy
Valid CDATA exampleConfirms CDATA skipping
Complex valid symbolsConfirms parser ignores CDATA content
Incorrect nestingDetects mismatched close order
Proper nestingStandard valid case
Wrong closing tagDetects mismatch
CDATA outside rootInvalid
Multiple root tagsInvalid
Tag name longer than 9Invalid