# LeetCode 591: Tag Validator

## Problem Restatement

We are given a string called `code`.

We must determine whether it is a valid code snippet according to several XML-like rules.

The string may contain:

1. Start tags:

```text
<TAG>
```

2. End tags:

```text
</TAG>
```

3. CDATA sections:

```text
<![CDATA[ ... ]]>
```

Rules:

- Tag names must contain only uppercase English letters.
- Tag name length must be between `1` and `9`.
- Tags must be properly nested.
- The entire code must be wrapped inside one valid closed tag.
- CDATA content is treated as plain text and ignored by the parser until `]]>`.

Return `True` if the code is valid, otherwise return `False`. The official statement defines this as a parsing and stack problem with explicit rules for tags and CDATA handling. ([leetcode.com](https://leetcode.com/problems/tag-validator/), [github.com](https://github.com/doocs/leetcode/blob/main/solution/0500-0599/0591.Tag%20Validator/README_EN.md))

## Input and Output

| Item | Meaning |
|---|---|
| `code` | XML-like string |
| Output | `True` if valid, otherwise `False` |

Example function shape:

```python
def isValid(code: str) -> bool:
    ...
```

## Examples

Example 1:

```python
code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
```

Output:

```python
True
```

The outer tag is `DIV`.

The CDATA section:

```text
<![CDATA[<div>]]>
```

is treated as plain text, so `<div>` inside it does not need validation.

Example 2:

```python
code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
```

Output:

```python
True
```

The CDATA section safely contains symbols that would otherwise look invalid.

Example 3:

```python
code = "<A>  <B> </A>   </B>"
```

Output:

```python
False
```

The tags are not properly nested.

`<B>` is opened inside `<A>`, but `</A>` closes first.

## First Thought: Match Tags Like Parentheses

A natural idea is to treat tags like parentheses.

For example:

```text
<A><B></B></A>
```

behaves like:

```text
( ( ) )
```

This suggests using a stack.

When we see a start tag, push it.

When we see an end tag, it must match the top of the stack.

But there are extra complications:

1. Tag names must be validated.
2. CDATA must be skipped entirely.
3. The whole string must be inside exactly one root tag.

So we need a careful parser.

## Key Insight

We process the string from left to right.

At each position, there are four possibilities:

1. CDATA section
2. End tag
3. Start tag
4. Normal text

The stack stores currently open tags.

For example:

```text
<DIV><A></A></DIV>
```

processing becomes:

```text
push DIV
push A
pop A
pop DIV
```

At the end, the stack must be empty.

Also, after the root tag closes, no extra characters are allowed.

## Tag Name Validation

A valid tag name:

1. Has length between `1` and `9`.
2. Contains only uppercase English letters.

So:

```text
<A>
<DIV>
<ABCDEFGH>
```

are valid.

But:

```text
<>
<a>
<LONGTAGNAME>
<A1>
```

are invalid.

We can validate using:

```python
name.isupper() and name.isalpha()
```

and length checks.

## CDATA Handling

CDATA begins with:

```text
<![CDATA[
```

and ends with:

```text
]]>
```

Inside CDATA, every character is ignored.

For example:

```text
<![CDATA[</INVALID><TAG>]]>
```

is still valid CDATA.

The parser should skip directly to the closing `]]>`.

Important rule:

CDATA is only allowed inside an open tag.

So:

```text
<![CDATA[x]]>
```

alone is invalid.

## Algorithm

Use a stack and scan the string with index `i`.

While `i < len(code)`:

1. If we see `<![CDATA[`:
   - The stack must not be empty.
   - Find the next `]]>`.
   - If missing, return `False`.
   - Skip the entire CDATA section.

2. Else if we see `</`:
   - Find the next `>`.
   - Extract the tag name.
   - Validate the tag name.
   - The stack top must match.
   - Pop the stack.
   - If the stack becomes empty before the end of the string, return `False`.

3. Else if we see `<`:
   - Find the next `>`.
   - Extract the tag name.
   - Validate the tag name.
   - Push the tag name onto the stack.

4. Else:
   - Normal text is allowed only inside an open tag.
   - If the stack is empty, return `False`.

At the end, return whether the stack is empty.

## Correctness

The stack always stores exactly the currently open tags that have not yet been closed.

When a valid start tag is encountered, its tag name is pushed onto the stack. Therefore, the stack correctly tracks nesting depth.

When an end tag is encountered, the algorithm checks whether the tag name matches the top of the stack. If it does not match, the nesting structure is invalid, so returning `False` is correct. If it matches, popping the stack correctly closes that tag.

CDATA sections are skipped entirely between `<![CDATA[` and `]]>`, so any characters inside CDATA are ignored exactly as required by the problem statement.

Text outside tags is only allowed when some tag is currently open. Therefore, if the stack is empty while processing plain text, the code is invalid.

The requirement that the entire string must be wrapped in exactly one root tag is enforced by checking that once the stack becomes empty, the parser must already be at the end of the string.

At the end of parsing, the stack is empty exactly when every opened tag has been properly closed.

Therefore, the algorithm returns `True` exactly for valid code snippets.

## Complexity

Let:

```text
n = len(code)
```

| Metric | Value | Why |
|---|---|---|
| Time | `O(n)` | Each character is processed at most a constant number of times |
| Space | `O(n)` | The stack may contain nested tags |

## Implementation

```python
class Solution:
    def isValid(self, code: str) -> bool:
        stack = []
        n = len(code)
        i = 0

        while i < n:
            if i > 0 and not stack:
                return False

            if code.startswith("<![CDATA[", i):
                if not stack:
                    return False

                j = code.find("]]>", i)

                if j == -1:
                    return False

                i = j + 3

            elif code.startswith("</", i):
                j = code.find(">", i)

                if j == -1:
                    return False

                tag = code[i + 2:j]

                if (
                    len(tag) < 1
                    or len(tag) > 9
                    or not tag.isupper()
                    or not tag.isalpha()
                ):
                    return False

                if not stack or stack[-1] != tag:
                    return False

                stack.pop()

                i = j + 1

            elif code[i] == "<":
                j = code.find(">", i)

                if j == -1:
                    return False

                tag = code[i + 1:j]

                if (
                    len(tag) < 1
                    or len(tag) > 9
                    or not tag.isupper()
                    or not tag.isalpha()
                ):
                    return False

                stack.append(tag)

                i = j + 1

            else:
                if not stack:
                    return False

                i += 1

        return not stack
```

## Code Explanation

The stack stores currently open tags:

```python
stack = []
```

This rule enforces one outer root tag:

```python
if i > 0 and not stack:
    return False
```

If the stack becomes empty before the end of the string, extra content exists outside the root tag.

CDATA detection:

```python
if code.startswith("<![CDATA[", i):
```

CDATA is only valid inside a tag:

```python
if not stack:
    return False
```

Then we skip directly to the next `]]>`:

```python
j = code.find("]]>", i)
```

End tag parsing:

```python
elif code.startswith("</", i):
```

The tag name is extracted:

```python
tag = code[i + 2:j]
```

This validates the tag format:

```python
len(tag) < 1
or len(tag) > 9
or not tag.isupper()
or not tag.isalpha()
```

Then the top stack tag must match:

```python
if not stack or stack[-1] != tag:
    return False
```

Start tags work similarly:

```python
elif code[i] == "<":
```

Valid tags are pushed:

```python
stack.append(tag)
```

Finally:

```python
return not stack
```

All tags must be properly closed.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.isValid(
        "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
    ) is True

    assert s.isValid(
        "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
    ) is True

    assert s.isValid(
        "<A>  <B> </A>   </B>"
    ) is False

    assert s.isValid(
        "<DIV><A></A></DIV>"
    ) is True

    assert s.isValid(
        "<DIV><A></DIV></A>"
    ) is False

    assert s.isValid(
        "<![CDATA[test]]>"
    ) is False

    assert s.isValid(
        "<A></A><B></B>"
    ) is False

    assert s.isValid(
        "<LONGTAG></LONGTAG>"
    ) is False

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| Valid CDATA example | Confirms CDATA skipping |
| Complex valid symbols | Confirms parser ignores CDATA content |
| Incorrect nesting | Detects mismatched close order |
| Proper nesting | Standard valid case |
| Wrong closing tag | Detects mismatch |
| CDATA outside root | Invalid |
| Multiple root tags | Invalid |
| Tag name longer than 9 | Invalid |

