Facebook Pixel

591. Tag Validator

Problem Description

This problem asks you to implement a tag validator that checks if a given code snippet string follows specific XML-like formatting rules.

The code snippet is considered valid if:

  1. The entire code must be wrapped in a valid closed tag - meaning it starts with <TAG_NAME> and ends with </TAG_NAME> where both tag names match.

  2. Tag names must follow strict rules:

    • Only contain uppercase letters (A-Z)
    • Length must be between 1 and 9 characters inclusive
  3. Tag content rules:

    • Can contain other valid nested tags
    • Can contain CDATA sections
    • Can contain any regular characters
    • Cannot contain unmatched < symbols
    • Cannot contain unmatched or improperly closed tags
  4. CDATA sections have the format <![CDATA[CDATA_CONTENT]]>:

    • Everything between <![CDATA[ and the first ]]> is treated as raw text
    • Characters inside CDATA are not parsed as tags, even if they look like tags
  5. Matching rules:

    • Every start tag <TAG_NAME> must have a corresponding end tag </TAG_NAME>
    • Tags must be properly nested (like balanced parentheses)
    • When a < or </ is encountered, all characters until the next > are parsed as a tag name

The solution uses a stack-based approach to track opening tags and ensure they match with closing tags. It processes the string character by character, handling three main cases:

  • CDATA sections (which are skipped over)
  • Closing tags (which must match the most recent opening tag on the stack)
  • Opening tags (which are pushed onto the stack)

The code is valid only if after processing the entire string, the stack is empty (all tags were properly closed) and there was always at least one tag on the stack during processing (except at the very beginning).

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that this is fundamentally a bracket matching problem with additional constraints. Just like checking if parentheses are balanced, we need to ensure every opening tag has a corresponding closing tag in the correct order.

Why use a stack? When we encounter nested tags like <A><B></B></A>, we need to remember which tags are still open. The most recently opened tag must be closed first (LIFO - Last In First Out), which naturally leads us to use a stack data structure.

The parsing strategy emerges from understanding that we need to handle three distinct patterns while scanning the string:

  1. CDATA sections - These are special because everything inside them should be ignored. We need to identify them first and skip over their content entirely.
  2. Closing tags (</...>) - These should match and pop from our stack of open tags.
  3. Opening tags (<...>) - These should be validated and pushed onto the stack.

A critical observation is that the entire code must be wrapped in at least one valid tag. This means:

  • We should never have an empty stack while processing (except at the very start)
  • The stack should be empty at the end (all tags properly closed)

The linear scanning approach makes sense because we need to process characters in order to maintain the correct nesting structure. We can't jump around - we must respect the sequential nature of opening and closing tags.

The validation rules for tag names (uppercase only, length 1-9) are straightforward checks that can be done as we parse each tag. If any tag fails these checks, we can immediately return false.

By combining stack-based tracking with careful string parsing and validation checks, we arrive at a solution that handles all the edge cases: nested tags, CDATA sections, invalid tag names, and unmatched tags.

Learn more about Stack patterns.

Solution Approach

The solution uses a stack-based parsing approach with linear scanning to validate the XML-like code structure.

Key Data Structure:

  • stk: A stack (list) to keep track of currently open tag names

Main Algorithm Flow:

  1. Initialize variables:

    • i as the current position pointer (starts at 0)
    • n as the length of the code string
    • Empty stack stk for tracking open tags
  2. Main parsing loop (while i < n):

    First, we perform a crucial check: if i and not stk. If we've started processing (i > 0) but the stack is empty, it means we're outside any valid tag wrapper, making the code invalid.

  3. Handle three types of patterns in order:

    a) CDATA Section Detection (<![CDATA[):

    • Check if current position starts with <![CDATA[
    • Find the closing ]]> using code.find(']]>', i + 9)
    • If not found (returns -1), the CDATA is unclosed - return False
    • Skip to the position after ]]> by setting i += 2 after finding the end

    b) Closing Tag (</):

    • Detect closing tag by checking for </ prefix
    • Extract tag name between </ and >
    • Validate the tag name using check() function
    • Verify it matches the top of the stack: stk.pop() != t
    • If validation fails or stack is empty or names don't match, return False

    c) Opening Tag (<):

    • Detect opening tag by single < character
    • Extract tag name between < and >
    • Validate tag name with check() function
    • If valid, push the tag name onto the stack: stk.append(t)
  4. Tag Name Validation (check() function):

    • Verifies length is between 1 and 9: 1 <= len(tag) <= 9
    • Ensures all characters are uppercase: all(c.isupper() for c in tag)
  5. Increment position (i += 1) after processing each pattern

  6. Final validation:

    • Return not stk - the stack must be empty (all tags closed)

Why this approach works:

  • The order of pattern checking matters: CDATA first (longest match), then closing tags, then opening tags
  • The stack ensures proper nesting - tags must be closed in reverse order of opening
  • The position tracking with find() efficiently locates closing markers
  • Early termination on any validation failure optimizes performance
  • The empty stack check during processing ensures the entire code is wrapped in tags

This implementation handles all edge cases: unmatched tags, invalid tag names, improperly nested tags, and CDATA sections containing tag-like content.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's trace through the algorithm with the input: "<A>text<![CDATA[<B>]]></A>"

Initial State:

  • code = "<A>text<![CDATA[<B>]]></A>"
  • i = 0, n = 27, stk = []

Step 1: Position 0 (<A>)

  • Check: i and not stk0 and [] = False (continue)
  • Not CDATA (doesn't start with <![CDATA[)
  • Not closing tag (doesn't start with </)
  • Is opening tag (starts with <)
  • Find > at position 2
  • Extract tag name: "A"
  • Validate: length=1 ✓, uppercase ✓
  • Push to stack: stk = ["A"]
  • Move to position 3

Step 2: Positions 3-6 (text)

  • For each character in "text":
    • Check: i and not stk → stack not empty ✓
    • Not <, so increment i
  • Move through positions 3→4→5→6→7

Step 3: Position 7 (<![CDATA[<B>]]>)

  • Check: i and not stk → stack not empty ✓
  • Is CDATA! (starts with <![CDATA[)
  • Find ]]> starting from position 16 (i+9)
  • Found at position 19
  • Jump over entire CDATA: i = 19 + 3 = 22

Step 4: Position 22 (</A>)

  • Check: i and not stk → stack not empty ✓
  • Not CDATA
  • Is closing tag (starts with </)
  • Find > at position 25
  • Extract tag name: "A"
  • Validate: length=1 ✓, uppercase ✓
  • Pop from stack: "A" (matches!)
  • Stack now empty: stk = []
  • Move to position 26

Step 5: Position 26

  • Check: i and not stk26 and [] = True
  • This would fail! But we're at position 26
  • Actually i = 26, continue to next iteration

Step 6: Loop ends

  • i = 27 which equals n
  • Exit loop
  • Final check: not stknot [] = True ✓

Result: Valid! The code is properly wrapped in <A> tags, and the CDATA section was correctly skipped over, preventing <B> from being parsed as a tag.

Key observations from this walkthrough:

  1. The CDATA section protects <B> from being parsed as a tag
  2. The stack tracks that we're inside the <A> tag throughout processing
  3. The algorithm correctly jumps over the entire CDATA content
  4. The final empty stack confirms all tags were properly matched

Solution Implementation

1class Solution:
2    def isValid(self, code: str) -> bool:
3        """
4        Validates if the given code string follows proper tag format rules.
5        Rules:
6        - Tags must be 1-9 characters long and contain only uppercase letters
7        - Tags must be properly nested and closed
8        - CDATA sections are handled specially
9        - The entire code must be wrapped in a tag
10        """
11      
12        def is_valid_tag(tag: str) -> bool:
13            """Check if a tag name is valid (1-9 uppercase letters)"""
14            return 1 <= len(tag) <= 9 and all(char.isupper() for char in tag)
15      
16        tag_stack = []  # Stack to track open tags
17        index = 0
18        code_length = len(code)
19      
20        while index < code_length:
21            # If we're not at the start and stack is empty, code is not wrapped in tags
22            if index > 0 and not tag_stack:
23                return False
24          
25            # Check for CDATA section: <![CDATA[...]]>
26            if code[index:index + 9] == '<![CDATA[':
27                # Find the closing ]]> for CDATA
28                closing_index = code.find(']]>', index + 9)
29                if closing_index < 0:
30                    return False  # CDATA section not properly closed
31                index = closing_index + 2  # Move past ]]>
32              
33            # Check for closing tag: </TAG>
34            elif code[index:index + 2] == '</':
35                start_tag_name = index + 2
36                closing_bracket = code.find('>', start_tag_name)
37                if closing_bracket < 0:
38                    return False  # No closing bracket found
39              
40                tag_name = code[start_tag_name:closing_bracket]
41                # Validate closing tag matches the most recent opening tag
42                if not is_valid_tag(tag_name) or not tag_stack or tag_stack.pop() != tag_name:
43                    return False
44                index = closing_bracket
45              
46            # Check for opening tag: <TAG>
47            elif code[index] == '<':
48                start_tag_name = index + 1
49                closing_bracket = code.find('>', start_tag_name)
50                if closing_bracket < 0:
51                    return False  # No closing bracket found
52              
53                tag_name = code[start_tag_name:closing_bracket]
54                if not is_valid_tag(tag_name):
55                    return False
56                tag_stack.append(tag_name)  # Push opening tag to stack
57                index = closing_bracket
58          
59            index += 1
60      
61        # All tags should be closed (stack should be empty)
62        return not tag_stack
63
1class Solution {
2    /**
3     * Validates if the given code string follows valid XML-like tag rules.
4     * Rules include: proper tag matching, CDATA sections, and tag format requirements.
5     * 
6     * @param code The input string to validate
7     * @return true if the code is valid, false otherwise
8     */
9    public boolean isValid(String code) {
10        // Stack to keep track of opening tags
11        Deque<String> tagStack = new ArrayDeque<>();
12      
13        // Iterate through each character in the code
14        for (int i = 0; i < code.length(); ++i) {
15            // After processing the first character, the stack should not be empty
16            // (the entire content must be wrapped in a tag)
17            if (i > 0 && tagStack.isEmpty()) {
18                return false;
19            }
20          
21            // Handle CDATA section: <![CDATA[...]]>
22            if (code.startsWith("<![CDATA[", i)) {
23                // Find the closing "]]>" after the CDATA opening
24                int cdataEndIndex = code.indexOf("]]>", i + 9);
25                if (cdataEndIndex < 0) {
26                    return false;  // No closing CDATA found
27                }
28                // Move index past the closing "]]>"
29                i = cdataEndIndex + 2;
30            } 
31            // Handle closing tag: </TAG_NAME>
32            else if (code.startsWith("</", i)) {
33                int tagStartIndex = i + 2;
34                int tagEndIndex = code.indexOf(">", tagStartIndex);
35                if (tagEndIndex < 0) {
36                    return false;  // No closing bracket found
37                }
38              
39                // Extract the tag name
40                String tagName = code.substring(tagStartIndex, tagEndIndex);
41              
42                // Validate tag format and check if it matches the most recent opening tag
43                if (!isValidTag(tagName) || tagStack.isEmpty() || !tagStack.pop().equals(tagName)) {
44                    return false;
45                }
46              
47                // Move index to the closing bracket
48                i = tagEndIndex;
49            } 
50            // Handle opening tag: <TAG_NAME>
51            else if (code.startsWith("<", i)) {
52                int tagStartIndex = i + 1;
53                int tagEndIndex = code.indexOf(">", tagStartIndex);
54                if (tagEndIndex < 0) {
55                    return false;  // No closing bracket found
56                }
57              
58                // Extract the tag name
59                String tagName = code.substring(tagStartIndex, tagEndIndex);
60              
61                // Validate tag format
62                if (!isValidTag(tagName)) {
63                    return false;
64                }
65              
66                // Push the opening tag onto the stack
67                tagStack.push(tagName);
68              
69                // Move index to the closing bracket
70                i = tagEndIndex;
71            }
72        }
73      
74        // All tags should be properly closed (stack should be empty)
75        return tagStack.isEmpty();
76    }
77
78    /**
79     * Checks if a tag name is valid.
80     * Valid tags must be 1-9 characters long and contain only uppercase letters.
81     * 
82     * @param tag The tag name to validate
83     * @return true if the tag is valid, false otherwise
84     */
85    private boolean check(String tag) {
86        int tagLength = tag.length();
87      
88        // Check length constraints (1-9 characters)
89        if (tagLength < 1 || tagLength > 9) {
90            return false;
91        }
92      
93        // Check that all characters are uppercase letters
94        for (char c : tag.toCharArray()) {
95            if (!Character.isUpperCase(c)) {
96                return false;
97            }
98        }
99      
100        return true;
101    }
102  
103    /**
104     * Helper method to improve readability (delegates to check method).
105     * 
106     * @param tagName The tag name to validate
107     * @return true if the tag is valid, false otherwise
108     */
109    private boolean isValidTag(String tagName) {
110        return check(tagName);
111    }
112}
113
1class Solution {
2public:
3    bool isValid(string code) {
4        stack<string> tagStack;
5      
6        for (int i = 0; i < code.size(); ++i) {
7            // The code must be wrapped in a valid closed tag from the beginning
8            if (i > 0 && tagStack.empty()) {
9                return false;
10            }
11          
12            // Handle CDATA section: <![CDATA[...]]>
13            if (code.substr(i, 9) == "<![CDATA[") {
14                // Find the end of CDATA section
15                int endPos = code.find("]]>", i + 9);
16                if (endPos == string::npos) {
17                    return false;
18                }
19                // Skip to the position after "]]>"
20                i = endPos + 2;
21            }
22            // Handle closing tag: </TAG_NAME>
23            else if (code.substr(i, 2) == "</") {
24                int tagStartPos = i + 2;
25                int tagEndPos = code.find('>', tagStartPos);
26              
27                // Check if closing '>' exists
28                if (tagEndPos == string::npos) {
29                    return false;
30                }
31              
32                // Extract tag name
33                string tagName = code.substr(tagStartPos, tagEndPos - tagStartPos);
34              
35                // Validate tag name and check if it matches the most recent opening tag
36                if (!isValidTagName(tagName) || tagStack.empty() || tagStack.top() != tagName) {
37                    return false;
38                }
39              
40                tagStack.pop();
41                i = tagEndPos;
42            }
43            // Handle opening tag: <TAG_NAME>
44            else if (code[i] == '<') {
45                int tagStartPos = i + 1;
46                int tagEndPos = code.find('>', tagStartPos);
47              
48                // Check if closing '>' exists
49                if (tagEndPos == string::npos) {
50                    return false;
51                }
52              
53                // Extract tag name
54                string tagName = code.substr(tagStartPos, tagEndPos - tagStartPos);
55              
56                // Validate tag name
57                if (!isValidTagName(tagName)) {
58                    return false;
59                }
60              
61                tagStack.push(tagName);
62                i = tagEndPos;
63            }
64        }
65      
66        // All tags must be properly closed
67        return tagStack.empty();
68    }
69
70private:
71    bool isValidTagName(const string& tagName) {
72        // Tag name must be between 1 and 9 characters
73        if (tagName.length() < 1 || tagName.length() > 9) {
74            return false;
75        }
76      
77        // All characters must be uppercase letters
78        for (const char& ch : tagName) {
79            if (!isupper(ch)) {
80                return false;
81            }
82        }
83      
84        return true;
85    }
86};
87
1function isValid(code: string): boolean {
2    const tagStack: string[] = [];
3  
4    for (let i = 0; i < code.length; i++) {
5        // The code must be wrapped in a valid closed tag from the beginning
6        if (i > 0 && tagStack.length === 0) {
7            return false;
8        }
9      
10        // Handle CDATA section: <![CDATA[...]]>
11        if (code.substring(i, i + 9) === "<![CDATA[") {
12            // Find the end of CDATA section
13            const endPos = code.indexOf("]]>", i + 9);
14            if (endPos === -1) {
15                return false;
16            }
17            // Skip to the position after "]]>"
18            i = endPos + 2;
19        }
20        // Handle closing tag: </TAG_NAME>
21        else if (code.substring(i, i + 2) === "</") {
22            const tagStartPos = i + 2;
23            const tagEndPos = code.indexOf('>', tagStartPos);
24          
25            // Check if closing '>' exists
26            if (tagEndPos === -1) {
27                return false;
28            }
29          
30            // Extract tag name
31            const tagName = code.substring(tagStartPos, tagEndPos);
32          
33            // Validate tag name and check if it matches the most recent opening tag
34            if (!isValidTagName(tagName) || tagStack.length === 0 || tagStack[tagStack.length - 1] !== tagName) {
35                return false;
36            }
37          
38            tagStack.pop();
39            i = tagEndPos;
40        }
41        // Handle opening tag: <TAG_NAME>
42        else if (code[i] === '<') {
43            const tagStartPos = i + 1;
44            const tagEndPos = code.indexOf('>', tagStartPos);
45          
46            // Check if closing '>' exists
47            if (tagEndPos === -1) {
48                return false;
49            }
50          
51            // Extract tag name
52            const tagName = code.substring(tagStartPos, tagEndPos);
53          
54            // Validate tag name
55            if (!isValidTagName(tagName)) {
56                return false;
57            }
58          
59            tagStack.push(tagName);
60            i = tagEndPos;
61        }
62    }
63  
64    // All tags must be properly closed
65    return tagStack.length === 0;
66}
67
68function isValidTagName(tagName: string): boolean {
69    // Tag name must be between 1 and 9 characters
70    if (tagName.length < 1 || tagName.length > 9) {
71        return false;
72    }
73  
74    // All characters must be uppercase letters
75    for (const ch of tagName) {
76        if (ch < 'A' || ch > 'Z') {
77            return false;
78        }
79    }
80  
81    return true;
82}
83

Time and Space Complexity

Time Complexity: O(n)

The algorithm performs a single pass through the string with a while loop that iterates through each character position. Within each iteration:

  • String slicing operations like code[i:i+9] and code[i:i+2] take O(1) time for fixed-length slices
  • The find() method searches for patterns, but each character is visited at most a constant number of times throughout the entire execution
  • The check() function validates tags with length up to 9, taking O(1) time since tag length is bounded
  • Stack operations (append() and pop()) are O(1)

Although there are nested operations with find(), each character in the string is processed a bounded number of times, resulting in overall linear time complexity.

Space Complexity: O(n)

The space complexity is determined by:

  • The stack stk which stores tag names. In the worst case, there could be O(n/3) nested tags (minimum tag is <A> with 3 characters), giving O(n) space
  • String slicing creates temporary strings, but the maximum slice size is bounded by the tag length constraint (≤9 characters), contributing O(1) additional space
  • The recursive depth is O(1) since check() is not recursive

Therefore, the dominant factor is the stack storage, resulting in O(n) space complexity.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Order of Pattern Matching

One of the most critical pitfalls is checking for patterns in the wrong order. The code must check for CDATA sections BEFORE checking for closing tags </, and closing tags BEFORE opening tags <.

Why this matters:

  • <![CDATA[ starts with <, so if we check for opening tags first, we'd incorrectly try to parse ![CDATA[ as a tag name
  • </ starts with <, so if we check for opening tags before closing tags, we'd miss the closing tag pattern

Incorrect approach:

if code[index] == '<':  # Checking opening tag first - WRONG!
    # This would catch CDATA and closing tags incorrectly
elif code[index:index + 2] == '</':
    # Never reached for closing tags
elif code[index:index + 9] == '<![CDATA[':
    # Never reached for CDATA

Correct approach:

# Check patterns from most specific to least specific
if code[index:index + 9] == '<![CDATA[':  # CDATA first
    # Handle CDATA
elif code[index:index + 2] == '</':  # Then closing tags
    # Handle closing tag
elif code[index] == '<':  # Finally opening tags
    # Handle opening tag

2. Not Validating That Entire Code is Wrapped in Tags

A subtle but crucial requirement is that the ENTIRE code must be wrapped in at least one valid tag. Simply checking if the stack is empty at the end isn't sufficient.

The problem:

# Input: "text<A></A>"
# This would pass if we only check stack at the end, but it's invalid
# because "text" appears outside any tag

Solution: The code correctly handles this with:

if index > 0 and not tag_stack:
    return False

This ensures that once we've started processing (index > 0), we're always inside at least one tag (stack is never empty during processing).

3. Incorrectly Handling CDATA Content

A common mistake is trying to parse content inside CDATA sections. Everything between <![CDATA[ and ]]> should be treated as raw text, even if it looks like tags.

Example that could break incorrect implementations:

code = "<A><![CDATA[<B>]]></A>"
# The <B> inside CDATA should NOT be parsed as a tag

Solution: Skip directly to the end of CDATA section:

closing_index = code.find(']]>', index + 9)
index = closing_index + 2  # Jump past the entire CDATA content

4. Off-by-One Errors in Index Management

Managing the index correctly when jumping past processed sections is error-prone.

Common mistakes:

  • After finding a closing >, forgetting to set index to the position of > (not past it)
  • After CDATA, setting index to closing_index + 3 instead of closing_index + 2

Correct approach:

# For tags: set index to the position of '>'
index = closing_bracket
# The main loop will increment it by 1

# For CDATA: set index to position after ']]'
index = closing_index + 2  # Points to the last '>'
# The main loop will increment it by 1 to move past

5. Not Handling Edge Cases in Tag Name Extraction

When extracting tag names, ensure proper bounds checking:

Potential issue:

# If code ends with "<" or "</" without closing ">"
closing_bracket = code.find('>', start_tag_name)
if closing_bracket < 0:  # Must check this!
    return False

Without this check, code[start_tag_name:closing_bracket] would use -1 as the end index, causing incorrect slicing behavior.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

What's the relationship between a tree and a graph?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More