591. Tag Validator
Problem Description
This problem asks you to implement a tag validator that checks if a given code snippet string follows specific XML-like formatting rules.
The code snippet is considered valid if:
-
The entire code must be wrapped in a valid closed tag - meaning it starts with
<TAG_NAME>
and ends with</TAG_NAME>
where both tag names match. -
Tag names must follow strict rules:
- Only contain uppercase letters (A-Z)
- Length must be between 1 and 9 characters inclusive
-
Tag content rules:
- Can contain other valid nested tags
- Can contain CDATA sections
- Can contain any regular characters
- Cannot contain unmatched
<
symbols - Cannot contain unmatched or improperly closed tags
-
CDATA sections have the format
<![CDATA[CDATA_CONTENT]]>
:- Everything between
<![CDATA[
and the first]]>
is treated as raw text - Characters inside CDATA are not parsed as tags, even if they look like tags
- Everything between
-
Matching rules:
- Every start tag
<TAG_NAME>
must have a corresponding end tag</TAG_NAME>
- Tags must be properly nested (like balanced parentheses)
- When a
<
or</
is encountered, all characters until the next>
are parsed as a tag name
- Every start tag
The solution uses a stack-based approach to track opening tags and ensure they match with closing tags. It processes the string character by character, handling three main cases:
- CDATA sections (which are skipped over)
- Closing tags (which must match the most recent opening tag on the stack)
- Opening tags (which are pushed onto the stack)
The code is valid only if after processing the entire string, the stack is empty (all tags were properly closed) and there was always at least one tag on the stack during processing (except at the very beginning).
Intuition
The key insight is recognizing that this is fundamentally a bracket matching problem with additional constraints. Just like checking if parentheses are balanced, we need to ensure every opening tag has a corresponding closing tag in the correct order.
Why use a stack? When we encounter nested tags like <A><B></B></A>
, we need to remember which tags are still open. The most recently opened tag must be closed first (LIFO - Last In First Out), which naturally leads us to use a stack data structure.
The parsing strategy emerges from understanding that we need to handle three distinct patterns while scanning the string:
- CDATA sections - These are special because everything inside them should be ignored. We need to identify them first and skip over their content entirely.
- Closing tags (
</...>
) - These should match and pop from our stack of open tags. - Opening tags (
<...>
) - These should be validated and pushed onto the stack.
A critical observation is that the entire code must be wrapped in at least one valid tag. This means:
- We should never have an empty stack while processing (except at the very start)
- The stack should be empty at the end (all tags properly closed)
The linear scanning approach makes sense because we need to process characters in order to maintain the correct nesting structure. We can't jump around - we must respect the sequential nature of opening and closing tags.
The validation rules for tag names (uppercase only, length 1-9) are straightforward checks that can be done as we parse each tag. If any tag fails these checks, we can immediately return false
.
By combining stack-based tracking with careful string parsing and validation checks, we arrive at a solution that handles all the edge cases: nested tags, CDATA sections, invalid tag names, and unmatched tags.
Learn more about Stack patterns.
Solution Approach
The solution uses a stack-based parsing approach with linear scanning to validate the XML-like code structure.
Key Data Structure:
stk
: A stack (list) to keep track of currently open tag names
Main Algorithm Flow:
-
Initialize variables:
i
as the current position pointer (starts at 0)n
as the length of the code string- Empty stack
stk
for tracking open tags
-
Main parsing loop (
while i < n
):First, we perform a crucial check:
if i and not stk
. If we've started processing (i > 0
) but the stack is empty, it means we're outside any valid tag wrapper, making the code invalid. -
Handle three types of patterns in order:
a) CDATA Section Detection (
<![CDATA[
):- Check if current position starts with
<![CDATA[
- Find the closing
]]>
usingcode.find(']]>', i + 9)
- If not found (returns -1), the CDATA is unclosed - return
False
- Skip to the position after
]]>
by settingi += 2
after finding the end
b) Closing Tag (
</
):- Detect closing tag by checking for
</
prefix - Extract tag name between
</
and>
- Validate the tag name using
check()
function - Verify it matches the top of the stack:
stk.pop() != t
- If validation fails or stack is empty or names don't match, return
False
c) Opening Tag (
<
):- Detect opening tag by single
<
character - Extract tag name between
<
and>
- Validate tag name with
check()
function - If valid, push the tag name onto the stack:
stk.append(t)
- Check if current position starts with
-
Tag Name Validation (
check()
function):- Verifies length is between 1 and 9:
1 <= len(tag) <= 9
- Ensures all characters are uppercase:
all(c.isupper() for c in tag)
- Verifies length is between 1 and 9:
-
Increment position (
i += 1
) after processing each pattern -
Final validation:
- Return
not stk
- the stack must be empty (all tags closed)
- Return
Why this approach works:
- The order of pattern checking matters: CDATA first (longest match), then closing tags, then opening tags
- The stack ensures proper nesting - tags must be closed in reverse order of opening
- The position tracking with
find()
efficiently locates closing markers - Early termination on any validation failure optimizes performance
- The empty stack check during processing ensures the entire code is wrapped in tags
This implementation handles all edge cases: unmatched tags, invalid tag names, improperly nested tags, and CDATA sections containing tag-like content.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's trace through the algorithm with the input: "<A>text<![CDATA[<B>]]></A>"
Initial State:
code = "<A>text<![CDATA[<B>]]></A>"
i = 0
,n = 27
,stk = []
Step 1: Position 0 (<A>
)
- Check:
i and not stk
→0 and []
= False (continue) - Not CDATA (doesn't start with
<![CDATA[
) - Not closing tag (doesn't start with
</
) - Is opening tag (starts with
<
) - Find
>
at position 2 - Extract tag name:
"A"
- Validate: length=1 ✓, uppercase ✓
- Push to stack:
stk = ["A"]
- Move to position 3
Step 2: Positions 3-6 (text
)
- For each character in "text":
- Check:
i and not stk
→ stack not empty ✓ - Not
<
, so incrementi
- Check:
- Move through positions 3→4→5→6→7
Step 3: Position 7 (<![CDATA[<B>]]>
)
- Check:
i and not stk
→ stack not empty ✓ - Is CDATA! (starts with
<![CDATA[
) - Find
]]>
starting from position 16 (i+9) - Found at position 19
- Jump over entire CDATA:
i = 19 + 3 = 22
Step 4: Position 22 (</A>
)
- Check:
i and not stk
→ stack not empty ✓ - Not CDATA
- Is closing tag (starts with
</
) - Find
>
at position 25 - Extract tag name:
"A"
- Validate: length=1 ✓, uppercase ✓
- Pop from stack:
"A"
(matches!) - Stack now empty:
stk = []
- Move to position 26
Step 5: Position 26
- Check:
i and not stk
→26 and []
= True - This would fail! But we're at position 26
- Actually
i = 26
, continue to next iteration
Step 6: Loop ends
i = 27
which equalsn
- Exit loop
- Final check:
not stk
→not []
= True ✓
Result: Valid! The code is properly wrapped in <A>
tags, and the CDATA section was correctly skipped over, preventing <B>
from being parsed as a tag.
Key observations from this walkthrough:
- The CDATA section protects
<B>
from being parsed as a tag - The stack tracks that we're inside the
<A>
tag throughout processing - The algorithm correctly jumps over the entire CDATA content
- The final empty stack confirms all tags were properly matched
Solution Implementation
1class Solution:
2 def isValid(self, code: str) -> bool:
3 """
4 Validates if the given code string follows proper tag format rules.
5 Rules:
6 - Tags must be 1-9 characters long and contain only uppercase letters
7 - Tags must be properly nested and closed
8 - CDATA sections are handled specially
9 - The entire code must be wrapped in a tag
10 """
11
12 def is_valid_tag(tag: str) -> bool:
13 """Check if a tag name is valid (1-9 uppercase letters)"""
14 return 1 <= len(tag) <= 9 and all(char.isupper() for char in tag)
15
16 tag_stack = [] # Stack to track open tags
17 index = 0
18 code_length = len(code)
19
20 while index < code_length:
21 # If we're not at the start and stack is empty, code is not wrapped in tags
22 if index > 0 and not tag_stack:
23 return False
24
25 # Check for CDATA section: <![CDATA[...]]>
26 if code[index:index + 9] == '<![CDATA[':
27 # Find the closing ]]> for CDATA
28 closing_index = code.find(']]>', index + 9)
29 if closing_index < 0:
30 return False # CDATA section not properly closed
31 index = closing_index + 2 # Move past ]]>
32
33 # Check for closing tag: </TAG>
34 elif code[index:index + 2] == '</':
35 start_tag_name = index + 2
36 closing_bracket = code.find('>', start_tag_name)
37 if closing_bracket < 0:
38 return False # No closing bracket found
39
40 tag_name = code[start_tag_name:closing_bracket]
41 # Validate closing tag matches the most recent opening tag
42 if not is_valid_tag(tag_name) or not tag_stack or tag_stack.pop() != tag_name:
43 return False
44 index = closing_bracket
45
46 # Check for opening tag: <TAG>
47 elif code[index] == '<':
48 start_tag_name = index + 1
49 closing_bracket = code.find('>', start_tag_name)
50 if closing_bracket < 0:
51 return False # No closing bracket found
52
53 tag_name = code[start_tag_name:closing_bracket]
54 if not is_valid_tag(tag_name):
55 return False
56 tag_stack.append(tag_name) # Push opening tag to stack
57 index = closing_bracket
58
59 index += 1
60
61 # All tags should be closed (stack should be empty)
62 return not tag_stack
63
1class Solution {
2 /**
3 * Validates if the given code string follows valid XML-like tag rules.
4 * Rules include: proper tag matching, CDATA sections, and tag format requirements.
5 *
6 * @param code The input string to validate
7 * @return true if the code is valid, false otherwise
8 */
9 public boolean isValid(String code) {
10 // Stack to keep track of opening tags
11 Deque<String> tagStack = new ArrayDeque<>();
12
13 // Iterate through each character in the code
14 for (int i = 0; i < code.length(); ++i) {
15 // After processing the first character, the stack should not be empty
16 // (the entire content must be wrapped in a tag)
17 if (i > 0 && tagStack.isEmpty()) {
18 return false;
19 }
20
21 // Handle CDATA section: <![CDATA[...]]>
22 if (code.startsWith("<![CDATA[", i)) {
23 // Find the closing "]]>" after the CDATA opening
24 int cdataEndIndex = code.indexOf("]]>", i + 9);
25 if (cdataEndIndex < 0) {
26 return false; // No closing CDATA found
27 }
28 // Move index past the closing "]]>"
29 i = cdataEndIndex + 2;
30 }
31 // Handle closing tag: </TAG_NAME>
32 else if (code.startsWith("</", i)) {
33 int tagStartIndex = i + 2;
34 int tagEndIndex = code.indexOf(">", tagStartIndex);
35 if (tagEndIndex < 0) {
36 return false; // No closing bracket found
37 }
38
39 // Extract the tag name
40 String tagName = code.substring(tagStartIndex, tagEndIndex);
41
42 // Validate tag format and check if it matches the most recent opening tag
43 if (!isValidTag(tagName) || tagStack.isEmpty() || !tagStack.pop().equals(tagName)) {
44 return false;
45 }
46
47 // Move index to the closing bracket
48 i = tagEndIndex;
49 }
50 // Handle opening tag: <TAG_NAME>
51 else if (code.startsWith("<", i)) {
52 int tagStartIndex = i + 1;
53 int tagEndIndex = code.indexOf(">", tagStartIndex);
54 if (tagEndIndex < 0) {
55 return false; // No closing bracket found
56 }
57
58 // Extract the tag name
59 String tagName = code.substring(tagStartIndex, tagEndIndex);
60
61 // Validate tag format
62 if (!isValidTag(tagName)) {
63 return false;
64 }
65
66 // Push the opening tag onto the stack
67 tagStack.push(tagName);
68
69 // Move index to the closing bracket
70 i = tagEndIndex;
71 }
72 }
73
74 // All tags should be properly closed (stack should be empty)
75 return tagStack.isEmpty();
76 }
77
78 /**
79 * Checks if a tag name is valid.
80 * Valid tags must be 1-9 characters long and contain only uppercase letters.
81 *
82 * @param tag The tag name to validate
83 * @return true if the tag is valid, false otherwise
84 */
85 private boolean check(String tag) {
86 int tagLength = tag.length();
87
88 // Check length constraints (1-9 characters)
89 if (tagLength < 1 || tagLength > 9) {
90 return false;
91 }
92
93 // Check that all characters are uppercase letters
94 for (char c : tag.toCharArray()) {
95 if (!Character.isUpperCase(c)) {
96 return false;
97 }
98 }
99
100 return true;
101 }
102
103 /**
104 * Helper method to improve readability (delegates to check method).
105 *
106 * @param tagName The tag name to validate
107 * @return true if the tag is valid, false otherwise
108 */
109 private boolean isValidTag(String tagName) {
110 return check(tagName);
111 }
112}
113
1class Solution {
2public:
3 bool isValid(string code) {
4 stack<string> tagStack;
5
6 for (int i = 0; i < code.size(); ++i) {
7 // The code must be wrapped in a valid closed tag from the beginning
8 if (i > 0 && tagStack.empty()) {
9 return false;
10 }
11
12 // Handle CDATA section: <![CDATA[...]]>
13 if (code.substr(i, 9) == "<![CDATA[") {
14 // Find the end of CDATA section
15 int endPos = code.find("]]>", i + 9);
16 if (endPos == string::npos) {
17 return false;
18 }
19 // Skip to the position after "]]>"
20 i = endPos + 2;
21 }
22 // Handle closing tag: </TAG_NAME>
23 else if (code.substr(i, 2) == "</") {
24 int tagStartPos = i + 2;
25 int tagEndPos = code.find('>', tagStartPos);
26
27 // Check if closing '>' exists
28 if (tagEndPos == string::npos) {
29 return false;
30 }
31
32 // Extract tag name
33 string tagName = code.substr(tagStartPos, tagEndPos - tagStartPos);
34
35 // Validate tag name and check if it matches the most recent opening tag
36 if (!isValidTagName(tagName) || tagStack.empty() || tagStack.top() != tagName) {
37 return false;
38 }
39
40 tagStack.pop();
41 i = tagEndPos;
42 }
43 // Handle opening tag: <TAG_NAME>
44 else if (code[i] == '<') {
45 int tagStartPos = i + 1;
46 int tagEndPos = code.find('>', tagStartPos);
47
48 // Check if closing '>' exists
49 if (tagEndPos == string::npos) {
50 return false;
51 }
52
53 // Extract tag name
54 string tagName = code.substr(tagStartPos, tagEndPos - tagStartPos);
55
56 // Validate tag name
57 if (!isValidTagName(tagName)) {
58 return false;
59 }
60
61 tagStack.push(tagName);
62 i = tagEndPos;
63 }
64 }
65
66 // All tags must be properly closed
67 return tagStack.empty();
68 }
69
70private:
71 bool isValidTagName(const string& tagName) {
72 // Tag name must be between 1 and 9 characters
73 if (tagName.length() < 1 || tagName.length() > 9) {
74 return false;
75 }
76
77 // All characters must be uppercase letters
78 for (const char& ch : tagName) {
79 if (!isupper(ch)) {
80 return false;
81 }
82 }
83
84 return true;
85 }
86};
87
1function isValid(code: string): boolean {
2 const tagStack: string[] = [];
3
4 for (let i = 0; i < code.length; i++) {
5 // The code must be wrapped in a valid closed tag from the beginning
6 if (i > 0 && tagStack.length === 0) {
7 return false;
8 }
9
10 // Handle CDATA section: <![CDATA[...]]>
11 if (code.substring(i, i + 9) === "<![CDATA[") {
12 // Find the end of CDATA section
13 const endPos = code.indexOf("]]>", i + 9);
14 if (endPos === -1) {
15 return false;
16 }
17 // Skip to the position after "]]>"
18 i = endPos + 2;
19 }
20 // Handle closing tag: </TAG_NAME>
21 else if (code.substring(i, i + 2) === "</") {
22 const tagStartPos = i + 2;
23 const tagEndPos = code.indexOf('>', tagStartPos);
24
25 // Check if closing '>' exists
26 if (tagEndPos === -1) {
27 return false;
28 }
29
30 // Extract tag name
31 const tagName = code.substring(tagStartPos, tagEndPos);
32
33 // Validate tag name and check if it matches the most recent opening tag
34 if (!isValidTagName(tagName) || tagStack.length === 0 || tagStack[tagStack.length - 1] !== tagName) {
35 return false;
36 }
37
38 tagStack.pop();
39 i = tagEndPos;
40 }
41 // Handle opening tag: <TAG_NAME>
42 else if (code[i] === '<') {
43 const tagStartPos = i + 1;
44 const tagEndPos = code.indexOf('>', tagStartPos);
45
46 // Check if closing '>' exists
47 if (tagEndPos === -1) {
48 return false;
49 }
50
51 // Extract tag name
52 const tagName = code.substring(tagStartPos, tagEndPos);
53
54 // Validate tag name
55 if (!isValidTagName(tagName)) {
56 return false;
57 }
58
59 tagStack.push(tagName);
60 i = tagEndPos;
61 }
62 }
63
64 // All tags must be properly closed
65 return tagStack.length === 0;
66}
67
68function isValidTagName(tagName: string): boolean {
69 // Tag name must be between 1 and 9 characters
70 if (tagName.length < 1 || tagName.length > 9) {
71 return false;
72 }
73
74 // All characters must be uppercase letters
75 for (const ch of tagName) {
76 if (ch < 'A' || ch > 'Z') {
77 return false;
78 }
79 }
80
81 return true;
82}
83
Time and Space Complexity
Time Complexity: O(n)
The algorithm performs a single pass through the string with a while loop that iterates through each character position. Within each iteration:
- String slicing operations like
code[i:i+9]
andcode[i:i+2]
takeO(1)
time for fixed-length slices - The
find()
method searches for patterns, but each character is visited at most a constant number of times throughout the entire execution - The
check()
function validates tags with length up to 9, takingO(1)
time since tag length is bounded - Stack operations (
append()
andpop()
) areO(1)
Although there are nested operations with find()
, each character in the string is processed a bounded number of times, resulting in overall linear time complexity.
Space Complexity: O(n)
The space complexity is determined by:
- The stack
stk
which stores tag names. In the worst case, there could beO(n/3)
nested tags (minimum tag is<A>
with 3 characters), givingO(n)
space - String slicing creates temporary strings, but the maximum slice size is bounded by the tag length constraint (≤9 characters), contributing
O(1)
additional space - The recursive depth is
O(1)
sincecheck()
is not recursive
Therefore, the dominant factor is the stack storage, resulting in O(n)
space complexity.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Order of Pattern Matching
One of the most critical pitfalls is checking for patterns in the wrong order. The code must check for CDATA sections BEFORE checking for closing tags </
, and closing tags BEFORE opening tags <
.
Why this matters:
<![CDATA[
starts with<
, so if we check for opening tags first, we'd incorrectly try to parse![CDATA[
as a tag name</
starts with<
, so if we check for opening tags before closing tags, we'd miss the closing tag pattern
Incorrect approach:
if code[index] == '<': # Checking opening tag first - WRONG! # This would catch CDATA and closing tags incorrectly elif code[index:index + 2] == '</': # Never reached for closing tags elif code[index:index + 9] == '<![CDATA[': # Never reached for CDATA
Correct approach:
# Check patterns from most specific to least specific if code[index:index + 9] == '<![CDATA[': # CDATA first # Handle CDATA elif code[index:index + 2] == '</': # Then closing tags # Handle closing tag elif code[index] == '<': # Finally opening tags # Handle opening tag
2. Not Validating That Entire Code is Wrapped in Tags
A subtle but crucial requirement is that the ENTIRE code must be wrapped in at least one valid tag. Simply checking if the stack is empty at the end isn't sufficient.
The problem:
# Input: "text<A></A>" # This would pass if we only check stack at the end, but it's invalid # because "text" appears outside any tag
Solution: The code correctly handles this with:
if index > 0 and not tag_stack: return False
This ensures that once we've started processing (index > 0), we're always inside at least one tag (stack is never empty during processing).
3. Incorrectly Handling CDATA Content
A common mistake is trying to parse content inside CDATA sections. Everything between <![CDATA[
and ]]>
should be treated as raw text, even if it looks like tags.
Example that could break incorrect implementations:
code = "<A><![CDATA[<B>]]></A>" # The <B> inside CDATA should NOT be parsed as a tag
Solution: Skip directly to the end of CDATA section:
closing_index = code.find(']]>', index + 9) index = closing_index + 2 # Jump past the entire CDATA content
4. Off-by-One Errors in Index Management
Managing the index correctly when jumping past processed sections is error-prone.
Common mistakes:
- After finding a closing
>
, forgetting to set index to the position of>
(not past it) - After CDATA, setting index to
closing_index + 3
instead ofclosing_index + 2
Correct approach:
# For tags: set index to the position of '>' index = closing_bracket # The main loop will increment it by 1 # For CDATA: set index to position after ']]' index = closing_index + 2 # Points to the last '>' # The main loop will increment it by 1 to move past
5. Not Handling Edge Cases in Tag Name Extraction
When extracting tag names, ensure proper bounds checking:
Potential issue:
# If code ends with "<" or "</" without closing ">" closing_bracket = code.find('>', start_tag_name) if closing_bracket < 0: # Must check this! return False
Without this check, code[start_tag_name:closing_bracket]
would use -1 as the end index, causing incorrect slicing behavior.
What's the relationship between a tree and a graph?
Recommended Readings
Stack Intro Imagine you have a pile of books on your desk If you want to add a new book you place it on top If you want to read a book you take it from the top And if you simply want to see which book is on the top you
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!