591. Tag Validator
Problem Description
The problem presents a challenge to validate a block of code that is meant to be written in a simplified XML-like format. Specifically, you need to check if the string representing this code snippet is valid based on a number of specific criteria:
- The entire code must be enclosed within a pair of valid tags.
- A valid tag is composed of a start tag
<TAG_NAME>
and an end tag</TAG_NAME>
with the sameTAG_NAME
. - A
TAG_NAME
is valid only if it has a length between 1 to 9 characters and contains only uppercase letters. - The content within these tags (referred to as
TAG_CONTENT
) must be valid. This means it can contain other valid closed tags, specialCDATA
sections, and any other characters except for certain conditions where symbol sequences might imply invalid or mismatched tags. - Tags are considered unmatched if there's no corresponding start or end tag with the same
TAG_NAME
. - A
<
symbol is unmatched if there's no subsequent>
to form a complete tag. CDATA
sections are defined by the sequence<![CDATA[
followed byCDATA_CONTENT
ending with the first subsequent]]>
. TheCDATA_CONTENT
can include any characters and should be treated as regular text, not parsed as tags or other structures.
The aim is to parse the code and ensure all these rules are respected, in which case the method should return true
, indicating the code snippet is valid.
Intuition
To determine if the code is valid according to the given rules, a stack-based approach is used to process the tags and CDATA
sections. Here's how it works:
-
Stack for Tag Names: A stack (
stk
) is used to keep track of opening tags. When an opening tag is encountered, itsTAG_NAME
is pushed onto the stack. When the corresponding closing tag is found, we check if this tag matches the top of the stack. -
Sequential Scanning: The string
code
is scanned character by character, paying attention to the opening<
and closing>
brackets to identify tags andCDATA
sections. -
Validation Checks:
- Unwrapped Code: If there is any character outside of a tag and the stack is empty (meaning we're not within a valid tag), then the code is not wrapped in a valid closed tag, so it's invalid.
- CDATA: When
<![CDATA[
is encountered, search for the closing]]>
. If not found, theCDATA
is invalid. Otherwise, skip over this section since it's allowed to contain any characters. - End Tag: When
</
is encountered, find the next>
to identify theTAG_NAME
, then check if it's valid and matches the last opened tag by popping from the stack. If it's not valid or doesn't match, the code is invalid. - Start Tag: When
<
(but not</
) is encountered, find the next>
to identify theTAG_NAME
, check if it's valid according to the rules (upper-case and proper length), and then put it on the stack.
-
Final Check: After scanning, if the stack is not empty, some tags were not closed, so the code is invalid. If the stack is empty, all tags were properly opened and closed, and the code is valid.
This algorithm captures the nested tag structure typical in XML-like languages and uses the rules provided to validate the input string as it goes, which is a common approach for parsing problems.
Learn more about Stack patterns.
Solution Approach
The solution is implemented in Python and follows a step-by-step approach to parse and validate the code snippet:
-
Stack Data Structure: The primary data structure used is a stack, represented by a list in Python, named
stk
. This stack is used to track open tags. When an open tag is encountered, itsTAG_NAME
is pushed onto the stack. When a closing tag is found, the stack is popped to check if the closing tag matches the most recent open tag. If not, the code is invalid. -
Parsing Loop: The solution uses a
while
loop to iterate over the characters in the code string. The variablei
represents the current index, whilen
is the length of the entire code string. -
Check for Unwrapped Content: Before processing any character, the algorithm checks if there is any content when the stack is empty (indicating that we are not within a tag) with
if i and not stk:
. If this condition is true, the function returnsFalse
as this means there is unwrapped code. -
CDATA Sections: When the substring
<![CDATA[
is encountered, a search is initiated for the corresponding closing]]>
. If not found, the code is invalid which results in aFalse
. Upon finding a valid closing section, the indexi
is increased to skip the entireCDATA
section. -
Closing Tag Validation: When
</
is found, the next>
is looked for, and the enclosed string is considered aTAG_NAME
. Here, several checks are applied:- If there's no
>
, it implies an unclosed tag and returnsFalse
. - The tag name is validated for proper length and uppercase characters using the
check()
helper function. If invalid, the function returnsFalse
. - If the tag does not match the last one pushed onto the stack, this indicates unbalanced tags, leading to a return value of
False
.
- If there's no
-
Opening Tag Processing: Similar to closing tag validation, when a
<
is spotted that's not a part of a closing tag or aCDATA
section, it denotes the start of an opening tag. The name is extracted and validated, and if valid, pushed onto the stack. -
Increment Index: The index
i
is incremented after each iteration to continue parsing the next character. -
Final Stack Check: After completing the parsing loop, if the stack is not empty, it implies that there are tags that were not closed properly. If the stack is empty, this means every tag was matched with its pair, and hence the code is valid.
This solution covers all the rules mentioned for a valid code snippet, validating tags and their contents while properly handling CDATA sections. The algorithm's efficacy comes from its ability to use a stack to maintain nesting levels and validate opening and closing tags in a way that reflects common XML parsing strategies.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's examine an example to illustrate the solution approach. Suppose the input string to our algorithm is:
<A><B><![CDATA[</B>]]></B></A>
Now, let's walk through the steps of the solution using this example:
Step 1: Initialize the Stack
We start with an empty stack, stk
. It will be used to keep track of the tag names as we encounter them.
Step 2: Parsing Loop
We start scanning our code string from the beginning by iterating over each character.
Step 3: Check for Unwrapped Content
- At the very beginning,
stk
is empty, and we're at the first character<
, starting the parsing for an opening tag.
Step 4: CDATA Sections
- As we scan, we come across
<B><![CDATA[
. We ignore any tags inside the CDATA section. - We see the opening of a CDATA section with
<![CDATA[
. This signals the start of the CDATA content. - We continue to scan until we find the corresponding closing
]]>
of the CDATA section and skip over this entire section.stk
remains unaffected as the CDATA section is considered regular text.
Step 5: Closing Tag Validation
- After the CDATA section, we encounter the closing tag
</B>
. At this point, we expectB
to be on top of the stack. - We compare the enclosed
TAG_NAME
with the top element of thestk
. - Since
B
matches with the top of the stack, it means we have found a correct matching pair of tags. We popB
from the stack.
Step 6: Opening Tag Processing
- Going back to the beginning, when we first found
<A>
, we would have pushedA
to the stack, and then upon encountering<B>
,B
would have been pushed to the stack. - After having popped
B
in the previous step for the closing</B>
, onlyA
remains on the stack.
Step 7: Increment Index
- As we find each tag, we increment the index
i
to move past the processed characters.
Step 8: Final Stack Check
- After we process the entire string, we check the stack.
- The stack should be empty if all tags are properly opened and closed.
- In this example, after parsing the entire string, we should find the closing tag
</A>
, which would match withA
at the top of the stack. After poppingA
, the stack would be empty. - With an empty stack, we conclude that the input string represents a valid block of code according to the provided rules.
At the end of this process, since we have found opening and closing tags for A
and B
and properly handled the CDATA section, and the stack is empty, the algorithm would return true
, indicating that the code snippet is valid.
Solution Implementation
1class Solution:
2 def is_valid(self, code: str) -> bool:
3 # Helper function to check if the tag name is valid
4 def is_tag_valid(tag):
5 # A tag is valid if its length is between 1 and 9 and contains only uppercase letters
6 return 1 <= len(tag) <= 9 and all(c.isupper() for c in tag)
7
8 # Stack for storing opening tags
9 stack = []
10 i, n = 0, len(code)
11 while i < n:
12 # If the stack is empty and we're not at the beginning, the code is invalid
13 if i and not stack:
14 return False
15 # Handling CDATA sections
16 if code[i: i + 9] == '<![CDATA[':
17 # Find the closing sequence for the CDATA section
18 i = code.find(']]>', i + 9)
19 if i < 0:
20 return False
21 i += 2 # Move the index to the end of the ']]>' marker
22 # Handling end tags
23 elif code[i: i + 2] == '</':
24 j = i + 2
25 # Find the index of the closing bracket '>'
26 i = code.find('>', j)
27 if i < 0:
28 return False
29 tag_name = code[j:i]
30 # Check if the tag is valid, whether there is a corresponding start tag, and if they match
31 if not is_tag_valid(tag_name) or not stack or stack.pop() != tag_name:
32 return False
33 # Handling start tags
34 elif code[i] == '<':
35 j = i + 1
36 # Find the index of the closing bracket '>'
37 i = code.find('>', j)
38 if i < 0:
39 return False
40 tag_name = code[j:i]
41 # Check if the tag name is valid
42 if not is_tag_valid(tag_name):
43 return False
44 # Add the tag to the stack
45 stack.append(tag_name)
46 i += 1 # Move to the next character
47 # The code is valid if the stack is empty at the end (all tags are closed)
48 return not stack
49
1class Solution {
2 public boolean isValid(String code) {
3 // Stack to hold the opening tags
4 Deque<String> stack = new ArrayDeque<>();
5
6 for (int i = 0; i < code.length(); ++i) {
7 // Check if there is an orphan closing tag or text outside of the root tag
8 if (i > 0 && stack.isEmpty()) {
9 return false;
10 }
11
12 // Check for CDATA sections
13 if (code.startsWith("<![CDATA[", i)) {
14 i = code.indexOf("]]>", i + 9);
15 if (i < 0) { // No closing tag for CDATA
16 return false;
17 }
18 i += 2; // Move past the closing "]]>"
19 } else if (code.startsWith("</", i)) { // Check for closing tags
20 int j = i + 2;
21 i = code.indexOf(">", j);
22 if (i < 0) { // No closing '>'
23 return false;
24 }
25 String tag = code.substring(j, i);
26 // Validate tag, check stack, and pop tag if it matches
27 if (!isValidTagName(tag) || stack.isEmpty() || !stack.pop().equals(tag)) {
28 return false;
29 }
30 } else if (code.startsWith("<", i)) { // Check for opening tags
31 int j = i + 1;
32 i = code.indexOf(">", j);
33 if (i < 0) { // No closing '>'
34 return false;
35 }
36 String tag = code.substring(j, i);
37 // Validate tag and push it to the stack
38 if (!isValidTagName(tag)) {
39 return false;
40 }
41 stack.push(tag);
42 }
43 }
44 // True if all tags are properly closed
45 return stack.isEmpty();
46 }
47
48 // Helper method to validate tag names
49 private boolean isValidTagName(String tag) {
50 int length = tag.length();
51
52 // Tag length must be between 1 and 9
53 if (length < 1 || length > 9) {
54 return false;
55 }
56
57 // All characters must be uppercase letters
58 for (char ch : tag.toCharArray()) {
59 if (!Character.isUpperCase(ch)) {
60 return false;
61 }
62 }
63 return true;
64 }
65}
66
1#include <stack>
2#include <string>
3
4class Solution {
5public:
6 // Function to check if the input XML code is valid
7 bool isValid(std::string code) {
8 std::stack<std::string> tags;
9
10 for (int i = 0; i < code.size(); ++i) {
11 // If not at the start and tags are empty, it means there's a mismatch
12 if (i > 0 && tags.empty()) return false;
13
14 // Check for CDATA section
15 if (code.substr(i, 9) == "<![CDATA[") {
16 int cdataEnd = code.find("]]>", i + 9);
17 if (cdataEnd < 0) return false; // CDATA not properly closed
18 i = cdataEnd + 2; // Move index to the end of CDATA
19 }
20 // Check for closing tag
21 else if (code.substr(i, 2) == "</") {
22 int tagStart = i + 2;
23 int tagEnd = code.find('>', tagStart);
24 if (tagEnd < 0) return false; // Closing tag not properly closed
25
26 std::string tag = code.substr(tagStart, tagEnd - tagStart);
27 if (!isTagValid(tag) || tags.empty() || tags.top() != tag) return false;
28 tags.pop();
29 }
30 // Check for opening tag
31 else if (code.substr(i, 1) == "<") {
32 int tagStart = i + 1;
33 int tagEnd = code.find('>', tagStart);
34 if (tagEnd < 0) return false; // Opening tag not properly closed
35
36 std::string tag = code.substr(tagStart, tagEnd - tagStart);
37 if (!isTagValid(tag)) return false;
38 tags.push(tag);
39 }
40 }
41
42 return tags.empty(); // If no tags left, XML is valid
43 }
44
45 // Helper function to validate tag name
46 bool isTagValid(std::string tag) {
47 int tagLength = tag.size();
48
49 // Tag lengths are between 1 and 9 inclusive
50 if (tagLength < 1 || tagLength > 9) return false;
51
52 // All characters in the tag should be uppercase
53 for (char c : tag) {
54 if (!std::isupper(c)) return false;
55 }
56
57 return true;
58 }
59};
60
1// Function to check if given tag name is valid
2function isTagValid(tag: string): boolean {
3 const tagLength = tag.length;
4
5 // Tag lengths are between 1 and 9 inclusive
6 if (tagLength < 1 || tagLength > 9) return false;
7
8 // All characters in the tag should be uppercase
9 for (let c of tag) {
10 if (!c.match(/[A-Z]/)) return false;
11 }
12
13 return true;
14}
15
16// Function to check if the input XML code is valid
17function isValid(code: string): boolean {
18 const tags: string[] = [];
19
20 for (let i = 0; i < code.length; i++) {
21 // If not at the start and tags are empty, it means there's a mismatch
22 if (i > 0 && tags.length === 0) return false;
23
24 // Check for CDATA section
25 if (code.startsWith("<![CDATA[", i)) {
26 const cdataEnd = code.indexOf("]]>", i + 9);
27 if (cdataEnd < 0) return false; // CDATA not properly closed
28 i = cdataEnd + 2; // Move index to the end of CDATA
29 }
30 // Check for closing tag
31 else if (code.startsWith("</", i)) {
32 const tagStart = i + 2;
33 const tagEnd = code.indexOf('>', tagStart);
34 if (tagEnd < 0) return false; // Closing tag not properly closed
35
36 const tag = code.substring(tagStart, tagEnd);
37 if (!isTagValid(tag) || tags.length === 0 || tags[tags.length - 1] !== tag) return false;
38 tags.pop();
39 }
40 // Check for opening tag
41 else if (code[i] === "<") {
42 const tagStart = i + 1;
43 const tagEnd = code.indexOf('>', tagStart);
44 if (tagEnd < 0) return false; // Opening tag not properly closed
45
46 const tag = code.substring(tagStart, tagEnd);
47 if (!isTagValid(tag)) return false;
48 tags.push(tag);
49 }
50 }
51
52 return tags.length === 0; // If no tags left, XML is valid
53}
54
Time and Space Complexity
Time Complexity
The time complexity of the code is primarily determined by the while loop that iterates over every character in the code string. The number of iterations is bounded by n
, the length of the string.
-
The
while
loop runs inO(n)
time since it processes each character in the input string exactly once. Incrementingi
at each step guarantees linear progression through the string. -
The
code.find
function is called within loop iterations, but most instances are bounded by the length of the specific section being processed, such as the tag name or CDATA section. In the worst-case scenario, thecode.find
function for CDATA could beO(n)
if all remaining characters in the string make up part of a CDATA section. -
The
check
function is called multiple times and runs inO(k)
time, wherek
is the length of the tag, which is capped at 9, therefore it's a constant time operationO(1)
.
By accounting for the nested operations, the time complexity of the code is O(n)
, where n
is the length of the input string code
.
Space Complexity
For space complexity, the primary consideration is the stack stk
:
-
In the worst case, every tag in the string requires an entry in the stack, so the space used by the stack could grow proportional to the number of tags, which is bounded by
n/2
since a valid tag requires at least two characters. Therefore, the stack’s size isO(n)
in the worst-case scenario. -
Auxiliary space from other variables and the substring operations within the
find
methods are either constant or result in strings whose lengths are limited by the lengths of tag names and are not dependent onn
.
Hence, the space complexity of the code is O(n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
A heap is a ...?
Recommended Readings
Stack Intro Imagine you have a pile of books on your desk If you want to add a new book you place it on top If you want to read a book you take it from the top And if you simply want to see which book is on the top you
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!