591. Tag Validator


Problem Description

The problem presents a challenge to validate a block of code that is meant to be written in a simplified XML-like format. Specifically, you need to check if the string representing this code snippet is valid based on a number of specific criteria:

  1. The entire code must be enclosed within a pair of valid tags.
  2. A valid tag is composed of a start tag <TAG_NAME> and an end tag </TAG_NAME> with the same TAG_NAME.
  3. A TAG_NAME is valid only if it has a length between 1 to 9 characters and contains only uppercase letters.
  4. The content within these tags (referred to as TAG_CONTENT) must be valid. This means it can contain other valid closed tags, special CDATA sections, and any other characters except for certain conditions where symbol sequences might imply invalid or mismatched tags.
  5. Tags are considered unmatched if there's no corresponding start or end tag with the same TAG_NAME.
  6. A < symbol is unmatched if there's no subsequent > to form a complete tag.
  7. CDATA sections are defined by the sequence <![CDATA[ followed by CDATA_CONTENT ending with the first subsequent ]]>. The CDATA_CONTENT can include any characters and should be treated as regular text, not parsed as tags or other structures.

The aim is to parse the code and ensure all these rules are respected, in which case the method should return true, indicating the code snippet is valid.

Intuition

To determine if the code is valid according to the given rules, a stack-based approach is used to process the tags and CDATA sections. Here's how it works:

  1. Stack for Tag Names: A stack (stk) is used to keep track of opening tags. When an opening tag is encountered, its TAG_NAME is pushed onto the stack. When the corresponding closing tag is found, we check if this tag matches the top of the stack.

  2. Sequential Scanning: The string code is scanned character by character, paying attention to the opening < and closing > brackets to identify tags and CDATA sections.

  3. Validation Checks:

    • Unwrapped Code: If there is any character outside of a tag and the stack is empty (meaning we're not within a valid tag), then the code is not wrapped in a valid closed tag, so it's invalid.
    • CDATA: When <![CDATA[ is encountered, search for the closing ]]>. If not found, the CDATA is invalid. Otherwise, skip over this section since it's allowed to contain any characters.
    • End Tag: When </ is encountered, find the next > to identify the TAG_NAME, then check if it's valid and matches the last opened tag by popping from the stack. If it's not valid or doesn't match, the code is invalid.
    • Start Tag: When < (but not </) is encountered, find the next > to identify the TAG_NAME, check if it's valid according to the rules (upper-case and proper length), and then put it on the stack.
  4. Final Check: After scanning, if the stack is not empty, some tags were not closed, so the code is invalid. If the stack is empty, all tags were properly opened and closed, and the code is valid.

This algorithm captures the nested tag structure typical in XML-like languages and uses the rules provided to validate the input string as it goes, which is a common approach for parsing problems.

Learn more about Stack patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of these properties could exist for a graph but not a tree?

Solution Approach

The solution is implemented in Python and follows a step-by-step approach to parse and validate the code snippet:

  1. Stack Data Structure: The primary data structure used is a stack, represented by a list in Python, named stk. This stack is used to track open tags. When an open tag is encountered, its TAG_NAME is pushed onto the stack. When a closing tag is found, the stack is popped to check if the closing tag matches the most recent open tag. If not, the code is invalid.

  2. Parsing Loop: The solution uses a while loop to iterate over the characters in the code string. The variable i represents the current index, while n is the length of the entire code string.

  3. Check for Unwrapped Content: Before processing any character, the algorithm checks if there is any content when the stack is empty (indicating that we are not within a tag) with if i and not stk:. If this condition is true, the function returns False as this means there is unwrapped code.

  4. CDATA Sections: When the substring <![CDATA[ is encountered, a search is initiated for the corresponding closing ]]>. If not found, the code is invalid which results in a False. Upon finding a valid closing section, the index i is increased to skip the entire CDATA section.

  5. Closing Tag Validation: When </ is found, the next > is looked for, and the enclosed string is considered a TAG_NAME. Here, several checks are applied:

    • If there's no >, it implies an unclosed tag and returns False.
    • The tag name is validated for proper length and uppercase characters using the check() helper function. If invalid, the function returns False.
    • If the tag does not match the last one pushed onto the stack, this indicates unbalanced tags, leading to a return value of False.
  6. Opening Tag Processing: Similar to closing tag validation, when a < is spotted that's not a part of a closing tag or a CDATA section, it denotes the start of an opening tag. The name is extracted and validated, and if valid, pushed onto the stack.

  7. Increment Index: The index i is incremented after each iteration to continue parsing the next character.

  8. Final Stack Check: After completing the parsing loop, if the stack is not empty, it implies that there are tags that were not closed properly. If the stack is empty, this means every tag was matched with its pair, and hence the code is valid.

This solution covers all the rules mentioned for a valid code snippet, validating tags and their contents while properly handling CDATA sections. The algorithm's efficacy comes from its ability to use a stack to maintain nesting levels and validate opening and closing tags in a way that reflects common XML parsing strategies.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is the running time of the following code?

1int sqrt(int n) {
2  for (int guess = 1; guess * guess <= n; guess++) {
3    if (guess * guess == n) {
4      return guess;
5    }
6  }
7  return -1;
8}

Example Walkthrough

Let's examine an example to illustrate the solution approach. Suppose the input string to our algorithm is:

1<A><B><![CDATA[</B>]]></B></A>

Now, let's walk through the steps of the solution using this example:

Step 1: Initialize the Stack

We start with an empty stack, stk. It will be used to keep track of the tag names as we encounter them.

Step 2: Parsing Loop

We start scanning our code string from the beginning by iterating over each character.

Step 3: Check for Unwrapped Content

  • At the very beginning, stk is empty, and we're at the first character <, starting the parsing for an opening tag.

Step 4: CDATA Sections

  • As we scan, we come across <B><![CDATA[. We ignore any tags inside the CDATA section.
  • We see the opening of a CDATA section with <![CDATA[. This signals the start of the CDATA content.
  • We continue to scan until we find the corresponding closing ]]> of the CDATA section and skip over this entire section. stk remains unaffected as the CDATA section is considered regular text.

Step 5: Closing Tag Validation

  • After the CDATA section, we encounter the closing tag </B>. At this point, we expect B to be on top of the stack.
  • We compare the enclosed TAG_NAME with the top element of the stk.
  • Since B matches with the top of the stack, it means we have found a correct matching pair of tags. We pop B from the stack.

Step 6: Opening Tag Processing

  • Going back to the beginning, when we first found <A>, we would have pushed A to the stack, and then upon encountering <B>, B would have been pushed to the stack.
  • After having popped B in the previous step for the closing </B>, only A remains on the stack.

Step 7: Increment Index

  • As we find each tag, we increment the index i to move past the processed characters.

Step 8: Final Stack Check

  • After we process the entire string, we check the stack.
  • The stack should be empty if all tags are properly opened and closed.
  • In this example, after parsing the entire string, we should find the closing tag </A>, which would match with A at the top of the stack. After popping A, the stack would be empty.
  • With an empty stack, we conclude that the input string represents a valid block of code according to the provided rules.

At the end of this process, since we have found opening and closing tags for A and B and properly handled the CDATA section, and the stack is empty, the algorithm would return true, indicating that the code snippet is valid.

Solution Implementation

1class Solution:
2    def is_valid(self, code: str) -> bool:
3        # Helper function to check if the tag name is valid
4        def is_tag_valid(tag):
5            # A tag is valid if its length is between 1 and 9 and contains only uppercase letters
6            return 1 <= len(tag) <= 9 and all(c.isupper() for c in tag)
7
8        # Stack for storing opening tags
9        stack = []
10        i, n = 0, len(code)
11        while i < n:
12            # If the stack is empty and we're not at the beginning, the code is invalid
13            if i and not stack:
14                return False
15            # Handling CDATA sections
16            if code[i: i + 9] == '<![CDATA[':
17                # Find the closing sequence for the CDATA section
18                i = code.find(']]>', i + 9)
19                if i < 0:
20                    return False
21                i += 2  # Move the index to the end of the ']]>' marker
22            # Handling end tags
23            elif code[i: i + 2] == '</':
24                j = i + 2
25                # Find the index of the closing bracket '>'
26                i = code.find('>', j)
27                if i < 0:
28                    return False
29                tag_name = code[j:i]
30                # Check if the tag is valid, whether there is a corresponding start tag, and if they match
31                if not is_tag_valid(tag_name) or not stack or stack.pop() != tag_name:
32                    return False
33            # Handling start tags
34            elif code[i] == '<':
35                j = i + 1
36                # Find the index of the closing bracket '>'
37                i = code.find('>', j)
38                if i < 0:
39                    return False
40                tag_name = code[j:i]
41                # Check if the tag name is valid
42                if not is_tag_valid(tag_name):
43                    return False
44                # Add the tag to the stack
45                stack.append(tag_name)
46            i += 1  # Move to the next character
47        # The code is valid if the stack is empty at the end (all tags are closed)
48        return not stack
49
1class Solution {
2    public boolean isValid(String code) {
3        // Stack to hold the opening tags
4        Deque<String> stack = new ArrayDeque<>();
5      
6        for (int i = 0; i < code.length(); ++i) {
7            // Check if there is an orphan closing tag or text outside of the root tag
8            if (i > 0 && stack.isEmpty()) {
9                return false;
10            }
11          
12            // Check for CDATA sections
13            if (code.startsWith("<![CDATA[", i)) {
14                i = code.indexOf("]]>", i + 9);
15                if (i < 0) { // No closing tag for CDATA
16                    return false;
17                }
18                i += 2; // Move past the closing "]]>"
19            } else if (code.startsWith("</", i)) { // Check for closing tags
20                int j = i + 2;
21                i = code.indexOf(">", j);
22                if (i < 0) { // No closing '>'
23                    return false;
24                }
25                String tag = code.substring(j, i);
26                // Validate tag, check stack, and pop tag if it matches
27                if (!isValidTagName(tag) || stack.isEmpty() || !stack.pop().equals(tag)) {
28                    return false;
29                }
30            } else if (code.startsWith("<", i)) { // Check for opening tags
31                int j = i + 1;
32                i = code.indexOf(">", j);
33                if (i < 0) { // No closing '>'
34                    return false;
35                }
36                String tag = code.substring(j, i);
37                // Validate tag and push it to the stack
38                if (!isValidTagName(tag)) {
39                    return false;
40                }
41                stack.push(tag);
42            }
43        }
44        // True if all tags are properly closed
45        return stack.isEmpty();
46    }
47
48    // Helper method to validate tag names
49    private boolean isValidTagName(String tag) {
50        int length = tag.length();
51      
52        // Tag length must be between 1 and 9
53        if (length < 1 || length > 9) {
54            return false;
55        }
56      
57        // All characters must be uppercase letters
58        for (char ch : tag.toCharArray()) {
59            if (!Character.isUpperCase(ch)) {
60                return false;
61            }
62        }
63        return true;
64    }
65}
66
1#include <stack>
2#include <string>
3
4class Solution {
5public:
6    // Function to check if the input XML code is valid
7    bool isValid(std::string code) {
8        std::stack<std::string> tags;
9      
10        for (int i = 0; i < code.size(); ++i) {
11            // If not at the start and tags are empty, it means there's a mismatch
12            if (i > 0 && tags.empty()) return false;
13          
14            // Check for CDATA section
15            if (code.substr(i, 9) == "<![CDATA[") {
16                int cdataEnd = code.find("]]>", i + 9);
17                if (cdataEnd < 0) return false; // CDATA not properly closed
18                i = cdataEnd + 2; // Move index to the end of CDATA
19            }
20            // Check for closing tag
21            else if (code.substr(i, 2) == "</") {
22                int tagStart = i + 2;
23                int tagEnd = code.find('>', tagStart);
24                if (tagEnd < 0) return false; // Closing tag not properly closed
25              
26                std::string tag = code.substr(tagStart, tagEnd - tagStart);
27                if (!isTagValid(tag) || tags.empty() || tags.top() != tag) return false;
28                tags.pop();
29            }
30            // Check for opening tag
31            else if (code.substr(i, 1) == "<") {
32                int tagStart = i + 1;
33                int tagEnd = code.find('>', tagStart);
34                if (tagEnd < 0) return false; // Opening tag not properly closed
35              
36                std::string tag = code.substr(tagStart, tagEnd - tagStart);
37                if (!isTagValid(tag)) return false;
38                tags.push(tag);
39            }
40        }
41      
42        return tags.empty(); // If no tags left, XML is valid
43    }
44
45    // Helper function to validate tag name
46    bool isTagValid(std::string tag) {
47        int tagLength = tag.size();
48      
49        // Tag lengths are between 1 and 9 inclusive
50        if (tagLength < 1 || tagLength > 9) return false;
51      
52        // All characters in the tag should be uppercase
53        for (char c : tag) {
54            if (!std::isupper(c)) return false;
55        }
56      
57        return true;
58    }
59};
60
1// Function to check if given tag name is valid
2function isTagValid(tag: string): boolean {
3    const tagLength = tag.length;
4  
5    // Tag lengths are between 1 and 9 inclusive
6    if (tagLength < 1 || tagLength > 9) return false;
7  
8    // All characters in the tag should be uppercase
9    for (let c of tag) {
10        if (!c.match(/[A-Z]/)) return false;
11    }
12  
13    return true;
14}
15
16// Function to check if the input XML code is valid
17function isValid(code: string): boolean {
18    const tags: string[] = [];
19  
20    for (let i = 0; i < code.length; i++) {
21        // If not at the start and tags are empty, it means there's a mismatch
22        if (i > 0 && tags.length === 0) return false;
23      
24        // Check for CDATA section
25        if (code.startsWith("<![CDATA[", i)) {
26            const cdataEnd = code.indexOf("]]>", i + 9);
27            if (cdataEnd < 0) return false; // CDATA not properly closed
28            i = cdataEnd + 2; // Move index to the end of CDATA
29        }
30        // Check for closing tag
31        else if (code.startsWith("</", i)) {
32            const tagStart = i + 2;
33            const tagEnd = code.indexOf('>', tagStart);
34            if (tagEnd < 0) return false; // Closing tag not properly closed
35          
36            const tag = code.substring(tagStart, tagEnd);
37            if (!isTagValid(tag) || tags.length === 0 || tags[tags.length - 1] !== tag) return false;
38            tags.pop();
39        }
40        // Check for opening tag
41        else if (code[i] === "<") {
42            const tagStart = i + 1;
43            const tagEnd = code.indexOf('>', tagStart);
44            if (tagEnd < 0) return false; // Opening tag not properly closed
45          
46            const tag = code.substring(tagStart, tagEnd);
47            if (!isTagValid(tag)) return false;
48            tags.push(tag);
49        }
50    }
51  
52    return tags.length === 0; // If no tags left, XML is valid
53}
54
Not Sure What to Study? Take the 2-min Quiz:

How many times is a tree node visited in a depth first search?

Time and Space Complexity

Time Complexity

The time complexity of the code is primarily determined by the while loop that iterates over every character in the code string. The number of iterations is bounded by n, the length of the string.

  1. The while loop runs in O(n) time since it processes each character in the input string exactly once. Incrementing i at each step guarantees linear progression through the string.

  2. The code.find function is called within loop iterations, but most instances are bounded by the length of the specific section being processed, such as the tag name or CDATA section. In the worst-case scenario, the code.find function for CDATA could be O(n) if all remaining characters in the string make up part of a CDATA section.

  3. The check function is called multiple times and runs in O(k) time, where k is the length of the tag, which is capped at 9, therefore it's a constant time operation O(1).

By accounting for the nested operations, the time complexity of the code is O(n), where n is the length of the input string code.

Space Complexity

For space complexity, the primary consideration is the stack stk:

  1. In the worst case, every tag in the string requires an entry in the stack, so the space used by the stack could grow proportional to the number of tags, which is bounded by n/2 since a valid tag requires at least two characters. Therefore, the stack’s size is O(n) in the worst-case scenario.

  2. Auxiliary space from other variables and the substring operations within the find methods are either constant or result in strings whose lengths are limited by the lengths of tag names and are not dependent on n.

Hence, the space complexity of the code is O(n).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following array represent a max heap?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫