811. Subdomain Visit Count

MediumArrayHash TableStringCounting
Leetcode Link

Problem Description

In this problem, we're working with domain visit counts. A domain may have multiple subdomains, and the count given for a subdomain implies that all of its parent domains were visited the same number of times. A "count-paired domain" will be specified in the format rep d1.d2.d3, where rep represents the number of times the domain d1.d2.d3 was visited.

The task is to analyze an array of these count-paired domains, and output an array of count-paired domains that represent the counts of all possible subdomains. Let's take the example given 9001 discuss.leetcode.com. The count 9001 applies not only to discuss.leetcode.com, but also to its parent domains leetcode.com and com. Therefore, the output should reflect the count for all of these domains.

Intuition

The intuition behind the solution is to break down the given domains into all possible subdomains while keeping track of their visit counts. We'll iterate through each count-paired domain given in the array and parse it to extract the visit count and the domain itself. Once we have those, we'll increment the count for that domain and all of its subdomains.

To achieve this, we can use a data structure like a Python Counter to map domains to their visit counts. As we encounter each domain and its subdomains, we add the visit count to the total already stored in the Counter for that domain or subdomain.

Let's break this down further:

  1. We traverse the given list of count-paired domains.
  2. For each count-paired domain, we identify the visit count and the full domain.
  3. We iterate through the characters in the full domain and whenever we encounter a dot('.') or a space (' '), we recognize a subdomain.
  4. We update the Counter with the visit count for the domain or subdomain found.
  5. After processing all count-paired domains, we'll have a Counter that contains all the domains and their respective total visit counts.
  6. The final step is to format the output as requested, which would be to convert the domain and counts back into the rep d1.d2.d3 format for the result array.

By using this approach, we make sure that all the subdomains are accounted for, and we collect the total visit counts in a straightforward and efficient manner.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

How does quick sort divide the problem into subproblems?

Solution Approach

The solution uses the Counter data structure from Python's collections module, which is a subclass of dict. It's specifically designed to count objects and is an ideal choice for this problem since we need to count visits for each domain and its subdomains.

Here's the step-by-step approach of the algorithm using the Counter:

  1. Initialize a Counter instance (named cnt in the code) to keep track of the count of domains.
  2. Iterate over the list of cpdomains:
    • For each domain string, find the first space character which delimits the visit count and the domain.
    • Extract the visit count and convert it to an integer (v).
    • Iterate through each character of the domain:
      • When a dot '.' or space ' ' is encountered, it marks the beginning of a (sub)domain.
      • Update the Counter by adding the visit count v to the current (sub)domain.
      • This is done using the notation cnt[s[i + 1 :]] += v, which incrementally adds the count to the existing value for the subdomain string s[i + 1 :]. The slicing s[i + 1 :] creates substrings representing the current domain and its subsequent subdomains each time a dot or space is encountered.
  3. After the loop, cnt contains all the (sub)domains and their respective total counts.
  4. Finally, the solution prepares the output by formatting the Counter items into a list of strings (f'{v} {s}'), each corresponding to a "count-paired domain".

The algorithm is efficient for a couple of reasons. First, iterating through the cpdomains array takes O(N) time, where N is the total number of domains and subdomains. Second, updating the Counter takes constant time O(1) for each subdomain because dictionaries in Python have an average-case time complexity of O(1) for updates. Lastly, the Counter nicely organizes our counts and domains, making the final output generation step simple.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

A person thinks of a number between 1 and 1000. You may ask any number questions to them, provided that the question can be answered with either "yes" or "no".

What is the minimum number of questions you needed to ask so that you are guaranteed to know the number that the person is thinking?

Example Walkthrough

Let's consider a simple example with the following input array: ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]. We need to output the counts for all domains and subdomains.

Follow these steps:

  1. Initialize a Counter instance to keep track of domain counts.
  2. Start with the first domain string "900 google.mail.com". Split the string at the space to get the count 900 and the domain google.mail.com.
  3. Begin parsing the domain:
    • Since google.mail.com has no spaces, add "google.mail.com": 900 to the Counter.
    • Encounter the first dot, yielding the subdomain mail.com. Add "mail.com": 900 to the Counter.
    • Encounter the second dot, yielding the subdomain com. Add "com": 900 to the Counter.
  4. Move to the second domain string "50 yahoo.com":
    • Split and obtain count 50 and domain yahoo.com.
    • Add "yahoo.com": 50 to the Counter.
    • Add "com": 50 to the existing count of com in the Counter, now com has a total of 950.
  5. Process "1 intel.mail.com" similarly, resulting in:
    • "intel.mail.com": 1 added to the Counter.
    • "mail.com": 1 is added to existing mail.com count, now mail.com has 901.
    • "com": 1 is added to existing com count, now com has 951.
  6. Lastly, "5 wiki.org":
    • Split into count 5 and domain wiki.org.
    • Add "wiki.org": 5 to the Counter.
    • Add "org": 5 to the Counter, as it's the first time we encounter org.

The Counter now has the correct total counts for all domains and subdomains. Here's what the final Counter looks like:

  • {"google.mail.com": 900, "mail.com": 901, "com": 951, "yahoo.com": 50, "wiki.org": 5, "org": 5, "intel.mail.com": 1}

The next step is to format this output according to the rep d1.d2.d3 format. Iterate through the Counter and construct the result strings:

  • "900 google.mail.com"
  • "901 mail.com"
  • "951 com"
  • "50 yahoo.com"
  • "5 wiki.org"
  • "5 org"
  • "1 intel.mail.com"

The final output is the array: ["900 google.mail.com", "901 mail.com", "951 com", "50 yahoo.com", "5 wiki.org", "5 org", "1 intel.mail.com"]. This array represents the visit counts for each domain and subdomain.

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def subdomainVisits(self, cpdomains: List[str]) -> List[str]:
5        # Create a Counter object to keep track of the domain visit counts
6        domain_visits_count = Counter()
7      
8        # Iterate through the list of domain visit counts
9        for cpdomain in cpdomains:
10            # Extract the visit count and the domain from the current string
11            visit_count = int(cpdomain.split(' ')[0])
12            domain = cpdomain.split(' ')[1]
13          
14            # Split the domain on '.' to find all subdomains
15            subdomains = domain.split('.')
16          
17            # For each subdomain, build the subdomain string and update the counter
18            for i in range(len(subdomains)):
19                # Join the subdomain parts to form the subdomain to count visits
20                subdomain = '.'.join(subdomains[i:])
21                domain_visits_count[subdomain] += visit_count
22      
23        # Format the results as specified in the problem statement
24        result = [f"{count} {domain}" for domain, count in domain_visits_count.items()]
25      
26        # Return the formatted list of domain visit counts
27        return result
28
1class Solution {
2  
3    public List<String> subdomainVisits(String[] cpdomains) {
4        // Create a map to store subdomain visit counts.
5        Map<String, Integer> visitCounts = new HashMap<>();
6      
7        // Iterate over each combined domain visit count entry.
8        for (String domainInfo : cpdomains) {
9            // Find the index of the first space to separate number of visits from the domain.
10            int spaceIndex = domainInfo.indexOf(" ");
11            // Parse the number of visits using the substring before the space.
12            int visitCount = Integer.parseInt(domainInfo.substring(0, spaceIndex));
13          
14            // Start iterating characters at the space to check for subdomains.
15            for (int i = spaceIndex; i < domainInfo.length(); ++i) {
16                // Check for boundaries of subdomains.
17                if (domainInfo.charAt(i) == ' ' || domainInfo.charAt(i) == '.') {
18                    // Extract the subdomain using substring.
19                    String subdomain = domainInfo.substring(i + 1);
20                    // Update the visit count for the current subdomain.
21                    visitCounts.put(subdomain, visitCounts.getOrDefault(subdomain, 0) + visitCount);
22                }
23            }
24        }
25      
26        // Prepare the answer list containing formatted results.
27        List<String> results = new ArrayList<>();
28        // Iterate through the entry set of the visitCounts map.
29        for (Map.Entry<String, Integer> entry : visitCounts.entrySet()) {
30            // Format the entry as "visit_count domain" and add it to the results list.
31            results.add(entry.getValue() + " " + entry.getKey());
32        }
33      
34        return results;
35    }
36}
37
1#include <vector>
2#include <string>
3#include <unordered_map>
4
5class Solution {
6public:
7    // Function to process a vector of cpdomains to count the visits and split them by subdomains
8    vector<string> subdomainVisits(vector<string>& cpdomains) {
9        // Create a hashmap to store the counts of domain visits
10        unordered_map<string, int> domainVisitCounts;
11      
12        // Loop through each combined domain visit count and domain string
13        for (const auto& cpdomain : cpdomains) {
14            // Find the space character to separate the count from the domain
15            size_t spaceIndex = cpdomain.find(' ');
16
17            // Convert the count portion of the string to an integer
18            int visitCount = stoi(cpdomain.substr(0, spaceIndex));
19
20            // Loop through the domain portion of the string
21            for (size_t i = spaceIndex; i < cpdomain.size(); ++i) {
22                // Check for a space or a dot character to identify subdomains
23                if (cpdomain[i] == ' ' || cpdomain[i] == '.') {
24                    // Add the visit count to the corresponding subdomain in the hashmap
25                    domainVisitCounts[cpdomain.substr(i + 1)] += visitCount;
26                }
27            }
28        }
29      
30        // Prepare the answer vector to hold the results in the required format
31        vector<string> results;
32
33        // Traverse the hashmap and format the output accordingly
34        for (const auto& [domain, count] : domainVisitCounts) {
35            // Concatenate the count and the domain with a space separator
36            results.push_back(to_string(count) + " " + domain);
37        }
38
39        // Return the formatted list of visit counts per subdomain
40        return results;
41    }
42};
43
1// Import statements are not required in TypeScript for basic types like Array and String
2
3// A function to process an array of cpdomains to count the visits and split them by subdomains
4function subdomainVisits(cpdomains: string[]): string[] {
5    // Create a map to store the counts of domain visits
6    const domainVisitCounts: Record<string, number> = {};
7  
8    // Loop through each combined domain visit count and domain string
9    cpdomains.forEach(cpdomain => {
10        // Find the space character to separate the count from the domain
11        const spaceIndex = cpdomain.indexOf(' ');
12
13        // Convert the count portion of the string to an integer
14        const visitCount = parseInt(cpdomain.substring(0, spaceIndex), 10);
15
16        // Get the full domain string from the cpdomain
17        // Need to add 1 to the space index to skip the space character
18        let domain = cpdomain.substring(spaceIndex + 1);
19
20        // Process all subdomains, including the full domain itself
21        while (domain) {
22            // Add the visit count to the corresponding subdomain in the map
23            domainVisitCounts[domain] = (domainVisitCounts[domain] || 0) + visitCount;
24
25            // Find the next dot character to move to the higher-level domain
26            const dotIndex = domain.indexOf('.');
27
28            // If no dot is found, or it is the last position, we stop
29            if (dotIndex < 0) break;
30
31            // Update the domain to be the string after the dot
32            domain = domain.substring(dotIndex + 1);
33        }
34    });
35
36    // Prepare the answer array to hold the results in the required format
37    const results: string[] = [];
38
39    // Traverse the map and format the output accordingly
40    for (const domain in domainVisitCounts) {
41        const count = domainVisitCounts[domain];
42        // Concatenate the count and the domain with a space separator
43        results.push(`${count} ${domain}`);
44    }
45
46    // Return the formatted list of visit counts per subdomain
47    return results;
48}
49
Not Sure What to Study? Take the 2-min Quiz:

What's the output of running the following function using the following tree as input?

1def serialize(root):
2    res = []
3    def dfs(root):
4        if not root:
5            res.append('x')
6            return
7        res.append(root.val)
8        dfs(root.left)
9        dfs(root.right)
10    dfs(root)
11    return ' '.join(res)
12
1import java.util.StringJoiner;
2
3public static String serialize(Node root) {
4    StringJoiner res = new StringJoiner(" ");
5    serializeDFS(root, res);
6    return res.toString();
7}
8
9private static void serializeDFS(Node root, StringJoiner result) {
10    if (root == null) {
11        result.add("x");
12        return;
13    }
14    result.add(Integer.toString(root.val));
15    serializeDFS(root.left, result);
16    serializeDFS(root.right, result);
17}
18
1function serialize(root) {
2    let res = [];
3    serialize_dfs(root, res);
4    return res.join(" ");
5}
6
7function serialize_dfs(root, res) {
8    if (!root) {
9        res.push("x");
10        return;
11    }
12    res.push(root.val);
13    serialize_dfs(root.left, res);
14    serialize_dfs(root.right, res);
15}
16

Time and Space Complexity

Time Complexity

The time complexity of the code is determined by several factors:

  1. Iterating through each domain in cpdomains, which gives us O(n) where n is the number of domains.
  2. Inside the loop, the index() method is called to find the first space, which operates in O(m), where m is the length of the string s.
  3. The inner loop enumerates over each character in the domain string s which is O(m) in the worst case.
  4. The slicing of strings s[i + 1 :] is done in each loop iteration, which, in Python, is also O(m).

The second to the fourth steps occur within the loop of the first step, so the total time complexity is O(n * m) since we have to assume that each domain could be of variable length, and we are bound by the length of both the domains and the list itself.

Space Complexity

For space complexity, we consider the memory allocations:

  1. The Counter object cnt storage grows with the number of distinct subdomains found, in the worst case this may contain all the subdomains which would be O(d) where d is the total number of distinct subdomains.
  2. The created list for the final output will have as many elements as there are entries in cnt which is also O(d).

The space complexity, therefore, is O(d), which is determined by the number of unique subdomains.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which data structure is used to implement priority queue?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫