Facebook Pixel

811. Subdomain Visit Count

MediumArrayHash TableStringCounting
Leetcode Link

Problem Description

You are given a list of count-paired domains, where each entry represents how many times a particular domain was visited. When a domain is visited, all of its parent domains are also implicitly visited with the same count.

A count-paired domain is formatted as "count domain", where:

  • count is an integer representing the number of visits
  • domain is a string like "discuss.leetcode.com" or "leetcode.com"

For example, if "9001 discuss.leetcode.com" is given:

  • "discuss.leetcode.com" was visited 9001 times
  • "leetcode.com" was also visited 9001 times (parent domain)
  • "com" was also visited 9001 times (top-level domain)

Given an array cpdomains containing count-paired domains, you need to return an array that shows the total visit count for each domain and subdomain across all inputs. The output format should also be count-paired domains in the format "count domain".

Example breakdown:

  • Input: ["9001 discuss.leetcode.com"]
  • Output should include:
    • "9001 discuss.leetcode.com"
    • "9001 leetcode.com"
    • "9001 com"

If multiple entries contribute to the same domain, their counts should be summed. The order of the output doesn't matter.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we need to extract all possible subdomains from each given domain and accumulate their visit counts. When we see a domain like "discuss.leetcode.com", we need to identify all its parent domains by looking at the dot separators.

Think of a domain as a hierarchy: the rightmost part ("com") is the top level, and as we add more parts to the left, we get more specific subdomains. We can find all parent domains by locating each dot . in the string and taking the substring from that position onwards.

For parsing, we notice that each count-paired domain has a space separating the count from the domain. After extracting the count, we need to find all valid subdomains. The trick is to iterate through the string and whenever we encounter a space or a dot ., we know that what comes after it is a valid domain or subdomain.

For example, with "9001 discuss.leetcode.com":

  • After the space: "discuss.leetcode.com"
  • After the first dot: "leetcode.com"
  • After the second dot: "com"

We use a hash map (Counter in Python) to accumulate counts because the same domain might appear in multiple input entries. This allows us to efficiently sum up visits for each unique domain.

The solution cleverly combines the initial space and the dots into a single check if c in ' .' to identify all positions where a valid domain starts (at index i + 1). This unified approach elegantly handles both the full domain after the space and all subdomains after each dot.

Solution Approach

The solution uses a hash map (Counter) to accumulate visit counts for all domains and subdomains. Here's the step-by-step implementation:

  1. Initialize a Counter: Create a Counter object cnt to store the accumulated visit counts for each domain.

  2. Process each count-paired domain:

    • For each string s in cpdomains, extract the visit count by finding the substring before the first space: v = int(s[:s.index(' ')])
  3. Extract all subdomains:

    • Iterate through each character in the string s with its index
    • Check if the current character is either a space ' ' or a dot '.'
    • When we find a space or dot at position i, the substring starting from i + 1 to the end represents a valid domain or subdomain
    • Add the visit count v to this domain in our counter: cnt[s[i + 1:]] += v
  4. Format the output:

    • Convert the counter entries back to the required format
    • For each domain s with count v in the counter, create a string f'{v} {s}'
    • Return a list of all these formatted strings

Example walkthrough with "9001 discuss.leetcode.com":

  • Extract count: v = 9001
  • At index 4 (space): s[5:] = "discuss.leetcode.com", add 9001 visits
  • At index 12 (first dot): s[13:] = "leetcode.com", add 9001 visits
  • At index 21 (second dot): s[22:] = "com", add 9001 visits

The algorithm efficiently processes each input string once with O(n) time complexity per string, where n is the length of the string. The space complexity is O(m) where m is the total number of unique domains across all inputs.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example with input: ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]

Step 1: Initialize Counter

  • Create an empty counter: cnt = {}

Step 2: Process "900 google.mail.com"

  • Extract count: v = 900 (substring before space)
  • Iterate through string:
    • Index 3: character is space ' ' → extract "google.mail.com"cnt["google.mail.com"] = 900
    • Index 10: character is dot '.' → extract "mail.com"cnt["mail.com"] = 900
    • Index 15: character is dot '.' → extract "com"cnt["com"] = 900
  • Counter state: {"google.mail.com": 900, "mail.com": 900, "com": 900}

Step 3: Process "50 yahoo.com"

  • Extract count: v = 50
  • Iterate through string:
    • Index 2: character is space ' ' → extract "yahoo.com"cnt["yahoo.com"] = 50
    • Index 8: character is dot '.' → extract "com"cnt["com"] = 900 + 50 = 950
  • Counter state: {"google.mail.com": 900, "mail.com": 900, "com": 950, "yahoo.com": 50}

Step 4: Process "1 intel.mail.com"

  • Extract count: v = 1
  • Iterate through string:
    • Index 1: character is space ' ' → extract "intel.mail.com"cnt["intel.mail.com"] = 1
    • Index 7: character is dot '.' → extract "mail.com"cnt["mail.com"] = 900 + 1 = 901
    • Index 12: character is dot '.' → extract "com"cnt["com"] = 950 + 1 = 951
  • Counter state: {"google.mail.com": 900, "mail.com": 901, "com": 951, "yahoo.com": 50, "intel.mail.com": 1}

Step 5: Process "5 wiki.org"

  • Extract count: v = 5
  • Iterate through string:
    • Index 1: character is space ' ' → extract "wiki.org"cnt["wiki.org"] = 5
    • Index 6: character is dot '.' → extract "org"cnt["org"] = 5
  • Final counter: {"google.mail.com": 900, "mail.com": 901, "com": 951, "yahoo.com": 50, "intel.mail.com": 1, "wiki.org": 5, "org": 5}

Step 6: Format output Convert each counter entry to "count domain" format:

  • Output: ["900 google.mail.com", "901 mail.com", "951 com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org", "5 org"]

The key insight is how the algorithm accumulates counts when the same subdomain appears in multiple inputs (like "mail.com" appearing in both "google.mail.com" and "intel.mail.com").

Solution Implementation

1from collections import Counter
2from typing import List
3
4class Solution:
5    def subdomainVisits(self, cpdomains: List[str]) -> List[str]:
6        # Counter to store the visit count for each domain/subdomain
7        domain_counter = Counter()
8      
9        # Process each count-paired domain
10        for count_paired_domain in cpdomains:
11            # Extract the visit count from the beginning of the string
12            space_index = count_paired_domain.index(' ')
13            visit_count = int(count_paired_domain[:space_index])
14          
15            # Iterate through each character to find domain separators
16            for index, char in enumerate(count_paired_domain):
17                # When we find a space or dot, the substring after it is a valid domain
18                if char in ' .':
19                    # Extract the domain starting from the character after the separator
20                    domain = count_paired_domain[index + 1:]
21                    # Add the visit count to this domain
22                    domain_counter[domain] += visit_count
23      
24        # Format the results as "count domain" strings
25        result = []
26        for domain, count in domain_counter.items():
27            result.append(f'{count} {domain}')
28      
29        return result
30
1class Solution {
2    public List<String> subdomainVisits(String[] cpdomains) {
3        // Map to store subdomain visit counts
4        Map<String, Integer> visitCounts = new HashMap<>();
5      
6        // Process each count-paired domain
7        for (String countPairedDomain : cpdomains) {
8            // Find the space separator between count and domain
9            int spaceIndex = countPairedDomain.indexOf(" ");
10          
11            // Extract the visit count from the string
12            int visitCount = Integer.parseInt(countPairedDomain.substring(0, spaceIndex));
13          
14            // Iterate through the string to find all subdomains
15            for (int i = spaceIndex; i < countPairedDomain.length(); i++) {
16                // Check if current character is a space or dot (subdomain separator)
17                if (countPairedDomain.charAt(i) == ' ' || countPairedDomain.charAt(i) == '.') {
18                    // Extract subdomain starting from the character after space/dot
19                    String subdomain = countPairedDomain.substring(i + 1);
20                  
21                    // Update the visit count for this subdomain
22                    visitCounts.put(subdomain, visitCounts.getOrDefault(subdomain, 0) + visitCount);
23                }
24            }
25        }
26      
27        // Build the result list with formatted count-paired domains
28        List<String> result = new ArrayList<>();
29        for (Map.Entry<String, Integer> entry : visitCounts.entrySet()) {
30            // Format: "count domain"
31            result.add(entry.getValue() + " " + entry.getKey());
32        }
33      
34        return result;
35    }
36}
37
1class Solution {
2public:
3    vector<string> subdomainVisits(vector<string>& cpdomains) {
4        // Hash map to store domain visit counts
5        unordered_map<string, int> domainCount;
6      
7        // Process each count-paired domain
8        for (const auto& cpdomain : cpdomains) {
9            // Find the space separator between count and domain
10            int spacePos = cpdomain.find(' ');
11          
12            // Extract the visit count from the string
13            int visitCount = stoi(cpdomain.substr(0, spacePos));
14          
15            // Iterate through the domain string to find all subdomains
16            for (int i = spacePos; i < cpdomain.size(); ++i) {
17                // When we encounter a space or dot, the substring after it is a valid domain
18                if (cpdomain[i] == ' ' || cpdomain[i] == '.') {
19                    // Extract subdomain starting from position i+1 and add visit count
20                    string subdomain = cpdomain.substr(i + 1);
21                    domainCount[subdomain] += visitCount;
22                }
23            }
24        }
25      
26        // Build the result vector with formatted strings
27        vector<string> result;
28        for (const auto& [domain, count] : domainCount) {
29            // Format: "count domain"
30            result.push_back(to_string(count) + " " + domain);
31        }
32      
33        return result;
34    }
35};
36
1function subdomainVisits(cpdomains: string[]): string[] {
2    // Hash map to store domain visit counts
3    const domainCount: Map<string, number> = new Map();
4  
5    // Process each count-paired domain
6    for (const cpdomain of cpdomains) {
7        // Find the space separator between count and domain
8        const spacePos = cpdomain.indexOf(' ');
9      
10        // Extract the visit count from the string
11        const visitCount = parseInt(cpdomain.substring(0, spacePos));
12      
13        // Iterate through the domain string to find all subdomains
14        for (let i = spacePos; i < cpdomain.length; i++) {
15            // When we encounter a space or dot, the substring after it is a valid domain
16            if (cpdomain[i] === ' ' || cpdomain[i] === '.') {
17                // Extract subdomain starting from position i+1 and add visit count
18                const subdomain = cpdomain.substring(i + 1);
19                domainCount.set(subdomain, (domainCount.get(subdomain) || 0) + visitCount);
20            }
21        }
22    }
23  
24    // Build the result array with formatted strings
25    const result: string[] = [];
26    for (const [domain, count] of domainCount) {
27        // Format: "count domain"
28        result.push(`${count} ${domain}`);
29    }
30  
31    return result;
32}
33

Time and Space Complexity

Time Complexity: O(N * M) where N is the number of domains in the input list and M is the average length of each domain string.

  • The outer loop iterates through all N domain strings in cpdomains
  • For each domain string, we extract the count value using s.index(' ') which takes O(M) time
  • The inner loop iterates through each character in the domain string, which is O(M) operations
  • For each space or dot character found, we:
    • Create a substring s[i + 1:] which takes O(M) time in the worst case
    • Update the counter with this substring, which is O(1) average case for hash table operations
  • The maximum number of subdomains for any domain is bounded by the length of the string (at most M characters means at most M/2 dots plus the space)
  • Finally, creating the result list requires iterating through all unique subdomains in the counter, which is at most O(N * M) entries
  • Overall: O(N * M) for processing all domains

Space Complexity: O(N * M) where N is the number of domains and M is the average length of each domain string.

  • The Counter object cnt stores all unique subdomains and their counts
  • In the worst case, each domain contributes multiple unique subdomains (e.g., "discuss.leetcode.com" contributes "discuss.leetcode.com", "leetcode.com", and "com")
  • The total number of unique subdomains is bounded by O(N * M) since each domain of length M can contribute at most O(M) subdomains
  • The output list also requires O(N * M) space to store all the formatted strings
  • Overall space complexity: O(N * M)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Forgetting to Process the Full Domain

A common mistake is only processing subdomains that appear after dots (.), forgetting that the initial full domain (appearing after the space) also needs to be counted.

Incorrect approach:

# This would miss counting "discuss.leetcode.com"
for i, c in enumerate(s):
    if c == '.':  # Only looking for dots
        cnt[s[i + 1:]] += v

Solution: Include both space and dot as separators in the condition:

for i, c in enumerate(s):
    if c in ' .':  # Check for both space and dot
        cnt[s[i + 1:]] += v

2. String Parsing Errors

Using split() incorrectly or making assumptions about string format can lead to errors.

Incorrect approach:

# Assumes there's always exactly one space
parts = s.split(' ')
v = int(parts[0])
domain = parts[1]  # Could fail if format is unexpected

Solution: Use index() to find the first space position for more robust parsing:

space_idx = s.index(' ')
v = int(s[:space_idx])
# Then process from this known position

3. Not Accumulating Counts

Overwriting counts instead of accumulating them when the same domain appears multiple times.

Incorrect approach:

cnt[domain] = v  # This overwrites instead of accumulating

Solution: Always use addition to accumulate counts:

cnt[domain] += v  # Correctly accumulates visits

4. Off-by-One Errors in Substring Extraction

Getting the wrong substring by not adding 1 to skip the separator character.

Incorrect approach:

if c in ' .':
    cnt[s[i:]] += v  # Includes the separator in the domain

Solution: Always use i + 1 to skip the separator:

if c in ' .':
    cnt[s[i + 1:]] += v  # Correctly skips the separator
Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

Which type of traversal does breadth first search do?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More