811. Subdomain Visit Count
Problem Description
In this problem, we're working with domain visit counts. A domain may have multiple subdomains, and the count given for a subdomain implies that all of its parent domains were visited the same number of times. A "count-paired domain" will be specified in the format rep d1.d2.d3
, where rep
represents the number of times the domain d1.d2.d3
was visited.
The task is to analyze an array of these count-paired domains, and output an array of count-paired domains that represent the counts of all possible subdomains. Let's take the example given 9001 discuss.leetcode.com
. The count 9001
applies not only to discuss.leetcode.com
, but also to its parent domains leetcode.com
and com
. Therefore, the output should reflect the count for all of these domains.
Intuition
The intuition behind the solution is to break down the given domains into all possible subdomains while keeping track of their visit counts. We'll iterate through each count-paired domain given in the array and parse it to extract the visit count and the domain itself. Once we have those, we'll increment the count for that domain and all of its subdomains.
To achieve this, we can use a data structure like a Python Counter to map domains to their visit counts. As we encounter each domain and its subdomains, we add the visit count to the total already stored in the Counter for that domain or subdomain.
Let's break this down further:
- We traverse the given list of count-paired domains.
- For each count-paired domain, we identify the visit count and the full domain.
- We iterate through the characters in the full domain and whenever we encounter a dot('.') or a space (' '), we recognize a subdomain.
- We update the Counter with the visit count for the domain or subdomain found.
- After processing all count-paired domains, we'll have a Counter that contains all the domains and their respective total visit counts.
- The final step is to format the output as requested, which would be to convert the domain and counts back into the
rep d1.d2.d3
format for the result array.
By using this approach, we make sure that all the subdomains are accounted for, and we collect the total visit counts in a straightforward and efficient manner.
Solution Approach
The solution uses the Counter
data structure from Python's collections
module, which is a subclass of dict
. It's specifically designed to count objects and is an ideal choice for this problem since we need to count visits for each domain and its subdomains.
Here's the step-by-step approach of the algorithm using the Counter
:
- Initialize a
Counter
instance (namedcnt
in the code) to keep track of the count of domains. - Iterate over the list of
cpdomains
:- For each domain string, find the first space character which delimits the visit count and the domain.
- Extract the visit count and convert it to an integer (
v
). - Iterate through each character of the domain:
- When a dot
'.'
or space' '
is encountered, it marks the beginning of a (sub)domain. - Update the
Counter
by adding the visit countv
to the current (sub)domain. - This is done using the notation
cnt[s[i + 1 :]] += v
, which incrementally adds the count to the existing value for the subdomain strings[i + 1 :]
. The slicings[i + 1 :]
creates substrings representing the current domain and its subsequent subdomains each time a dot or space is encountered.
- When a dot
- After the loop,
cnt
contains all the (sub)domains and their respective total counts. - Finally, the solution prepares the output by formatting the
Counter
items into a list of strings (f'{v} {s}'
), each corresponding to a "count-paired domain".
The algorithm is efficient for a couple of reasons. First, iterating through the cpdomains array takes O(N)
time, where N
is the total number of domains and subdomains. Second, updating the Counter
takes constant time O(1)
for each subdomain because dictionaries in Python have an average-case time complexity of O(1)
for updates. Lastly, the Counter
nicely organizes our counts and domains, making the final output generation step simple.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a simple example with the following input array: ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
. We need to output the counts for all domains and subdomains.
Follow these steps:
- Initialize a Counter instance to keep track of domain counts.
- Start with the first domain string
"900 google.mail.com"
. Split the string at the space to get the count900
and the domaingoogle.mail.com
. - Begin parsing the domain:
- Since
google.mail.com
has no spaces, add"google.mail.com": 900
to the Counter. - Encounter the first dot, yielding the subdomain
mail.com
. Add"mail.com": 900
to the Counter. - Encounter the second dot, yielding the subdomain
com
. Add"com": 900
to the Counter.
- Since
- Move to the second domain string
"50 yahoo.com"
:- Split and obtain count
50
and domainyahoo.com
. - Add
"yahoo.com": 50
to the Counter. - Add
"com": 50
to the existing count ofcom
in the Counter, nowcom
has a total of950
.
- Split and obtain count
- Process
"1 intel.mail.com"
similarly, resulting in:"intel.mail.com": 1
added to the Counter."mail.com": 1
is added to existingmail.com
count, nowmail.com
has901
."com": 1
is added to existingcom
count, nowcom
has951
.
- Lastly,
"5 wiki.org"
:- Split into count
5
and domainwiki.org
. - Add
"wiki.org": 5
to the Counter. - Add
"org": 5
to the Counter, as it's the first time we encounterorg
.
- Split into count
The Counter now has the correct total counts for all domains and subdomains. Here's what the final Counter looks like:
{"google.mail.com": 900, "mail.com": 901, "com": 951, "yahoo.com": 50, "wiki.org": 5, "org": 5, "intel.mail.com": 1}
The next step is to format this output according to the rep d1.d2.d3
format. Iterate through the Counter and construct the result strings:
- "900 google.mail.com"
- "901 mail.com"
- "951 com"
- "50 yahoo.com"
- "5 wiki.org"
- "5 org"
- "1 intel.mail.com"
The final output is the array: ["900 google.mail.com", "901 mail.com", "951 com", "50 yahoo.com", "5 wiki.org", "5 org", "1 intel.mail.com"]
. This array represents the visit counts for each domain and subdomain.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def subdomainVisits(self, cpdomains: List[str]) -> List[str]:
5 # Create a Counter object to keep track of the domain visit counts
6 domain_visits_count = Counter()
7
8 # Iterate through the list of domain visit counts
9 for cpdomain in cpdomains:
10 # Extract the visit count and the domain from the current string
11 visit_count = int(cpdomain.split(' ')[0])
12 domain = cpdomain.split(' ')[1]
13
14 # Split the domain on '.' to find all subdomains
15 subdomains = domain.split('.')
16
17 # For each subdomain, build the subdomain string and update the counter
18 for i in range(len(subdomains)):
19 # Join the subdomain parts to form the subdomain to count visits
20 subdomain = '.'.join(subdomains[i:])
21 domain_visits_count[subdomain] += visit_count
22
23 # Format the results as specified in the problem statement
24 result = [f"{count} {domain}" for domain, count in domain_visits_count.items()]
25
26 # Return the formatted list of domain visit counts
27 return result
28
1class Solution {
2
3 public List<String> subdomainVisits(String[] cpdomains) {
4 // Create a map to store subdomain visit counts.
5 Map<String, Integer> visitCounts = new HashMap<>();
6
7 // Iterate over each combined domain visit count entry.
8 for (String domainInfo : cpdomains) {
9 // Find the index of the first space to separate number of visits from the domain.
10 int spaceIndex = domainInfo.indexOf(" ");
11 // Parse the number of visits using the substring before the space.
12 int visitCount = Integer.parseInt(domainInfo.substring(0, spaceIndex));
13
14 // Start iterating characters at the space to check for subdomains.
15 for (int i = spaceIndex; i < domainInfo.length(); ++i) {
16 // Check for boundaries of subdomains.
17 if (domainInfo.charAt(i) == ' ' || domainInfo.charAt(i) == '.') {
18 // Extract the subdomain using substring.
19 String subdomain = domainInfo.substring(i + 1);
20 // Update the visit count for the current subdomain.
21 visitCounts.put(subdomain, visitCounts.getOrDefault(subdomain, 0) + visitCount);
22 }
23 }
24 }
25
26 // Prepare the answer list containing formatted results.
27 List<String> results = new ArrayList<>();
28 // Iterate through the entry set of the visitCounts map.
29 for (Map.Entry<String, Integer> entry : visitCounts.entrySet()) {
30 // Format the entry as "visit_count domain" and add it to the results list.
31 results.add(entry.getValue() + " " + entry.getKey());
32 }
33
34 return results;
35 }
36}
37
1#include <vector>
2#include <string>
3#include <unordered_map>
4
5class Solution {
6public:
7 // Function to process a vector of cpdomains to count the visits and split them by subdomains
8 vector<string> subdomainVisits(vector<string>& cpdomains) {
9 // Create a hashmap to store the counts of domain visits
10 unordered_map<string, int> domainVisitCounts;
11
12 // Loop through each combined domain visit count and domain string
13 for (const auto& cpdomain : cpdomains) {
14 // Find the space character to separate the count from the domain
15 size_t spaceIndex = cpdomain.find(' ');
16
17 // Convert the count portion of the string to an integer
18 int visitCount = stoi(cpdomain.substr(0, spaceIndex));
19
20 // Loop through the domain portion of the string
21 for (size_t i = spaceIndex; i < cpdomain.size(); ++i) {
22 // Check for a space or a dot character to identify subdomains
23 if (cpdomain[i] == ' ' || cpdomain[i] == '.') {
24 // Add the visit count to the corresponding subdomain in the hashmap
25 domainVisitCounts[cpdomain.substr(i + 1)] += visitCount;
26 }
27 }
28 }
29
30 // Prepare the answer vector to hold the results in the required format
31 vector<string> results;
32
33 // Traverse the hashmap and format the output accordingly
34 for (const auto& [domain, count] : domainVisitCounts) {
35 // Concatenate the count and the domain with a space separator
36 results.push_back(to_string(count) + " " + domain);
37 }
38
39 // Return the formatted list of visit counts per subdomain
40 return results;
41 }
42};
43
1// Import statements are not required in TypeScript for basic types like Array and String
2
3// A function to process an array of cpdomains to count the visits and split them by subdomains
4function subdomainVisits(cpdomains: string[]): string[] {
5 // Create a map to store the counts of domain visits
6 const domainVisitCounts: Record<string, number> = {};
7
8 // Loop through each combined domain visit count and domain string
9 cpdomains.forEach(cpdomain => {
10 // Find the space character to separate the count from the domain
11 const spaceIndex = cpdomain.indexOf(' ');
12
13 // Convert the count portion of the string to an integer
14 const visitCount = parseInt(cpdomain.substring(0, spaceIndex), 10);
15
16 // Get the full domain string from the cpdomain
17 // Need to add 1 to the space index to skip the space character
18 let domain = cpdomain.substring(spaceIndex + 1);
19
20 // Process all subdomains, including the full domain itself
21 while (domain) {
22 // Add the visit count to the corresponding subdomain in the map
23 domainVisitCounts[domain] = (domainVisitCounts[domain] || 0) + visitCount;
24
25 // Find the next dot character to move to the higher-level domain
26 const dotIndex = domain.indexOf('.');
27
28 // If no dot is found, or it is the last position, we stop
29 if (dotIndex < 0) break;
30
31 // Update the domain to be the string after the dot
32 domain = domain.substring(dotIndex + 1);
33 }
34 });
35
36 // Prepare the answer array to hold the results in the required format
37 const results: string[] = [];
38
39 // Traverse the map and format the output accordingly
40 for (const domain in domainVisitCounts) {
41 const count = domainVisitCounts[domain];
42 // Concatenate the count and the domain with a space separator
43 results.push(`${count} ${domain}`);
44 }
45
46 // Return the formatted list of visit counts per subdomain
47 return results;
48}
49
Time and Space Complexity
Time Complexity
The time complexity of the code is determined by several factors:
- Iterating through each domain in
cpdomains
, which gives usO(n)
wheren
is the number of domains. - Inside the loop, the
index()
method is called to find the first space, which operates inO(m)
, wherem
is the length of the strings
. - The inner loop enumerates over each character in the domain string
s
which isO(m)
in the worst case. - The slicing of strings
s[i + 1 :]
is done in each loop iteration, which, in Python, is alsoO(m)
.
The second to the fourth steps occur within the loop of the first step, so the total time complexity is O(n * m)
since we have to assume that each domain could be of variable length, and we are bound by the length of both the domains and the list itself.
Space Complexity
For space complexity, we consider the memory allocations:
- The Counter object
cnt
storage grows with the number of distinct subdomains found, in the worst case this may contain all the subdomains which would beO(d)
whered
is the total number of distinct subdomains. - The created list for the final output will have as many elements as there are entries in
cnt
which is alsoO(d)
.
The space complexity, therefore, is O(d)
, which is determined by the number of unique subdomains.
Learn more about how to find time and space complexity quickly using problem constraints.
Which data structure is used in a depth first search?
Recommended Readings
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.