1519. Number of Nodes in the Sub-Tree With the Same Label

Problem Description

In this problem, we are given a tree consisting of n nodes. Each node has a unique number from 0 to n-1 and a label represented by a lowercase character from the given string labels. The tree has n - 1 edges provided in the edges array, where each element is a pair of nodes indicating there is an edge between those nodes. The goal is to count how many nodes in each node's subtree (including the node itself) have the same label as that particular node and then return the counts in an array.


To arrive at the solution for this problem, we should consider a depth-first search (DFS) strategy since we want to deal with subtrees and descendants in a tree structure. The DFS will start from the root node (node 0), explore as far as possible along each branch before backtracking, and perform the necessary counting operations.

Here's the intuition behind the provided solution:

  1. We use DFS to traverse the entire tree from the root node, exploring all its subtrees.
  2. We maintain a counter, cnt, using a Counter object to track the occurrence of each label as we move down the tree.
  3. While performing DFS, for each node, we decrement the count for its label before the DFS goes deeper into its children. This is done to keep track of how many times the label occurs in the current node's subtree.
  4. After visiting all of the children nodes, we increment the count back, thus accounting for the occurrence of the node's label in its own subtree.
  5. The counter differences before and after exploring the children will give us the number of children with the same label.
  6. Create a list g to represent the adjacency list for the graph. For each edge, we add nodes to each other's list to create this undirected graph representation.
  7. Create an array ans of size n to store the counts for each node.
  8. Call the dfs function with the root node and an invalid parent index -1 to kick-start the DFS.
  9. The updated ans array will be returned, which contains the count of nodes in the subtree of every node with the same label as that node.

Learn more about Tree, Depth-First Search and Breadth-First Search patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the worst case running time for finding an element in a binary search tree(not necessarily balanced) of size n?

Solution Approach

The solution uses recursion and graph traversal techniques to address the problem effectively. Here's a breakdown of the solution implementation process with the algorithms, data structures, and patterns employed:

  1. Depth-First Search (DFS): A classic DFS recursion is the crux of traversing the tree. The function dfs takes two parameters, i representing the current node, and fa representing the parent of the current node. The recursion helps us reach the leaves, count the nodes, and backtrack to the parent nodes while updating the counts.

  2. Counter Object: We use a Counter object from the collections module, named cnt. It keeps track of the count of labels encountered during the DFS traversal. It plays a pivotal role in calculating the number of nodes bearing the same label within a subtree.

  3. Adjacency List Representation: A dictionary named g is used to store the adjacency list. Here, we map each node to a list containing its neighboring nodes. This data structure allows us to represent the undirected graph for the tree efficiently.

  4. Recursive Logic: In the dfs function:

    • First, we subtract cnt[labels[i]] from ans[i] to record the initial count of the label for node i.
    • We increment cnt[labels[i]] as we've encountered node i's label.
    • We loop over each child j in the adjacency list for node i and perform a DFS call if j is not the parent (to avoid revisiting the parent).
    • After visiting all children, we add back the cnt[labels[i]] count to ans[i], which now contains the correct count for the subtree at i.
  5. Result: The ans array, which initially is set to zeros of length n, is updated with the count of nodes that have the same label as node i for each node. This array is returned as the result.

  6. Initialization and Start Point: Before starting the DFS traversal, we need to initialize the adjacency list g and the result array ans. We populate g based on the edges provided. The dfs function is called with the root node (0) and -1 as the parent since the root has no parent.

Here is the code block relevant to the above explanation, divided into initialization and recursive DFS parts for clarity:

1# Initialization
2g = defaultdict(list)
3for a, b in edges:
4    g[a].append(b)
5    g[b].append(a)
6cnt = Counter()
7ans = [0] * n
9# Recursive DFS Function
10def dfs(i, fa):
11    ans[i] -= cnt[labels[i]]
12    cnt[labels[i]] += 1
13    for j in g[i]:
14        if j != fa:
15            dfs(j, i)
16    ans[i] += cnt[labels[i]]
18# Start DFS from root node 0
19dfs(0, -1)

By maintaining a counter that is updated at each node of the DFS, we ensure that when we calculate the count for node i, we have a complete picture of the number of times its label appears within its entire subtree.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the following is a min heap?

Example Walkthrough

Consider a small example where we have a tree with n = 5 nodes. The edges are given by edges = [[0, 1], [0, 2], [1, 3], [1, 4]], and the labels string is "ababa". Using these, we need to count the number of nodes in each node's subtree with the same label as that node.

First, the tree's structure can be visualized as shown below:

1   0(a)
2   / \
3 1(a) 2(b)
4 / \
53(b) 4(a)

According to the labels string, the labels of nodes in the order from 0 to 4 are a, a, b, b, a.

We start with our solution approach:

  1. Create the adjacency list g for graph representation:
1g[0] = [1, 2]
2g[1] = [0, 3, 4]
3g[2] = [0]
4g[3] = [1]
5g[4] = [1]
  1. Initialize a Counter object, cnt, and an array ans of size n:
1cnt = Counter()
2ans = [0, 0, 0, 0, 0]
  1. We start DFS from the root node 0 with -1 as the parent:

During the first call dfs(0, -1):

  • ans[0] is decremented by cnt['a'] (which is 0 at this moment), so ans[0] remains 0.

  • cnt['a'] is incremented to 1.

  • We then call DFS on its children (nodes 1 and 2):

    For node 1:

    • Recursive call dfs(1, 0):
      • ans[1] is decremented by cnt['a'] (which is 1), so ans[1] = -1.
      • cnt['a'] is incremented to 2.
      • dfs(3, 1) is called:
        • ans[3] is decremented by cnt['b'] (0), remains 0.
        • cnt['b'] is incremented to 1.
        • DFS is called on its children (none), so no further action.
        • ans[3] is incremented by cnt['b'] (1), becomes 1.
      • dfs(4, 1) is called:
        • ans[4] is decremented by cnt['a'] (2), becomes -2.
        • cnt['a'] is incremented to 3.
        • DFS is called on its children (none), so no further action.
        • ans[4] is incremented by cnt['a'] (3), becomes 1.
      • After its children are done, ans[1] is incremented by cnt['a'] (3), becomes 2.

    For node 2:

    • Recursive call dfs(2, 0):
      • ans[2] is decremented by cnt['b'] (1), becomes -1.
      • cnt['b'] is incremented to 2.
      • DFS is called on its children (none), so no further action.
      • ans[2] is incremented by cnt['b'] (2), becomes 1.
  • After visiting all children of node 0, ans[0] is incremented by cnt['a'] (3), becomes 3.

So, after the DFS completes, we have the ans array updated with the count of the nodes that have the same label as each node in their subtrees:

1ans = [3, 2, 1, 1, 1]
Not Sure What to Study? Take the 2-min Quiz:

Given an array of 1,000,000 integers that is almost sorted, except for 2 pairs of integers. Which algorithm is fastest for sorting the array?

Python Solution

1from collections import defaultdict, Counter
2from typing import List
4class Solution:
5    def countSubTrees(self, n: int, edges: List[List[int]], labels: str) -> List[int]:
6        # Recursive function to traverse the graph and update counts
7        def dfs(node, parent):
8            # Decrement the count of the label for this node
9            label_counts[labels[node]] -= 1
10            # Traverse adjacent nodes (children)
11            for adjacent in graph[node]:
12                if adjacent != parent:
13                    dfs(adjacent, node)
14            # Increment the count of the label for this node to include itself
15            label_counts[labels[node]] += 1
16            # Update the result for this node 
17            # (Total count of the label minus the count before visiting the subtree)
18            results[node] = label_counts[labels[node]]
20        # Create a graph from the edges (adjacency list)
21        graph = defaultdict(list)
22        for a, b in edges:
23            graph[a].append(b)
24            graph[b].append(a)
26        # Counter to keep track of the label frequencies during DFS
27        label_counts = Counter()
28        # List to store the result for each node
29        results = [0] * n
30        # Call DFS starting from the root node (0) with no parent (-1)
31        dfs(0, -1)
33        # After DFS, results list contains the counts for nodes' labels as required
34        return results

Java Solution

1class Solution {
2    // Graph represented as a list of adjacency lists.
3    private List<Integer>[] graph;
4    // The labels for each node are stored in a string.
5    private String labels;
6    // Array to store the final answer for each node.
7    private int[] answer;
8    // Count array to store the frequency of each character.
9    private int[] count;
11    // Public method that initializes the class variables and starts the DFS traversal.
12    public int[] countSubTrees(int n, int[][] edges, String labels) {
13        // Initialize graph with empty lists for each node.
14        graph = new List[n];
15        Arrays.setAll(graph, x -> new ArrayList<>());
17        // Populating the adjacency list for the undirected graph.
18        for (int[] edge : edges) {
19            int from = edge[0], to = edge[1];
20            graph[from].add(to);
21            graph[to].add(from);
22        }
24        // Set the labels and prepare the answer array.
25        this.labels = labels;
26        answer = new int[n];
27        count = new int[26];
29        // Start DFS traversal from the first node.
30        dfs(0, -1);
31        return answer;
32    }
34    // Private helper method to perform DFS traversal.
35    private void dfs(int node, int parent) {
36        // Get the index of the label character for the current node.
37        int labelIndex = labels.charAt(node) - 'a';
39        // Decrement count of the current label at this node before DFS, as this count will include
40        // all occurrences in the subtree rooted at this node.
41        answer[node] -= count[labelIndex];
42        // Increment the count for this character as this node is visited.
43        count[labelIndex]++;
45        // Visit all the connected nodes that are not the parent.
46        for (int neighbor : graph[node]) {
47            if (neighbor != parent) {
48                dfs(neighbor, node);
49            }
50        }
52        // After visiting all children, increment the answer for this node.
53        // Now the count includes all occurrences in subtree plus the current node.
54        answer[node] += count[labelIndex];
55    }

C++ Solution

1class Solution {
3    // Method to count the number of subtrees with the same label for each node
4    vector<int> countSubTrees(int n, vector<vector<int>>& edges, string labels) {
5        // Create an adjacency list representation of the graph
6        vector<vector<int>> graph(n);
7        for (auto& edge : edges) {
8            int from = edge[0], to = edge[1];
9            graph[from].push_back(to);
10            graph[to].push_back(from);
11        }
13        // Vector to store the answer
14        vector<int> answer(n);
16        // Array to count occurrences of each character 'a' to 'z'
17        int charCount[26] = {0};
19        // Define the DFS function
20        function<void(int, int)> depthFirstSearch = [&](int node, int parent) {
21            int charIndex = labels[node] - 'a'; // Convert char to index (0 - 25)
23            // Subtract current count from the answer for this node, as it will be 'recounted' during backtracking
24            answer[node] -= charCount[charIndex];
25            charCount[charIndex]++; // Increment the count of the current character
27            // Recurse for all adjacent nodes
28            for (int& adjacent : graph[node]) {
29                if (adjacent != parent) { // Avoid revisiting the parent node
30                    depthFirstSearch(adjacent, node);
31                }
32            }
34            // Add the total count discovered during the recursive calls to the answer for this node
35            answer[node] += charCount[charIndex];
36        };
38        // Start DFS from node 0 with no parent
39        depthFirstSearch(0, -1);
40        return answer;
41    }

Typescript Solution

1// Function to count the number of occurrences of labels in the subtrees of each node
2function countSubTrees(n: number, edges: number[][], labels: string): number[] {
3    // DFS function to traverse the nodes
4    const depthFirstSearch = (currentNode: number, parent: number) => {
5        // Determine the index (0-25) corresponding to the label's character
6        const labelIndex = labels.charCodeAt(currentNode) - 97;
7        // Subtract the current count before the DFS of the children to later find the delta
8        subtreeCounts[currentNode] -= labelCounts[labelIndex];
9        // Increase the character count
10        labelCounts[labelIndex]++;
11        // Traverse all the connected nodes (children)
12        for (const nextNode of adjacencyList[currentNode]) {
13            if (nextNode !== parent) {  // Skip the parent node to prevent going backwards
14                depthFirstSearch(nextNode, currentNode);
15            }
16        }
17        // After visiting children, add the updated count to the result
18        subtreeCounts[currentNode] += labelCounts[labelIndex];
19    };
21    // Array to hold the counts of each label's occurrences in the subtree rooted at each node
22    const subtreeCounts: number[] = new Array(n).fill(0);
23    // Array to hold the counts of each label globally across the DFS traversal
24    const labelCounts: number[] = new Array(26).fill(0);
25    // Adjacency list to represent the graph, with an array for each node
26    const adjacencyList: number[][] = Array.from({ length: n }, () => []);
28    // Construct the adjacency list from the edges
29    for (const [nodeA, nodeB] of edges) {
30        adjacencyList[nodeA].push(nodeB);
31        adjacencyList[nodeB].push(nodeA);
32    }
34    // Start DFS traversal from the node with label '0', with no parent (-1)
35    depthFirstSearch(0, -1);
37    // Return the array containing counts of each label in the subtree of each node
38    return subtreeCounts;
Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following problems can be solved with backtracking (select multiple)

Time and Space Complexity

Time Complexity

The time complexity of the code is O(n) where n is the number of nodes in the tree. Each node is visited exactly once due to the depth-first search (DFS). Inside each DFS call, operations are constant time except for the iteration over the children which, across all calls, account for n - 1 edges. Since every edge connects two nodes and each edge is visited exactly twice (once from each node it connects), the time complexity due to edges is O(n - 1) which simplifies to O(n). Thus, the overall time complexity is governed by the number of node visits and edge traversals combined, which is linear in the number of nodes.

Space Complexity

The space complexity is O(n). The dictionary g storing the graph can contain up to 2(n - 1) entries because it is an undirected graph and each edge is stored twice. The cnt Counter and ans array each require O(n) space. The space required for the dfs call stack can also go up to O(n) in the case of a skewed tree where the recursive calls are not returned until the end of the longest branch. Therefore, the total space complexity combines the space for graph representation, counters, answer storage, and the recursion stack, which all scale linearly with the number of nodes in the tree.

Learn more about how to find time and space complexity quickly.

Recommended Readings

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.

TA 👨‍🏫