2003. Smallest Missing Genetic Value in Each Subtree


Problem Description

In this problem, we are given a family tree with n nodes, each node has a unique genetic value. Our task is to find the smallest genetic value that is not present in the subtree of each node. The tree is uniquely represented by an array parents where parents[i] gives us the parent of the node i. The root of the tree is the node with index 0 (thus parents[0] == -1).

Every node has a distinct genetic value, given by the array nums. Genetic values are within the range of [1, 10^5]. We are supposed to return an array ans where ans[i] is the smallest missing genetic value in the subtree rooted at node i. A subtree rooted at a node x consists of node x and all its descendants.

Intuition

The challenge here is that we could traverse the subtree of each node and collect all genetic values present, but that would be highly inefficient for a large number of nodes. So the key insight is to leverage the unique genetic value property - especially the node with the genetic value of 1.

We begin by noting that if a subtree doesn't contain the genetic value of 1, then 1 is the smallest missing value for all nodes in that subtree. This allows us to initialize the answer array ans with ones.

We then locate the node idx that has the genetic value of 1. From this node, we move upwards to the root, and for each of these nodes, we find the smallest missing genetic value in their subtrees. We use depth-first search (DFS) starting from idx to mark all the genetic values that appear in its subtree.

If idx is not found, which means the genetic value of 1 is missing in the entire tree, then all nodes have an answer of 1, and there's no need to proceed further.

If we do find idx, we have to dynamically update the smallest missing genetic values while traveling upwards towards the root. For that, we use a flag array vis to keep track of the visited nodes (to avoid revisiting during DFS), and an array has to mark which genetic values are present.

We continue the DFS process to mark present genetic values, and after each subtree traversal, we update the corresponding node's smallest missing genetic value in the answer array ans. Once a node is processed, we move to its parent and repeat until we reach the root.

By maintaining an iterative index i (starting from 2, the first non-present value after 1) and incrementing it whenever we find the corresponding genetic value in has, we efficiently find the next smallest missing genetic value.

Finally, we return the array ans, containing the smallest missing genetic values for all nodes' subtrees.

Learn more about Tree, Depth-First Search, Union Find and Dynamic Programming patterns.

Solution Approach

The solution utilizes a Depth-First Search (DFS) algorithm to traverse the tree along with additional data structures to optimize the search for the smallest missing genetic values.

The code initializes a graph g to represent the family tree. Each index of the graph corresponds to a node in the tree, and it contains a list of child nodes. This graph is used during DFS to iterate over the children of a node.

A crucial part of the algorithm is identifying the node idx with the genetic value of 1 because this serves as a starting point for our optimized search. From there, the solution implements the following steps iteratively:

  1. A DFS traversal begins at the current idx, marking genetic values as present in the subtree rooted at this node. This is done using a helper function dfs(i), where i is the current node. Any genetic value nums[i] that is found during traversal and is less than the length of the has array is marked as True in has[nums[i]] to indicate its presence.

  2. The vis array is used to mark nodes that have been visited to prevent revisiting nodes, which could lead to an infinite loop or unnecessary computations.

  3. After the DFS traversal, the code searches for the smallest missing value starting from i = 2, since the genetic value of 1 is already accounted for by our starting index idx. It looks for the first value not marked as present in the has array.

  4. The smallest missing value found is assigned to ans[idx]. This corresponds to the smallest missing genetic value in the subtree rooted at idx.

  5. The current idx is then updated to parents[idx], moving up one level in the tree to the parent of the current node.

  6. This process repeats until the root node is reached (idx == -1), and the smallest missing genetic values for the nodes along the path from the original node with the genetic value of 1 to the root node are updated in the ans array.

  7. The nodes that are not on the path from the node with genetic value of 1 to the root have an answer of 1, as their subtrees do not contain the genetic value of 1.

This solution approach is efficient because it avoids checking the entire subtree for each node. Instead, it focuses on the path from the node with genetic value 1 to the root and leverages the fact that the subtrees of nodes outside this path will not influence the smallest missing genetic values along this path. Additionally, the has array dynamically tracks which values are found, allowing for an efficient search for missing values.

Finally, the ans array is returned, providing the smallest missing genetic values for each node's subtree.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

The three-steps of Depth First Search are:

  1. Identify states;
  2. Draw the state-space tree;
  3. DFS on the state-space tree.

Example Walkthrough

Let's illustrate the solution approach with a small example.

Suppose we have the following family tree with n = 5 nodes and the parents array: [-1, 0, 1, 1, 2]. This indicates that:

  1. Node 0 is the root as parents[0] == -1.
  2. Node 1 is a child of node 0.
  3. Node 2 is a child of node 1.
  4. Node 3 is also a child of node 1.
  5. Node 4 is a child of node 2.

The nums array gives us the genetic values: [3, 1, 5, 4, 2]:

  1. Node 0 has a genetic value of 3.
  2. Node 1 has a genetic value of 1. This is our starting point since it's the unique node with value 1.
  3. Node 2 has a genetic value of 5.
  4. Node 3 has a genetic value of 4.
  5. Node 4 has a genetic value of 2.

First, we initialize our answer array ans with ones, assuming that the smallest missing genetic value for each node's subtree is 1: ans = [1, 1, 1, 1, 1].

Since node 1 has the genetic value 1, we begin a DFS traversal from this node.

While we traverse the subtree rooted at node 1, we mark in the has array which genetic values we encounter. Starting from node 1, we visit nodes 2 and 4. We update the has array with the genetic values found: has = [False, True, True, False, True] (the indexing is 1-based for genetic values).

After visiting the subtree of node 1, we look for the smallest missing genetic value in has. Since has[3] is False, the smallest missing value for this subtree is 3. We then update ans[1] to 3.

Now we move to the parent of node 1, which is the root node 0. We continue our DFS, but since we already visited node 1's subtree, we now visit node 3. After visiting, we update our has array to include the genetic value of 4: has = [False, True, True, True, True].

For node 0, we again search for the smallest missing genetic value. has[6] is our next check, but since our genetic values only range from 1 to n, the smallest missing value is 6. So we update ans[0] to 6.

No other nodes are parents, so we've completed our traversal. The final answer array is ans = [6, 3, 1, 1, 1], indicating the smallest missing genetic values for the subtrees of each node.

This approach is efficient for larger trees because we only traverse each node once and update our smallest missing values dynamically as we move through the tree.

Solution Implementation

1from typing import List
2
3class Solution:
4    def smallestMissingValueSubtree(self, parents: List[int], nums: List[int]) -> List[int]:
5        # Depth-First Search function to visit nodes
6        def dfs(current_node: int):
7            # If the current node is already visited, don't revisit
8            if visited[current_node]:
9                return
10            # Mark the current node as visited
11            visited[current_node] = True
12            # If the value at current node is within range, update has_value array
13            if nums[current_node] < len(has_value):
14                has_value[nums[current_node]] = True
15            # Recursively visit all the children of the current node
16            for child in graph[current_node]:
17                dfs(child)
18
19        # Number of nodes
20        num_nodes = len(nums)
21        # Initialize the answer list with 1s
22        answer = [1] * num_nodes
23        # Create an adjacency list to represent the tree
24        graph = [[] for _ in range(num_nodes)]
25        # Initialize the index of node having value 1
26        node_with_one = -1
27        # Construct the graph and find the index of node with value 1
28        for i, parent in enumerate(parents):
29            # Exclude the root node which has no parent
30            if i:
31                graph[parent].append(i)
32            # If the node's value is 1, update node_with_one
33            if nums[i] == 1:
34                node_with_one = i
35        # If no node contains the value 1, return the answer array as it is
36        if node_with_one == -1:
37            return answer
38      
39        # Initialize visited array to keep track of visited nodes
40        visited = [False] * num_nodes
41        # Array to track which values are present in the subtree
42        has_value = [False] * (num_nodes + 2)
43        # Missing value to start checking from (initially 2 since 1 is present)
44        missing_value = 2
45      
46        # Process ancestors of the node with value 1 and their subtrees
47        while node_with_one != -1:
48            # Perform DFS starting from the current node_with_one to update has_value array
49            dfs(node_with_one)
50            # Find the smallest missing value in the subtree
51            while has_value[missing_value]:
52                missing_value += 1
53            # Update the answer for the current node
54            answer[node_with_one] = missing_value
55            # Move up to the parent of the current node
56            node_with_one = parents[node_with_one]
57      
58        return answer
59
1import java.util.ArrayList;
2import java.util.Arrays;
3import java.util.List;
4
5class Solution {
6    private List<Integer>[] adjacencyList; // Graph representation as an adjacency list
7    private boolean[] visited; // Track visited nodes during DFS
8    private boolean[] hasValue; // Track if a subtree has a particular value
9    private int[] nodeValues; // The values assigned to each node
10
11    // Method to find the smallest missing value for each subtree
12    public int[] smallestMissingValueSubtree(int[] parents, int[] nums) {
13        int n = nums.length;
14        nodeValues = nums;
15        adjacencyList = new List[n];
16        visited = new boolean[n];
17        hasValue = new boolean[n + 2]; // n+1 possible node values plus an extra for easier indexing
18
19        Arrays.setAll(adjacencyList, i -> new ArrayList<>());
20        int oneValueIndex = -1; // Index of the node with value 1
21
22        // Build the tree structure from parent pointers
23        for (int i = 0; i < n; ++i) {
24            if (i > 0) {
25                adjacencyList[parents[i]].add(i); // Link child to parent in the list
26            }
27            if (nums[i] == 1) {
28                oneValueIndex = i; // Find the index of the node that contains the value 1
29            }
30        }
31
32        int[] result = new int[n]; // The array to store smallest missing values
33        Arrays.fill(result, 1); // Default value if '1' is not in any subtree
34
35        if (oneValueIndex == -1) {
36            return result; // If there's no node with value '1', return all ones
37        }
38
39        // Perform DFS and update result while moving up the tree
40        for (int missingValue = 2; oneValueIndex != -1; oneValueIndex = parents[oneValueIndex]) {
41            performDFS(oneValueIndex);
42
43            // Find the smallest missing value in the current subtree
44            while (hasValue[missingValue]) {
45                ++missingValue;
46            }
47            result[oneValueIndex] = missingValue; // Assign smallest missing value to the result array
48        }
49
50        return result;
51    }
52
53    // Perform DFS starting from node 'i' to mark values present in the subtree
54    private void performDFS(int i) {
55        if (visited[i]) {
56            return; // If the node is already visited, we do nothing
57        }
58
59        visited[i] = true; // Mark the node as visited
60
61        // Mark the value present in the current node if in range
62        if (nodeValues[i] < hasValue.length) {
63            hasValue[nodeValues[i]] = true;
64        }
65
66        // Recursively visit all connected nodes
67        for (int child : adjacencyList[i]) {
68            performDFS(child);
69        }
70    }
71}
72
1#include <vector>
2#include <cstring>
3#include <functional>
4using namespace std;
5
6class Solution {
7public:
8    vector<int> smallestMissingValueSubtree(vector<int>& parents, vector<int>& nums) {
9        int n = nums.size(); // Number of nodes in the tree
10        vector<int> adjacencyList[n]; // Adjacency list for representing the tree structure
11        bool visited[n]; // Track visited nodes during DFS
12        bool hasValue[n + 2]; // Track if a value exists in the subtree rooted at the node (indexed by value)
13        memset(visited, false, sizeof(visited)); // Initialize visited array
14        memset(hasValue, false, sizeof(hasValue)); // Initialize hasValue array
15      
16        int indexWithOne = -1; // To store the index of the node where the value is 1
17        for (int i = 0; i < n; ++i) {
18            if (i != 0) { // Skip the root (because the root has no parent)
19                adjacencyList[parents[i]].push_back(i); // Construct the adjacency list for the tree
20            }
21            if (nums[i] == 1) {
22                indexWithOne = i; // Find the node with value 1 and save its index
23            }
24        }
25
26        vector<int> answer(n, 1); // Initialize answer array with 1s
27
28        if (indexWithOne == -1) {
29            // If there is no node with value 1, all answers are 1
30            return answer;
31        }
32
33        // Define a Depth-First Search (DFS) lambda function
34        function<void(int)> dfs = [&](int node) {
35            if (visited[node]) {
36                return; // If node is already visited, do nothing
37            }
38            visited[node] = true; // Mark this node as visited
39            if (nums[node] < n + 2) {
40                hasValue[nums[node]] = true; // Mark the value that exists in the subtree
41            }
42            for (int child : adjacencyList[node]) {
43                dfs(child); // Run DFS recursively for each child
44            }
45        };
46
47        // Find the smallest missing values for the nodes from the node with value 1 up to the root
48        for (int missingValue = 2; indexWithOne != -1; indexWithOne = parents[indexWithOne]) {
49            dfs(indexWithOne); // Run DFS to populate hasValue array
50            while (hasValue[missingValue]) { // Until we find a missing value...
51                ++missingValue; // ...increment the missing value
52            }
53            answer[indexWithOne] = missingValue; // Set the answer for this node
54        }
55
56        return answer; // Return the final answer vector
57    }
58};
59
1function smallestMissingValueSubtree(parents: number[], nums: number[]): number[] {
2    const nodeCount = nums.length;
3    const graph: number[][] = Array.from({ length: nodeCount }, () => []);
4    const visited: boolean[] = Array(nodeCount).fill(false);
5    const hasValue: boolean[] = Array(nodeCount + 2).fill(false);
6    const result: number[] = Array(nodeCount).fill(1);
7    let oneValueIndex = -1;
8
9    // Populate the graph representation from the parent array and find the index with value 1
10    for (let i = 0; i < nodeCount; ++i) {
11        if (i) {
12            graph[parents[i]].push(i);
13        }
14        if (nums[i] === 1) {
15            oneValueIndex = i;
16        }
17    }
18
19    // If no node with value 1 is found, return the default result array
20    if (oneValueIndex === -1) {
21        return result;
22    }
23
24    // Depth-first search to update visited and hasValue arrays
25    const depthFirstSearch = (nodeIndex: number): void => {
26        if (visited[nodeIndex]) {
27            return;
28        }
29        visited[nodeIndex] = true;
30        if (nums[nodeIndex] < hasValue.length) {
31            hasValue[nums[nodeIndex]] = true;
32        }
33        for (const childIndex of graph[nodeIndex]) {
34            depthFirstSearch(childIndex);
35        }
36    };
37
38    // Search for the smallest missing value starting from the node with value 1
39    for (let missingValue = 2; oneValueIndex !== -1; oneValueIndex = parents[oneValueIndex]) {
40        depthFirstSearch(oneValueIndex);
41        while (hasValue[missingValue]) {
42            ++missingValue;
43        }
44        result[oneValueIndex] = missingValue;
45    }
46
47    return result;
48}
49

Time and Space Complexity

The time complexity of the code is O(n) where n is the number of nodes in the tree. This is because every node is accessed once during the DFS (Depth First Search) traversal. The DFS is called once for every node in the path from the node with the value '1' up to the root node. This results in a maximum of n calls to DFS, each processing each node only once.

The space complexity of the code is O(n) as well due to several factors:

  1. The recursion stack for the DFS, which in the worst case could hold n function calls if the tree degenerates into a linked list.

  2. The vis array which is used to keep track of visited nodes in the DFS. The size of this array is n.

  3. The has array which is used to keep track of the smallest missing values. Its size is n+2, to cover all nodes values plus an extra space for detecting the missing value that is larger than the number of nodes.

  4. The g array (adjacency list) which is constructed at the start. Each node can have at most n-1 children, but since it's a tree, there will be a total of n-1 edges spread out across the g list, still keeping the total space for g within O(n).

Summing up all these components still results in a total space complexity of O(n).

Learn more about how to find time and space complexity quickly using problem constraints.


Fast Track Your Learning with Our Quick Skills Quiz:

Given an array of 1,000,000 integers that is almost sorted, except for 2 pairs of integers. Which algorithm is fastest for sorting the array?


Recommended Readings


Got a question? Ask the Monster Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


🪄