Facebook Pixel

2003. Smallest Missing Genetic Value in Each Subtree

Problem Description

You are given a family tree represented as a rooted tree with n nodes numbered from 0 to n - 1, where node 0 is the root. The tree structure is defined by an array parents where parents[i] indicates the parent of node i. Since node 0 is the root, parents[0] == -1.

Each node in the tree has a unique genetic value from the range [1, 10^5]. These genetic values are given in an array nums where nums[i] represents the genetic value of node i. All genetic values in nums are distinct.

Your task is to find, for each node i, the smallest positive genetic value that is missing from the subtree rooted at that node. A subtree rooted at node x includes node x itself and all of its descendants.

Return an array ans of length n where ans[i] is the smallest missing genetic value from the subtree rooted at node i.

For example, if a subtree contains nodes with genetic values [1, 2, 4, 5], then the smallest missing value would be 3. If a subtree contains genetic values [2, 3, 4], then the smallest missing value would be 1.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: The problem involves a tree structure where nodes are connected through parent-child relationships. A tree is a special type of graph (specifically, a connected acyclic graph).

Is it a tree?

  • Yes: The problem explicitly states we have a family tree rooted at node 0, with parent-child relationships defined by the parents array. Each node (except the root) has exactly one parent, which confirms this is a tree structure.

DFS

  • According to the flowchart, when we have a tree structure, the natural algorithm choice is DFS (Depth-First Search).

Conclusion: The flowchart suggests using DFS for this problem. This makes sense because:

  1. We need to explore subtrees (which requires traversing from each node to all its descendants)
  2. For each node, we need to collect information about all genetic values present in its subtree
  3. DFS is ideal for tree traversal where we need to process complete subtrees before returning results to parent nodes
  4. The solution requires us to traverse from specific nodes (like the node with genetic value 1) upward to the root, collecting subtree information along the way

The DFS pattern is particularly effective here as we can:

  • Traverse the subtree rooted at each node to identify which genetic values are present
  • Mark visited nodes to avoid redundant processing
  • Efficiently find the smallest missing genetic value by tracking which values appear in each subtree
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that since all genetic values are unique and we're looking for the smallest missing value, most nodes will have an answer of 1 - unless their subtree contains the genetic value 1.

Think about it: if a subtree doesn't contain genetic value 1, then 1 is the smallest missing value for that subtree. This means we only need to do complex calculations for nodes whose subtrees contain 1.

So where is genetic value 1? Since each value appears exactly once, there's only one node with genetic value 1. Let's call this node idx. Now, only the nodes on the path from idx to the root can possibly have subtrees containing 1 (because idx is in their subtree). All other nodes will have answer 1.

For the nodes on the path from idx to root, we need to find the actual smallest missing value. Here's where it gets clever:

  1. Start at node idx and use DFS to mark all genetic values present in its subtree
  2. Find the smallest positive integer not in this set - this is the answer for node idx
  3. Move to the parent of idx and repeat

The brilliant optimization is that as we move up the tree, we don't need to restart from scratch. Each parent's subtree contains all the genetic values from its children's subtrees plus potentially more from other branches. So we can keep accumulating the genetic values we've seen and continue our search for the smallest missing value from where we left off.

For example, if node idx has smallest missing value 5, then its parent's smallest missing value must be at least 5 (since the parent's subtree contains everything idx's subtree has). This means we can start checking from 5 instead of from 1 again.

This approach is efficient because:

  • We only do expensive DFS traversals for nodes on one path (from the node with value 1 to root)
  • We reuse information as we move up the tree
  • The smallest missing value can only increase as we go up (never decrease)

Learn more about Tree, Depth-First Search, Union Find and Dynamic Programming patterns.

Solution Approach

The implementation follows the intuition by focusing on the path from the node containing genetic value 1 to the root:

1. Initialize Data Structures:

  • Create an answer array ans initialized with all 1s (since most nodes will have answer 1)
  • Build an adjacency list g to represent the tree structure from the parent array
  • Find the node idx that has genetic value 1 (if no such node exists, all answers remain 1)

2. Set Up Tracking Arrays:

  • vis[i]: Boolean array to track visited nodes during DFS to avoid revisiting
  • has[i]: Boolean array where has[i] = True means genetic value i exists in the current subtree being explored
  • Initialize a counter i = 2 to track the next potential missing value

3. DFS Function:

def dfs(i: int):
    if vis[i]:
        return
    vis[i] = True
    if nums[i] < len(has):
        has[nums[i]] = True
    for j in g[i]:
        dfs(j)

This function marks node i as visited, records its genetic value in has, and recursively visits all children.

4. Main Algorithm: Starting from node idx (the node with genetic value 1), traverse up to the root:

while idx != -1:
    dfs(idx)           # Mark all genetic values in subtree rooted at idx
    while has[i]:      # Find smallest missing value starting from i
        i += 1
    ans[idx] = i       # Set the answer for current node
    idx = parents[idx] # Move to parent

Key Optimizations:

  • The variable i is never reset - it only increases. This works because as we move up the tree, the smallest missing value can only stay the same or increase (parent subtrees contain all values from child subtrees)
  • We only visit each node once across all DFS calls due to the vis array
  • The has array accumulates genetic values as we move up, so we don't lose information from previous subtrees

Time Complexity: O(n) where n is the number of nodes, since each node is visited at most once during all DFS traversals combined.

Space Complexity: O(n) for the adjacency list, visited array, and has array.

The algorithm efficiently solves the problem by recognizing that only nodes on one specific path (from the node with value 1 to root) need detailed computation, while all other nodes have a trivial answer of 1.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Example Input:

  • parents = [-1, 0, 0, 1, 1, 2]
  • nums = [5, 4, 1, 3, 6, 2]

This creates the following tree:

       0 (val=5)
      / \
     1   2
    / \   \
   3   4   5
(val=4)(val=3)(val=6)(val=2)

Step 1: Initialize

  • ans = [1, 1, 1, 1, 1, 1] (all nodes start with answer 1)
  • Build adjacency list: g = {0: [1,2], 1: [3,4], 2: [5], 3: [], 4: [], 5: []}
  • Find node with value 1: idx = 2 (since nums[2] = 1)

Step 2: Process path from node 2 to root

We'll traverse: node 2 → node 0 (root)

Processing Node 2:

  • Start DFS from node 2
  • Visit node 2: mark has[1] = True (value 1)
  • Visit child node 5: mark has[2] = True (value 2)
  • Current has = [False, True, True, False, False, ...]
  • Start checking from i = 2: has[2] is True, increment to 3
  • Check i = 3: has[3] is False
  • ans[2] = 3 (smallest missing value is 3)
  • Move to parent: idx = 0

Processing Node 0 (root):

  • Continue DFS from node 0 (accumulating previous values)
  • Visit node 0: mark has[5] = True (value 5)
  • Visit child node 1: mark has[4] = True (value 4)
  • Visit grandchild node 3: mark has[3] = True (value 3)
  • Visit grandchild node 4: mark has[6] = True (value 6)
  • Current has = [False, True, True, True, True, True, True, ...]
  • Continue checking from i = 3: all of 3, 4, 5, 6 are True
  • Check i = 7: has[7] is False
  • ans[0] = 7 (smallest missing value is 7)
  • Move to parent: idx = -1 (done, as 0 is the root)

Final Result: ans = [7, 1, 3, 1, 1, 1]

Verification:

  • Node 0's subtree has values [5,4,1,3,6,2] → missing 7 ✓
  • Node 1's subtree has values [4,3,6] → missing 1 ✓
  • Node 2's subtree has values [1,2] → missing 3 ✓
  • Node 3's subtree has values [3] → missing 1 ✓
  • Node 4's subtree has values [6] → missing 1 ✓
  • Node 5's subtree has values [2] → missing 1 ✓

The key insight illustrated here: only nodes 2 and 0 (on the path from the node with value 1 to root) needed detailed computation. All other nodes simply have answer 1 because their subtrees don't contain the value 1.

Solution Implementation

1class Solution:
2    def smallestMissingValueSubtree(
3        self, parents: List[int], nums: List[int]
4    ) -> List[int]:
5        """
6        Find the smallest missing positive integer in each subtree.
7      
8        Args:
9            parents: Parent array where parents[i] is the parent of node i
10            nums: Array where nums[i] is the genetic value at node i
11          
12        Returns:
13            Array where result[i] is the smallest missing value in subtree rooted at i
14        """
15      
16        def mark_subtree_values(node: int) -> None:
17            """
18            DFS to mark all genetic values present in the subtree rooted at 'node'.
19          
20            Args:
21                node: Current node to process
22            """
23            # Skip if already visited
24            if visited[node]:
25                return
26          
27            # Mark node as visited
28            visited[node] = True
29          
30            # Mark the genetic value of current node as present
31            if nums[node] < len(value_exists):
32                value_exists[nums[node]] = True
33          
34            # Recursively process all children
35            for child in children[node]:
36                mark_subtree_values(child)
37      
38        n = len(nums)
39        # Initialize result array - default is 1 (smallest missing value)
40        result = [1] * n
41      
42        # Build adjacency list for tree (parent -> children)
43        children = [[] for _ in range(n)]
44        node_with_value_one = -1
45      
46        for node_id, parent_id in enumerate(parents):
47            # Skip root node (parent is -1)
48            if node_id > 0:
49                children[parent_id].append(node_id)
50            # Find the node containing genetic value 1
51            if nums[node_id] == 1:
52                node_with_value_one = node_id
53      
54        # If no node has value 1, all subtrees miss value 1
55        if node_with_value_one == -1:
56            return result
57      
58        # Track visited nodes during DFS
59        visited = [False] * n
60        # Track which genetic values exist in current path
61        value_exists = [False] * (n + 2)
62      
63        # Start checking from value 2 (since we know 1 exists)
64        current_missing = 2
65        current_node = node_with_value_one
66      
67        # Process path from node with value 1 to root
68        while current_node != -1:
69            # Mark all values in subtree rooted at current_node
70            mark_subtree_values(current_node)
71          
72            # Find the next missing value
73            while value_exists[current_missing]:
74                current_missing += 1
75          
76            # Set result for current node
77            result[current_node] = current_missing
78          
79            # Move to parent node
80            current_node = parents[current_node]
81      
82        return result
83
1class Solution {
2    // Graph adjacency list where g[i] contains all children of node i
3    private List<Integer>[] graph;
4    // Tracks whether a node has been visited during DFS
5    private boolean[] visited;
6    // Tracks which values (from node values) have been seen in current subtree
7    private boolean[] hasValue;
8    // Original node values array
9    private int[] nodeValues;
10
11    public int[] smallestMissingValueSubtree(int[] parents, int[] nums) {
12        int n = nums.length;
13        this.nodeValues = nums;
14      
15        // Initialize data structures
16        graph = new List[n];
17        visited = new boolean[n];
18        hasValue = new boolean[n + 2]; // Extra space to handle boundary cases
19      
20        // Create adjacency list for the tree
21        Arrays.setAll(graph, i -> new ArrayList<>());
22      
23        // Find the node with value 1 (if exists) and build the graph
24        int nodeWithOne = -1;
25        for (int i = 0; i < n; ++i) {
26            // Add edge from parent to child (skip root node at index 0)
27            if (i > 0) {
28                graph[parents[i]].add(i);
29            }
30            // Track which node has value 1
31            if (nums[i] == 1) {
32                nodeWithOne = i;
33            }
34        }
35      
36        // Initialize answer array with 1 (default smallest missing value)
37        int[] answer = new int[n];
38        Arrays.fill(answer, 1);
39      
40        // If no node has value 1, all subtrees have smallest missing value = 1
41        if (nodeWithOne == -1) {
42            return answer;
43        }
44      
45        // Process path from node with value 1 to root
46        // Only nodes on this path can have smallest missing value > 1
47        int currentMissing = 2;
48        for (int currentNode = nodeWithOne; currentNode != -1; currentNode = parents[currentNode]) {
49            // Mark all values in the subtree rooted at currentNode
50            dfs(currentNode);
51          
52            // Find the smallest missing positive integer
53            while (hasValue[currentMissing]) {
54                currentMissing++;
55            }
56          
57            // Set the answer for current node
58            answer[currentNode] = currentMissing;
59        }
60      
61        return answer;
62    }
63
64    /**
65     * Depth-first search to mark all values present in the subtree rooted at node i
66     * @param nodeIndex The current node being processed
67     */
68    private void dfs(int nodeIndex) {
69        // Skip if already visited (avoid reprocessing)
70        if (visited[nodeIndex]) {
71            return;
72        }
73      
74        // Mark node as visited
75        visited[nodeIndex] = true;
76      
77        // Mark the value at this node as present (if within bounds)
78        if (nodeValues[nodeIndex] < hasValue.length) {
79            hasValue[nodeValues[nodeIndex]] = true;
80        }
81      
82        // Recursively process all children
83        for (int child : graph[nodeIndex]) {
84            dfs(child);
85        }
86    }
87}
88
1class Solution {
2public:
3    vector<int> smallestMissingValueSubtree(vector<int>& parents, vector<int>& nums) {
4        int n = nums.size();
5      
6        // Build adjacency list for the tree (children of each node)
7        vector<vector<int>> children(n);
8        for (int i = 1; i < n; ++i) {
9            children[parents[i]].push_back(i);
10        }
11      
12        // Find the node with value 1 (if it exists)
13        int nodeWithOne = -1;
14        for (int i = 0; i < n; ++i) {
15            if (nums[i] == 1) {
16                nodeWithOne = i;
17                break;
18            }
19        }
20      
21        // Initialize result array with 1 (default smallest missing value)
22        vector<int> result(n, 1);
23      
24        // If there's no node with value 1, all subtrees miss 1
25        if (nodeWithOne == -1) {
26            return result;
27        }
28      
29        // Track visited nodes and present values in explored subtrees
30        vector<bool> visited(n, false);
31        vector<bool> valuePresent(n + 2, false);
32      
33        // DFS to mark all values present in a subtree rooted at node
34        function<void(int)> dfs = [&](int node) {
35            if (visited[node]) {
36                return;
37            }
38          
39            visited[node] = true;
40          
41            // Mark the value at this node as present
42            if (nums[node] <= n + 1) {
43                valuePresent[nums[node]] = true;
44            }
45          
46            // Recursively visit all children
47            for (int child : children[node]) {
48                dfs(child);
49            }
50        };
51      
52        // Process path from node with value 1 to root
53        int currentMissing = 2;  // Start checking from 2 since 1 is present
54        for (int currentNode = nodeWithOne; currentNode != -1; currentNode = parents[currentNode]) {
55            // Explore subtree rooted at currentNode
56            dfs(currentNode);
57          
58            // Find the smallest missing positive integer
59            while (valuePresent[currentMissing]) {
60                ++currentMissing;
61            }
62          
63            result[currentNode] = currentMissing;
64        }
65      
66        return result;
67    }
68};
69
1/**
2 * Finds the smallest missing value in each subtree of a tree
3 * @param parents - Array where parents[i] is the parent of node i (parents[0] = -1 for root)
4 * @param nums - Array where nums[i] is the value at node i
5 * @returns Array where result[i] is the smallest missing positive integer in subtree rooted at i
6 */
7function smallestMissingValueSubtree(parents: number[], nums: number[]): number[] {
8    const nodeCount: number = nums.length;
9  
10    // Build adjacency list representation of the tree
11    const adjacencyList: number[][] = Array.from({ length: nodeCount }, () => []);
12  
13    // Track visited nodes during DFS traversal
14    const visited: boolean[] = Array(nodeCount).fill(false);
15  
16    // Track which values have been seen in the current path
17    const valueExists: boolean[] = Array(nodeCount + 2).fill(false);
18  
19    // Initialize result array with 1 (default smallest missing value)
20    const result: number[] = Array(nodeCount).fill(1);
21  
22    // Find the node containing value 1 (if exists)
23    let nodeWithOne: number = -1;
24  
25    // Build the tree and find node with value 1
26    for (let i = 0; i < nodeCount; ++i) {
27        // Add edge from parent to child (skip root node)
28        if (i !== 0) {
29            adjacencyList[parents[i]].push(i);
30        }
31      
32        // Track the node containing value 1
33        if (nums[i] === 1) {
34            nodeWithOne = i;
35        }
36    }
37  
38    // If no node contains 1, all subtrees have smallest missing value = 1
39    if (nodeWithOne === -1) {
40        return result;
41    }
42  
43    /**
44     * Performs DFS to mark all values in the subtree as existing
45     * @param currentNode - Current node being processed
46     */
47    const dfs = (currentNode: number): void => {
48        // Skip if already visited
49        if (visited[currentNode]) {
50            return;
51        }
52      
53        // Mark current node as visited
54        visited[currentNode] = true;
55      
56        // Mark the value at current node as existing
57        if (nums[currentNode] < valueExists.length) {
58            valueExists[nums[currentNode]] = true;
59        }
60      
61        // Recursively process all children
62        for (const child of adjacencyList[currentNode]) {
63            dfs(child);
64        }
65    };
66  
67    // Process path from node with value 1 to root
68    let smallestMissing: number = 2;
69    for (let currentNode = nodeWithOne; currentNode !== -1; currentNode = parents[currentNode]) {
70        // Collect all values in the subtree rooted at current node
71        dfs(currentNode);
72      
73        // Find the smallest missing value
74        while (valueExists[smallestMissing]) {
75            ++smallestMissing;
76        }
77      
78        // Store the result for current node
79        result[currentNode] = smallestMissing;
80    }
81  
82    return result;
83}
84

Time and Space Complexity

Time Complexity: O(n)

The algorithm performs a depth-first search starting from the node containing value 1 and traversing up to the root. Key observations:

  1. The while loop that finds idx (node with value 1) runs at most once through all nodes: O(n)
  2. The main while loop traverses from the node with value 1 up to the root, visiting at most n nodes along the path
  3. The DFS function dfs(i) visits each node at most once due to the vis[i] check. Across all DFS calls, every node is visited exactly once: O(n) total
  4. The inner while loop while has[i]: i += 1 increments i from 2 to at most n+1. Since i never decreases, this contributes O(n) total across all iterations

Therefore, the total time complexity is O(n) + O(n) + O(n) = O(n).

Space Complexity: O(n)

The algorithm uses the following data structures:

  1. ans: array of size n storing the result: O(n)
  2. g: adjacency list representation of the tree with n nodes: O(n)
  3. vis: boolean array of size n for tracking visited nodes: O(n)
  4. has: boolean array of size n+2 for tracking which values exist in subtrees: O(n)
  5. Recursive call stack for DFS: In the worst case (skewed tree), the depth can be O(n)

The total space complexity is O(n).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Assumption About Value Range

Pitfall: A common mistake is assuming that genetic values are limited to the range [1, n] where n is the number of nodes. The problem states values can be up to 10^5, which may be larger than n.

Example Issue:

# Incorrect: assumes values are bounded by n
value_exists = [False] * (n + 2)  # This could cause index out of bounds!

# When marking values:
if nums[node] < len(value_exists):  # This check prevents crashes but silently ignores large values
    value_exists[nums[node]] = True

Solution: Size the value_exists array based on the maximum possible genetic value:

# Correct approach:
max_value = 10**5 + 2  # Based on problem constraints
value_exists = [False] * max_value

# Or dynamically based on actual values:
max_genetic_value = max(nums)
value_exists = [False] * (max_genetic_value + 2)

2. Resetting the Missing Value Counter

Pitfall: Developers might instinctively reset current_missing back to 1 or 2 for each node on the path, thinking each subtree needs independent computation.

Example Issue:

while current_node != -1:
    current_missing = 2  # WRONG: Resetting for each node
    mark_subtree_values(current_node)
    while value_exists[current_missing]:
        current_missing += 1
    result[current_node] = current_missing
    current_node = parents[current_node]

Why It's Wrong: As we move up the tree, parent nodes contain all values from their children's subtrees. The smallest missing value can only increase or stay the same - it never decreases. Resetting would cause unnecessary recomputation and incorrect results.

Solution: Keep current_missing persistent across iterations:

current_missing = 2  # Initialize once outside the loop
while current_node != -1:
    # Don't reset current_missing here
    mark_subtree_values(current_node)
    while value_exists[current_missing]:
        current_missing += 1
    result[current_node] = current_missing
    current_node = parents[current_node]

3. Building the Tree Structure Incorrectly

Pitfall: Confusing parent-to-child vs child-to-parent relationships when building the adjacency list.

Example Issue:

# Incorrect: Adding parent to child's list
for node_id, parent_id in enumerate(parents):
    if node_id > 0:
        children[node_id].append(parent_id)  # WRONG direction!

Solution: Ensure you're adding children to their parent's adjacency list:

# Correct: Adding child to parent's list
for node_id, parent_id in enumerate(parents):
    if node_id > 0:  # Skip root (parent_id would be -1)
        children[parent_id].append(node_id)

4. Not Handling the Case When Value 1 Doesn't Exist

Pitfall: Forgetting to handle the edge case where no node has genetic value 1, leading to attempting to traverse from a non-existent node.

Example Issue:

# Find node with value 1
node_with_value_one = -1
for i, val in enumerate(nums):
    if val == 1:
        node_with_value_one = i
        break

# Forgetting to check if it was found
current_node = node_with_value_one  # Could be -1
while current_node != -1:  # This would immediately exit without processing anything
    # ...

Solution: Explicitly check and handle this case:

if node_with_value_one == -1:
    # All subtrees are missing value 1
    return result  # result is already initialized with all 1s
Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More