928. Minimize Malware Spread II


Problem Description

In this problem, we have a network of n nodes and these nodes are represented as an n x n adjacency matrix called [graph](/problems/graph_intro). The matrix element graph[i][j] == 1 indicates that there is a direct connection between node i and node j. Initially, some nodes are infected with malware (an array initial is given to represent which ones). The malware spreads between directly connected nodes: if one node is infected, its direct connections also become infected. This spread continues until no new nodes can be infected.

The goal is to determine which single node, if removed from the array of initially infected nodes, would result in the minimum number of final infected nodes in the network after the spread stops. This process also involves removing all connections to and from this node. The task is to return the node that minimizes the final number of infected nodes. If there are multiple nodes whose removal would yield the same minimum number of infections, return the one with the smallest index.

The challenge lies in simulating the malware spread and determining the optimal node for removal to minimize the spread.

Flowchart Walkthrough

Let's analyze the algorithm using the Flowchart. Here's a step-by-step walkthrough for LeetCode 928, Minimize Malware Spread II:

Is it a graph?

  • Yes: The problem presents a network of computers as a graph, where nodes represent computers and edges represent connections between them.

Is it a tree?

  • No: The graph may have cycles and is not necessarily a hierarchical tree structure.

Is the problem related to directed acyclic graphs (DAGs)?

  • No: The graph might contain cycles, as it represents an interconnected network, not a DAG.

Is the problem related to shortest paths?

  • No: The goal is to find a node (computer) whose removal minimizes the spread of malware, not to find the shortest path between nodes.

Does the problem involve connectivity?

  • Yes: The solution involves understanding the component connectivity to effectively reduce the impact of malware, by identifying which nodes, when removed, disrupt the most significant number of malware connections.

Is the graph weighted?

  • No: Each connection between computers is treated equally, without specific weights.

Since the problem involves navigating through an unweighted connectivity graph to identify critical nodes, the flowchart suggests using Depth-First Search (DFS) or Breadth-First Search (BFS). Given that DFS is often simpler to implement recursively and can conveniently explore all potential paths and backtracking, Depth-First Search (DFS) becomes suitable for this scenario.

Conclusion: The flowchart guides us to utilize DFS for this unweighted connectivity problem in an iterative or recursive manner to manage and understand the network's structure for minimizing malware spread.

Intuition

The intuition behind the solution involves looking at the problem as a kind of network analysis combined with optimization. The solution requires us to determine the impact of each initially infected node's removal on the overall spread of the infection throughout the network.

To achieve this, we can use a union-find data structure (also known as a disjoint-set union) to handle the grouping of connected nodes. Here, the idea is to perform the following steps:

  1. Use the union-find to group all clean nodes. If two clean nodes are connected, they belong to the same group.

  2. Count the connections between each initially infected node and the groups of clean nodes. We want to know how many clean groups each infected node can infect.

  3. Calculate the impact of removing each initially infected node. An infected node's impact is determined by the number of clean nodes in groups it can infect by itself (without other infected nodes connecting to those same groups).

  4. Pick the initially infected node that, when removed, would prevent the most spread of the malware. This is the node connected to the largest number of clean nodes that are not also connected to other initially infected nodes.

  5. If there are multiple nodes that meet the criterion in step 4, choose the node with the smallest index.

This approach relies on understanding the structure of the network and using the union-find to efficiently manage connected components, ultimately finding the node whose removal minimizes the spread of malware.

Learn more about Depth-First Search, Breadth-First Search, Union Find and Graph patterns.

Solution Approach

The implementation of the solution involves the following key components:

  1. Union-Find Data Structure: This is used to group the clean nodes that are directly connected. Each group is represented by a root, and all nodes in the same group will have the same root after the connections are processed.

  2. Parent Array (p) and Size Array (size): The parent array p keeps track of the root of each node, initially set to the node itself. The size array size keeps track of the size of the group that the node belongs to if it is a root, initialized with 1 for all nodes.

  3. Clean Nodes Identification: An array clean is used to identify whether a node is initially infected or not. This helps to avoid processing initially infected nodes during the union-find grouping phase.

The algorithm can be described in the following steps:

  • Initialize Union-Find:

    • The parent (p) array is initialized with each node as its own parent.
    • The size (size) array is filled with 1 since initially, each group (each node being its own group) has a size of one.
    • A clean array is initialized to True for all nodes and then set to False for nodes in the initial list of infected nodes.
  • Group Clean Nodes:

    • Iterate over pairs of nodes (i, j) and if both nodes are clean and they are directly connected ([graph](/problems/graph_intro)[i][j] == 1), perform the union of the two nodes.
    • The union operation finds the root of each node and if they have different roots, it connects them by setting the parent of one root to be the other root and updates the size of the group.
  • Count Malware Connections:

    • Create a Counter for infected connections to groups (cnt) and a map mp from infected node to the groups of clean nodes it can infect.
    • For each infected node in the initial list, determine the unique roots of clean groups that are connected to it. Increment the Counter for each of these roots.
  • Determine the Optimal Node for Removal:

    • Initialize variables mx and ans to keep track of the maximum number of nodes that can be saved and the corresponding node index.
    • For each infected node, sum up the sizes of the unique clean groups to which it's connected if those groups have only this one infected connection (cnt[root] == 1). This gives us the number of nodes that would be saved by removing this infected node.
    • Check if this number (t) is greater than the current maximum (mx). If yes, update mx. If t equals mx, then update ans if the current node has a smaller index.
  • Return the Optimal Node:

    • After iterating through all the infected nodes and considering the impact of their removal, return the node index stored in ans.

The patterns used in this solution are union-find for grouping and analysis of connected components, and greedy algorithms with respect to choosing the node whose removal results in the most significant benefit (minimizes the spread of malware).

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let’s go through a small example to illustrate the solution approach. Suppose we have a network of 5 nodes and the following adjacency matrix:

graph = [
  [1, 1, 0, 0, 0],
  [1, 1, 0, 1, 0],
  [0, 0, 1, 0, 1],
  [0, 1, 0, 1, 1],
  [0, 0, 1, 1, 1]
]

And the array of initially infected nodes is given by:

initial = [1, 3]

This means that nodes 1 and 3 are initially infected. We need to find out which one we should remove to minimize the final number of infections.

Initialize Union-Find

  • Create parent array p as [0, 1, 2, 3, 4] and size array size as [1, 1, 1, 1, 1].
  • Set clean array as [True, False, True, False, True] to reflect that nodes 1 and 3 are infected and nodes 0, 2, and 4 are clean.

Group Clean Nodes

  • Iterate over the matrix and apply union to the clean nodes that are directly connected:
    • Nodes 0 and 2 are not directly connected.
    • Nodes 2 and 4 are directly connected, so we union them. Now p looks like [0, 1, 2, 3, 2] and size looks like [1, 1, 2, 1, 1]. Node 2 is the root of nodes 2 and 4.
    • Nodes 0 and 4 are not directly connected (because node 0 is not connected to node 2, which is the root of node 4).

Count Malware Connections

  • We initialize a Counter cnt to track the connections from infected to clean groups and a map mp to track the mapping of infected nodes to clean groups.
  • For infected node 1, we find that it connects to clean node 0 (group root is 0).
  • For infected node 3, it connects to the clean group represented by root 2 (node 2).

Determine the Optimal Node for Removal

  • We check for node 1 and calculate that by removing it, we can save 1 clean node (since it's the only connection to node 0).
  • For node 3, removing it would save 2 clean nodes (group size of node 2 which is the only infected connection).
  • Since removing node 3 saves more nodes, it has a greater impact, leading us to choose node 3 for removal.

Return the Optimal Node

  • Based on our calculations, the best node to remove to minimize infections is node 3, and that would be our final answer for this scenario.

Solution Implementation

1from typing import List
2from collections import Counter
3
4class Solution:
5    def minMalwareSpread(self, graph: List[List[int]], initial: List[int]) -> int:
6        # Helper function to find the root node of 'x' using path compression for optimization
7        def find(x):
8            if parent[x] != x:
9                parent[x] = find(parent[x])
10            return parent[x]
11
12        # Helper function to union two sets, merging the smaller tree into the larger one
13        def union(a, b):
14            root_a, root_b = find(a), find(b)
15            if root_a != root_b:
16                component_size[root_b] += component_size[root_a]
17                parent[root_a] = root_b
18
19        n = len(graph)
20        parent = list(range(n))  # Initialize each node to be its own parent
21        component_size = [1] * n  # Keep track of the size of each component
22        is_clean = [True] * n  # Keep track if a node is not initially infected
23        for node in initial:
24            is_clean[node] = False
25
26        # Union find to build up components of clean nodes
27        for i in range(n):
28            if not is_clean[i]:
29                continue
30            for j in range(i + 1, n):
31                if is_clean[j] and graph[i][j] == 1:
32                    union(i, j)
33
34        affected_component_count = Counter()  # Counts the number of initially infected nodes affecting the component
35        node_to_components = {}  # Maps an initially infected node to the components it affects
36        for node in initial:
37            affected_components = {find(j) for j in range(n) if is_clean[j] and graph[node][j] == 1}
38            for root in affected_components:
39                affected_component_count[root] += 1
40            node_to_components[node] = affected_components
41
42        # Initialize variables to find the optimal node to remove
43        max_reduction, result_node = -1, 0
44        for node, affected_components in node_to_components.items():
45            # Calculate the total reduction in number of infected nodes 
46            # if this 'node' is removed and is the only one affecting those components
47            reduction = sum(component_size[root] for root in affected_components if affected_component_count[root] == 1)
48          
49            # Update the result if this 'node' provides a higher reduction of infected nodes
50            # or if there is a tie, take the 'node' with the smaller index
51            if max_reduction < reduction or (max_reduction == reduction and node < result_node):
52                max_reduction, result_node = reduction, node
53
54        return result_node
55
1class Solution {
2
3    private int[] parent; // parent[i] is the parent of node i in the union-find data structure
4    private int[] componentSize; // componentSize[i] is the size of the component whose root is i
5
6    public int minMalwareSpread(int[][] graph, int[] initial) {
7        int n = graph.length;
8        parent = new int[n];
9        componentSize = new int[n];
10      
11        // Initialize the union-find data structure
12        for (int i = 0; i < n; ++i) {
13            parent[i] = i;
14            componentSize[i] = 1;
15        }
16      
17        // Keep track of clean nodes (non-malware)
18        boolean[] isClean = new boolean[n];
19        Arrays.fill(isClean, true);
20        for (int infectedNode : initial) {
21            isClean[infectedNode] = false;
22        }
23      
24        // Union clean nodes that are connected
25        for (int i = 0; i < n; ++i) {
26            if (!isClean[i]) {
27                continue;
28            }
29            for (int j = i + 1; j < n; ++j) {
30                if (isClean[j] && graph[i][j] == 1) {
31                    union(i, j);
32                }
33            }
34        }
35      
36        // Count how many initial nodes affect each component
37        int[] affectedCount = new int[n];
38        Map<Integer, Set<Integer>> malwareSources = new HashMap<>();
39        for (int infectedNode : initial) {
40            Set<Integer> affectedComponents = new HashSet<>();
41            for (int j = 0; j < n; ++j) {
42                if (isClean[j] && graph[infectedNode][j] == 1) {
43                    affectedComponents.add(find(j));
44                }
45            }
46            for (int root : affectedComponents) {
47                affectedCount[root] += 1;
48            }
49            malwareSources.put(infectedNode, affectedComponents);
50        }
51      
52        int maxAffected = -1;
53        int result = 0;
54      
55        // Choose the node removing which minimizes the spread of malware
56        for (Map.Entry<Integer, Set<Integer>> entry : malwareSources.entrySet()) {
57            int node = entry.getKey();
58            int totalAffected = 0;
59            for (int root : entry.getValue()) {
60                if (affectedCount[root] == 1) { // Only affected by one malware
61                    totalAffected += componentSize[root];
62                }
63            }
64            if (maxAffected < totalAffected || (maxAffected == totalAffected && node < result)) {
65                maxAffected = totalAffected;
66                result = node;
67            }
68        }
69        return result;
70    }
71
72    // Find the root of the component to which node x belongs
73    private int find(int x) {
74        if (parent[x] != x) {
75            parent[x] = find(parent[x]);
76        }
77        return parent[x];
78    }
79
80    // Union two components by size
81    private void union(int a, int b) {
82        int rootA = find(a);
83        int rootB = find(b);
84        if (rootA != rootB) {
85            if (componentSize[rootA] < componentSize[rootB]) {
86                parent[rootA] = rootB;
87                componentSize[rootB] += componentSize[rootA];
88            } else {
89                parent[rootB] = rootA;
90                componentSize[rootA] += componentSize[rootB];
91            }
92        }
93    }
94}
95
1class Solution {
2public:
3    std::vector<int> parent;  // Array to hold the parent of each node
4    std::vector<int> clusterSize;  // Array to track the size of each cluster
5
6    int minMalwareSpread(std::vector<std::vector<int>>& graph, std::vector<int>& initial) {
7        int n = graph.size();
8        parent.resize(n);
9        clusterSize.resize(n);
10      
11        // Initialize every node to be its own parent, forming n clusters
12        for (int i = 0; i < n; ++i) {
13            parent[i] = i;
14        }
15
16        // Initially, each cluster size is 1
17        std::fill(clusterSize.begin(), clusterSize.end(), 1);
18
19        // Track which nodes are clean (not initially infected)
20        std::vector<bool> isClean(n, true);
21        for (int infectedNode : initial) {
22            isClean[infectedNode] = false;
23        }
24      
25        // Union the clean nodes that are connected
26        for (int i = 0; i < n; ++i) {
27            if (!isClean[i]) continue;
28            for (int j = i + 1; j < n; ++j) {
29                if (isClean[j] && graph[i][j] == 1) {
30                    merge(i, j);
31                }
32            }
33        }
34
35        // Count the number of infected clusters
36        std::vector<int> infectedClustersCount(n, 0);
37        std::unordered_map<int, std::unordered_set<int>> infectedClusterRoots;
38        for (int infectedNode : initial) {
39            std::unordered_set<int> uniqueClusters;
40            for (int j = 0; j < n; ++j) {
41                if (isClean[j] && graph[infectedNode][j] == 1) {
42                    uniqueClusters.insert(find(j));
43                }
44            }
45            for (int root : uniqueClusters) {
46                ++infectedClustersCount[root];
47            }
48            infectedClusterRoots[infectedNode] = uniqueClusters;
49        }
50
51        int maxClusterSize = -1;
52        int nodeToRemove = 0;
53
54        // Find the node removal that maximizes the clean cluster size
55        for (auto& [infectedNode, roots] : infectedClusterRoots) {
56            int cleanClusterSizeSum = 0;
57            for (int root : roots) {
58                if (infectedClustersCount[root] == 1) { // If the cluster is only infected once
59                    cleanClusterSizeSum += clusterSize[root];
60                }
61            }
62            if (maxClusterSize < cleanClusterSizeSum || (maxClusterSize == cleanClusterSizeSum && infectedNode < nodeToRemove)) {
63                maxClusterSize = cleanClusterSizeSum;
64                nodeToRemove = infectedNode;
65            }
66        }
67
68        return nodeToRemove;
69    }
70
71    // Find operation with path compression
72    int find(int x) {
73        if (parent[x] != x) {
74            parent[x] = find(parent[x]);  // Compress the path for efficiency
75        }
76        return parent[x];
77    }
78
79    // Merge (union) operation to unify clusters
80    void merge(int a, int b) {
81        int parentA = find(a);
82        int parentB = find(b);
83        if (parentA != parentB) {
84            clusterSize[parentB] += clusterSize[parentA];  // Merge smaller cluster into larger one
85            parent[parentA] = parentB;  // Make the parent of the root of A the root of B
86        }
87    }
88};
89
1// Array to hold the parent of each node
2let parent: number[] = [];
3
4// Array to track the size of each cluster
5let clusterSize: number[] = [];
6
7// Find operation with path compression
8function find(x: number): number {
9  if (parent[x] !== x) {
10    // Compress the path for efficiency
11    parent[x] = find(parent[x]);
12  }
13  return parent[x];
14}
15
16// Merge (union) operation to unify clusters
17function merge(a: number, b: number): void {
18  const parentA: number = find(a);
19  const parentB: number = find(b);
20  // Merge smaller cluster into larger one
21  if (parentA !== parentB) {
22    clusterSize[parentB] += clusterSize[parentA];
23    // Make the parent of the root of A the root of B
24    parent[parentA] = parentB;
25  }
26}
27
28// Main function to find the node whose removal minimizes the spread of malware
29function minMalwareSpread(graph: number[][], initial: number[]): number {
30  const n: number = graph.length;
31  parent.length = n;
32  clusterSize.length = n;
33
34  // Initialize every node to be its own parent, forming n clusters
35  for (let i = 0; i < n; ++i) {
36    parent[i] = i;
37  }
38
39  // Initially, each cluster size is 1
40  clusterSize.fill(1);
41
42  // Track which nodes are clean (not initially infected)
43  const isClean: boolean[] = new Array(n).fill(true);
44  for (const infectedNode of initial) {
45    isClean[infectedNode] = false;
46  }
47
48  // Union the clean nodes that are connected
49  for (let i = 0; i < n; ++i) {
50    if (!isClean[i]) continue;
51    for (let j = i + 1; j < n; ++j) {
52      if (isClean[j] && graph[i][j] === 1) {
53        merge(i, j);
54      }
55    }
56  }
57
58  // Count the number of infected clusters
59  const infectedClustersCount: number[] = new Array(n).fill(0);
60  const infectedClusterRoots: Map<number, Set<number>> = new Map();
61  for (const infectedNode of initial) {
62    const uniqueClusters: Set<number> = new Set();
63    for (let j = 0; j < n; ++j) {
64      if (isClean[j] && graph[infectedNode][j] === 1) {
65        uniqueClusters.add(find(j));
66      }
67    }
68    for (const root of uniqueClusters) {
69      infectedClustersCount[root]++;
70    }
71    infectedClusterRoots.set(infectedNode, uniqueClusters);
72  }
73
74  let maxClusterSize: number = -1;
75  let nodeToRemove: number = -1;
76
77  // Find the node removal that maximizes the clean cluster size
78  for (const [infectedNode, roots] of infectedClusterRoots.entries()) {
79    let cleanClusterSizeSum: number = 0;
80    for (const root of roots) {
81      if (infectedClustersCount[root] === 1) { // If the cluster is only infected once
82        cleanClusterSizeSum += clusterSize[root];
83      }
84    }
85    if (
86      maxClusterSize < cleanClusterSizeSum ||
87      (maxClusterSize === cleanClusterSizeSum && infectedNode < nodeToRemove)
88    ) {
89      maxClusterSize = cleanClusterSizeSum;
90      nodeToRemove = infectedNode;
91    }
92  }
93
94  // Ensure that if no node can improve the situation, the smallest index is chosen
95  if (nodeToRemove === -1) {
96    nodeToRemove = Math.min(...initial);
97  }
98
99  return nodeToRemove;
100}
101

Time and Space Complexity

The given code is designed to find the node whose removal would minimize the spread of malware across a network represented by the graph, considering that some nodes are initially infected, as specified in the initial list.

Time Complexity

The time complexity of the algorithm can be analyzed based on the steps it performs:

  1. find function: Each find operation takes amortized O(α(n)) time, where α(n) is the inverse Ackermann function, which is a very slow-growing function and can be considered almost constant in this context.
  2. union function: The union operation essentially calls find twice, and then performs constant-time operations. Thus, it also works in amortized O(α(n)) time.
  3. Initialization of data structures p and size, and marking clean nodes: Takes O(n) time, where n is the number of nodes in the graph.
  4. Nested loop for union-find operations: In the worst case, it could potentially go through all possible edges, leading to O(n^2) operations. Within this loop, union is called which is amortized O(α(n)).
  5. Loop for initializing cnt and mp dictionaries: This loop iterates over nodes in initial and within that has a loop over n nodes, resulting in a time complexity of O(k*n), where k is the number of initially infected nodes.
  6. Final loop for finding the answer: This loop goes through nodes in initial and performs operations that are linear with respect to the size of the set of connected nodes, leading to O(k*n).

The most time-consuming part is therefore the nested loop for union-find operations, giving the time complexity as O(n^2 * α(n)).

Space Complexity

The space complexity is determined by the data structures used:

  1. p and size arrays: These are both of size O(n).
  2. clean array: Also of size O(n).
  3. cnt and mp dictionaries: cnt will have at most O(n) entries corresponding to different components, whereas mp will have at most O(k) entries where k is the length of initial.

The main space consumption comes from the arrays and dictionaries, hence the space complexity is O(n + k), where k can be considered much smaller than n in most practical scenarios, simplifying the space complexity to O(n).

Overall, the algorithm's time complexity is O(n^2 * α(n)) and its space complexity is O(n).

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Which technique can we use to find the middle of a linked list?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!