928. Minimize Malware Spread II
Problem Description
In this problem, we have a network of n
nodes and these nodes are represented as an n x n
adjacency matrix called [graph](/problems/graph_intro)
. The matrix element graph[i][j] == 1
indicates that there is a direct connection between node i
and node j
. Initially, some nodes are infected with malware (an array initial
is given to represent which ones). The malware spreads between directly connected nodes: if one node is infected, its direct connections also become infected. This spread continues until no new nodes can be infected.
The goal is to determine which single node, if removed from the array of initially infected nodes, would result in the minimum number of final infected nodes in the network after the spread stops. This process also involves removing all connections to and from this node. The task is to return the node that minimizes the final number of infected nodes. If there are multiple nodes whose removal would yield the same minimum number of infections, return the one with the smallest index.
The challenge lies in simulating the malware spread and determining the optimal node for removal to minimize the spread.
Flowchart Walkthrough
Let's analyze the algorithm using the Flowchart. Here's a step-by-step walkthrough for LeetCode 928, Minimize Malware Spread II:
Is it a graph?
- Yes: The problem presents a network of computers as a graph, where nodes represent computers and edges represent connections between them.
Is it a tree?
- No: The graph may have cycles and is not necessarily a hierarchical tree structure.
Is the problem related to directed acyclic graphs (DAGs)?
- No: The graph might contain cycles, as it represents an interconnected network, not a DAG.
Is the problem related to shortest paths?
- No: The goal is to find a node (computer) whose removal minimizes the spread of malware, not to find the shortest path between nodes.
Does the problem involve connectivity?
- Yes: The solution involves understanding the component connectivity to effectively reduce the impact of malware, by identifying which nodes, when removed, disrupt the most significant number of malware connections.
Is the graph weighted?
- No: Each connection between computers is treated equally, without specific weights.
Since the problem involves navigating through an unweighted connectivity graph to identify critical nodes, the flowchart suggests using Depth-First Search (DFS) or Breadth-First Search (BFS). Given that DFS is often simpler to implement recursively and can conveniently explore all potential paths and backtracking, Depth-First Search (DFS) becomes suitable for this scenario.
Conclusion: The flowchart guides us to utilize DFS for this unweighted connectivity problem in an iterative or recursive manner to manage and understand the network's structure for minimizing malware spread.
Intuition
The intuition behind the solution involves looking at the problem as a kind of network analysis combined with optimization. The solution requires us to determine the impact of each initially infected node's removal on the overall spread of the infection throughout the network.
To achieve this, we can use a union-find data structure (also known as a disjoint-set union) to handle the grouping of connected nodes. Here, the idea is to perform the following steps:
-
Use the union-find to group all clean nodes. If two clean nodes are connected, they belong to the same group.
-
Count the connections between each initially infected node and the groups of clean nodes. We want to know how many clean groups each infected node can infect.
-
Calculate the impact of removing each initially infected node. An infected node's impact is determined by the number of clean nodes in groups it can infect by itself (without other infected nodes connecting to those same groups).
-
Pick the initially infected node that, when removed, would prevent the most spread of the malware. This is the node connected to the largest number of clean nodes that are not also connected to other initially infected nodes.
-
If there are multiple nodes that meet the criterion in step 4, choose the node with the smallest index.
This approach relies on understanding the structure of the network and using the union-find to efficiently manage connected components, ultimately finding the node whose removal minimizes the spread of malware.
Learn more about Depth-First Search, Breadth-First Search, Union Find and Graph patterns.
Solution Approach
The implementation of the solution involves the following key components:
-
Union-Find Data Structure: This is used to group the clean nodes that are directly connected. Each group is represented by a
root
, and all nodes in the same group will have the same root after the connections are processed. -
Parent Array (
p
) and Size Array (size
): The parent arrayp
keeps track of the root of each node, initially set to the node itself. The size arraysize
keeps track of the size of the group that the node belongs to if it is a root, initialized with 1 for all nodes. -
Clean Nodes Identification: An array
clean
is used to identify whether a node is initially infected or not. This helps to avoid processing initially infected nodes during the union-find grouping phase.
The algorithm can be described in the following steps:
-
Initialize Union-Find:
- The parent (
p
) array is initialized with each node as its own parent. - The size (
size
) array is filled with1
since initially, each group (each node being its own group) has a size of one. - A
clean
array is initialized toTrue
for all nodes and then set toFalse
for nodes in theinitial
list of infected nodes.
- The parent (
-
Group Clean Nodes:
- Iterate over pairs of nodes
(i, j)
and if both nodes are clean and they are directly connected ([graph](/problems/graph_intro)[i][j] == 1
), perform theunion
of the two nodes. - The
union
operation finds the root of each node and if they have different roots, it connects them by setting the parent of one root to be the other root and updates the size of the group.
- Iterate over pairs of nodes
-
Count Malware Connections:
- Create a
Counter
for infected connections to groups (cnt
) and a mapmp
from infected node to the groups of clean nodes it can infect. - For each infected node in the
initial
list, determine the unique roots of clean groups that are connected to it. Increment theCounter
for each of these roots.
- Create a
-
Determine the Optimal Node for Removal:
- Initialize variables
mx
andans
to keep track of the maximum number of nodes that can be saved and the corresponding node index. - For each infected node, sum up the sizes of the unique clean groups to which it's connected if those groups have only this one infected connection (
cnt[root] == 1
). This gives us the number of nodes that would be saved by removing this infected node. - Check if this number (
t
) is greater than the current maximum (mx
). If yes, updatemx
. Ift
equalsmx
, then updateans
if the current node has a smaller index.
- Initialize variables
-
Return the Optimal Node:
- After iterating through all the infected nodes and considering the impact of their removal, return the node index stored in
ans
.
- After iterating through all the infected nodes and considering the impact of their removal, return the node index stored in
The patterns used in this solution are union-find for grouping and analysis of connected components, and greedy algorithms with respect to choosing the node whose removal results in the most significant benefit (minimizes the spread of malware).
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let’s go through a small example to illustrate the solution approach. Suppose we have a network of 5
nodes and the following adjacency matrix:
graph = [ [1, 1, 0, 0, 0], [1, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 1, 0, 1, 1], [0, 0, 1, 1, 1] ]
And the array of initially infected nodes is given by:
initial = [1, 3]
This means that nodes 1
and 3
are initially infected. We need to find out which one we should remove to minimize the final number of infections.
Initialize Union-Find
- Create parent array
p
as[0, 1, 2, 3, 4]
and size arraysize
as[1, 1, 1, 1, 1]
. - Set
clean
array as[True, False, True, False, True]
to reflect that nodes1
and3
are infected and nodes0
,2
, and4
are clean.
Group Clean Nodes
- Iterate over the matrix and apply union to the clean nodes that are directly connected:
- Nodes
0
and2
are not directly connected. - Nodes
2
and4
are directly connected, so we union them. Nowp
looks like[0, 1, 2, 3, 2]
andsize
looks like[1, 1, 2, 1, 1]
. Node2
is the root of nodes2
and4
. - Nodes
0
and4
are not directly connected (because node0
is not connected to node2
, which is the root of node4
).
- Nodes
Count Malware Connections
- We initialize a Counter
cnt
to track the connections from infected to clean groups and a mapmp
to track the mapping of infected nodes to clean groups. - For infected node
1
, we find that it connects to clean node0
(group root is0
). - For infected node
3
, it connects to the clean group represented by root2
(node2
).
Determine the Optimal Node for Removal
- We check for node
1
and calculate that by removing it, we can save1
clean node (since it's the only connection to node0
). - For node
3
, removing it would save2
clean nodes (group size of node2
which is the only infected connection). - Since removing node
3
saves more nodes, it has a greater impact, leading us to choose node3
for removal.
Return the Optimal Node
- Based on our calculations, the best node to remove to minimize infections is node
3
, and that would be our final answer for this scenario.
Solution Implementation
1from typing import List
2from collections import Counter
3
4class Solution:
5 def minMalwareSpread(self, graph: List[List[int]], initial: List[int]) -> int:
6 # Helper function to find the root node of 'x' using path compression for optimization
7 def find(x):
8 if parent[x] != x:
9 parent[x] = find(parent[x])
10 return parent[x]
11
12 # Helper function to union two sets, merging the smaller tree into the larger one
13 def union(a, b):
14 root_a, root_b = find(a), find(b)
15 if root_a != root_b:
16 component_size[root_b] += component_size[root_a]
17 parent[root_a] = root_b
18
19 n = len(graph)
20 parent = list(range(n)) # Initialize each node to be its own parent
21 component_size = [1] * n # Keep track of the size of each component
22 is_clean = [True] * n # Keep track if a node is not initially infected
23 for node in initial:
24 is_clean[node] = False
25
26 # Union find to build up components of clean nodes
27 for i in range(n):
28 if not is_clean[i]:
29 continue
30 for j in range(i + 1, n):
31 if is_clean[j] and graph[i][j] == 1:
32 union(i, j)
33
34 affected_component_count = Counter() # Counts the number of initially infected nodes affecting the component
35 node_to_components = {} # Maps an initially infected node to the components it affects
36 for node in initial:
37 affected_components = {find(j) for j in range(n) if is_clean[j] and graph[node][j] == 1}
38 for root in affected_components:
39 affected_component_count[root] += 1
40 node_to_components[node] = affected_components
41
42 # Initialize variables to find the optimal node to remove
43 max_reduction, result_node = -1, 0
44 for node, affected_components in node_to_components.items():
45 # Calculate the total reduction in number of infected nodes
46 # if this 'node' is removed and is the only one affecting those components
47 reduction = sum(component_size[root] for root in affected_components if affected_component_count[root] == 1)
48
49 # Update the result if this 'node' provides a higher reduction of infected nodes
50 # or if there is a tie, take the 'node' with the smaller index
51 if max_reduction < reduction or (max_reduction == reduction and node < result_node):
52 max_reduction, result_node = reduction, node
53
54 return result_node
55
1class Solution {
2
3 private int[] parent; // parent[i] is the parent of node i in the union-find data structure
4 private int[] componentSize; // componentSize[i] is the size of the component whose root is i
5
6 public int minMalwareSpread(int[][] graph, int[] initial) {
7 int n = graph.length;
8 parent = new int[n];
9 componentSize = new int[n];
10
11 // Initialize the union-find data structure
12 for (int i = 0; i < n; ++i) {
13 parent[i] = i;
14 componentSize[i] = 1;
15 }
16
17 // Keep track of clean nodes (non-malware)
18 boolean[] isClean = new boolean[n];
19 Arrays.fill(isClean, true);
20 for (int infectedNode : initial) {
21 isClean[infectedNode] = false;
22 }
23
24 // Union clean nodes that are connected
25 for (int i = 0; i < n; ++i) {
26 if (!isClean[i]) {
27 continue;
28 }
29 for (int j = i + 1; j < n; ++j) {
30 if (isClean[j] && graph[i][j] == 1) {
31 union(i, j);
32 }
33 }
34 }
35
36 // Count how many initial nodes affect each component
37 int[] affectedCount = new int[n];
38 Map<Integer, Set<Integer>> malwareSources = new HashMap<>();
39 for (int infectedNode : initial) {
40 Set<Integer> affectedComponents = new HashSet<>();
41 for (int j = 0; j < n; ++j) {
42 if (isClean[j] && graph[infectedNode][j] == 1) {
43 affectedComponents.add(find(j));
44 }
45 }
46 for (int root : affectedComponents) {
47 affectedCount[root] += 1;
48 }
49 malwareSources.put(infectedNode, affectedComponents);
50 }
51
52 int maxAffected = -1;
53 int result = 0;
54
55 // Choose the node removing which minimizes the spread of malware
56 for (Map.Entry<Integer, Set<Integer>> entry : malwareSources.entrySet()) {
57 int node = entry.getKey();
58 int totalAffected = 0;
59 for (int root : entry.getValue()) {
60 if (affectedCount[root] == 1) { // Only affected by one malware
61 totalAffected += componentSize[root];
62 }
63 }
64 if (maxAffected < totalAffected || (maxAffected == totalAffected && node < result)) {
65 maxAffected = totalAffected;
66 result = node;
67 }
68 }
69 return result;
70 }
71
72 // Find the root of the component to which node x belongs
73 private int find(int x) {
74 if (parent[x] != x) {
75 parent[x] = find(parent[x]);
76 }
77 return parent[x];
78 }
79
80 // Union two components by size
81 private void union(int a, int b) {
82 int rootA = find(a);
83 int rootB = find(b);
84 if (rootA != rootB) {
85 if (componentSize[rootA] < componentSize[rootB]) {
86 parent[rootA] = rootB;
87 componentSize[rootB] += componentSize[rootA];
88 } else {
89 parent[rootB] = rootA;
90 componentSize[rootA] += componentSize[rootB];
91 }
92 }
93 }
94}
95
1class Solution {
2public:
3 std::vector<int> parent; // Array to hold the parent of each node
4 std::vector<int> clusterSize; // Array to track the size of each cluster
5
6 int minMalwareSpread(std::vector<std::vector<int>>& graph, std::vector<int>& initial) {
7 int n = graph.size();
8 parent.resize(n);
9 clusterSize.resize(n);
10
11 // Initialize every node to be its own parent, forming n clusters
12 for (int i = 0; i < n; ++i) {
13 parent[i] = i;
14 }
15
16 // Initially, each cluster size is 1
17 std::fill(clusterSize.begin(), clusterSize.end(), 1);
18
19 // Track which nodes are clean (not initially infected)
20 std::vector<bool> isClean(n, true);
21 for (int infectedNode : initial) {
22 isClean[infectedNode] = false;
23 }
24
25 // Union the clean nodes that are connected
26 for (int i = 0; i < n; ++i) {
27 if (!isClean[i]) continue;
28 for (int j = i + 1; j < n; ++j) {
29 if (isClean[j] && graph[i][j] == 1) {
30 merge(i, j);
31 }
32 }
33 }
34
35 // Count the number of infected clusters
36 std::vector<int> infectedClustersCount(n, 0);
37 std::unordered_map<int, std::unordered_set<int>> infectedClusterRoots;
38 for (int infectedNode : initial) {
39 std::unordered_set<int> uniqueClusters;
40 for (int j = 0; j < n; ++j) {
41 if (isClean[j] && graph[infectedNode][j] == 1) {
42 uniqueClusters.insert(find(j));
43 }
44 }
45 for (int root : uniqueClusters) {
46 ++infectedClustersCount[root];
47 }
48 infectedClusterRoots[infectedNode] = uniqueClusters;
49 }
50
51 int maxClusterSize = -1;
52 int nodeToRemove = 0;
53
54 // Find the node removal that maximizes the clean cluster size
55 for (auto& [infectedNode, roots] : infectedClusterRoots) {
56 int cleanClusterSizeSum = 0;
57 for (int root : roots) {
58 if (infectedClustersCount[root] == 1) { // If the cluster is only infected once
59 cleanClusterSizeSum += clusterSize[root];
60 }
61 }
62 if (maxClusterSize < cleanClusterSizeSum || (maxClusterSize == cleanClusterSizeSum && infectedNode < nodeToRemove)) {
63 maxClusterSize = cleanClusterSizeSum;
64 nodeToRemove = infectedNode;
65 }
66 }
67
68 return nodeToRemove;
69 }
70
71 // Find operation with path compression
72 int find(int x) {
73 if (parent[x] != x) {
74 parent[x] = find(parent[x]); // Compress the path for efficiency
75 }
76 return parent[x];
77 }
78
79 // Merge (union) operation to unify clusters
80 void merge(int a, int b) {
81 int parentA = find(a);
82 int parentB = find(b);
83 if (parentA != parentB) {
84 clusterSize[parentB] += clusterSize[parentA]; // Merge smaller cluster into larger one
85 parent[parentA] = parentB; // Make the parent of the root of A the root of B
86 }
87 }
88};
89
1// Array to hold the parent of each node
2let parent: number[] = [];
3
4// Array to track the size of each cluster
5let clusterSize: number[] = [];
6
7// Find operation with path compression
8function find(x: number): number {
9 if (parent[x] !== x) {
10 // Compress the path for efficiency
11 parent[x] = find(parent[x]);
12 }
13 return parent[x];
14}
15
16// Merge (union) operation to unify clusters
17function merge(a: number, b: number): void {
18 const parentA: number = find(a);
19 const parentB: number = find(b);
20 // Merge smaller cluster into larger one
21 if (parentA !== parentB) {
22 clusterSize[parentB] += clusterSize[parentA];
23 // Make the parent of the root of A the root of B
24 parent[parentA] = parentB;
25 }
26}
27
28// Main function to find the node whose removal minimizes the spread of malware
29function minMalwareSpread(graph: number[][], initial: number[]): number {
30 const n: number = graph.length;
31 parent.length = n;
32 clusterSize.length = n;
33
34 // Initialize every node to be its own parent, forming n clusters
35 for (let i = 0; i < n; ++i) {
36 parent[i] = i;
37 }
38
39 // Initially, each cluster size is 1
40 clusterSize.fill(1);
41
42 // Track which nodes are clean (not initially infected)
43 const isClean: boolean[] = new Array(n).fill(true);
44 for (const infectedNode of initial) {
45 isClean[infectedNode] = false;
46 }
47
48 // Union the clean nodes that are connected
49 for (let i = 0; i < n; ++i) {
50 if (!isClean[i]) continue;
51 for (let j = i + 1; j < n; ++j) {
52 if (isClean[j] && graph[i][j] === 1) {
53 merge(i, j);
54 }
55 }
56 }
57
58 // Count the number of infected clusters
59 const infectedClustersCount: number[] = new Array(n).fill(0);
60 const infectedClusterRoots: Map<number, Set<number>> = new Map();
61 for (const infectedNode of initial) {
62 const uniqueClusters: Set<number> = new Set();
63 for (let j = 0; j < n; ++j) {
64 if (isClean[j] && graph[infectedNode][j] === 1) {
65 uniqueClusters.add(find(j));
66 }
67 }
68 for (const root of uniqueClusters) {
69 infectedClustersCount[root]++;
70 }
71 infectedClusterRoots.set(infectedNode, uniqueClusters);
72 }
73
74 let maxClusterSize: number = -1;
75 let nodeToRemove: number = -1;
76
77 // Find the node removal that maximizes the clean cluster size
78 for (const [infectedNode, roots] of infectedClusterRoots.entries()) {
79 let cleanClusterSizeSum: number = 0;
80 for (const root of roots) {
81 if (infectedClustersCount[root] === 1) { // If the cluster is only infected once
82 cleanClusterSizeSum += clusterSize[root];
83 }
84 }
85 if (
86 maxClusterSize < cleanClusterSizeSum ||
87 (maxClusterSize === cleanClusterSizeSum && infectedNode < nodeToRemove)
88 ) {
89 maxClusterSize = cleanClusterSizeSum;
90 nodeToRemove = infectedNode;
91 }
92 }
93
94 // Ensure that if no node can improve the situation, the smallest index is chosen
95 if (nodeToRemove === -1) {
96 nodeToRemove = Math.min(...initial);
97 }
98
99 return nodeToRemove;
100}
101
Time and Space Complexity
The given code is designed to find the node whose removal would minimize the spread of malware across a network represented by the graph
, considering that some nodes are initially infected, as specified in the initial
list.
Time Complexity
The time complexity of the algorithm can be analyzed based on the steps it performs:
find
function: Eachfind
operation takes amortized O(α(n)) time, where α(n) is the inverse Ackermann function, which is a very slow-growing function and can be considered almost constant in this context.union
function: Theunion
operation essentially callsfind
twice, and then performs constant-time operations. Thus, it also works in amortized O(α(n)) time.- Initialization of data structures
p
andsize
, and marking clean nodes: Takes O(n) time, where n is the number of nodes in the graph. - Nested loop for union-find operations: In the worst case, it could potentially go through all possible edges, leading to O(n^2) operations. Within this loop,
union
is called which is amortized O(α(n)). - Loop for initializing
cnt
andmp
dictionaries: This loop iterates over nodes ininitial
and within that has a loop overn
nodes, resulting in a time complexity of O(k*n), where k is the number of initially infected nodes. - Final loop for finding the answer: This loop goes through nodes in
initial
and performs operations that are linear with respect to the size of the set of connected nodes, leading to O(k*n).
The most time-consuming part is therefore the nested loop for union-find operations, giving the time complexity as O(n^2 * α(n))
.
Space Complexity
The space complexity is determined by the data structures used:
p
andsize
arrays: These are both of size O(n).clean
array: Also of size O(n).cnt
andmp
dictionaries:cnt
will have at most O(n) entries corresponding to different components, whereasmp
will have at most O(k) entries where k is the length of initial.
The main space consumption comes from the arrays and dictionaries, hence the space complexity is O(n + k), where k can be considered much smaller than n in most practical scenarios, simplifying the space complexity to O(n)
.
Overall, the algorithm's time complexity is O(n^2 * α(n))
and its space complexity is O(n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which technique can we use to find the middle of a linked list?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS
https algomonster s3 us east 2 amazonaws com cover_photos bfs svg Breadth First Search on Trees Hopefully by this time you've drunk enough DFS Kool Aid to understand its immense power and seen enough visualization to create a call stack in your mind Now let me introduce the companion spell
Union Find Disjoint Set Union Data Structure Introduction Prerequisite Depth First Search Review problems dfs_intro Once we have a strong grasp of recursion and Depth First Search we can now introduce Disjoint Set Union DSU This data structure is motivated by the following problem Suppose we have sets of elements
Want a Structured Path to Master System Design Too? Don’t Miss This!