924. Minimize Malware Spread


Problem Description

In this problem, we are given a network of n nodes where connections between the nodes are represented by an n x n adjacency matrix called [graph](/problems/graph_intro). In this matrix, if graph[i][j] == 1, then node i is directly connected to node j. Some of these nodes are initially infected with malware (given in the array initial). The malware spreads between directly connected nodes as long as at least one of the nodes has the malware. This spread continues until it cannot be passed along any further between nodes.

Our goal is to identify one node that we can remove from the initially infected list to minimize the total number of nodes that will be infected after the malware has stopped spreading. The function should return the smallest-indexed node that would result in the minimum number of infections.

It's important to note that removing a node from the initially infected list doesn't guarantee that it will not become infected later due to the malware's spread.

Flowchart Walkthrough

First, let's analyze the problem using the Flowchart. Here is a detailed step-by-step walkthrough:

Is it a graph?

  • Yes: The network is given as a graph where computers are nodes, and connections are edges.

Is it a tree?

  • No: The graph in the problem can contain cycles as it is not necessarily hierarchical.

Is the problem related to directed acyclic graphs (DAGs)?

  • No: The connectivity relation does not involve directionality or acyclic constraints, as it focuses on spreading malware across possibly circular networks.

Is the problem related to shortest paths?

  • No: We are not looking to find the shortest path; rather, we aim to minimize the spread of malware by potentially removing a node.

Does the problem involve connectivity?

  • Yes: The core of the problem is about understanding and manipulating the connectivity of the graph to minimize the number of nodes (computers) affected by malware.

Is the graph weighted?

  • No: The graph does not specify weights on the connections; it simply indicates whether nodes (computers) are connected or not.

Since the problem is about exploring connectivity to identify components and minimize malware spread without the context of weights, and it is not a tree structure or a requirement for shortest paths, DFS (Depth-First Search) is a suitable method. The approach using DFS will efficiently allow us to explore each component formed by nodes and edges, helping to identify the largest component affected by malware that could be minimized by removing certain affected nodes.

Conclusion: the Flowchart suggests using DFS for this unweighted connectivity problem to efficiently analyze and potentially minimize the spread of malware in the network.

Intuition

To solve this problem, the code takes a union-find approach to determine which nodes are connected in a group, and to gather information about the size of each connected component within the graph. The algorithm works through the following steps:

  1. Initialize each node's parent to itself, creating n separate sets, and set the initial size of each set to 1.
  2. Loop through the adjacency matrix [graph](/problems/graph_intro) and perform a union of each pair of directly connected nodes (graph[i][j] == 1). This is done by finding the root parents of both nodes and setting one as the parent of the other, effectively joining the sets. The size array is updated to reflect the new size of the combined set.
  3. Iterate over the initially infected nodes after sorting them. This ensures that if multiple nodes can be removed to achieve the same minimized infection result, we will return the one with the smallest index.
  4. For each initially infected node, temporarily exclude it and tally the total infections that would result by adding the sizes of the unique parent sets containing the other initially infected nodes.
  5. If this tally is smaller than the current minimum number of infections, update the minimum and set the result to the current node.

By executing these steps, the algorithm finds the node which, when removed, would lead to the smallest number of infected nodes. In cases where multiple nodes could be chosen, it opts for the one with the smallest index.

Learn more about Depth-First Search, Breadth-First Search, Union Find and Graph patterns.

Solution Approach

The solution uses the union-find algorithm to group connected nodes and dynamically track the size of these groups. Union-find is a data structure that provides efficient operations to find the root of an element (indicating which group it belongs to) and to unify two elements if they are in different sets (or groups). Here is a detailed walk-through of its implementation in the given solution:

Union-Find Data Structure

  • A parent array p is initialized, where p[i] represents the parent of node i. Initially, each node is its own parent.
  • A size array is initialized, where size[i] represents the size of the tree for which node i is the root. Initially, each node is in a separate group of size 1.

Union Operation

  • Two loops iterate over the rows and columns of the adjacency matrix representing the graph.
  • If graph[i][j] == 1, nodes i and j are directly connected and should be considered part of the same group.
  • We use a find function to get the root parent for nodes i and j, which effectively finds which groups they belong to.
  • If they have the same root parent, they are already in the same group, and no action is taken.
  • Otherwise, they belong to different groups, and a union is performed by updating the parent of one root to the other. The size of the new root's group is incremented by the size of the merged group's size.

Find Operation

  • To determine which node is the parent (or root) of a given node, and thus which group it belongs to, we use the find function.
  • The find function will recursively update the node's parent to its root by path compression, which flattens the structure of the tree and speeds up subsequent find operations.

Identifying the Critical Node

  • The array of initially infected nodes initial is sorted to ensure the smallest index is prioritized if there are multiple candidates.
  • A loop iterates over the sorted initial array to consider the removal of each node.
  • Inside this loop, a set s is created to store unique root parents associated with the remaining initially infected nodes.
  • A temporary count t is used to tally the total possible infections without the considered node.
  • A nested loop iterates through the initial array again to count infections caused by the remaining initially infected nodes.
  • If the root of the current node is not already in the set s (to avoid double counting), the size of the group with that root is added to t (since removing a node from initial doesn't prevent it from being infected by others).
  • If the tally t is less than the current minimum mi, we update mi to the new lower count and set res to the index of the node considered for removal.
  • The loop continues until all initially infected nodes are considered, ensuring that the one leading to the smallest M(initial) is found.

Conclusion

By leveraging the union-find algorithm with path compression, we can efficiently identify the node that minimizes the spread of malware after exclusion and ensure we return the smallest index in the case of ties.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's say we have a network of 4 nodes, represented by the adjacency matrix graph below:

graph = [
  [1, 1, 0, 0],
  [1, 1, 1, 0],
  [0, 1, 1, 0],
  [0, 0, 0, 1]
]

And we have an array initial showing which nodes are initially infected with malware:

initial = [1, 3]

We can walk through the solution approach as follows:

  1. Union-Find Initialization: We start by initializing the parent and size arrays.

    • parent would be [0, 1, 2, 3]
    • size would be [1, 1, 1, 1]
  2. Union Operations: Next, we examine the adjacency matrix and perform union operations.

    • For graph[0][1], nodes 0 and 1 are connected, so we unite them, which leads to:
      • parent being updated to [0, 0, 2, 3]
      • size being updated to [2, 1, 1, 1] because the size of the group with root 0 now includes node 1.
    • For graph[1][2], nodes 1 and 2 are connected, and indirectly node 0 is also connected to node 2, so we unite them under root 0:
      • parent being updated to [0, 0, 0, 3]
      • size being updated to [3, 1, 1, 1]
    • graph[3] has no connections thus no union operations are needed.
  3. Sorting Initially Infected Nodes: We ensure that initial is sorted to consider the smallest index first:

    • Sorted initial would be [1, 3]
  4. Identifying the Critical Node: We start with the first node in initial and imagine removing it.

    • Considering node 1 for removal, we find that:
      • Node 3 is in its own group and will only infect itself.
      • There are no other initially infected nodes connected to node 1, as we temporarily exclude it.
    • The total number of infections in this case would be 1 (from node 3).
    • Next, considering node 3 for removal:
      • Now nodes 0, 1, and 2 are connected and since node 1 is infected, all three will get infected.
      • The total would then be 3.
    • Comparing the two removals, we find that excluding node 1 leads to fewer infections.
  5. Result: From this process, we find that removing node 1 from initial leads to the minimum number of infections. The final answer is 1, as it is the smallest-indexed node that minimizes the spread of malware.

In this simple network, the algorithm efficiently finds that removing node 1, despite it potentially getting infected through other paths, leads to the lowest number of total infected nodes.

Solution Implementation

1class Solution:
2    def minMalwareSpread(self, graph: List[List[int]], initial: List[int]) -> int:
3        # Number of nodes in the graph.
4        num_nodes = len(graph)
5      
6        # Parent array for Disjoint Set Union (DSU).
7        parent = list(range(num_nodes))
8      
9        # Size array to keep track of the size of each component.
10        component_size = [1] * num_nodes
11
12        # Function to find the representative (leader) of the component.
13        def find(x):
14            if parent[x] != x:
15                parent[x] = find(parent[x])
16            return parent[x]
17
18        # Union operation to merge two components into one.
19        for i in range(num_nodes):
20            for j in range(i + 1, num_nodes):
21                if graph[i][j] == 1:
22                    parent_i, parent_j = find(i), find(j)
23                    if parent_i != parent_j:
24                        parent[parent_i] = parent_j
25                        component_size[parent_j] += component_size[parent_i]
26
27        # Initialize to positive infinity for comparison purposes.
28        min_infected = float('inf')
29      
30        # Result node which gets minimum malware spread.
31        result_node = initial[0]
32      
33        # Sort the list to ensure that the smallest index is returned in case of ties.
34        initial.sort()
35      
36        for i in range(len(initial)):
37            total_infected = 0
38            infected_set = set()
39            for j in range(len(initial)):
40                if i == j:
41                    continue
42                # Find the parent node of the j-th initially infected node.
43                parent_node = find(initial[j])
44                if parent_node in infected_set:
45                    continue
46                # Add this parent node to the infected set and increment the total infected counter.
47                infected_set.add(parent_node)
48                total_infected += component_size[parent_node]
49          
50            # Compare the number of infected nodes with the current minimum.
51            if min_infected > total_infected:
52                min_infected = total_infected
53                result_node = initial[i]
54              
55        # Return the node that if removed, minimizes the spread of malware.
56        return result_node
57
1import java.util.Arrays;
2import java.util.HashSet;
3import java.util.Set;
4
5class Solution {
6    private int[] parent; // Array representing the parent of each node in the union-find structure.
7
8    public int minMalwareSpread(int[][] graph, int[] initial) {
9        int nodeCount = graph.length; // Number of nodes in the graph.
10        parent = new int[nodeCount];  // Initialize the parent array.
11        for (int i = 0; i < nodeCount; ++i) {
12            parent[i] = i; // Each node is initially its own parent.
13        }
14
15        // Array to keep track of the size of each connected component.
16        int[] componentSize = new int[nodeCount];
17        Arrays.fill(componentSize, 1); // Initially, each component is of size 1.
18
19        // Perform union operations for all connected nodes in the graph.
20        for (int i = 0; i < nodeCount; ++i) {
21            for (int j = i + 1; j < nodeCount; ++j) {
22                if (graph[i][j] == 1) {
23                    int parentA = find(i), parentB = find(j);
24                    if (parentA != parentB) { // If the nodes have different parents, perform union.
25                        parent[parentA] = parentB;
26                        // The size of the new root is the sum of the sizes of the components.
27                        componentSize[parentB] += componentSize[parentA];
28                    }
29                }
30            }
31        }
32
33        // Sort the initial infected nodes for stable selection later on.
34        Arrays.sort(initial);
35      
36        int minimumInfected = Integer.MAX_VALUE; // Variable to hold the minimum number of infections.
37        int resultNode = initial[0]; // Resulting node that, if removed, minimizes the spread of malware.
38
39        // Iterate through all initially infected nodes.
40        for (int i = 0; i < initial.length; ++i) {
41            int infectedCount = 0; // Count of potentially infected nodes.
42            Set<Integer> uniqueComponents = new HashSet<>(); // Set to track unique components.
43          
44            // Compare the current node against all other infected nodes.
45            for (int j = 0; j < initial.length; ++j) {
46                if (i == j) { // Skip the current node.
47                    continue;
48                }
49                int root = find(initial[j]); // Find the root of the j-th infected node.
50
51                // If this component has been already considered, skip it.
52                if (uniqueComponents.contains(root)) {
53                    continue;
54                }
55                uniqueComponents.add(root); // Mark this component as seen.
56                // Add the size of this component to the count of infected nodes.
57                infectedCount += componentSize[root];
58            }
59
60            // If the current node results in a lower number of infections, update the result.
61            if (minimumInfected > infectedCount) {
62                minimumInfected = infectedCount;
63                resultNode = initial[i];
64            }
65        }
66
67        return resultNode;
68    }
69
70    private int find(int x) {
71        // Find the root of the component for node x with path compression.
72        if (parent[x] != x) {
73            parent[x] = find(parent[x]);
74        }
75        return parent[x];
76    }
77}
78
1#include <vector>
2#include <algorithm>
3#include <unordered_set>
4using namespace std;
5
6class Solution {
7public:
8    vector<int> parent; // Parent array for the Union-Find data structure.
9
10    // Method to minimize malware spread on the graph, given the initial infected nodes.
11    int minMalwareSpread(vector<vector<int>>& graph, vector<int>& initial) {
12        int n = graph.size(); // Number of nodes in the graph.
13        parent.resize(n);
14        // Initialize parents of each node to itself and sizes to 1.
15        for (int i = 0; i < n; ++i) {
16            parent[i] = i;
17        }
18        vector<int> size(n, 1); // Size array to keep track of the size of each set.
19      
20        // Union-find algorithm to group the connected components.
21        for (int i = 0; i < n; ++i) {
22            for (int j = i + 1; j < n; ++j) {
23                if (graph[i][j]) {
24                    int parentA = find(i), parentB = find(j);
25                    if (parentA != parentB) {
26                        parent[parentA] = parentB; // Merge the sets.
27                        size[parentB] += size[parentA];
28                    }
29                }
30            }
31        }
32      
33        int minInfected = INT_MAX;
34        int result = initial[0];
35        sort(initial.begin(), initial.end()); // Sort initial infected nodes for stability.
36
37        // Evaluate the impact of removing each infected node.
38        for (int i = 0; i < initial.size(); ++i) {
39            int totalInfected = 0;
40            unordered_set<int> uniqueParents;
41            for (int j = 0; j < initial.size(); ++j) {
42                if (i == j) continue;
43                int root = find(initial[j]);
44                // Count unique uninfected components.
45                if (uniqueParents.count(root)) continue;
46                uniqueParents.insert(root);
47                totalInfected += size[root];
48            }
49            // If removing this node decreases the number of infected nodes.
50            if (minInfected > totalInfected) {
51                minInfected = totalInfected;
52                result = initial[i];
53            }
54        }
55        return result; // Return the node that minimizes the spread.
56    }
57
58    // Find method with path compression for the Union-Find data structure.
59    int find(int x) {
60        if (parent[x] != x) {
61            parent[x] = find(parent[x]);
62        }
63        return parent[x];
64    }
65};
66
1function minMalwareSpread(graph: number[][], initial: number[]): number {
2  let n = graph.length;
3
4  // Initialize parents of each node to itself.
5  let parent: number[] = new Array(n).fill(0).map((value, index) => index);
6  let size: number[] = new Array(n).fill(1); // Initialize the size of each set to 1.
7
8  // Find method with path compression for Union-Find.
9  function find(x: number): number {
10    if (parent[x] !== x) {
11      parent[x] = find(parent[x]);
12    }
13    return parent[x];
14  }
15
16  // Union-Find algorithm to group the connected components.
17  for (let i = 0; i < n; ++i) {
18    for (let j = i + 1; j < n; ++j) {
19      if (graph[i][j] === 1) {
20        let parentA = find(i);
21        let parentB = find(j);
22        if (parentA !== parentB) {
23          parent[parentA] = parentB; // Merge the sets.
24          size[parentB] += size[parentA];
25        }
26      }
27    }
28  }
29
30  // Sort the initial infected nodes for stability.
31  initial.sort((a, b) => a - b);
32
33  let minInfected = Number.MAX_SAFE_INTEGER;
34  let result = initial[0];
35
36  // Evaluate the impact of removing each infected node.
37  for (let i = 0; i < initial.length; ++i) {
38    let totalInfected = 0;
39    let uniqueParents: Set<number> = new Set();
40
41    for (let j = 0; j < initial.length; ++j) {
42      if (i === j) continue;
43      let root = find(initial[j]);
44
45      // Count unique uninfected components.
46      if (!uniqueParents.has(root)) {
47        uniqueParents.add(root);
48        totalInfected += size[root];
49      }
50    }
51
52    // Update the result if the current node reduces the number of infected nodes.
53    if (minInfected > totalInfected) {
54      minInfected = totalInfected;
55      result = initial[i];
56    }
57  }
58
59  // Return the node that minimizes the spread.
60  return result;
61}
62

Time and Space Complexity

Time Complexity

The time complexity of the code is determined by several nested loops and the union-find operations.

  1. The outer loop for merging nodes of the graph runs n times, where n is the number of nodes in the graph.
  2. The inner loop for checking all possible pairs runs (n-1) + (n-2) + ... + 1 times, which simplifies to O(n^2) when summing the arithmetic series.
  3. The find function, which is called within these loops, has an almost constant time complexity due to path compression. The worst-case scenario for each call can be assumed to be O(log n), but because of path compression, it tends to be much closer to O(1) over many calls.
  4. The last loop iterates over the initial infected nodes, which let's say there are k infected nodes. It calls the find function and sums the sizes of connected components, which could be O(n) in a sparse graph but is usually much less if infected nodes are clustered.

Considering these points, the time complexity can be estimated as follows:

  • Union-find operations + loop through all pairs of nodes: O(n^2 * \alpha(n)), where \alpha is the Inverse Ackermann function, which is a very slowly growing function and for all practical purposes can be considered constant, thus reducing this to O(n^2).
  • Loop through initial nodes and associated operations: O(k * n), where k is the number of initial infected nodes.

The final time complexity is O(n^2 + k * n), but because k <= n, this simplifies to O(n^2).

Space Complexity

The space complexity of the code is determined by the storage for the parent and size array, the set s, and any overhead needed for the union-find data structure.

  1. The parent array p uses O(n) space.
  2. The size array also uses O(n) space.
  3. The set s in the worst case might store k elements where k is the number of initial infected nodes.

Thus, the space complexity is O(n + k), which in the worst case simplifies to O(n) if every node is an initially infected node.

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Is the following code DFS or BFS?

void search(Node root) {
  if (!root) return;
  visit(root);
  root.visited = true;
  for (Node node in root.adjacent) {
    if (!node.visited) {
      search(node);
    }
  }
}

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!