433. Minimum Genetic Mutation

MediumBreadth-First SearchHash TableString
Leetcode Link

Problem Description

In this problem, we are given a start gene string and an end gene string, each 8 characters long composed of the characters 'A', 'C', 'G', and 'T'. We also have a bank of valid gene mutations. A mutation is a change of a single character in the gene string, and a gene must be in the bank to be considered a valid gene mutation.

The task is to determine the minimum number of mutations required to change the start gene into the end gene. If it's not possible to reach the end gene from the start gene by only using valid mutations from the bank, we must return -1. It is worth noting that the start gene is considered valid even if it's not present in the bank, but every step after must be a mutation existing in the bank.

To clarify: a gene string mutation is represented as a single character change from the set of possible characters. For example, changing 'AACCGGTT' to 'AACCGGTA' is considered one mutation.

Intuition

To solve this problem, we can use either Breadth-First Search (BFS) or Depth-First Search (DFS), but BFS is more convenient for finding the shortest path in an unweighted graph, which is analogous to finding the minimum number of mutations needed.

The intuition behind the solution is to treat each gene string as a node in a graph and each valid mutation as an edge connecting two nodes. We use BFS to explore the graph level by level, starting from the start gene and working towards the end gene. BFS is ideal for this scenario because it finds the shortest path (minimum mutations) from the start to the end node.

Here are the key steps of the BFS approach:

  1. Initialize a set from the bank to quickly check if a mutation is valid.
  2. Begin with a queue initialized with the start gene and a mutation count of 0.
  3. Use a dictionary to define mutation possibilities for each character.
  4. Dequeue an element and iterate over its characters, changing one character at a time according to the mutation possibilities.
  5. If a new mutation is valid (in the bank), enqueue it with a mutation count incremented by one.
  6. If we reach the end gene, return the mutation count.
  7. If the queue is exhausted without finding the end gene, return -1.

By using this BFS approach, we explore all possible gene mutations in the shortest number of steps, ensuring the minimum number of mutations needed to achieve the end gene, if possible.

Learn more about Breadth-First Search patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the running time of the following code?

1int sqrt(int n) {
2  for (int guess = 1; guess * guess <= n; guess++) {
3    if (guess * guess == n) {
4      return guess;
5    }
6  }
7  return -1;
8}

Solution Approach

The solution uses Breadth-First Search (BFS), which is an algorithm for traversing or searching tree or graph data structures. It explores the neighbor nodes at the present depth prior to moving on to nodes at the next depth level. The implementation includes the following algorithms, data structures, and patterns:

  1. Set: A set is initialized from the bank, which provides O(1) complexity for checking the existence of a gene string.

  2. Queue (Deque): The deque data structure is used as a queue to implement BFS. Items are appended at one end and popped from the other end, mimicking a queue's First In, First Out (FIFO) behavior.

  3. Hash Map: A dictionary mp maps each character to the other three characters, indicating the possible mutations from one character. This is used to easily determine the viable mutations at any position of a gene string.

  4. Level-by-Level Traversal: The algorithm traverses the graph level by level. It considers all gene strings that can be reached from the current gene string in a single mutation as neighbors and explores them in the BFS manner.

  5. Early Stopping: If at any point the end gene is found during the exploration, the function immediately returns the number of mutations (steps) taken to reach this state.

  6. Graph Pruning: To avoid revisiting gene strings and thus entering cycles, every time a new valid gene string is found in the bank, it is removed from the set. This ensures that each gene string is visited only once.

The implementation steps are as follows:

  • Create a set of the gene bank to facilitate quick look-up.
  • Initialize a deque with a tuple containing the start gene and an initial step count of 0.
  • Define a mp dictionary representing the mutation map from each character to the other three possible characters.
  • Start the BFS loop by popping an element from the queue:
    • For each character v in the current gene string t, loop through the possible characters j in mp[v] to generate new strings.
    • Construct a new gene string next by replacing the character at position i with j.
    • If next is found in the set, append it to the queue with the step count incremented and remove it from the set to prevent future visits.
  • Continue the loop until the queue is empty or the end gene is reached.

Here is the complete BFS implementation in code:

1s = set(bank)  # Convert bank into a set for O(1) access
2q = deque([(start, 0)])  # Initialize the queue with a tuple of start gene and step counter
3mp = {'A': 'TCG', 'T': 'ACG', 'C': 'ATG', 'G': 'ATC'}  # Mutation map
4
5# BFS loop
6while q:
7    t, step = q.popleft()  # Pop the next gene string and current step count
8    if t == end:  # End reached
9        return step
10    for i, v in enumerate(t):  # Iterate through each character in the string
11        for j in mp[v]:  # Go through all possible mutations for the character
12            next = t[:i] + j + t[i + 1:]  # Create the new mutated string
13            if next in s:  # Check if the mutation is valid
14                q.append((next, step + 1))  # Enqueue the new gene with incremented steps
15                s.remove(next)  # Remove the visited gene from the set
16return -1  # Return -1 if end can't be reached

This code utilizes BFS to find the shortest path from startGene to endGene, which corresponds to the minimum mutations required to transform one into the other.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is the running time of the following code?

1int sqrt(int n) {
2  for (int guess = 1; guess * guess <= n; guess++) {
3    if (guess * guess == n) {
4      return guess;
5    }
6  }
7  return -1;
8}

Example Walkthrough

Let's demonstrate the solution approach with an example:

Assume our start gene is AACCGGTT and our end gene is AACCGGTA. The bank of valid gene mutations includes AACCGGTA, AACCGGTC, and AACCGGCT.

  1. The set of bank mutations is {AACCGGTA, AACCGGTC, AACCGGCT}. Since we can check the presence of a gene in O(1) time, this helps us quickly validate new mutations.

  2. We initialize our queue (deque) with the start gene and a mutation count of 0: deque([('AACCGGTT', 0)]).

  3. Our mutation map mp is {'A': 'TCG', 'T': 'ACG', 'C': 'ATG', 'G': 'ATC'}, showing all possible single character mutations for each character.

Now, we begin the BFS:

  • We dequeue the first element: ('AACCGGTT', 0).
  • Beginning with the first character, we check all other characters, but since no mutation is present in the bank for a change in the first character, we move to the second character and so on.
  • We finally reach the last character 'T'. The possible mutations for 'T' are 'A', 'C', and 'G'.
  • We mutate the last character to 'A' (resulting in AACCGGTA) and check if it's in the bank. It is, so we enqueue it with a mutation count incremented by one: deque([('AACCGGTA', 1)]).
  • We remove AACCGGTA from the set to avoid revisiting this gene string.
  • Continuing the BFS, we dequeue ('AACCGGTA', 1) and find it's the end gene. The search is over, and we return 1.

Hence, we have found that the minimum number of mutations to get from AACCGGTT to AACCGGTA using the given gene bank is 1.

Solution Implementation

1from collections import deque
2
3class Solution:
4    def minMutation(self, start: str, end: str, bank: list[str]) -> int:
5        # Convert the bank into a set for faster lookup
6        bank_set = set(bank)
7      
8        # Initialize a queue with the starting sequence and initial step count of 0
9        queue = deque([(start, 0)])
10      
11        # Map to store the possible mutations for each gene
12        mutation_map = {'A': 'TCG', 'T': 'ACG', 'C': 'ATG', 'G': 'ATC'}
13      
14        # Perform BFS to find the minimum mutation steps
15        while queue:
16            current_sequence, step_count = queue.popleft()
17          
18            # If the current sequence matches the end sequence, we return the steps taken
19            if current_sequence == end:
20                return step_count
21          
22            # Try all possible single-gene mutations of the current sequence
23            for i, gene in enumerate(current_sequence):
24                for mutation in mutation_map[gene]:
25                    # Form the new sequence by replacing the gene at index i with the mutation
26                    next_sequence = current_sequence[:i] + mutation + current_sequence[i + 1:]
27                    # If the new sequence is in the bank, add it to the queue and remove from bank
28                    if next_sequence in bank_set:
29                        queue.append((next_sequence, step_count + 1))
30                        bank_set.remove(next_sequence)
31      
32        # If we reach this point, there is no way to mutate start into end with the given bank
33        return -1
34
1// Import statements for the HashSet, HashMap, and LinkedList
2import java.util.HashSet;
3import java.util.HashMap;
4import java.util.Set;
5import java.util.Map;
6import java.util.Deque;
7import java.util.LinkedList;
8
9// Pair class implementation (as Pair class was not explicitly provided or imported)
10class Pair<K, V> {
11    private K key;
12    private V value;
13
14    public Pair(K key, V value) {
15        this.key = key;
16        this.value = value;
17    }
18
19    public K getKey() {
20        return key;
21    }
22
23    public V getValue() {
24        return value;
25    }
26}
27
28class Solution {
29    // Function to find the minimum number of mutations to transform start string into end string
30    public int minMutation(String start, String end, String[] bank) {
31        // Create a set from the given bank array for efficient lookups
32        Set<String> bankSet = new HashSet<>();
33        for (String gene : bank) {
34            bankSet.add(gene);
35        }
36
37        // Map to hold the possible mutations for each gene character
38        Map<Character, String> mutationMap = new HashMap<>(4);
39        mutationMap.put('A', "TCG");
40        mutationMap.put('T', "ACG");
41        mutationMap.put('C', "ATG");
42        mutationMap.put('G', "ATC");
43
44        // Initialize a queue to perform Breadth-First Search (BFS)
45        Deque<Pair<String, Integer>> queue = new LinkedList<>();
46        queue.offer(new Pair<>(start, 0)); // Insert start gene with step 0 into the queue
47
48        // Loop until the queue is empty
49        while (!queue.isEmpty()) {
50            // Poll the first pair from the queue
51            Pair<String, Integer> current = queue.poll();
52            String currentGene = current.getKey();
53            int currentStep = current.getValue();
54
55            // If the current gene matches the end gene, return the number of steps taken
56            if (end.equals(currentGene)) {
57                return currentStep;
58            }
59
60            // Iterate through all characters of the current gene
61            for (int i = 0; i < currentGene.length(); ++i) {
62                // Explore each possible mutation based on the mapping
63                for (char mutation : mutationMap.get(currentGene.charAt(i)).toCharArray()) {
64                    // Construct the new mutated gene
65                    String mutatedGene = currentGene.substring(0, i) + mutation + currentGene.substring(i + 1);
66
67                    // If the mutated gene is in the bank
68                    if (bankSet.contains(mutatedGene)) {
69                        // Add the mutated gene with an incremented step count into the queue
70                        queue.offer(new Pair<>(mutatedGene, currentStep + 1));
71                        // Remove the mutated gene from the set to prevent revisiting
72                        bankSet.remove(mutatedGene);
73                    }
74                }
75            }
76        }
77
78        // If no mutation leads to the end gene, return -1
79        return -1;
80    }
81}
82
1#include <unordered_set>
2#include <unordered_map>
3#include <queue>
4#include <string>
5#include <vector>
6
7class Solution {
8public:
9    int minMutation(std::string start, std::string end, std::vector<std::string>& bank) {
10        // Create a set to store valid genes from the bank for O(1) access
11        std::unordered_set<std::string> validGenes(bank.begin(), bank.end());
12
13        // Map to hold potential mutations for each gene character
14        std::unordered_map<char, std::string> mutationsMap;
15        mutationsMap['A'] = "TCG";
16        mutationsMap['T'] = "ACG";
17        mutationsMap['C'] = "ATG";
18        mutationsMap['G'] = "ATC";
19
20        // Queue to perform BFS, storing gene strings and their mutation step counts
21        std::queue<std::pair<std::string, int>> mutationQueue;
22        mutationQueue.push({start, 0});
23
24        // Perform BFS
25        while (!mutationQueue.empty()) {
26            auto current = mutationQueue.front();
27            mutationQueue.pop();
28            std::string gene = current.first;
29            int steps = current.second;
30
31            // Return the number of steps if the end gene is reached
32            if (gene == end) return steps;
33
34            // Try mutating each character of the gene string
35            for (size_t i = 0; i < gene.size(); ++i) {
36                // Check each possible mutation for the current character
37                for (char mut : mutationsMap[gene[i]]) {
38                    std::string nextGene = gene.substr(0, i) + mut + gene.substr(i + 1);
39                  
40                    // If mutation is valid and in the bank, add it to the queue
41                    if (validGenes.count(nextGene)) {
42                        mutationQueue.push({nextGene, steps + 1});
43                        // Remove the gene from set to prevent revisiting
44                        validGenes.erase(nextGene);
45                    }
46                }
47            }
48        }
49
50        // If the end gene cannot be reached, return -1
51        return -1;
52    }
53};
54
1function minMutation(start: string, end: string, bank: string[]): number {
2    // Initialize a queue and start with the starting sequence
3    const sequenceQueue: string[] = [start];
4  
5    // This variable will hold the number of mutations needed
6    let mutationCount: number = 0;
7  
8    // Process the queue until it's empty
9    while (sequenceQueue.length > 0) {
10        // Get the number of sequences to process in this round
11        const numberOfSequences: number = sequenceQueue.length;
12
13        // Process all sequences in the current round
14        for (let i = 0; i < numberOfSequences; i++) {
15            // Remove the first sequence from the queue to process it
16            const currentSequence: string = sequenceQueue.shift();
17
18            // If the current sequence is the target end sequence, return mutation count
19            if (currentSequence === end) {
20                return mutationCount;
21            }
22
23            // Iterate through the bank in reverse order to remove elements while iterating
24            for (let j = bank.length - 1; j >= 0; j--) {
25                const bankSequence: string = bank[j];
26                let difference: number = 0;
27
28                // Compare the current sequence with the bank sequence and count the differences
29                for (let k = 0; k < 8; k++) {
30                    if (currentSequence[k] !== bankSequence[k]) {
31                        difference++;
32                    }
33                }
34
35                // A valid mutation is one where there is exactly one difference
36                if (difference === 1) {
37                    // Add the new mutation to the queue and remove it from the bank
38                    sequenceQueue.push(...bank.splice(j, 1));
39                }
40            }
41        }
42        // Increment the mutation counter after each round
43        mutationCount++;
44    }
45
46    // If no mutation path was found to reach the end sequence, return -1
47    return -1;
48}
49
Not Sure What to Study? Take the 2-min Quiz:

In a binary min heap, the minimum element can be found in:

Time and Space Complexity

The given code snippet is a solution to the genetic mutation problem, which finds the minimum number of mutations needed to transform the start gene sequence into the end gene sequence using a given set of gene mutations from the bank.

Time Complexity

To analyze time complexity, let's look at the components involved:

  1. The BFS traversal: The algorithm uses a queue to perform a breadth-first search.

  2. Checking each character and trying all possible mutations: For each gene sequence, the algorithm checks each of its N characters (where N is the length of the gene sequence, typically 8 in the problem constraint) and attempts to mutate it to all other possible characters (3 possibilities per character, since there are 4 possible nucleotides).

  3. Checking and removing gene sequences from the set: Upon each successful mutation, it checks if the new sequence is in the set and removes it if present.

Considering that there are M gene sequences in the bank:

  • In the worst case, the algorithm might have to visit all M sequences in the bank.
  • For each sequence, we perform N character checks and 3 possible mutations.
  • Each mutation check involves an O(1) set containment check and O(1) removal operation (since it's a set).

Thus, the total time complexity can be estimated as O(M * N * 3), which simplifies to O(M * N) as constants can be ignored in Big O notation.

Space Complexity

The primary space-consuming structures are:

  1. The queue used for BFS which, in the worst case, might hold all M sequences before mutation at the same level.
  2. The set s, also holding the M gene sequences from the bank.
  3. The mp dictionary with a constant size of 4 (not dependent on N or M), therefore can be considered O(1).

So the space complexity is dominated by the queue and the set, both of which may hold up to M gene sequences. Hence, the space complexity is O(M).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following is equvalent to O(3*2^n + n^3 + n!+ log n)?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫