966. Vowel Spellchecker

MediumArrayHash TableString
Leetcode Link

Problem Description

In this problem, we are tasked with creating a spellchecker system that takes a list of valid words, called wordlist, and a list of query words. The goal is to correct each query word based on specific spelling mistake categories and rules. There are two categories of spelling mistakes the system needs to handle:

  1. Capitalization: If the query word differs from valid words only by capital letters, it should be treated as a correct match. The correction should return the word from wordlist in the same case as it appears there.

    • For instance, if wordlist = ["apple", "Banana"], queries like "APPLE" and "banana" should be corrected to "apple" and "Banana" respectively.
  2. Vowel Errors: If the query word differs from valid words only by the vowels ('a', 'e', 'i', 'o', 'u'), it should be treated as a correct match. The vowels in the query word can be replaced with any other vowel to find a match.

    • For example, wordlist = ["apple", "Banana"], and the query "appla" could be corrected to "apple" because changing 'a' to 'e' in query makes it match wordlist.

It's important to note that the spellchecker should follow a specific order of precedence while correcting:

  1. An exact case-sensitive match should be returned as is.
  2. If there is a case-insensitive match, the first such match in the wordlist should be returned.
  3. If there is a match allowing for vowel errors (case-insensitive), the first such match should be returned.
  4. If no match is found, return an empty string.

Given some queries, we return a list of words answer, where answer[i] is the correct word for query = queries[i].

Intuition

To approach this problem efficiently, we need a way to check for each type of correction quickly. We'll preprocess the wordlist to handle different cases and vowel variations, and then we'll use these preprocessed structures to correct each query.

Here's how we break down the solution approach:

  1. Exact Matches: We'll first create a set of all words from the wordlist to check for exact case-sensitive matches quickly. This allows us to find exact matches in O(1) time.

  2. Case-Insensitive Matches: We'll create a dictionary to store the words in lowercase form. If a word in wordlist happens to be in different cases, we only need to store the first occurrence as per our rule. The first occurrence can be heuristically considered the most common or correct form.

  3. Vowel Errors: Handling vowel errors involves creating a pattern where all vowels are replaced with a placeholder (like an asterisk ''). For example, "apple" and "epple" would both match the pattern "ppl". A dictionary is created to store these patterns with corresponding words from wordlist. As before, we only store the first occurrence of each pattern.

When checking a query:

  • If the query is in the exact match set, it is already correct.
  • Next, we convert the query to lowercase and look it up in the case-insensitive dictionary. If a match is found, we use the word stored there.
  • If there's no case-insensitive match, we replace vowels in the lowercase query with the placeholder and look for this pattern in the vowel error dictionary.
  • If all else fails, we append an empty string indicating no correction was found.

This solution ensures we can process each query effectively, leveraging the different lookup structures for each type of mistake, while respecting the precedence rules.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?

Solution Approach

In the provided solution, several data structures are used to pre-process the wordlist and to provide an efficient lookup for each type of query correction needed:

  1. A set named s is used to store all the words from the wordlist. This allows for quick O(1) checks for exact matches, which is our highest priority according to the problem's rule set.

  2. Two dictionaries low and pat are used to prepare for case-insensitive and vowel error matches. The low (short for "lowercase") dictionary maps a lowercase version of each word to the first occurrence of it in the wordlist, ensuring we respect the order of occurrence rule. Likewise, the pat (short for "pattern") dictionary is for patterns created by replacing vowels with a placeholder.

  3. The function f(w) takes a word w, converts it to lowercase, and then replaces each vowel with a "*" to create a standardized pattern for vowel errors. Using this function we generate the keys for the pat dictionary which helps us lookup words with a vowel placement mistake.

  4. The main spellchecker function iterates over each query and applies the following rules in order of precedence:

    • First, it checks for an exact match by seeing if the query is in the s set. If it is, the query is immediately appended to the answers list.
    • Second, the query is converted to lowercase and we check the low dictionary to find a case-insensitive match. If found, the corresponding word is appended to the answers list.
    • Third, for vowel errors, we apply the f function on the query to create a pattern and then check the pat dictionary. If found, we append the word from the pat dictionary that corresponds to the pattern created by the query.
    • Finally, if none of the above conditions result in a match, an empty string is appended to the answers list, indicating that no correction was found for the query.

This extended processing for each type of mistake allows the function to handle all cases described in the problem statement efficiently and in a manner that respects the precedence of spell correction rules. Vowel error checks are only made if no case-sensitive or case-insensitive matches are found, thus adhering to the order specified, and ensuring the most accurate spell correction is provided.

Each query is handled in O(1) time lookups thanks to the data structures used, making the entire solution highly efficient even for large lists of words and queries.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

In a binary min heap, the maximum element can be found in:

Example Walkthrough

Let's consider a small example to illustrate the solution approach described earlier.

Assuming we have a wordlist as ["Kiwi", "banana", "Apple", "apricot", "Berry"] and our queries are ["berry", "BANANA", "apPle", "Apricot", "kwi", "Berri"]. We want to spellcheck each query and return the correct word based on the rules.

Let's go through each step:

  1. Exact Matches: We create a set s containing each word from the wordlist exactly as it is, which will allow us to quickly identify any exact matches.

    s: {"Kiwi", "banana", "Apple", "apricot", "Berry"}

  2. Case-Insensitive Matches: We create a dictionary low where each lowercase word from the wordlist is a key, and its value is the first occurrence of that word from the wordlist.

    low: {"kiwi": "Kiwi", "banana": "banana", "apple": "Apple", "apricot": "apricot", "berry": "Berry"}

  3. Vowel Errors: Next, we create a dictionary pat where we replace the vowels in each lowercase word from the wordlist with a "*" to generate patterns that help us match words with incorrect vowels.

    pat: {"k*w*": "Kiwi", "b*n*n*": "banana", "ppl*": "Apple", "pr*c*t": "apricot", "b*rry": "Berry"}

Now, we move on to checking the queries.

  • For query = "berry", we first look for an exact match in set s. It's not there, so we check the case-insensitive match in low and find "berry" which returns "Berry" as the correction.
  • For query = "BANANA", again, no exact match, but the lowercase "banana" is in low and we get "banana" as the correction.
  • For query = "apPle", no exact match, but in low, we find "apple" giving us "Apple" as the correction.
  • For query = "Apricot", no exact match. We lowercase the query to "apricot", and find it in low, returning "apricot".
  • For query = "kwi", no exact match, no case-insensitive match, we create a pattern "kw*" which doesn't match any pattern in pat, so we return an empty string.
  • For query = "Berri", no exact match, no case-insensitive match, we create the pattern "b*rr*", no matching pattern is found in pat, so we return an empty string as well.

Hence, our spellchecker returns the list: ["Berry", "banana", "Apple", "apricot", "", ""].

In this example, only when the exact and case-insensitive matches are not found do we consider vowel errors. Each step respects the order of precedence and uses efficient data structures for quick lookups, demonstrating the effectiveness of the solution approach.

Solution Implementation

1class Solution:
2    def spellchecker(self, wordlist, queries):
3        # Helper function to replace vowels in a word with '*'
4        def mask_vowels(word):
5            return ''.join('*' if char.lower() in 'aeiou' else char for char in word)
6
7        # Original set of words for exact match
8        exact_matches = set(wordlist)
9        # Dictionary for case-insensitive matches
10        case_insensitive_matches = {}
11        # Dictionary for vowel-error matches
12        vowel_error_matches = {}
13      
14        # Preprocess wordlist to fill the above dictionaries
15        for word in wordlist:
16            lowercase_word = word.lower()
17            masked_word = mask_vowels(lowercase_word)
18          
19            # For case-insensitive matches, store the first occurrence
20            if lowercase_word not in case_insensitive_matches:
21                case_insensitive_matches[lowercase_word] = word
22          
23            # For vowel-error matches, store the first occurrence
24            if masked_word not in vowel_error_matches:
25                vowel_error_matches[masked_word] = word
26
27        # List for storing spellchecked results
28        spellchecked_queries = []
29      
30        # Check each query against exact, case-insensitive, and vowel-error matches
31        for query in queries:
32            # Check for exact match
33            if query in exact_matches:
34                spellchecked_queries.append(query)
35                continue
36          
37            # Convert the query to lowercase for case-insensitive and vowel-error comparisons
38            lowercase_query = query.lower()
39            # Check for case-insensitive match
40            if lowercase_query in case_insensitive_matches:
41                spellchecked_queries.append(case_insensitive_matches[lowercase_query])
42                continue
43          
44            # Mask the vowels in the query
45            masked_query = mask_vowels(lowercase_query)
46            # Check for vowel-error match
47            if masked_query in vowel_error_matches:
48                spellchecked_queries.append(vowel_error_matches[masked_query])
49                continue
50          
51            # No match found
52            spellchecked_queries.append("")
53      
54        return spellchecked_queries
55
1class Solution {
2    // This function returns an array of spellcheck results for the given queries,
3    // based on the provided wordlist.
4    public String[] spellChecker(String[] wordList, String[] queries) {
5        // Initialize a set to hold the wordlist for fast look-up
6        Set<String> wordSet = new HashSet<>();
7        // Maps to hold lowercase and vowel-less variations of words
8        Map<String, String> wordLowerCaseMap = new HashMap<>();
9        Map<String, String> wordPatternMap = new HashMap<>();
10      
11        // Populate the data structures with the word list
12        for (String word : wordList) {
13            // Add the exact word to the set
14            wordSet.add(word);
15            // Convert to lowercase
16            String wordLowerCase = word.toLowerCase();
17            // If the lowercase word isn't in the map, put it with the original word
18            wordLowerCaseMap.putIfAbsent(wordLowerCase, word);
19            // Create and put the vowel-less pattern if absent
20            wordPatternMap.putIfAbsent(transformPattern(wordLowerCase), word);
21        }
22      
23        // Prepare an array to hold the answers
24        int queryCount = queries.length;
25        String[] answers = new String[queryCount];
26      
27        // Process each query
28        for (int i = 0; i < queryCount; ++i) {
29            String query = queries[i];
30
31            // Check if the exact word exists in the word set
32            if (wordSet.contains(query)) {
33                answers[i] = query;
34                continue;
35            }
36
37            // Convert to lowercase and check if it exists in the lowercase map
38            query = query.toLowerCase();
39            if (wordLowerCaseMap.containsKey(query)) {
40                answers[i] = wordLowerCaseMap.get(query);
41                continue;
42            }
43          
44            // Apply the vowel-less pattern and check if it exists in the pattern map
45            query = transformPattern(query);
46            if (wordPatternMap.containsKey(query)) {
47                answers[i] = wordPatternMap.get(query);
48                continue;
49            }
50          
51            // If none of the conditions are met, the answer is an empty string
52            answers[i] = "";
53        }
54      
55        return answers;
56    }
57
58    // This function creates a version of the string where all vowels are replaced by an asterisk.
59    private String transformPattern(String word) {
60        char[] characters = word.toCharArray();
61        for (int i = 0; i < characters.length; ++i) {
62            if ("aeiou".indexOf(characters[i]) >= 0) {
63                characters[i] = '*';
64            }
65        }
66        return new String(characters);
67    }
68}
69
1#include <vector>
2#include <string>
3#include <unordered_set>
4#include <unordered_map>
5#include <algorithm>
6
7class Solution {
8public:
9    std::vector<std::string> spellchecker(std::vector<std::string>& wordList, std::vector<std::string>& queries) {
10        // Create a set from the wordList for quick existence checks.
11        std::unordered_set<std::string> wordsSet(wordList.begin(), wordList.end());
12      
13        // Mapping to store lowercase versions of words.
14        std::unordered_map<std::string, std::string> lowercaseMap;
15      
16        // Mapping to store patternized versions of words (vowels replaced with '*').
17        std::unordered_map<std::string, std::string> patternMap;
18      
19        // Function to replace vowels in a word with '*'.
20        auto replaceVowels = [](std::string& word) {
21            std::string result;
22            for (char& c : word) {
23                if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
24                    result += '*';
25                } else {
26                    result += c;
27                }
28            }
29            return result;
30        };
31
32        // Populate lowercaseMap and patternMap from the words in wordList.
33        for (auto& word : wordList) {
34            std::string tempWord = word;
35            // Convert to lowercase.
36            std::transform(tempWord.begin(), tempWord.end(), tempWord.begin(), ::tolower);
37            if (!lowercaseMap.count(tempWord)) {
38                lowercaseMap[tempWord] = word;
39            }
40          
41            // Convert to vowel-patterned form.
42            tempWord = replaceVowels(tempWord);
43            if (!patternMap.count(tempWord)) {
44                patternMap[tempWord] = word;
45            }
46        }
47
48        // Vector to store the answers to the queries.
49        std::vector<std::string> answers;
50      
51        // Process each query.
52        for (auto& query : queries) {
53            // Exact match check.
54            if (wordsSet.count(query)) {
55                answers.emplace_back(query);
56                continue;
57            }
58
59            // Convert query to lowercase for case-insensitive match.
60            std::transform(query.begin(), query.end(), query.begin(), ::tolower);
61          
62            // Check for case-insensitive match.
63            if (lowercaseMap.count(query)) {
64                answers.emplace_back(lowercaseMap[query]);
65                continue;
66            }
67
68            // Replace vowels in query to '*' for vowel-insensitive match.
69            query = replaceVowels(query);
70          
71            // Check for vowel-insensitive match.
72            if (patternMap.count(query)) {
73                answers.emplace_back(patternMap[query]);
74                continue;
75            }
76
77            // If no matches were found, add an empty string.
78            answers.emplace_back("");
79        }
80      
81        // Return the final answers vector.
82        return answers;
83    }
84};
85
1// Import required TypeScript modules.
2import { Set, Map } from "typescript-collections";
3
4// Function to replace vowels in a word with '*'.
5const replaceVowels = (word: string): string => {
6    // Define a regex pattern for vowels (a, e, i, o, u).
7    const vowelPattern = /[aeiou]/gi;
8    // Replace all vowels in the word with '*' and return the result.
9    return word.replace(vowelPattern, '*');
10};
11
12// Function to spell check words based on a given word list.
13const spellChecker = (wordList: string[], queries: string[]): string[] => {
14    // Create a set from the wordList for quick existence checks.
15    const wordsSet = new Set<string>();
16    wordList.forEach(word => wordsSet.add(word));
17
18    // Mapping to store lowercase versions of words.
19    const lowercaseMap = new Map<string, string>();
20
21    // Mapping to store patternized versions of words (vowels replaced with '*').
22    const patternMap = new Map<string, string>();
23
24    // Populate lowercaseMap and patternMap from the words in wordList.
25    wordList.forEach(word => {
26        const lowerWord = word.toLowerCase();
27        if (!lowercaseMap.containsKey(lowerWord)) {
28            lowercaseMap.setValue(lowerWord, word);
29        }
30
31        const vowelLessWord = replaceVowels(lowerWord);
32        if (!patternMap.containsKey(vowelLessWord)) {
33            patternMap.setValue(vowelLessWord, word);
34        }
35    });
36
37    // Vector to store the answers to the queries.
38    const answers: string[] = [];
39
40    // Process each query.
41    queries.forEach(query => {
42        // Exact match check.
43        if (wordsSet.contains(query)) {
44            answers.push(query);
45            return; // Continue to the next iteration.
46        }
47
48        // Convert query to lowercase for case-insensitive match.
49        const lowerQuery = query.toLowerCase();
50
51        // Check for case-insensitive match.
52        if (lowercaseMap.containsKey(lowerQuery)) {
53            answers.push(lowercaseMap.getValue(lowerQuery)!);
54            return; // Continue to the next iteration.
55        }
56
57        // Replace vowels in query to '*' for vowel-insensitive match.
58        const patternQuery = replaceVowels(lowerQuery);
59
60        // Check for vowel-insensitive match.
61        if (patternMap.containsKey(patternQuery)) {
62            answers.push(patternMap.getValue(patternQuery)!);
63            return; // Continue to the next iteration.
64        }
65
66        // If no matches were found, add an empty string.
67        answers.push("");
68    });
69
70    // Return the final answers array.
71    return answers;
72};
73
Not Sure What to Study? Take the 2-min Quiz:

In a binary min heap, the minimum element can be found in:

Time and Space Complexity

Time Complexity

The time complexity of the code can be analyzed in the following steps:

  1. The function f(w) takes O(n) time for transforming a word by replacing vowels with *, where n is the length of the word.

  2. The first loop where it iterates over the wordlist:

    • For each word w it computes t = w.lower() which takes O(m) time, where m is the length of the word.
    • It calls f(t) which takes O(m) time.
    • So, each iteration through the wordlist is O(m) and since it goes through the entire wordlist once, the complexity for this loop is O(L * m) where L is the length of the wordlist.
  3. The second loop where it iterates over the queries does the following:

    • Checks if q is in set s which is an O(1) operation since s is a set.
    • Converts q to lowercase and check if it's in low which is an O(k) operation for each query, where k is the length of the query.
    • Calls f(q) which is an O(k) operation, and checks against the pat dictionary.
    • Similar to before, each iteration’s complexity is O(k) and since it goes through each query once, the total complexity is O(Q * k) where Q is the number of queries.

Therefore, combining these complexities, the overall time complexity of the function spellchecker is O(L * m + Q * k).

Space Complexity

The space complexity of the code can be analyzed by considering the additional data structures being used:

  1. The s set stores all the words as they are, taking up O(L) space.

  2. The low and pat dictionaries store keys that are at most as long as the words and an associated word for each key. They will have at most L keys each (duplicates are not stored again because setdefault won't replace existing keys), bringing the space needed for these to O(L * m), where m is the average length of the words.

Combining the complexities for storing s, low, and pat, the overall space complexity is O(L * m), assuming the average length of the words m is not negligible compared to the number of words L.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following is the prefix sum of array [1, 2, 3, 4, 5]?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫