1032. Stream of Characters


Problem Description

The problem asks us to design an algorithm that continuously receives characters and needs to check whether any suffix (the end part of the characters received so far) is a string that appears in a given list of strings. Essentially, the algorithm has to be able to say whether the characters seen most recently form a word that we are looking for.

This is an example of a problem where we have to deal with a stream of data, meaning the input is not fixed, but keeps coming in over time (imagine typing on a keyboard, where each keystroke is a new character in the stream).

Suppose we are given a set of words like ["hello", "leet", "code"]. Now imagine characters start coming in: ['c', 'o', 'd', 'e']. After the last character ('e') arrives, the algorithm must recognize that 'code' is one of the words it has to detect and return true.

Intuition

To solve this problem effectively, we should use a data structure that allows for quick searches of prefixes or suffixes of words, such as a Trie (also known as a prefix tree). In typical usage, Tries are built with prefixes of words, but in this problem, because we are interested in suffixes, the Trie will be built backwards with words inserted in reverse order.

As characters stream in, we add them to a buffer (a list, for instance) that represents the current stream state. After each new character is added to the stream, we want to check if there exists any word that ends with the sequence of the letters currently in the buffer. Therefore, we check the Trie with the characters in the buffer in reverse order since our Trie is built with words in reverse order.

To optimize the search, we can keep the length of the search within the longest word in our list. There's no need to search further back in the stream because no word will be longer than the maximum word length in the list.

The Trie class has standard functionalities: it can insert a reversed word so that each node in the Trie represents a character, and it has a flag is_end to indicate if the current node is the end of a inserted word. The Trie also has a search method that checks if a given list of characters (in reverse) matches any word that ends with those characters. If it does, it returns true immediately upon finding a word's end. Otherwise, if it goes through all characters without finding a match, it returns false.

The StreamChecker class maintains an instance of the Trie. With each query, the StreamChecker adds the new character to the buffer and performs a search in the Trie for the recently added characters up to the limit. If a word is found, it returns true. Otherwise, it returns false.

In conclusion, each query operation involves a search in the Trie, which is efficient in terms of time complexity, making it well-suited for handling streams of characters where suffix matching is needed.

Learn more about Trie and Data Stream patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

How does merge sort divide the problem into subproblems?

Solution Approach

To implement the solution for the StreamChecker problem, we make use of Trie data structure because of its ability to handle queries related to prefixes and, in this case, suffixes of words. Here's how the solution is implemented:

  • Trie Class: First, we define a Trie class which will be used to store the words in a way that makes it efficient to search for suffixes.

    • The Trie consists of nodes, where each node contains an array of 26 elements (representing the 26 lowercase English letters) and a boolean flag is_end which indicates whether the node corresponds to the end of a word.

    • insert method: This inserts a word into the Trie. However, since we need to check for suffixes, each word is inserted in reverse. For instance, if the word is "abc", the Trie is updated with the letters 'c', 'b', 'a' in that order, starting from the Trie root.

    • search method: Takes a list of characters (representing the current stream) and checks if any suffix is present in the Trie. We pass the stream list sliced to the last limit characters to maintain efficiency and only check the recent part of the stream. This search is performed in reverse since words have been stored in reverse in the Trie.

  • StreamChecker Class: This class uses an instance of the Trie to check the stream.

    • The constructor __init__: Initializes a new Trie and inserts all the words in reverse. It also initializes a list cs which will be used to hold the characters from the stream. It sets a limit to the maximum number of characters to consider for a query which can be optimized based on the maximum length of the words provided.

    • query method: Accepts a character and appends it to the cs list which represents the current state of the stream; then, it calls the search method of the Trie and passes a slice of the stream list (to reduce the unnecessary searches beyond the longest word's length).

In the query method, it is important to note that the characters are used from the end of the list cs[-self.limit:], because the Trie has been constructed with words in reverse. This makes sure that we are always looking for a matching suffix in the Trie that corresponds with the most recently added characters to the stream.

With this implementation, whenever a new character arrives, we are able to query the Trie and get a response in efficient time. This solution ensures that even with a continuously growing stream, the query operations remain tractable due to the nature of the Trie data structure.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

How does merge sort divide the problem into subproblems?

Example Walkthrough

Let's illustrate the solution approach with a small example using the set of words ["hello", "leet", "code"] that our StreamChecker should watch out for. Here's how it will work step by step:

  1. Building the Trie: We begin by inserting the reverse of each word from our given list into the Trie. After this process, the Trie would have nodes representing the letters 'e', 't', 'o', 'c' (from "code"), 't', 'e', 'e', 'l' (from "leet"), and 'o', 'l', 'l', 'e', 'h' (from "hello").

  2. Initializing StreamChecker: We create an instance of StreamChecker and during its initialization, the above words are inserted in reverse into the Trie. The cs buffer is initialized to store the stream of characters and limit is set to 5, which is the length of the longest word "hello".

  3. Processing the Stream: Characters start to arrive one by one. Let's consider the stream ['c', 'o', 'd', 'e', 'b', 'l', 'e', 'e', 't'].

    • When 'c' arrives, we add it to cs and check the Trie. No suffix formed yet, so it returns false.

    • We keep adding subsequent characters: 'o', 'd'. Still, no recognized suffix, so return false each time.

    • Upon adding 'e', we check again. This time, the Trie recognizes the suffix 'e', 'd', 'o', 'c' (which is "code" in reverse), and returns true, as "code" is one of the words we were looking for.

    • The next letters are 'b', 'l', 'e', and 'e', all return false since they do not form a recognized suffix.

    • Finally, when 't' is added, we check the subarray cs[-5:] which contains ['b', 'l', 'e', 'e', 't']. We reverse it and now look for 't', 'e', 'e', 'l', 'b' in the Trie. It recognizes the suffix 't', 'e', 'e', 'l' as "leet" in reverse and returns true.

At each step, the query method is efficiently checking the last limit characters in the stream, which guarantees that as the character stream grows, we are only ever considering a manageable slice of it. This is why the Trie is an excellent choice for this type of problem; it keeps queries fast, even as more data comes in.

Solution Implementation

1class Trie:
2    def __init__(self):
3        # Initialize a list for the children, representing 26 lowercase English letters.
4        self.children = [None] * 26
5        # Mark this as the end of a word in the trie.
6        self.is_end_of_word = False
7
8    def insert(self, word: str):
9        # Insert a word into the trie with reversed order.
10        node = self
11        for char in word[::-1]:
12            index = ord(char) - ord('a')  # Map the character to an index.
13            if node.children[index] is None:
14                node.children[index] = Trie()  # Create a new Trie node if it does not exist.
15            node = node.children[index]  # Move to the next node.
16        node.is_end_of_word = True  # Mark the end of a word.
17
18    def search(self, word: list[str]) -> bool:
19        # Search for a word in the trie and returns True if it exists.
20        node = self
21        for char in word[::-1]:  # Check each character in reversed order.
22            index = ord(char) - ord('a')
23            if node.children[index] is None:
24                return False  # Return False if character path does not exist.
25            node = node.children[index]
26            if node.is_end_of_word:
27                return True  # Return True if the current node marks the end of a word.
28        return False  # Return False if the end of the word was not encountered.
29
30
31class StreamChecker:
32    def __init__(self, words: list[str]):
33        # Initialize the StreamChecker with a Trie and a list for incoming characters.
34        self.trie = Trie()
35        self.stream = []  # List to hold streamed letters.
36        self.limit = 200  # Maximum length of a word to consider in the trie.
37        for word in words:
38            self.trie.insert(word)
39
40    def query(self, letter: str) -> bool:
41        # Process a new letter from the stream, and check for matching words.
42        # Append the letter to the stream.
43        self.stream.append(letter)
44        # Return True if the added letter completes any word in the trie.
45        return self.trie.search(self.stream[-self.limit:])
46
47
48# Usage example:
49# stream_checker = StreamChecker(words)
50# result = stream_checker.query(letter)
51
1class Trie {
2    // Initialize trie children for each letter of the alphabet
3    private Trie[] children = new Trie[26];
4    // Flag to indicate the end of a word
5    private boolean isEndOfWord = false;
6  
7    // Method to insert a word into the trie
8    public void insert(String word) {
9        // Start with the root of the trie
10        Trie node = this;
11        // Iterate through the word in reverse order to insert it into the trie
12        for (int i = word.length() - 1; i >= 0; --i) {
13            int index = word.charAt(i) - 'a';
14            // If the corresponding child does not exist, create it
15            if (node.children[index] == null) {
16                node.children[index] = new Trie();
17            }
18            // Move to the child node
19            node = node.children[index];
20        }
21        // Mark the end of the word
22        node.isEndOfWord = true;
23    }
24
25    // Method to query the trie with a given string from a stream
26    public boolean query(StringBuilder stringBuilder) {
27        Trie node = this;
28        // Iterate through the stringBuilder in reverse order
29        for (int i = stringBuilder.length() - 1; i >= 0; --i) {
30            int index = stringBuilder.charAt(i) - 'a';
31            // If the corresponding child does not exist, the query fails
32            if (node.children[index] == null) {
33                return false;
34            }
35            // Move to the child node
36            node = node.children[index];
37            // If a word is found, return true
38            if (node.isEndOfWord) {
39                return true;
40            }
41        }
42        // No word found at the end of the query
43        return false;
44    }
45}
46
47class StreamChecker {
48    // StringBuilder to maintain the sequence of queried letters
49    private StringBuilder queryStringBuilder = new StringBuilder();
50    // Trie object to store the words for query
51    private Trie trie = new Trie();
52
53    // Constructor
54    public StreamChecker(String[] words) {
55        // Insert each word into the trie
56        for (String word : words) {
57            trie.insert(word);
58        }
59    }
60
61    // Query a single character and append it to the stream
62    public boolean query(char letter) {
63        // Append the letter to the query string
64        queryStringBuilder.append(letter);
65        // Query the trie with the updated stream
66        return trie.query(queryStringBuilder);
67    }
68}
69
1#include <vector>
2#include <string>
3#include <algorithm>
4
5// Class for Trie Node
6class Trie {
7private:
8    std::vector<Trie*> children; // Children nodes representing each alphabet letter
9    bool isEnd; // Flag indicating whether the node marks the end of a word
10
11public:
12    // Constructor initializes children for each letter of the alphabet and sets isEnd to false
13    Trie() : children(26, nullptr), isEnd(false) {}
14
15    // Inserts a word into the trie in reverse order
16    void insert(const std::string& word) {
17        Trie* node = this; // Start from the root node
18        for (int i = word.length() - 1; i >= 0; --i) {
19            int index = word[i] - 'a';
20            // If the child node does not exist, create it
21            if (!node->children[index]) {
22                node->children[index] = new Trie();
23            }
24            // Move to the child node
25            node = node->children[index];
26        }
27        // Mark the end of the word
28        node->isEnd = true;
29    }
30
31    // Searches for a word in reverse in the trie
32    bool search(const std::string& word) {
33        Trie* node = this;
34        for (int i = word.length() - 1; i >= 0; --i) {
35            int index = word[i] - 'a';
36            // If the child node does not exist, the word is not present
37            if (!node->children[index]) {
38                return false;
39            }
40            // Move to the child node
41            node = node->children[index];
42            // If we find a word ending here, we can return true
43            if (node->isEnd) {
44                return true;
45            }
46        }
47        // If no word ending is encountered during traversal, return false
48        return node->isEnd;
49    }
50};
51
52// Class for checking if a string contains any word as a suffix
53class StreamChecker {
54private:
55    Trie* trie; // Trie node used to store and query words
56    std::string streamSoFar; // String to store the characters queried so far
57
58public:
59    // Constructor that initializes the trie with the given words
60    StreamChecker(const std::vector<std::string>& words) : trie(new Trie()) {
61        for (const std::string& word: words) {
62            trie->insert(word);
63        }
64    }
65
66    // Queries the current character and returns true if any word is found as a suffix
67    bool query(char letter) {
68        // Add the current character to the stream
69        streamSoFar += letter;
70        // Search in the trie for the current stream
71        return trie->search(streamSoFar);
72    }
73
74    // Destructor to clean up the dynamically allocated Trie
75    ~StreamChecker() {
76        delete trie; // Properly delete the Trie to avoid memory leaks
77    }
78};
79
80/**
81 * Usage
82 * ----
83 * StreamChecker streamChecker(words); // Initialize the StreamChecker with a vector of words
84 * bool result = streamChecker.query(letter); // Will return true if the stream results in any suffix that is present in the word list
85 */
86
1// Global variables
2let children: TrieNode[] = [];
3let isEnd: boolean = false;
4let trie: TrieNode | null = null;
5let streamSoFar: string = "";
6
7// Interface for a Trie Node
8interface TrieNode {
9    children: TrieNode[];
10    isEnd: boolean;
11}
12
13// Function to create a trie node
14function createTrieNode(): TrieNode {
15    return {
16        children: new Array(26).fill(null),
17        isEnd: false
18    };
19}
20
21// Function to insert a word into the trie in reverse order
22
23function insert(word: string): void {
24    let node: TrieNode | null = trie;
25    for (let i = word.length - 1; i >= 0; --i) {
26        const index: number = word.charCodeAt(i) - 'a'.charCodeAt(0);
27        if (!node.children[index]) {
28            node.children[index] = createTrieNode();
29        }
30        node = node.children[index];
31    }
32    node.isEnd = true;
33}
34
35// Function to search for a word in reverse in the trie
36
37function search(word: string): boolean {
38    let node: TrieNode | null = trie;
39    for (let i = word.length - 1; i >= 0; --i) {
40        const index: number = word.charCodeAt(i) - 'a'.charCodeAt(0);
41        if (!node.children[index]) {
42            return false;
43        }
44        node = node.children[index];
45        if (node.isEnd) {
46            return true;
47        }
48    }
49    return node.isEnd;
50}
51
52// Function to initialize the trie with the given words
53// This replaces the StreamChecker constructor
54function initializeTrie(words: string[]): void {
55    trie = createTrieNode();
56    for (const word of words) {
57        insert(word);
58    }
59}
60
61// Function to query the current character and return true if any word is found as a suffix
62// This replaces StreamChecker.query
63function query(letter: string): boolean {
64    streamSoFar += letter;
65    return search(streamSoFar);
66}
67
68// Example of usage
69// Here you would first call initializeTrie with the array of words, then you can
70// repeatedly call query with different letters to simulate the stream of letters.
71
Not Sure What to Study? Take the 2-min Quiz:

What data structure does Breadth-first search typically uses to store intermediate states?

Time and Space Complexity

Time Complexity

  1. Trie.insert method:

    • The insertion of a word into the trie takes O(n) time, where n is the length of the word. This is because we process each character, inserting new nodes as necessary.
  2. Trie.search method:

    • The search operation in the trie takes O(m) time in the worst case, where m is the minimum between the length of the queried string w and the limit. The search reverses the string w and checks each character.
  3. StreamChecker.__init__ method:

    • During the initialization of a StreamChecker object, we insert several words into the trie. If l is the average length of the words and k is the total number of words, the overall time complexity for the initialization is O(k*l).
  4. StreamChecker.query method:

    • In each query, we add a letter to self.cs and perform a search. The time complexity of adding a letter is O(1). The search operates on a substring of at most self.limit characters, so the time complexity for the search part is O(self.limit). Therefore, the time complexity for each query is O(self.limit).

In summary, the time complexity for insertions is O(k*l), the search/query operations run in O(self.limit) time.

Space Complexity

  1. Trie storage:

    • The worst-case space complexity of a trie depends on the number of nodes and the size of the alphabet. In this implementation, each node can have up to 26 children. If we have n words of average length l, the worst-case space required would be O(26 * n * l). However, due to common prefixes, Tries usually use less space than this upper bound.
  2. StreamChecker object:

    • The StreamChecker maintains a list self.cs that holds the characters of the stream queried so far, and it's trimmed to the last self.limit elements. Thus, it takes O(self.limit) space.

In summary, the space complexity is dominated by the Trie structure, which is O(26 * n * l) for storing the nodes, and O(self.limit) for storing the stream of characters, resulting in a total space complexity of O(26 * n * l + self.limit).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Breadth first search can be used to find the shortest path between two nodes in a directed graph.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫