3045. Count Prefix and Suffix Pairs II

HardTrieArrayStringString MatchingHash FunctionRolling Hash
Leetcode Link

Problem Description

The problem presents us with an array of strings words, where each element is a string. We are tasked to determine the number of unique index pairs (i, j) such that the string at index i is both a prefix (the beginning part) and a suffix (the ending part) of the string at index j, with the additional condition that i must be less than j.

For example, if the words array contains ["apple", "applepie", "pie"], then the string "apple" is both a prefix and a suffix of "applepie", forming a valid pair of indices (0, 1).

The challenge is to count all such pairs in the given array, taking into consideration that the strings can be of any length and the array can contain a large number of strings.

Intuition

The straightforward approach to this problem would be to compare every possible pair of strings and check if the conditions of being a prefix and suffix are met. However, this naive approach could be very inefficient because it would require a lot of redundant comparison especially when there are repeating patterns in the strings.

To optimize the process, we should try to find a way that can help us identify the prefix-suffix relationship faster. This brings us to the core of the solution - Trie data structure. Tries are often used for efficiently searching for prefixes in a set of strings. But for our problem, we need to check for both prefixes and suffixes simultaneously.

To achieve this, we introduce a modified trie that holds pairs of characters. Each pair is composed of a character from the prefix and a character from the suffix of the same position in the string. For example, for string "ababa", the pairs would be (a,a), (b,b), (a,a).

While inserting these pairs into the trie, we maintain a count at each node, representing how many times a particular prefix-suffix pair has been seen thus far. When we iterate over a new string to insert its character pairs into the trie, we can simultaneously calculate the number of times a subtree's character pairs have appeared before. This allows us to efficiently count the number of valid i < j pairs, where isPrefixAndSuffix(words[i], words[j]) is true, without having to explicitly check every combination.

The given solution code implements a Trie with the specified modification and efficiently computes the count of qualifying index pairs.

Learn more about Trie patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which two pointer technique does Quick Sort use?

Solution Approach

In the solution approach, we are utilizing a Trie data structure. Traditionally, a trie is used for storing and searching for strings based on their prefixes. In our case, we need to consider both the prefix and suffix simultaneously. The main algorithms and patterns used in our solution are:

  • Trie Construction: The trie is constructed not on single characters but on pairs of characters. Each node in the trie represents a pair of (prefix_char, suffix_char). If the string is "ababa", the pairs inserted would be (a, a), (b, b), and (a, a). The trie is used to store all possible prefix-suffix pairs that we encounter when iterating through the words.

  • Trie Node Structure: Each node in the trie contains a dictionary children that maps character pairs to their corresponding child nodes. It also has a counter cnt, which indicates the number of times the particular prefix-suffix pair (represented by the path to this node) has been encountered so far.

  • Inserting into the Trie: For each string s, we iterate through the characters, creating a pair (s[i], s[m-i-1]) which contains the i-th character from the start and end of the string. We insert this pair into the trie. If the pair doesn't exist, we create a new node.

  • Searching and Counting: While inserting character pairs of a string, we traverse the trie based on these pairs. We also increment the count of how often we have seen the prefix-suffix pair, which is hosted on the trie node.

  • Utility of the Count: As we insert a string's character pairs into the trie, we simultaneously add the count at the current node to our overall count ans. This count at the node represents how many times we have previously seen this prefix-suffix pair, which directly corresponds to how many valid pairs (i, j) we can form with strings encountered so far for which i < j.

  • Efficient Counting: By using this trie structure, we can insert each string and efficiently increment our answer ans without having to directly compare the strings. This approach reduces the complexity from potentially quadratic time (considering all pairs of strings) to linear time, relative to the total number of characters in all strings.

1class Node:
2    __slots__ = ["children", "cnt"]
3
4    def __init__(self):
5        self.children = {}
6        self.cnt = 0
7
8
9class Solution:
10    def countPrefixSuffixPairs(self, words: List[str]) -> int:
11        ans = 0
12        trie = Node()
13        for s in words:
14            node = trie
15            for p in zip(s, reversed(s)):
16                if p not in node.children:
17                    node.children[p] = Node()
18                node = node.children[p]
19                ans += node.cnt
20            node.cnt += 1
21        return ans

The above code snippets show how each part of the trie is created, used, and contributes to solving the problem efficiently by using the trie as a sophisticated counting tool.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of these properties could exist for a graph but not a tree?

Example Walkthrough

Let's say we have an array of words ["one", "onenote", "note"]. We have to find all unique index pairs (i, j) where i < j and words[i] is both a prefix and a suffix of words[j].

Here’s a step-by-step process using the solution approach:

  1. Initialize an empty trie and a variable ans to keep count of the valid index pairs found.

  2. Start iterating through each word in words.

  3. For each word such as "one", create pairs from its prefix and suffix characters. Here we would get: ('o', 'e'), ('n', 'n'), ('e', 'o').

  4. Insert each pair into the trie:

    • For the first pair ('o', 'e'), as it doesn't exist in the trie yet, we create a new node and move to it.
    • For the second pair ('n', 'n'), again, create a new node and move to it.
    • For the third pair ('e', 'o'), create a new node and move to it.
  5. At the end of insertion of the character pairs from "one", the current node's count is increased by 1.

  6. Move on to the next word "onenote". Start inserting pairs: ('o', 'e'), ('n', 't'), ('e', 'o'), ('n', 'n'), ('o', 'e').

    • When we insert the first pair ('o', 'e'), we find it's already in the trie, so we just move to that node and increment ans by the count at the node (which is 0 at this point since no other strings have been encountered that end with 'e' and start with 'o').
    • Continue inserting pairs, creating new nodes for those which don't already exist (('n', 't') and ('e', 'o')), and incrementing ans when we encounter existing nodes (like for the last pair ('o', 'e'), ans would be incremented again).
  7. The pattern repeats for the last word "note". The pairs here are: ('n', 'e'), ('o', 't'), ('t', 'o'), ('e', 'n').

    • Since none of these character pairs exist in the trie yet, new nodes will be created, and ans remains unchanged during this insertion.
  8. At the end of this process, the ans variable contains the total count of valid (i, j) index pairs where word[i] is a prefix and suffix of word[j].

Following the above steps the ans will be 1 because "one" is both a prefix and suffix of "onenote". Therefore, we have a valid index pair (0, 1). The word "note" does not share the prefix and suffix relation with any other string in the array, so no more pairs are added.

The solution approach efficiently finds this count without having to compare each pair of words directly, which would be less efficient, especially as the array of words grows larger.

Solution Implementation

1class TrieNode:
2    __slots__ = ["children", "count"]
3
4    def __init__(self):
5        self.children = {}  # Each node keeps a dictionary of its children
6        self.count = 0  # This counts the number of times a particular node is reached
7
8
9class Solution:
10    def count_prefix_suffix_pairs(self, words: List[str]) -> int:
11        """
12        This method counts the number of pairs (i, j) such that the ith word has a
13        prefix which is a suffix of the jth word in reverse (not necessarily distinct).
14
15        :param words: list of input words.
16        :return: number of prefix-suffix pairs.
17        """
18        answer = 0
19        trie = TrieNode()
20
21        # loop through each word in the list
22        for word in words:
23            node = trie  # Start at the root of the trie
24
25            # Use zip to pair each character in the word with its corresponding character in the reversed word
26            # This effectively checks for the condition where a prefix is a reverse suffix
27            for pair in zip(word, reversed(word)):
28                # If this pair doesn't exist in the children, add it with a new TrieNode
29                if pair not in node.children:
30                    node.children[pair] = TrieNode()
31
32                # Move to the child node representing this pair
33                node = node.children[pair]
34
35                # add the count of the node to the answer, as it represents the number
36                # of times a prefix has been the reverse of suffixes seen so far
37                answer += node.count
38
39            # Increment the count of the current node to indicate we've seen another word with this (prefix, suffix) pair
40            node.count += 1
41
42        return answer
43
1import java.util.Map;
2import java.util.HashMap;
3
4// Node class that represents each node in the Trie
5class Node {
6    // Map children stores child nodes using an integer key
7    Map<Integer, Node> children = new HashMap<>();
8    // cnt is the count of words passing through this node
9    int count;
10}
11
12// Solution class containing a method to count prefix-suffix pairs
13class Solution {
14
15    // Method to count the number of pairs (prefix, suffix) where prefix + suffix forms any of the words in the array
16    public long countPrefixSuffixPairs(String[] words) {
17        // Initialize our answer to be returned
18        long answer = 0;
19        // Create the root of the Trie
20        Node trieRoot = new Node();
21        // Iterating through each word in the array of words
22        for (String word : words) {
23            // Start from the root for each word
24            Node currentNode = trieRoot;
25            // Get the length of the current word
26            int wordLength = word.length();
27            // Loop through each character in the word
28            for (int i = 0; i < wordLength; ++i) {
29                // Generate a unique integer key based on current pairs of characters from prefix and suffix
30                int key = word.charAt(i) * 32 + word.charAt(wordLength - i - 1);
31                // If key does not exist in children, initialize it with a new Node
32                currentNode.children.putIfAbsent(key, new Node());
33                // Move currentNode to its child node associated with the key
34                currentNode = currentNode.children.get(key);
35                // Increment the answer by the count value of the currentNode as it represents previous words counted passing through it
36                answer += currentNode.count;
37            }
38            // After going through each character of the word, increment the count at the currentNode
39            ++currentNode.count;
40        }
41        // Return the total count of prefix-suffix pairs
42        return answer;
43    }
44}
45
1#include <string>
2#include <vector>
3#include <unordered_map>
4using namespace std;
5
6// Definition of the Trie node class.
7class TrieNode {
8public:
9    unordered_map<int, TrieNode*> children; // Holds the children of the node.
10    int count; // Keeps track of the count of words that pass through this node.
11
12    // Constructor initializes the count to 0.
13    TrieNode() : count(0) {}
14};
15
16class Solution {
17public:
18    // Method to count the number of prefix-suffix pairs in the given list of words.
19    long long countPrefixSuffixPairs(vector<string>& words) {
20        long long ans = 0; // Initialize the answer to 0.
21        TrieNode* trie = new TrieNode(); // Create a new Trie.
22
23        // Loop through each word in the provided list of words.
24        for (const string& word : words) {
25            TrieNode* node = trie; // Start from the root of the Trie.
26            int wordLength = word.length(); // Get the length of the current word.
27
28            // Loop through each character in the word.
29            for (int i = 0; i < wordLength; ++i) {
30                // Create a unique key for each pair of prefix and suffix characters.
31                int key = word[i] * 32 + word[wordLength - i - 1];
32
33                // If the current key is not found among the children of the node, create a new node.
34                if (node->children.find(key) == node->children.end()) {
35                    node->children[key] = new TrieNode();
36                }
37
38                // Move to the corresponding child node.
39                node = node->children[key];
40
41                // Increment the answer by the count of the current node.
42                // This count represents the number of times a word with this prefix-suffix pair has been seen.
43                ans += node->count;
44            }
45
46            // After the last character, increment the count of the node to signify
47            // that another word with this prefix-suffix pair has been encountered.
48            ++node->count;
49        }
50
51        // Return the final count of prefix-suffix pairs.
52        return ans;
53    }
54};
55
1// Map structure to represent a node in the trie
2let childrenMap: Map<number, Node> = new Map<number, Node>();
3let count: number = 0;
4
5// Function to count the number of prefix-suffix pairs in a given list of words
6function countPrefixSuffixPairs(words: string[]): number {
7    let answer: number = 0;
8    const trieRoot: Node = { children: new Map<number, Node>(), cnt: 0 };
9
10    // Iterate over each word in the list
11    for (const word of words) {
12        let currentNode: Node = trieRoot;
13        const wordLength: number = word.length;
14
15        // Iterate over each character in the word, forming a pair with its symmetric character
16        for (let i = 0; i < wordLength; ++i) {
17            const pairKey: number = word.charCodeAt(i) * 32 + word.charCodeAt(wordLength - i - 1);
18
19            // If the pair doesn't exist in the current node's children, create a new node for it
20            if (!currentNode.children.has(pairKey)) {
21                currentNode.children.set(pairKey, { children: new Map<number, Node>(), cnt: 0 });
22            }
23
24            // Move to the child node corresponding to the pair
25            currentNode = currentNode.children.get(pairKey)!;
26
27            // Add the number of times this node was encountered (this helps in counting pairs)
28            answer += currentNode.cnt;
29        }
30
31        // Increment the count for the current node since we ended at this node
32        ++currentNode.cnt;
33    }
34
35    // Return the total count of prefix-suffix pairs
36    return answer;
37}
38
39// Example call to the function
40// const result = countPrefixSuffixPairs(["abc", "def"]);
41// console.log(result); // Output depends on the input words provided
42
Not Sure What to Study? Take the 2-min Quiz:

Given an array of 1,000,000 integers that is almost sorted, except for 2 pairs of integers. Which algorithm is fastest for sorting the array?

Time and Space Complexity

The time complexity of the code is O(n * m) where n is the number of words in the input list words and m is the maximum length of a word in words. This is because for each word, the code iterates through each character twice - once in the original order and once in the reversed order. In the worst case, it will visit each node up to m times, and there are n words to process.

The space complexity of the code is also O(n * m) because in the worst case, each character pair (p) of every word might be unique, constituting a new path in the trie. Since each unique pair would need a new node, this would require space proportional to the number of pairs, which is m for one word and n * m for all words.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫