244. Shortest Word Distance II π
Problem Description
You need to design a data structure that stores an array of strings and can efficiently answer queries about the shortest distance between any two different strings in that array.
The data structure should support two operations:
-
Initialization: Create a
WordDistance
object with an array of strings calledwordsDict
. This array can contain duplicate words at different positions. -
Query: Given two different strings
word1
andword2
, find the minimum distance between any occurrence ofword1
and any occurrence ofword2
in the original array. The distance is calculated as the absolute difference between their indices.
For example, if wordsDict = ["practice", "makes", "perfect", "coding", "makes"]
:
- The word "makes" appears at indices 1 and 4
- The word "coding" appears at index 3
- The shortest distance between "makes" and "coding" would be
min(|1-3|, |4-3|) = 1
The key insight in the solution is to preprocess the input array during initialization by storing all indices where each word appears. When querying for the shortest distance, it uses a two-pointer technique to efficiently find the minimum distance between the two sorted lists of indices.
The two-pointer approach works because both index lists are naturally sorted (since we iterate through the original array from left to right). At each step, we calculate the distance between the current pair of indices and move the pointer pointing to the smaller index forward, ensuring we explore all possible pairs efficiently in O(m + n)
time where m
and n
are the number of occurrences of word1
and word2
respectively.
Intuition
The naive approach would be to scan through the entire array for each query, finding all positions of word1
and word2
, then computing all pairwise distances. However, this would be inefficient if we have many queries to answer.
Since we're going to answer multiple queries on the same array, we can optimize by preprocessing the data once during initialization. The key observation is that we need to know where each word appears in the array. By storing these positions upfront, we can avoid repeatedly scanning the entire array.
Once we have the positions of both words, we need to find the closest pair of indices. If word1
appears at positions [1, 5, 9]
and word2
appears at positions [3, 7, 11]
, we need to find the minimum of all possible distances: |1-3|
, |1-7|
, |1-11|
, |5-3|
, |5-7|
, |5-11|
, |9-3|
, |9-7|
, |9-11|
.
Computing all pairwise distances would take O(m * n)
time. But notice that these position lists are sorted! This is natural because we collected them by iterating through the array from left to right.
With sorted lists, we can use a two-pointer technique. Start with pointers at the beginning of both lists. At each step:
- Calculate the distance between the current pair
- Move the pointer pointing to the smaller value forward
Why move the smaller pointer? Because if we have positions a[i]
and b[j]
where a[i] < b[j]
, moving j
forward will only increase the distance b[j] - a[i]
. The only way to potentially find a smaller distance is to move i
forward to get closer to b[j]
.
This greedy approach ensures we check all potentially optimal pairs while avoiding unnecessary comparisons, reducing the time complexity from O(m * n)
to O(m + n)
.
Solution Approach
The implementation consists of two main components: initialization and the shortest distance query.
Initialization (__init__
method):
We use a defaultdict(list)
to store a mapping from each word to all its positions in the array. This data structure allows us to:
- Automatically create an empty list for new words
- Append positions efficiently in
O(1)
time
self.d = defaultdict(list)
for i, w in enumerate(wordsDict):
self.d[w].append(i)
By iterating through wordsDict
once with enumerate
, we collect all indices where each word appears. Since we iterate from left to right, the indices for each word are naturally stored in ascending order.
Shortest Distance Query (shortest
method):
First, we retrieve the position lists for both words:
a, b = self.d[word1], self.d[word2]
Then we apply the two-pointer technique:
ans = inf
i = j = 0
while i < len(a) and j < len(b):
ans = min(ans, abs(a[i] - b[j]))
if a[i] <= b[j]:
i += 1
else:
j += 1
The algorithm works as follows:
- Initialize two pointers
i
andj
at the start of both position lists - Initialize
ans
to infinity to track the minimum distance - While both pointers are within bounds:
- Calculate the distance between current positions
abs(a[i] - b[j])
- Update
ans
if we found a smaller distance - Move the pointer pointing to the smaller position forward
- Calculate the distance between current positions
The key decision is which pointer to advance. If a[i] <= b[j]
:
- Moving
j
forward would only increase the distance sinceb[j+1] > b[j]
- Moving
i
forward might decrease the distance by gettinga[i]
closer tob[j]
This greedy strategy ensures we explore all potentially optimal pairs efficiently.
Time Complexity:
- Initialization:
O(n)
wheren
is the length ofwordsDict
- Query:
O(m + k)
wherem
andk
are the number of occurrences ofword1
andword2
Space Complexity:
O(n)
to store all word positions in the hash map
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate how the solution works.
Given: wordsDict = ["the", "quick", "brown", "fox", "quick"]
Step 1: Initialization
We build a hash map storing each word's positions:
- "the" β [0]
- "quick" β [1, 4]
- "brown" β [2]
- "fox" β [3]
Step 2: Query for shortest("quick", "fox")
We retrieve the position lists:
a = [1, 4]
(positions of "quick")b = [3]
(positions of "fox")
Now we use the two-pointer technique:
Step | i | j | a[i] | b[j] | Distance | ans | Action |
---|---|---|---|---|---|---|---|
1 | 0 | 0 | 1 | 3 | |1-3| = 2 | 2 | a[i] < b[j], move i |
2 | 1 | 0 | 4 | 3 | |4-3| = 1 | 1 | a[i] > b[j], move j |
3 | 1 | 1 | - | - | j out of bounds | 1 | Stop |
The algorithm correctly finds that the minimum distance is 1 (between "fox" at index 3 and "quick" at index 4).
Why the two-pointer approach works:
Consider what happens at Step 1: We have "quick" at position 1 and "fox" at position 3. Since 1 < 3, we move pointer i
forward. Why? Because:
- If we moved
j
forward instead, we'd be out of bounds (no more "fox" positions) - The only way to potentially find a smaller distance is to try the next "quick" position (index 4), which is closer to "fox" at index 3
This greedy choice ensures we examine all potentially optimal pairs without checking unnecessary combinations, achieving O(m + n) time complexity instead of O(m Γ n).
Solution Implementation
1from collections import defaultdict
2from typing import List
3import math
4
5class WordDistance:
6 def __init__(self, wordsDict: List[str]):
7 """
8 Initialize the WordDistance object with a list of words.
9 Store the indices of each word for efficient lookup.
10
11 Args:
12 wordsDict: List of strings representing words in a document
13 """
14 # Dictionary to store list of indices for each word
15 self.word_indices = defaultdict(list)
16
17 # Store the index of each word occurrence
18 for index, word in enumerate(wordsDict):
19 self.word_indices[word].append(index)
20
21 def shortest(self, word1: str, word2: str) -> int:
22 """
23 Find the shortest distance between two words in the original list.
24 Uses two-pointer technique to efficiently find minimum distance.
25
26 Args:
27 word1: First word to find
28 word2: Second word to find
29
30 Returns:
31 The minimum distance between any occurrence of word1 and word2
32 """
33 # Get the sorted lists of indices for both words
34 indices1 = self.word_indices[word1]
35 indices2 = self.word_indices[word2]
36
37 # Initialize minimum distance to infinity
38 min_distance = math.inf
39
40 # Two pointers to traverse both index lists
41 pointer1 = 0
42 pointer2 = 0
43
44 # Use two-pointer technique to find minimum distance
45 while pointer1 < len(indices1) and pointer2 < len(indices2):
46 # Calculate distance between current indices
47 current_distance = abs(indices1[pointer1] - indices2[pointer2])
48 min_distance = min(min_distance, current_distance)
49
50 # Move the pointer pointing to the smaller index
51 # This ensures we check all possible pairs efficiently
52 if indices1[pointer1] <= indices2[pointer2]:
53 pointer1 += 1
54 else:
55 pointer2 += 1
56
57 return min_distance
58
59
60# Your WordDistance object will be instantiated and called as such:
61# obj = WordDistance(wordsDict)
62# param_1 = obj.shortest(word1, word2)
63
1class WordDistance {
2 // Map to store each word and its list of indices in the original array
3 private Map<String, List<Integer>> wordIndicesMap = new HashMap<>();
4
5 /**
6 * Constructor that preprocesses the words dictionary.
7 * Stores each word along with all positions where it appears.
8 * @param wordsDict Array of words to preprocess
9 */
10 public WordDistance(String[] wordsDict) {
11 // Iterate through the words array and record each word's position
12 for (int index = 0; index < wordsDict.length; index++) {
13 // If word doesn't exist in map, create new list; then add current index
14 wordIndicesMap.computeIfAbsent(wordsDict[index], key -> new ArrayList<>()).add(index);
15 }
16 }
17
18 /**
19 * Finds the shortest distance between two words in the dictionary.
20 * Uses two-pointer technique since indices lists are sorted.
21 * @param word1 First word to find
22 * @param word2 Second word to find
23 * @return Minimum distance between any occurrence of word1 and word2
24 */
25 public int shortest(String word1, String word2) {
26 // Get the sorted lists of indices for both words
27 List<Integer> indicesWord1 = wordIndicesMap.get(word1);
28 List<Integer> indicesWord2 = wordIndicesMap.get(word2);
29
30 // Initialize minimum distance to a large value
31 int minDistance = Integer.MAX_VALUE;
32
33 // Two pointers for traversing both index lists
34 int pointer1 = 0;
35 int pointer2 = 0;
36
37 // Use two-pointer technique to find minimum distance
38 while (pointer1 < indicesWord1.size() && pointer2 < indicesWord2.size()) {
39 // Calculate distance between current indices and update minimum
40 int currentDistance = Math.abs(indicesWord1.get(pointer1) - indicesWord2.get(pointer2));
41 minDistance = Math.min(minDistance, currentDistance);
42
43 // Move the pointer pointing to the smaller index
44 // This ensures we check all possible pairs efficiently
45 if (indicesWord1.get(pointer1) <= indicesWord2.get(pointer2)) {
46 pointer1++;
47 } else {
48 pointer2++;
49 }
50 }
51
52 return minDistance;
53 }
54}
55
56/**
57 * Your WordDistance object will be instantiated and called as such:
58 * WordDistance obj = new WordDistance(wordsDict);
59 * int param_1 = obj.shortest(word1,word2);
60 */
61
1class WordDistance {
2public:
3 /**
4 * Constructor: Preprocesses the word list to store positions of each word
5 * @param wordsDict: Input array of words
6 * Time Complexity: O(n) where n is the size of wordsDict
7 * Space Complexity: O(n) for storing all word positions
8 */
9 WordDistance(vector<string>& wordsDict) {
10 // Build a map where key is word and value is list of positions
11 for (int i = 0; i < wordsDict.size(); ++i) {
12 wordPositions[wordsDict[i]].push_back(i);
13 }
14 }
15
16 /**
17 * Finds the shortest distance between two words in the dictionary
18 * @param word1: First word
19 * @param word2: Second word
20 * @return: Minimum distance between any occurrence of word1 and word2
21 * Time Complexity: O(m + n) where m and n are occurrence counts of word1 and word2
22 */
23 int shortest(string word1, string word2) {
24 // Get position lists for both words
25 vector<int> positions1 = wordPositions[word1];
26 vector<int> positions2 = wordPositions[word2];
27
28 // Two pointers to traverse both position lists
29 int i = 0;
30 int j = 0;
31 int minDistance = INT_MAX;
32
33 // Use two-pointer technique since position lists are sorted
34 while (i < positions1.size() && j < positions2.size()) {
35 // Update minimum distance
36 minDistance = min(minDistance, abs(positions1[i] - positions2[j]));
37
38 // Move pointer pointing to smaller position
39 // This ensures we check all possible pairs efficiently
40 if (positions1[i] <= positions2[j]) {
41 ++i;
42 } else {
43 ++j;
44 }
45 }
46
47 return minDistance;
48 }
49
50private:
51 // Map to store word -> list of positions in the original array
52 unordered_map<string, vector<int>> wordPositions;
53};
54
55/**
56 * Your WordDistance object will be instantiated and called as such:
57 * WordDistance* obj = new WordDistance(wordsDict);
58 * int param_1 = obj->shortest(word1,word2);
59 */
60
1// Map to store word -> list of positions in the original array
2const wordPositions: Map<string, number[]> = new Map();
3
4/**
5 * Constructor: Preprocesses the word list to store positions of each word
6 * @param wordsDict - Input array of words
7 * Time Complexity: O(n) where n is the size of wordsDict
8 * Space Complexity: O(n) for storing all word positions
9 */
10function WordDistance(wordsDict: string[]): void {
11 // Clear previous data
12 wordPositions.clear();
13
14 // Build a map where key is word and value is list of positions
15 for (let i = 0; i < wordsDict.length; i++) {
16 const word = wordsDict[i];
17 if (!wordPositions.has(word)) {
18 wordPositions.set(word, []);
19 }
20 wordPositions.get(word)!.push(i);
21 }
22}
23
24/**
25 * Finds the shortest distance between two words in the dictionary
26 * @param word1 - First word
27 * @param word2 - Second word
28 * @returns Minimum distance between any occurrence of word1 and word2
29 * Time Complexity: O(m + n) where m and n are occurrence counts of word1 and word2
30 */
31function shortest(word1: string, word2: string): number {
32 // Get position lists for both words
33 const positions1 = wordPositions.get(word1) || [];
34 const positions2 = wordPositions.get(word2) || [];
35
36 // Two pointers to traverse both position lists
37 let i = 0;
38 let j = 0;
39 let minDistance = Number.MAX_SAFE_INTEGER;
40
41 // Use two-pointer technique since position lists are sorted
42 while (i < positions1.length && j < positions2.length) {
43 // Update minimum distance
44 minDistance = Math.min(minDistance, Math.abs(positions1[i] - positions2[j]));
45
46 // Move pointer pointing to smaller position
47 // This ensures we check all possible pairs efficiently
48 if (positions1[i] <= positions2[j]) {
49 i++;
50 } else {
51 j++;
52 }
53 }
54
55 return minDistance;
56}
57
Time and Space Complexity
Time Complexity:
-
Constructor
__init__
:O(n)
wheren
is the length ofwordsDict
. We iterate through the entire list once to build the dictionary, and each append operation isO(1)
. -
Method
shortest
:O(p + q)
wherep
is the length of the position list forword1
andq
is the length of the position list forword2
. The two-pointer approach ensures we visit each element in both lists at most once. In the worst case where one word appearsm
times and another appearsk
times, the complexity would beO(m + k)
.
Space Complexity:
-
Constructor
__init__
:O(n)
wheren
is the length ofwordsDict
. We store all indices in the dictionary. In the worst case, if all words are unique, we storen
entries with one index each. If all words are the same, we store one entry withn
indices. Either way, the total space isO(n)
. -
Method
shortest
:O(1)
as we only use a constant amount of extra space for variablesa
,b
,ans
,i
, andj
. Note thata
andb
are references to existing lists in the dictionary, not new copies. -
Overall space complexity for the class:
O(n)
to maintain the dictionary throughout the object's lifetime.
Common Pitfalls
1. Not Handling Edge Cases with Single Word Occurrences
A common mistake is assuming both words will have multiple occurrences. When either word appears only once, the algorithm still works correctly, but developers might overthink and add unnecessary special case handling that could introduce bugs.
Problematic approach:
def shortest(self, word1: str, word2: str) -> int:
indices1 = self.word_indices[word1]
indices2 = self.word_indices[word2]
# Unnecessary special case handling
if len(indices1) == 1 and len(indices2) == 1:
return abs(indices1[0] - indices2[0])
# Rest of the two-pointer logic...
Why it's a pitfall: The two-pointer algorithm already handles single occurrences correctly without special cases, making this extra code redundant and potentially error-prone.
2. Incorrect Pointer Movement Logic
A critical mistake is moving both pointers simultaneously or using the wrong comparison for deciding which pointer to advance.
Incorrect implementation:
while pointer1 < len(indices1) and pointer2 < len(indices2):
current_distance = abs(indices1[pointer1] - indices2[pointer2])
min_distance = min(min_distance, current_distance)
# WRONG: Moving both pointers
pointer1 += 1
pointer2 += 1
Why it fails: This would miss many valid pairs. For example, with indices1=[1,5] and indices2=[3,7], it would only check pairs (1,3) and (5,7), missing the optimal pair (5,3) with distance 2.
Correct approach: Always move only the pointer pointing to the smaller index to ensure all potentially optimal pairs are examined.
3. Using Regular Dictionary Instead of defaultdict
Problematic initialization:
def __init__(self, wordsDict: List[str]):
self.word_indices = {} # Regular dict
for index, word in enumerate(wordsDict):
# Need to check if key exists first
if word not in self.word_indices:
self.word_indices[word] = []
self.word_indices[word].append(index)
Why it's less optimal: While functionally correct, this requires an extra existence check for every word, making the code more verbose and slightly less efficient.
4. Not Considering Words That Don't Exist in the Dictionary
Missing validation:
def shortest(self, word1: str, word2: str) -> int:
# Directly accessing without checking existence
indices1 = self.word_indices[word1] # Could be empty list with defaultdict
indices2 = self.word_indices[word2] # or KeyError with regular dict
Solution: Add validation if the problem requires handling non-existent words:
def shortest(self, word1: str, word2: str) -> int:
if word1 not in self.word_indices or word2 not in self.word_indices:
return -1 # or raise an exception based on requirements
indices1 = self.word_indices[word1]
indices2 = self.word_indices[word2]
# Continue with two-pointer logic...
5. Confusion About When to Update Minimum Distance
Some developers might think they should only update the minimum when indices are "close enough" or add unnecessary conditions.
Overcomplicated logic:
while pointer1 < len(indices1) and pointer2 < len(indices2):
current_distance = abs(indices1[pointer1] - indices2[pointer2])
# Unnecessary condition - we should always check for minimum
if current_distance < min_distance:
min_distance = current_distance
# WRONG: Early termination
if min_distance == 1:
return 1 # Missing other potential distance-1 pairs
Why it's problematic: While early termination when distance equals 1 might seem like an optimization, it's unnecessary complexity that doesn't significantly improve performance and could lead to maintenance issues.
What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!