244. Shortest Word Distance II

MediumDesignArrayHash TableTwo PointersString

Problem Description

You need to design a data structure that stores an array of strings and can efficiently answer queries about the shortest distance between any two different strings in that array.

The data structure should support two operations:

Initialization: Create a WordDistance object with an array of strings called wordsDict. This array can contain duplicate words at different positions.
Query: Given two different strings word1 and word2, find the minimum distance between any occurrence of word1 and any occurrence of word2 in the original array. The distance is calculated as the absolute difference between their indices.

For example, if wordsDict = ["practice", "makes", "perfect", "coding", "makes"]:

The word "makes" appears at indices 1 and 4
The word "coding" appears at index 3
The shortest distance between "makes" and "coding" would be min(|1-3|, |4-3|) = 1

The key insight in the solution is to preprocess the input array during initialization by storing all indices where each word appears. When querying for the shortest distance, it uses a two-pointer technique to efficiently find the minimum distance between the two sorted lists of indices.

The two-pointer approach works because both index lists are naturally sorted (since we iterate through the original array from left to right). At each step, we calculate the distance between the current pair of indices and move the pointer pointing to the smaller index forward, ensuring we explore all possible pairs efficiently in O(m + n) time where m and n are the number of occurrences of word1 and word2 respectively.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The naive approach would be to scan through the entire array for each query, finding all positions of word1 and word2, then computing all pairwise distances. However, this would be inefficient if we have many queries to answer.

Since we're going to answer multiple queries on the same array, we can optimize by preprocessing the data once during initialization. The key observation is that we need to know where each word appears in the array. By storing these positions upfront, we can avoid repeatedly scanning the entire array.

Once we have the positions of both words, we need to find the closest pair of indices. If word1 appears at positions [1, 5, 9] and word2 appears at positions [3, 7, 11], we need to find the minimum of all possible distances: |1-3|, |1-7|, |1-11|, |5-3|, |5-7|, |5-11|, |9-3|, |9-7|, |9-11|.

Computing all pairwise distances would take O(m * n) time. But notice that these position lists are sorted! This is natural because we collected them by iterating through the array from left to right.

With sorted lists, we can use a two-pointer technique. Start with pointers at the beginning of both lists. At each step:

Calculate the distance between the current pair
Move the pointer pointing to the smaller value forward

Why move the smaller pointer? Because if we have positions a[i] and b[j] where a[i] < b[j], moving j forward will only increase the distance b[j] - a[i]. The only way to potentially find a smaller distance is to move i forward to get closer to b[j].

This greedy approach ensures we check all potentially optimal pairs while avoiding unnecessary comparisons, reducing the time complexity from O(m * n) to O(m + n).

Solution Approach

The implementation consists of two main components: initialization and the shortest distance query.

Initialization (__init__ method):

We use a defaultdict(list) to store a mapping from each word to all its positions in the array. This data structure allows us to:

Automatically create an empty list for new words
Append positions efficiently in O(1) time

self.d = defaultdict(list)
for i, w in enumerate(wordsDict):
    self.d[w].append(i)

By iterating through wordsDict once with enumerate, we collect all indices where each word appears. Since we iterate from left to right, the indices for each word are naturally stored in ascending order.

Shortest Distance Query (shortest method):

First, we retrieve the position lists for both words:

a, b = self.d[word1], self.d[word2]

Then we apply the two-pointer technique:

ans = inf
i = j = 0
while i < len(a) and j < len(b):
    ans = min(ans, abs(a[i] - b[j]))
    if a[i] <= b[j]:
        i += 1
    else:
        j += 1

The algorithm works as follows:

Initialize two pointers i and j at the start of both position lists
Initialize ans to infinity to track the minimum distance
While both pointers are within bounds:
- Calculate the distance between current positions abs(a[i] - b[j])
- Update ans if we found a smaller distance
- Move the pointer pointing to the smaller position forward

The key decision is which pointer to advance. If a[i] <= b[j]:

Moving j forward would only increase the distance since b[j+1] > b[j]
Moving i forward might decrease the distance by getting a[i] closer to b[j]

This greedy strategy ensures we explore all potentially optimal pairs efficiently.

Time Complexity:

Initialization: O(n) where n is the length of wordsDict
Query: O(m + k) where m and k are the number of occurrences of word1 and word2

Space Complexity:

O(n) to store all word positions in the hash map

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate how the solution works.

Given: wordsDict = ["the", "quick", "brown", "fox", "quick"]

Step 1: Initialization

We build a hash map storing each word's positions:

"the" → [0]
"quick" → [1, 4]
"brown" → [2]
"fox" → [3]

Step 2: Query for shortest("quick", "fox")

We retrieve the position lists:

a = [1, 4] (positions of "quick")
b = [3] (positions of "fox")

Now we use the two-pointer technique:

Step	i	j	a[i]	b[j]	Distance	ans	Action
1	0	0	1	3	\|1-3\| = 2	2	a[i] < b[j], move i
2	1	0	4	3	\|4-3\| = 1	1	a[i] > b[j], move j
3	1	1	-	-	j out of bounds	1	Stop

The algorithm correctly finds that the minimum distance is 1 (between "fox" at index 3 and "quick" at index 4).

Why the two-pointer approach works:

Consider what happens at Step 1: We have "quick" at position 1 and "fox" at position 3. Since 1 < 3, we move pointer i forward. Why? Because:

If we moved j forward instead, we'd be out of bounds (no more "fox" positions)
The only way to potentially find a smaller distance is to try the next "quick" position (index 4), which is closer to "fox" at index 3

This greedy choice ensures we examine all potentially optimal pairs without checking unnecessary combinations, achieving O(m + n) time complexity instead of O(m × n).

Solution Implementation

1from collections import defaultdict
2from typing import List
3import math
4
5class WordDistance:
6    def __init__(self, wordsDict: List[str]):
7        """
8        Initialize the WordDistance object with a list of words.
9        Store the indices of each word for efficient lookup.
10
11        Args:
12            wordsDict: List of strings representing words in a document
13        """
14        # Dictionary to store list of indices for each word
15        self.word_indices = defaultdict(list)
16
17        # Store the index of each word occurrence
18        for index, word in enumerate(wordsDict):
19            self.word_indices[word].append(index)
20
21    def shortest(self, word1: str, word2: str) -> int:
22        """
23        Find the shortest distance between two words in the original list.
24        Uses two-pointer technique to efficiently find minimum distance.
25
26        Args:
27            word1: First word to find
28            word2: Second word to find
29
30        Returns:
31            The minimum distance between any occurrence of word1 and word2
32        """
33        # Get the sorted lists of indices for both words
34        indices1 = self.word_indices[word1]
35        indices2 = self.word_indices[word2]
36
37        # Initialize minimum distance to infinity
38        min_distance = math.inf
39
40        # Two pointers to traverse both index lists
41        pointer1 = 0
42        pointer2 = 0
43
44        # Use two-pointer technique to find minimum distance
45        while pointer1 < len(indices1) and pointer2 < len(indices2):
46            # Calculate distance between current indices
47            current_distance = abs(indices1[pointer1] - indices2[pointer2])
48            min_distance = min(min_distance, current_distance)
49
50            # Move the pointer pointing to the smaller index
51            # This ensures we check all possible pairs efficiently
52            if indices1[pointer1] <= indices2[pointer2]:
53                pointer1 += 1
54            else:
55                pointer2 += 1
56
57        return min_distance
58
59
60# Your WordDistance object will be instantiated and called as such:
61# obj = WordDistance(wordsDict)
62# param_1 = obj.shortest(word1, word2)
63

1class WordDistance {
2    // Map to store each word and its list of indices in the original array
3    private Map<String, List<Integer>> wordIndicesMap = new HashMap<>();
4
5    /**
6     * Constructor that preprocesses the words dictionary.
7     * Stores each word along with all positions where it appears.
8     * @param wordsDict Array of words to preprocess
9     */
10    public WordDistance(String[] wordsDict) {
11        // Iterate through the words array and record each word's position
12        for (int index = 0; index < wordsDict.length; index++) {
13            // If word doesn't exist in map, create new list; then add current index
14            wordIndicesMap.computeIfAbsent(wordsDict[index], key -> new ArrayList<>()).add(index);
15        }
16    }
17
18    /**
19     * Finds the shortest distance between two words in the dictionary.
20     * Uses two-pointer technique since indices lists are sorted.
21     * @param word1 First word to find
22     * @param word2 Second word to find
23     * @return Minimum distance between any occurrence of word1 and word2
24     */
25    public int shortest(String word1, String word2) {
26        // Get the sorted lists of indices for both words
27        List<Integer> indicesWord1 = wordIndicesMap.get(word1);
28        List<Integer> indicesWord2 = wordIndicesMap.get(word2);
29
30        // Initialize minimum distance to a large value
31        int minDistance = Integer.MAX_VALUE;
32
33        // Two pointers for traversing both index lists
34        int pointer1 = 0;
35        int pointer2 = 0;
36
37        // Use two-pointer technique to find minimum distance
38        while (pointer1 < indicesWord1.size() && pointer2 < indicesWord2.size()) {
39            // Calculate distance between current indices and update minimum
40            int currentDistance = Math.abs(indicesWord1.get(pointer1) - indicesWord2.get(pointer2));
41            minDistance = Math.min(minDistance, currentDistance);
42
43            // Move the pointer pointing to the smaller index
44            // This ensures we check all possible pairs efficiently
45            if (indicesWord1.get(pointer1) <= indicesWord2.get(pointer2)) {
46                pointer1++;
47            } else {
48                pointer2++;
49            }
50        }
51
52        return minDistance;
53    }
54}
55
56/**
57 * Your WordDistance object will be instantiated and called as such:
58 * WordDistance obj = new WordDistance(wordsDict);
59 * int param_1 = obj.shortest(word1,word2);
60 */
61

1class WordDistance {
2public:
3    /**
4     * Constructor: Preprocesses the word list to store positions of each word
5     * @param wordsDict: Input array of words
6     * Time Complexity: O(n) where n is the size of wordsDict
7     * Space Complexity: O(n) for storing all word positions
8     */
9    WordDistance(vector<string>& wordsDict) {
10        // Build a map where key is word and value is list of positions
11        for (int i = 0; i < wordsDict.size(); ++i) {
12            wordPositions[wordsDict[i]].push_back(i);
13        }
14    }
15
16    /**
17     * Finds the shortest distance between two words in the dictionary
18     * @param word1: First word
19     * @param word2: Second word
20     * @return: Minimum distance between any occurrence of word1 and word2
21     * Time Complexity: O(m + n) where m and n are occurrence counts of word1 and word2
22     */
23    int shortest(string word1, string word2) {
24        // Get position lists for both words
25        vector<int> positions1 = wordPositions[word1];
26        vector<int> positions2 = wordPositions[word2];
27
28        // Two pointers to traverse both position lists
29        int i = 0;
30        int j = 0;
31        int minDistance = INT_MAX;
32
33        // Use two-pointer technique since position lists are sorted
34        while (i < positions1.size() && j < positions2.size()) {
35            // Update minimum distance
36            minDistance = min(minDistance, abs(positions1[i] - positions2[j]));
37
38            // Move pointer pointing to smaller position
39            // This ensures we check all possible pairs efficiently
40            if (positions1[i] <= positions2[j]) {
41                ++i;
42            } else {
43                ++j;
44            }
45        }
46
47        return minDistance;
48    }
49
50private:
51    // Map to store word -> list of positions in the original array
52    unordered_map<string, vector<int>> wordPositions;
53};
54
55/**
56 * Your WordDistance object will be instantiated and called as such:
57 * WordDistance* obj = new WordDistance(wordsDict);
58 * int param_1 = obj->shortest(word1,word2);
59 */
60

1// Map to store word -> list of positions in the original array
2const wordPositions: Map<string, number[]> = new Map();
3
4/**
5 * Constructor: Preprocesses the word list to store positions of each word
6 * @param wordsDict - Input array of words
7 * Time Complexity: O(n) where n is the size of wordsDict
8 * Space Complexity: O(n) for storing all word positions
9 */
10function WordDistance(wordsDict: string[]): void {
11    // Clear previous data
12    wordPositions.clear();
13
14    // Build a map where key is word and value is list of positions
15    for (let i = 0; i < wordsDict.length; i++) {
16        const word = wordsDict[i];
17        if (!wordPositions.has(word)) {
18            wordPositions.set(word, []);
19        }
20        wordPositions.get(word)!.push(i);
21    }
22}
23
24/**
25 * Finds the shortest distance between two words in the dictionary
26 * @param word1 - First word
27 * @param word2 - Second word
28 * @returns Minimum distance between any occurrence of word1 and word2
29 * Time Complexity: O(m + n) where m and n are occurrence counts of word1 and word2
30 */
31function shortest(word1: string, word2: string): number {
32    // Get position lists for both words
33    const positions1 = wordPositions.get(word1) || [];
34    const positions2 = wordPositions.get(word2) || [];
35
36    // Two pointers to traverse both position lists
37    let i = 0;
38    let j = 0;
39    let minDistance = Number.MAX_SAFE_INTEGER;
40
41    // Use two-pointer technique since position lists are sorted
42    while (i < positions1.length && j < positions2.length) {
43        // Update minimum distance
44        minDistance = Math.min(minDistance, Math.abs(positions1[i] - positions2[j]));
45
46        // Move pointer pointing to smaller position
47        // This ensures we check all possible pairs efficiently
48        if (positions1[i] <= positions2[j]) {
49            i++;
50        } else {
51            j++;
52        }
53    }
54
55    return minDistance;
56}
57

Time and Space Complexity

Time Complexity:

Constructor __init__: O(n) where n is the length of wordsDict. We iterate through the entire list once to build the dictionary, and each append operation is O(1).
Method shortest: O(p + q) where p is the length of the position list for word1 and q is the length of the position list for word2. The two-pointer approach ensures we visit each element in both lists at most once. In the worst case where one word appears m times and another appears k times, the complexity would be O(m + k).

Space Complexity:

Constructor __init__: O(n) where n is the length of wordsDict. We store all indices in the dictionary. In the worst case, if all words are unique, we store n entries with one index each. If all words are the same, we store one entry with n indices. Either way, the total space is O(n).
Method shortest: O(1) as we only use a constant amount of extra space for variables a, b, ans, i, and j. Note that a and b are references to existing lists in the dictionary, not new copies.
Overall space complexity for the class: O(n) to maintain the dictionary throughout the object's lifetime.

Common Pitfalls

1. Not Handling Edge Cases with Single Word Occurrences

A common mistake is assuming both words will have multiple occurrences. When either word appears only once, the algorithm still works correctly, but developers might overthink and add unnecessary special case handling that could introduce bugs.

Problematic approach:

def shortest(self, word1: str, word2: str) -> int:
    indices1 = self.word_indices[word1]
    indices2 = self.word_indices[word2]

    # Unnecessary special case handling
    if len(indices1) == 1 and len(indices2) == 1:
        return abs(indices1[0] - indices2[0])

    # Rest of the two-pointer logic...

Why it's a pitfall: The two-pointer algorithm already handles single occurrences correctly without special cases, making this extra code redundant and potentially error-prone.

2. Incorrect Pointer Movement Logic

A critical mistake is moving both pointers simultaneously or using the wrong comparison for deciding which pointer to advance.

Incorrect implementation:

while pointer1 < len(indices1) and pointer2 < len(indices2):
    current_distance = abs(indices1[pointer1] - indices2[pointer2])
    min_distance = min(min_distance, current_distance)

    # WRONG: Moving both pointers
    pointer1 += 1
    pointer2 += 1

Why it fails: This would miss many valid pairs. For example, with indices1=[1,5] and indices2=[3,7], it would only check pairs (1,3) and (5,7), missing the optimal pair (5,3) with distance 2.

Correct approach: Always move only the pointer pointing to the smaller index to ensure all potentially optimal pairs are examined.

3. Using Regular Dictionary Instead of defaultdict

Problematic initialization:

def __init__(self, wordsDict: List[str]):
    self.word_indices = {}  # Regular dict

    for index, word in enumerate(wordsDict):
        # Need to check if key exists first
        if word not in self.word_indices:
            self.word_indices[word] = []
        self.word_indices[word].append(index)

Why it's less optimal: While functionally correct, this requires an extra existence check for every word, making the code more verbose and slightly less efficient.

4. Not Considering Words That Don't Exist in the Dictionary

Missing validation:

def shortest(self, word1: str, word2: str) -> int:
    # Directly accessing without checking existence
    indices1 = self.word_indices[word1]  # Could be empty list with defaultdict
    indices2 = self.word_indices[word2]  # or KeyError with regular dict

Solution: Add validation if the problem requires handling non-existent words:

def shortest(self, word1: str, word2: str) -> int:
    if word1 not in self.word_indices or word2 not in self.word_indices:
        return -1  # or raise an exception based on requirements

    indices1 = self.word_indices[word1]
    indices2 = self.word_indices[word2]
    # Continue with two-pointer logic...

5. Confusion About When to Update Minimum Distance

Some developers might think they should only update the minimum when indices are "close enough" or add unnecessary conditions.

Overcomplicated logic:

while pointer1 < len(indices1) and pointer2 < len(indices2):
    current_distance = abs(indices1[pointer1] - indices2[pointer2])

    # Unnecessary condition - we should always check for minimum
    if current_distance < min_distance:
        min_distance = current_distance
        # WRONG: Early termination
        if min_distance == 1:
            return 1  # Missing other potential distance-1 pairs

Why it's problematic: While early termination when distance equals 1 might seem like an optimization, it's unnecessary complexity that doesn't significantly improve performance and could lead to maintenance issues.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

You are given an array of intervals where intervals[i] = [start_i, end_i] represent the start and end of the ith interval. You need to merge all overlapping intervals and return an array of the non-overlapping intervals that cover all the intervals in the input.

Sort the intervals by their start time, then iterate through the sorted intervals, merging any overlapping intervals.

Brute force through all interval combinations and find the answer with the least total intervals.

Iterate through the intervals in their original order, merging any overlapping intervals.

Use a priority queue to keep track of the intervals, merging any overlapping intervals.

I don't know