1570. Dot Product of Two Sparse Vectors
Problem Description
You need to implement a class called `SparseVector` that efficiently handles sparse vectors (vectors with mostly zero values) and computes their dot product.
The class should support:
- Constructor `SparseVector(nums)`: takes a list of integers `nums` and initializes the sparse vector
- Method `dotProduct(vec)`: computes the dot product between the current sparse vector instance and another sparse vector `vec`
A sparse vector is a vector containing mostly zeros. Instead of storing all elements including zeros, you should store it efficiently by only keeping track of non-zero values.
The dot product of two vectors is calculated by multiplying corresponding elements and summing the results. For example, if vector A = [1, 0, 3] and vector B = [2, 0, 4], their dot product would be 1*2 + 0*0 + 3*4 = 14.
The solution uses a hash map (dictionary) to store only non-zero elements, where:
- Keys represent the indices of non-zero elements
- Values represent the actual non-zero values at those indices
When computing the dot product:
- The code identifies which vector has fewer non-zero elements for optimization
- It iterates through the smaller hash map
- For each element, it checks if the same index exists in the other vector's hash map
- If both vectors have non-zero values at the same index, their product is added to the result
Follow-up consideration: The problem asks what happens if only one vector is sparse. The current implementation handles this efficiently by always iterating through the smaller hash map, minimizing the number of lookups needed.
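Putting the description together, a minimal sketch of the class (the same dictionary-based design this article develops below) behaves like this:

```python
# Minimal sketch of the SparseVector interface described above,
# using dictionary-based storage of non-zero entries.
class SparseVector:
    def __init__(self, nums):
        # Keep only non-zero entries as index -> value
        self.d = {i: v for i, v in enumerate(nums) if v != 0}

    def dotProduct(self, vec):
        a, b = self.d, vec.d
        # Iterate the map with fewer non-zero entries
        if len(b) < len(a):
            a, b = b, a
        return sum(v * b.get(i, 0) for i, v in a.items())

v1 = SparseVector([1, 0, 3])
v2 = SparseVector([2, 0, 4])
print(v1.dotProduct(v2))  # 1*2 + 0*0 + 3*4 = 14
```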
Intuition
When dealing with sparse vectors that contain mostly zeros, storing all elements wastes memory. Consider a vector with 10,000 elements where only 10 are non-zero - storing all 10,000 values is inefficient.
The key insight is that when computing a dot product, zeros don't contribute to the final sum. If either element in a pair is zero, their product is zero. Therefore, we only need to care about positions where both vectors have non-zero values.
This leads us to think about storing only the non-zero elements. A hash map is perfect for this because:
- We can store only index-value pairs where the value is non-zero
- Looking up whether a specific index exists is `O(1)` on average
- The space complexity becomes proportional to the number of non-zero elements rather than the total vector length
For the dot product calculation, we need to find indices where both vectors have non-zero values. Instead of checking all indices, we can iterate through one hash map and check if each index exists in the other.
The optimization of iterating through the smaller hash map comes from minimizing the number of lookups. If vector A has 5 non-zero elements and vector B has 100 non-zero elements, it's more efficient to iterate through A's 5 elements and look them up in B, rather than iterating through B's 100 elements and looking them up in A. This reduces the operation from 100 lookups to just 5 lookups.
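That trade-off is easy to see with a small instrumented helper (hypothetical, for illustration only): iterating the 5-element map costs 5 lookups, while iterating the 100-element map costs 100.

```python
# Hypothetical instrumented dot product that counts lookups into `large`.
def dot_with_lookup_count(small, large):
    lookups = total = 0
    for i, v in small.items():
        lookups += 1
        total += v * large.get(i, 0)
    return total, lookups

a = {i: 1 for i in range(5)}    # vector A: 5 non-zero elements
b = {i: 1 for i in range(100)}  # vector B: 100 non-zero elements

_, cost_small_first = dot_with_lookup_count(a, b)
_, cost_large_first = dot_with_lookup_count(b, a)
print(cost_small_first, cost_large_first)  # 5 100
```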
The `get(i, 0)` method elegantly handles the case where an index doesn't exist in the other vector: it returns 0, which correctly contributes nothing to the dot product sum.
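A quick illustration of that default-value behavior:

```python
d = {1: 3, 3: 4}
print(d.get(3, 0))  # key present: the stored value, 4
print(d.get(0, 0))  # key absent: 0, contributing nothing to the sum
```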
Solution Approach
The implementation uses a hash map to efficiently store and compute operations on sparse vectors.
Data Structure Choice:
We use a dictionary `d` to store the sparse vector, where:
- Keys are the indices of non-zero elements
- Values are the corresponding non-zero values

Initialization (`__init__` method):

```python
self.d = {i: v for i, v in enumerate(nums) if v}
```

This dictionary comprehension iterates through the input array `nums` with `enumerate()` to get both index and value. The condition `if v` filters out zero values (since 0 evaluates to `False` in Python). Only non-zero elements are stored as key-value pairs.
Dot Product Calculation (`dotProduct` method):

- Extract the hash maps:

  ```python
  a, b = self.d, vec.d
  ```

  We get references to both vectors' hash maps for easier manipulation.

- Optimization - choose the smaller hash map:

  ```python
  if len(b) < len(a):
      a, b = b, a
  ```

  This swap ensures we always iterate through the hash map with fewer elements, minimizing the number of lookup operations.

- Compute the dot product:

  ```python
  return sum(v * b.get(i, 0) for i, v in a.items())
  ```

  - We iterate through each `(index, value)` pair in the smaller hash map `a`
  - For each index `i` with value `v` in `a`, we look up the corresponding value in hash map `b` using `b.get(i, 0)`
  - The `get()` method returns the value if the key exists, otherwise returns 0
  - We multiply the values: `v * b.get(i, 0)`
  - The `sum()` function adds up all these products
Time Complexity: `O(min(n, m))`, where `n` and `m` are the number of non-zero elements in the two vectors. We iterate through the smaller hash map once.

Space Complexity: `O(n + m)` for storing the non-zero elements of both vectors in their respective hash maps.
This approach efficiently handles the follow-up question about one sparse and one dense vector - the algorithm naturally adapts by storing all non-zero elements of the dense vector while still only computing products for indices that exist in both hash maps.
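A commonly discussed alternative for this follow-up (not the approach used in this article) stores sorted `(index, value)` pairs and merges them with two pointers, which avoids hashing entirely; a sketch:

```python
# Alternative sketch for the follow-up: sorted (index, value) pairs
# merged with two pointers. Class name is illustrative, not from the
# original solution.
class SparseVectorPairs:
    def __init__(self, nums):
        # enumerate() yields indices in increasing order, so pairs are sorted
        self.pairs = [(i, v) for i, v in enumerate(nums) if v != 0]

    def dotProduct(self, vec):
        a, b = self.pairs, vec.pairs
        i = j = total = 0
        while i < len(a) and j < len(b):
            ai, av = a[i]
            bj, bv = b[j]
            if ai == bj:          # index present in both vectors
                total += av * bv
                i += 1
                j += 1
            elif ai < bj:         # advance the pointer at the smaller index
                i += 1
            else:
                j += 1
        return total

va = SparseVectorPairs([1, 0, 0, 2, 3])
vb = SparseVectorPairs([0, 3, 0, 4, 0])
print(va.dotProduct(vb))  # 8
```

This variant runs in `O(m + k)` for the product and is sometimes preferred in interviews because it avoids hash collisions and works well when pairs are streamed from disk.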
Example Walkthrough
Let's walk through the solution with two sparse vectors:
- Vector A: `[1, 0, 0, 2, 3]`
- Vector B: `[0, 3, 0, 4, 0]`
Step 1: Initialize SparseVector A

```python
vecA = SparseVector([1, 0, 0, 2, 3])
```

The constructor creates a hash map storing only non-zero elements:

```python
vecA.d = {0: 1, 3: 2, 4: 3}
```

- Index 0 has value 1, index 3 has value 2, index 4 has value 3
- Indices 1 and 2 are not stored since they have value 0
Step 2: Initialize SparseVector B

```python
vecB = SparseVector([0, 3, 0, 4, 0])
```

Similarly, we get:

```python
vecB.d = {1: 3, 3: 4}
```

- Index 1 has value 3, index 3 has value 4
- Indices 0, 2, and 4 are not stored
Step 3: Compute Dot Product

```python
result = vecA.dotProduct(vecB)
```

Now let's trace through the `dotProduct` method:

- Extract hash maps:
  - `a = {0: 1, 3: 2, 4: 3}` (3 elements)
  - `b = {1: 3, 3: 4}` (2 elements)
- Optimize by swapping: since `len(b) < len(a)` (2 < 3), we swap:
  - `a = {1: 3, 3: 4}` (now the smaller one)
  - `b = {0: 1, 3: 2, 4: 3}` (now the larger one)
- Calculate the dot product by iterating through `a`:
  - Index 1, value 3: look up index 1 in `b` → `b.get(1, 0) = 0` (not found); product: `3 * 0 = 0`
  - Index 3, value 4: look up index 3 in `b` → `b.get(3, 0) = 2` (found!); product: `4 * 2 = 8`
  - Sum: `0 + 8 = 8`
Verification with the traditional dot product:

[1, 0, 0, 2, 3] · [0, 3, 0, 4, 0] = 1*0 + 0*3 + 0*0 + 2*4 + 3*0 = 0 + 0 + 0 + 8 + 0 = 8 ✓

The solution correctly computes the dot product while only performing 2 lookups (iterating through the smaller hash map) instead of checking all 5 positions.
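The walkthrough can also be checked mechanically against a naive dense dot product:

```python
def naive_dot(xs, ys):
    # Dense O(n) dot product, used only for verification
    return sum(x * y for x, y in zip(xs, ys))

A = [1, 0, 0, 2, 3]
B = [0, 3, 0, 4, 0]

# Same hash-map computation as the walkthrough above
a = {i: v for i, v in enumerate(A) if v != 0}
b = {i: v for i, v in enumerate(B) if v != 0}
small, large = (a, b) if len(a) <= len(b) else (b, a)
sparse_result = sum(v * large.get(i, 0) for i, v in small.items())

print(sparse_result, naive_dot(A, B))  # 8 8
```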
Solution Implementation
```python
from typing import List, Dict


class SparseVector:
    def __init__(self, nums: List[int]) -> None:
        """
        Initialize a sparse vector by storing only non-zero elements.

        Args:
            nums: List of integers representing the vector
        """
        # Store non-zero values with their indices as a dictionary
        # This saves space for vectors with many zero elements
        self.non_zero_elements: Dict[int, int] = {
            index: value
            for index, value in enumerate(nums)
            if value != 0
        }

    def dotProduct(self, vec: "SparseVector") -> int:
        """
        Calculate the dot product of two sparse vectors.

        Args:
            vec: Another SparseVector instance

        Returns:
            The dot product of the two vectors
        """
        # Get references to both dictionaries for easier access
        dict_a = self.non_zero_elements
        dict_b = vec.non_zero_elements

        # Optimize by iterating through the smaller dictionary
        # This reduces the number of lookups needed
        if len(dict_b) < len(dict_a):
            dict_a, dict_b = dict_b, dict_a

        # Calculate dot product by iterating through the smaller dictionary
        # For each index in dict_a, multiply its value by the corresponding
        # value in dict_b (or 0 if the index doesn't exist in dict_b)
        dot_product = sum(
            value * dict_b.get(index, 0)
            for index, value in dict_a.items()
        )

        return dot_product


# Your SparseVector object will be instantiated and called as such:
# v1 = SparseVector(nums1)
# v2 = SparseVector(nums2)
# ans = v1.dotProduct(v2)
```
```java
import java.util.HashMap;
import java.util.Map;

class SparseVector {
    // HashMap to store non-zero elements: key = index, value = element value
    private Map<Integer, Integer> indexToValueMap;

    /**
     * Constructor to create a sparse vector from an array.
     * Only stores non-zero elements to save space.
     * @param nums the input array
     */
    SparseVector(int[] nums) {
        // Initialize HashMap with initial capacity for better performance
        this.indexToValueMap = new HashMap<>(128);

        // Iterate through the array and store only non-zero elements
        for (int i = 0; i < nums.length; i++) {
            if (nums[i] != 0) {
                indexToValueMap.put(i, nums[i]);
            }
        }
    }

    /**
     * Calculate the dot product of two sparse vectors.
     * Optimized by iterating through the smaller map.
     * @param vec the other sparse vector
     * @return the dot product result
     */
    public int dotProduct(SparseVector vec) {
        // Get references to both maps for easier manipulation
        Map<Integer, Integer> mapA = this.indexToValueMap;
        Map<Integer, Integer> mapB = vec.indexToValueMap;

        // Optimization: iterate through the smaller map to reduce iterations
        if (mapB.size() < mapA.size()) {
            // Swap references if mapB is smaller
            Map<Integer, Integer> temp = mapA;
            mapA = mapB;
            mapB = temp;
        }

        // Calculate dot product by iterating through the smaller map
        int result = 0;
        for (Map.Entry<Integer, Integer> entry : mapA.entrySet()) {
            int index = entry.getKey();
            int value = entry.getValue();
            // Multiply corresponding values if they exist in both vectors
            // Use getOrDefault to handle missing indices (treated as 0)
            result += value * mapB.getOrDefault(index, 0);
        }

        return result;
    }
}

// Your SparseVector object will be instantiated and called as such:
// SparseVector v1 = new SparseVector(nums1);
// SparseVector v2 = new SparseVector(nums2);
// int ans = v1.dotProduct(v2);
```
```cpp
#include <unordered_map>
#include <vector>
using namespace std;

class SparseVector {
public:
    // Store non-zero elements as index-value pairs
    unordered_map<int, int> indexToValue;

    SparseVector(vector<int>& nums) {
        // Only store non-zero elements to save space
        for (int i = 0; i < (int) nums.size(); ++i) {
            if (nums[i] != 0) {
                indexToValue[i] = nums[i];
            }
        }
    }

    // Return the dot product of two sparse vectors
    int dotProduct(SparseVector& vec) {
        // Create references to both sparse vectors' maps
        auto& mapA = indexToValue;
        auto& mapB = vec.indexToValue;

        // Optimize by iterating through the smaller map
        // This reduces the number of lookups needed
        if (mapA.size() > mapB.size()) {
            return vec.dotProduct(*this);
        }

        int result = 0;

        // Iterate through the smaller map and check if indices exist in the larger map
        for (const auto& [index, value] : mapA) {
            // If the same index exists in both vectors, multiply and add to result
            if (mapB.count(index)) {
                result += value * mapB.at(index);
            }
        }

        return result;
    }
};

// Your SparseVector object will be instantiated and called as such:
// SparseVector v1(nums1);
// SparseVector v2(nums2);
// int ans = v1.dotProduct(v2);
```
```typescript
// Store sparse vector as a map of index to non-zero values
let sparseVectorData: Map<number, number>;

/**
 * Initialize the sparse vector from an array of numbers.
 * Only store non-zero values to save space.
 * @param nums - Input array of numbers
 */
function initializeSparseVector(nums: number[]): void {
    sparseVectorData = new Map<number, number>();

    // Iterate through the array and store only non-zero values
    for (let index = 0; index < nums.length; index++) {
        if (nums[index] !== 0) {
            sparseVectorData.set(index, nums[index]);
        }
    }
}

/**
 * Calculate the dot product of two sparse vectors.
 * Optimized by iterating through the smaller map.
 * @param otherVector - The other sparse vector's data map
 * @returns The dot product result
 */
function dotProduct(otherVector: Map<number, number>): number {
    // Choose the smaller map to iterate for better performance
    let smallerMap: Map<number, number> = sparseVectorData;
    let largerMap: Map<number, number> = otherVector;

    if (smallerMap.size > largerMap.size) {
        // Swap if the current vector's map is larger
        [smallerMap, largerMap] = [largerMap, smallerMap];
    }

    let result: number = 0;

    // Iterate through the smaller map and multiply matching indices
    for (const [index, value] of smallerMap) {
        if (largerMap.has(index)) {
            result += value * largerMap.get(index)!;
        }
    }

    return result;
}
```
Time and Space Complexity
Time Complexity: `O(n)` for initialization and `O(min(m, k))` for the dot product operation, where `n` is the length of the input array, and `m` and `k` are the number of non-zero elements in the two sparse vectors respectively.

- Initialization (`__init__`): The dictionary comprehension iterates through all `n` elements of the input array to identify and store non-zero values, resulting in `O(n)` time complexity.
- Dot Product (`dotProduct`): The method first compares the sizes of the two dictionaries and swaps them if needed (`O(1)`). It then iterates through the smaller dictionary, performing a lookup in the larger dictionary for each element. Since dictionary lookup is `O(1)` on average and we iterate through `min(m, k)` elements, the time complexity is `O(min(m, k))`.

Space Complexity: `O(m)`, where `m` is the number of non-zero elements in the sparse vector.

- The dictionary `self.d` stores only the non-zero elements and their indices. In the worst case, where all elements are non-zero, this would be `O(n)`, but for truly sparse vectors where `m << n`, the space complexity is `O(m)`.
- The `dotProduct` method uses `O(1)` additional space, as it only creates references to existing dictionaries and computes the sum on the fly without storing intermediate results.

Note: The reference answer states `O(n)` for both complexities, which represents the worst-case scenario where all elements in the vector are non-zero. For sparse vectors with few non-zero elements, the actual complexity can be much better than `O(n)`.
Common Pitfalls
1. Not Handling Zero Values Correctly in Dictionary Comprehension
Pitfall: Using `if value` instead of `if value != 0` in the dictionary comprehension relies on implicit truthiness:

```python
# Works, but implicit:
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v}
```

This works because 0 evaluates to `False` in Python, but it is less explicit and can be confusing to read. Additionally, if the code is later modified to handle other data types or conditions, the implicit boolean conversion might cause unexpected behavior.

Solution: Always use explicit comparison for clarity:

```python
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}
```
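A quick check of how the two filters compare, including with negative values (both keep them, since any non-zero integer is truthy):

```python
nums = [0, -2, 5, 0, -7]
implicit = {i: v for i, v in enumerate(nums) if v}
explicit = {i: v for i, v in enumerate(nums) if v != 0}
print(implicit == explicit)  # True: negative values are truthy too
```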
2. Modifying Original Dictionaries During Swap
Pitfall: Accidentally modifying the instance's dictionary reference when swapping:

```python
# Dangerous if you later modify dict_a or dict_b thinking they're temporary
dict_a = self.non_zero_elements
dict_b = vec.non_zero_elements
if len(dict_b) < len(dict_a):
    dict_a, dict_b = dict_b, dict_a
# If you modify dict_a here, you might modify the original dictionary
```

Solution: The current implementation is safe because we only read from the dictionaries. However, if you need to modify them, create copies first:

```python
dict_a = self.non_zero_elements.copy()
dict_b = vec.non_zero_elements.copy()
```
3. Integer Overflow in Large Dot Products
Pitfall: When dealing with very large vectors or large values, the dot product calculation might overflow in languages with fixed integer sizes. While Python handles arbitrary precision integers automatically, this could be an issue when porting to other languages.
```python
# Could overflow in other languages:
dot_product = sum(value * dict_b.get(index, 0) for index, value in dict_a.items())
```

Solution: For other languages, or when expecting very large numbers, consider:
- Using appropriate data types (e.g., `long long` in C++)
- Checking for overflow conditions
- Using modular arithmetic if required by the problem
4. Assuming Input Vector is Always Valid
Pitfall: Not validating that the input vector has the same dimensions as the current vector:

```python
# Current code doesn't verify vectors have the same length
def dotProduct(self, vec: "SparseVector") -> int:
    # Proceeds directly to calculation
    ...
```

Solution: Add dimension validation if needed:

```python
def __init__(self, nums: List[int]) -> None:
    self.dimension = len(nums)  # Store original dimension
    self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}

def dotProduct(self, vec: "SparseVector") -> int:
    if self.dimension != vec.dimension:
        raise ValueError("Vectors must have the same dimension")
    # Continue with calculation
    ...
```
5. Memory Inefficiency with Dense Vectors
Pitfall: Using this sparse vector implementation for dense vectors (vectors with mostly non-zero values) actually uses more memory than a regular array due to dictionary overhead.
Solution: Consider a hybrid approach or threshold:
def __init__(self, nums: List[int]) -> None:
non_zero_count = sum(1 for v in nums if v != 0)
sparsity_ratio = non_zero_count / len(nums) if nums else 0
# Use sparse representation only if less than 50% non-zero
if sparsity_ratio < 0.5:
self.is_sparse = True
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}
else:
self.is_sparse = False
self.dense_array = nums
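For such a hybrid to be usable, it also needs a `dotProduct` that handles both representations. One hypothetical sketch (assuming both operands use this same class and have equal length; `HybridVector` and `_value_at` are illustrative names, not part of the original solution):

```python
from typing import List

class HybridVector:
    """Hypothetical hybrid: dict storage below 50% density, plain list otherwise."""

    def __init__(self, nums: List[int]) -> None:
        non_zero = sum(1 for v in nums if v != 0)
        self.is_sparse = bool(nums) and non_zero / len(nums) < 0.5
        if self.is_sparse:
            self.data = {i: v for i, v in enumerate(nums) if v != 0}
        else:
            self.data = list(nums)

    def _value_at(self, i: int) -> int:
        # Works for both representations; missing sparse keys count as 0
        return self.data.get(i, 0) if self.is_sparse else self.data[i]

    def dotProduct(self, vec: "HybridVector") -> int:
        if self.is_sparse:
            # Iterate our non-zero entries, probe the other side
            return sum(v * vec._value_at(i) for i, v in self.data.items())
        if vec.is_sparse:
            # Symmetric case: let the sparse side drive the iteration
            return vec.dotProduct(self)
        # Both dense: plain element-wise product
        return sum(x * y for x, y in zip(self.data, vec.data))

print(HybridVector([1, 0, 0, 0, 3]).dotProduct(HybridVector([1, 2, 3, 4, 5])))  # 16
```

The sparse-dense path mirrors the follow-up discussion above: the sparse side drives the iteration, so the work stays proportional to its non-zero count.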