1570. Dot Product of Two Sparse Vectors
Problem Description
You need to implement a class called `SparseVector` that efficiently handles sparse vectors (vectors with mostly zero values) and computes their dot product.
The class should support:
- Constructor `SparseVector(nums)`: takes a list of integers `nums` and initializes the sparse vector
- Method `dotProduct(vec)`: computes the dot product between the current sparse vector instance and another sparse vector `vec`
A sparse vector is a vector containing mostly zeros. Instead of storing all elements including zeros, you should store it efficiently by only keeping track of non-zero values.
The dot product of two vectors is calculated by multiplying corresponding elements and summing the results. For example, if vector A = [1, 0, 3] and vector B = [2, 0, 4], their dot product would be 1*2 + 0*0 + 3*4 = 14.
The solution uses a hash map (dictionary) to store only non-zero elements, where:
- Keys represent the indices of non-zero elements
- Values represent the actual non-zero values at those indices
When computing the dot product:
- The code identifies which vector has fewer non-zero elements for optimization
- It iterates through the smaller hash map
- For each element, it checks if the same index exists in the other vector's hash map
- If both vectors have non-zero values at the same index, their product is added to the result
Follow-up consideration: The problem asks what happens if only one vector is sparse. The current implementation handles this efficiently by always iterating through the smaller hash map, minimizing the number of lookups needed.
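Putting the description together, a minimal sketch of the class (the same dictionary-based design this article develops below) behaves like this:

```python
# Minimal sketch of the SparseVector interface described above,
# using dictionary-based storage of non-zero entries.
class SparseVector:
    def __init__(self, nums):
        # Keep only non-zero entries as index -> value
        self.d = {i: v for i, v in enumerate(nums) if v != 0}

    def dotProduct(self, vec):
        a, b = self.d, vec.d
        # Iterate the map with fewer non-zero entries
        if len(b) < len(a):
            a, b = b, a
        return sum(v * b.get(i, 0) for i, v in a.items())

v1 = SparseVector([1, 0, 3])
v2 = SparseVector([2, 0, 4])
print(v1.dotProduct(v2))  # 1*2 + 0*0 + 3*4 = 14
```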
Intuition
When dealing with sparse vectors that contain mostly zeros, storing all elements wastes memory. Consider a vector with 10,000 elements where only 10 are non-zero - storing all 10,000 values is inefficient.
The key insight is that when computing a dot product, zeros don't contribute to the final sum. If either element in a pair is zero, their product is zero. Therefore, we only need to care about positions where both vectors have non-zero values.
This leads us to think about storing only the non-zero elements. A hash map is perfect for this because:
- We can store only index-value pairs where the value is non-zero
- Looking up whether a specific index exists is `O(1)` on average
- The space complexity becomes proportional to the number of non-zero elements rather than the total vector length
For the dot product calculation, we need to find indices where both vectors have non-zero values. Instead of checking all indices, we can iterate through one hash map and check if each index exists in the other.
The optimization of iterating through the smaller hash map comes from minimizing the number of lookups. If vector A has 5 non-zero elements and vector B has 100 non-zero elements, it's more efficient to iterate through A's 5 elements and look them up in B, rather than iterating through B's 100 elements and looking them up in A. This reduces the operation from 100 lookups to just 5 lookups.
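That trade-off is easy to see with a small instrumented helper (hypothetical, for illustration only): iterating the 5-element map costs 5 lookups, while iterating the 100-element map costs 100.

```python
# Hypothetical instrumented dot product that counts lookups into `large`.
def dot_with_lookup_count(small, large):
    lookups = total = 0
    for i, v in small.items():
        lookups += 1
        total += v * large.get(i, 0)
    return total, lookups

a = {i: 1 for i in range(5)}    # vector A: 5 non-zero elements
b = {i: 1 for i in range(100)}  # vector B: 100 non-zero elements

_, cost_small_first = dot_with_lookup_count(a, b)
_, cost_large_first = dot_with_lookup_count(b, a)
print(cost_small_first, cost_large_first)  # 5 100
```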
The `get(i, 0)` method elegantly handles the case where an index doesn't exist in the other vector: it returns 0, which correctly contributes nothing to the dot product sum.
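A quick illustration of that default-value behavior:

```python
d = {1: 3, 3: 4}
print(d.get(3, 0))  # key present: the stored value, 4
print(d.get(0, 0))  # key absent: 0, contributing nothing to the sum
```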
Solution Approach
The implementation uses a hash map to efficiently store and compute operations on sparse vectors.
Data Structure Choice:
We use a dictionary `d` to store the sparse vector, where:
- Keys are the indices of non-zero elements
- Values are the corresponding non-zero values

Initialization (`__init__` method):

```python
self.d = {i: v for i, v in enumerate(nums) if v}
```

This dictionary comprehension iterates through the input array `nums` with `enumerate()` to get both index and value. The condition `if v` filters out zero values (since 0 evaluates to `False` in Python). Only non-zero elements are stored as key-value pairs.
Dot Product Calculation (`dotProduct` method):

- Extract the hash maps:

  ```python
  a, b = self.d, vec.d
  ```

  We get references to both vectors' hash maps for easier manipulation.

- Optimization - choose the smaller hash map:

  ```python
  if len(b) < len(a):
      a, b = b, a
  ```

  This swap ensures we always iterate through the hash map with fewer elements, minimizing the number of lookup operations.

- Compute the dot product:

  ```python
  return sum(v * b.get(i, 0) for i, v in a.items())
  ```

  - We iterate through each `(index, value)` pair in the smaller hash map `a`
  - For each index `i` with value `v` in `a`, we look up the corresponding value in hash map `b` using `b.get(i, 0)`
  - The `get()` method returns the value if the key exists, otherwise returns 0
  - We multiply the values: `v * b.get(i, 0)`
  - The `sum()` function adds up all these products
Time Complexity: `O(min(n, m))`, where `n` and `m` are the number of non-zero elements in the two vectors. We iterate through the smaller hash map once.

Space Complexity: `O(n + m)` for storing the non-zero elements of both vectors in their respective hash maps.
This approach efficiently handles the follow-up question about one sparse and one dense vector - the algorithm naturally adapts by storing all non-zero elements of the dense vector while still only computing products for indices that exist in both hash maps.
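A commonly discussed alternative for this follow-up (not the approach used in this article) stores sorted `(index, value)` pairs and merges them with two pointers, which avoids hashing entirely; a sketch:

```python
# Alternative sketch for the follow-up: sorted (index, value) pairs
# merged with two pointers. Class name is illustrative, not from the
# original solution.
class SparseVectorPairs:
    def __init__(self, nums):
        # enumerate() yields indices in increasing order, so pairs are sorted
        self.pairs = [(i, v) for i, v in enumerate(nums) if v != 0]

    def dotProduct(self, vec):
        a, b = self.pairs, vec.pairs
        i = j = total = 0
        while i < len(a) and j < len(b):
            ai, av = a[i]
            bj, bv = b[j]
            if ai == bj:          # index present in both vectors
                total += av * bv
                i += 1
                j += 1
            elif ai < bj:         # advance the pointer at the smaller index
                i += 1
            else:
                j += 1
        return total

va = SparseVectorPairs([1, 0, 0, 2, 3])
vb = SparseVectorPairs([0, 3, 0, 4, 0])
print(va.dotProduct(vb))  # 8
```

This variant runs in `O(m + k)` for the product and is sometimes preferred in interviews because it avoids hash collisions and works well when pairs are streamed from disk.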
Example Walkthrough
Let's walk through the solution with two sparse vectors:
- Vector A: `[1, 0, 0, 2, 3]`
- Vector B: `[0, 3, 0, 4, 0]`
Step 1: Initialize SparseVector A

```python
vecA = SparseVector([1, 0, 0, 2, 3])
```

The constructor creates a hash map storing only non-zero elements:

```python
vecA.d = {0: 1, 3: 2, 4: 3}
```

- Index 0 has value 1, index 3 has value 2, index 4 has value 3
- Indices 1 and 2 are not stored since they have value 0
Step 2: Initialize SparseVector B

```python
vecB = SparseVector([0, 3, 0, 4, 0])
```

Similarly, we get:

```python
vecB.d = {1: 3, 3: 4}
```

- Index 1 has value 3, index 3 has value 4
- Indices 0, 2, and 4 are not stored
Step 3: Compute Dot Product

```python
result = vecA.dotProduct(vecB)
```

Now let's trace through the `dotProduct` method:

- Extract hash maps:
  - `a = {0: 1, 3: 2, 4: 3}` (3 elements)
  - `b = {1: 3, 3: 4}` (2 elements)
- Optimize by swapping: since `len(b) < len(a)` (2 < 3), we swap:
  - `a = {1: 3, 3: 4}` (now the smaller one)
  - `b = {0: 1, 3: 2, 4: 3}` (now the larger one)
- Calculate the dot product by iterating through `a`:
  - Index 1, value 3: look up index 1 in `b` → `b.get(1, 0) = 0` (not found); product: `3 * 0 = 0`
  - Index 3, value 4: look up index 3 in `b` → `b.get(3, 0) = 2` (found!); product: `4 * 2 = 8`
  - Sum: `0 + 8 = 8`
Verification with the traditional dot product:

[1, 0, 0, 2, 3] · [0, 3, 0, 4, 0] = 1*0 + 0*3 + 0*0 + 2*4 + 3*0 = 0 + 0 + 0 + 8 + 0 = 8 ✓

The solution correctly computes the dot product while only performing 2 lookups (iterating through the smaller hash map) instead of checking all 5 positions.
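The walkthrough can also be checked mechanically against a naive dense dot product:

```python
def naive_dot(xs, ys):
    # Dense O(n) dot product, used only for verification
    return sum(x * y for x, y in zip(xs, ys))

A = [1, 0, 0, 2, 3]
B = [0, 3, 0, 4, 0]

# Same hash-map computation as the walkthrough above
a = {i: v for i, v in enumerate(A) if v != 0}
b = {i: v for i, v in enumerate(B) if v != 0}
small, large = (a, b) if len(a) <= len(b) else (b, a)
sparse_result = sum(v * large.get(i, 0) for i, v in small.items())

print(sparse_result, naive_dot(A, B))  # 8 8
```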
Solution Implementation
```python
from typing import List, Dict


class SparseVector:
    def __init__(self, nums: List[int]) -> None:
        """
        Initialize a sparse vector by storing only non-zero elements.

        Args:
            nums: List of integers representing the vector
        """
        # Store non-zero values with their indices as a dictionary
        # This saves space for vectors with many zero elements
        self.non_zero_elements: Dict[int, int] = {
            index: value
            for index, value in enumerate(nums)
            if value != 0
        }

    def dotProduct(self, vec: "SparseVector") -> int:
        """
        Calculate the dot product of two sparse vectors.

        Args:
            vec: Another SparseVector instance

        Returns:
            The dot product of the two vectors
        """
        # Get references to both dictionaries for easier access
        dict_a = self.non_zero_elements
        dict_b = vec.non_zero_elements

        # Optimize by iterating through the smaller dictionary
        # This reduces the number of lookups needed
        if len(dict_b) < len(dict_a):
            dict_a, dict_b = dict_b, dict_a

        # Calculate dot product by iterating through the smaller dictionary
        # For each index in dict_a, multiply its value by the corresponding
        # value in dict_b (or 0 if the index doesn't exist in dict_b)
        dot_product = sum(
            value * dict_b.get(index, 0)
            for index, value in dict_a.items()
        )

        return dot_product


# Your SparseVector object will be instantiated and called as such:
# v1 = SparseVector(nums1)
# v2 = SparseVector(nums2)
# ans = v1.dotProduct(v2)
```
```java
import java.util.HashMap;
import java.util.Map;

class SparseVector {
    // HashMap to store non-zero elements: key = index, value = element value
    private Map<Integer, Integer> indexToValueMap;

    /**
     * Constructor to create a sparse vector from an array.
     * Only stores non-zero elements to save space.
     * @param nums the input array
     */
    SparseVector(int[] nums) {
        // Initialize HashMap with initial capacity for better performance
        this.indexToValueMap = new HashMap<>(128);

        // Iterate through the array and store only non-zero elements
        for (int i = 0; i < nums.length; i++) {
            if (nums[i] != 0) {
                indexToValueMap.put(i, nums[i]);
            }
        }
    }

    /**
     * Calculate the dot product of two sparse vectors.
     * Optimized by iterating through the smaller map.
     * @param vec the other sparse vector
     * @return the dot product result
     */
    public int dotProduct(SparseVector vec) {
        // Get references to both maps for easier manipulation
        Map<Integer, Integer> mapA = this.indexToValueMap;
        Map<Integer, Integer> mapB = vec.indexToValueMap;

        // Optimization: iterate through the smaller map to reduce iterations
        if (mapB.size() < mapA.size()) {
            // Swap references if mapB is smaller
            Map<Integer, Integer> temp = mapA;
            mapA = mapB;
            mapB = temp;
        }

        // Calculate dot product by iterating through the smaller map
        int result = 0;
        for (Map.Entry<Integer, Integer> entry : mapA.entrySet()) {
            int index = entry.getKey();
            int value = entry.getValue();
            // Multiply corresponding values if they exist in both vectors
            // Use getOrDefault to handle missing indices (treated as 0)
            result += value * mapB.getOrDefault(index, 0);
        }

        return result;
    }
}

// Your SparseVector object will be instantiated and called as such:
// SparseVector v1 = new SparseVector(nums1);
// SparseVector v2 = new SparseVector(nums2);
// int ans = v1.dotProduct(v2);
```
```cpp
#include <unordered_map>
#include <vector>
using namespace std;

class SparseVector {
public:
    // Store non-zero elements as index-value pairs
    unordered_map<int, int> indexToValue;

    SparseVector(vector<int>& nums) {
        // Only store non-zero elements to save space
        for (int i = 0; i < (int) nums.size(); ++i) {
            if (nums[i] != 0) {
                indexToValue[i] = nums[i];
            }
        }
    }

    // Return the dot product of two sparse vectors
    int dotProduct(SparseVector& vec) {
        // Create references to both sparse vectors' maps
        auto& mapA = indexToValue;
        auto& mapB = vec.indexToValue;

        // Optimize by iterating through the smaller map
        // This reduces the number of lookups needed
        if (mapA.size() > mapB.size()) {
            return vec.dotProduct(*this);
        }

        int result = 0;

        // Iterate through the smaller map and check if indices exist in the larger map
        for (const auto& [index, value] : mapA) {
            // If the same index exists in both vectors, multiply and add to result
            if (mapB.count(index)) {
                result += value * mapB.at(index);
            }
        }

        return result;
    }
};

// Your SparseVector object will be instantiated and called as such:
// SparseVector v1(nums1);
// SparseVector v2(nums2);
// int ans = v1.dotProduct(v2);
```
```typescript
// Store sparse vector as a map of index to non-zero values
let sparseVectorData: Map<number, number>;

/**
 * Initialize the sparse vector from an array of numbers.
 * Only store non-zero values to save space.
 * @param nums - Input array of numbers
 */
function initializeSparseVector(nums: number[]): void {
    sparseVectorData = new Map<number, number>();

    // Iterate through the array and store only non-zero values
    for (let index = 0; index < nums.length; index++) {
        if (nums[index] !== 0) {
            sparseVectorData.set(index, nums[index]);
        }
    }
}

/**
 * Calculate the dot product of two sparse vectors.
 * Optimized by iterating through the smaller map.
 * @param otherVector - The other sparse vector's data map
 * @returns The dot product result
 */
function dotProduct(otherVector: Map<number, number>): number {
    // Choose the smaller map to iterate for better performance
    let smallerMap: Map<number, number> = sparseVectorData;
    let largerMap: Map<number, number> = otherVector;

    if (smallerMap.size > largerMap.size) {
        // Swap if the current vector's map is larger
        [smallerMap, largerMap] = [largerMap, smallerMap];
    }

    let result: number = 0;

    // Iterate through the smaller map and multiply matching indices
    for (const [index, value] of smallerMap) {
        if (largerMap.has(index)) {
            result += value * largerMap.get(index)!;
        }
    }

    return result;
}
```
Time and Space Complexity
Time Complexity: `O(n)` for initialization and `O(min(m, k))` for the dot product operation, where `n` is the length of the input array, and `m` and `k` are the number of non-zero elements in the two sparse vectors respectively.

- Initialization (`__init__`): The dictionary comprehension iterates through all `n` elements of the input array to identify and store non-zero values, resulting in `O(n)` time complexity.
- Dot Product (`dotProduct`): The method first compares the sizes of the two dictionaries and swaps them if needed (`O(1)`). It then iterates through the smaller dictionary, performing a lookup in the larger dictionary for each element. Since dictionary lookup is `O(1)` on average and we iterate through `min(m, k)` elements, the time complexity is `O(min(m, k))`.

Space Complexity: `O(m)`, where `m` is the number of non-zero elements in the sparse vector.

- The dictionary `self.d` stores only the non-zero elements and their indices. In the worst case, where all elements are non-zero, this would be `O(n)`, but for truly sparse vectors where `m << n`, the space complexity is `O(m)`.
- The `dotProduct` method uses `O(1)` additional space, as it only creates references to existing dictionaries and computes the sum on the fly without storing intermediate results.

Note: The reference answer states `O(n)` for both complexities, which represents the worst-case scenario where all elements in the vector are non-zero. For sparse vectors with few non-zero elements, the actual complexity can be much better than `O(n)`.
Common Pitfalls
1. Not Handling Zero Values Correctly in Dictionary Comprehension
Pitfall: Using `if value` instead of `if value != 0` in the dictionary comprehension relies on implicit truthiness:

```python
# Works, but implicit:
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v}
```

This works because 0 evaluates to `False` in Python, but it is less explicit and can be confusing to read. Additionally, if the code is later modified to handle other data types or conditions, the implicit boolean conversion might cause unexpected behavior.

Solution: Always use explicit comparison for clarity:

```python
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}
```
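A quick check of how the two filters compare, including with negative values (both keep them, since any non-zero integer is truthy):

```python
nums = [0, -2, 5, 0, -7]
implicit = {i: v for i, v in enumerate(nums) if v}
explicit = {i: v for i, v in enumerate(nums) if v != 0}
print(implicit == explicit)  # True: negative values are truthy too
```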
2. Modifying Original Dictionaries During Swap
Pitfall: Accidentally modifying the instance's dictionary reference when swapping:

```python
# Dangerous if you later modify dict_a or dict_b thinking they're temporary
dict_a = self.non_zero_elements
dict_b = vec.non_zero_elements
if len(dict_b) < len(dict_a):
    dict_a, dict_b = dict_b, dict_a
# If you modify dict_a here, you might modify the original dictionary
```

Solution: The current implementation is safe because we only read from the dictionaries. However, if you need to modify them, create copies first:

```python
dict_a = self.non_zero_elements.copy()
dict_b = vec.non_zero_elements.copy()
```
3. Integer Overflow in Large Dot Products
Pitfall: When dealing with very large vectors or large values, the dot product calculation might overflow in languages with fixed integer sizes. While Python handles arbitrary precision integers automatically, this could be an issue when porting to other languages.
```python
# Could overflow in other languages:
dot_product = sum(value * dict_b.get(index, 0) for index, value in dict_a.items())
```

Solution: For other languages, or when expecting very large numbers, consider:
- Using appropriate data types (e.g., `long long` in C++)
- Checking for overflow conditions
- Using modular arithmetic if required by the problem
4. Assuming Input Vector is Always Valid
Pitfall: Not validating that the input vector has the same dimensions as the current vector:

```python
# Current code doesn't verify vectors have the same length
def dotProduct(self, vec: "SparseVector") -> int:
    # Proceeds directly to calculation
    ...
```

Solution: Add dimension validation if needed:

```python
def __init__(self, nums: List[int]) -> None:
    self.dimension = len(nums)  # Store original dimension
    self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}

def dotProduct(self, vec: "SparseVector") -> int:
    if self.dimension != vec.dimension:
        raise ValueError("Vectors must have the same dimension")
    # Continue with calculation
    ...
```
5. Memory Inefficiency with Dense Vectors
Pitfall: Using this sparse vector implementation for dense vectors (vectors with mostly non-zero values) actually uses more memory than a regular array due to dictionary overhead.
Solution: Consider a hybrid approach or threshold:
def __init__(self, nums: List[int]) -> None:
non_zero_count = sum(1 for v in nums if v != 0)
sparsity_ratio = non_zero_count / len(nums) if nums else 0
# Use sparse representation only if less than 50% non-zero
if sparsity_ratio < 0.5:
self.is_sparse = True
self.non_zero_elements = {i: v for i, v in enumerate(nums) if v != 0}
else:
self.is_sparse = False
self.dense_array = nums
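For such a hybrid to be usable, it also needs a `dotProduct` that handles both representations. One hypothetical sketch (assuming both operands use this same class and have equal length; `HybridVector` and `_value_at` are illustrative names, not part of the original solution):

```python
from typing import List

class HybridVector:
    """Hypothetical hybrid: dict storage below 50% density, plain list otherwise."""

    def __init__(self, nums: List[int]) -> None:
        non_zero = sum(1 for v in nums if v != 0)
        self.is_sparse = bool(nums) and non_zero / len(nums) < 0.5
        if self.is_sparse:
            self.data = {i: v for i, v in enumerate(nums) if v != 0}
        else:
            self.data = list(nums)

    def _value_at(self, i: int) -> int:
        # Works for both representations; missing sparse keys count as 0
        return self.data.get(i, 0) if self.is_sparse else self.data[i]

    def dotProduct(self, vec: "HybridVector") -> int:
        if self.is_sparse:
            # Iterate our non-zero entries, probe the other side
            return sum(v * vec._value_at(i) for i, v in self.data.items())
        if vec.is_sparse:
            # Symmetric case: let the sparse side drive the iteration
            return vec.dotProduct(self)
        # Both dense: plain element-wise product
        return sum(x * y for x, y in zip(self.data, vec.data))

print(HybridVector([1, 0, 0, 0, 3]).dotProduct(HybridVector([1, 2, 3, 4, 5])))  # 16
```

The sparse-dense path mirrors the follow-up discussion above: the sparse side drives the iteration, so the work stays proportional to its non-zero count.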