2887. Fill Missing Data
Problem Description
You are given a DataFrame called products with three columns: name (object type), quantity (integer), and price (integer). The quantity column contains some missing values represented as None or NaN.
Your task is to replace all missing values in the quantity column with the value 0.
For example, if products such as "Wristwatch" and "WirelessEarbuds" have None values in their quantity column, these should be replaced with 0. Products that already have valid quantity values (like "GolfClubs" with 779 and "Printer" with 849) should remain unchanged.
The solution uses pandas' fillna() method to replace all NaN or None values in the quantity column with 0. The expression products['quantity'].fillna(0) identifies every missing value in that column and substitutes the provided value; assigning the result back to products['quantity'] stores the change in the DataFrame.
The function should return the modified DataFrame with all missing quantities filled in as 0.
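A minimal sketch of the input described above, assuming only the four products named in the example. The price column is left out on purpose because the description does not give its values.

```python
import pandas as pd

# Hypothetical input built from the products named in the description;
# price is omitted because its values are not given.
products = pd.DataFrame({
    "name": ["Wristwatch", "WirelessEarbuds", "GolfClubs", "Printer"],
    "quantity": [None, None, 779, 849],
})

# Replace missing quantities with 0 and store the result back in the column.
products["quantity"] = products["quantity"].fillna(0)
print(products)
```

Only the two None entries change; GolfClubs and Printer keep their quantities.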
Intuition
When dealing with missing data in a DataFrame, we need to decide how to handle these gaps. In this case, the missing quantities should be treated as zero items in stock rather than unknown values. This makes business sense: if we don't have quantity information for a product, it's reasonable to assume we have none available.
The key insight is recognizing that pandas provides built-in methods specifically designed for handling missing data. Instead of manually iterating through each row to check for None or NaN values, we can use the fillna() method, which handles the whole column in a single vectorized operation.
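To make the contrast concrete, here is a sketch comparing a row-by-row loop with the vectorized call. The helper fill_with_loop is hypothetical and exists only for comparison; it uses pd.isna, which treats None, NaN, and pd.NA as missing.

```python
import pandas as pd


def fill_with_loop(values):
    """Hypothetical row-by-row version: check each entry and substitute 0 when it is missing."""
    result = []
    for v in values:
        # pd.isna treats None, float('nan') and pd.NA as missing
        result.append(0 if pd.isna(v) else v)
    return result


quantities = pd.Series([15, None, 8, float("nan"), 0])
looped = fill_with_loop(quantities.tolist())
vectorized = quantities.fillna(0).tolist()
print(looped == vectorized)
```

Both produce the same values; the fillna() version simply avoids the explicit Python loop and conditional.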
Why use fillna(0) specifically? The method takes a value that replaces every missing entry in the selected column. Since we want missing quantities to become 0, we simply pass 0 as the argument. This approach is both concise and performant: it processes the entire column at once rather than row by row.
The solution pattern here is straightforward: identify the column with missing values (quantity), apply the appropriate pandas method (fillna()), and specify the replacement value (0). This turns a potentially complex loop with conditional checks into a single, readable line of code that clearly expresses our intent: "fill missing values with zero."
Solution Approach
The implementation is straightforward and consists of a single operation on the DataFrame:
- Access the target column: We first access the quantity column of the products DataFrame using bracket notation: products['quantity'].
- Apply the fillna() method: On this column, we call fillna() with the argument 0. The method scans all values in the column and replaces any None or NaN value with the specified replacement value.
- Reassign the column: fillna(0) returns a new Series with the missing values replaced. We assign it back to products['quantity'], which updates the original DataFrame object.
- Return the modified DataFrame: Finally, we return the entire products DataFrame, which now has every missing quantity replaced with 0.
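The four steps above can be spelled out with intermediate variables. This is a sketch only; the function name fill_missing_values_stepwise and the two-row sample DataFrame are hypothetical.

```python
import pandas as pd


def fill_missing_values_stepwise(products: pd.DataFrame) -> pd.DataFrame:
    # Step 1: access the target column (a Series)
    quantity = products["quantity"]
    # Step 2: fillna(0) returns a new Series with missing entries set to 0
    filled = quantity.fillna(0)
    # Step 3: write the new Series back into the DataFrame
    products["quantity"] = filled
    # Step 4: return the same DataFrame object that was passed in
    return products


df = pd.DataFrame({"name": ["A", "B"], "quantity": [3, None], "price": [10, 20]})
out = fill_missing_values_stepwise(df)
print(out is df, out["quantity"].tolist())
```

The returned object is the same DataFrame the caller passed in, now with the filled column.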
The complete operation is achieved in a single line:
products['quantity'] = products['quantity'].fillna(0)
This approach modifies the DataFrame in place, meaning the caller's products DataFrame is updated directly rather than a new copy being returned. The fillna() method handles all the work of identifying missing values (whether they are None, NaN, or other null representations in pandas such as pd.NA) and replacing them efficiently.
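A small check of both claims, assuming a hypothetical object-dtype column that deliberately mixes None, np.nan, and pd.NA. The caller's DataFrame is filled even though the function's return value is ignored. Depending on the pandas version, fillna on an object column may emit a FutureWarning about downcasting; the values are the same either way.

```python
import numpy as np
import pandas as pd


def fill_missing_values(products: pd.DataFrame) -> pd.DataFrame:
    # Column assignment mutates the DataFrame object passed by the caller
    products["quantity"] = products["quantity"].fillna(0)
    return products


# Hypothetical data: an object-dtype column holding three different null markers
original = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "quantity": pd.Series([5, None, np.nan, pd.NA], dtype="object"),
})

fill_missing_values(original)  # return value ignored on purpose
print(original["quantity"].tolist())
```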
The time complexity is O(n), where n is the number of rows in the DataFrame, as the method needs to examine each value once. The auxiliary space complexity is O(n) rather than O(1): fillna(0) without inplace=True builds a new Series of length n, which then replaces the old column.
Example Walkthrough
Let's walk through a small example to illustrate how the solution works.
Suppose we have the following DataFrame products:
name | quantity | price |
---|---|---|
Laptop | 15 | 999 |
Mouse | None | 25 |
Keyboard | 8 | 75 |
Monitor | NaN | 350 |
Webcam | 0 | 120 |
Step 1: Identify the target column
We need to work with the quantity column, which we access using products['quantity']. This gives us the series:
15, None, 8, NaN, 0
Step 2: Apply fillna(0)
When we call fillna(0) on this series, the method scans through each value:
- 15 → remains 15 (not a missing value)
- None → replaced with 0
- 8 → remains 8 (not a missing value)
- NaN → replaced with 0
- 0 → remains 0 (already zero, not missing)
This produces a new series:
15, 0, 8, 0, 0
Step 3: Update the DataFrame
We assign this transformed series back to products['quantity'], updating the DataFrame in place.
Final Result:
name | quantity | price |
---|---|---|
Laptop | 15 | 999 |
Mouse | 0 | 25 |
Keyboard | 8 | 75 |
Monitor | 0 | 350 |
Webcam | 0 | 120 |
Notice that:
- The Mouse's None value became 0
- The Monitor's NaN value became 0
- All other values remained unchanged
- The Webcam's original 0 stayed as 0 (it wasn't missing, just zero)
The entire operation is accomplished with the single line:
products['quantity'] = products['quantity'].fillna(0)
This efficiently handles all missing values regardless of whether they're represented as None or NaN, making the data consistent and ready for further analysis.
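The walkthrough can be reproduced directly with the same five rows; this sketch uses np.nan for the Monitor row to mirror the NaN in the table.

```python
import numpy as np
import pandas as pd

# The example DataFrame from the walkthrough
products = pd.DataFrame({
    "name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "quantity": [15, None, 8, np.nan, 0],
    "price": [999, 25, 75, 350, 120],
})

products["quantity"] = products["quantity"].fillna(0)
print(products)
```

The quantity column matches the final table (15, 0, 8, 0, 0), though pandas prints these as floats, a point covered under Common Pitfalls.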
Solution Implementation
import pandas as pd


def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    """
    Fill missing values in the quantity column with 0.

    Args:
        products: DataFrame containing product information with potential missing values

    Returns:
        DataFrame with missing quantity values filled with 0
    """
    # Replace all NaN values in the 'quantity' column with 0
    products['quantity'] = products['quantity'].fillna(0)

    # Return the modified DataFrame
    return products

import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;

public class Solution {
    /**
     * Fill missing values in the quantity column with 0.
     *
     * @param products List of Maps representing product information with potential missing values
     * @return List of Maps with missing quantity values filled with 0
     */
    public List<Map<String, Object>> fillMissingValues(List<Map<String, Object>> products) {
        // Create a new list to store the modified products
        List<Map<String, Object>> result = new ArrayList<>();

        // Iterate through each product in the input list
        for (Map<String, Object> product : products) {
            // Create a copy of the current product map
            Map<String, Object> modifiedProduct = new HashMap<>(product);

            // Check if quantity field is null or missing
            if (!modifiedProduct.containsKey("quantity") || modifiedProduct.get("quantity") == null) {
                // Replace null or missing quantity with 0
                modifiedProduct.put("quantity", 0);
            }

            // Add the modified product to the result list
            result.add(modifiedProduct);
        }

        // Return the modified list of products
        return result;
    }
}

#include <cmath>
#include <cstddef>
#include <vector>

class Solution {
public:
    /**
     * Fill missing values in the quantity column with 0.
     *
     * @param products: 2D vector representing product information with potential missing values.
     *                  Each row represents a product, columns represent different attributes.
     *                  The quantity column is assumed to be at a fixed index.
     * @return: 2D vector with missing quantity values filled with 0
     */
    std::vector<std::vector<double>> fillMissingValues(std::vector<std::vector<double>>& products) {
        // Assuming quantity is in a specific column (e.g., column index 1);
        // size_t avoids a signed/unsigned comparison with vector::size()
        const std::size_t QUANTITY_COLUMN_INDEX = 1;

        // Iterate through each row in the products vector
        for (std::size_t i = 0; i < products.size(); ++i) {
            // Check if the current row has enough columns
            if (products[i].size() > QUANTITY_COLUMN_INDEX) {
                // Check if the quantity value is NaN (Not a Number)
                if (std::isnan(products[i][QUANTITY_COLUMN_INDEX])) {
                    // Replace NaN with 0
                    products[i][QUANTITY_COLUMN_INDEX] = 0.0;
                }
            }
        }

        // Return the modified 2D vector (the input is also modified in place)
        return products;
    }
};

// Assumes a DataFrame type from a library such as danfojs is in scope;
// TypeScript has no built-in DataFrame type

/**
 * Fill missing values in the quantity column with 0.
 *
 * @param products - DataFrame containing product information with potential missing values
 * @returns DataFrame with missing quantity values filled with 0
 */
function fillMissingValues(products: DataFrame): DataFrame {
    // Replace all NaN values in the 'quantity' column with 0.
    // Caution: if loc() returns a new DataFrame in your library, this in-place
    // fill will not update products itself.
    products.loc({ columns: ['quantity'] }).fillNa(0, { inplace: true });

    // Return the modified DataFrame
    return products;
}

Note: TypeScript doesn't have a native pandas equivalent. The code above assumes a DataFrame library such as DanfoJS that provides DataFrame functionality in JavaScript/TypeScript, and the exact syntax varies between libraries. With DanfoJS specifically, the call would look slightly different:

/**
 * Fill missing values in the quantity column with 0.
 *
 * @param products - DataFrame containing product information with potential missing values
 * @returns DataFrame with missing quantity values filled with 0
 */
function fillMissingValues(products: any): any {
    // Replace all NaN values in the 'quantity' column with 0
    const filledProducts = products.fillNa(0, { columns: ['quantity'] });

    // Return the modified DataFrame
    return filledProducts;
}

Time and Space Complexity
Time Complexity: O(n), where n is the number of rows in the DataFrame. The fillna() method has to examine every value in the 'quantity' column to identify and replace NaN values with 0.
Space Complexity: O(n) auxiliary space. Called without inplace=True, fillna() returns a new Series of length n, and that Series is then assigned back to the 'quantity' column, so the algorithm allocates space proportional to the number of rows in addition to a constant number of temporary variables.
Note: The total space occupied by the DataFrame is O(n × m), where n is the number of rows and m is the number of columns, but this is input space, not additional space created by the algorithm.
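A quick check of what fillna(0) does when called without inplace: it returns a separate Series of the same length and leaves the source Series untouched. The three-element Series here is just illustrative data.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])
filled = s.fillna(0)

# fillna without inplace builds a separate Series of the same length
print(filled is s)
print(s.isna().sum(), filled.isna().sum())
```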
Common Pitfalls
1. Data Type Inconsistency After Filling
One common pitfall is that fillna(0) does not restore the integer data type. In pandas, an integer column containing NaN values is stored as float, because NaN is itself a floating-point value. After filling with 0, the column typically stays float (for example 0.0 instead of 0).
Solution: Convert the column back to integer type after filling:
products['quantity'] = products['quantity'].fillna(0).astype(int)
Or use the newer nullable integer type:
products['quantity'] = products['quantity'].fillna(0).astype('Int64')
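The dtype behavior can be observed on a small hypothetical column. The exact platform width of astype(int) can differ (for example int32 on some Windows setups), so only the float64 and Int64 results are spelled out in comments.

```python
import pandas as pd

products = pd.DataFrame({"quantity": [3, None, 7]})
print(products["quantity"].dtype)  # float64: the None forced a float column

filled = products["quantity"].fillna(0)
print(filled.dtype)  # still float64 after filling

as_int = filled.astype(int)
as_nullable = products["quantity"].fillna(0).astype("Int64")
print(as_int.dtype, as_nullable.dtype)
```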
2. Not Creating a Copy When Required
The current implementation modifies the DataFrame in place. If the original DataFrame needs to be preserved for other operations or comparisons, this direct modification could cause issues.
Solution: Create a copy of the DataFrame before modification:
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products_copy = products.copy()
    products_copy['quantity'] = products_copy['quantity'].fillna(0)
    return products_copy
3. Assuming Column Exists
The code assumes the 'quantity' column exists in the DataFrame. If it doesn't, a KeyError will be raised.
Solution: Add a check for column existence:
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    if 'quantity' in products.columns:
        products['quantity'] = products['quantity'].fillna(0)
    return products
4. Not Handling Different Types of Missing Values
While fillna() handles NaN and None, it doesn't handle empty strings, whitespace, or other representations of missing data that might exist in real-world datasets.
Solution: Clean the data more thoroughly:
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    # Replace the empty string, a single space, and the literal 'null' with pd.NA first
    # (other whitespace variants such as '  ' are not covered by this list)
    products['quantity'] = products['quantity'].replace(['', ' ', 'null'], pd.NA)
    # Then fill all missing values with 0
    products['quantity'] = products['quantity'].fillna(0)
    return products
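An alternative that is not part of the original solution is to let pd.to_numeric with errors="coerce" turn anything that cannot be parsed as a number into NaN before filling. This sketch assumes quantities should always be numeric; note that it also silently turns typos such as "12a" into 0, which may hide bad data. The function name and sample values are hypothetical.

```python
import pandas as pd


def fill_missing_quantities_coerce(products: pd.DataFrame) -> pd.DataFrame:
    # Any value that cannot be parsed as a number ('', '   ', 'null', ...)
    # becomes NaN, so fillna(0) then catches it along with real nulls.
    products["quantity"] = pd.to_numeric(products["quantity"], errors="coerce").fillna(0)
    return products


messy = pd.DataFrame({"quantity": ["12", "", "   ", "null", None, 4]})
result = fill_missing_quantities_coerce(messy)
print(result["quantity"].tolist())
```

Unlike a fixed replace() list, this covers any amount of whitespace, at the cost of being less selective about what counts as missing.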
5. Chained Assignment Warning
In some pandas versions and configurations, the direct assignment might trigger a SettingWithCopyWarning if products was itself created by slicing another DataFrame.
Solution: Use the loc accessor for explicit assignment:
products.loc[:, 'quantity'] = products['quantity'].fillna(0)
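Another way to sidestep the warning is to take an explicit .copy() before assigning, which makes it clear that changes should not flow back to the parent. This sketch assumes products was produced by filtering a hypothetical parent DataFrame named inventory.

```python
import pandas as pd

# Hypothetical parent DataFrame
inventory = pd.DataFrame({
    "name": ["a", "b", "c"],
    "quantity": [1, None, 5],
    "price": [10, 20, 30],
})

# Filter, then copy explicitly so the result is an independent DataFrame
products = inventory[inventory["price"] > 15].copy()
products["quantity"] = products["quantity"].fillna(0)

print(products["quantity"].tolist())
print(inventory["quantity"].isna().sum())
```

The filtered copy is filled while the parent keeps its original missing value.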