2887. Fill Missing Data

Problem Description

You are given a DataFrame called products with three columns: name (object type), quantity (integer), and price (integer). The quantity column contains some missing values represented as None or NaN.

Your task is to replace all missing values in the quantity column with the value 0.

For example, if you have products like "Wristwatch" and "WirelessEarbuds" with None values in their quantity column, these should be replaced with 0. Products that already have valid quantity values (like "GolfClubs" with 779 and "Printer" with 849) should remain unchanged.

The solution uses pandas' fillna() method to replace all NaN or None values in the quantity column with 0. This is accomplished with the simple operation products['quantity'] = products['quantity'].fillna(0), which finds the missing values in that column, substitutes 0 for them, and stores the result back in the DataFrame.

The function should return the modified DataFrame with all missing quantities filled in as 0.

Intuition

When dealing with missing data in a DataFrame, we need to decide how to handle these gaps. In this case, the missing quantities should be treated as zero items in stock rather than unknown values. This makes business sense - if we don't have quantity information for a product, it's reasonable to assume we have none available.

The key insight is recognizing that pandas provides built-in methods specifically designed for handling missing data. Instead of manually iterating through each row to check for None or NaN values, we can leverage the fillna() method which efficiently handles this operation in a vectorized manner.

Why use fillna(0) specifically? The method takes a value that will replace all missing entries in the selected column. Since we want to replace missing quantities with 0, we simply pass 0 as the argument. This approach is both concise and performant - it processes the entire column at once rather than row by row.

The solution pattern here is straightforward: identify the column with missing values (quantity), apply the appropriate pandas method (fillna()), and specify the replacement value (0). This transforms the operation from a potentially complex loop with conditional checks into a single, readable line of code that clearly expresses our intent: "fill missing values with zero."
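To make the contrast concrete, here is a minimal sketch comparing a manual loop with the vectorized call. The sample values and the variable names looped and vectorized are made up for illustration.

```python
import pandas as pd

# Illustrative column; the None entries are stored as NaN in a float column
quantity = pd.Series([15, None, 8, None, 0])

# Manual approach: inspect each value and build a new list
looped = pd.Series([0 if pd.isna(q) else q for q in quantity])

# Vectorized approach: one call over the whole column
vectorized = quantity.fillna(0)

# Both approaches produce the same values
print(looped.equals(vectorized))
```

Both versions give the same result; the fillna() call simply states the intent in one step and leaves the iteration to pandas.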

Solution Approach

The implementation is straightforward and consists of a single operation on the DataFrame:

  1. Access the target column: We first access the quantity column from the products DataFrame using bracket notation: products['quantity'].

  2. Apply the fillna() method: On this column, we call the fillna() method with the argument 0. This method scans through all values in the column and replaces any None or NaN values with the specified replacement value.

  3. Reassign the column: fillna(0) returns a new Series with the missing values replaced; it does not change the column it was called on. We assign the result back to products['quantity'] so that the products DataFrame itself is updated.

  4. Return the modified DataFrame: Finally, we return the entire products DataFrame, which now has all missing quantity values replaced with 0.

The complete operation is achieved in a single line:

products['quantity'] = products['quantity'].fillna(0)

Because the filled Series is assigned back to the column, the products DataFrame that was passed in is itself updated; no copy of the whole DataFrame is made. The fillna() method handles the work of identifying missing values (None, NaN, and pandas' other null markers such as pd.NA) and replacing them efficiently.

The time complexity is O(n), where n is the number of rows in the DataFrame, as the method needs to examine each value once. The space complexity is also O(n): fillna() returns a new Series of length n, which then replaces the original column.
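The following sketch illustrates why the assignment step matters. It borrows two product names from the problem description; the prices and the variable names filled, still_missing, and remaining are made up for illustration.

```python
import pandas as pd

products = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [100, 200],  # illustrative prices, not taken from the problem
})

# fillna() returns a new Series; the DataFrame column is not touched yet
filled = products["quantity"].fillna(0)
still_missing = int(products["quantity"].isna().sum())

# Assigning the result back is what updates the DataFrame
products["quantity"] = filled
remaining = int(products["quantity"].isna().sum())

print(still_missing, remaining)
```

Before the assignment the column still contains its missing value; afterwards none remain.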

Example Walkthrough

Let's walk through a small example to illustrate how the solution works.

Suppose we have the following DataFrame products:

name      quantity  price
Laptop    15        999
Mouse     None      25
Keyboard  8         75
Monitor   NaN       350
Webcam    0         120

Step 1: Identify the target column. We need to work with the quantity column, which we access using products['quantity']. This gives us the series:

15, None, 8, NaN, 0

Step 2: Apply fillna(0). When we call fillna(0) on this series, the method scans through each value:

  • 15 → remains 15 (not a missing value)
  • None → replaced with 0
  • 8 → remains 8 (not a missing value)
  • NaN → replaced with 0
  • 0 → remains 0 (already zero, not missing)

This produces a new series:

15, 0, 8, 0, 0

Step 3: Update the DataFrame. We assign this transformed series back to products['quantity'], which updates the products DataFrame.

Final Result:

name      quantity  price
Laptop    15        999
Mouse     0         25
Keyboard  8         75
Monitor   0         350
Webcam    0         120

Notice that:

  • The Mouse's None value became 0
  • The Monitor's NaN value became 0
  • All other values remained unchanged
  • The Webcam's original 0 stayed as 0 (it wasn't missing, just zero)

The entire operation is accomplished with the single line:

products['quantity'] = products['quantity'].fillna(0)

This efficiently handles all missing values regardless of whether they're represented as None or NaN, making the data consistent and ready for further analysis.
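The walkthrough can be reproduced with a short script. This is a sketch that builds the example table by hand; note that pandas stores both None and NaN as NaN in a numeric column, so the column is float-typed.

```python
import pandas as pd

# The example table from the walkthrough
products = pd.DataFrame({
    "name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "quantity": [15, None, 8, float("nan"), 0],
    "price": [999, 25, 75, 350, 120],
})

# Fill the missing quantities and store the result back in the DataFrame
products["quantity"] = products["quantity"].fillna(0)

print(products["quantity"].tolist())  # [15.0, 0.0, 8.0, 0.0, 0.0]
```

The values print as floats because the column was float-typed while it held NaN.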

Solution Implementation

import pandas as pd


def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    """
    Fill missing values in the quantity column with 0.

    Args:
        products: DataFrame containing product information with potential missing values

    Returns:
        DataFrame with missing quantity values filled with 0
    """
    # Replace all missing values (None/NaN) in the 'quantity' column with 0
    products['quantity'] = products['quantity'].fillna(0)

    # Return the modified DataFrame
    return products

import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;

public class Solution {
    /**
     * Fill missing values in the quantity column with 0.
     *
     * @param products List of Maps representing product information with potential missing values
     * @return List of Maps with missing quantity values filled with 0
     */
    public List<Map<String, Object>> fillMissingValues(List<Map<String, Object>> products) {
        // Create a new list to store the modified products
        List<Map<String, Object>> result = new ArrayList<>();

        // Iterate through each product in the input list
        for (Map<String, Object> product : products) {
            // Create a copy of the current product map
            Map<String, Object> modifiedProduct = new HashMap<>(product);

            // Check if quantity field is null or missing
            if (!modifiedProduct.containsKey("quantity") || modifiedProduct.get("quantity") == null) {
                // Replace null or missing quantity with 0
                modifiedProduct.put("quantity", 0);
            }

            // Add the modified product to the result list
            result.add(modifiedProduct);
        }

        // Return the modified list of products
        return result;
    }
}

#include <cmath>
#include <cstddef>
#include <vector>

class Solution {
public:
    /**
     * Fill missing values in the quantity column with 0.
     *
     * @param products: 2D vector representing product information with potential missing values
     *                  Each row represents a product, columns represent different attributes
     *                  Assuming the quantity column is at a fixed index
     * @return: 2D vector with missing quantity values filled with 0
     */
    std::vector<std::vector<double>> fillMissingValues(std::vector<std::vector<double>>& products) {
        // Assuming quantity is in a specific column (e.g., column index 1)
        const std::size_t QUANTITY_COLUMN_INDEX = 1;

        // Iterate through each row in the products vector
        for (std::size_t i = 0; i < products.size(); ++i) {
            // Check if the current row has enough columns
            if (products[i].size() > QUANTITY_COLUMN_INDEX) {
                // Check if the quantity value is NaN (Not a Number)
                if (std::isnan(products[i][QUANTITY_COLUMN_INDEX])) {
                    // Replace NaN with 0
                    products[i][QUANTITY_COLUMN_INDEX] = 0.0;
                }
            }
        }

        // Return the modified 2D vector
        return products;
    }
};

// Note: TypeScript has no built-in DataFrame type; this sketch assumes a
// library such as DanfoJS (or similar) that provides one.

/**
 * Fill missing values in the quantity column with 0.
 *
 * @param products - DataFrame containing product information with potential missing values
 * @returns DataFrame with missing quantity values filled with 0
 */
function fillMissingValues(products: DataFrame): DataFrame {
    // Replace all NaN values in the 'quantity' column with 0.
    // Caveat: this relies on loc() returning a view onto products rather than a copy.
    products.loc({ columns: ['quantity'] }).fillNa(0, { inplace: true });

    // Return the modified DataFrame
    return products;
}

Note: TypeScript doesn't have a native pandas equivalent. The above code assumes you're using a DataFrame library like DanfoJS or similar that provides DataFrame functionality in JavaScript/TypeScript. The exact syntax might vary depending on the specific library you're using. If you're using DanfoJS specifically, the syntax would be slightly different:

/**
 * Fill missing values in the quantity column with 0.
 *
 * @param products - DataFrame containing product information with potential missing values
 * @returns DataFrame with missing quantity values filled with 0
 */
function fillMissingValues(products: any): any {
    // Replace all NaN values in the 'quantity' column with 0
    const filledProducts = products.fillNa(0, { columns: ['quantity'] });

    // Return the new DataFrame with the filled column
    return filledProducts;
}


Time and Space Complexity

Time Complexity: O(n) where n is the number of rows in the DataFrame. The fillna() method needs to iterate through all rows in the 'quantity' column to identify and replace NaN values with 0.

Space Complexity: O(n). The fillna() call does not modify the column in place; it returns a new Series of n values, which is then assigned back to the 'quantity' column. The other columns are not copied, and beyond that new Series only a constant amount of extra space is used.

Note: The total space occupied by the DataFrame is O(n × m) where n is the number of rows and m is the number of columns, but this is the input space, not additional space created by the algorithm.

Common Pitfalls

1. Data Type Inconsistency After Filling

One common pitfall is that fillna(0) might not preserve the integer data type if the column contains NaN values. In pandas, columns with NaN values are automatically converted to float type because NaN is a float value. After filling with 0, the column might remain as float type (e.g., 0.0 instead of 0).

Solution: Convert the column back to integer type after filling:

products['quantity'] = products['quantity'].fillna(0).astype(int)

Or use the newer nullable integer type:

products['quantity'] = products['quantity'].fillna(0).astype('Int64')
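A small sketch of the dtype behavior described above, using made-up sample values and illustrative variable names:

```python
import pandas as pd

quantity = pd.Series([15, None, 8])

# The missing entry is stored as NaN, so the column is float-typed
dtype_before = str(quantity.dtype)

# Filling does not change the dtype by itself; it stays float
filled = quantity.fillna(0)
dtype_filled = str(filled.dtype)

# Convert explicitly: plain integer, or pandas' nullable integer type
as_int = filled.astype(int)
as_nullable = filled.astype("Int64")

print(dtype_before, dtype_filled, as_int.dtype, as_nullable.dtype)
```

The plain int conversion only works once no missing values remain, whereas Int64 can also hold missing entries if some are left unfilled.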

2. Not Creating a Copy When Required

The current implementation modifies the DataFrame in place. If the original DataFrame needs to be preserved for other operations or comparisons, this direct modification could cause issues.

Solution: Create a copy of the DataFrame before modification:

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products_copy = products.copy()
    products_copy['quantity'] = products_copy['quantity'].fillna(0)
    return products_copy
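Another way to leave the input untouched, offered as a sketch rather than the article's method: DataFrame.fillna accepts a dict mapping column names to fill values and returns a new DataFrame. The sample rows and variable names are made up for illustration.

```python
import pandas as pd

original = pd.DataFrame({
    "name": ["Mouse", "Keyboard"],
    "quantity": [None, 8],
    "price": [25, 75],
})

# Fill only the 'quantity' column; the result is a new DataFrame
filled = original.fillna({"quantity": 0})

# The input still has its missing value, the result does not
print(int(original["quantity"].isna().sum()), int(filled["quantity"].isna().sum()))
```

This avoids the explicit copy() call while still keeping the original DataFrame available for comparison.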

3. Assuming Column Exists

The code assumes the 'quantity' column exists in the DataFrame. If it doesn't, a KeyError will be raised.

Solution: Add a check for column existence:

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    if 'quantity' in products.columns:
        products['quantity'] = products['quantity'].fillna(0)
    return products

4. Not Handling Different Types of Missing Values

While fillna() handles NaN and None, it doesn't handle empty strings, whitespace, or other representations of missing data that might exist in real-world datasets.

Solution: Clean the data more thoroughly:

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    # Replace empty strings and whitespace with NaN first
    products['quantity'] = products['quantity'].replace(['', ' ', 'null'], pd.NA)
    # Then fill NaN values with 0
    products['quantity'] = products['quantity'].fillna(0)
    return products
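If the quantity column arrived as text, the approach above can leave it with mixed types. One possible alternative, sketched under that assumption, is pd.to_numeric with errors="coerce", which turns anything it cannot parse into NaN before filling. The sample values are made up for illustration.

```python
import pandas as pd

# Illustrative raw column: numbers stored as text, plus several "missing" markers
raw = pd.Series(["15", "", " ", "null", None, "8"], dtype=object)

# Unparseable entries become NaN, then all NaN values are filled with 0
cleaned = pd.to_numeric(raw, errors="coerce").fillna(0).astype(int)

print(cleaned.tolist())
```

Note that errors="coerce" also hides genuinely malformed values, so it may be worth counting how many entries were coerced before filling them.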

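5. Chained Assignment Warning

If products is itself a slice of another DataFrame (for example, the result of a filter), the direct assignment may trigger a SettingWithCopyWarning in pandas versions without Copy-on-Write enabled.

Solution: Take an explicit copy with .copy() when creating the slice, or use the loc accessor for explicit assignment: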

products.loc[:, 'quantity'] = products['quantity'].fillna(0)
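A minimal sketch of the explicit-copy approach, assuming the DataFrame being filled was produced by filtering a larger one. The names inventory and cheap, and the sample rows, are made up for illustration.

```python
import pandas as pd

inventory = pd.DataFrame({
    "name": ["Mouse", "Keyboard", "Monitor"],
    "quantity": [None, 8, None],
    "price": [25, 75, 350],
})

# An explicit copy makes the subset an independent DataFrame
cheap = inventory[inventory["price"] < 100].copy()

# Assigning to the copy is unambiguous and leaves inventory untouched
cheap["quantity"] = cheap["quantity"].fillna(0)

print(cheap["quantity"].tolist(), int(inventory["quantity"].isna().sum()))
```

Because cheap owns its data, the assignment cannot affect inventory, which still holds its two missing quantities.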