2887. Fill Missing Data


Problem Description

The task is to process a DataFrame named products which represents a collection of products with their names, quantities, and prices. The DataFrame has the following columns: name which is of type object to represent the product names, quantity which is an integer indicating how many of the products are available, and price which is also an integer signifying the cost of each product.

Our main goal is to identify rows in the quantity column where the quantity is missing (indicated by None) and then fill in these missing values with zeros (0). Filling the gaps makes the dataset consistent for subsequent processing, such as database insertion, data analysis, or any other operation that requires complete data.

As an example, consider that we have a DataFrame with products like Wristwatch and WirelessEarbuds having a missing quantity. After the operation, these missing values should be replaced by 0 without affecting any other columns or the other (non-missing) values in the quantity column.

Intuition

The solution to this problem leverages the functionality provided by the pandas library in Python, which is widely used for data manipulation and analysis. Specifically, it involves the use of the fillna() function, which is a convenient method to fill NA/NaN values in a DataFrame.

Here's the rationale:

  1. The fillna() function allows us to specify a value that replaces the missing or NaN (Not a Number) entries in a DataFrame or Series. Since we want to replace missing values in the quantity column with zero (0), this function is a suitable choice.

  2. This function is called on the specific column (quantity) of our DataFrame (products). We use products['quantity'] to isolate this column and then apply the fillna(0), with 0 being the value to substitute for missing entries.

  3. By default, fillna() does not modify the data in place. It returns a new Series with the missing values filled, so the result must be assigned back to products['quantity'] for the DataFrame to change.

  4. Finally, the modified DataFrame is then returned with the missing quantity values replaced by zeros.

The operation is straightforward and efficient, requiring only a single line of code to achieve the desired result.
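A quick way to see how fillna() treats the original data is to call it on a small Series and inspect both objects afterwards. The snippet below is a minimal sketch; the sample values and the variable names quantity and filled are illustrative assumptions, not part of the problem statement.

```python
import pandas as pd

# Illustrative column with two missing entries (sample values are assumed)
quantity = pd.Series([None, None, 20], name="quantity")

# fillna(0) returns a new Series; the original is left untouched
filled = quantity.fillna(0)

print(quantity.isna().sum())  # missing entries still present in the original
print(filled.isna().sum())    # no missing entries in the returned Series
```

Because the original Series still contains its missing entries after the call, the result has to be assigned back to the column for the DataFrame to reflect the change.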

Solution Approach

In our solution, the essential function is fillna(), a method available on both pandas DataFrames and Series. It fills NA/NaN values with a specified scalar, or with values supplied as a dict, Series, or DataFrame. The fill value for missing data in our case is 0. The choice of this function is driven by its simplicity and effectiveness in dealing with missing data in pandas.

Here is the breakdown of the approach used in the provided code snippet:

  1. The function fillMissingValues(products: pd.DataFrame) -> pd.DataFrame is defined to take a DataFrame as an input parameter and returns a DataFrame with the missing values filled in.

  2. Inside the function, we access the quantity column of the provided products DataFrame using products['quantity'].

  3. We then call fillna() on this column with the argument 0, which represents the value we want to use to replace the NaN (or None) values. The expression becomes products['quantity'].fillna(0).

  4. The fillna() function, by default, does not modify the existing DataFrame. Instead, it returns a new Series with the missing values filled. Therefore, we directly assign the result back to products['quantity'] to update that column with the filled in values.

  5. After the fillna() operation, the quantity column no longer has missing values; all such instances have been replaced with 0.

  6. The last step is to return the modified products DataFrame from the function, now with all the missing values in the quantity column filled with 0.

We do not use additional data structures, algorithms, or patterns as the problem can be effectively solved using the DataFrame and its methods provided by pandas. This approach is very efficient because it utilizes highly optimized pandas library functions designed specifically to handle such data manipulation tasks.
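Since fillna() also accepts a dict, an alternative is to call it on the whole DataFrame and name the column to fill. This is a sketch of that variant, not the solution described above; the sample data (including the missing price) is an assumption chosen to show that only the named column is affected.

```python
import pandas as pd

# Illustrative data: one missing quantity and one missing price (assumed values)
products = pd.DataFrame(
    {
        "name": ["Wristwatch", "Notebook"],
        "quantity": [None, 20],
        "price": [50, None],
    }
)

# The dict maps column names to fill values; columns not listed are left as-is
filled = products.fillna({"quantity": 0})

print(filled)
```

In this form the missing price stays missing, which matches the requirement that other columns remain unaffected.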

Example Walkthrough

Let's visualize how the solution approach will work with a small example. Suppose we have the following initial products DataFrame:

name             quantity  price
Wristwatch       None      50
WirelessEarbuds  None      150
Notebook         20        5

According to the problem description, we need to replace the None values in the quantity column with 0. By following the solution approach, this is what happens:

  1. We define the function fillMissingValues(products: pd.DataFrame) -> pd.DataFrame.

  2. Inside this function, we target the quantity column of products using products['quantity'].

  3. We use products['quantity'].fillna(0) which generates a new Series:

    quantity
    0
    0
    20

  4. This result is then assigned back to products['quantity'], effectively updating the original DataFrame.

  5. The function then returns the updated DataFrame which now looks like this:

name             quantity  price
Wristwatch       0         50
WirelessEarbuds  0         150
Notebook         20        5

Observing the final result, we can confirm that the missing values in the quantity column were successfully replaced with 0, achieving the objective of processing the data as required.
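The walkthrough can be reproduced end to end with the sketch below. The DataFrame is built from the example values above. One detail to be aware of: pandas typically stores a numeric column containing missing values as floating point, so the filled values would print as 0.0 and 20.0. The astype(int) step is an optional addition assumed here for display purposes and is not part of the original solution.

```python
import pandas as pd


def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    # fillna(0) returns a new Series; assign it back to update the column
    products["quantity"] = products["quantity"].fillna(0)
    return products


# Example data from the walkthrough
products = pd.DataFrame(
    {
        "name": ["Wristwatch", "WirelessEarbuds", "Notebook"],
        "quantity": [None, None, 20],
        "price": [50, 150, 5],
    }
)

result = fillMissingValues(products)

# Optional (an assumption, not in the original solution): restore an integer dtype
result["quantity"] = result["quantity"].astype(int)

print(result)
```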

Solution Implementation

import pandas as pd

# Define a function to fill missing values in the 'quantity' column of a DataFrame
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    # Replace NaN values in the 'quantity' column with 0
    products['quantity'] = products['quantity'].fillna(0)
    # Return the DataFrame after filling in missing values
    return products

import java.util.List;

// Package-private so that both classes can compile from a single file (ProductUtils.java)
class Product {
    private String name;
    private Integer quantity;

    // Constructor, getters and setters for product

    public Product(String name, Integer quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getQuantity() {
        return quantity;
    }

    public void setQuantity(Integer quantity) {
        this.quantity = quantity;
    }
}

public class ProductUtils {

    /**
     * Fills missing values in the 'quantity' field of a list of Product objects.
     *
     * @param products List of Product objects.
     * @return The same List of Product objects with 'quantity' missing values replaced with 0.
     */
    public static List<Product> fillMissingValues(List<Product> products) {
        for (Product product : products) {
            // If the quantity is null (equivalent to NaN in pandas), set it to 0
            if (product.getQuantity() == null) {
                product.setQuantity(0);
            }
        }
        // Return the list after filling in missing values
        return products;
    }
}

#include <vector>

// Define a struct to represent a product with a quantity attribute
struct Product {
    // You can add other product attributes here
    int quantity;

    // Constructor to initialize the product with a quantity
    Product(int qty): quantity(qty) {}
};

// Define a function to fill missing values in the 'quantity' field of a vector of Products
std::vector<Product> fillMissingValues(std::vector<Product> &products) {
    // Iterate over each Product in the vector by reference
    for (Product &p : products) {
        // Check if the quantity is marked as 'missing' using a negative value as the indicator
        if (p.quantity < 0) {
            // Replace the 'missing' value with 0
            p.quantity = 0;
        }
    }
    // Return the vector after filling in missing values
    return products;
}

// Import your custom types or interfaces that support DataFrame operations
import { DataFrame } from './path-to-your-dataframe-definitions';

// Define a function to fill missing values in the 'quantity' column of a DataFrame
function fillMissingValues(products: DataFrame): DataFrame {
    // Check if 'products' has a 'quantity' column and fill NaN values with 0
    if (products.quantity) {
        // Assuming 'replaceNaNWithZero' is a method provided by your DataFrame library
        products.quantity = products.quantity.replaceNaNWithZero(0);
    }
    // Return the DataFrame after filling in missing values
    return products;
}


Time and Space Complexity

The time complexity of the code can be considered as O(n), where n is the number of rows in the DataFrame products. This is because the fillna method needs to scan through the 'quantity' column and fill missing values with zeros.

The space complexity of the method is O(n). By default, fillna does not operate in place: it returns a new Series of the same length as the 'quantity' column, which is then assigned back to the DataFrame.

