Facebook Pixel

2890. Reshape Data Melt

Problem Description

You have a DataFrame called report that contains sales data for different products across four quarters. The data is currently in a "wide" format where each quarter is a separate column.

The DataFrame has the following structure:

  • product: The name of the product (object/string type)
  • quarter_1: Sales for the first quarter (integer)
  • quarter_2: Sales for the second quarter (integer)
  • quarter_3: Sales for the third quarter (integer)
  • quarter_4: Sales for the fourth quarter (integer)

Your task is to reshape this data from wide format to long format. In the reshaped data:

  • Each row should represent the sales of one product in one specific quarter
  • Instead of having four separate quarter columns, you should have:
    • A quarter column that contains the quarter name (e.g., "quarter_1", "quarter_2", etc.)
    • A sales column that contains the sales value for that product in that quarter

For example, if the original data has one row with a product and its sales across 4 quarters, the reshaped data would have 4 rows for that same product - one row for each quarter with its corresponding sales value.

The solution uses pd.melt() function which "unpivots" the DataFrame from wide to long format. The id_vars=['product'] parameter keeps the product column as an identifier, var_name='quarter' names the new column containing the original column names (quarter_1, quarter_2, etc.), and value_name='sales' names the new column containing the sales values.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight here is recognizing that we need to transform data from a "wide" format (where related data spreads across multiple columns) to a "long" format (where related data stacks vertically in rows). This is a common data reshaping pattern in data analysis.

Think of it like converting a wide table where you can see all quarters side-by-side for each product into a tall, narrow table where each product-quarter combination gets its own row. Instead of reading across columns to see all quarters for a product, you'd read down rows.

The natural approach in pandas for this transformation is the melt() function, which is specifically designed for "unpivoting" data. The name "melt" is quite descriptive - imagine the wide DataFrame "melting" down into a longer, narrower shape.

To arrive at this solution, we need to identify:

  1. What should stay constant in each row (the product name) - this becomes our id_vars
  2. What columns should be "melted" or unpivoted (all the quarter columns) - these are implicitly all columns not in id_vars
  3. What to call the new column that will hold the old column names (quarter) - this is our var_name
  4. What to call the new column that will hold the values from those columns (sales) - this is our value_name

The beauty of pd.melt() is that it handles this entire transformation in a single function call, converting each quarter column into rows while preserving the product association, effectively multiplying the number of rows by the number of quarter columns while reducing the total number of columns.

Solution Approach

The solution uses pandas' melt() function to reshape the DataFrame from wide to long format. Here's the step-by-step implementation:

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    return pd.melt(report, id_vars=['product'], var_name='quarter', value_name='sales')

The pd.melt() function takes several key parameters:

  1. report: The input DataFrame that needs to be reshaped.

  2. id_vars=['product']: This specifies which columns should be kept as identifier variables. These columns won't be "melted" and will be repeated for each unpivoted column. In our case, we want to keep the product column as is, so each product will appear multiple times in the result (once for each quarter).

  3. var_name='quarter': This parameter names the new column that will contain the original column headers. Since we're melting quarter_1, quarter_2, quarter_3, and quarter_4, this new column will contain these values as strings.

  4. value_name='sales': This parameter names the new column that will contain the actual values from the melted columns. All the sales numbers that were previously spread across four columns will now be stacked vertically in this single column.

The transformation process works as follows:

  • For each row in the original DataFrame, melt() creates 4 new rows (one for each quarter column)
  • The product value is copied to all 4 new rows
  • Each new row gets the quarter column name in the quarter column
  • Each new row gets the corresponding sales value in the sales column

If the original DataFrame had N products, the resulting DataFrame will have N × 4 rows, transforming from a wide format with 5 columns to a long format with 3 columns.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate how the melt() function transforms our data.

Starting DataFrame (Wide Format):

| product | quarter_1 | quarter_2 | quarter_3 | quarter_4 |
|---------|-----------|-----------|-----------|-----------|
| Umbrella| 417       | 224       | 379       | 611       |
| SleepingBag| 800    | 936       | 93        | 875       |

We have 2 products with their sales across 4 quarters, giving us 2 rows × 5 columns.

Step 1: Apply pd.melt() When we call pd.melt(report, id_vars=['product'], var_name='quarter', value_name='sales'):

Step 2: Process First Product (Umbrella) The function takes the Umbrella row and creates 4 new rows:

  • Row 1: product='Umbrella', quarter='quarter_1', sales=417
  • Row 2: product='Umbrella', quarter='quarter_2', sales=224
  • Row 3: product='Umbrella', quarter='quarter_3', sales=379
  • Row 4: product='Umbrella', quarter='quarter_4', sales=611

Step 3: Process Second Product (SleepingBag) Similarly, for SleepingBag, it creates 4 more rows:

  • Row 5: product='SleepingBag', quarter='quarter_1', sales=800
  • Row 6: product='SleepingBag', quarter='quarter_2', sales=936
  • Row 7: product='SleepingBag', quarter='quarter_3', sales=93
  • Row 8: product='SleepingBag', quarter='quarter_4', sales=875

Final DataFrame (Long Format):

| product     | quarter   | sales |
|-------------|-----------|-------|
| Umbrella    | quarter_1 | 417   |
| Umbrella    | quarter_2 | 224   |
| Umbrella    | quarter_3 | 379   |
| Umbrella    | quarter_4 | 611   |
| SleepingBag | quarter_1 | 800   |
| SleepingBag | quarter_2 | 936   |
| SleepingBag | quarter_3 | 93    |
| SleepingBag | quarter_4 | 875   |

The transformation converted our 2 rows × 5 columns wide format into 8 rows × 3 columns long format. Each product-quarter combination now has its own row, making the data structure more suitable for certain types of analysis and visualization.

Solution Implementation

1import pandas as pd
2
3
4def meltTable(report: pd.DataFrame) -> pd.DataFrame:
5    """
6    Reshape a DataFrame from wide format to long format.
7  
8    Transforms a DataFrame where each quarter is a separate column
9    into a format where quarter becomes a variable column and sales
10    values are in a single column.
11  
12    Args:
13        report: Input DataFrame with 'product' column and quarter columns
14      
15    Returns:
16        DataFrame in long format with columns: product, quarter, sales
17    """
18    # Melt the DataFrame to convert quarter columns into rows
19    # Keep 'product' as identifier column
20    # Name the variable column as 'quarter' (contains column names)
21    # Name the value column as 'sales' (contains the actual sales values)
22    melted_df = pd.melt(
23        frame=report,
24        id_vars=['product'],
25        var_name='quarter',
26        value_name='sales'
27    )
28  
29    return melted_df
30
1import java.util.*;
2import java.util.stream.*;
3
4public class Solution {
5    /**
6     * Reshape a DataFrame from wide format to long format.
7     * 
8     * Transforms a DataFrame where each quarter is a separate column
9     * into a format where quarter becomes a variable column and sales
10     * values are in a single column.
11     * 
12     * @param report Input DataFrame with 'product' column and quarter columns
13     * @return DataFrame in long format with columns: product, quarter, sales
14     */
15    public static DataFrame meltTable(DataFrame report) {
16        // Get all column names from the report DataFrame
17        List<String> columns = report.getColumnNames();
18      
19        // Separate the 'product' column from quarter columns
20        List<String> quarterColumns = columns.stream()
21            .filter(col -> !col.equals("product"))
22            .collect(Collectors.toList());
23      
24        // Create lists to store the melted data
25        List<String> products = new ArrayList<>();
26        List<String> quarters = new ArrayList<>();
27        List<Integer> sales = new ArrayList<>();
28      
29        // Iterate through each row in the original DataFrame
30        for (int rowIndex = 0; rowIndex < report.getRowCount(); rowIndex++) {
31            // Get the product value for current row
32            String product = (String) report.getValue(rowIndex, "product");
33          
34            // For each quarter column, create a new row in melted format
35            for (String quarterColumn : quarterColumns) {
36                // Add product value
37                products.add(product);
38              
39                // Add quarter name (column name becomes a value)
40                quarters.add(quarterColumn);
41              
42                // Add sales value from the corresponding cell
43                Integer saleValue = (Integer) report.getValue(rowIndex, quarterColumn);
44                sales.add(saleValue);
45            }
46        }
47      
48        // Create new DataFrame with melted structure
49        DataFrame meltedDf = new DataFrame();
50      
51        // Add columns to the melted DataFrame
52        meltedDf.addColumn("product", products);
53        meltedDf.addColumn("quarter", quarters);
54        meltedDf.addColumn("sales", sales);
55      
56        return meltedDf;
57    }
58}
59
1#include <iostream>
2#include <vector>
3#include <string>
4#include <map>
5#include <algorithm>
6
7// Structure to represent a row in the original wide format
8struct WideFormatRow {
9    std::string product;
10    std::map<std::string, int> quarterSales;  // Maps quarter name to sales value
11};
12
13// Structure to represent a row in the melted long format
14struct LongFormatRow {
15    std::string product;
16    std::string quarter;
17    int sales;
18};
19
20// Class to represent a DataFrame-like structure
21class DataFrame {
22public:
23    std::vector<LongFormatRow> data;
24  
25    // Constructor for creating DataFrame from long format data
26    DataFrame(const std::vector<LongFormatRow>& rows) : data(rows) {}
27  
28    // Default constructor
29    DataFrame() {}
30  
31    // Method to add a row to the DataFrame
32    void addRow(const LongFormatRow& row) {
33        data.push_back(row);
34    }
35  
36    // Method to display the DataFrame (for debugging)
37    void display() const {
38        std::cout << "product\tquarter\tsales\n";
39        for (const auto& row : data) {
40            std::cout << row.product << "\t" << row.quarter << "\t" << row.sales << "\n";
41        }
42    }
43};
44
45/**
46 * Reshape a DataFrame from wide format to long format.
47 * 
48 * Transforms a DataFrame where each quarter is a separate column
49 * into a format where quarter becomes a variable column and sales
50 * values are in a single column.
51 * 
52 * @param report: Input data in wide format with 'product' column and quarter columns
53 * @return DataFrame in long format with columns: product, quarter, sales
54 */
55DataFrame meltTable(const std::vector<WideFormatRow>& report) {
56    DataFrame melted_df;
57  
58    // Iterate through each row in the wide format report
59    for (const auto& wideRow : report) {
60        // For each quarter-sales pair in the current row
61        for (const auto& [quarter, sales] : wideRow.quarterSales) {
62            // Create a new row in long format
63            LongFormatRow longRow;
64          
65            // Keep 'product' as identifier column
66            longRow.product = wideRow.product;
67          
68            // Name the variable column as 'quarter' (contains column names)
69            longRow.quarter = quarter;
70          
71            // Name the value column as 'sales' (contains the actual sales values)
72            longRow.sales = sales;
73          
74            // Add the transformed row to the melted DataFrame
75            melted_df.addRow(longRow);
76        }
77    }
78  
79    return melted_df;
80}
81
82// Example usage and test function
83int main() {
84    // Create sample data in wide format
85    std::vector<WideFormatRow> report = {
86        {"Umbrella", {{"quarter_1", 417}, {"quarter_2", 224}, {"quarter_3", 379}, {"quarter_4", 611}}},
87        {"SleepingBag", {{"quarter_1", 800}, {"quarter_2", 936}, {"quarter_3", 93}, {"quarter_4", 875}}}
88    };
89  
90    // Melt the DataFrame to convert quarter columns into rows
91    DataFrame melted = meltTable(report);
92  
93    // Display the result
94    melted.display();
95  
96    return 0;
97}
98
1import * as pd from 'pandas-js';
2
3/**
4 * Reshape a DataFrame from wide format to long format.
5 * 
6 * Transforms a DataFrame where each quarter is a separate column
7 * into a format where quarter becomes a variable column and sales
8 * values are in a single column.
9 * 
10 * @param report - Input DataFrame with 'product' column and quarter columns
11 * @returns DataFrame in long format with columns: product, quarter, sales
12 */
13function meltTable(report: pd.DataFrame): pd.DataFrame {
14    // Melt the DataFrame to convert quarter columns into rows
15    // Keep 'product' as identifier column
16    // Name the variable column as 'quarter' (contains column names)
17    // Name the value column as 'sales' (contains the actual sales values)
18    const meltedDf: pd.DataFrame = pd.melt({
19        frame: report,
20        idVars: ['product'],
21        varName: 'quarter',
22        valueName: 'sales'
23    });
24  
25    return meltedDf;
26}
27

Time and Space Complexity

Time Complexity: O(n * m) where n is the number of rows in the DataFrame and m is the number of columns being melted (total columns minus id_vars columns).

The pd.melt() function needs to iterate through each row and transform each value column into a separate row in the resulting DataFrame. For each of the n rows, it processes m value columns (where m = total_columns - 1 since 'product' is the id_var), creating n * m new rows in the output DataFrame.

Space Complexity: O(n * m) where n is the number of rows in the original DataFrame and m is the number of columns being melted.

The space complexity is dominated by the output DataFrame created by pd.melt(). The resulting DataFrame will have n * m rows and 3 columns ('product', 'quarter', 'sales'). Since each original row with m value columns gets transformed into m rows in the output, the total space required is proportional to O(n * m * 3) = O(n * m).

Additionally, there may be temporary space used during the melting operation, but this doesn't change the overall space complexity class.

Common Pitfalls

1. Assuming Column Names Match Exactly

One common pitfall is assuming that the quarter columns will always be named exactly quarter_1, quarter_2, quarter_3, and quarter_4. If the DataFrame has different column names (like Q1, Q2, or q1_sales), the current solution will attempt to melt ALL columns except 'product', which might include unwanted columns.

Solution: Explicitly specify which columns to melt using the value_vars parameter:

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    quarter_cols = ['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4']
    return pd.melt(report, 
                   id_vars=['product'], 
                   value_vars=quarter_cols,
                   var_name='quarter', 
                   value_name='sales')

2. Missing or NaN Values in Sales Data

The original wide-format data might contain NaN or missing values in some quarter columns. After melting, these NaN values will create rows with missing sales data, potentially affecting downstream analysis.

Solution: Add a dropna() call to remove rows with missing sales values:

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    melted_df = pd.melt(report, 
                        id_vars=['product'], 
                        var_name='quarter', 
                        value_name='sales')
    return melted_df.dropna(subset=['sales'])

3. Data Type Inconsistencies

If the quarter columns contain mixed data types (some integers, some strings, or some floats), the melted 'sales' column might end up with an unexpected dtype (like object), which can cause issues in numerical operations.

Solution: Ensure consistent data types after melting:

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    melted_df = pd.melt(report, 
                        id_vars=['product'], 
                        var_name='quarter', 
                        value_name='sales')
    melted_df['sales'] = pd.to_numeric(melted_df['sales'], errors='coerce')
    return melted_df

4. Multiple Identifier Columns

If the DataFrame has additional identifier columns beyond 'product' (like 'region', 'store_id', etc.) that should be preserved in the long format, forgetting to include them in id_vars will cause data loss.

Solution: Dynamically identify all non-quarter columns as identifiers:

def meltTable(report: pd.DataFrame) -> pd.DataFrame:
    quarter_cols = ['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4']
    id_columns = [col for col in report.columns if col not in quarter_cols]
    return pd.melt(report, 
                   id_vars=id_columns,
                   value_vars=quarter_cols,
                   var_name='quarter', 
                   value_name='sales')
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What does the following code do?

1def f(arr1, arr2):
2  i, j = 0, 0
3  new_arr = []
4  while i < len(arr1) and j < len(arr2):
5      if arr1[i] < arr2[j]:
6          new_arr.append(arr1[i])
7          i += 1
8      else:
9          new_arr.append(arr2[j])
10          j += 1
11  new_arr.extend(arr1[i:])
12  new_arr.extend(arr2[j:])
13  return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2  int i = 0, j = 0;
3  List<Integer> newArr = new ArrayList<>();
4
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.add(arr1[i]);
8          i++;
9      } else {
10          newArr.add(arr2[j]);
11          j++;
12      }
13  }
14
15  while (i < arr1.length) {
16      newArr.add(arr1[i]);
17      i++;
18  }
19
20  while (j < arr2.length) {
21      newArr.add(arr2[j]);
22      j++;
23  }
24
25  return newArr;
26}
27
1function f(arr1, arr2) {
2  let i = 0, j = 0;
3  let newArr = [];
4  
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.push(arr1[i]);
8          i++;
9      } else {
10          newArr.push(arr2[j]);
11          j++;
12      }
13  }
14  
15  while (i < arr1.length) {
16      newArr.push(arr1[i]);
17      i++;
18  }
19  
20  while (j < arr2.length) {
21      newArr.push(arr2[j]);
22      j++;
23  }
24  
25  return newArr;
26}
27

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More