2890. Reshape Data Melt
Problem Description
You have a DataFrame called report
that contains sales data for different products across four quarters. The data is currently in a "wide" format where each quarter is a separate column.
The DataFrame has the following structure:
product
: The name of the product (object/string type)quarter_1
: Sales for the first quarter (integer)quarter_2
: Sales for the second quarter (integer)quarter_3
: Sales for the third quarter (integer)quarter_4
: Sales for the fourth quarter (integer)
Your task is to reshape this data from wide format to long format. In the reshaped data:
- Each row should represent the sales of one product in one specific quarter
- Instead of having four separate quarter columns, you should have:
- A
quarter
column that contains the quarter name (e.g., "quarter_1", "quarter_2", etc.) - A
sales
column that contains the sales value for that product in that quarter
- A
For example, if the original data has one row with a product and its sales across 4 quarters, the reshaped data would have 4 rows for that same product - one row for each quarter with its corresponding sales value.
The solution uses pd.melt()
function which "unpivots" the DataFrame from wide to long format. The id_vars=['product']
parameter keeps the product column as an identifier, var_name='quarter'
names the new column containing the original column names (quarter_1, quarter_2, etc.), and value_name='sales'
names the new column containing the sales values.
Intuition
The key insight here is recognizing that we need to transform data from a "wide" format (where related data spreads across multiple columns) to a "long" format (where related data stacks vertically in rows). This is a common data reshaping pattern in data analysis.
Think of it like converting a wide table where you can see all quarters side-by-side for each product into a tall, narrow table where each product-quarter combination gets its own row. Instead of reading across columns to see all quarters for a product, you'd read down rows.
The natural approach in pandas for this transformation is the melt()
function, which is specifically designed for "unpivoting" data. The name "melt" is quite descriptive - imagine the wide DataFrame "melting" down into a longer, narrower shape.
To arrive at this solution, we need to identify:
- What should stay constant in each row (the product name) - this becomes our
id_vars
- What columns should be "melted" or unpivoted (all the quarter columns) - these are implicitly all columns not in
id_vars
- What to call the new column that will hold the old column names (
quarter
) - this is ourvar_name
- What to call the new column that will hold the values from those columns (
sales
) - this is ourvalue_name
The beauty of pd.melt()
is that it handles this entire transformation in a single function call, converting each quarter column into rows while preserving the product association, effectively multiplying the number of rows by the number of quarter columns while reducing the total number of columns.
Solution Approach
The solution uses pandas' melt()
function to reshape the DataFrame from wide to long format. Here's the step-by-step implementation:
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
return pd.melt(report, id_vars=['product'], var_name='quarter', value_name='sales')
The pd.melt()
function takes several key parameters:
-
report
: The input DataFrame that needs to be reshaped. -
id_vars=['product']
: This specifies which columns should be kept as identifier variables. These columns won't be "melted" and will be repeated for each unpivoted column. In our case, we want to keep theproduct
column as is, so each product will appear multiple times in the result (once for each quarter). -
var_name='quarter'
: This parameter names the new column that will contain the original column headers. Since we're meltingquarter_1
,quarter_2
,quarter_3
, andquarter_4
, this new column will contain these values as strings. -
value_name='sales'
: This parameter names the new column that will contain the actual values from the melted columns. All the sales numbers that were previously spread across four columns will now be stacked vertically in this single column.
The transformation process works as follows:
- For each row in the original DataFrame,
melt()
creates 4 new rows (one for each quarter column) - The
product
value is copied to all 4 new rows - Each new row gets the quarter column name in the
quarter
column - Each new row gets the corresponding sales value in the
sales
column
If the original DataFrame had N products, the resulting DataFrame will have N × 4 rows, transforming from a wide format with 5 columns to a long format with 3 columns.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate how the melt()
function transforms our data.
Starting DataFrame (Wide Format):
| product | quarter_1 | quarter_2 | quarter_3 | quarter_4 | |---------|-----------|-----------|-----------|-----------| | Umbrella| 417 | 224 | 379 | 611 | | SleepingBag| 800 | 936 | 93 | 875 |
We have 2 products with their sales across 4 quarters, giving us 2 rows × 5 columns.
Step 1: Apply pd.melt()
When we call pd.melt(report, id_vars=['product'], var_name='quarter', value_name='sales')
:
Step 2: Process First Product (Umbrella) The function takes the Umbrella row and creates 4 new rows:
- Row 1:
product='Umbrella'
,quarter='quarter_1'
,sales=417
- Row 2:
product='Umbrella'
,quarter='quarter_2'
,sales=224
- Row 3:
product='Umbrella'
,quarter='quarter_3'
,sales=379
- Row 4:
product='Umbrella'
,quarter='quarter_4'
,sales=611
Step 3: Process Second Product (SleepingBag) Similarly, for SleepingBag, it creates 4 more rows:
- Row 5:
product='SleepingBag'
,quarter='quarter_1'
,sales=800
- Row 6:
product='SleepingBag'
,quarter='quarter_2'
,sales=936
- Row 7:
product='SleepingBag'
,quarter='quarter_3'
,sales=93
- Row 8:
product='SleepingBag'
,quarter='quarter_4'
,sales=875
Final DataFrame (Long Format):
| product | quarter | sales | |-------------|-----------|-------| | Umbrella | quarter_1 | 417 | | Umbrella | quarter_2 | 224 | | Umbrella | quarter_3 | 379 | | Umbrella | quarter_4 | 611 | | SleepingBag | quarter_1 | 800 | | SleepingBag | quarter_2 | 936 | | SleepingBag | quarter_3 | 93 | | SleepingBag | quarter_4 | 875 |
The transformation converted our 2 rows × 5 columns wide format into 8 rows × 3 columns long format. Each product-quarter combination now has its own row, making the data structure more suitable for certain types of analysis and visualization.
Solution Implementation
1import pandas as pd
2
3
4def meltTable(report: pd.DataFrame) -> pd.DataFrame:
5 """
6 Reshape a DataFrame from wide format to long format.
7
8 Transforms a DataFrame where each quarter is a separate column
9 into a format where quarter becomes a variable column and sales
10 values are in a single column.
11
12 Args:
13 report: Input DataFrame with 'product' column and quarter columns
14
15 Returns:
16 DataFrame in long format with columns: product, quarter, sales
17 """
18 # Melt the DataFrame to convert quarter columns into rows
19 # Keep 'product' as identifier column
20 # Name the variable column as 'quarter' (contains column names)
21 # Name the value column as 'sales' (contains the actual sales values)
22 melted_df = pd.melt(
23 frame=report,
24 id_vars=['product'],
25 var_name='quarter',
26 value_name='sales'
27 )
28
29 return melted_df
30
1import java.util.*;
2import java.util.stream.*;
3
4public class Solution {
5 /**
6 * Reshape a DataFrame from wide format to long format.
7 *
8 * Transforms a DataFrame where each quarter is a separate column
9 * into a format where quarter becomes a variable column and sales
10 * values are in a single column.
11 *
12 * @param report Input DataFrame with 'product' column and quarter columns
13 * @return DataFrame in long format with columns: product, quarter, sales
14 */
15 public static DataFrame meltTable(DataFrame report) {
16 // Get all column names from the report DataFrame
17 List<String> columns = report.getColumnNames();
18
19 // Separate the 'product' column from quarter columns
20 List<String> quarterColumns = columns.stream()
21 .filter(col -> !col.equals("product"))
22 .collect(Collectors.toList());
23
24 // Create lists to store the melted data
25 List<String> products = new ArrayList<>();
26 List<String> quarters = new ArrayList<>();
27 List<Integer> sales = new ArrayList<>();
28
29 // Iterate through each row in the original DataFrame
30 for (int rowIndex = 0; rowIndex < report.getRowCount(); rowIndex++) {
31 // Get the product value for current row
32 String product = (String) report.getValue(rowIndex, "product");
33
34 // For each quarter column, create a new row in melted format
35 for (String quarterColumn : quarterColumns) {
36 // Add product value
37 products.add(product);
38
39 // Add quarter name (column name becomes a value)
40 quarters.add(quarterColumn);
41
42 // Add sales value from the corresponding cell
43 Integer saleValue = (Integer) report.getValue(rowIndex, quarterColumn);
44 sales.add(saleValue);
45 }
46 }
47
48 // Create new DataFrame with melted structure
49 DataFrame meltedDf = new DataFrame();
50
51 // Add columns to the melted DataFrame
52 meltedDf.addColumn("product", products);
53 meltedDf.addColumn("quarter", quarters);
54 meltedDf.addColumn("sales", sales);
55
56 return meltedDf;
57 }
58}
59
1#include <iostream>
2#include <vector>
3#include <string>
4#include <map>
5#include <algorithm>
6
7// Structure to represent a row in the original wide format
8struct WideFormatRow {
9 std::string product;
10 std::map<std::string, int> quarterSales; // Maps quarter name to sales value
11};
12
13// Structure to represent a row in the melted long format
14struct LongFormatRow {
15 std::string product;
16 std::string quarter;
17 int sales;
18};
19
20// Class to represent a DataFrame-like structure
21class DataFrame {
22public:
23 std::vector<LongFormatRow> data;
24
25 // Constructor for creating DataFrame from long format data
26 DataFrame(const std::vector<LongFormatRow>& rows) : data(rows) {}
27
28 // Default constructor
29 DataFrame() {}
30
31 // Method to add a row to the DataFrame
32 void addRow(const LongFormatRow& row) {
33 data.push_back(row);
34 }
35
36 // Method to display the DataFrame (for debugging)
37 void display() const {
38 std::cout << "product\tquarter\tsales\n";
39 for (const auto& row : data) {
40 std::cout << row.product << "\t" << row.quarter << "\t" << row.sales << "\n";
41 }
42 }
43};
44
45/**
46 * Reshape a DataFrame from wide format to long format.
47 *
48 * Transforms a DataFrame where each quarter is a separate column
49 * into a format where quarter becomes a variable column and sales
50 * values are in a single column.
51 *
52 * @param report: Input data in wide format with 'product' column and quarter columns
53 * @return DataFrame in long format with columns: product, quarter, sales
54 */
55DataFrame meltTable(const std::vector<WideFormatRow>& report) {
56 DataFrame melted_df;
57
58 // Iterate through each row in the wide format report
59 for (const auto& wideRow : report) {
60 // For each quarter-sales pair in the current row
61 for (const auto& [quarter, sales] : wideRow.quarterSales) {
62 // Create a new row in long format
63 LongFormatRow longRow;
64
65 // Keep 'product' as identifier column
66 longRow.product = wideRow.product;
67
68 // Name the variable column as 'quarter' (contains column names)
69 longRow.quarter = quarter;
70
71 // Name the value column as 'sales' (contains the actual sales values)
72 longRow.sales = sales;
73
74 // Add the transformed row to the melted DataFrame
75 melted_df.addRow(longRow);
76 }
77 }
78
79 return melted_df;
80}
81
82// Example usage and test function
83int main() {
84 // Create sample data in wide format
85 std::vector<WideFormatRow> report = {
86 {"Umbrella", {{"quarter_1", 417}, {"quarter_2", 224}, {"quarter_3", 379}, {"quarter_4", 611}}},
87 {"SleepingBag", {{"quarter_1", 800}, {"quarter_2", 936}, {"quarter_3", 93}, {"quarter_4", 875}}}
88 };
89
90 // Melt the DataFrame to convert quarter columns into rows
91 DataFrame melted = meltTable(report);
92
93 // Display the result
94 melted.display();
95
96 return 0;
97}
98
1import * as pd from 'pandas-js';
2
3/**
4 * Reshape a DataFrame from wide format to long format.
5 *
6 * Transforms a DataFrame where each quarter is a separate column
7 * into a format where quarter becomes a variable column and sales
8 * values are in a single column.
9 *
10 * @param report - Input DataFrame with 'product' column and quarter columns
11 * @returns DataFrame in long format with columns: product, quarter, sales
12 */
13function meltTable(report: pd.DataFrame): pd.DataFrame {
14 // Melt the DataFrame to convert quarter columns into rows
15 // Keep 'product' as identifier column
16 // Name the variable column as 'quarter' (contains column names)
17 // Name the value column as 'sales' (contains the actual sales values)
18 const meltedDf: pd.DataFrame = pd.melt({
19 frame: report,
20 idVars: ['product'],
21 varName: 'quarter',
22 valueName: 'sales'
23 });
24
25 return meltedDf;
26}
27
Time and Space Complexity
Time Complexity: O(n * m)
where n
is the number of rows in the DataFrame and m
is the number of columns being melted (total columns minus id_vars columns).
The pd.melt()
function needs to iterate through each row and transform each value column into a separate row in the resulting DataFrame. For each of the n
rows, it processes m
value columns (where m = total_columns - 1
since 'product' is the id_var), creating n * m
new rows in the output DataFrame.
Space Complexity: O(n * m)
where n
is the number of rows in the original DataFrame and m
is the number of columns being melted.
The space complexity is dominated by the output DataFrame created by pd.melt()
. The resulting DataFrame will have n * m
rows and 3 columns ('product', 'quarter', 'sales'). Since each original row with m
value columns gets transformed into m
rows in the output, the total space required is proportional to O(n * m * 3) = O(n * m)
.
Additionally, there may be temporary space used during the melting operation, but this doesn't change the overall space complexity class.
Common Pitfalls
1. Assuming Column Names Match Exactly
One common pitfall is assuming that the quarter columns will always be named exactly quarter_1
, quarter_2
, quarter_3
, and quarter_4
. If the DataFrame has different column names (like Q1
, Q2
, or q1_sales
), the current solution will attempt to melt ALL columns except 'product', which might include unwanted columns.
Solution:
Explicitly specify which columns to melt using the value_vars
parameter:
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
quarter_cols = ['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4']
return pd.melt(report,
id_vars=['product'],
value_vars=quarter_cols,
var_name='quarter',
value_name='sales')
2. Missing or NaN Values in Sales Data
The original wide-format data might contain NaN or missing values in some quarter columns. After melting, these NaN values will create rows with missing sales data, potentially affecting downstream analysis.
Solution: Add a dropna() call to remove rows with missing sales values:
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
melted_df = pd.melt(report,
id_vars=['product'],
var_name='quarter',
value_name='sales')
return melted_df.dropna(subset=['sales'])
3. Data Type Inconsistencies
If the quarter columns contain mixed data types (some integers, some strings, or some floats), the melted 'sales' column might end up with an unexpected dtype (like object), which can cause issues in numerical operations.
Solution: Ensure consistent data types after melting:
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
melted_df = pd.melt(report,
id_vars=['product'],
var_name='quarter',
value_name='sales')
melted_df['sales'] = pd.to_numeric(melted_df['sales'], errors='coerce')
return melted_df
4. Multiple Identifier Columns
If the DataFrame has additional identifier columns beyond 'product' (like 'region', 'store_id', etc.) that should be preserved in the long format, forgetting to include them in id_vars
will cause data loss.
Solution: Dynamically identify all non-quarter columns as identifiers:
def meltTable(report: pd.DataFrame) -> pd.DataFrame:
quarter_cols = ['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4']
id_columns = [col for col in report.columns if col not in quarter_cols]
return pd.melt(report,
id_vars=id_columns,
value_vars=quarter_cols,
var_name='quarter',
value_name='sales')
What does the following code do?
1def f(arr1, arr2):
2 i, j = 0, 0
3 new_arr = []
4 while i < len(arr1) and j < len(arr2):
5 if arr1[i] < arr2[j]:
6 new_arr.append(arr1[i])
7 i += 1
8 else:
9 new_arr.append(arr2[j])
10 j += 1
11 new_arr.extend(arr1[i:])
12 new_arr.extend(arr2[j:])
13 return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2 int i = 0, j = 0;
3 List<Integer> newArr = new ArrayList<>();
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.add(arr1[i]);
8 i++;
9 } else {
10 newArr.add(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.add(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.add(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
1function f(arr1, arr2) {
2 let i = 0, j = 0;
3 let newArr = [];
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.push(arr1[i]);
8 i++;
9 } else {
10 newArr.push(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.push(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.push(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!