2889. Reshape Data Pivot
Problem Description
This problem asks you to transform a DataFrame from a long format to a wide format using a pivot operation.
You're given a DataFrame called weather
with three columns:
city
: the name of a city (string/object type)month
: the name of a month (string/object type)temperature
: the temperature value (integer)
The initial data is in "long format" where each row represents a single temperature measurement for one city in one month.
Your task is to pivot this data so that:
- Each row represents a specific month
- Each city becomes its own column
- The temperature values fill the cells where a month and city intersect
For example, if the original data has a row with Jacksonville, January, 13
, after pivoting:
- There will be a row for
January
- There will be a column for
Jacksonville
- The cell at the intersection of the
January
row andJacksonville
column will contain13
The resulting DataFrame should have:
- The
month
as the index (row labels) - City names as column headers
- Temperature values in the cells
- Rows sorted alphabetically by month name
This is a common data reshaping operation that makes it easier to compare temperatures across different cities for the same months, as each month's data for all cities appears in a single row.
Intuition
When we look at the transformation needed, we're essentially reorganizing data from a "tall" structure to a "wide" structure. Each piece of information (city, month, temperature) needs to be repositioned based on its role in the new layout.
The key insight is recognizing that this is a classic pivot operation where:
- We want months to become our row identifiers
- We want cities to spread out and become column headers
- We want temperatures to fill in as the values
Think of it like creating a spreadsheet or table where you want to quickly compare values across different categories. Instead of scanning through multiple rows to find all temperatures for a given month, we want them all in one row.
The pandas pivot()
function is designed exactly for this type of reshaping. We need to tell it three things:
- What should become the rows? → The months should be our index (rows)
- What should become the columns? → The cities should spread out as columns
- What values should fill the table? → The temperature values should populate the cells
This maps directly to the parameters of pivot()
:
index='month'
- makes each unique month a rowcolumns='city'
- makes each unique city a columnvalues='temperature'
- fills the resulting table with temperature values
The function automatically handles the matching - it knows to put the temperature of Jacksonville in January at the intersection of the January row and Jacksonville column. The result is a more readable format where you can easily compare temperatures across cities for any given month by looking at a single row.
Solution Approach
The solution uses pandas' built-in pivot()
method to reshape the DataFrame. Here's how the implementation works:
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
return weather.pivot(index='month', columns='city', values='temperature')
The pivot()
method takes three key parameters:
-
index='month'
: This parameter specifies which column should become the new index (row labels) of the pivoted DataFrame. Each unique value in the 'month' column becomes a row in the output. -
columns='city'
: This parameter determines which column's unique values should become the new column headers. Each unique city name becomes a column in the output. -
values='temperature'
: This parameter specifies which column contains the values that will populate the cells of the pivoted table. The temperature values are placed at the appropriate row-column intersections.
The algorithm internally:
- Groups the data by the unique combinations of index and column values
- Creates a new DataFrame structure with months as rows and cities as columns
- Maps each temperature value to its corresponding (month, city) position
- Automatically handles the sorting of the index alphabetically
For the given example:
- Input row:
(Jacksonville, January, 13)
- Output position: Row labeled 'January', Column labeled 'Jacksonville', Value = 13
The method handles all the reshaping automatically without needing explicit loops or manual mapping. The resulting DataFrame has a more intuitive structure for cross-city temperature comparisons, with each row representing all cities' temperatures for a particular month.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate how the pivot operation transforms the data.
Starting DataFrame (Long Format):
city month temperature 0 Boston January 0 1 Boston February 8 2 Austin January 20 3 Austin February 24
Step 1: Identify the pivot components
- Index (rows):
month
column → unique values are "January" and "February" - Columns:
city
column → unique values are "Boston" and "Austin" - Values:
temperature
column → these numbers will fill our new table
Step 2: Create the new structure The pivot operation creates a grid where:
- Rows are labeled with months
- Columns are labeled with cities
- Each cell needs to be filled with the appropriate temperature
Step 3: Map values to their positions
- Row "January" + Column "Boston" → temperature = 0
- Row "January" + Column "Austin" → temperature = 20
- Row "February" + Column "Boston" → temperature = 8
- Row "February" + Column "Austin" → temperature = 24
Step 4: Execute the pivot
result = weather.pivot(index='month', columns='city', values='temperature')
Final DataFrame (Wide Format):
city Austin Boston month February 24 8 January 20 0
Notice how:
- Each month now occupies exactly one row
- Each city has its own column
- Temperature values are placed at the correct intersections
- The months are automatically sorted alphabetically ("February" before "January")
- We can now easily compare temperatures across cities for the same month by reading horizontally
This transformation makes it much easier to answer questions like "What were all cities' temperatures in January?" - just look at the January row instead of filtering through multiple rows in the original format.
Solution Implementation
1import pandas as pd
2
3
4def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
5 """
6 Pivot the weather DataFrame to reshape it from long format to wide format.
7
8 Args:
9 weather: DataFrame containing weather data with columns 'month', 'city', and 'temperature'
10
11 Returns:
12 DataFrame with months as index, cities as columns, and temperature values in cells
13 """
14 # Reshape the DataFrame using pivot method
15 # - index='month': Each unique month becomes a row
16 # - columns='city': Each unique city becomes a column
17 # - values='temperature': Temperature values fill the cells
18 return weather.pivot(index='month', columns='city', values='temperature')
19
1import java.util.*;
2
3public class Solution {
4 /**
5 * Pivot the weather data to reshape it from long format to wide format.
6 *
7 * @param weather List of weather records containing month, city, and temperature data
8 * @return A pivoted data structure with months as rows, cities as columns, and temperatures as values
9 */
10 public static Map<String, Map<String, Double>> pivotTable(List<WeatherRecord> weather) {
11 // Create a nested map structure to represent the pivot table
12 // Outer map: month -> inner map
13 // Inner map: city -> temperature
14 Map<String, Map<String, Double>> pivotedData = new TreeMap<>();
15
16 // Collect all unique cities for column headers
17 Set<String> cities = new TreeSet<>();
18
19 // Process each weather record
20 for (WeatherRecord record : weather) {
21 String month = record.getMonth();
22 String city = record.getCity();
23 Double temperature = record.getTemperature();
24
25 // Add city to the set of unique cities
26 cities.add(city);
27
28 // Initialize inner map for this month if not exists
29 if (!pivotedData.containsKey(month)) {
30 pivotedData.put(month, new TreeMap<>());
31 }
32
33 // Add temperature value for this month-city combination
34 pivotedData.get(month).put(city, temperature);
35 }
36
37 // Ensure all months have entries for all cities (fill with null if missing)
38 for (String month : pivotedData.keySet()) {
39 Map<String, Double> cityTemperatures = pivotedData.get(month);
40 for (String city : cities) {
41 if (!cityTemperatures.containsKey(city)) {
42 cityTemperatures.put(city, null);
43 }
44 }
45 }
46
47 return pivotedData;
48 }
49
50 /**
51 * Helper class to represent a weather record
52 */
53 static class WeatherRecord {
54 private String month;
55 private String city;
56 private Double temperature;
57
58 public WeatherRecord(String month, String city, Double temperature) {
59 this.month = month;
60 this.city = city;
61 this.temperature = temperature;
62 }
63
64 public String getMonth() {
65 return month;
66 }
67
68 public String getCity() {
69 return city;
70 }
71
72 public Double getTemperature() {
73 return temperature;
74 }
75 }
76}
77
1#include <iostream>
2#include <vector>
3#include <map>
4#include <string>
5#include <set>
6#include <iomanip>
7
8// Structure to represent a weather record
9struct WeatherRecord {
10 std::string month;
11 std::string city;
12 double temperature;
13};
14
15// Structure to represent the pivoted table
16struct PivotTable {
17 std::vector<std::string> months; // Row indices (months)
18 std::vector<std::string> cities; // Column headers (cities)
19 std::map<std::pair<std::string, std::string>, double> data; // (month, city) -> temperature
20};
21
22/**
23 * Pivot the weather data to reshape it from long format to wide format.
24 *
25 * @param weather Vector containing weather data with month, city, and temperature
26 * @return PivotTable with months as rows, cities as columns, and temperature values in cells
27 */
28PivotTable pivotTable(const std::vector<WeatherRecord>& weather) {
29 PivotTable result;
30
31 // Use sets to collect unique months and cities while maintaining order
32 std::set<std::string> uniqueMonths;
33 std::set<std::string> uniqueCities;
34
35 // First pass: collect all unique months and cities
36 for (const auto& record : weather) {
37 uniqueMonths.insert(record.month);
38 uniqueCities.insert(record.city);
39 }
40
41 // Convert sets to vectors for ordered access
42 result.months = std::vector<std::string>(uniqueMonths.begin(), uniqueMonths.end());
43 result.cities = std::vector<std::string>(uniqueCities.begin(), uniqueCities.end());
44
45 // Second pass: populate the data map with temperature values
46 // Map key is (month, city) pair, value is temperature
47 for (const auto& record : weather) {
48 result.data[{record.month, record.city}] = record.temperature;
49 }
50
51 return result;
52}
53
54// Helper function to display the pivot table
55void displayPivotTable(const PivotTable& table) {
56 // Print header with city names
57 std::cout << std::setw(15) << "Month";
58 for (const auto& city : table.cities) {
59 std::cout << std::setw(15) << city;
60 }
61 std::cout << std::endl;
62
63 // Print each row with month and corresponding temperatures
64 for (const auto& month : table.months) {
65 std::cout << std::setw(15) << month;
66 for (const auto& city : table.cities) {
67 auto it = table.data.find({month, city});
68 if (it != table.data.end()) {
69 std::cout << std::setw(15) << it->second;
70 } else {
71 // If no data exists for this month-city combination, print N/A
72 std::cout << std::setw(15) << "N/A";
73 }
74 }
75 std::cout << std::endl;
76 }
77}
78
1// Import pandas equivalent library (assuming a pandas-like library exists for TypeScript)
2import * as pd from 'pandas-js';
3
4/**
5 * Pivot the weather DataFrame to reshape it from long format to wide format.
6 *
7 * @param weather - DataFrame containing weather data with columns 'month', 'city', and 'temperature'
8 * @returns DataFrame with months as index, cities as columns, and temperature values in cells
9 */
10function pivotTable(weather: pd.DataFrame): pd.DataFrame {
11 // Reshape the DataFrame using pivot method
12 // - index='month': Each unique month becomes a row
13 // - columns='city': Each unique city becomes a column
14 // - values='temperature': Temperature values fill the cells
15 return weather.pivot({
16 index: 'month',
17 columns: 'city',
18 values: 'temperature'
19 });
20}
21
Time and Space Complexity
Time Complexity: O(n log n)
where n
is the number of rows in the input DataFrame.
The pivot
operation in pandas involves:
- Grouping data by the index and column keys, which requires sorting operations internally:
O(n log n)
- Reshaping the data into the pivoted format:
O(n)
- The overall complexity is dominated by the sorting step
Space Complexity: O(n)
where n
is the number of rows in the input DataFrame.
The space complexity includes:
- The output pivoted DataFrame which stores all the temperature values:
O(n)
- Temporary data structures used during the pivot operation (hash maps for grouping, intermediate arrays):
O(n)
- The total space requirement is linear with respect to the input size
Note: In the worst case where all month-city combinations are unique, the resulting pivoted table would have dimensions of O(m × c)
where m
is the number of unique months and c
is the number of unique cities, but this still equals O(n)
total cells.
Common Pitfalls
1. Duplicate Index-Column Combinations
The most critical pitfall occurs when the DataFrame contains duplicate combinations of (month, city) pairs. The pivot()
method will raise a ValueError
if there are multiple temperature values for the same month-city combination, as it cannot determine which value to use in the pivoted cell.
Example of problematic data:
# This will cause an error weather = pd.DataFrame({ 'city': ['Jacksonville', 'Jacksonville', 'ElPaso'], 'month': ['January', 'January', 'January'], 'temperature': [13, 15, 5] # Two different temps for Jacksonville in January })
Solution:
Use pivot_table()
instead, which can handle duplicates by aggregating them:
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
# Use pivot_table with an aggregation function (e.g., mean, first, last)
return weather.pivot_table(index='month', columns='city', values='temperature',
aggfunc='mean') # or 'first', 'last', etc.
2. Missing Data Handling
When not all cities have temperature data for every month, the pivot operation creates NaN (missing) values in the resulting DataFrame. This might cause issues in downstream processing.
Solution: Fill missing values with a default temperature or interpolate:
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
result = weather.pivot(index='month', columns='city', values='temperature')
# Option 1: Fill with a default value
return result.fillna(0)
# Option 2: Forward fill or interpolate
# return result.fillna(method='ffill')
3. Incorrect Month Sorting
The pivoted DataFrame sorts the index alphabetically, which means months appear as: April, August, December, February, etc., rather than in chronological order.
Solution: Reindex with proper month ordering:
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
result = weather.pivot(index='month', columns='city', values='temperature')
# Define correct month order
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
# Reindex to get chronological order (only include months present in data)
present_months = [m for m in month_order if m in result.index]
return result.reindex(present_months)
4. Data Type Issues
If the temperature column contains non-numeric values or mixed types, the pivot operation might succeed but produce unexpected results or make numerical operations difficult.
Solution: Ensure proper data type conversion before pivoting:
def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
# Convert temperature to numeric, handling errors
weather['temperature'] = pd.to_numeric(weather['temperature'], errors='coerce')
return weather.pivot(index='month', columns='city', values='temperature')
What are the most two important steps in writing a depth first search function? (Select 2)
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!