Facebook Pixel

2889. Reshape Data Pivot

Problem Description

This problem asks you to transform a DataFrame from a long format to a wide format using a pivot operation.

You're given a DataFrame called weather with three columns:

  • city: the name of a city (string/object type)
  • month: the name of a month (string/object type)
  • temperature: the temperature value (integer)

The initial data is in "long format" where each row represents a single temperature measurement for one city in one month.

Your task is to pivot this data so that:

  • Each row represents a specific month
  • Each city becomes its own column
  • The temperature values fill the cells where a month and city intersect

For example, if the original data has a row with Jacksonville, January, 13, after pivoting:

  • There will be a row for January
  • There will be a column for Jacksonville
  • The cell at the intersection of the January row and Jacksonville column will contain 13

The resulting DataFrame should have:

  • The month as the index (row labels)
  • City names as column headers
  • Temperature values in the cells
  • Rows sorted alphabetically by month name

This is a common data reshaping operation that makes it easier to compare temperatures across different cities for the same months, as each month's data for all cities appears in a single row.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we look at the transformation needed, we're essentially reorganizing data from a "tall" structure to a "wide" structure. Each piece of information (city, month, temperature) needs to be repositioned based on its role in the new layout.

The key insight is recognizing that this is a classic pivot operation where:

  • We want months to become our row identifiers
  • We want cities to spread out and become column headers
  • We want temperatures to fill in as the values

Think of it like creating a spreadsheet or table where you want to quickly compare values across different categories. Instead of scanning through multiple rows to find all temperatures for a given month, we want them all in one row.

The pandas pivot() function is designed exactly for this type of reshaping. We need to tell it three things:

  1. What should become the rows? → The months should be our index (rows)
  2. What should become the columns? → The cities should spread out as columns
  3. What values should fill the table? → The temperature values should populate the cells

This maps directly to the parameters of pivot():

  • index='month' - makes each unique month a row
  • columns='city' - makes each unique city a column
  • values='temperature' - fills the resulting table with temperature values

The function automatically handles the matching - it knows to put the temperature of Jacksonville in January at the intersection of the January row and Jacksonville column. The result is a more readable format where you can easily compare temperatures across cities for any given month by looking at a single row.

Solution Approach

The solution uses pandas' built-in pivot() method to reshape the DataFrame. Here's how the implementation works:

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    return weather.pivot(index='month', columns='city', values='temperature')

The pivot() method takes three key parameters:

  1. index='month': This parameter specifies which column should become the new index (row labels) of the pivoted DataFrame. Each unique value in the 'month' column becomes a row in the output.

  2. columns='city': This parameter determines which column's unique values should become the new column headers. Each unique city name becomes a column in the output.

  3. values='temperature': This parameter specifies which column contains the values that will populate the cells of the pivoted table. The temperature values are placed at the appropriate row-column intersections.

The algorithm internally:

  • Groups the data by the unique combinations of index and column values
  • Creates a new DataFrame structure with months as rows and cities as columns
  • Maps each temperature value to its corresponding (month, city) position
  • Automatically handles the sorting of the index alphabetically

For the given example:

  • Input row: (Jacksonville, January, 13)
  • Output position: Row labeled 'January', Column labeled 'Jacksonville', Value = 13

The method handles all the reshaping automatically without needing explicit loops or manual mapping. The resulting DataFrame has a more intuitive structure for cross-city temperature comparisons, with each row representing all cities' temperatures for a particular month.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate how the pivot operation transforms the data.

Starting DataFrame (Long Format):

       city      month  temperature
0    Boston    January           0
1    Boston   February           8
2    Austin    January          20
3    Austin   February          24

Step 1: Identify the pivot components

  • Index (rows): month column → unique values are "January" and "February"
  • Columns: city column → unique values are "Boston" and "Austin"
  • Values: temperature column → these numbers will fill our new table

Step 2: Create the new structure The pivot operation creates a grid where:

  • Rows are labeled with months
  • Columns are labeled with cities
  • Each cell needs to be filled with the appropriate temperature

Step 3: Map values to their positions

  • Row "January" + Column "Boston" → temperature = 0
  • Row "January" + Column "Austin" → temperature = 20
  • Row "February" + Column "Boston" → temperature = 8
  • Row "February" + Column "Austin" → temperature = 24

Step 4: Execute the pivot

result = weather.pivot(index='month', columns='city', values='temperature')

Final DataFrame (Wide Format):

city       Austin  Boston
month                    
February       24       8
January        20       0

Notice how:

  • Each month now occupies exactly one row
  • Each city has its own column
  • Temperature values are placed at the correct intersections
  • The months are automatically sorted alphabetically ("February" before "January")
  • We can now easily compare temperatures across cities for the same month by reading horizontally

This transformation makes it much easier to answer questions like "What were all cities' temperatures in January?" - just look at the January row instead of filtering through multiple rows in the original format.

Solution Implementation

1import pandas as pd
2
3
4def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
5    """
6    Pivot the weather DataFrame to reshape it from long format to wide format.
7  
8    Args:
9        weather: DataFrame containing weather data with columns 'month', 'city', and 'temperature'
10  
11    Returns:
12        DataFrame with months as index, cities as columns, and temperature values in cells
13    """
14    # Reshape the DataFrame using pivot method
15    # - index='month': Each unique month becomes a row
16    # - columns='city': Each unique city becomes a column
17    # - values='temperature': Temperature values fill the cells
18    return weather.pivot(index='month', columns='city', values='temperature')
19
1import java.util.*;
2
3public class Solution {
4    /**
5     * Pivot the weather data to reshape it from long format to wide format.
6     * 
7     * @param weather List of weather records containing month, city, and temperature data
8     * @return A pivoted data structure with months as rows, cities as columns, and temperatures as values
9     */
10    public static Map<String, Map<String, Double>> pivotTable(List<WeatherRecord> weather) {
11        // Create a nested map structure to represent the pivot table
12        // Outer map: month -> inner map
13        // Inner map: city -> temperature
14        Map<String, Map<String, Double>> pivotedData = new TreeMap<>();
15      
16        // Collect all unique cities for column headers
17        Set<String> cities = new TreeSet<>();
18      
19        // Process each weather record
20        for (WeatherRecord record : weather) {
21            String month = record.getMonth();
22            String city = record.getCity();
23            Double temperature = record.getTemperature();
24          
25            // Add city to the set of unique cities
26            cities.add(city);
27          
28            // Initialize inner map for this month if not exists
29            if (!pivotedData.containsKey(month)) {
30                pivotedData.put(month, new TreeMap<>());
31            }
32          
33            // Add temperature value for this month-city combination
34            pivotedData.get(month).put(city, temperature);
35        }
36      
37        // Ensure all months have entries for all cities (fill with null if missing)
38        for (String month : pivotedData.keySet()) {
39            Map<String, Double> cityTemperatures = pivotedData.get(month);
40            for (String city : cities) {
41                if (!cityTemperatures.containsKey(city)) {
42                    cityTemperatures.put(city, null);
43                }
44            }
45        }
46      
47        return pivotedData;
48    }
49  
50    /**
51     * Helper class to represent a weather record
52     */
53    static class WeatherRecord {
54        private String month;
55        private String city;
56        private Double temperature;
57      
58        public WeatherRecord(String month, String city, Double temperature) {
59            this.month = month;
60            this.city = city;
61            this.temperature = temperature;
62        }
63      
64        public String getMonth() {
65            return month;
66        }
67      
68        public String getCity() {
69            return city;
70        }
71      
72        public Double getTemperature() {
73            return temperature;
74        }
75    }
76}
77
1#include <iostream>
2#include <vector>
3#include <map>
4#include <string>
5#include <set>
6#include <iomanip>
7
8// Structure to represent a weather record
9struct WeatherRecord {
10    std::string month;
11    std::string city;
12    double temperature;
13};
14
15// Structure to represent the pivoted table
16struct PivotTable {
17    std::vector<std::string> months;        // Row indices (months)
18    std::vector<std::string> cities;        // Column headers (cities)
19    std::map<std::pair<std::string, std::string>, double> data;  // (month, city) -> temperature
20};
21
22/**
23 * Pivot the weather data to reshape it from long format to wide format.
24 * 
25 * @param weather Vector containing weather data with month, city, and temperature
26 * @return PivotTable with months as rows, cities as columns, and temperature values in cells
27 */
28PivotTable pivotTable(const std::vector<WeatherRecord>& weather) {
29    PivotTable result;
30  
31    // Use sets to collect unique months and cities while maintaining order
32    std::set<std::string> uniqueMonths;
33    std::set<std::string> uniqueCities;
34  
35    // First pass: collect all unique months and cities
36    for (const auto& record : weather) {
37        uniqueMonths.insert(record.month);
38        uniqueCities.insert(record.city);
39    }
40  
41    // Convert sets to vectors for ordered access
42    result.months = std::vector<std::string>(uniqueMonths.begin(), uniqueMonths.end());
43    result.cities = std::vector<std::string>(uniqueCities.begin(), uniqueCities.end());
44  
45    // Second pass: populate the data map with temperature values
46    // Map key is (month, city) pair, value is temperature
47    for (const auto& record : weather) {
48        result.data[{record.month, record.city}] = record.temperature;
49    }
50  
51    return result;
52}
53
54// Helper function to display the pivot table
55void displayPivotTable(const PivotTable& table) {
56    // Print header with city names
57    std::cout << std::setw(15) << "Month";
58    for (const auto& city : table.cities) {
59        std::cout << std::setw(15) << city;
60    }
61    std::cout << std::endl;
62  
63    // Print each row with month and corresponding temperatures
64    for (const auto& month : table.months) {
65        std::cout << std::setw(15) << month;
66        for (const auto& city : table.cities) {
67            auto it = table.data.find({month, city});
68            if (it != table.data.end()) {
69                std::cout << std::setw(15) << it->second;
70            } else {
71                // If no data exists for this month-city combination, print N/A
72                std::cout << std::setw(15) << "N/A";
73            }
74        }
75        std::cout << std::endl;
76    }
77}
78
1// Import pandas equivalent library (assuming a pandas-like library exists for TypeScript)
2import * as pd from 'pandas-js';
3
4/**
5 * Pivot the weather DataFrame to reshape it from long format to wide format.
6 * 
7 * @param weather - DataFrame containing weather data with columns 'month', 'city', and 'temperature'
8 * @returns DataFrame with months as index, cities as columns, and temperature values in cells
9 */
10function pivotTable(weather: pd.DataFrame): pd.DataFrame {
11    // Reshape the DataFrame using pivot method
12    // - index='month': Each unique month becomes a row
13    // - columns='city': Each unique city becomes a column  
14    // - values='temperature': Temperature values fill the cells
15    return weather.pivot({
16        index: 'month',
17        columns: 'city', 
18        values: 'temperature'
19    });
20}
21

Time and Space Complexity

Time Complexity: O(n log n) where n is the number of rows in the input DataFrame.

The pivot operation in pandas involves:

  • Grouping data by the index and column keys, which requires sorting operations internally: O(n log n)
  • Reshaping the data into the pivoted format: O(n)
  • The overall complexity is dominated by the sorting step

Space Complexity: O(n) where n is the number of rows in the input DataFrame.

The space complexity includes:

  • The output pivoted DataFrame which stores all the temperature values: O(n)
  • Temporary data structures used during the pivot operation (hash maps for grouping, intermediate arrays): O(n)
  • The total space requirement is linear with respect to the input size

Note: In the worst case where all month-city combinations are unique, the resulting pivoted table would have dimensions of O(m × c) where m is the number of unique months and c is the number of unique cities, but this still equals O(n) total cells.

Common Pitfalls

1. Duplicate Index-Column Combinations

The most critical pitfall occurs when the DataFrame contains duplicate combinations of (month, city) pairs. The pivot() method will raise a ValueError if there are multiple temperature values for the same month-city combination, as it cannot determine which value to use in the pivoted cell.

Example of problematic data:

# This will cause an error
weather = pd.DataFrame({
    'city': ['Jacksonville', 'Jacksonville', 'ElPaso'],
    'month': ['January', 'January', 'January'],
    'temperature': [13, 15, 5]  # Two different temps for Jacksonville in January
})

Solution: Use pivot_table() instead, which can handle duplicates by aggregating them:

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    # Use pivot_table with an aggregation function (e.g., mean, first, last)
    return weather.pivot_table(index='month', columns='city', values='temperature', 
                              aggfunc='mean')  # or 'first', 'last', etc.

2. Missing Data Handling

When not all cities have temperature data for every month, the pivot operation creates NaN (missing) values in the resulting DataFrame. This might cause issues in downstream processing.

Solution: Fill missing values with a default temperature or interpolate:

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    result = weather.pivot(index='month', columns='city', values='temperature')
    # Option 1: Fill with a default value
    return result.fillna(0)
    # Option 2: Forward fill or interpolate
    # return result.fillna(method='ffill')

3. Incorrect Month Sorting

The pivoted DataFrame sorts the index alphabetically, which means months appear as: April, August, December, February, etc., rather than in chronological order.

Solution: Reindex with proper month ordering:

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    result = weather.pivot(index='month', columns='city', values='temperature')
    # Define correct month order
    month_order = ['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December']
    # Reindex to get chronological order (only include months present in data)
    present_months = [m for m in month_order if m in result.index]
    return result.reindex(present_months)

4. Data Type Issues

If the temperature column contains non-numeric values or mixed types, the pivot operation might succeed but produce unexpected results or make numerical operations difficult.

Solution: Ensure proper data type conversion before pivoting:

def pivotTable(weather: pd.DataFrame) -> pd.DataFrame:
    # Convert temperature to numeric, handling errors
    weather['temperature'] = pd.to_numeric(weather['temperature'], errors='coerce')
    return weather.pivot(index='month', columns='city', values='temperature')
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What are the most two important steps in writing a depth first search function? (Select 2)


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More