
2888. Reshape Data Concatenate

Problem Description

This problem asks you to combine two DataFrames vertically into a single DataFrame. You are given two DataFrames, df1 and df2, that have identical column structures:

  • student_id: integer type
  • name: object type (string)
  • age: integer type

The task is to stack df2 below df1 to create one unified DataFrame containing all rows from both DataFrames. This operation is known as vertical concatenation.

The solution uses pd.concat() with ignore_index=True to:

  1. Combine the two DataFrames by placing all rows from df2 after all rows from df1
  2. Reset the index to create a new continuous index sequence (0, 1, 2, ...) for the combined DataFrame

For example, if df1 contains 3 students and df2 contains 2 students, the resulting DataFrame would contain all 5 students with their information preserved and a new index running from 0 to 4.
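A minimal sketch of that 3 + 2 case. The student records below are made up for illustration and are not from the problem:

```python
import pandas as pd


def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # Stack df2 under df1 and renumber the rows 0, 1, 2, ...
    return pd.concat([df1, df2], ignore_index=True)


# Hypothetical input: 3 students in df1, 2 in df2, identical columns
df1 = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cai"], "age": [18, 19, 20]})
df2 = pd.DataFrame({"student_id": [4, 5], "name": ["Dee", "Eli"], "age": [21, 22]})

result = concatenateTables(df1, df2)
print(result)
print(list(result.index))  # [0, 1, 2, 3, 4]
```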

Intuition

When we need to combine two DataFrames that have the same structure, we think about stacking them together like building blocks. Since both DataFrames have identical columns (student_id, name, and age), we can simply place one on top of the other.

The natural approach in pandas for this operation is pd.concat(), which is designed specifically for combining DataFrames along a particular axis. By default, concat() works along axis 0 (rows), which is exactly what we need for vertical concatenation.

The key consideration is what to do with the index values. If we keep the original indices from both DataFrames, we might end up with duplicate index values (both DataFrames could have indices 0, 1, 2, etc.). This could cause confusion when accessing rows later. Therefore, we use ignore_index=True to tell pandas to discard the original indices and create a fresh, continuous sequence starting from 0.
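The "confusion when accessing rows" is concrete. When labels repeat, a label lookup such as `.loc[0]` matches several rows. This sketch uses the student data from the walkthrough below:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201, 202, 203], "name": ["Charlie", "Diana", "Eve"], "age": [19, 22, 20]})

kept = pd.concat([df1, df2])  # original labels kept: 0, 1, 0, 1, 2
row0 = kept.loc[0]            # label 0 matches two rows, so this is a 2-row DataFrame
print(type(row0).__name__, len(row0))  # DataFrame 2

fresh = pd.concat([df1, df2], ignore_index=True)
print(type(fresh.loc[0]).__name__)  # Series: label 0 now identifies exactly one row
print(fresh.loc[0, "name"])         # Alice
```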

This approach ensures that:

  • All data from both DataFrames is preserved
  • The columns remain aligned since they're identical
  • We get a clean, sequential index for easy row access
  • The operation is performed efficiently using pandas' built-in functionality

The simplicity of using pd.concat([df1, df2], ignore_index=True) makes it the most straightforward and readable solution for this vertical concatenation task.

Solution Approach

The implementation uses pandas' concat() function to vertically combine the two DataFrames. Here's how the solution works:

  1. Function Definition: The function concatenateTables() takes two parameters - df1 and df2, both of type pd.DataFrame, and returns a single concatenated DataFrame.

  2. Using pd.concat(): The core operation is performed with pd.concat([df1, df2], ignore_index=True). Let's break down the parameters:

    • First parameter [df1, df2]: A list containing the DataFrames to concatenate in order. The order matters - df1 rows will appear first, followed by df2 rows.
    • ignore_index=True: This parameter tells pandas to reset the index of the resulting DataFrame to a default integer index (0, 1, 2, ...). Without this, the original indices would be preserved, potentially causing duplicate index values.
  3. Vertical Concatenation: By default, pd.concat() operates along axis=0 (rows), which means it stacks the DataFrames vertically. Since we don't specify the axis parameter, it uses this default behavior.

  4. Return Value: The function directly returns the concatenated result without any additional processing, as pd.concat() handles all the necessary operations internally.
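You can check the points about list order and the default axis directly. This small sketch uses the walkthrough's student data. `pd.testing.assert_frame_equal` raises an `AssertionError` if the two frames differ in values, dtypes, index, or columns:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201, 202, 203], "name": ["Charlie", "Diana", "Eve"], "age": [19, 22, 20]})

implicit = pd.concat([df1, df2], ignore_index=True)          # axis defaults to 0
explicit = pd.concat([df1, df2], axis=0, ignore_index=True)  # same call, axis spelled out
pd.testing.assert_frame_equal(implicit, explicit)

# List order decides row order in the result
print(implicit["student_id"].tolist())  # [101, 102, 201, 202, 203]
swapped = pd.concat([df2, df1], ignore_index=True)
print(swapped["student_id"].tolist())   # [201, 202, 203, 101, 102]

# The inputs are untouched: concat returns a new DataFrame
print(len(df1), len(df2), len(implicit))  # 2 3 5
```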

The algorithm's time complexity is O(n + m) where n and m are the number of rows in df1 and df2 respectively, as it needs to copy all rows into the new DataFrame. The space complexity is also O(n + m) for storing the combined result.

This approach is a good fit here because:

  • It leverages pandas' optimized internal implementation
  • It handles the index management automatically
  • It preserves the column structure and data types (as long as each column has the same dtype in both inputs)
  • It's a single-line solution that's both efficient and readable
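Dtypes are preserved only when both inputs agree column by column. In this sketch the frames are hypothetical. The second part shows that pandas falls back to a common dtype when they differ: here, `df3` stores `age` as floats, so the combined column becomes `float64`.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201], "name": ["Charlie"], "age": [19]})

same = pd.concat([df1, df2], ignore_index=True)
print(list(same.dtypes) == list(df1.dtypes))  # True: matching inputs keep their dtypes

# Hypothetical mismatch: df3 stores age as float64
df3 = pd.DataFrame({"student_id": [301], "name": ["Dan"], "age": [18.0]})
mixed = pd.concat([df1, df3], ignore_index=True)
print(mixed["age"].dtype)  # float64: integer and float ages are combined as float64
```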

Example Walkthrough

Let's walk through a concrete example to understand how the solution works.

Given Input:

DataFrame df1:

   student_id    name  age
0         101   Alice   20
1         102     Bob   21

DataFrame df2:

   student_id     name  age
0         201  Charlie   19
1         202    Diana   22
2         203      Eve   20

Step-by-step Process:

  1. Initial State: We have two separate DataFrames with identical column structures. Notice that both DataFrames have their own index starting from 0.

  2. Calling pd.concat([df1, df2]): When we pass the list [df1, df2] to pd.concat(), it prepares to stack them vertically. The order in the list determines the order in the result - df1 first, then df2.

  3. Without ignore_index (for comparison): If we didn't use ignore_index=True, the result would look like:

       student_id     name  age
    0         101    Alice   20
    1         102      Bob   21
    0         201  Charlie   19  # Notice duplicate index 0
    1         202    Diana   22  # Notice duplicate index 1
    2         203      Eve   20

    This has duplicate indices (0 and 1 appear twice), which could cause issues.

  4. With ignore_index=True: The parameter tells pandas to discard the original indices and create a new sequential index:

       student_id     name  age
    0         101    Alice   20
    1         102      Bob   21
    2         201  Charlie   19  # New index: 2
    3         202    Diana   22  # New index: 3
    4         203      Eve   20  # New index: 4

  5. Final Result: The function returns this combined DataFrame with:

    • All 5 rows preserved (2 from df1 + 3 from df2)
    • Original data intact
    • Clean, sequential index from 0 to 4
    • Same column structure as the input DataFrames

The entire operation completes in a single line: return pd.concat([df1, df2], ignore_index=True), making it both efficient and elegant.
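To reproduce the walkthrough, this sketch builds the same two input tables. It should print the same rows and index labels as steps 3 and 4, without the `#` annotations:

```python
import pandas as pd

# The two input tables from the walkthrough
df1 = pd.DataFrame({
    "student_id": [101, 102],
    "name": ["Alice", "Bob"],
    "age": [20, 21],
})
df2 = pd.DataFrame({
    "student_id": [201, 202, 203],
    "name": ["Charlie", "Diana", "Eve"],
    "age": [19, 22, 20],
})

without_reset = pd.concat([df1, df2])              # step 3: labels 0, 1, 0, 1, 2
result = pd.concat([df1, df2], ignore_index=True)  # step 4: labels 0 to 4

print(without_reset)
print(result)
```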

Solution Implementation

import pandas as pd


def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """
    Concatenates two DataFrames vertically (row-wise).

    Args:
        df1: First DataFrame to concatenate
        df2: Second DataFrame to concatenate

    Returns:
        A new DataFrame containing all rows from both input DataFrames
        with reset index starting from 0
    """
    # Concatenate the two DataFrames vertically
    # ignore_index=True resets the index to 0, 1, 2, ... instead of preserving original indices
    result = pd.concat([df1, df2], ignore_index=True)

    return result

import java.util.ArrayList;
import java.util.List;

// Java has no built-in DataFrame type, so Row and DataFrame below are minimal,
// hypothetical stand-ins that let the example compile (Java 16+ for the record).
// Each Row holds one student; a DataFrame is an ordered list of rows.
record Row(int studentId, String name, int age) {}

class DataFrame {
    private final List<Row> rows;

    DataFrame() {
        this.rows = new ArrayList<>();
    }

    DataFrame(List<Row> rows) {
        this.rows = new ArrayList<>(rows);
    }

    int getRowCount() {
        return rows.size();
    }

    Row getRow(int i) {
        return rows.get(i);
    }

    List<Row> getAllRows() {
        return List.copyOf(rows);
    }

    void addRow(Row row) {
        rows.add(row);
    }
}

public class DataFrameConcatenator {

    /**
     * Concatenates two DataFrames vertically (row-wise).
     *
     * @param df1 First DataFrame to concatenate
     * @param df2 Second DataFrame to concatenate
     * @return A new DataFrame containing all rows from both input DataFrames
     *         with reset index starting from 0
     */
    public static DataFrame concatenateTables(DataFrame df1, DataFrame df2) {
        // Create a new DataFrame to store the concatenated result
        DataFrame result = new DataFrame();

        // Copy all rows from the first DataFrame
        for (int i = 0; i < df1.getRowCount(); i++) {
            result.addRow(df1.getRow(i));
        }

        // Append all rows from the second DataFrame
        for (int i = 0; i < df2.getRowCount(); i++) {
            result.addRow(df2.getRow(i));
        }

        // Rows are addressed by position (0, 1, 2, ...), so the result has a
        // fresh index; this is equivalent to ignore_index=True in pandas
        return result;
    }

    /**
     * Alternative implementation that gathers all rows into one pre-sized list
     * and builds the result in a single step.
     */
    public static DataFrame concatenateTablesOptimized(DataFrame df1, DataFrame df2) {
        // Pre-size the list to hold every row from both DataFrames
        List<Row> allRows = new ArrayList<>(df1.getRowCount() + df2.getRowCount());

        // Add all rows from the first DataFrame, then all rows from the second
        allRows.addAll(df1.getAllRows());
        allRows.addAll(df2.getAllRows());

        // Positions in the new DataFrame start again from 0
        return new DataFrame(allRows);
    }
}

#include <cstddef>
#include <string>
#include <vector>

// Note: C++ doesn't have a built-in DataFrame equivalent like pandas
// This implementation assumes a simplified, column-major DataFrame in which every
// cell has the same type T
// In practice, you might use libraries like DuckDB, Apache Arrow, or custom implementations

template<typename T>
class DataFrame {
public:
    std::vector<std::string> columns;  // column names
    std::vector<std::vector<T>> data;  // data[col][row]

    // Constructor
    DataFrame() {}

    // Get number of rows (all columns are assumed to have the same length)
    std::size_t size() const {
        return data.empty() ? 0 : data[0].size();
    }

    // Get number of columns
    std::size_t numColumns() const {
        return columns.size();
    }
};

/**
 * Concatenates two DataFrames vertically (row-wise).
 *
 * @param df1 First DataFrame to concatenate
 * @param df2 Second DataFrame to concatenate; assumed to have the same
 *            columns, in the same order, as df1
 * @return A new DataFrame containing all rows from both input DataFrames
 *         with reset index starting from 0
 */
template<typename T>
DataFrame<T> concatenateTables(const DataFrame<T>& df1, const DataFrame<T>& df2) {
    // Create result DataFrame
    DataFrame<T> result;

    // Copy column names from first DataFrame
    result.columns = df1.columns;

    // Initialize data vectors for each column
    result.data.resize(df1.numColumns());

    // Concatenate data from both DataFrames, one column at a time
    for (std::size_t col = 0; col < df1.numColumns(); ++col) {
        // Reserve space for efficiency
        result.data[col].reserve(df1.data[col].size() + df2.data[col].size());

        // Copy all rows from df1 for current column
        result.data[col].insert(result.data[col].end(),
                                df1.data[col].begin(),
                                df1.data[col].end());

        // Append all rows from df2 for current column
        result.data[col].insert(result.data[col].end(),
                                df2.data[col].begin(),
                                df2.data[col].end());
    }

    // Index is implicitly reset as we're using vector positions (0, 1, 2, ...)
    return result;
}

// TypeScript has no pandas. This sketch assumes a DataFrame is represented as
// an array of row objects; a library like Danfo.js could be used instead.

interface StudentRow {
    student_id: number;
    name: string;
    age: number;
}

type DataFrame = StudentRow[];

/**
 * Concatenates two DataFrames vertically (row-wise).
 *
 * @param df1 - First DataFrame to concatenate
 * @param df2 - Second DataFrame to concatenate
 * @returns A new DataFrame containing all rows from both input DataFrames;
 *          array positions 0, 1, 2, ... act as the reset index
 */
function concatenateTables(df1: DataFrame, df2: DataFrame): DataFrame {
    // Spread df1's rows first, then df2's, into a new array.
    // The row objects themselves are shared (a shallow copy).
    // An array has no separate index labels, so this behaves like
    // pandas' concat with ignore_index=True.
    return [...df1, ...df2];
}


Time and Space Complexity

Time Complexity: O(n + m), where n is the number of rows in df1 and m is the number of rows in df2. This treats the number of columns as a constant, since it is fixed at 3 here. pd.concat() has to copy every row of both DataFrames into the result. The fresh 0-based index requested by ignore_index=True is a RangeIndex in current pandas versions, so it adds essentially nothing beyond that copy.

Space Complexity: O(n + m), with the same n and m. The function creates a new DataFrame holding copies of all the data from both inputs. With c columns that is (n + m) × c cells, which is O(n + m) for a fixed column count. The original DataFrames remain in memory during the operation, but the primary space cost is the new concatenated DataFrame containing all n + m rows.
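As a concrete check of the space bound, the walkthrough inputs hold 2 × 3 and 3 × 3 cells, so the result should hold (2 + 3) × 3 = 15 cells. This short sketch uses `DataFrame.size`, which counts rows × columns:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201, 202, 203], "name": ["Charlie", "Diana", "Eve"], "age": [19, 22, 20]})

result = pd.concat([df1, df2], ignore_index=True)

# DataFrame.size is rows * columns, i.e. the number of cells
print(df1.size, df2.size, result.size)  # 6 9 15
```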

Common Pitfalls

1. Not Resetting the Index (Missing ignore_index=True)

One of the most common mistakes is forgetting to use ignore_index=True when concatenating DataFrames. This can lead to duplicate index values in the resulting DataFrame.

Problem Example:

# Without ignore_index=True
result = pd.concat([df1, df2])  # WRONG!

If df1 has indices [0, 1, 2] and df2 has indices [0, 1], the result will have the duplicate index values [0, 1, 2, 0, 1], which can cause issues when:

  • Trying to access rows by index
  • Performing operations that require unique indices
  • Exporting or further processing the data

Solution: Always include ignore_index=True when you want a clean, sequential index:

result = pd.concat([df1, df2], ignore_index=True)  # CORRECT!
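Here is one example of an "operation that requires unique indices": label-based `reindex` raises a `ValueError` on a duplicated axis. This sketch uses hypothetical frames with the [0, 1, 2] and [0, 1] indices from above. It also shows that calling `reset_index(drop=True)` after the fact gives the same frame as `ignore_index=True`:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102, 103], "name": ["Alice", "Bob", "Cara"], "age": [20, 21, 22]})
df2 = pd.DataFrame({"student_id": [201, 202], "name": ["Dan", "Eve"], "age": [19, 20]})

dup = pd.concat([df1, df2])
print(list(dup.index))  # [0, 1, 2, 0, 1]

# Label-based reindexing needs unique labels, so pandas refuses here
reindex_failed = False
try:
    dup.reindex([0, 1, 2, 3, 4])
except ValueError as err:
    reindex_failed = True
    print("reindex failed:", err)

# Two equivalent ways to get a clean 0..n+m-1 index
fixed = pd.concat([df1, df2], ignore_index=True)
also_fixed = dup.reset_index(drop=True)
pd.testing.assert_frame_equal(fixed, also_fixed)
print(list(fixed.index))  # [0, 1, 2, 3, 4]
```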

2. Assuming Column Alignment Without Verification

While the problem states that both DataFrames have identical column structures, in real-world scenarios, developers often concatenate DataFrames without verifying column compatibility.

Problem Example:

# If df1 has columns ['student_id', 'name', 'age']
# but df2 has columns ['id', 'student_name', 'age']
result = pd.concat([df1, df2], ignore_index=True)
# This creates NaN values for mismatched columns!

Solution: Verify column names match before concatenation or rename columns as needed:

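# Check whether the column labels match (concat aligns columns by name,
# so only the set of names matters, not their order)
if set(df1.columns) != set(df2.columns):
    # Same structure, different names: relabel df2 by position.
    # Assumes df2's columns are in the same order as df1's; set_axis returns
    # a new DataFrame, so the caller's df2 is not modified
    df2 = df2.set_axis(df1.columns, axis=1)

result = pd.concat([df1, df2], ignore_index=True)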
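This sketch shows what the mismatch produces and how the positional relabel fixes it. The one-row frames are hypothetical, and the fix assumes df2's columns correspond to df1's in order:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101], "name": ["Alice"], "age": [20]})
# Hypothetical: same data layout, different column names
df2 = pd.DataFrame({"id": [201], "student_name": ["Charlie"], "age": [19]})

naive = pd.concat([df1, df2], ignore_index=True)
# Columns are matched by name, so the result has the union of both sets of names
print(sorted(naive.columns))          # ['age', 'id', 'name', 'student_id', 'student_name']
print(int(naive.isna().sum().sum()))  # 4 NaN cells, 2 per row

aligned = pd.concat([df1, df2.set_axis(df1.columns, axis=1)], ignore_index=True)
print(aligned.columns.tolist())         # ['student_id', 'name', 'age']
print(int(aligned.isna().sum().sum()))  # 0
```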

3. Modifying the Original DataFrames

Some developers might think they need to modify the original DataFrames or that concat() modifies them in-place.

Problem Example:

def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    df1 = pd.concat([df1, df2], ignore_index=True)  # Reassigning parameter
    return df1  # Works, but reusing the name df1 hides that only the local variable was rebound

Solution: pd.concat() returns a new DataFrame without modifying the originals. Use a new variable:

def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    result = pd.concat([df1, df2], ignore_index=True)
    return result
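The real bug behind this pitfall is assuming concat works in place and discarding its return value. Here is a sketch with two hypothetical helper names, `concatenate_buggy` and `concatenate_fixed`:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201], "name": ["Charlie"], "age": [19]})


def concatenate_buggy(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # Bug: the concatenated frame is created and then thrown away
    pd.concat([df1, df2], ignore_index=True)
    return df1


def concatenate_fixed(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # Return the new DataFrame that concat produces
    return pd.concat([df1, df2], ignore_index=True)


print(len(concatenate_buggy(df1, df2)))  # 2: df1 was never changed
print(len(concatenate_fixed(df1, df2)))  # 3
print(len(df1))                          # still 2 after both calls
```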

4. Using the Wrong Axis for Concatenation

Although the default axis=0 is correct for vertical concatenation, explicitly specifying it can prevent confusion, especially when maintaining code.

Problem Example:

# Someone might accidentally use axis=1 thinking it means "first dimension"
result = pd.concat([df1, df2], axis=1, ignore_index=True)  # WRONG! This places the frames side by side, and ignore_index=True renumbers the columns 0, 1, 2, ...

Solution: Either rely on the default (axis=0) or explicitly specify it for clarity:

result = pd.concat([df1, df2], axis=0, ignore_index=True)  # Explicit vertical concatenation
# or simply
result = pd.concat([df1, df2], ignore_index=True)  # Default is axis=0
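This sketch shows the difference using the walkthrough data. With `axis=1`, rows are aligned on their index labels, so the 2-row and 3-row frames produce 3 rows, with NaN where df1 has no row. `ignore_index=True` then applies to the column axis:

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201, 202, 203], "name": ["Charlie", "Diana", "Eve"], "age": [19, 22, 20]})

vertical = pd.concat([df1, df2], axis=0, ignore_index=True)
side_by_side = pd.concat([df1, df2], axis=1, ignore_index=True)

print(vertical.shape)      # (5, 3): rows stacked
print(side_by_side.shape)  # (3, 6): columns placed side by side, rows aligned on index
# With axis=1, ignore_index=True renumbers the columns, so the names are lost
print(side_by_side.columns.tolist())  # [0, 1, 2, 3, 4, 5]
```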