2888. Reshape Data Concatenate
Problem Description
This problem asks you to combine two DataFrames vertically into a single DataFrame. You are given two DataFrames, df1 and df2, that have identical column structures:
- student_id: integer type
- name: object type (string)
- age: integer type
The task is to stack df2 below df1 to create one unified DataFrame containing all rows from both DataFrames. This operation is known as vertical concatenation.
The solution uses pd.concat() with ignore_index=True to:
- Combine the two DataFrames by placing all rows from df2 after all rows from df1
- Reset the index to create a new continuous index sequence (0, 1, 2, ...) for the combined DataFrame
For example, if df1 contains 3 students and df2 contains 2 students, the resulting DataFrame would contain all 5 students with their information preserved and a new index running from 0 to 4.
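The 3 + 2 case described above can be sketched as follows. The student names and ages are made-up sample values, not data from the problem statement.

```python
import pandas as pd

# Hypothetical sample data: 3 students in df1, 2 students in df2.
df1 = pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"], "age": [20, 21, 19]}
)
df2 = pd.DataFrame(
    {"student_id": [4, 5], "name": ["Dee", "Eli"], "age": [22, 20]}
)

# Stack df2 below df1 and renumber the rows from 0.
combined = pd.concat([df1, df2], ignore_index=True)

print(len(combined))         # 5
print(list(combined.index))  # [0, 1, 2, 3, 4]
```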
Intuition
When we need to combine two DataFrames that have the same structure, we think about stacking them together like building blocks. Since both DataFrames have identical columns (student_id, name, and age), we can simply place one on top of the other.
The natural approach in pandas for this operation is pd.concat(), which is designed specifically for combining DataFrames along a particular axis. By default, concat() works along axis 0 (rows), which is exactly what we need for vertical concatenation.
The key consideration is what to do with the index values. If we keep the original indices from both DataFrames, we might end up with duplicate index values (both DataFrames could have indices 0, 1, 2, etc.). This could cause confusion when accessing rows later. Therefore, we use ignore_index=True to tell pandas to discard the original indices and create a fresh, continuous sequence starting from 0.
This approach ensures that:
- All data from both DataFrames is preserved
- The columns remain aligned since they're identical
- We get a clean, sequential index for easy row access
- The operation is performed efficiently using pandas' built-in functionality
Because it is a single call, pd.concat([df1, df2], ignore_index=True) is the most straightforward and readable solution for this vertical concatenation task.
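To make the index concern concrete, here is a small sketch of what label-based access looks like with and without ignore_index=True. The data is illustrative sample data.

```python
import pandas as pd

# Illustrative sample data; both frames start their index at 0.
df1 = pd.DataFrame({"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [201], "name": ["Charlie"], "age": [19]})

kept = pd.concat([df1, df2])                      # original labels kept: 0, 1, 0
fresh = pd.concat([df1, df2], ignore_index=True)  # new labels: 0, 1, 2

# With duplicate labels, .loc[0] matches two rows and returns a DataFrame.
rows_at_zero = kept.loc[0]
# With a fresh index, .loc[0] matches exactly one row and returns a Series.
row_at_zero = fresh.loc[0]

print(kept.index.is_unique)         # False
print(type(rows_at_zero).__name__)  # DataFrame
print(type(row_at_zero).__name__)   # Series
```

The same label returning a different type of object depending on duplicates is the kind of confusion a fresh index avoids.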
Solution Approach
The implementation uses pandas' concat() function to vertically combine the two DataFrames. Here's how the solution works:
- Function Definition: The function concatenateTables() takes two parameters, df1 and df2, both of type pd.DataFrame, and returns a single concatenated DataFrame.
- Using pd.concat(): The core operation is performed with pd.concat([df1, df2], ignore_index=True). Let's break down the parameters:
  - First parameter [df1, df2]: A list containing the DataFrames to concatenate in order. The order matters: df1 rows will appear first, followed by df2 rows.
  - ignore_index=True: This parameter tells pandas to reset the index of the resulting DataFrame to a default integer index (0, 1, 2, ...). Without this, the original indices would be preserved, potentially causing duplicate index values.
- Vertical Concatenation: By default, pd.concat() operates along axis=0 (rows), which means it stacks the DataFrames vertically. Since we don't specify the axis parameter, it uses this default behavior.
- Return Value: The function directly returns the concatenated result without any additional processing, as pd.concat() handles all the necessary operations internally.
The algorithm's time complexity is O(n + m), where n and m are the number of rows in df1 and df2 respectively, as it needs to copy all rows into the new DataFrame. The space complexity is also O(n + m) for storing the combined result.
This approach is optimal because:
- It leverages pandas' optimized internal implementation
- It handles the index management automatically
- It preserves the column structure and data types
- It's a single-line solution that's both efficient and readable
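The claim about preserved column structure and data types can be checked with a short sketch. It assumes both inputs share the same columns and dtypes, as the problem states, and the sample values are made up.

```python
import pandas as pd

# Made-up sample rows with the problem's three columns.
df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Ann", "Ben"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [3], "name": ["Cal"], "age": [19]})

combined = pd.concat([df1, df2], ignore_index=True)

# Column order and per-column dtypes match the inputs.
same_columns = list(combined.columns) == list(df1.columns)
same_dtypes = combined.dtypes.equals(df1.dtypes)

print(same_columns)  # True
print(same_dtypes)   # True
```

This only holds when the inputs agree. If one frame is missing a column, the gaps are filled with NaN and the dtype of that column can change.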
Example Walkthrough
Let's walk through a concrete example to understand how the solution works.
Given Input:
DataFrame df1:
       student_id   name  age
    0         101  Alice   20
    1         102    Bob   21
DataFrame df2:
       student_id     name  age
    0         201  Charlie   19
    1         202    Diana   22
    2         203      Eve   20
Step-by-step Process:
- Initial State: We have two separate DataFrames with identical column structures. Notice that both DataFrames have their own index starting from 0.
- Calling pd.concat([df1, df2]): When we pass the list [df1, df2] to pd.concat(), it prepares to stack them vertically. The order in the list determines the order in the result: df1 first, then df2.
- Without ignore_index (for comparison): If we didn't use ignore_index=True, the result would look like:
       student_id     name  age
    0         101    Alice   20
    1         102      Bob   21
    0         201  Charlie   19   # Notice duplicate index 0
    1         202    Diana   22   # Notice duplicate index 1
    2         203      Eve   20
  This has duplicate indices (0 and 1 appear twice), which could cause issues.
- With ignore_index=True: The parameter tells pandas to discard the original indices and create a new sequential index:
       student_id     name  age
    0         101    Alice   20
    1         102      Bob   21
    2         201  Charlie   19   # New index: 2
    3         202    Diana   22   # New index: 3
    4         203      Eve   20   # New index: 4
- Final Result: The function returns this combined DataFrame with:
  - All 5 rows preserved (2 from df1 + 3 from df2)
  - Original data intact
  - Clean, sequential index from 0 to 4
  - Same column structure as the input DataFrames
The entire operation completes in a single line: return pd.concat([df1, df2], ignore_index=True), making it both efficient and elegant.
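The walkthrough can be reproduced and checked against a hand-built expected table. This is a sketch using the same five students as above; pd.testing.assert_frame_equal raises an AssertionError if values, column names, dtypes, or index differ.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"student_id": [101, 102], "name": ["Alice", "Bob"], "age": [20, 21]}
)
df2 = pd.DataFrame(
    {"student_id": [201, 202, 203], "name": ["Charlie", "Diana", "Eve"], "age": [19, 22, 20]}
)

result = pd.concat([df1, df2], ignore_index=True)

# The table from the "With ignore_index=True" step, written out by hand.
expected = pd.DataFrame(
    {
        "student_id": [101, 102, 201, 202, 203],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
        "age": [20, 21, 19, 22, 20],
    }
)

# Passes silently when the two frames match.
pd.testing.assert_frame_equal(result, expected)
```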
Solution Implementation
import pandas as pd


def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """
    Concatenates two DataFrames vertically (row-wise).

    Args:
        df1: First DataFrame to concatenate
        df2: Second DataFrame to concatenate

    Returns:
        A new DataFrame containing all rows from both input DataFrames
        with reset index starting from 0
    """
    # Concatenate the two DataFrames vertically
    # ignore_index=True resets the index to 0, 1, 2, ... instead of preserving original indices
    result = pd.concat([df1, df2], ignore_index=True)

    return result

import java.util.ArrayList;
import java.util.List;

// Note: DataFrame and Row are placeholder types, not part of the JDK.
// This code assumes a DataFrame class with getRowCount(), getRow(),
// addRow(), getAllRows(), and a constructor that accepts a List<Row>.
public class DataFrameConcatenator {

    /**
     * Concatenates two DataFrames vertically (row-wise).
     *
     * @param df1 First DataFrame to concatenate
     * @param df2 Second DataFrame to concatenate
     * @return A new DataFrame containing all rows from both input DataFrames
     *         with reset index starting from 0
     */
    public static DataFrame concatenateTables(DataFrame df1, DataFrame df2) {
        // Create a new DataFrame to store the concatenated result
        DataFrame result = new DataFrame();

        // Copy all rows from the first DataFrame
        for (int i = 0; i < df1.getRowCount(); i++) {
            result.addRow(df1.getRow(i));
        }

        // Append all rows from the second DataFrame
        for (int i = 0; i < df2.getRowCount(); i++) {
            result.addRow(df2.getRow(i));
        }

        // Rows are numbered in insertion order starting from 0,
        // which mirrors ignore_index=True in pandas
        return result;
    }

    /**
     * Alternative implementation that collects the rows in a List first
     */
    public static DataFrame concatenateTablesOptimized(DataFrame df1, DataFrame df2) {
        // Get all rows from both DataFrames
        List<Row> allRows = new ArrayList<>();

        // Add all rows from first DataFrame
        allRows.addAll(df1.getAllRows());

        // Add all rows from second DataFrame
        allRows.addAll(df2.getAllRows());

        // Create new DataFrame with concatenated rows
        // Rows are numbered from 0 in list order
        return new DataFrame(allRows);
    }
}

#include <string>
#include <vector>

// Note: C++ doesn't have a built-in DataFrame equivalent like pandas
// This implementation assumes a simplified, column-major DataFrame structure using vectors
// In practice, you might use libraries like DuckDB, Apache Arrow, or custom implementations

template<typename T>
class DataFrame {
public:
    std::vector<std::string> columns;
    std::vector<std::vector<T>> data;  // data[col][row]

    // Constructor
    DataFrame() {}

    // Get number of rows
    size_t size() const {
        return data.empty() ? 0 : data[0].size();
    }

    // Get number of columns
    size_t numColumns() const {
        return columns.size();
    }
};

/**
 * Concatenates two DataFrames vertically (row-wise).
 *
 * @param df1: First DataFrame to concatenate
 * @param df2: Second DataFrame to concatenate
 *
 * @return: A new DataFrame containing all rows from both input DataFrames
 *          with reset index starting from 0
 */
template<typename T>
DataFrame<T> concatenateTables(const DataFrame<T>& df1, const DataFrame<T>& df2) {
    // Create result DataFrame
    DataFrame<T> result;

    // Copy column names from first DataFrame
    // Assumes both DataFrames have the same columns
    result.columns = df1.columns;

    // Initialize data vectors for each column
    result.data.resize(df1.numColumns());

    // Concatenate data from both DataFrames
    for (size_t col = 0; col < df1.numColumns(); ++col) {
        // Reserve space for efficiency
        result.data[col].reserve(df1.data[col].size() + df2.data[col].size());

        // Copy all rows from df1 for current column
        result.data[col].insert(result.data[col].end(),
                                df1.data[col].begin(),
                                df1.data[col].end());

        // Append all rows from df2 for current column
        result.data[col].insert(result.data[col].end(),
                                df2.data[col].begin(),
                                df2.data[col].end());
    }

    // Index is implicitly reset as we're using vector indices (0, 1, 2, ...)
    // No explicit index reset needed in this implementation

    return result;
}

// Import statement would be handled differently in TypeScript
// TypeScript doesn't have pandas, but we'll represent the structure

interface DataFrame {
    // DataFrame interface properties would go here
    [key: string]: any;
}

/**
 * Concatenates two DataFrames vertically (row-wise).
 *
 * @param df1 - First DataFrame to concatenate
 * @param df2 - Second DataFrame to concatenate
 * @returns A new DataFrame containing all rows from both input DataFrames
 *          with reset index starting from 0
 */
function concatenateTables(df1: DataFrame, df2: DataFrame): DataFrame {
    // Concatenate the two DataFrames vertically
    // In TypeScript/JavaScript, we would typically use array operations
    // or a library like Danfo.js for DataFrame operations

    // Since pandas.concat doesn't exist in TypeScript, this represents
    // the conceptual operation of concatenating DataFrames
    // ignore_index=True equivalent would reset the index to 0, 1, 2, ...
    const result: DataFrame = concat([df1, df2], { ignoreIndex: true });

    return result;
}

/**
 * Helper function to represent pandas.concat functionality
 * This would be implemented using appropriate TypeScript DataFrame library
 */
declare function concat(dataFrames: DataFrame[], options: { ignoreIndex: boolean }): DataFrame;

Time and Space Complexity
Time Complexity: O(n + m), where n is the number of rows in df1 and m is the number of rows in df2. The pd.concat() function has to copy every row from both DataFrames into the concatenated result. Since ignore_index=True is specified, the original labels are also replaced with a new sequential index covering all n + m rows.
Space Complexity: O(n + m), with n and m defined as above and the number of columns treated as a constant (three in this problem). The function creates a new DataFrame that contains copies of all the data from both input DataFrames. The original DataFrames remain in memory during the operation, but the primary space cost is the new concatenated DataFrame containing all n + m rows.
Common Pitfalls
1. Not Resetting the Index (Missing ignore_index=True)
One of the most common mistakes is forgetting to use ignore_index=True when concatenating DataFrames. This can lead to duplicate index values in the resulting DataFrame.
Problem Example:
    # Without ignore_index=True
    result = pd.concat([df1, df2])  # WRONG!
If df1 has indices [0, 1, 2] and df2 has indices [0, 1], the result will have duplicate index values [0, 1, 2, 0, 1], which can cause issues when:
- Trying to access rows by index
- Performing operations that require unique indices
- Exporting or further processing the data
Solution:
Always include ignore_index=True when you want a clean, sequential index:
    result = pd.concat([df1, df2], ignore_index=True)  # CORRECT!
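Two related options may help here. Calling reset_index(drop=True) after a plain concat gives the same frame as ignore_index=True, and passing verify_integrity=True makes pd.concat() raise a ValueError when labels overlap instead of silently keeping duplicates. The sketch below uses made-up sample data.

```python
import pandas as pd

# Made-up sample data; indices are [0, 1, 2] and [0, 1], as in the example above.
df1 = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"], "age": [20, 21, 19]})
df2 = pd.DataFrame({"student_id": [4, 5], "name": ["Dee", "Eli"], "age": [22, 20]})

# Two ways to end up with a clean 0..4 index.
by_flag = pd.concat([df1, df2], ignore_index=True)
by_reset = pd.concat([df1, df2]).reset_index(drop=True)
print(by_flag.equals(by_reset))  # True

# verify_integrity=True turns overlapping labels into an explicit error.
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)  # True
```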
2. Assuming Column Alignment Without Verification
While the problem states that both DataFrames have identical column structures, in real-world scenarios, developers often concatenate DataFrames without verifying column compatibility.
Problem Example:
    # If df1 has columns ['student_id', 'name', 'age']
    # but df2 has columns ['id', 'student_name', 'age']
    result = pd.concat([df1, df2], ignore_index=True)
    # This creates NaN values for mismatched columns!
Solution: Verify column names match before concatenation or rename columns as needed:
    # Check if columns match
    if list(df1.columns) != list(df2.columns):
        # Handle the mismatch appropriately.
        # Renaming by position is only safe if the columns hold the same
        # data in the same order and differ only in name.
        df2 = df2.set_axis(df1.columns, axis=1)  # returns a renamed copy
    result = pd.concat([df1, df2], ignore_index=True)
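The effect of mismatched column names can be seen in a small sketch. The helper concat_strict is a hypothetical name introduced here for illustration; it is one possible way to fail fast rather than accept NaN-filled output.

```python
import pandas as pd

# Made-up frames whose column names only partly overlap.
df1 = pd.DataFrame({"student_id": [1], "name": ["Ann"], "age": [20]})
df2 = pd.DataFrame({"id": [2], "student_name": ["Ben"], "age": [21]})

# concat keeps the union of the columns and fills the gaps with NaN.
mixed = pd.concat([df1, df2], ignore_index=True)
print(mixed.shape)                            # (2, 5)
print(int(mixed["student_id"].isna().sum()))  # 1


def concat_strict(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: refuse to concatenate frames whose columns differ."""
    if list(a.columns) != list(b.columns):
        raise ValueError(f"column mismatch: {list(a.columns)} vs {list(b.columns)}")
    return pd.concat([a, b], ignore_index=True)
```

Raising an error is a design choice. Renaming columns, as in the snippet above, is the alternative when the mismatch is known to be only cosmetic.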
3. Modifying the Original DataFrames
Some developers might think they need to modify the original DataFrames or that concat() modifies them in-place.
Problem Example:
    def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        df1 = pd.concat([df1, df2], ignore_index=True)  # Reassigning parameter
        return df1  # This works but is misleading and bad practice
Solution:
pd.concat() returns a new DataFrame without modifying the originals. Use a new variable:
    def concatenateTables(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        result = pd.concat([df1, df2], ignore_index=True)
        return result
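A quick sketch, using made-up sample data, shows that the inputs are left untouched: the result is a separate object, and editing it does not change df1.

```python
import pandas as pd

# Made-up sample data.
df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Ann", "Ben"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [3], "name": ["Cal"], "age": [19]})
df1_before = df1.copy()

result = pd.concat([df1, df2], ignore_index=True)

# Change a value in the result only.
result.loc[0, "age"] = 99

print(result is df1)           # False
print(df1.equals(df1_before))  # True: df1 still holds age 20 in row 0
print(len(df1), len(df2))      # 2 1
```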
4. Using the Wrong Axis for Concatenation
Although the default axis=0 is correct for vertical concatenation, explicitly specifying it can prevent confusion, especially when maintaining code.
Problem Example:
    # Someone might accidentally use axis=1 thinking it means "first dimension"
    result = pd.concat([df1, df2], axis=1, ignore_index=True)  # WRONG! This concatenates horizontally
Solution: Either rely on the default (axis=0) or explicitly specify it for clarity:
    result = pd.concat([df1, df2], axis=0, ignore_index=True)  # Explicit vertical concatenation
    # or simply
    result = pd.concat([df1, df2], ignore_index=True)  # Default is axis=0
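Comparing the shapes makes the difference between the two axes visible. In this sketch, with made-up data of 2 and 3 rows, axis=1 places the frames side by side, aligns rows by index label, and applies ignore_index=True to the column labels rather than the row labels.

```python
import pandas as pd

# Made-up sample data: 2 rows and 3 rows, 3 columns each.
df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Ann", "Ben"], "age": [20, 21]})
df2 = pd.DataFrame({"student_id": [3, 4, 5], "name": ["Cal", "Dee", "Eli"], "age": [19, 22, 20]})

tall = pd.concat([df1, df2], axis=0, ignore_index=True)  # vertical
wide = pd.concat([df1, df2], axis=1, ignore_index=True)  # horizontal

print(tall.shape)          # (5, 3)
print(wide.shape)          # (3, 6)
print(list(wide.columns))  # [0, 1, 2, 3, 4, 5]

# df1 has no row labelled 2, so its half of that row is filled with NaN.
print(int(wide.iloc[2].isna().sum()))  # 3
```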