2886. Change Data Type

Easy · Pandas

Problem Description

You are given a DataFrame called students with four columns: student_id (int), name (object), age (int), and grade (float).

The problem asks you to fix a data type error in the DataFrame. Specifically, the grade column is currently stored as floating-point numbers (floats), but it needs to be converted to integers.

Your task is to write a function changeDatatype that takes the students DataFrame as input and returns the same DataFrame with the grade column converted from float to integer type.

For example:

If the input has a grade value of 73.0 (float), it should become 73 (integer)
If the input has a grade value of 87.0 (float), it should become 87 (integer)

The solution uses pandas' astype() method to convert the grade column from float to int. The converted column is assigned back to students['grade'], replacing the float column on the same DataFrame object, and that DataFrame is then returned.

Intuition

When working with DataFrames, data type inconsistencies are common and need to be addressed. Here, the grade values are whole numbers stored as floats (like 73.0 and 87.0). That misrepresents the data and can cause issues in downstream processing, such as grades displaying as 73.0 or failing a check that expects integers.

The key insight is that pandas provides built-in methods for type conversion. Since we need to change the data type of an entire column, we can leverage pandas' astype() method, which is specifically designed for this purpose.

Why astype(int)?

The grade values are already whole numbers (just with .0 decimal parts)
Converting from float to int will simply remove the unnecessary decimal portion
This is a direct type casting operation that doesn't require any complex transformation logic
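The reasoning above can be checked with a minimal sketch. It assumes a small hypothetical `grades` Series: if every value is already whole, casting to int and back to float should reproduce the original values exactly.

```python
import pandas as pd

# Hypothetical sample data: whole-number grades stored as floats
grades = pd.Series([73.0, 87.0], name="grade")

# Cast to integer; for whole-valued floats nothing is lost
as_int = grades.astype(int)

# Round-trip check: converting back to float yields the same values
is_lossless = bool((as_int.astype(float) == grades).all())

print(as_int.tolist())   # [73, 87]
print(is_lossless)       # True
```

If `is_lossless` came back False, the column would contain fractional values and a plain cast would silently discard them.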

The approach is straightforward: access the column that needs converting (students['grade']), apply astype(int), and assign the result back to the same column. This replaces the column on the original DataFrame while leaving the grade values and all other columns unchanged.

This solution is computationally simple: it performs a single vectorized operation on one column. astype() does allocate a new array for the converted column, but the rest of the DataFrame is not copied.
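One consequence of assigning back to `students['grade']` is that the caller's DataFrame changes as well. The sketch below uses hypothetical sample data (the names `original` and `result` are illustrative) to make that visible:

```python
import pandas as pd


def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the solution: replace the column with its integer version
    students['grade'] = students['grade'].astype(int)
    return students


# Hypothetical sample data
original = pd.DataFrame({'name': ['Alice', 'Bob'], 'grade': [73.0, 87.0]})
result = changeDatatype(original)

# The function returns the very same object it received...
same_object = result is original
# ...so the caller's DataFrame now has an integer grade column too
caller_sees_int = pd.api.types.is_integer_dtype(original['grade'])

print(same_object, caller_sees_int)
```

If the caller needs the float version afterwards, passing `original.copy()` is one way to keep it.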

Solution Implementation

import pandas as pd


def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    """
    Convert the data type of the 'grade' column from float to integer.

    Args:
        students: DataFrame containing student information with a 'grade' column

    Returns:
        DataFrame with the 'grade' column converted to integer type
    """
    # Convert the 'grade' column from its current type to integer
    students['grade'] = students['grade'].astype(int)

    # Return the modified DataFrame
    return students

import java.util.List;
import java.util.ArrayList;

/**
 * Student class to represent student information
 */
class Student {
    private String name;
    private double grade;  // Originally stored as double (equivalent to float in Python)

    // Constructor
    public Student(String name, double grade) {
        this.name = name;
        this.grade = grade;
    }

    // Getters and setters
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public double getGrade() {
        return grade;
    }

    public void setGrade(double grade) {
        this.grade = grade;
    }

    // Method to get grade as integer
    public int getGradeAsInt() {
        return (int) grade;
    }
}

/**
 * Solution class containing the changeDatatype method
 */
public class Solution {

    /**
     * Convert the data type of the 'grade' field from double to integer.
     * This method modifies the grade values by truncating decimal parts.
     *
     * @param students List containing student information with a 'grade' field
     * @return List with the 'grade' field converted to integer values
     */
    public static List<Student> changeDatatype(List<Student> students) {
        // Create a new list to store students with integer grades
        List<Student> modifiedStudents = new ArrayList<>();

        // Iterate through each student in the input list
        for (Student student : students) {
            // Convert the grade from double to integer (truncates decimal part)
            int integerGrade = (int) student.getGrade();

            // Create a new Student object with the converted grade
            // (the int is widened back to double by the constructor)
            Student modifiedStudent = new Student(student.getName(), integerGrade);

            // Add the modified student to the result list
            modifiedStudents.add(modifiedStudent);
        }

        // Return the modified list
        return modifiedStudents;
    }
}

#include <vector>
#include <string>

// Structure to represent a student record
struct Student {
    std::string name;
    int id;
    double grade;  // Originally stored as floating-point
};

// Class to represent a DataFrame-like structure for students
class DataFrame {
public:
    std::vector<Student> students;

    // Constructor
    DataFrame(const std::vector<Student>& data) : students(data) {}

    // Get the students data
    std::vector<Student>& getData() {
        return students;
    }
};

/**
 * Convert the data type of the 'grade' column from float to integer.
 *
 * @param students DataFrame containing student information with a 'grade' column
 * @return DataFrame with the 'grade' column converted to integer type
 */
DataFrame changeDatatype(DataFrame students) {
    // Iterate through all student records in the DataFrame
    for (size_t i = 0; i < students.getData().size(); ++i) {
        // Convert the floating-point grade to integer by truncation
        // Note: This mimics pandas astype(int) behavior which truncates towards zero
        students.getData()[i].grade = static_cast<int>(students.getData()[i].grade);
    }

    // Return the modified DataFrame
    return students;
}

// Import necessary types and libraries
// Note: illustrative only; the bracket-style column access below is assumed
// and may not match the actual pandas-js API
import * as pd from 'pandas-js';

/**
 * Convert the data type of the 'grade' column from float to integer.
 *
 * @param students - DataFrame containing student information with a 'grade' column
 * @returns DataFrame with the 'grade' column converted to integer type
 */
function changeDatatype(students: pd.DataFrame): pd.DataFrame {
    // Convert the 'grade' column from its current type to integer
    // Note: In TypeScript/JavaScript, we need to map over the values and convert them.
    // Math.trunc drops the fraction toward zero, like astype(int); Math.floor would
    // differ for negative values
    students['grade'] = students['grade'].map((value: number) => Math.trunc(value));

    // Return the modified DataFrame
    return students;
}

Solution Approach

The implementation follows a direct type conversion approach using pandas' built-in functionality:

1. Access the target column: We access the grade column from the DataFrame using bracket notation: students['grade']
2. Apply type conversion: We use the astype() method to convert the column data type from float to integer: students['grade'].astype(int)
3. Update the DataFrame: We assign the converted column back to the original DataFrame, replacing the float column: students['grade'] = students['grade'].astype(int)
4. Return the modified DataFrame: After the conversion is complete, we return the updated students DataFrame

The complete implementation:

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    students['grade'] = students['grade'].astype(int)
    return students

This approach works because:

The astype(int) method performs element-wise conversion on all values in the column
Since all grade values are already whole numbers stored as floats (e.g., 73.0, 87.0), the conversion simply removes the decimal portion
Assigning the result back replaces only the grade column, leaving all other columns and their data types unchanged
No auxiliary data structures are needed beyond the converted column itself

The time complexity is O(n), where n is the number of rows, as each value in the column must be converted. The space complexity is also O(n): astype() builds a new integer column before it replaces the old one, although no copy of the whole DataFrame is made.

Example Walkthrough

Let's walk through a small example to illustrate how the solution works.

Initial DataFrame:

   student_id    name  age  grade
0         101    Alice   20   85.0
1         102      Bob   21   92.0
2         103  Charlie   19   78.0

Notice that the grade column contains floating-point values (85.0, 92.0, 78.0) even though they represent whole numbers.

Step 1: Access the grade column. When we execute students['grade'], we get:

0    85.0
1    92.0
2    78.0
Name: grade, dtype: float64

The dtype shows it's currently float64.

Step 2: Apply the astype(int) conversion. When we apply .astype(int) to this column:

85.0 → 85
92.0 → 92
78.0 → 78

The decimal portion is removed, converting each float to its integer equivalent.

Step 3: Reassign to update the DataFrame. By executing students['grade'] = students['grade'].astype(int), we replace the original float column with the converted integer column.

Final DataFrame:

   student_id    name  age  grade
0         101    Alice   20     85
1         102      Bob   21     92
2         103  Charlie   19     78

Now the grade column contains integer values (85, 92, 78) with an integer dtype (int64 on most platforms). The transformation is complete: we've converted the data type while preserving all the actual grade values and leaving the other columns unchanged.
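The walkthrough can be reproduced with a short script. This is a sketch that rebuilds the example table by hand and records the dtype before and after the conversion:

```python
import pandas as pd

# The example DataFrame from the walkthrough
students = pd.DataFrame({
    'student_id': [101, 102, 103],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 19],
    'grade': [85.0, 92.0, 78.0],
})

# Step 1: inspect the column's dtype before converting
dtype_before = students['grade'].dtype

# Steps 2 and 3: convert and assign back
students['grade'] = students['grade'].astype(int)
dtype_after = students['grade'].dtype

print(dtype_before, '->', dtype_after)
print(students)
```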

Time and Space Complexity

Time Complexity: O(n) where n is the number of rows in the DataFrame. The astype() operation needs to iterate through each element in the 'grade' column to convert it from its current data type to integer, which requires visiting each row once.

Space Complexity: O(n) where n is the number of rows in the DataFrame. When astype(int) is called, pandas creates a new array/series with the converted integer values. This requires allocating memory for n integer values. Although the operation appears to be in-place (assigning back to students['grade']), internally pandas creates a new column with the converted values before replacing the old one.
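A small sketch of the allocation described here, using a hypothetical `grades` Series: the converted values live in their own buffer, and the original float Series is untouched until the result is assigned back.

```python
import numpy as np
import pandas as pd

grades = pd.Series([85.0, 92.0, 78.0], name='grade')
converted = grades.astype(int)

# The converted Series owns a separate buffer from the original floats
shares_buffer = np.shares_memory(grades.to_numpy(), converted.to_numpy())

# astype() itself does not modify the Series it is called on
original_still_float = grades.dtype == np.float64

print(shares_buffer)         # False
print(original_still_float)  # True
```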

Common Pitfalls

1. Handling Missing Values (NaN)

The most critical pitfall occurs when the grade column contains missing values (NaN). Direct conversion using astype(int) will raise an error because NaN cannot be converted to integer type in pandas.

Problem Example:

import numpy as np
import pandas as pd

# DataFrame with NaN values
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 22],
    'grade': [85.0, np.nan, 92.0]
})

# This will raise: ValueError: Cannot convert non-finite values (NA or inf) to integer
students['grade'] = students['grade'].astype(int)

Solution:

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Pick ONE of the options below; they are alternatives, not sequential steps

    # Option 1: Fill NaN values with a placeholder before conversion
    students['grade'] = students['grade'].fillna(0).astype(int)

    # Option 2: Use the nullable integer type, which keeps missing values as pd.NA
    # students['grade'] = students['grade'].astype('Int64')

    # Option 3: Drop rows with NaN values first
    # students = students.dropna(subset=['grade'])
    # students['grade'] = students['grade'].astype(int)

    return students
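To compare what the three options produce, here is a minimal sketch on a hypothetical `grades` Series with one missing value. The fill value 0 is only a placeholder assumption; a real dataset may call for something else.

```python
import numpy as np
import pandas as pd

grades = pd.Series([85.0, np.nan, 92.0], name='grade')

# Option 1: fill with a placeholder, then cast to a plain integer dtype
filled = grades.fillna(0).astype(int)

# Option 2: the nullable integer dtype keeps the gap as a missing value
nullable = grades.astype('Int64')

# Option 3: drop the missing entries, then cast
dropped = grades.dropna().astype(int)

print(filled.tolist())             # [85, 0, 92]
print(int(nullable.isna().sum()))  # 1
print(dropped.tolist())            # [85, 92]
```

Option 1 invents a grade, Option 3 loses a row, and Option 2 keeps both the row and the fact that the grade is unknown, at the cost of a pandas extension dtype rather than a NumPy one.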

2. Non-Integer Float Values

If the grade column contains float values with decimal parts (e.g., 73.5, 87.8), converting to integer will truncate the decimal portion, potentially losing important information.

Problem Example:

# DataFrame with decimal grades
students = pd.DataFrame({
    'grade': [73.5, 87.8, 92.3]
})

# This truncates: 73.5 → 73, 87.8 → 87 (loses precision)
students['grade'] = students['grade'].astype(int)

Solution:

import numpy as np

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Option 1: Round to the nearest integer before converting
    # (Series.round() sends exact .5 ties to the nearest even integer)
    students['grade'] = students['grade'].round().astype(int)

    # Option 2: Use ceiling/floor instead, based on requirements
    # students['grade'] = np.ceil(students['grade']).astype(int)

    return students
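The strategies differ most visibly on ties and negative values. The sketch below uses hypothetical sample values, including a negative one that real grades would not normally contain, to put the four results side by side:

```python
import numpy as np
import pandas as pd

grades = pd.Series([73.5, 87.8, -1.5])

truncated = grades.astype(int)          # drops the fraction (toward zero)
rounded = grades.round().astype(int)    # nearest integer, ties go to the even neighbour
ceiled = np.ceil(grades).astype(int)    # always rounds up
floored = np.floor(grades).astype(int)  # always rounds down

print(truncated.tolist())  # [73, 87, -1]
print(rounded.tolist())    # [74, 88, -2]
print(ceiled.tolist())     # [74, 88, -1]
print(floored.tolist())    # [73, 87, -2]
```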

3. Column Existence Verification

The code assumes the grade column exists. If it doesn't, a KeyError will be raised.

Problem Example:

# DataFrame without 'grade' column
students = pd.DataFrame({
    'student_id': [1, 2],
    'name': ['Alice', 'Bob'],
    'score': [85.0, 92.0]  # Different column name
})

# This raises: KeyError: 'grade'
students['grade'] = students['grade'].astype(int)

Solution:

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Option 1: Fail fast with a clear message if the column is missing
    if 'grade' not in students.columns:
        raise ValueError("Column 'grade' not found in DataFrame")
    students['grade'] = students['grade'].astype(int)

    # Option 2: Handle gracefully by converting only when the column exists
    # if 'grade' in students.columns:
    #     students['grade'] = students['grade'].astype(int)

    return students

4. Infinity Values

Float columns might contain infinity values (inf or -inf) which cannot be converted to integers.

Solution:

import numpy as np

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Replace infinity values before conversion
    # (0 is only a placeholder; choose a value that suits your data)
    students['grade'] = students['grade'].replace([np.inf, -np.inf], 0)
    students['grade'] = students['grade'].astype(int)

    return students
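An alternative to picking a replacement number is to treat infinite values as missing. This is a sketch of that variant, not part of the original solution; it assumes the nullable `Int64` dtype is acceptable downstream.

```python
import numpy as np
import pandas as pd

grades = pd.Series([85.0, np.inf, -np.inf, 92.0], name='grade')

# Flag the values that cannot be represented as integers
non_finite = ~np.isfinite(grades)

# Blank them out (mask() writes NaN where the condition is True),
# then use the nullable integer dtype so the gaps survive the cast
cleaned = grades.mask(non_finite).astype('Int64')

print(int(non_finite.sum()))      # 2
print(cleaned.dropna().tolist())  # [85, 92]
```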

5. Performance with Large DataFrames

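For very large DataFrames, the choice between modifying the caller's DataFrame and working on a copy affects memory use: copying duplicates every column, while assigning to a single column allocates only the converted column.

Solution:

def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    # Option 1: replace the column on the caller's DataFrame (no full copy).
    # Use plain column assignment; in recent pandas versions,
    # students.loc[:, 'grade'] = ... writes into the existing float column
    # and can leave the dtype as float64.
    # students['grade'] = students['grade'].astype(int)
    # return students

    # Option 2: if you need to preserve the original, convert a copy
    students_copy = students.copy()
    students_copy['grade'] = students_copy['grade'].astype(int)
    return students_copy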
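When the original must be preserved, `DataFrame.astype` with a column-to-dtype mapping is another way to get a converted DataFrame without touching the input. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

students = pd.DataFrame({'name': ['Alice', 'Bob'], 'grade': [85.0, 92.0]})

# astype() with a {column: dtype} mapping returns a new DataFrame;
# only the listed column changes type
converted = students.astype({'grade': int})

print(students['grade'].dtype)    # float64 (input untouched)
print(converted['grade'].tolist())  # [85, 92]
```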
