2886. Change Data Type
Problem Description
You are given a DataFrame called students with four columns: student_id (int), name (object), age (int), and grade (float).
The problem asks you to fix a data type error in the DataFrame. Specifically, the grade column is currently stored as floating-point numbers (floats), but it needs to be converted to integers.
Your task is to write a function changeDatatype that takes the students DataFrame as input and returns the same DataFrame with the grade column converted from float to integer type.
For example:
- If the input has a grade value of 73.0 (float), it should become 73 (integer)
- If the input has a grade value of 87.0 (float), it should become 87 (integer)
The solution uses pandas' astype() method to convert the grade column from float to int. astype() returns a new Series, so the converted column is assigned back to students['grade'], and the updated DataFrame is then returned.
Intuition
When working with DataFrames, data type inconsistencies are common issues that need to be addressed. In this case, we have grade values that are whole numbers but stored as floats (like 73.0 and 87.0). This misrepresents the data and may cause issues in downstream processing, such as grades being displayed as 73.0 instead of 73.
The key insight is that pandas provides built-in methods for type conversion. Since we need to change the data type of an entire column, we can use pandas' astype() method, which is designed for this purpose.
Why astype(int)?
- The grade values are already whole numbers (just with .0 decimal parts)
- Converting from float to int will simply remove the unnecessary decimal portion
- This is a direct type casting operation that doesn't require any complex transformation logic
The approach is straightforward: access the column that needs conversion (students['grade']), apply the type conversion method (astype(int)), and assign the result back to the same column. This updates the DataFrame object that was passed in while leaving the grade values and all other columns unchanged.
This solution is both memory-efficient and computationally simple, as it performs a single operation on one column without creating unnecessary copies of the entire DataFrame.
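As a quick check of the reasoning above, the following minimal sketch confirms that casting whole-number floats to int leaves every value unchanged. The sample values are illustrative only and are not taken from the problem's test data.

```python
import pandas as pd

# Whole-number grades stored as floats (illustrative values)
grades = pd.Series([73.0, 87.0, 90.0], name="grade")

# astype(int) returns a new Series; the original is untouched
converted = grades.astype(int)

# No value has a fractional part, so the cast loses nothing
lossless = bool((converted == grades).all())
print(converted.tolist(), lossless)
```

The original Series still holds floats after this runs, which is why the solution assigns the result back to the column.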
Solution Approach
The implementation follows a direct type conversion approach using pandas' built-in functionality:
- Access the target column: We access the grade column from the DataFrame using bracket notation: students['grade']
- Apply type conversion: We use the astype() method to convert the column data type from float to integer: students['grade'].astype(int)
- Update the DataFrame: We reassign the converted column back to the original DataFrame to update it: students['grade'] = students['grade'].astype(int)
- Return the modified DataFrame: After the conversion is complete, we return the updated students DataFrame
The complete implementation:
    import pandas as pd

    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        students['grade'] = students['grade'].astype(int)
        return students
This approach works because:
- The astype(int) method performs element-wise conversion on all values in the column
- Since all grade values are already whole numbers stored as floats (e.g., 73.0, 87.0), the conversion simply removes the decimal portion
- The reassignment updates the same DataFrame object, leaving all other columns and their data types unchanged
- No additional data structures are needed beyond the new integer column that astype() produces
The time complexity is O(n), where n is the number of rows, as each value in the column must be converted. The space complexity is also O(n): astype() allocates a new column of n integers, although no copy of the whole DataFrame is made.
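If the caller's DataFrame should not be modified, one possible variant is to pass a column-to-dtype mapping to DataFrame.astype(), which returns a new DataFrame. The sketch below assumes that behaviour is wanted; the function name change_datatype_copy and the sample rows are hypothetical and not part of the original solution.

```python
import pandas as pd


def change_datatype_copy(students: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant: return a converted copy, leave the input as-is."""
    # DataFrame.astype accepts a {column: dtype} mapping and returns a new object
    return students.astype({"grade": int})


# Illustrative data only
students = pd.DataFrame({
    "student_id": [1, 2],
    "name": ["Ava", "Kate"],
    "age": [6, 15],
    "grade": [73.0, 87.0],
})

result = change_datatype_copy(students)
print(result["grade"].tolist())  # converted values
print(students["grade"].dtype)   # the input column keeps its float dtype
```

The trade-off is memory: this variant allocates a second DataFrame, whereas the assignment-based solution only allocates the new column.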
Example Walkthrough
Let's walk through a small example to illustrate how the solution works.
Initial DataFrame:
       student_id     name  age  grade
    0         101    Alice   20   85.0
    1         102      Bob   21   92.0
    2         103  Charlie   19   78.0
Notice that the grade column contains floating-point values (85.0, 92.0, 78.0) even though they represent whole numbers.
Step 1: Access the grade column
When we execute students['grade'], we get:
    0    85.0
    1    92.0
    2    78.0
    Name: grade, dtype: float64
The dtype shows it's currently float64.
Step 2: Apply astype(int) conversion
When we apply .astype(int) to this column:
- 85.0 → 85
- 92.0 → 92
- 78.0 → 78
The decimal portion is removed, converting each float to its integer equivalent.
Step 3: Reassign to update the DataFrame
By executing students['grade'] = students['grade'].astype(int), we replace the original float column with the converted integer column.
Final DataFrame:
       student_id     name  age  grade
    0         101    Alice   20     85
    1         102      Bob   21     92
    2         103  Charlie   19     78
Now the grade column contains integer values (85, 92, 78) with dtype int64. The transformation is complete: we've successfully converted the data type while preserving all the actual grade values and leaving other columns unchanged.
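The walkthrough above can be reproduced with a short script. This is a minimal sketch that rebuilds the example DataFrame by hand and prints the dtype before and after the conversion. The exact integer width reported afterwards can depend on the platform and NumPy version.

```python
import pandas as pd

# Rebuild the example DataFrame from the walkthrough
students = pd.DataFrame({
    "student_id": [101, 102, 103],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [20, 21, 19],
    "grade": [85.0, 92.0, 78.0],
})

dtype_before = students["grade"].dtype
print(dtype_before)  # float64

# Steps 1-3: access the column, convert it, assign it back
students["grade"] = students["grade"].astype(int)

dtype_after = students["grade"].dtype
print(dtype_after)
print(students)
```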
Solution Implementation
import pandas as pd


def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
    """
    Convert the data type of the 'grade' column from float to integer.

    Args:
        students: DataFrame containing student information with a 'grade' column

    Returns:
        DataFrame with the 'grade' column converted to integer type
    """
    # Convert the 'grade' column from its current type to integer
    students['grade'] = students['grade'].astype(int)

    # Return the modified DataFrame
    return students

import java.util.List;
import java.util.ArrayList;

/**
 * Student class to represent student information
 */
class Student {
    private String name;
    private double grade;  // Originally stored as double (equivalent to float in Python)

    // Constructor
    public Student(String name, double grade) {
        this.name = name;
        this.grade = grade;
    }

    // Getters and setters
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public double getGrade() {
        return grade;
    }

    public void setGrade(double grade) {
        this.grade = grade;
    }

    // Method to get grade as integer
    public int getGradeAsInt() {
        return (int) grade;
    }
}

/**
 * Solution class containing the changeDatatype method
 */
public class Solution {

    /**
     * Convert the data type of the 'grade' field from double to integer.
     * This method modifies the grade values by truncating decimal parts.
     *
     * @param students List containing student information with a 'grade' field
     * @return List with the 'grade' field converted to integer values
     */
    public static List<Student> changeDatatype(List<Student> students) {
        // Create a new list to store students with integer grades
        List<Student> modifiedStudents = new ArrayList<>();

        // Iterate through each student in the input list
        for (Student student : students) {
            // Convert the grade from double to integer (truncates decimal part)
            int integerGrade = (int) student.getGrade();

            // Create a new Student object with the converted grade
            Student modifiedStudent = new Student(student.getName(), integerGrade);

            // Add the modified student to the result list
            modifiedStudents.add(modifiedStudent);
        }

        // Return the modified list
        return modifiedStudents;
    }
}

#include <vector>
#include <string>
#include <cmath>

// Structure to represent a student record
struct Student {
    std::string name;
    int id;
    double grade;  // Originally stored as floating-point
};

// Class to represent a DataFrame-like structure for students
class DataFrame {
public:
    std::vector<Student> students;

    // Constructor
    DataFrame(const std::vector<Student>& data) : students(data) {}

    // Get the students data
    std::vector<Student>& getData() {
        return students;
    }
};

/**
 * Convert the data type of the 'grade' column from float to integer.
 *
 * @param students DataFrame containing student information with a 'grade' column
 * @return DataFrame with the 'grade' column converted to integer type
 */
DataFrame changeDatatype(DataFrame students) {
    // Iterate through all student records in the DataFrame
    for (size_t i = 0; i < students.getData().size(); ++i) {
        // Convert the floating-point grade to integer by truncation
        // Note: This mimics pandas astype(int) behavior which truncates towards zero
        // The field itself remains a double; only the fractional part is dropped
        students.getData()[i].grade = static_cast<int>(students.getData()[i].grade);
    }

    // Return the modified DataFrame
    return students;
}

// Import necessary types and libraries
// Note: this assumes a DataFrame type whose columns support indexing and map();
// adapt the calls to whichever DataFrame library you use
import * as pd from 'pandas-js';

/**
 * Convert the data type of the 'grade' column from float to integer.
 *
 * @param students - DataFrame containing student information with a 'grade' column
 * @returns DataFrame with the 'grade' column converted to integer type
 */
function changeDatatype(students: pd.DataFrame): pd.DataFrame {
    // Convert the 'grade' column from its current type to integer
    // Note: In TypeScript/JavaScript, we need to map over the values and convert them
    // Math.trunc drops the fractional part toward zero, matching astype(int);
    // Math.floor would give a different result for negative values
    students['grade'] = students['grade'].map((value: number) => Math.trunc(value));

    // Return the modified DataFrame
    return students;
}

Time and Space Complexity
Time Complexity: O(n), where n is the number of rows in the DataFrame. The astype() operation needs to iterate through each element in the 'grade' column to convert it from its current data type to integer, which requires visiting each row once.
Space Complexity: O(n), where n is the number of rows in the DataFrame. When astype(int) is called, pandas creates a new array/series with the converted integer values. This requires allocating memory for n integer values. Although the operation appears to be in-place (assigning back to students['grade']), internally pandas creates a new column with the converted values before replacing the old one.
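The O(n) space claim can be checked directly. The following minimal sketch uses NumPy's shares_memory to confirm that the converted column lives in a newly allocated buffer rather than reusing the float data. The sample values are illustrative.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({"grade": [85.0, 92.0, 78.0]})

# Underlying float data before the conversion
before = students["grade"].to_numpy()

# astype(int) builds a separate integer array
converted = students["grade"].astype(int)
after = converted.to_numpy()

shares = bool(np.shares_memory(before, after))
print(shares)  # False: the integer data lives in its own buffer
```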
Common Pitfalls
1. Handling Missing Values (NaN)
The most critical pitfall occurs when the grade column contains missing values (NaN). Direct conversion using astype(int) will raise an error because NaN cannot be converted to integer type in pandas.
Problem Example:
    import numpy as np
    import pandas as pd

    # DataFrame with NaN values
    students = pd.DataFrame({
        'student_id': [1, 2, 3],
        'name': ['Alice', 'Bob', 'Charlie'],
        'age': [20, 21, 22],
        'grade': [85.0, np.nan, 92.0]
    })
    # This will raise: ValueError: Cannot convert non-finite values (NA or inf) to integer
    students['grade'] = students['grade'].astype(int)
Solution:
    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        # Pick ONE of the options below; they are alternatives, not sequential steps.

        # Option 1: Fill NaN values before conversion
        students['grade'] = students['grade'].fillna(0).astype(int)

        # Option 2: Use nullable integer type (missing values are kept as <NA>)
        # students['grade'] = students['grade'].astype('Int64')

        # Option 3: Drop rows with NaN values first (copy to avoid modifying a slice)
        # students = students.dropna(subset=['grade']).copy()
        # students['grade'] = students['grade'].astype(int)

        return students
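To see how these options differ in practice, here is a minimal sketch on a small Series. It catches the error from the direct cast, then applies the fill and nullable-integer options. The data is illustrative, and which option is appropriate depends on what a missing grade should mean.

```python
import numpy as np
import pandas as pd

grades = pd.Series([85.0, np.nan, 92.0], name="grade")

# A direct cast fails because NaN has no integer representation
try:
    grades.astype(int)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Fill option: missing grades become 0 (this changes the meaning of the data)
filled = grades.fillna(0).astype(int)
print(filled.tolist())  # [85, 0, 92]

# Nullable option: integers where present, <NA> where missing
nullable = grades.astype("Int64")
print(nullable.isna().tolist())  # [False, True, False]
```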
2. Non-Integer Float Values
If the grade column contains float values with decimal parts (e.g., 73.5, 87.8), converting to integer will truncate the decimal portion, potentially losing important information.
Problem Example:
    # DataFrame with decimal grades
    students = pd.DataFrame({
        'grade': [73.5, 87.8, 92.3]
    })
    # This truncates: 73.5 → 73, 87.8 → 87 (loses precision)
    students['grade'] = students['grade'].astype(int)
Solution:
    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        # Pick ONE rounding rule based on requirements.

        # Round to the nearest integer before converting
        # (halves are rounded to the nearest even value, e.g. 72.5 → 72, 73.5 → 74)
        students['grade'] = students['grade'].round().astype(int)

        # Or use ceiling/floor instead (requires: import numpy as np)
        # students['grade'] = np.ceil(students['grade']).astype(int)

        return students
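The rounding rules mentioned above give different results on the same input. The sketch below compares them side by side. The values, including the negative one, are chosen only to make the differences visible and are not realistic grades.

```python
import numpy as np
import pandas as pd

grades = pd.Series([72.5, 73.5, 87.8, -1.5])

# Plain cast: drops the fractional part, moving toward zero
truncated = grades.astype(int)          # [72, 73, 87, -1]

# round(): nearest integer, halves go to the nearest even value
rounded = grades.round().astype(int)    # [72, 74, 88, -2]

# floor / ceil: always down / always up
floored = np.floor(grades).astype(int)  # [72, 73, 87, -2]
ceiled = np.ceil(grades).astype(int)    # [73, 74, 88, -1]

for label, series in [("trunc", truncated), ("round", rounded),
                      ("floor", floored), ("ceil", ceiled)]:
    print(label, series.tolist())
```

Note that truncation and floor agree for positive values and differ only for negatives.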
3. Column Existence Verification
The code assumes the grade column exists. If it doesn't, a KeyError will be raised.
Problem Example:
    # DataFrame without 'grade' column
    students = pd.DataFrame({
        'student_id': [1, 2],
        'name': ['Alice', 'Bob'],
        'score': [85.0, 92.0]  # Different column name
    })
    # This raises: KeyError: 'grade'
    students['grade'] = students['grade'].astype(int)
Solution:
    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        # Option A: fail loudly with a clear message if the column is missing
        if 'grade' not in students.columns:
            raise ValueError("Column 'grade' not found in DataFrame")
        students['grade'] = students['grade'].astype(int)

        # Option B: handle gracefully by converting only when the column exists
        # if 'grade' in students.columns:
        #     students['grade'] = students['grade'].astype(int)

        return students
4. Infinity Values
Float columns might contain infinity values (inf or -inf) which cannot be converted to integers.
Solution:
    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        # Replace infinity values before conversion (requires: import numpy as np)
        students['grade'] = students['grade'].replace([np.inf, -np.inf], 0)
        students['grade'] = students['grade'].astype(int)
        return students
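Rather than silently replacing problem values, another possible approach is to validate the column and refuse to convert if the cast would fail or lose information. The helper below, to_int_strict, is a hypothetical name introduced for illustration. It is a sketch that assumes a float column, not part of the original solution.

```python
import numpy as np
import pandas as pd


def to_int_strict(column: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to int only if every value is a finite whole number."""
    values = column.to_numpy(dtype=float)

    # NaN, inf and -inf have no integer representation
    if not np.isfinite(values).all():
        raise ValueError("column contains NaN or infinite values")

    # A fractional part would be dropped silently by astype(int)
    if not (values == np.trunc(values)).all():
        raise ValueError("column contains non-whole numbers")

    return column.astype(int)


clean = to_int_strict(pd.Series([85.0, 92.0, 78.0]))
print(clean.tolist())  # [85, 92, 78]
```

Raising early keeps bad rows visible instead of turning them into zeros, at the cost of requiring the caller to clean the data first.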
5. Performance with Large DataFrames
For very large DataFrames, creating a copy vs modifying in-place can have memory implications.
Solution:
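    def changeDatatype(students: pd.DataFrame) -> pd.DataFrame:
        # Option A: replace the column on the caller's DataFrame; only the new
        # integer column is allocated. Avoid students.loc[:, 'grade'] = ... here:
        # in pandas 2.x it writes into the existing float column and can leave
        # the dtype as float64.
        students['grade'] = students['grade'].astype(int)
        return students

        # Option B: if you need to preserve the original, convert a copy instead
        # students_copy = students.copy()
        # students_copy['grade'] = students_copy['grade'].astype(int)
        # return students_copy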