Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 5: Cleaning Heart Disease Data in Pandas
Lecture: Fixing the Restecg Column
0:00
Okay, we're going to move on to our next column, restecg: the resting electrocardiographic results. This is supposed to have three values:
0:07
value 0, normal; value 1, having an ST-T wave abnormality; and value 2, showing probable or definite left ventricular hypertrophy.
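For reference, the three codes can be written down as a lookup table. This is a minimal sketch, not code from the course: the `RESTECG_LABELS` name and the sample values are hypothetical. Mapping codes through a dict also doubles as a quick sanity check, since any code outside 0, 1 and 2 comes back as missing.

```python
import pandas as pd

# Hypothetical lookup table for the three documented restecg codes
RESTECG_LABELS = {
    0: "normal",
    1: "ST-T wave abnormality",
    2: "probable or definite left ventricular hypertrophy",
}

# A few made-up sample codes, stored as 8-bit integers
restecg = pd.Series([0, 1, 2, 0], dtype="int8", name="restecg")

# Series.map looks up each code in the dict; unknown codes would become NaN
labels = restecg.map(RESTECG_LABELS)
print(labels.tolist())
```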
0:18
I probably mispronounced some of those words, but let's go through our process again.
0:23
We'll look at our value counts, which tell us that our types are messed up. I'm going to apply our remove-question function.
0:28
It looks like int8 should be fine. Let's validate that. That works.
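The clean-then-validate step might look roughly like this. The course's own remove-question helper isn't shown in this lecture, so this sketch swaps in `pd.to_numeric(errors="coerce")` to turn `?` into a missing value and uses pandas' nullable `"Int8"` dtype so that the missing value can sit in an 8-bit integer column. The `clean_restecg` name and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical raw column after concatenation: ints, digit strings, and '?'
raw = pd.Series([0, "0", 1, "2", "?", 2], dtype="object", name="restecg")

def clean_restecg(s: pd.Series) -> pd.Series:
    """Hypothetical helper: '?' -> missing, then store as nullable 8-bit ints."""
    # errors="coerce" turns anything non-numeric (here, '?') into NaN
    return pd.to_numeric(s, errors="coerce").astype("Int8")

cleaned = clean_restecg(raw)

# Validate: every non-missing value must be one of the three documented codes
assert cleaned.dropna().isin([0, 1, 2]).all()
print(cleaned.value_counts(dropna=False))
```

One design caveat: `errors="coerce"` silently turns *any* unparseable string into a missing value, not just `?`, which is why the explicit validation afterwards is worth keeping.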
0:34
Because we made this function and have this process, cleaning up the column is really easy.
0:40
It is sort of annoying that these column types are messed up like this.
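One likely reason the types end up mixed: when a CSV column contains a `?`, pandas reads the whole column as strings, while a file without `?` gets a proper integer column. Concatenating the two leaves an object column holding both `"0"` and `0`. This is a minimal sketch with two small made-up files standing in for the real ones.

```python
import io
import pandas as pd

# Two hypothetical CSV files: one uses '?' for a missing value, one does not
file_a = io.StringIO("restecg\n0\n?\n2\n")
file_b = io.StringIO("restecg\n0\n1\n2\n")

df_a = pd.read_csv(file_a)  # '?' forces the whole column to strings
df_b = pd.read_csv(file_b)  # no '?', so the column parses as int64

combined = pd.concat([df_a, df_b], ignore_index=True)
print(df_a["restecg"].dtype, df_b["restecg"].dtype, combined["restecg"].dtype)

# The string "0" and the integer 0 are counted as separate values
print(combined["restecg"].value_counts())
```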
0:44
One thing we could do is provide types to the function that reads those CSV files.
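Supplying types at read time could look like the sketch below. The course's reader function isn't shown here, so this calls `pd.read_csv` directly; `na_values` and `dtype` are real `read_csv` parameters, while the inline CSV text is made up. Declaring `?` as a missing-value marker is what lets the column load straight into the nullable `"Int8"` dtype.

```python
import io
import pandas as pd

csv_text = "restecg\n0\n?\n2\n"

# Tell read_csv that '?' means "missing" and which dtype the column should get
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["?"],
    dtype={"restecg": "Int8"},
)
print(df["restecg"].dtype)
print(df["restecg"].isna().sum())
```

Even with types supplied up front, a value such as 7 would still load without complaint, so a validation step is still useful afterwards.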
0:51
But either way, we're going to have to go through and clean up the types.
0:55
I kind of like an approach where I just load in the raw data and then clean it up after the fact.
1:01
Could I load each file individually, clean up each one, and then concatenate them after the fact?
1:07
Certainly, I could do that as well. I'm not really adamant that one approach is better than the other.
1:12
In fact, this code would work with a single file.
1:15
However, it probably isn't as necessary with a single file, because a single file won't give us those mixed types.
1:20
That might be a reason to consider cleaning each file individually before combining them.
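The per-file alternative can be sketched as: clean each frame on its own, then concatenate. By the time `pd.concat` runs, every frame already has the same `Int8` column, so no mixed types appear. As before, `clean_restecg` is a hypothetical helper that uses `pd.to_numeric(errors="coerce")` in place of the course's remove-question function, and the two files are made up.

```python
import io
import pandas as pd

def clean_restecg(df_: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-file cleaner: '?' -> missing, then nullable Int8."""
    return df_.assign(
        restecg=pd.to_numeric(df_["restecg"], errors="coerce").astype("Int8")
    )

# Two hypothetical files, one containing a '?' placeholder
raw_files = [
    io.StringIO("restecg\n0\n?\n2\n"),
    io.StringIO("restecg\n0\n1\n2\n"),
]

# Clean each file on its own, then combine; the dtypes already agree at concat time
frames = [clean_restecg(pd.read_csv(f)) for f in raw_files]
combined = pd.concat(frames, ignore_index=True)
print(combined["restecg"].dtype)
print(combined["restecg"].value_counts(dropna=False))
```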