Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 5: Cleaning Heart Disease Data in Pandas
Lecture: Fixing the Restecg Column

0:00 Okay, we're gonna go to our next column, resting electrocardiographic results. This is supposed to have three values:
0:07 value zero, normal; value one, having ST-T wave abnormality; and value two, showing probable or definite left ventricular hypertrophy.
0:18 Probably said some of those words wrong, but let's go through our process here again.
0:23 We're gonna look at our value counts, which tell us that our types are messed up. I'm going to run our remove question function.
0:28 It looks like an int8 should be fine here. And let's validate that. It looks like that works.
0:34 What we're seeing is that, because we made this function and we have this process, it's really easy to clean up.
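The inspect, clean, validate process described here might look roughly like the sketch below. The sample data is made up, `remove_question` is a hypothetical stand-in for the helper built earlier in the course (its real name and body are not shown in this transcript), and the nullable `"Int8"` dtype is an assumption about which 8-bit integer type is meant.

```python
import pandas as pd

# Made-up stand-in for the combined restecg column: "?" marks a missing
# value, and stacking several files has left numbers and strings mixed.
restecg = pd.Series([0, 1.0, "2", "?", "0", 2.0, "1"], name="restecg")

# Step 1: inspect. Mixed types show up as separate entries (0 vs "0").
print(restecg.value_counts(dropna=False))


def remove_question(ser, dtype):
    """Hypothetical helper: blank out "?" entries, then cast to dtype."""
    no_question = ser.mask(ser == "?")      # "?" becomes NaN
    numeric = pd.to_numeric(no_question)    # strings like "2" become 2.0
    return numeric.astype(dtype)


# Step 2: clean. "Int8" is pandas' nullable 8-bit integer (an assumption).
cleaned = remove_question(restecg, "Int8")

# Step 3: validate that only the documented codes 0, 1, and 2 remain.
assert cleaned.dropna().isin([0, 1, 2]).all()
print(cleaned.value_counts(dropna=False))
```

A nullable type is used in this sketch because a plain NumPy `int8` cannot hold the missing value left behind by the "?" entry.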
0:40 It is sort of annoying that these column types are messed up like this.
0:44 One thing that we could do is we could provide types to our function that is reading those CSV files.
0:51 But either way we're gonna have to go through and clean up the types.
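As a rough illustration of providing types at read time, `pd.read_csv` accepts a `dtype` mapping and an `na_values` argument. The CSV text below is made up, and the choice of the nullable `"Int8"` dtype is an assumption rather than what the course code uses.

```python
import io

import pandas as pd

# Made-up CSV text standing in for one of the raw files.
raw_csv = "age,restecg\n63,0\n67,?\n41,2\n54,1\n"

df = pd.read_csv(
    io.StringIO(raw_csv),
    na_values="?",               # treat "?" as missing while parsing
    dtype={"restecg": "Int8"},   # nullable 8-bit integer (an assumption)
)
print(df.dtypes)
```

Even with this approach, the right type and missing-value marker for each column still have to be worked out first, so the cleanup effort does not go away.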
0:55 I kind of like to have something where I just load in the raw data and then clean up after the fact.
1:01 Could I load in each individually and clean up each individual one and then concatenate those after the fact?
1:07 Certainly I could do that as well. I'm not really adamant about whether one of those is better than the other.
1:12 In fact, this code would work with a single file.
1:15 However, it probably isn't quite as necessary with a single file because we're not going to get those mixed types.
1:20 So that might be a reason why we might want to consider cleaning each file up individually before combining them together.

