Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 9: SQL / Database Integration
Lecture: Load CSV data into a Pandas dataframe and clean it
0:00
Now I'm going to be using the SQLite library that comes with Python these days. I've got some data here and I'm loading that data.
0:09
This is from a ski resort and I'm going to read that from a CSV file and make a data frame from that. Let's just go through what's going on here.
0:18
This should make sense now that we are familiar with pandas. We are going to convert the date column to a date and put that in the Denver time zone.
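As a rough sketch of the date step just described: the snippet below parses a date column and attaches the Denver time zone. The column names (`DATE`, `NAME`, `SNOW`) and the inline sample rows are assumptions made for illustration, and `tz_localize` is one plausible way to do it; the lecture's actual file and code may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the ski resort CSV; the real file is not shown here.
csv_text = """DATE,NAME,SNOW
1990-01-15,ALTA,5.0
1990-07-04,ALTA,0.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the text dates, then mark them as Denver local time.
df = df.assign(
    DATE=pd.to_datetime(df["DATE"]).dt.tz_localize("America/Denver")
)
```

`tz_localize` labels naive timestamps as local time. If the source data were already in UTC, `tz_localize("UTC")` followed by `tz_convert("America/Denver")` would be the closer fit.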
0:28
We're pulling off certain columns. We're getting a month column. We're getting a year column. We're making a season column.
0:35
The season column is a little bit more involved. It says if the month is less than five, then we're taking the year minus one and we're
0:45
adding a dash and we're taking the current year. Basically you have a ski season and it runs from like November to like May time frame.
0:54
This is the code to create that ski season. If it's during the summertime, we're saying off season there. Let's run that. Let's look at Alta.
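The season logic described above might look something like this minimal sketch. The function name `add_season`, the column names, and the exact month cutoffs (January to April belongs to the season that started the previous year, November and December start a new one, everything else is off season) are assumptions based on the description, not the lecture's exact code.

```python
import pandas as pd


def add_season(df, date_col="DATE"):
    """Add MONTH, YEAR, and SEASON columns (hypothetical helper)."""
    month = df[date_col].dt.month
    year = df[date_col].dt.year

    # Season label when the date falls in the tail end of a ski season.
    ending = (year - 1).astype(str) + "-" + year.astype(str)
    # Season label when the date falls at the start of a ski season.
    starting = year.astype(str) + "-" + (year + 1).astype(str)

    # Default to the off season, then overwrite the in-season rows.
    season = pd.Series("Off Season", index=df.index)
    season = season.mask(month < 5, ending)    # assumed cutoff: Jan-Apr
    season = season.mask(month > 10, starting)  # assumed cutoff: Nov-Dec

    return df.assign(MONTH=month, YEAR=year, SEASON=season)
```

Calling `add_season(df)` on a frame with a datetime `DATE` column returns a new frame, so it slots into a method chain with `.pipe(add_season)`.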
1:05
That's what it looks like. You can see that we have the 1989-1990 season that we've crafted there. Let's look at our dtypes.
1:16
Our dtypes look pretty good. We do have some NumPy object columns here. You see that season is an object here. It's not using our PyArrow backend for those.
1:27
If we wanted to, we could go in and tweak those and make sure that those were PyArrow types as well.
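One way that tweak could be done, as a sketch only: cast every object column to the PyArrow-backed string dtype. The helper name `object_columns_to_pyarrow` is hypothetical, it assumes the object columns hold text, and it needs the `pyarrow` package installed when it is called.

```python
import pandas as pd


def object_columns_to_pyarrow(df):
    """Cast object-dtype columns to PyArrow-backed strings (hypothetical helper)."""
    # Find the columns still stored as generic Python objects.
    obj_cols = df.select_dtypes(include="object").columns
    # Build a per-column dtype mapping and cast only those columns.
    return df.astype({col: "string[pyarrow]" for col in obj_cols})
```

After `df = object_columns_to_pyarrow(df)`, checking `df.dtypes` should show the season column as a PyArrow-backed string rather than `object`.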