Python Data Visualization Transcripts
Chapter: Matplotlib
Lecture: Data set
0:00
Let's face it. Real-world data is typically messy, and I wanted the data in
0:05
this course to mirror what you're gonna encounter once you apply these visualization concepts on your own. I've chosen to use data from the
0:13
U.S. Department of Energy's fueleconomy.gov site, at this URL. I've downloaded the data and created a file called EPA_fuel_economy.csv.
0:22
Here's an example of the data that is in this file. The first seven columns include basic information about each vehicle per year.
0:32
So you have the make, model, and year, as well as the number of cylinders in the engine, the type of transmission,
0:38
the engine displacement and the vehicle class. The CO2 column is a measure of the estimated emissions of CO2
0:45
on an annual basis. The barrels08 column indicates the number of barrels of oil per
0:53
year to operate the vehicle, and then what that cost would be on an annual basis.
0:58
We also include the fuel type used for this estimate, as well as the MPG: highway, city, and combined.
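To make the column walkthrough concrete, here is a minimal sketch of loading a file like this with pandas and taking a first look at it. The column names and the two rows of values are placeholders made up for illustration, not the actual contents of EPA_fuel_economy.csv, and load_fuel_economy is a hypothetical helper name.

```python
import io

import pandas as pd

# Placeholder sample: the column names and values below are assumptions made
# for illustration only, not rows taken from the real EPA_fuel_economy.csv.
SAMPLE_CSV = """make,model,year,cylinders,transmission,displ,class,co2,barrels08,fuel_cost,fuel_type,highway_mpg,city_mpg,combined_mpg
MakeA,ModelX,2000,4,Manual 5-spd,1.8,Subcompact Cars,400,13.5,1600,Regular,28,22,24
MakeB,ModelY,2020,6,Automatic 10-spd,2.7,Standard Pickup Trucks,420,15.0,1800,Regular,26,20,22
"""


def load_fuel_economy(source):
    """Read the fuel economy data from a file path or a file-like object."""
    return pd.read_csv(source)


# In the course you would pass the path "EPA_fuel_economy.csv" instead.
df = load_fuel_economy(io.StringIO(SAMPLE_CSV))

print(df.head())    # first few rows
print(df.dtypes)    # which columns pandas read as numbers versus text
```

Swapping the in-memory sample for the real file path should give the same kind of overview for the full data set, assuming the file is a plain comma-separated file with a header row.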
1:08
So I like this data set for a lot of different reasons. It has a large number of values, 24,000 values from 2000 to 2020,
1:17
which means it's big enough that visualization is really going to help us understand the large data set. It's already in the tidy format.
1:25
It has a mix of qualitative and quantitative variables, and the variables are ordered and unordered as well as discrete and continuous.
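One rough way to see that mix is to let pandas split the columns by type. This is only a sketch on a tiny made-up frame with assumed column names, and it treats every numeric column as quantitative, which is a simplification: a numeric column such as year or cylinders can also be read as an ordered, discrete variable.

```python
import pandas as pd

# Tiny made-up frame; the column names and values are assumptions for illustration.
vehicles = pd.DataFrame(
    {
        "make": ["MakeA", "MakeB"],
        "fuel_type": ["Regular", "Premium"],
        "year": [2000, 2020],
        "cylinders": [4, 6],
        "combined_mpg": [24.0, 22.0],
    }
)

# Numeric columns are treated as quantitative, everything else as qualitative.
quantitative = vehicles.select_dtypes(include="number").columns.tolist()
qualitative = vehicles.select_dtypes(exclude="number").columns.tolist()

print("quantitative:", quantitative)
print("qualitative:", qualitative)
```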
1:33
So those concepts that we talked about earlier are going to apply. And then this is an area where we all have experience with vehicles.
1:40
And hopefully it's interesting enough that you might choose to explore it on your own and see how it applies to the vehicles that you own or operate.