Python Data Visualization Transcripts
Chapter: Matplotlib
Lecture: Data set

0:00 Let's face it: real-world data is typically messy, and I wanted the data in
0:05 this course to mirror what you're gonna encounter once you apply these visualization concepts on your own. I've chosen to use data from the
0:13 U.S. Department of Energy's fueleconomy.gov site, at this URL. I've downloaded the data and created a file called EPA_fuel_economy.csv.
0:22 Here's an example of the data that is in this file. The first seven columns include basic information about each vehicle per year.
0:32 So you have the make, model, and year, as well as the number of cylinders in the engine, the type of transmission,
0:38 the engine displacement, and the vehicle class. The CO2 column is a measure of the estimated CO2 emissions
0:45 on an annual basis. The barrels08 column indicates the number of barrels of oil per
0:53 year needed to operate the vehicle, followed by what that would cost on an annual basis.
0:58 We also include the fuel type used for this estimate, as well as the MPG: highway, city, and combined.
1:08 So I like this data set for a lot of different reasons. It has a large number of values, 24,000 from 2000 to 2020,
1:17 which means it's big enough that visualization is really going to help us understand it. It's already in the tidy format.
1:25 It has a mix of qualitative and quantitative variables, and the variables are ordered and unordered as well as discrete and continuous.
1:33 So those concepts that we talked about earlier are going to apply. And then this is an area where we all have experience with vehicles.
1:40 And hopefully it's interesting enough that you might choose to explore it on your own and see how it applies to the vehicles that you own or operate.
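
As a rough illustration of the mix of qualitative and quantitative variables described above, here is a minimal sketch that labels each column of a small tidy CSV by whether all of its values parse as numbers. The column names and rows are hypothetical placeholders made up for this example; they are not copied from EPA_fuel_economy.csv, and the real file's headers may differ.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative placeholders,
# NOT rows taken from EPA_fuel_economy.csv.
SAMPLE_CSV = """make,model,year,cylinders,transmission,displacement,vehicle_class,combined_mpg
MakeA,ModelX,2005,4,Automatic,2.0,Compact Cars,28
MakeB,ModelY,2012,6,Manual,3.5,Midsize Cars,22
MakeA,ModelZ,2019,8,Automatic,5.0,Standard Pickup Trucks,17
"""


def is_number(text):
    """Return True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(csv_text):
    """Label each column 'quantitative' if every value is numeric, else 'qualitative'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kinds = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        all_numeric = all(is_number(v) for v in values)
        kinds[name] = "quantitative" if all_numeric else "qualitative"
    return kinds


column_kinds = classify_columns(SAMPLE_CSV)
for name, kind in column_kinds.items():
    print(f"{name}: {kind}")
```

This numeric check is only a first pass: a column such as year parses as a number but is often better treated as an ordered, discrete variable. If pandas is available, loading the real file with `pandas.read_csv("EPA_fuel_economy.csv")` and inspecting `.dtypes` should give a similar overview.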

