Python Data Visualization Transcripts
Lecture: Data set
0:00 Let's face it: real-world data is typically messy, and I wanted the data in
0:05 this course to mirror what you're going to encounter once you apply these visualization concepts on your own. I've chosen to use data from the
0:13 U.S. Department of Energy's fueleconomy.gov, at this URL. I've downloaded the data and created a file called EPA_fuel_economy.csv.
0:22 Here's an example of the data that is in this file. The first seven columns include basic information about each vehicle per year.
0:32 So you have the make, model, and year, as well as the number of cylinders in the engine, the type of transmission,
0:38 the engine displacement, and the vehicle class. The CO2 column is a measure of the estimated CO2 emissions
0:45 on an annual basis. The barrels08 column indicates the number of barrels of oil per
0:53 year needed to operate the vehicle, and then there's what that would cost on an annual basis.
0:58 We also include the fuel type used for this estimate, as well as the MPG: highway, city, and combined.
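A minimal sketch of loading the file with Python's standard-library csv module and checking its columns before plotting. It assumes EPA_fuel_economy.csv has a single header row and is UTF-8 encoded; the helper names load_rows and describe are hypothetical, not part of the course material.

```python
import csv
from typing import Dict, Iterable, List, Union


def load_rows(source: Union[str, Iterable[str]]) -> List[Dict[str, str]]:
    """Read CSV rows into dicts keyed by the header line.

    `source` is either a file path or any iterable of CSV text lines
    (an open file, a list of strings, ...).
    """
    if isinstance(source, str):
        # Treat a plain string as a path on disk.
        # Assumption: the file is UTF-8; change `encoding` if it is not.
        with open(source, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    return list(csv.DictReader(source))


def describe(rows: List[Dict[str, str]]) -> str:
    """One-line summary: number of rows and the column names."""
    columns = list(rows[0].keys()) if rows else []
    return f"{len(rows)} rows, columns: {', '.join(columns)}"
```

For example, `print(describe(load_rows("EPA_fuel_economy.csv")))` prints the row count and header names, a quick way to confirm that the make, model, year, and other columns described above are present.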
1:08 So I like this data set for a lot of different reasons. It's large, with 24,000 records covering 2000 to 2020,
1:17 which is big enough that visualization will really help us understand it. It's already in tidy format.
1:25 It has a mix of qualitative and quantitative variables, and the variables are both ordered and unordered, as well as discrete and continuous.
1:33 So the concepts we talked about earlier are going to apply. And finally, vehicles are something we all have experience with,
1:40 so hopefully the data is interesting enough that you might choose to explore it on your own and see how it applies to the vehicles you own or operate.
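To see the qualitative/quantitative and discrete/continuous split in the file itself, here is a rough heuristic sketch that works on row dicts like those produced by csv.DictReader. It treats a column as quantitative if every non-blank value parses as a number, and as discrete if every such value is a whole number. The function name classify_columns and its labels are hypothetical, not from the course.

```python
from typing import Dict, List


def _is_number(text: str) -> bool:
    """True if the text parses as a float (e.g. '4', '2.5')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def classify_columns(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Heuristically label each column of a list of row dicts.

    - 'qualitative' if any non-blank value is not numeric
    - 'quantitative (discrete)' if every non-blank value is a whole number
    - 'quantitative (continuous)' otherwise
    Blank cells are ignored; an all-blank column counts as qualitative.
    """
    kinds: Dict[str, str] = {}
    for column in (rows[0].keys() if rows else []):
        # Collect the non-blank values for this column.
        values = [
            (row.get(column) or "").strip()
            for row in rows
            if (row.get(column) or "").strip()
        ]
        if not values or not all(_is_number(v) for v in values):
            kinds[column] = "qualitative"
        elif all(float(v).is_integer() for v in values):
            kinds[column] = "quantitative (discrete)"
        else:
            kinds[column] = "quantitative (continuous)"
    return kinds
```

A heuristic like this cannot tell ordered from unordered variables; that still takes judgment about what each column means. It will also label a continuous measurement as discrete if every recorded value happens to be a whole number.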