Python Data Visualization Transcripts
Chapter: Matplotlib
Lecture: Regression
0:00
Since our last notebook was getting kind of long, I thought I'd start another notebook to go through an example of how to do additional
0:07
customization of your plots and also add a linear regression line to your plot. So for the new notebook, I've set it up just like our other ones.
0:18
I have all of my imports, I establish the file path to the EPA fuel economy file, and I read it in. You can see the top five rows, and I also enable
0:30
matplotlib so it will plot inline. The one other thing that I wanted to call out is that
0:35
I added a new import here for statsmodels. For those of you not familiar
0:40
with statsmodels, it's a really useful Python module that does a lot of statistical analysis of your data in a very straightforward,
0:49
easy-to-understand manner, and you can look at the documentation to learn more about it.
0:55
I'll go through one quick example, but I encourage you to explore it more on your own. Similar to what we did in the past,
1:05
I created a very simple average of the fuel cost by year. So I have this nice simple data frame that we will plot in a second.
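A minimal sketch of how such a per-year average could be built with pandas. The tiny synthetic DataFrame stands in for the EPA file, and the column names `year` and `fuelCost08` and the variable name `avg_by_year` are assumptions, not taken from the lecture.

```python
import pandas as pd

# Synthetic stand-in for the EPA fuel economy data. The column names
# `year` and `fuelCost08` are assumptions for illustration only.
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2020, 2020],
    "fuelCost08": [2000, 2200, 1900, 2100, 1800, 2000],
})

# Average fuel cost per year, kept as a flat two-column DataFrame
# (as_index=False keeps `year` as a regular column for plotting later).
avg_by_year = (
    df.groupby("year", as_index=False)["fuelCost08"]
      .mean()
      .round(2)
)
print(avg_by_year)
```

Keeping `year` as an ordinary column, rather than as the index, makes it easy to refer to by name when fitting a model or plotting.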
1:14
So let's say we want to build a model to predict or show what a trend line would look like for the fuel economy as it changes over the years.
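A trend-line model of this kind might be fit with the statsmodels formula interface roughly as follows. This is a sketch under assumptions: the data is synthetic, and the column names `year` and `fuelCost08` and the variable names are hypothetical rather than the ones used in the notebook.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic yearly averages standing in for the real data
# (column names are assumptions for illustration).
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2350 - 25 * (years - 2000) + 30 * np.sin(years),
})

# Ordinary least squares: predict fuel cost from the year.
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

# One predicted fuel cost per row (per year) of the input data.
print(mpg_model.fittedvalues)

# Table describing the model and the quality of the fit.
print(mpg_model.summary())
```

The `fittedvalues` series lines up row for row with the input DataFrame, which is what makes it convenient to plot against the same `year` column.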
1:23
So we'll call this the MPG model. I've developed this model so that it predicts the fuel
1:29
costs based on the year and creates a fitted line for that. If you want to see the values, you can see for each year what it
1:40
predicts the values would be. And if you want to see how good your model is, this prints out a nice table that describes the model as well as some
1:50
other measures of the effectiveness of the fit of that model. And I'll leave that to you if you decide you want to dive into this in
1:58
a little more detail. So now that we have this model, let's plot it. So what I've done is create a scatter plot showing the fuel
2:09
costs by year and then plotted the fitted values as a line, so you can see
2:15
that this line represents what that trend looks like. We may want to clean this up, since this isn't really a very good fit.
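The scatter plot with the fitted line drawn over it could be sketched like this. The data is synthetic, and the column names, variable names, and labels are assumptions for illustration; the notebook's actual plotting code is not shown in the transcript.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic yearly averages (column names are assumptions).
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2350 - 25 * (years - 2000) + 30 * np.sin(years),
})
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

fig, ax = plt.subplots()
# Observed averages as points.
ax.scatter(avg_by_year["year"], avg_by_year["fuelCost08"], label="Average fuel cost")
# Model predictions for the same years as a line.
ax.plot(avg_by_year["year"], mpg_model.fittedvalues, color="red", label="Fitted line")
ax.set(xlabel="Year", ylabel="Fuel cost", title="Fuel cost by year with trend line")
ax.legend()
```

Drawing both artists on the same `ax` is what overlays the regression line on the scatter points.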
2:24
I'm doing this just for illustration purposes. Let's trim the number of years we're showing, and it looks a little bit cleaner.
2:42
So in this example, I just changed the range: instead of going from 2000 to 2020,
2:47
I'm just doing 2010 to 2020. And then I also compacted the y range to go from 1800 to 2200, just to make it a little easier to visualize.
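One way to narrow the view like this is to set the axis limits on the existing plot. The sketch below assumes the trimming is done with axis limits rather than by refitting the model on fewer years, which the transcript does not say either way; the data and names are again synthetic and hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic yearly averages (column names are assumptions).
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2350 - 25 * (years - 2000) + 30 * np.sin(years),
})
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

fig, ax = plt.subplots()
ax.scatter(avg_by_year["year"], avg_by_year["fuelCost08"])
ax.plot(avg_by_year["year"], mpg_model.fittedvalues, color="red")

# Show only 2010 to 2020 on the x axis and 1800 to 2200 on the y axis.
ax.set_xlim(2010, 2020)
ax.set_ylim(1800, 2200)
```

Setting limits only changes what is displayed; the line is still the fit over all of the years in the data.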
2:59
You can see that it's not too bad a fit for this range. Once again, I'm not going to go into how you'd want to evaluate this statistically.
3:08
But this does show you how to use matplotlib to plot a linear regression.