# Python Data Visualization Transcripts

Chapter: Matplotlib

Lecture: Regression


0:00
Since our last notebook was getting kind of long, I thought I'd start another notebook to go through an example of how to do additional

0:07
customization of your plots and also add a linear regression line to your plot. So for the new notebook, I've set it up just like we have our other ones.

0:18
I have all of my imports. I established my file paths to the EPA fuel economy file. I read it in, and you can see the top five rows, as well as enable

0:30
Matplotlib so it will plot inline. The one other thing that I wanted to call out:

0:35
I added a new import here for statsmodels, and for those of you not familiar

0:40
with statsmodels, it's a really useful Python module that does a lot of statistical analysis of your data in a very straightforward,

0:49
easy-to-understand model, and you can look at the documentation to learn more about it.

0:55
I'll go through one quick example, but I encourage you to explore it more on your own. Similar to what we did in the past,

1:05
I created a very simple average, by year, of what the fuel cost is. So I have this nice simple data frame that we will plot in a second.
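
As a rough sketch of that yearly average: the lecture doesn't show the column names, so `year` and `fuelCost08` below are assumptions, and the small data frame is made-up data standing in for the EPA file.

```python
import pandas as pd

# Made-up stand-in for the EPA fuel economy data. The column names
# "year" and "fuelCost08" are assumptions, not taken from the lecture.
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2020, 2020],
    "fuelCost08": [2000, 2200, 1900, 2100, 1800, 2000],
})

# Average fuel cost per year, kept as a flat two-column data frame.
avg_by_year = df.groupby("year", as_index=False)["fuelCost08"].mean()
print(avg_by_year)
```

Passing `as_index=False` keeps `year` as an ordinary column, which makes it easy to hand to a model or a plot later.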

1:14
So let's say we want to build a model to predict or show what a trend line would look like for the fuel economy as it changes over the years.
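
One way a trend-line model like this might be fit is with the statsmodels formula interface. This is only a sketch: the lecture doesn't show which statsmodels interface is used, the name `mpg_model` and the column names are assumptions, and the yearly averages are made-up numbers.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up yearly averages (assumed column names "year" and "fuelCost08").
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2400 - 25 * (years - 2000) + 40 * np.sin(years),
})

# Ordinary least squares: fuel cost as a linear function of year.
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

# One predicted fuel cost per year in the input data.
print(mpg_model.fittedvalues)

# Table describing the model and measures of fit such as R-squared.
print(mpg_model.summary())
```

The fitted values give the points on the trend line, and the summary table is where you'd start if you want to judge how good the fit is.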

1:23
So we'll call this the MPG model. Now I've developed this model that says: predict the fuel

1:29
costs based on the year, and create a fitted line to that. If you want to see the values, you can see, for each year, what it

1:40
predicts the values would be. And if you want to see how good your model is, this prints out a nice table that describes the model as well as some

1:50
other measures of the effectiveness of the fit of that model. And I'll leave that to you if you decide you want to dive into this in

1:58
a little more detail. So now that we have this model, let's plot it. So what I've done is create a scatter plot showing the fuel

2:09
costs by year and then plotted the fitted values as a line, so you can see

2:15
that this line represents what that trend looks like. If we want to clean this up, since this isn't really a very good fit

2:24
(I'm doing this just for illustration purposes), let's trim the number of years we're showing, and it looks a little bit cleaner.
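
The scatter plot with the fitted line, before any trimming, might look roughly like this. It's a sketch on made-up data: the column names, labels, and colors are assumptions rather than what's in the notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Made-up yearly averages (assumed column names "year" and "fuelCost08").
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2400 - 25 * (years - 2000) + 40 * np.sin(years),
})
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

fig, ax = plt.subplots()
# Actual yearly averages as points.
ax.scatter(avg_by_year["year"], avg_by_year["fuelCost08"], label="Average fuel cost")
# Fitted values from the model drawn as the trend line.
ax.plot(avg_by_year["year"], mpg_model.fittedvalues, color="red", label="Fitted line")
ax.set_xlabel("Year")
ax.set_ylabel("Average fuel cost")
ax.legend()
```

In a notebook with inline plotting enabled, the figure shows up under the cell; in a plain script you'd call `plt.show()` at the end.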

2:42
So in this example I just changed the range: instead of going from 2000 to 2020,

2:47
I'm just doing 2010 to 2020. And then I also compacted the y range to go from 1800 to 2200, just to make it a little easier to visualize.
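
Trimming the view like this can be done with the axis limits. The sketch below assumes only the limits change and the model itself is not refit; the data and column names are again made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Made-up yearly averages (assumed column names "year" and "fuelCost08").
years = np.arange(2000, 2021)
avg_by_year = pd.DataFrame({
    "year": years,
    "fuelCost08": 2400 - 25 * (years - 2000) + 40 * np.sin(years),
})
mpg_model = smf.ols("fuelCost08 ~ year", data=avg_by_year).fit()

fig, ax = plt.subplots()
ax.scatter(avg_by_year["year"], avg_by_year["fuelCost08"])
ax.plot(avg_by_year["year"], mpg_model.fittedvalues, color="red")

# Show only 2010 to 2020 on the x axis and 1800 to 2200 on the y axis.
ax.set_xlim(2010, 2020)
ax.set_ylim(1800, 2200)
```

The data and the fitted line are unchanged; only the visible window is narrower.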

2:59
You can see that it's not too bad a fit for this range. Once again, I'm not going to go into how you'd want to evaluate this statistically.

3:08
But this does show you how to use Matplotlib to plot a linear regression.