Python Data Visualization Transcripts
Lecture: Advanced specialized plots
0:00 In addition to the standard plots that we've talked about with pandas,
0:04 there are four very specialized visualizations that I want to walk through so that you're aware of them and can use them where appropriate.
0:12 Each of these is available through a separate import. We're going to cover the scatter matrix, Andrews
0:19 curves, parallel coordinates and the radviz plot. Let me go ahead and rerun this new notebook that I've created and we'll go through
0:30 the first example of how to show a scatter matrix. Now let's look at what the scatter matrix is doing.
0:40 It's a really convenient tool to see what the interactions look like between your various columns. So for each combination of columns,
0:49 it plots different types of visualizations, scatter plots or histograms, comparing the two. So you can see in this example,
0:58 if you look at the CO2 compared to the barrels of oil used per vehicle,
1:06 you can see a strong correlation line, which certainly makes sense and is intuitively something that you
1:14 would expect from our data. So let's bring this down to a smaller subset of
1:19 variables that we want to compare to give a better example of how to use this
1:23 tool. So here are all the vehicle class options that are available to us right now. Let's consolidate some of those.
1:32 So I'm gonna create a new car class data frame that is just for compact cars,
1:38 midsize cars, subcompact cars and large cars, so we'll filter out trucks and other types of data. So we're just looking at compact cars,
1:49 midsize cars and so on, and then we're just gonna include cylinders, fuel cost, CO2 and vehicle class. Let's take a look at what that looks like.
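A minimal sketch of the filtering step just described. The tiny data frame and its column names (vehicle_class, cylinders, fuel_cost, co2, barrels) are hypothetical stand-ins, not the course's actual data set, so adjust them to whatever your notebook uses.

```python
import pandas as pd

# Hypothetical stand-in data; the real notebook loads a much larger vehicle data set.
df = pd.DataFrame({
    "vehicle_class": ["Compact Cars", "Midsize Cars", "Subcompact Cars",
                      "Large Cars", "Standard Pickup Trucks", "Compact Cars"],
    "cylinders": [4, 6, 4, 8, 8, 4],
    "fuel_cost": [1500, 1900, 1400, 2500, 2800, 1600],
    "co2": [350, 420, 330, 520, 600, 360],
    "barrels": [12.0, 14.5, 11.5, 18.0, 20.5, 12.5],
})

# Keep only the four car classes and the four columns of interest.
car_classes = ["Compact Cars", "Midsize Cars", "Subcompact Cars", "Large Cars"]
keep_cols = ["cylinders", "fuel_cost", "co2", "vehicle_class"]

# isin() builds the row filter; .copy() avoids modifying a view of df later on.
car_class_df = df.loc[df["vehicle_class"].isin(car_classes), keep_cols].copy()

print(car_class_df)
```

Filtering rows and selecting columns in one .loc call keeps the smaller data frame in step with the original index.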
1:58 So you can see it's a much smaller set of data and now we'll do a scatter matrix with this smaller data set just to make it a little bit easier to
2:08 understand what's going on. There you go. Now you can see how each of these values is plotted against the others, and in
2:16 those areas where fuel cost is plotted against fuel cost, we just show a
2:21 histogram. So this is a really useful tool to quickly explore your data and understand
2:26 what sort of relationships there might be between the different columns.
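As a rough illustration of the scatter matrix call, here is a sketch using pandas.plotting.scatter_matrix on synthetic data. The column names and the randomly generated values are assumptions made for the example and are not the lecture's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; not needed in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic, hypothetical data: fuel cost and CO2 loosely increase with cylinder count.
rng = np.random.default_rng(0)
n = 60
cylinders = rng.choice([4, 6, 8], size=n)
demo = pd.DataFrame({
    "cylinders": cylinders,
    "fuel_cost": 900 + 200 * cylinders + rng.normal(0, 100, size=n),
    "co2": 150 + 45 * cylinders + rng.normal(0, 25, size=n),
})

# One panel per pair of columns; the diagonal shows a histogram of each column.
axes = scatter_matrix(demo, figsize=(8, 8), diagonal="hist", alpha=0.6)

print(axes.shape)  # grid of matplotlib Axes, one row and one column per variable
```

Passing diagonal="kde" instead draws a density estimate on the diagonal, which requires SciPy to be installed.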
2:30 Next I'm going to go through a more complex visualization called Andrews curves, which are useful
2:35 for visualizing high-dimensional data, meaning data with a lot of different variables whose interactions are difficult to see.
2:45 An Andrews curve is a unique way to visualize that data, and here you can
2:51 see each of the different types of cars and start to visualize how the values differ. I'm not going to go into the details on how to use
2:59 Andrews curves. This is really a more advanced machine learning visualization, but it is fairly
3:06 unique and I believe pandas is one of the few places that has this visualization.
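For orientation only, a minimal sketch of calling pandas.plotting.andrews_curves. An Andrews curve turns each row into a smooth curve built from a Fourier-style series whose coefficients are that row's values, so rows with similar values trace similar curves. The data and column names below are synthetic assumptions, not the lecture's data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves

# Synthetic, hypothetical data with one class column and three numeric columns.
rng = np.random.default_rng(0)
n = 40
classes = ["Compact Cars", "Midsize Cars", "Subcompact Cars", "Large Cars"]
cylinders = rng.choice([4, 6, 8], size=n)
demo = pd.DataFrame({
    "cylinders": cylinders,
    "fuel_cost": 900 + 200 * cylinders + rng.normal(0, 100, size=n),
    "co2": 150 + 45 * cylinders + rng.normal(0, 25, size=n),
    "vehicle_class": rng.choice(classes, size=n),
})

# The second argument names the column used to colour the curves;
# every other column must be numeric.
ax = andrews_curves(demo, "vehicle_class", samples=200)
ax.set_title("Andrews curves by vehicle class (synthetic data)")
```

Each row of the data frame becomes one curve, coloured by its class.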
3:10 So as you move down your machine learning pathway and start to tackle more and more
3:15 complex visualizations and projects, you might want to consider this. In a similar vein,
3:22 parallel coordinates are also a useful tool for visualizing high-dimensional data.
3:29 Once again, this is an interesting way to look at the interaction of these multiple car variables: fuel cost, cylinders and CO2.
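A comparable sketch for pandas.plotting.parallel_coordinates, again on synthetic data with assumed column names. Each numeric column becomes a vertical axis and each row becomes a line crossing those axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Synthetic, hypothetical data; not the lecture's data set.
rng = np.random.default_rng(0)
n = 40
classes = ["Compact Cars", "Midsize Cars", "Subcompact Cars", "Large Cars"]
cylinders = rng.choice([4, 6, 8], size=n)
demo = pd.DataFrame({
    "cylinders": cylinders,
    "fuel_cost": 900 + 200 * cylinders + rng.normal(0, 100, size=n),
    "co2": 150 + 45 * cylinders + rng.normal(0, 25, size=n),
    "vehicle_class": rng.choice(classes, size=n),
})

# One vertical axis per numeric column, one line per row, coloured by class.
ax = parallel_coordinates(demo, "vehicle_class", colormap="viridis")
ax.set_title("Parallel coordinates by vehicle class (synthetic data)")
```

The function plots raw values on a shared y-axis, so when columns sit on very different scales (as fuel cost and cylinders do here) it can help to rescale them before plotting.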
3:37 And this is another way to view high dimensional data and help you maybe
3:42 understand different ways that you can cluster your observations together. The final chart we will
3:50 go through is a radviz visualization, which is another way to see where you might have natural clustering of your data. And
4:00 once again, all three of these are definitely more advanced visualizations, but I wanted to call
4:05 them out so that you are aware that they are available in pandas when you need them. And finally, I have been using a little bit of matplotlib
4:14 customization to plot these, and I'm gonna show how to create one figure with three rows and one column showing the Andrews
4:21 curves, the parallel coordinates and the radviz, all in one plot. So I create my figure, create three axes,
4:31 plot those values on the axes for each of those different visualizations and do a little
4:36 bit of customization along the way to make it more visually appealing and understandable.
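The steps just described (create a figure, create three axes, plot onto each axis, then customize) might look roughly like the following. This is a sketch under assumptions: the data is synthetic, and the column names, titles and figure size are invented for illustration rather than taken from the lecture notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves, parallel_coordinates, radviz

# Synthetic, hypothetical data with one class column and three numeric columns.
rng = np.random.default_rng(0)
n = 40
classes = ["Compact Cars", "Midsize Cars", "Subcompact Cars", "Large Cars"]
cylinders = rng.choice([4, 6, 8], size=n)
demo = pd.DataFrame({
    "cylinders": cylinders,
    "fuel_cost": 900 + 200 * cylinders + rng.normal(0, 100, size=n),
    "co2": 150 + 45 * cylinders + rng.normal(0, 25, size=n),
    "vehicle_class": rng.choice(classes, size=n),
})

# One figure, three rows, one column.
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(8, 14))

# Each pandas plotting function accepts an ax argument to draw on a given axis.
andrews_curves(demo, "vehicle_class", ax=axes[0])
parallel_coordinates(demo, "vehicle_class", ax=axes[1])
radviz(demo, "vehicle_class", ax=axes[2])

# A little customization to make each panel easier to read.
titles = ["Andrews curves", "Parallel coordinates", "RadViz"]
for ax, title in zip(axes, titles):
    ax.set_title(title)

fig.tight_layout()
```

In a notebook the figure displays automatically; in a script, fig.savefig with a file name of your choice writes it to disk.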
4:43 And now let's look at each of these plots together in one visualization. You can imagine how you could use this to start to get a better feel for
4:52 some of this more complex data and do further machine learning or analysis on it in the future.