Python Data Visualization Transcripts
Chapter: Altair
Lecture: Amazon authors
0:00
One thing I forgot to mention is that we didn't talk about enabling any renderers with
0:06
this dataset. Part of the reason we didn't need to is that the dataset has only 600 rows.
0:14
It's a much smaller dataset than the ones we've been working with, so there was no need to enable any of the background renderers.
0:21
I wanted to call that out in case you were curious. For the next piece of analysis, let's take a look at the authors. I found
0:30
that we have quite a few authors, 275 in this case, so I just want to focus on the top authors and see a little more
0:43
about what their distribution of books looks like over time. To get the top authors, I'm going to write a pandas command.
0:53
What we want to do is group by author, aggregate the reviews by summing them all up, then choose the top 20 authors by reviews and get
1:06
just the author names. At the end, I turned that into a list, so now I have a list of 20 authors, from Suzanne Collins
1:14
to Mary L. Trump, that's going to be useful for slicing the data and getting a subset of it for a chart. So let's put that chart together.
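The pandas command described above might look roughly like this. It's a sketch under assumptions: the column names Author and Reviews and the small synthetic DataFrame are hypothetical, and nlargest is just one way to pick the top entries (sorting and taking head would work too). It keeps the top 2 of the toy data instead of the lecture's top 20.

```python
import pandas as pd

# Hypothetical stand-in for the bestseller data; column names are assumptions.
books = pd.DataFrame(
    {
        "Author": ["A", "B", "A", "C", "B", "D"],
        "Reviews": [100, 250, 300, 50, 10, 75],
        "Year": [2009, 2010, 2011, 2012, 2013, 2014],
    }
)

N_TOP = 2  # the lecture uses the top 20

top_authors = (
    books.groupby("Author")["Reviews"]
    .sum()            # total reviews per author
    .nlargest(N_TOP)  # keep the N_TOP largest totals
    .index            # the index holds the author names
    .tolist()
)
print(top_authors)  # ['A', 'B']
```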
1:24
Okay, let's walk through what we did. I created a chart, and because I only want those top authors,
1:31
I used the DataFrame query method to keep only the rows whose author is in the top authors list I created.
1:38
I create a circle mark and control its opacity a little. I gave the circles a black outline and set a stroke width so you can
1:47
see them. Then I put the author on the Y axis and the year on the X axis, so you can see my top 20 authors here and the years along the bottom.
1:59
Now let's look at what else we did. I sized these circles based on the number of reviews, set a specific scale between zero and 500,
2:07
and added that to the legend. I also modified that legend down here
2:16
so it's labeled Reviews, which makes it a little easier to understand. I then also colored the circles by author, so each author has a different color, and we
2:26
have the legend over here. This is starting to give us some useful information
2:31
about how the authors are distributed over time. You can see that some authors appear throughout
2:36
this entire period, whereas others show up only in more recent years, or only in earlier ones.
2:43
But there are some things we can do to make this visualization clearer and easier to understand.
2:50
So let's work through another example of how to make this a little better. The first thing I'm going to do: because the color and the Y axis both
3:00
show the author, there's a lot of duplicated information, so let's turn off the legend here.
3:08
That helps a little. Now I don't have all the authors listed over here,
3:12
but I still keep my colors, so it's starting to look a bit nicer. But there are some other things we could do.
3:17
We can add some grid lines and change the shape of the chart a little. Now we've got this pretty cool visualization.
3:27
I'm going to shrink this a little so you can see it more easily. Now we have this visualization where grid lines mark each row for
3:37
each author. We also have the colors, like we talked about, and everything fits within a nice square. Let me walk through what I did here.
3:46
I'm still encoding the data, but I'm using the configure_axis method to turn on
3:53
the grid. One really interesting thing I did here: notice that I don't use df.query to get the top authors.
4:03
Instead, I use transform_filter with a field one-of predicate, which says that the author field must be one of the values in
4:14
top authors, the list I created earlier. This shows that you have some flexibility with Altair to decide
4:23
whether you want to filter and make modifications at the DataFrame level or let Altair do that filtering for you. Personally,
4:33
I find the DataFrame query approach a little easier, but I wanted to show you this alternative here.
4:40
I do want to call out that this is the basic approach to chaining multiple Altair methods together. Once I do that filter,
4:49
I also want to change the width and the height, because things looked a little scrunched, so I adjusted them.
4:57
I specifically set the width to 550 and the height to 475 and gave it a title. So now we have our Amazon author reviews from 2009 to 2020.
5:08
The circles tell you how many reviews each author had in each year.
5:13
It's a nice visualization for showing how things change over time for these big authors.