Python Data Visualization Transcripts
Chapter: Altair
Lecture: Amazon authors

0:00 One thing I forgot to mention is that we didn't talk about enabling any renderers with
0:06 this dataset. Part of the reason we didn't have to do that is that the dataset has only 600 rows.
0:14 So it's a much smaller dataset than what we've been working with. So there was no need to enable any of the background renderers.
0:21 So I wanted to call that out in case you were curious about that. For the next set of data analysis, let's take a look at the authors. I found
0:30 that we have quite a few authors, 275 in this case. So I just want to focus on the top authors and see a little bit more
0:43 about what their distribution of books looks like over time. To get the top authors, I'm going to create a pandas command to do this.
0:53 So what we want to do is group by author, aggregate the reviews to sum those all up, and then choose the top 20 authors by reviews and get
1:06 just the author name. At the end, I turn that into a list. So now I have a list of 20 authors, from Suzanne Collins
1:14 to Mary L. Trump, that's going to be useful for slicing the data and getting a subset of the data for a chart. So let's put that chart together.
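
The pandas step described above is not shown in the transcript. A minimal sketch of it could look like the following, assuming column names "Author" and "Reviews" and using a tiny made-up DataFrame in place of the real Amazon data (the lecture keeps the top 20; the sketch keeps 3 so the result is easy to check by eye):

```python
import pandas as pd

# Made-up sample data standing in for the Amazon dataset.
# The column names "Author", "Reviews" and "Year" are assumptions.
df = pd.DataFrame({
    "Author": ["A", "A", "B", "C", "C", "D"],
    "Reviews": [100, 250, 50, 400, 10, 5],
    "Year": [2009, 2010, 2012, 2015, 2016, 2019],
})

top_n = 3  # the lecture uses 20

top_authors = (
    df.groupby("Author")["Reviews"]  # group by author
    .sum()                           # total reviews per author
    .nlargest(top_n)                 # keep the authors with the most reviews
    .index                           # just the author names
    .tolist()                        # turn them into a plain list
)

print(top_authors)
```

The resulting list can then be used either in a DataFrame query or in an Altair filter.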
1:24 Okay, let's walk through a little bit of what we did. I created a chart, and because I just want those top authors,
1:31 I used the DataFrame query to make sure that the author was in that top authors list that I created.
1:38 I create a circle mark and I control the opacity a little bit. I made the circles black on the outside and added a stroke width so you can
1:47 see it. Then I said that I wanted the author on the Y axis and the year on the X axis. So you can see I have my authors, these top 20, here, and the year.
1:59 And then let's look at what else we did. We changed the size of these circles based on the number of reviews and I set
2:07 a specific scale between zero and 500 and added that to the legend. I also modified that legend down here
2:16 so it says reviews, so it's a little bit easier to understand. I then also colored it by the author so that each author has a different color, and we
2:26 have the legend over here. So this is starting to give us some useful information
2:31 about how the authors are distributed over time. You can see some authors are present through
2:36 this entire period, whereas others are much more recent or maybe not as recent as others.
2:43 But there are some things we can do to clarify and make this visualization a little bit easier to understand.
2:50 So let's work through another example of how to make this a little bit better. So the first thing I'm going to do is, because I have this color and I
3:00 have the author, there's a lot of duplicative information. Let's turn off the legend here.
3:08 That helps a little bit. So now I don't have all the authors over here,
3:12 but I still preserve my colors. So it's starting to look a little bit nicer, but there are some other things we could do.
3:17 Maybe add some grids to it and change the shape a little bit. So now we've got this pretty cool visualization.
3:27 I'm going to shrink this a little bit so you can see it a little bit easier. So now we have this visualization where we have grids to show each row for
3:37 the author. We also have the colors like we talked about, and everything fits within a nice square. So I'm going to walk through what I did here.
3:46 I am still encoding the data, but I'm using the configure_axis command to turn on
3:53 the grid. And then one of the things that is really interesting that I did here: notice how I don't say df.query to get the top authors.
4:03 I use this transform_filter with a predicate to say that a field is one of a set of values. So I say that the field author is one of
4:14 top authors, that top authors list that I created. So this just shows that you have some flexibility with Altair to decide:
4:23 do you want to filter and make modifications at the DataFrame level, or do you want to use Altair to do that filtering for you? Personally,
4:33 I do find it a little bit easier to use the DataFrame query approach. So I'll show you that here.
4:40 But I do want to call out that this is the basic approach to combining multiple Altair functions together. Once I do that filter,
4:49 I also want to change the width and the height because I found that here things were scrunched a little bit, so I added that.
4:57 I specifically said that the width should be 550 and the height 475, and gave it a title. So now we have our Amazon author reviews from 2009 to 2020.
5:08 And the circles tell you how many reviews there were during that period of time.
5:13 And it's a nice visualization to show how things change over time for these big authors.

