#100DaysOfCode in Python Transcripts
Chapter: Days 4-6: Collections module
Lecture: Second day: use collections on movie data

0:00 Welcome back to the 100 days of Python and the second day of the collections module. Today, we're going to get practical with a code challenge.
0:09 It is about the highest-rated movie directors. We will load in a data set and convert it into a defaultdict of directors as keys
0:19 and movie namedtuples as values. If you want to try it yourself from scratch, I encourage you to pause the video now
0:26 and read through this link and try to code it up yourself. What I will do in the rest of this video is guide you through getting the data loaded
0:35 into the directors variable. So parse the CSV, convert it into a defaultdict, and I will also have a Counter example
0:43 showing how to get the directors with the most movies. So if you need some guidance, keep watching but maybe you want to try it yourself first.
0:53 Welcome back. I hope you had fun doing the other exercise. In the next session, I will show you how to load
0:59 in the data and parse it into a defaultdict. So we're going to load this data in and the goal is to make a defaultdict where the keys are the directors
1:11 and the values are a list of movies and every movie will be stored in a namedtuple. So let's define the namedtuple first.
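A minimal sketch of such a namedtuple, assuming the class name `Movie` and the three fields mentioned in the lecture (title, year, score). The example values are placeholders, not rows from the dataset.

```python
from collections import namedtuple

# Field names follow the lecture: title, year, and score.
# The class name Movie is an assumption for this sketch.
Movie = namedtuple('Movie', 'title year score')

# Placeholder values for illustration only.
example = Movie(title='Example Film', year=2010, score=8.5)

# Fields are readable by name, and the object still behaves like a tuple.
print(example.title, example.year, example.score)
```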
1:23 We've defined a namedtuple called movie with title, year, and score. Those are the only fields I'm interested in for now. We need to parse the CSV
1:33 and load the data into a defaultdict. I'm not going to touch too much upon the CSV part because there's a whole lesson dedicated to that.
1:40 I will write out the function and come back and comment it line by line. And let's see if that works.
2:00 And let's get the movies of Christopher Nolan, one of my favorite directors. Wow. Look at that. I can look up a director and I get a list of movies
2:15 and each movie is a namedtuple with title, year, and score. Okay, let's go back to the code I've just written. We make a function that receives data,
2:26 which by default is the movies CSV we retrieved here. I initialize a defaultdict of lists called directors. I open the data with a with statement.
2:40 Then I use csv.DictReader to parse every line into an OrderedDict. From every line, I extract the director name, movie title,
2:49 title year, and IMDB score and store them in variables. The year, I convert to int. The score, I convert to float.
2:58 With data analysis, there's always bad data and this is no exception. A ValueError got raised for some rows.
3:04 So when that happens, I just ignore the row. I'm not interested in incomplete data. I initialize the movie namedtuple
3:11 and give it title, year, and score. That namedtuple gets appended to that director's list in directors. So here you see the defaultdict in action.
3:20 I don't have to initialize an empty list for every director up front. defaultdict handles all of that behind the scenes.
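To illustrate the point about not initializing lists up front, here is a small sketch comparing `defaultdict(list)` with a plain dict. The director and film names are placeholders.

```python
from collections import defaultdict

# With defaultdict(list), a missing key gets an empty list on first access,
# so append works immediately.
directors = defaultdict(list)
directors['Some Director'].append('First Film')
directors['Some Director'].append('Second Film')

# With a plain dict, the empty list has to be created explicitly,
# for example through setdefault.
plain = {}
plain.setdefault('Some Director', []).append('First Film')
plain.setdefault('Some Director', []).append('Second Film')

print(directors['Some Director'])
print(plain['Some Director'])
```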
3:28 And then I return the directors defaultdict. Then I call the function and store the results in the directors variable and then I can look up directors.
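The function described above might look roughly like the sketch below. The function name, the column names (`director_name`, `movie_title`, `title_year`, `imdb_score`), and the sample rows are assumptions made for illustration. The real dataset file is not shown in the transcript, so the sketch writes a tiny placeholder CSV to a temporary file to stay runnable.

```python
import csv
import os
import tempfile
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')


def get_movies_by_director(data):
    """Return a defaultdict mapping director name -> list of Movie namedtuples."""
    directors = defaultdict(list)
    with open(data, encoding='utf-8', newline='') as f:
        for line in csv.DictReader(f):
            try:
                # Column names are assumed, not confirmed by the transcript.
                director = line['director_name']
                title = line['movie_title'].strip()
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                # Incomplete row (for example an empty year): skip it.
                continue
            directors[director].append(Movie(title=title, year=year, score=score))
    return directors


# Placeholder rows invented for this sketch; one row has a missing year.
sample_rows = (
    'director_name,movie_title,title_year,imdb_score\n'
    'Director A,Film One,2001,7.5\n'
    'Director A,Film Two,2005,8.0\n'
    'Director B,Film Three,,6.1\n'
    'Director B,Film Four,1999,6.9\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8', newline='') as tmp:
    tmp.write(sample_rows)
    sample_path = tmp.name

directors = get_movies_by_director(sample_path)
os.remove(sample_path)

print(directors['Director A'])
```

In this sketch the row with the empty year raises a ValueError in `int()` and is skipped, so Director B ends up with a single movie.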
3:37 So there's a lot of stuff you can do with this data. Let's do one more exercise. I'm going to use Counter to find the directors
3:45 that have the most movies in this dataset. So I use a Counter and I'm going to loop over... the directors... we stored before.
4:01 I can loop over the dictionary with items(), which gives me the key-value pairs. Then I'm going to store the director... in the counter object.
4:14 I'm going to sum up the length of the movies. You can do this by hand, but the nice thing about having a Counter object is that now I can do counter...
4:25 Most common. Five. And there you go. Spielberg, Woody Allen, this is pretty plausible. So here you got some more practice
4:36 using the collections data types.
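The Counter step can be sketched as follows, assuming a `directors` mapping of director names to lists of movies like the one built earlier. The names here are placeholders, and the sketch asks for the top two rather than the top five because it only has three directors.

```python
from collections import Counter, defaultdict

# Placeholder data standing in for the directors defaultdict.
directors = defaultdict(list)
directors['Director A'].extend(['Film One', 'Film Two', 'Film Three'])
directors['Director B'].extend(['Film Four'])
directors['Director C'].extend(['Film Five', 'Film Six'])

# Count how many movies each director has.
counter = Counter()
for director, movies in directors.items():
    counter[director] += len(movies)

# most_common(n) returns the n highest counts as (key, count) pairs.
top_two = counter.most_common(2)
print(top_two)  # [('Director A', 3), ('Director C', 2)]
```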

