#100DaysOfCode in Python Transcripts
Chapter: Days 4-6: Collections module
Lecture: Second day: use collections on movie data
0:00
Welcome back to the 100 days of Python and the second day of the collections module. Today, we're going to get practical with a code challenge.
0:09
The challenge is to find the highest-rated movie directors. We will load in a data set and convert it into a defaultdict with directors as keys
0:19
and movie namedtuples as values. If you want to try it yourself from scratch, I encourage you to pause the video now
0:26
and read through this link and try to code it up yourself. What I will do in the rest of this video is guide you through getting the data loaded
0:35
into the directors variable. So we parse the CSV, convert it into a defaultdict, and I will also show a Counter example,
0:43
how to get the directors with the most movies. So if you need some guidance, keep watching, but maybe you want to try it yourself first.
0:53
Welcome back. I hope you had fun doing the exercise. In the next section, I will show you how to load
0:59
in the data and parse it into a defaultdict. So we're going to load this data in and the goal is to make a defaultdict where the keys are the directors
1:11
and the values are a list of movies and every movie will be stored in a namedtuple. So let's define the namedtuple first.
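A minimal sketch of what that definition might look like. The field names (title, year, score) come from the lecture; the capitalized class name `Movie` and the sample values are assumptions for illustration only.

```python
from collections import namedtuple

# Field names follow the lecture (title, year, score); the class name
# "Movie" is an assumed spelling.
Movie = namedtuple('Movie', 'title year score')

# Illustrative values only, not taken from the dataset.
example = Movie(title='Some Film', year=2000, score=7.5)

# Fields can be read by name or by position.
print(example.title, example.year, example.score)
print(example[0])
```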
1:23
We've defined a namedtuple called movie with title, year, and score. Those are the only fields I'm interested in for now. We need to parse the CSV
1:33
and load the data into a defaultdict. I'm not going to go into much detail on the CSV part because there's a whole lesson dedicated to that.
1:40
I will write out the function and come back and comment it line by line. And let's see if that works.
2:00
And let's get the movies of Christopher Nolan, one of my favorite directors. Wow. Look at that. I can look up a director and I get a list of movies
2:15
and each movie is a namedtuple with title, year, and score. Okay, let's go back to the code I've just written. We make a function that receives data,
2:26
which by default is the movies CSV that we retrieved here. I initialize a defaultdict of lists called directors. I open the data with a with statement.
2:40
Then I use csv.DictReader to parse every line into an OrderedDict. For every line, I extract the director name, movie title,
2:49
title year, and IMDB score and store them in variables. The year, I convert to int. The score, I convert to float.
2:58
With data analysis, there's always bad data and this is no exception. A ValueError gets raised for some rows.
3:04
So when that happens, I just ignore the row. I'm not interested in incomplete data. I initialize the movie namedtuple
3:11
and give it movie, year, and score. That namedtuple gets appended to that director's list in the directors defaultdict. So here you see the defaultdict in action.
3:20
I don't have to initialize an empty list for every director up front. defaultdict handles all of that behind the scenes.
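To make that point concrete, here is a small comparison of a plain dict and a defaultdict doing the same grouping. The director and film names are made up for illustration and are not from the dataset.

```python
from collections import defaultdict

# Made-up (director, title) pairs, purely for illustration.
pairs = [
    ('Director A', 'Film 1'),
    ('Director B', 'Film 2'),
    ('Director A', 'Film 3'),
]

# Plain dict: the empty list has to be created by hand for each new key.
plain = {}
for director, title in pairs:
    if director not in plain:
        plain[director] = []
    plain[director].append(title)

# defaultdict(list): a missing key gets an empty list automatically.
directors = defaultdict(list)
for director, title in pairs:
    directors[director].append(title)

# Both approaches end up with the same grouping.
print(dict(directors) == plain)
```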
3:28
And then I return the directors defaultdict. Then I call the function and store the results in the directors variable and then I can look up directors.
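The steps described above could be sketched roughly as follows. This is not the exact code from the video: the function name `get_movies_by_director`, the column names (`director_name`, `movie_title`, `title_year`, `imdb_score`), and the tiny sample CSV are all assumptions chosen to match the narration, and the default file argument is left out.

```python
import csv
import os
import tempfile
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')


def get_movies_by_director(data):
    """Return a defaultdict mapping director name -> list of Movie namedtuples.

    `data` is a path to a CSV file; the column names below are assumed.
    """
    directors = defaultdict(list)
    with open(data, encoding='utf-8', newline='') as f:
        for line in csv.DictReader(f):
            try:
                director = line['director_name']
                movie = line['movie_title'].strip()
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                # Incomplete or malformed row: skip it.
                continue
            directors[director].append(Movie(title=movie, year=year, score=score))
    return directors


# Made-up sample data so the sketch runs without the real dataset.
sample = (
    'director_name,movie_title,title_year,imdb_score\n'
    'Director A,Film 1,2001,7.1\n'
    'Director A,Film 2,,6.0\n'  # missing year, so this row is skipped
    'Director B,Film 3,2010,8.2\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8', newline='') as tmp:
    tmp.write(sample)

directors = get_movies_by_director(tmp.name)
os.remove(tmp.name)

print(directors['Director A'])
```

Skipping bad rows with `continue` keeps the loop simple; if the number of dropped rows matters, counting them inside the `except` branch would be a small addition.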
3:37
So there's a lot of stuff you can do with this data. Let's do one more exercise. I'm going to use Counter to find the directors
3:45
that have the most movies in this dataset. So I use a Counter and I'm going to loop over... the directors... we stored before.
4:01
I can loop over a dictionary with items(), which gives me the key-value pairs. Then I'm going to store the director... in the counter object.
4:14
I'm going to sum up the length of the movies. You can do this by hand, but the nice thing about having a Counter object is that now I can do counter...
4:25
most_common(5). And there you go. Spielberg, Woody Allen, this is pretty plausible. So here you got some more practice
4:36
using the collections data types.
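The Counter exercise can be sketched like this. The `directors` mapping here is a made-up stand-in for the real one built from the CSV, and `most_common(2)` is used instead of 5 only because the sample is tiny.

```python
from collections import Counter, defaultdict

# Made-up stand-in for the directors defaultdict built from the CSV.
directors = defaultdict(list, {
    'Director A': ['Film 1', 'Film 2', 'Film 3'],
    'Director B': ['Film 4'],
    'Director C': ['Film 5', 'Film 6'],
})

# Count how many movies each director has.
counter = Counter()
for director, movies in directors.items():
    counter[director] += len(movies)

# Directors with the most movies, highest count first.
print(counter.most_common(2))  # [('Director A', 3), ('Director C', 2)]
```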