#100DaysOfCode in Python Transcripts
Chapter: Days 4-6: Collections module
Lecture: Second day: use collections on movie data
0:00 Welcome back to the 100 days of Python
0:02 and the second day of the collections module.
0:04 Today, we're going to get practical with a code challenge.
0:08 That will be the highest-rated movie directors.
0:12 We will load in a data set and convert it
0:14 into a default dictionary of directors as keys
0:18 and movie namedtuples as values.
0:20 If you want to try it yourself from scratch,
0:22 I encourage you to pause the video now
0:25 and read through this link and try to code it up yourself.
0:29 What I will do in the rest of this video
0:31 is guide you through getting the data loaded
0:34 into the directors variable.
0:36 So we'll parse the CSV, convert it into a defaultdict,
0:39 and we'll also have a Counter example:
0:42 how to get the directors with the most movies.
0:46 So if you need some guidance, keep watching
0:48 but maybe you want to try it yourself first.
0:52 Welcome back.
0:53 I hope you had fun doing the other exercise.
0:56 In the next session, I will show you how to load
0:58 in the data and parse it into a defaultdict.
1:02 So we're going to load this data in
1:04 and the goal is to make a defaultdict
1:08 where the keys are the directors
1:10 and the values are a list of movies
1:13 and every movie will be stored in a namedtuple.
1:16 So let's define the namedtuple first.
1:22 We've defined a namedtuple called movie
1:24 with title, year, and score.
1:26 Those are the only fields I'm interested in for now.
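The definition typed on screen is not captured in the transcript. A minimal sketch of what it could look like, assuming the class is named Movie and using made-up sample values:

```python
from collections import namedtuple

# Field names follow the narration: title, year, and score.
# The class name Movie is an assumption.
Movie = namedtuple('Movie', 'title year score')

# Hypothetical sample values, only to show access by field name.
sample = Movie(title='Example Movie', year=2010, score=8.8)
print(sample.title, sample.year, sample.score)
```

Fields can be read by name (sample.title) or by position (sample[0]), which is what makes a namedtuple more readable than a plain tuple here.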
1:29 We need to parse the CSV
1:32 and load the data into defaultdict.
1:34 I'm not going to touch too much upon the CSV part
1:37 because there's a whole lesson dedicated to that.
1:39 I will write out the function and come back
1:42 and comment it line by line.
1:52 And let's see if that works.
1:59 And let's get the movies of Christopher Nolan,
2:03 one of my favorite directors.
2:08 Wow. Look at that.
2:10 I can look up a director and I get a list of movies
2:14 and each movie is a namedtuple with title, year, and score.
2:19 Okay, let's go back to the code I've just written.
2:22 We make a function that receives data,
2:25 which by default is the movies CSV we retrieved here.
2:30 I initialize a defaultdict of lists called directors.
2:35 I open the data with a with statement.
2:39 Then I use csv.DictReader to parse every line
2:43 into an OrderedDict (a regular dict on Python 3.8 and later).
2:44 For every line, I extract the director name, movie title,
2:48 title year, and IMDB score and store them in variables.
2:53 The year, I convert to int.
2:55 The score, I convert to float.
2:57 With data analysis, there's always bad data
2:59 and this is no exception.
3:01 A ValueError gets raised for some rows.
3:03 So when that happens, I just ignore the row.
3:06 I'm not interested in incomplete data.
3:08 I initialize the movie namedtuple
3:10 and give it title, year, and score.
3:12 That namedtuple gets appended
3:14 to that director's list in the directors defaultdict.
3:17 So here you see the defaultdict in action.
3:19 I don't have to initialize an empty list
3:23 for every director up front.
3:25 defaultdict handles all of that behind the scenes.
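To make that point concrete, here is a small sketch comparing a plain dict with a defaultdict of lists. The director and movie names are placeholders, not values from the dataset:

```python
from collections import defaultdict

# Plain dict: the list has to exist before we can append to it.
plain = {}
plain.setdefault('Director A', []).append('Movie 1')

# defaultdict(list): a missing key gets an empty list automatically.
directors = defaultdict(list)
directors['Director A'].append('Movie 1')
directors['Director A'].append('Movie 2')
directors['Director B'].append('Movie 3')

print(dict(directors))
```

One thing to keep in mind: merely looking up a missing key on a defaultdict also creates the entry, so use `in` or `.get()` when you only want to check for a director.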
3:27 And then I return the directors defaultdict.
3:30 Then I call the function and store the results
3:32 in the directors variable and then I can look up directors.
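The function itself is not captured in the transcript. The sketch below follows the steps as narrated; the function name, the column names (director_name, movie_title, title_year, imdb_score), and the sample rows are assumptions made for illustration, and a temporary file stands in for the real movies CSV:

```python
import csv
import os
import tempfile
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')


def get_movies_by_director(data):
    """Return a defaultdict mapping director name -> list of Movie tuples."""
    directors = defaultdict(list)
    with open(data, encoding='utf-8', newline='') as f:
        for row in csv.DictReader(f):
            try:
                director = row['director_name']
                title = row['movie_title'].strip()
                year = int(row['title_year'])
                score = float(row['imdb_score'])
            except ValueError:
                # Incomplete row (for example an empty year): skip it.
                continue
            directors[director].append(Movie(title=title, year=year, score=score))
    return directors


# Made-up sample rows; the second one has no year and should be skipped.
sample_csv = (
    'director_name,movie_title,title_year,imdb_score\n'
    'Director A,First Film,2001,7.5\n'
    'Director A,Broken Row,,6.0\n'
    'Director B,Second Film,2010,8.1\n'
)

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8', newline='') as tmp:
    tmp.write(sample_csv)

directors = get_movies_by_director(tmp.name)
os.remove(tmp.name)
print(directors['Director A'])
```

The row with the empty year fails at int(''), raises ValueError, and is dropped, which matches the "ignore incomplete rows" approach described above.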
3:36 So there's a lot of stuff you can do with this data.
3:40 Let's do one more exercise.
3:42 I'm going to use Counter to find the directors
3:44 that have the most movies in this dataset.
3:47 So I use a counter and I'm going to loop over...
3:53 the directors...
3:54 we stored before.
4:00 I can loop over a dictionary with items(),
4:04 which gives me the key-value pairs.
4:06 Then I'm going to store the director...
4:11 in the counter object.
4:13 I'm going to add up the number of movies for each director.
4:17 You can do this by hand, but the nice thing
4:20 about having a Counter object is that now I can do counter...
4:24 most_common.
4:28 And there you go.
4:30 Spielberg, Woody Allen, this is pretty plausible.
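The Counter step can be sketched as follows. The directors mapping here is a hypothetical stand-in for the one built from the CSV, with plain strings in place of the movie namedtuples, and the variable names are assumptions:

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the directors defaultdict built earlier.
directors = defaultdict(list)
directors['Director A'].extend(['m1', 'm2', 'm3'])
directors['Director B'].extend(['m4'])
directors['Director C'].extend(['m5', 'm6'])

# Count movies per director by adding the length of each movie list.
cnt = Counter()
for director, movies in directors.items():
    cnt[director] += len(movies)

# most_common(n) returns the n highest counts as (key, count) pairs.
top_two = cnt.most_common(2)
print(top_two)
```

Calling most_common() without an argument returns every director, sorted from most to fewest movies.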
4:33 So here you got some more practice
4:35 using the collections data types.