#100DaysOfCode in Python Transcripts
Chapter: Days 28-30: Regular Expressions
Lecture: findall is your friend
0:00 And now my favorite
0:02 method of the re module: findall.
0:06 findall is useful to match a pattern
0:09 in a string,
0:10 and to get all the occurrences of that pattern.
0:13 For 100 Days of Code, we wrote a module index script,
0:15 which returned three columns:
0:18 the module, whether it is in the standard library,
0:20 and the days on which we used it.
0:22 As you can see, we used
0:24 the re module quite a lot.
0:26 And I will
0:27 give a link to the actual script at the end of this lesson.
0:30 Let's write a
0:32 regular expression to extract all the days,
0:35 and findall really shines in this kind of task.
0:39 So we do re.findall,
0:42 a raw string,
0:45 \d+ for one or more digits,
0:47 and we have to specify
0:49 the string as the second argument.
0:52 And look at that.
0:53 One simple statement
0:55 and we got all the days.
0:57 That's awesome.
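Here is a minimal sketch of that first example. The row below is a made-up stand-in for one line of the module index output (module name, standard library flag, days used); the real script's format may differ.

```python
import re

# Hypothetical row: module name, stdlib flag, and the days it was used.
# This data is invented for illustration only.
row = "re  True  28, 29, 30"

# \d+ matches one or more consecutive digits; findall returns every
# non-overlapping match as a list of strings, in order of appearance.
days = re.findall(r"\d+", row)
print(days)  # ['28', '29', '30']
```

Note that findall returns strings, so you would convert with int() if you need numbers.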
0:58 Let's do a second example.
1:00 Here is some text,
1:01 and let's extract the words
1:04 with a regular expression first.
1:08 re.findall, if I could type.
1:14 a raw string,
1:16 \w+ for one or more word characters,
1:18 and then the text, which I first need to load in.
1:24 Now you can also do this
1:25 with text.split().
1:30 I'm just going to show the first five.
1:33 So you don't really need a regular expression to
1:36 split a text string into a list of words.
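To compare the two approaches, here is a small sketch using a short placeholder sentence (not the lecture's actual text). One difference worth knowing: split() only breaks on whitespace, so punctuation stays attached to words, whereas \w+ keeps only word characters.

```python
import re

# Placeholder text, assumed for illustration.
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

# \w+ matches runs of word characters (letters, digits, underscore).
words_re = re.findall(r"\w+", text)

# str.split() with no arguments splits on any run of whitespace.
words_split = text.split()

print(words_re[:5])     # ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
print(words_split[:5])  # ['Lorem', 'ipsum', 'dolor', 'sit', 'amet,']
```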
1:40 Let's say we want to find out the most common words,
1:43 but only the ones that start with an uppercase character.
1:47 Then you can use a regular expression like this,
1:53 and I'm using character classes,
1:54 which are in square brackets.
1:57 So let's define an uppercase character, [A-Z],
2:00 and then we have
2:01 one or more lowercase
2:04 characters or digits, [a-z0-9]+.
2:07 The plus means one or more;
2:09 if you want zero or more, you use an asterisk.
2:12 And we want to run that on the text.
2:15 And here we have all the words
2:18 starting with an uppercase.
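A minimal sketch of the character-class pattern described above, run on invented sample strings. The second part shows the practical difference between + and *: with +, a lone capital letter such as "I" is skipped, because at least one lowercase letter or digit must follow it.

```python
import re

# Sample text, assumed for illustration.
text = "Lorem ipsum dolor. Ipsum Lorem A1 Sit amet."

# [A-Z] is one uppercase letter; [a-z0-9]+ is one or more lowercase
# letters or digits following it.
capitalized = re.findall(r"[A-Z][a-z0-9]+", text)
print(capitalized)  # ['Lorem', 'Ipsum', 'Lorem', 'A1', 'Sit']

# + requires at least one trailing character; * allows zero.
with_plus = re.findall(r"[A-Z][a-z0-9]+", "I saw Bob")
with_star = re.findall(r"[A-Z][a-z0-9]*", "I saw Bob")
print(with_plus)  # ['Bob']
print(with_star)  # ['I', 'Bob']
```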
2:20 Now just for the fun of it,
2:22 let's wrap that in a counter
2:24 to get the most common words.
2:26 So we are going to do from collections
2:31 import Counter,
2:33 and don't worry, I will cover Counter
2:36 in more detail in the collections lesson.
2:40 Counter accepts an iterable such as a list,
2:42 and re.findall returns a list, as we
2:46 saw earlier.
2:47 So we can just make a Counter object
2:51 by passing that list into Counter,
2:55 and as you can see, we get some counts here.
2:58 And to find out the most common words,
3:01 we can then call the most_common method
3:04 on that Counter object.
3:06 And Lorem and Ipsum are the winners.
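Putting the pieces together, here is a short sketch that feeds the findall result straight into collections.Counter and then asks for the most common entries. The text is a small invented sample, so the counts differ from the lecture's output.

```python
import re
from collections import Counter

# Invented sample text for illustration.
text = "Lorem ipsum. Lorem Ipsum dolor. Ipsum Lorem."

# findall returns a list, which Counter accepts directly.
counts = Counter(re.findall(r"[A-Z][a-z0-9]+", text))
print(counts)  # Counter({'Lorem': 3, 'Ipsum': 2})

# most_common(n) returns the n highest-count (element, count) pairs.
top = counts.most_common(2)
print(top)  # [('Lorem', 3), ('Ipsum', 2)]
```

Calling most_common() with no argument returns all elements, ordered from most to least common.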
3:09 So it's a very powerful tool.
3:11 I really like findall.
3:12 This is a typical Python example: in
3:15 one line of code you can
3:17 do a lot of good stuff.