#100DaysOfCode in Python Transcripts
Chapter: Days 28-30: Regular Expressions
Lecture: findall is your friend
0:00
And now my favorite function in the re module: findall. findall is useful for matching a pattern in a string and getting all the occurrences of that pattern.
0:14
In 100 Days of Code, we wrote a script that returned three columns: the module, whether it is in the standard library, and the days on which we used it.
0:23
As you can see, we used the re module quite a lot. I will give a link to the actual scripts at the end of this lesson. Let's write a
0:33
regular expression to extract all the days, and findall really shines in this kind of task. So we do re.findall ... raw string. One or more digits,
0:48
and we have to specify the string as the second argument. And look at that. One simple statement and we got all the days. That's awesome.
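A minimal sketch of the call just described. The sample string `line` is a hypothetical stand-in for one row of the script's output, not the actual data shown in the video.

```python
import re

# Hypothetical row: module name, standard-library flag, and the days it was used.
line = "re | stdlib | 7, 9, 12, 27, 30"

# \d+ matches one or more digits; findall returns every
# non-overlapping match as a list of strings.
days = re.findall(r"\d+", line)
print(days)  # ['7', '9', '12', '27', '30']
```

Note that the matches come back as strings, so they would need `int()` if you wanted to do arithmetic on them.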
0:59
Let's do a second example. Here is some text, and let's extract the words with a regular expression first. re.findall, if I could type. raw string.
1:17
One or more characters. Text, and I first need to load that in. Bang. Now, you can also do this with text.split().
1:31
I'm going to show just the first five. So you don't really need a regular expression to split a text string into a list.
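A short sketch comparing the two approaches. The sample `text` is made up, and the pattern `\w+` (one or more word characters) is an assumption, since the transcript only says "one or more characters".

```python
import re

# Made-up sample text, not the text used in the video.
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

words_re = re.findall(r"\w+", text)  # one or more word characters
words_split = text.split()           # split on whitespace

print(words_re[:5])     # ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
print(words_split[:5])  # ['Lorem', 'ipsum', 'dolor', 'sit', 'amet,']
```

The one visible difference here is that split() keeps punctuation attached to the word ("amet,"), while `\w+` leaves it out.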
1:41
Let's say we want to find out the most common words, but only the ones that start with an uppercase character.
1:48
So then you can use a regular expression like this. I'm using character classes, which go in square brackets, so let's define an uppercase character,
2:01
and then we have one or more lowercase characters or digits, and the plus is one or more, if you want zero or more you do an asterisk.
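Here is a small sketch of that pattern, showing the difference between the plus and the asterisk. The sample `text` is invented for illustration.

```python
import re

# Invented sample text with a lone capital letter ("A") to show + versus *.
text = "Lorem ipsum Dolor sit Amet 2x A B7"

# [A-Z] is one uppercase letter; [a-z0-9]+ is one or more
# lowercase letters or digits.
one_or_more = re.findall(r"[A-Z][a-z0-9]+", text)

# With * the second class may match zero times, so a lone "A" also matches.
zero_or_more = re.findall(r"[A-Z][a-z0-9]*", text)

print(one_or_more)   # ['Lorem', 'Dolor', 'Amet', 'B7']
print(zero_or_more)  # ['Lorem', 'Dolor', 'Amet', 'A', 'B7']
```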
2:13
And we want to do that on the text. And here we have all the words starting with an uppercase. Now just for the fun of it, let's wrap that in a Counter
2:25
to get the most common words. So we are going to use from collections import Counter, and don't worry, I will cover Counter
2:37
in more detail in the collections lesson. Counter accepts an iterable such as a list, and re.findall returns a list, as we saw earlier.
2:48
So we can just make a Counter object, passing that list into Counter, and as you see we get some counts here. And to find out the most common words,
3:02
we can then call the most_common method on that Counter object. And lorem and ipsum are the winners. So very powerful tool. I really like findall.
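The whole idea can be sketched in a few lines. The sample `text` is made up so that the counts are easy to check by hand; it is not the text from the video.

```python
import re
from collections import Counter

# Made-up sample text; only capitalized words should be counted.
text = "Lorem ipsum. Ipsum dolor. Lorem sit. Lorem amet. Ipsum elit. Dolor."

# findall returns a list of capitalized words; Counter tallies them.
cnt = Counter(re.findall(r"[A-Z][a-z0-9]+", text))

# most_common(n) returns the n highest counts as (word, count) tuples.
top_two = cnt.most_common(2)
print(top_two)  # [('Lorem', 3), ('Ipsum', 2)]
```

Lowercase "ipsum" and "dolor" are not counted, because the pattern requires an uppercase first letter.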
3:13
This is a typical Python example: in one line of code you can do a lot of good stuff.