Move from Excel to Python with Pandas Transcripts
Chapter: Data wrangling with Pandas
Lecture: Examples of boolean filtering in pandas
0:00 Now let's go through some examples of using Boolean filtering.
0:03 So I am going to rerun my notebook, and all it does right now is read
0:08 in the same Excel file we've been using and show the summary data
0:12 frame. Now let's create our first example of a Boolean index.
0:16 So we see that we have a company name called Viva.
0:19 And if we want to understand all of the rows where the company name is Viva,
0:24 we can use this expression, and what pandas does is return an equivalent value of
0:30 True or False, depending on whether the company
0:33 name is Viva. So you can see here in rows 9, 95, and 96 there
0:38 is Viva in the company name, and it returns True there. Because this is Python,
0:42 we can assign that to a variable to make our life a little bit easier.
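Here is a minimal sketch of that comparison. It assumes a small, hypothetical DataFrame standing in for the lecture's Excel file, with assumed column names `invoice` and `company`; the point is only to show what the comparison returns.

```python
import pandas as pd

# Hypothetical sample data standing in for the lecture's Excel file;
# the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003, 1004],
    "company": ["Viva", "Buzzdog", "Viva", "Skinix"],
})

# Comparing a column to a single value returns a Boolean Series with
# one True/False entry per row, aligned to the DataFrame's index.
viva = df["company"] == "Viva"
print(viva)
```

Storing the result in a variable such as `viva` lets you reuse the same mask in several later selections.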
0:47 If we look at the viva variable,
0:50 it's the same True/False values that we had before. Then if we choose to
0:55 use df.loc on viva, we have a list of all the invoices for
1:01 company Viva. What's happened is that this True/False list has been passed to
1:07 .loc, and then only the rows with True values are shown.
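A short sketch of passing the mask to `.loc`, again using a hypothetical DataFrame with assumed `invoice` and `company` columns rather than the lecture's actual data:

```python
import pandas as pd

# Hypothetical data; column names are assumptions for illustration.
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003, 1004],
    "company": ["Viva", "Buzzdog", "Viva", "Skinix"],
})

viva = df["company"] == "Viva"

# .loc keeps only the rows where the Boolean mask is True;
# the original index labels (0 and 2 here) are preserved.
viva_invoices = df.loc[viva]
print(viva_invoices)
```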
1:12 There's another shortcut we can use that is pretty common.
1:15 I use it a lot. Instead of using .loc,
1:18 we just pass the True/False criteria that we want to apply to the data
1:24 frame inside square brackets. So here I just say df and then all those True/False values,
1:29 and it returns the same result as .loc.
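To check that the bracket shortcut and `.loc` give the same rows, here is a small sketch on hypothetical data (the `invoice` and `company` columns are assumptions):

```python
import pandas as pd

# Hypothetical data; column names are assumptions for illustration.
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003, 1004],
    "company": ["Viva", "Buzzdog", "Viva", "Skinix"],
})
viva = df["company"] == "Viva"

# Both forms filter rows with the same Boolean mask.
with_loc = df.loc[viva]
with_brackets = df[viva]

# DataFrame.equals compares values, index, and columns.
print(with_brackets.equals(with_loc))
```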
1:33 So the question might be: why would you want to use
1:36 the .loc approach versus just using the brackets? The reason you want to use
1:41 .loc is that if you want to be able to control the columns you return,
1:46 then you need to use
1:48 .loc. The approach of just using the brackets essentially filters the rows and returns all the
1:54 columns. Keep that in mind,
1:55 and we will go through some more examples to drive that home.
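A sketch of the difference, assuming a hypothetical DataFrame with `invoice`, `company`, `quantity`, and `price` columns: brackets alone return every column, while `.loc` accepts a second selector for the columns.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003, 1004],
    "company": ["Viva", "Buzzdog", "Viva", "Skinix"],
    "quantity": [12, 3, 5, 20],
    "price": [19.99, 9.99, 9.99, 19.99],
})
viva = df["company"] == "Viva"

# Brackets alone: rows are filtered, every column comes back.
all_columns = df[viva]

# .loc takes a row selector and a column selector separated by a comma.
some_columns = df.loc[viva, ["invoice", "quantity"]]
print(some_columns)
```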
1:59 Now we can also do mathematical comparisons,
2:02 so let's say we want to find where we've purchased at least 10 items or
2:08 more. We get a similar sort of result: a bunch of Trues
2:14 and Falses, with True for each row that has a quantity greater than or equal to
2:22 10. What's really nice is that you can actually combine these together.
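A minimal sketch of a numeric comparison mask, assuming a hypothetical `quantity` column:

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "company": ["Viva", "Buzzdog", "Viva", "Skinix", "Viva"],
    "quantity": [12, 3, 5, 20, 10],
})

# >= works element-wise, so this is True wherever quantity is 10 or more.
at_least_10 = df["quantity"] >= 10
print(df.loc[at_least_10])
```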
2:27 So now we can see how many times Viva purchased at least 10 items or more.
2:33 We've got two transactions here,
2:37 and we use the and operator,
2:39 the ampersand (&), which plays the role that the and keyword plays in standard Python.
2:45 You can use & for and, or | for or; each is just a single character here.
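A sketch of combining masks with `&` and `|` on hypothetical data (the `company` and `quantity` columns are assumptions). When the comparisons are written inline, each one needs its own parentheses, because in Python `&` and `|` bind more tightly than `==` and `>=`.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "company": ["Viva", "Buzzdog", "Viva", "Skinix", "Viva"],
    "quantity": [12, 3, 5, 20, 10],
})
viva = df["company"] == "Viva"
at_least_10 = df["quantity"] >= 10

# & keeps rows where BOTH masks are True; | keeps rows where EITHER is True.
viva_and_big = df.loc[viva & at_least_10]
viva_or_big = df.loc[viva | at_least_10]

# The same AND filter written inline; the parentheses are required.
inline = df.loc[(df["company"] == "Viva") & (df["quantity"] >= 10)]
print(viva_and_big)
```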
2:51 As we talked about with .loc, we can select multiple columns as well.
2:56 Let's select purchase date through price and see the difference.
3:02 So instead of returning all of the columns, we're just returning the ones between Purchase
3:08 Date and Price. And this is an inclusive range, unlike some of the other list
3:15 slicing approaches you might be experienced with in Python,
3:18 where the last item, the last index, is not included.
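A sketch contrasting label-based slicing with `.loc` (end label included) and position-based slicing with `.iloc` (end position excluded, like ordinary Python slices). The column names such as `purchase_date` and `extended_amount` are hypothetical stand-ins for the lecture's columns.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "invoice": [1001, 1002, 1003],
    "company": ["Viva", "Buzzdog", "Viva"],
    "purchase_date": pd.to_datetime(["2018-01-02", "2018-01-05", "2018-02-11"]),
    "product": ["shirt", "poster", "poster"],
    "quantity": [12, 3, 5],
    "price": [19.99, 9.99, 9.99],
    "extended_amount": [239.88, 29.97, 49.95],
})
viva = df["company"] == "Viva"

# Label slice with .loc: "price" (the end label) IS included.
by_label = df.loc[viva, "purchase_date":"price"]

# Position slice with .iloc: position 5 ("price") is NOT included.
by_position = df.iloc[:, 2:5]
print(by_label)
```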
3:23 Remember when we talked about string accessors?
3:26 We can use these as well to get Boolean lists.
3:32 Several of our companies have the word buzz in the name,
3:35 not necessarily at the beginning or the end,
3:37 and if we use str.contains, it will search and find all the instances of buzz
3:44 and give us another Boolean index, or Boolean mask, that we can use.
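A sketch of `str.contains` as a mask, using invented company names (the lecture's actual names aren't shown here). Note that matching is case-sensitive by default; passing `case=False` is a choice made for this sketch so that a leading "Buzz" also matches.

```python
import pandas as pd

# Hypothetical company names; "buzz" appears at the start, end, and middle.
df = pd.DataFrame({
    "company": ["Buzzdog", "Viva", "Skibuzz", "Yabuzzo", "Voolith"],
    "quantity": [4, 12, 7, 1, 9],
})

# Case-sensitive by default: "Buzzdog" does not match "buzz".
case_sensitive = df["company"].str.contains("buzz")

# case=False also matches "Buzz" at the start of a name.
has_buzz = df["company"].str.contains("buzz", case=False)
print(df.loc[has_buzz])
```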
3:49 And let's take a look at some examples.
3:53 You can really do some very sophisticated analysis with Boolean filtering this way.
3:59 For the final example, we're going to use another string accessor.
4:05 Let's do a filter on SKU and use the string accessor.
4:11 We can find all of the SKUs that start with FS.
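A minimal sketch of `str.startswith` as a mask. The `sku` and `product` columns and the SKU codes are invented for illustration.

```python
import pandas as pd

# Hypothetical SKU codes and products; values are assumptions.
df = pd.DataFrame({
    "sku": ["FS-101", "PO-201", "FS-202", "TS-301", "FS-103"],
    "product": ["shirt", "poster", "poster", "shirt", "shirt"],
})

# True for every SKU whose text begins with "FS".
fs = df["sku"].str.startswith("FS")
print(df.loc[fs])
```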
4:15 And let's show how we can do a little bit more analysis here.
4:19 Let's get the products as well.
4:21 So then we can see, okay,
4:23 there's a SKU there that is a poster. Then we combine it with value_counts.
4:28 So now it's really easy to tell that we have two types of SKUs, shirts and
4:33 posters, and this is the number of occurrences of each one of those.
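Putting the mask, a column selection, and `value_counts` together might look like this sketch. The data is hypothetical, so the counts are not the lecture's results.

```python
import pandas as pd

# Hypothetical SKU codes and products; values are assumptions.
df = pd.DataFrame({
    "sku": ["FS-101", "PO-201", "FS-202", "TS-301", "FS-103"],
    "product": ["shirt", "poster", "poster", "shirt", "shirt"],
})

fs = df["sku"].str.startswith("FS")

# Select only the product column for matching rows, then count each value.
# value_counts sorts by count, largest first.
product_counts = df.loc[fs, "product"].value_counts()
print(product_counts)
```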