Move from Excel to Python with Pandas Transcripts
Chapter: Data wrangling with Pandas
Lecture: Examples of boolean filtering in pandas
0:00 Now let's go through some examples of using Boolean filtering. So I am going to rerun my notebook and all it does right now is read
0:09 in the same Excel file we've been using and show the summary data, Frank. Now let's create our first example of a Boolean index.
0:17 So we see that we have a company name called Viva. And if we want to understand all of the rows where the company aim is viva
0:25 We can do this expression and what pandas does it returns and equivalent value of true or false, depending at the company.
0:34 Name is Viva. So you can see here in row 9 95 and 96 there are Vivas, the company name and it returns a true there because this is Python
0:43 We can assign that to a variable to make her life a little bit easier If we look at the Viva Variable,
0:51 it's the same true false values that we had before Then If we we choose to use DF Lok on Viva, then we have a list of all the invoices for
1:02 company viva! And what what's happened is this true false list has been passed to Lok and then on Lee, the true values are shown for each row.
1:13 There's another shortcut we can use that is pretty common. I use a lot Instead of using Lok,
1:19 we just pass a list of the criteria that we want to apply to the data frame. So here I just say D f and then all those true false values
1:30 and it returns the same value as look. So the question might be Why would you want to use this?
1:37 The DOT lok approach versus just using the brackets and the reason you want to use DOT lok is If you want to be able to control the columns you return,
1:47 then you need to use doubt. Poke. This approach of just using the brackets can essentially just filter on all the data. Keep that in mind,
1:56 and we will go through some more examples to drive that home. Now we can also do mathematical comparisons,
2:03 so let's say if we want to understand where we've purchased at least 10 items or more similar sort of results. So we've got a bunch of truce,
2:15 and false is for each row that has a quantity amount greater than or equal to 10 and what's really nice is you can actually combine these together.
2:28 So now we can see how maney times viva purchased at least 10 items or more We've got to transactions here, and we use the and operator,
2:40 the ampersand operator similar to you what you would use in standard Python for an and operation you can do and or or just a single value here.
2:52 Let's show how we talked about with Lok that we could select multiple columns as well Let's see the purchase date through price and see the difference.
3:03 So instead of returning, all of the columns were just returning the ones between Purchase
3:09 Day and Price. And this is an inclusive list versus some of the other list approaches. You might be experienced within Python.
3:19 Where that last item are. The last index is not included. Remember when we talked about string excess Er's?
3:27 We can use these as well to get Boolean lists. Several of our companies have the word buzz in the name, not necessarily at the beginning or the end,
3:38 and if we use string contains it will search and find all the instances of buzz and give us another Boolean Index or Boolean mask that we can use.
3:50 And let's take a look at some examples. You can really do some very sophisticated analysis with Boolean filtering this way.
4:00 For the final example, we're going to use another string excess er. Let's do a filter on skew and use the string excess er.
4:12 We can find all of the skews that start with F S. And let's do show how we can do a little bit more analysis here. Let's get the products as well.
4:22 So then we can see. Okay, there's a skew there. It starts with poster and combine it with value counts.
4:29 So now it's really easy to tell that we have two types of skews shirts and posters, and this is the number of occurrences of each one of those.