Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 2: Excel Integration with Adult Income Data
Lecture: Quantifying Strings with filter and value_counts

0:00 I'm going to show how to explore some of the object columns again. Let's jump into that code.
0:06 So again, what I can do is call select_dtypes and pass in string. This will give me all of the string columns.
0:12 In Pandas 1, you would say object there, but because we have those PyArrow types, we can say string here. Remember what I said previously,
0:21 value_counts is your friend here. So let's explore education. I'm going to say education.value_counts. And here's a summary of that.
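As a rough sketch of the two calls described so far, the snippet below builds a tiny made-up frame (the rows are invented for illustration and are not the Adult Income data), selects its string columns, and counts the education values. The course data uses PyArrow-backed strings; casting to the plain pandas `string` dtype here is an assumption made to keep the example free of extra dependencies.

```python
import pandas as pd

# Made-up rows for illustration only; not the real Adult Income data.
df = pd.DataFrame(
    {
        "age": [39, 50, 38, 53, 28],
        "education": ["HS-grad", "Bachelors", "HS-grad", "Masters", "HS-grad"],
        "occupation": ["Sales", "Exec", "Sales", "Tech", "Craft"],
    }
).astype({"education": "string", "occupation": "string"})

# Keep only the string-typed columns.
string_cols = df.select_dtypes("string")

# Count how often each education level appears, most frequent first.
education_counts = df["education"].value_counts()

print(string_cols.columns.tolist())
print(education_counts)
```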
0:30 If I wanted to visualize that, look how easy this is. I'm going to say .plot.barh to do a horizontal bar plot,
0:38 and we can visualize that really easily there. So we see that most of these are high school graduates.
0:43 We have some college graduates, some masters, etc. Now say I want to filter columns, to get just the columns that have education in their names.
0:54 One of the things I can do is use this filter operation, and here are the columns that have education in them.
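The transcript does not show which argument is passed to filter at this point, so the following is only a sketch of one way it might look. It assumes a small invented frame whose column names mimic the dataset, and uses the `regex=` and `like=` parameters of `DataFrame.filter`, both of which match against column labels by default.

```python
import pandas as pd

# Made-up rows; the column names mimic the dataset but the values are invented.
df = pd.DataFrame(
    {
        "age": [39, 50, 38],
        "education": ["HS-grad", "Bachelors", "Masters"],
        "education-num": [9, 13, 14],
        "occupation": ["Sales", "Exec", "Tech"],
    }
)

# regex= keeps columns whose label matches the pattern anywhere in the name.
education_cols = df.filter(regex="education")

# like= is a plain substring match and gives the same result here.
same_cols = df.filter(like="education")

print(education_cols.columns.tolist())
```

Only the column selection changes; the rows are left untouched.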
1:02 Note that value_counts works with numbers as well. So if we take the age column, we might want to summarize the age.
1:10 Again, I'd probably do a histogram here, but we can do a value_counts on that. We can sort the index there.
1:17 In fact, if we do a plot and we do a bar on that, we're kind of getting the histogram by doing that.
1:24 So this is a very manual way of doing a histogram here. Again, I would probably just do age.hist to get a similar thing here.
1:32 If we want to bump up the bins, we'd say bins is equal to 20, and maybe we set figsize to 8 by 3, so it doesn't run off the screen.
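A minimal sketch of the numeric case, using invented ages rather than the real column. It shows the count-then-sort-by-index step, plus the `bins=` parameter of `value_counts` as a text-only stand-in for a coarse histogram; that second call is not from the lecture and is included only as an assumption about what is handy when no plot is available.

```python
import pandas as pd

# Invented ages for illustration only.
age = pd.Series([25, 31, 25, 47, 31, 25, 62, 47], name="age")

# Count each distinct age, then order by age rather than by frequency.
by_age = age.value_counts().sort_index()

# Group the ages into 3 equal-width bins and count the values in each.
binned = age.value_counts(bins=3, sort=False)

print(by_age)
print(binned)
```

With matplotlib installed, `by_age.plot.bar()` would draw the manual version, and `age.hist(bins=20, figsize=(8, 3))` the direct one.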
1:44 In this section, we looked at pulling out those object columns. Again, value_counts is your friend to summarize those.
1:51 We also saw that you can use filter with a regular expression to limit which columns you're pulling out.
1:59 Filter has a bunch of other options as well. Again, I recommend that you pull that documentation up in Jupyter and see how to use it in other contexts.

