Eve: Building RESTful APIs with MongoDB and Flask Transcripts
Chapter: Your first Eve service
Lecture: Defining document schemas

0:00 In this section, I'm going to show you how to remove stop words. Stop words are words that don't add value to your text.
0:07 Oftentimes when we're doing natural language processing, we want to get rid of stop words.
0:11 Things like "a" and "the": words that occur a lot but don't really mean anything or add value.
0:17 We're going to use the spaCy library to do that. Make sure you install that.
0:21 After you install it, you need to download some English files so that it understands how to process English.
0:27 This is the command to download the small English model here. Then you can validate that your spaCy install worked.
0:37 You can see that I have downloaded that small one. I'm going to load spaCy and then I'm going to say load that small English data.
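The install, download, validate, and load steps above might look roughly like the sketch below. The transcript does not name the model, so `en_core_web_sm` (spaCy's small English pipeline) is an assumption, and `load_small_english` is a hypothetical helper added only so the example fails gracefully when spaCy or the model is missing.

```python
# Terminal steps matching the lecture (run these before the Python code):
#   pip install spacy
#   python -m spacy download en_core_web_sm   # assumed model name
#   python -m spacy validate


def load_small_english():
    """Return the small English pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy
        return spacy.load("en_core_web_sm")
    except (ImportError, OSError):
        # ImportError: spaCy is not installed.
        # OSError: spaCy is installed but the model has not been downloaded.
        return None


nlp = load_small_english()
if nlp is None:
    print("spaCy or en_core_web_sm is not installed; see the commands above")
else:
    print("Loaded pipeline:", nlp.meta["name"])
```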
0:47 Now I'm going to remove the stop words. I'm going to use apply here and say, okay, here's the remove text. We're going to apply this function here.
0:57 And we pass in this NLP object. If we look at the function, what it's going to do is get a document from that,
1:07 which understands what's going on with the text. Then I'm going to loop over the tokens in the document here.
1:14 And if a token is not a stop word, I'm going to keep it in the result. So let's run that. I'm also using the %%time cell magic at the top.
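A minimal sketch of the function described above, assuming it receives the text and the loaded pipeline and rejoins the kept tokens with spaces. The name `remove_stop_words`, the sample review, and the `en_core_web_sm` model name are illustrative assumptions, not taken from the video. In a notebook, putting `%%time` on the first line of the cell reports how long the cell took.

```python
def remove_stop_words(text, nlp):
    """Return text with stop-word tokens dropped, joined by single spaces."""
    doc = nlp(text)  # tokenize and analyze the text
    return " ".join(token.text for token in doc if not token.is_stop)


try:
    import pandas as pd
    import spacy
    nlp = spacy.load("en_core_web_sm")  # assumed model name
except (ImportError, OSError):
    cleaned = None  # pandas, spaCy, or the model is not available
else:
    reviews = pd.Series(["I saw this at the premiere in Melbourne."])
    # apply calls the function once per row; args passes nlp as the second argument
    cleaned = reviews.apply(remove_stop_words, args=(nlp,))
    print(cleaned.tolist())
```

Passing the pipeline in as an argument, rather than reading a global, keeps the function easy to reuse with a different model.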
1:24 This is going to take a while. This is using apply, which is slow. It's also working with strings, which tend to be slow as well.
1:32 But there's not really a way to vectorize this and make it much quicker. So we'll just deal with that. Okay, so this takes about 30 seconds.
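This is not vectorization, but one option that may reduce the per-row overhead is to stream the texts through spaCy's `nlp.pipe`, which processes documents in batches. The sketch below is an alternative to the apply approach in the video, not the author's method; `remove_stop_words_batched`, the batch size, and the sample texts are assumptions, and any speedup depends on the data and the pipeline.

```python
def remove_stop_words_batched(texts, nlp, batch_size=256):
    """Yield one cleaned string per input text, streaming docs through nlp.pipe."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield " ".join(token.text for token in doc if not token.is_stop)


try:
    import spacy
    nlp = spacy.load("en_core_web_sm")  # assumed model name
except (ImportError, OSError):
    nlp = None  # spaCy or the model is not available

if nlp is not None:
    sample = [
        "This is a movie that gets no respect.",
        "I saw this at the premiere in Melbourne.",
    ]
    print(list(remove_stop_words_batched(sample, nlp)))
```

With a DataFrame, the result could be assigned back with something like `df["clean"] = list(remove_stop_words_batched(df["review"], nlp))`, where the column names are hypothetical.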
1:44 You can see that I've got, it looks like some HTML in here. So I might want to further replace some of that HTML.
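The transcript does not show which HTML appears in the text. As a sketch, assuming the leftovers are `<br />` line-break tags, a cleanup step could look like this; `strip_html_breaks` is a hypothetical helper name.

```python
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def strip_html_breaks(text):
    """Replace <br> tags with a space, then collapse repeated whitespace."""
    without_tags = BR_TAG.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(strip_html_breaks("Great movie.<br /><br />Loved it."))
```

On a pandas Series the same idea can be written as `series.str.replace(r"<br\s*/?>", " ", regex=True)`, and it is usually easier to run this before removing stop words so the tags are not split into tokens.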
1:53 And I could put in code like this to do further manipulation there. Let's just load the original data so you can compare the two data sets
2:11 and see that the stop words are being removed. Okay, so that's looking better. Here in the original data you can see "for a movie that gets no respect".
2:24 It got changed to "movie gets respect, sure, lot, memorable quotes". You can see the bottom one here: "I saw this at the premiere in Melbourne"
2:34 became "saw premiere in Melbourne". Do you need to remove stop words? No, you don't, but this is something that can help your models
2:41 perform better because there's a lot of noise in those stop words.

