Getting Started with NLP and spaCy Transcripts
Chapter: Part 2: Exploring data with spaCy
Lecture: Performance: Part 2

0:00 In the previous video we got a speed boost by using the nlp.pipe method, but there is also another improvement that we can make.
0:09 To help explain it, let's just dive into this nlp object a little bit. Because there is this pipeline object inside of it
0:17 that tells us what kind of components are actually active.
0:22 I can see for example that there is a tagger, that there is a parser, that there is a lemmatizer, and also a named entity recognition component.
0:31 To dive in a bit deeper, this tagger component, that's a component that's making sure that each token has a part of speech attribute attached.
0:39 So that would be stuff like, is this token a verb or a noun? There's also a grammatical parser.
0:45 And all these components are in general pretty dang useful. But if in our case we are only interested in doing named entity recognition,
0:55 well, then we can also just turn all of these other components off. We have to be a little bit careful when we do that,
1:02 because this named entity component does depend on this tok2vec component. There are dependencies in this thing.
1:08 But one thing that helps us with that is that there is this setting called enable, in the spacy.load method,
1:15 where we can say, well, let's just enable this one component of the pipeline,
1:18 and then spaCy internally will make sure that this NER component can still run, all the dependencies will be there,
1:25 but everything else will just be turned off. So let's rerun this. That now gives us a new nlp object. And let's run this code one more time,
1:38 to see if we can get a little bit more juice out of this. Ah, nice, that's again a fair bit quicker.
1:46 So in general, definitely be mindful if you're gonna only use a subset of a model, because you might have components missing if you're not careful.
1:54 But in this particular case, I'm only interested in a component that can do entity recognition for me, and I definitely welcome this speedup.

