Getting Started with NLP and spaCy Transcripts
Chapter: Part 2: Exploring data with spaCy
Lecture: Performance: Part 2
0:00
In the previous video we got a speed boost by using the nlp.pipe method, but there is also another improvement that we can make.
0:09
To help explain it, let's just dive into this nlp object a little bit, because there is this pipeline object inside of it
0:17
that tells us what kind of components are actually active.
0:22
I can see for example that there is a tagger, that there is a parser, that there is a lemmatizer, and also a named entity recognition component.
0:31
To dive in a bit deeper, this tagger component, that's a component that's making sure that each token has a part of speech attribute attached.
0:39
So that would be stuff like, is this token a verb or a noun? There's also a grammatical parser.
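A minimal sketch of the inspection described above. The model name is an assumption, since the lecture does not say which model is loaded; if that model is not installed, the sketch falls back to an untrained stand-in pipeline so that the listing still runs.

```python
import spacy

# Assumption: the lecture does not name its model; "en_core_web_sm" is a guess.
MODEL_NAME = "en_core_web_sm"

try:
    nlp = spacy.load(MODEL_NAME)
except OSError:
    # Model not installed: build an untrained stand-in pipeline so the
    # inspection below still runs. Its components cannot make real predictions.
    nlp = spacy.blank("en")
    for name in ("tagger", "parser", "ner"):
        nlp.add_pipe(name)

# Names of the components that run, in order, when nlp(text) is called.
print(nlp.pipe_names)

# The same information as (name, component) pairs.
for name, component in nlp.pipeline:
    print(name, type(component).__name__)
```

The exact list of names depends on the model and the spaCy version, so the printed output may differ from what is shown in the video.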
0:45
And all these components are in general pretty dang useful. But if in our case we are only interested in doing named entity recognition,
0:55
well, then we can also just turn all of these other components off. We have to be a little bit careful when we do that,
1:02
because this named entity component does depend on this tok2vec component; there are dependencies in this thing.
1:08
But one thing that helps us with that is that there is this setting called enable, in the spacy.load function,
1:15
where we can say, well, let's just enable this one component of the pipeline,
1:18
and then spaCy internally will make sure that this NER component can still run, all the dependencies will be there,
1:25
but everything else will just be turned off. So let's rerun this. That now gives us a new nlp object. And let's run this code one more time,
1:38
to see if we can get a little bit more juice out of this. Ah, nice, that's again a fair bit quicker.
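The enable setting mentioned above might be used roughly as follows. This is a sketch under assumptions: the model name is a guess, and the enable argument of spacy.load is only available in more recent spaCy v3 releases. If the model is not installed, the sketch imitates the same effect on an untrained stand-in pipeline with nlp.select_pipes, which is a different call from the one used in the lecture.

```python
import spacy

# Assumption: the lecture does not name its model; "en_core_web_sm" is a guess.
MODEL_NAME = "en_core_web_sm"

try:
    # Keep only the named entity recognizer active; every other component
    # is still loaded but disabled, so it is skipped at run time.
    nlp_ner = spacy.load(MODEL_NAME, enable=["ner"])
except OSError:
    # Model not installed: imitate the effect on an untrained stand-in
    # pipeline. select_pipes disables everything except the named pipes.
    nlp_ner = spacy.blank("en")
    for name in ("tagger", "parser", "ner"):
        nlp_ner.add_pipe(name)
    nlp_ner.select_pipes(enable=["ner"])

print(nlp_ner.pipe_names)  # components that will actually run
print(nlp_ner.disabled)    # components that are loaded but switched off
```

With a trained model, nlp_ner can then be passed texts through nlp_ner.pipe(...) exactly as before; only doc.ents should be relied on, since attributes set by the disabled components will be missing.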
1:46
So in general, definitely be mindful if you're gonna only use a subset of a model, because you might have components missing if you're not careful.
1:54
But in this particular case, I'm only interested in a component that can do entity recognition for me, and I definitely welcome this speedup.