Getting Started with NLP and spaCy Transcripts
Chapter: Part 2: Exploring data with spaCy
Lecture: Kicking the tires
0:00
Alright, at this point in time we have our little generator that's able to give me a
0:08
full line from an episode and I can keep on asking the generator to give me back more
0:15
stuff. So that's good but let's now actually start using it together with spaCy. Import
0:20
spacy just for good measure, then spacy.load, let's go with the medium English model for now, and
0:28
what this allows me to do is it allows me to say well whenever I call next on a generator
0:32
let's just only grab the text for now. That is indeed a bit of text and that text is something
0:41
we can pass on to spaCy which will give us a document object. Now just for good measure
0:47
what I'm going to go ahead and do is I'm going to say from spacy.displacy I'm going to import
0:54
that render function that allows me to make a pretty chart and that's just going to be
1:00
a convenient way for me to explore this document. And this is a flow that I do kind of like
1:07
when I'm trying to get a feel of how well models behave on a dataset because I can just
1:12
keep running this cell, it's going to then grab the next line and I kind of get a nice
1:18
visual for all the stuff that it's been detecting. And in this case we can definitely see that
1:23
there are some entities in this text that the model is detecting correctly but there's also
1:26
some interesting things happening under the hood here. So let's just check when it comes
1:31
to "artificial intelligence (AI)": AI in this case is being detected as an organization
1:36
so that's an interesting prediction I suppose. But then we notice that what's good for a
1:41
trillion dollar that's a monetary amount that got detected correctly. But what's good for
1:46
a trillion dollar companies isn't necessarily good for people that's the theme of season
1:51
seven (which in this case got picked up as a date) of IRL, Mozilla, then I see Bridget
1:58
Todd as a name, season seven is being detected as a date here again and AI is being detected
2:03
again. So it's not immediately perfect but some of the predictions I hope do make sense.
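The notebook flow described so far can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: the helper names are made up for illustration, and it assumes the generator yields dictionaries with a "text" key, which the transcript does not spell out. The medium English model is assumed to be en_core_web_md.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def load_pipeline(name: str = "en_core_web_md"):
    """Load a spaCy pipeline (assumed here to be the medium English model)."""
    import spacy  # imported lazily so the helpers below can be tried without spaCy

    return spacy.load(name)


def entities_for_next_line(
    lines: Iterator[Dict[str, str]], nlp: Callable
) -> List[Tuple[str, str]]:
    """Grab the next line from the generator and list (entity text, label) pairs."""
    line = next(lines)        # assumption: each item is a dict with a "text" key
    doc = nlp(line["text"])   # running the pipeline gives a Doc object
    return [(ent.text, ent.label_) for ent in doc.ents]


def render_next_line(lines: Iterator[Dict[str, str]], nlp: Callable):
    """Render the entities of the next line; in a notebook this shows inline."""
    from spacy import displacy

    doc = nlp(next(lines)["text"])
    displacy.render(doc, style="ent")
    return doc
```

In a notebook you would call load_pipeline() once and then keep re-running a cell that calls render_next_line(lines, nlp), which moves one line further through the generator each time.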
2:10
Let's see if we can find another example. So okay I ran the cell a couple of times again
2:15
until I hit this big paragraph over here and again the model makes some good decisions
2:20
but also some curious ones. Week is spotted as a date that feels okay, annual is spotted
2:26
as a date that also feels okay, over a dozen, three or four that's a cardinal number that
2:32
also feels pretty good, the talkpython.fm/centurylaunchweek that's being detected as a person. And one
2:41
thing you are noticing here is that this spaCy model isn't exactly trained on this kind of
2:48
data. I do encourage you to do this exercise yourself for a bit just to get a bit of a
2:53
feel of what kind of things the model does well and what kind of things the model does
2:57
poorly. Under the hood I do think that the spaCy model does a lot of good for you on
3:01
your behalf but it is good to just observe that this is still a statistical model and
3:06
that there are all sorts of reasons why the results over here are not going to be perfect.
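If you want to do the exercise on more than one line at a time, one option is to tally which (text, label) pairs the model predicts over a sample of lines. This is only a sketch under assumptions: tally_entity_labels is a hypothetical helper, not something shown in the video, and it expects an iterable of plain strings plus a loaded spaCy pipeline.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Iterable


def tally_entity_labels(
    texts: Iterable[str], nlp: Callable, limit: int = 50
) -> Counter:
    """Count (entity text, label) pairs over the first `limit` texts."""
    counts: Counter = Counter()
    for text in islice(texts, limit):
        doc = nlp(text)
        # Every predicted entity span carries its text and a label such as ORG or DATE.
        counts.update((ent.text, ent.label_) for ent in doc.ents)
    return counts
```

Looking at counts.most_common(20) can surface repeated oddities, such as a URL being labelled as a person, faster than stepping through the lines one by one.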