Getting Started with NLP and spaCy Transcripts
Chapter: Part 2: Exploring data with spaCy
Lecture: Kicking the tires

0:00 All right, at this point we have our little generator that's able to give me a
0:08 full line from an episode, and I can keep asking the generator to give me back more
0:15 lines. So that's good, but now let's actually start using it together with spaCy. I'll import
0:20 spaCy just for good measure, call spacy.load, and go with the medium English model for now. Then,
0:28 whenever I call next on the generator,
0:32 I'll only grab the text for now. That is indeed a bit of text, and that text is something
0:41 we can pass on to spaCy, which will give us a Doc object. Now, just for good measure,
0:47 I'm going to go ahead and import, from spacy.displacy,
0:54 the render function, which lets me make a pretty chart. That's just going to be
1:00 a convenient way for me to explore this document. This is a flow I quite like
1:07 when I'm trying to get a feel for how well models behave on a dataset, because I can just
1:12 keep running this cell; it grabs the next line, and I get a nice
1:18 visual of everything the model has detected. In this case we can definitely see that
1:23 there are some entities in this text that are being detected correctly, but there are also
1:26 some interesting things happening under the hood. So let's check: when it comes
1:31 to "artificial intelligence (AI)", AI in this case is being detected as an organization,
1:36 which is an interesting prediction, I suppose. Then we notice that "a
1:41 trillion dollar" is a monetary amount, and that got detected correctly. "What's good for
1:46 trillion-dollar companies isn't necessarily good for people": that's the theme of season
1:51 seven of IRL, and "season seven" in this case got picked up as a date. Then there's Mozilla, and then I see Bridget
1:58 Todd, which is a name. "Season seven" is detected as a date here again, and AI is being detected
2:03 again. So it's not immediately perfect, but I hope some of the predictions do make sense.
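The workflow described here (load the medium English model, pull the next line's text from the generator, turn it into a Doc, and render the entities with displaCy) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the course's own code: `episode_lines()` is a hypothetical stand-in for the lecture's generator and is assumed to yield dicts with a "text" key, and `next_doc`, `entity_html`, and `describe_entities` are illustrative helper names. The medium English model is published as `en_core_web_md`, and `spacy.explain` turns labels such as ORG, MONEY, DATE, CARDINAL, and PERSON into short descriptions, which helps when reading the chart.

```python
from typing import Iterator

import spacy
from spacy.displacy import render


def episode_lines() -> Iterator[dict]:
    """Hypothetical stand-in for the lecture's generator.

    Assumption: each item is a dict with a "text" key. The real generator
    reads full lines from the episode transcripts; these are placeholders.
    """
    yield {"text": "Placeholder line that mentions Mozilla and Bridget Todd."}
    yield {"text": "Second placeholder line about AI."}


def next_doc(nlp, lines):
    """Call next() on the generator, keep only the text, and return a Doc."""
    text = next(lines)["text"]
    return nlp(text)


def entity_html(doc) -> str:
    """Render entity highlights as an HTML string (jupyter=False)."""
    return render(doc, style="ent", jupyter=False)


def describe_entities(doc):
    """List (text, label, description) for every entity the model predicted."""
    return [(ent.text, ent.label_, spacy.explain(ent.label_)) for ent in doc.ents]


# Notebook usage (assumes `python -m spacy download en_core_web_md` was run):
#   nlp = spacy.load("en_core_web_md")
#   lines = episode_lines()
#   # Rerun this cell to step through the lines one at a time:
#   doc = next_doc(nlp, lines)
#   render(doc, style="ent")
#   describe_entities(doc)
```

Inside Jupyter, `render(doc, style="ent")` draws the highlighted text inline, so rerunning a cell that calls `next_doc` and then `render` steps through the episode one line at a time. Passing `jupyter=False`, as `entity_html` does, returns the HTML markup as a string instead, which is handy outside a notebook.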
2:10 Let's see if we can find another example. Okay, I ran the cell a few more times
2:15 until I hit this big paragraph over here, and again the model makes some good decisions
2:20 but also some curious ones. "Week" is spotted as a date, which feels okay; "annual" is spotted
2:26 as a date, which also feels okay; "over a dozen" and "three or four" are cardinal numbers, which
2:32 also feels pretty good; but talkpython.fm/centurylaunchweek is being detected as a person. One
2:41 thing you'll notice here is that this spaCy model wasn't exactly trained on this kind of
2:48 data. I do encourage you to do this exercise yourself for a bit, just to get a
2:53 feel for what kinds of things the model does well and what kinds of things it does
2:57 poorly. Under the hood, I do think the spaCy model does a lot of good work on
3:01 your behalf, but it's good to observe that this is still a statistical model and
3:06 that there are all sorts of reasons why its results are not going to be perfect.
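One way to do this exercise at a larger scale is to run a batch of lines through `nlp.pipe` and tally what the model predicts per label, keeping a few examples of each. This is an illustrative sketch, not part of the course: `survey_entities` is a hypothetical helper, and flagging PERSON entities that contain "/", "." or "@" is a crude heuristic aimed at the URL-detected-as-a-person mistake seen in this lecture. It will also flag legitimate names written with initials, so treat its output as a list of things to look at, not as errors.

```python
from collections import Counter, defaultdict
from itertools import islice


def survey_entities(nlp, texts, n=200, max_examples=5):
    """Run up to n texts through the pipeline and summarize entity predictions.

    Returns (label_counts, examples_per_label, suspicious_people).
    """
    counts = Counter()
    examples = defaultdict(list)
    suspicious = []
    # nlp.pipe processes the texts as a stream, which is faster than
    # calling nlp() once per line.
    for doc in nlp.pipe(islice(texts, n)):
        for ent in doc.ents:
            counts[ent.label_] += 1
            if len(examples[ent.label_]) < max_examples:
                examples[ent.label_].append(ent.text)
            # Crude heuristic (assumption): person names rarely contain
            # URL-like characters, so these are worth a second look.
            if ent.label_ == "PERSON" and any(ch in ent.text for ch in "/.@"):
                suspicious.append(ent.text)
    return counts, dict(examples), suspicious


# Usage with the real model (assumes en_core_web_md is installed and that a
# hypothetical `lines` generator yields dicts with a "text" key):
#   nlp = spacy.load("en_core_web_md")
#   counts, examples, suspicious = survey_entities(nlp, (row["text"] for row in lines))
#   print(counts.most_common())
#   print(suspicious)
```

Looking at `examples` per label is usually more informative than the raw counts: it shows at a glance whether, say, DATE is catching real dates or phrases like "season seven".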

