Getting Started with NLP and spaCy Transcripts
Chapter: Part 1: spaCy syntax
Lecture: Spans

0:00 So far in these videos, we've been talking about some of the building blocks in spaCy. So what we've seen is that we have a doc object,

0:09 a document, and that it has some tokens. That's all well and good, but we also saw this thing called an entity. And there's an interesting thing there

0:20 because an entity, even though it is definitely part of a document,

0:26 can actually contain one or more tokens. So you might wonder what is up with that. In short, an entity is an instance of a concept

0:37 that we haven't explained yet, called a span. And a span can be thought of as a sequence of consecutive tokens. And to maybe help explain that,

0:46 I'll go ahead and explore that with some code right now. I have my sentence here, "Hi, my name is Vincent." That gives me a document.

0:55 And just to confirm, this is the representation of the document. It looks like a string, but it's actually a spaCy document.

1:04 The type is being confirmed here. And I can do the same thing for the first token in that document. So just for good measure,

1:11 let's just grab that thing, that's the token "Hi". And we can confirm that that's indeed a token. But let's now grab some more.

1:21 So this is grabbing the first two tokens. That will be "Hi" plus the comma over here. Those are two separate tokens.
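What is on screen here can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: it assumes a blank English pipeline (spacy.blank), which only tokenizes, so that no trained model has to be downloaded.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Assumption: a blank English pipeline; it only tokenizes, which is all we need here.
nlp = spacy.blank("en")
doc = nlp("Hi, my name is Vincent.")

first_token = doc[0]   # indexing with an integer returns a Token
first_two = doc[:2]    # slicing returns a Span

print(type(doc).__name__)            # Doc
print(type(first_token).__name__)    # Token
print(type(first_two).__name__)      # Span
print([t.text for t in first_two])   # ['Hi', ',']
```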

1:34 And the type of those two tokens together, attached like this, that's a span. Now we will remember that because one property that this document has

1:46 is it has all the available entities. And we can confirm that Vincent is indeed an entity on that document.

1:53 So let's loop over that: for ent in doc.ents. Let's print that. So we can see that the entity Vincent is actually a span.
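A rough sketch of that loop is below. One loud assumption: in the video a trained pipeline detects the entity, while this sketch assigns the PERSON entity by hand on a blank pipeline so that it runs without a model download.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Hi, my name is Vincent.")

# Assumption: mark "Vincent" (token 5) as a PERSON entity by hand.
# A trained pipeline would normally fill doc.ents for us.
doc.ents = [Span(doc, 5, 6, label="PERSON")]

for ent in doc.ents:
    # Each item in doc.ents is a Span, not a separate entity class.
    print(ent.text, type(ent).__name__)
```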

2:10 It's not a separate entity class. It is really just a span object. And spans also have a couple of properties.

2:18 So they have a start and an end index. In this case, that means that the span starts at token index five and ends at index six, where the end index is exclusive.

2:30 So let's count one, two, three, four, five. That's where it starts. And then six where it ends. So that seems correct.
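The counting above can be checked with a small sketch. It again assumes a hand-assigned PERSON entity on a blank pipeline rather than a trained model. Note that token indices count from zero and that the end index is exclusive.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Hi, my name is Vincent.")

# Assumption: the PERSON entity is assigned by hand instead of by a trained model.
doc.ents = [Span(doc, 5, 6, label="PERSON")]
ent = doc.ents[0]

# Print every token with its index; "Vincent" sits at index 5.
for token in doc:
    print(token.i, token.text)

print(ent.start, ent.end)            # 5 6
print(doc[ent.start:ent.end].text)   # Vincent
```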

2:40 But I can also query for the starting character and the ending character. Depending on what the use case is, you might be more interested

2:49 in where the characters start and end. Now at this point, you might wonder, well, if an entity is just a span, what makes it so special?
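For the character offsets just mentioned, a minimal sketch could look like this, under the same assumption that the entity is set by hand on a blank pipeline. The start_char and end_char attributes index into the original text of the document.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Hi, my name is Vincent.")

# Assumption: the PERSON entity is assigned by hand instead of by a trained model.
doc.ents = [Span(doc, 5, 6, label="PERSON")]
ent = doc.ents[0]

print(ent.start_char, ent.end_char)             # 15 22
# Slicing the raw text with these offsets gives back the entity text.
print(doc.text[ent.start_char:ent.end_char])    # Vincent
```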

2:57 And the primary reason is that an entity has a label that is attached. So we can confirm that this span, this Vincent span, so to say,

3:09 that has a person label attached. We can see that through this property. And that's not the case for this span that I can select,

3:19 like the first three tokens. If I were to query for the label there, it is going to tell me that it's an empty string.

3:25 So this label is something that I would only expect on a span that is actually an entity in a sentence. So that's just really good to remember.
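The difference in labels can be sketched like this. As before, the PERSON entity is assigned by hand, which is an assumption made to avoid depending on a trained model. The slice doc[:3] stands in for "some other span" and is an illustrative choice.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Hi, my name is Vincent.")

# Assumption: the PERSON entity is assigned by hand instead of by a trained model.
doc.ents = [Span(doc, 5, 6, label="PERSON")]
ent = doc.ents[0]

# An arbitrary slice of the document is also a Span, but it carries no label.
plain_span = doc[:3]

print(repr(ent.label_))         # 'PERSON'
print(repr(plain_span.label_))  # ''
```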

3:34 But moreover, the reason why we need a span here, that's related to the fact that an entity can have more than one token in it. So as we can see now,

3:44 if I were to change my name to my first and last name, then the entity updates to this full name over here. That's the entity that's being detected.

3:53 I need something that can represent that. And that's what we have the span for inside of spaCy. Now, again, this span needs to have tokens

4:02 that are consecutive. So first name and then last name, but you can't skip tokens in the middle. It all has to be sequential.
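A multi-token entity can be sketched as follows. Two assumptions: "Smith" is a made-up last name used purely for illustration, and the entity is assigned by hand on a blank pipeline instead of being detected by a trained model.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
# Assumption: "Smith" is a hypothetical last name, not the one used in the video.
doc = nlp("Hi, my name is Vincent Smith.")

# A Span is defined by a start and an (exclusive) end index,
# so it always covers a consecutive run of tokens.
doc.ents = [Span(doc, 5, 7, label="PERSON")]
ent = doc.ents[0]

print(ent.text, len(ent))    # Vincent Smith 2
print([t.i for t in ent])    # [5, 6]
```

Because a span only stores where it starts and where it ends, there is no way to leave out a token in the middle.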

4:10 And we can have many different kinds of spans. We can select many of them, but typically the entities as found on this doc.ents property,

4:18 those will have a label that we are typically interested in. So maybe in summary, an entity in spaCy is a span,

4:26 but not every span in spaCy is an entity.
