Getting Started with NLP and spaCy Transcripts
Chapter: Part 4: NLP with huggingface and LLMs
Lecture: GliNER

0:00 In my mind the way that you're gonna do NLP projects doesn't change. You're always gonna want to iterate.
0:06 But it is pretty common that you're gonna see new tools pop up once in a while.
0:11 It's a fast moving field, but once in a while there's a tool that just sticks out as being quite useful.
0:17 And when I was wrapping up this course I just noticed this GLiNER library that really does something that I think is likable.
0:25 It is not quite an LLM, it's a bit more lightweight than that. But this package is part of a research paper that you can go ahead and read.
0:33 There's a link on the GitHub repo. But in essence the thinking here is that maybe we can have a very, very, very small LLM-like model
0:43 that is able to run locally on your machine. In effect this would allow you to still get prompt-like behavior to get your named entities,
0:52 but you don't have to send any data to a third party. This model is designed to be somewhat lightweight.
0:58 So to give the quickest demo of this, I've installed the library and I have a little demo here that shows you how you can use it.
1:05 You gotta make sure it's installed. Then to load up a model what you gotta do is you gotta give it a string that points to a type of model.
1:14 I'm giving it one of the more lightweight ones right now. And then this model can make predictions on your behalf when you give it text
1:21 as well as labels to go ahead and detect. In this case I'm saying, well, let's just go for Python tools.
1:28 And in the cell block below over here, what I'm basically doing is I'm looping over all the lines that I've got for my transcripts.
1:38 And then I'm giving that model one line at a time as well as this list of labels that I would like to go ahead and detect.
1:47 That will give me some entities and I'm just gonna show you some of the entities that it was able to detect.
1:52 And keep in mind the only thing that I'm passing here are these labels. It's just a list. This is all the context that the model really has.
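
The demo described above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook from the video: the checkpoint name `urchade/gliner_small-v2.1` is an assumption (the lecture only says "one of the more lightweight ones"), and `load_model` and `detect_entities` are hypothetical helper names introduced here for illustration.

```python
def load_model(name="urchade/gliner_small-v2.1"):
    """Load a GLiNER checkpoint (assumed name); needs `pip install gliner`."""
    # Imported lazily so the helper below can be used without the library installed.
    from gliner import GLiNER
    return GLiNER.from_pretrained(name)


def detect_entities(model, lines, labels, threshold=0.5):
    """Give the model one line at a time plus the labels, collect (text, label) pairs."""
    found = []
    for line in lines:
        # predict_entities returns a list of dicts with keys such as
        # "text", "label", "start", "end" and "score".
        for ent in model.predict_entities(line, labels, threshold=threshold):
            found.append((ent["text"], ent["label"]))
    return found


# Example usage (downloads the model on first run):
# model = load_model()
# detect_entities(model, ["We monitor our Django app with Sentry."], ["python tool"])
```

The list of labels is the only context the model receives, so changing the label text is the main lever you have for steering what gets detected.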
2:02 And out of the box it is able to detect a bunch of things that I would argue are pretty relevant.
2:07 I got a couple of Django and HTML detections over here. It's also able to detect Sentry. But we can also see that it makes a few subtle mistakes.
2:17 For example, Vue over here. To my knowledge that's a JavaScript library, not a Python tool. It's not the worst kind of mistake.
2:26 But similarly I can also see that Sentry-launch-week also gets detected as a Python tool over here.
2:33 And we can scroll down and see some other interesting examples. Async I think is not necessarily a Python tool.
2:41 PyCon and DjangoCon are great conferences but they're not really Python tools either.
2:46 And you can see that there's like subtle mistakes but it does get it right in the realm of Python, so to say.
2:52 And it's even able to sometimes deal with these odd spellings like lowercase fast space API.
2:59 This is not how I would spell the package but it is able to detect that it's referring to it.
3:04 So even though the results over here are not perfect I do feel that it's relevant to mention this.
3:11 Because models like this are incredibly useful when you're annotating datasets.
3:15 You only have to pass a label and even though the predictions are not going to be perfect you are going to get some predictions.
3:22 And when you're annotating it's typically a lot easier to say "Is this correct? Yes/No" than to really manually annotate everything by hand.
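
One way to picture that annotation workflow is to turn each prediction into a yes/no question. The sketch below is an illustration only: `to_review_items` and the record layout are hypothetical and not tied to any particular annotation tool. It assumes entity dicts with `text`, `label`, `start` and `end` keys, which is the shape GLiNER predictions come in.

```python
def to_review_items(line, entities):
    """Turn each predicted entity into a yes/no question for an annotator."""
    items = []
    for ent in entities:
        items.append({
            "text": line,                           # the full line, for context
            "span": (ent["start"], ent["end"]),     # character offsets of the entity
            "label": ent["label"],
            "question": f'Is "{ent["text"]}" a {ent["label"]}?',
            "answer": None,                         # filled in later: True or False
        })
    return items


# A hand-written prediction, standing in for real model output.
line = "Vue is popular"
predictions = [{"text": "Vue", "label": "python tool", "start": 0, "end": 3}]
review_items = to_review_items(line, predictions)
```

The annotator then only has to accept or reject each item, which tends to go much faster than marking spans from scratch.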
3:32 Another thing I would like to emphasize with this model is that this is part of a research paper so
3:36 who knows how well the support is going to be for this model going forward.
3:39 But just the fact that you can run models like this locally that is super nice.
3:45 And it's also no surprise that I think within a month of this paper being out there
3:50 together with this package a community member actually made a plugin for spaCy. So if you were to go to theirstory/gliner-spacy
4:01 you are going to find a plugin that you can go ahead and use. And in effect it works in a very similar way.
4:07 spaCy will want some configurations so you're able to set some settings like "Hey, what labels would I like to have?" etc.
4:14 You're able to add that as a pipeline step to the NLP pipeline which is also just
4:19 kind of nice. And then you also get these entities but they're now part of your normal spaCy workflow.
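
As a rough sketch of what that pipeline step can look like: the factory name `gliner_spacy` and the config keys below follow the plugin's README as best I recall it, so treat them as assumptions and check the repository before relying on them. The checkpoint name is also an assumption, and `build_pipeline` is a hypothetical helper name.

```python
# Settings for the plugin. Key names are assumed from the gliner-spacy README.
GLINER_CONFIG = {
    "gliner_model": "urchade/gliner_small-v2.1",  # assumed lightweight checkpoint
    "labels": ["python tool"],                    # what we would like to detect
    "style": "ent",                               # write results to doc.ents
    "threshold": 0.5,                             # minimum score to keep an entity
}


def build_pipeline(config=GLINER_CONFIG):
    """Create a blank English pipeline with GLiNER added as a component."""
    # Imported lazily; needs `pip install spacy gliner-spacy`.
    import spacy
    from gliner_spacy.pipeline import GlinerSpacy  # noqa: F401 (registers the factory)

    nlp = spacy.blank("en")
    nlp.add_pipe("gliner_spacy", config=config)
    return nlp


# Example usage (downloads the model on first run):
# nlp = build_pipeline()
# doc = nlp("We monitor our Django app with Sentry.")
# print([(ent.text, ent.label_) for ent in doc.ents])
```

Because the predictions end up on `doc.ents`, anything downstream in the pipeline can use them like any other spaCy entity.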
4:25 But I hope you agree this is kind of a nice example to maybe end with because it shows that despite the fact that NLP is a somewhat fast-moving field
4:34 the fact that we have different components that can always click into a spaCy pipeline
4:38 that's definitely kind of nice and it really fits the vibe of spaCy.
4:42 The goal is to have a somewhat general pipeline for NLP projects and spaCy is just a really really useful tool for that because it's nice and flexible.

