Getting Started with NLP and spaCy Transcripts
Chapter: Part 4: NLP with huggingface and LLMs
Lecture: GliNER

0:00 In my mind the way that you're gonna do NLP projects doesn't change. You're always gonna want to iterate.

0:06 But it is pretty common that you're gonna see new tools pop up once in a while.

0:11 It's a fast moving field, but once in a while there's a tool that just sticks out as being quite useful.

0:17 And when I was wrapping up this course I just noticed this GLiNER library that really does something that I think is likable.

0:25 It is not quite an LLM, it's a bit more lightweight than that. But this package is part of a research paper that you can go ahead and read.

0:33 There's a link on the GitHub repo. But in essence the thinking here is that maybe we can have a very, very, very small LLM-like model

0:43 that is able to run locally on your machine. In effect this would allow you to still get prompt-like behavior to get your named entities,

0:52 but you don't have to send any data to a third party. This model is designed to be somewhat lightweight.

0:58 So to give the quickest demo of this, I've installed the library and I have a little demo here that shows you how you can use it.

1:05 You gotta make sure it's installed. Then to load up a model what you gotta do is you gotta give it a string that points to a type of model.

1:14 I'm giving it one of the more lightweight ones right now. And then this model can make predictions on your behalf when you give it text

1:21 as well as labels to go ahead and detect. In this case I'm saying, well, let's just go for Python tools.

1:28 And in the cell block below over here, what I'm basically doing is I'm looping over all the lines that I've got for my transcripts.

1:38 And then I'm giving that model one line at a time as well as this list of labels that I would like to go ahead and detect.

1:47 That will give me some entities and I'm just gonna show you some of the entities that it was able to detect.

1:52 And keep in mind the only thing that I'm passing here are these labels. It's just a list. This is all the context that the model really has.
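A minimal sketch of the loop described above, not the exact notebook from the lecture. The checkpoint name `urchade/gliner_small-v2.1` and the label string `"python tool"` are assumptions (the lecture only mentions "one of the more lightweight ones" and "Python tools"), and `load_model` and `collect_entities` are hypothetical helper names.

```python
from typing import Any, Iterable

# Assumed checkpoint and label; swap in whatever model and labels you actually use.
MODEL_NAME = "urchade/gliner_small-v2.1"
LABELS = ["python tool"]


def load_model(name: str = MODEL_NAME) -> Any:
    """Load a GLiNER model from a model-name string.

    Requires `pip install gliner`; the weights are downloaded on first use
    and the model then runs locally.
    """
    from gliner import GLiNER  # imported lazily so the helper below works without it

    return GLiNER.from_pretrained(name)


def collect_entities(
    model: Any,
    lines: Iterable[str],
    labels: list[str],
    threshold: float = 0.5,
) -> list[dict]:
    """Give the model one line at a time, plus the list of labels to detect."""
    found = []
    for line in lines:
        # The labels are the only context the model gets besides the text itself.
        for ent in model.predict_entities(line, labels, threshold=threshold):
            found.append({"line": line, "text": ent["text"], "label": ent["label"]})
    return found
```

With the library installed, something like `collect_entities(load_model(), transcript_lines, LABELS)` would return the detections, where `transcript_lines` stands in for your own list of strings.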

2:02 And out of the box it is able to detect a bunch of things that I would argue are pretty relevant.

2:07 I got a couple of Django and HTML detections over here. It's also able to detect Sentry. But we can also see that it makes a few subtle mistakes.

2:17 For example, Vue over here. To my knowledge that's a JavaScript library, not a Python tool. It's not the worst kind of mistake.

2:26 But similarly I can also see that Sentry-launch-week also gets detected as a Python tool over here.

2:33 And we can scroll down and see some other interesting examples. Async I think is not necessarily a Python tool.

2:41 PyCon and DjangoCon are great conferences but they're not really Python tools either.

2:46 And you can see that there's like subtle mistakes but it does get it right in the realm of Python, so to say.

2:52 And it's even able to sometimes deal with these odd spellings like lowercase "fast api" with a space.

2:59 This is not how I would spell the package but it is able to detect that it's referring to it.

3:04 So even though the results over here are not perfect I do feel that it's relevant to mention this.

3:11 Because models like this are incredibly useful when you're annotating datasets.

3:15 You only have to pass a label and even though the predictions are not going to be perfect you are going to get some predictions.

3:22 And when you're annotating it's typically a lot easier to say "Is this correct? Yes/No" than to really manually annotate everything by hand.
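To make the "Is this correct? Yes/No" idea concrete, here is a rough sketch of turning model predictions into review tasks. It assumes each prediction is a dict with `start`, `end` and `label` keys, which is the shape GLiNER's `predict_entities` returns; the `to_review_tasks` and `accepted` helpers and the task layout are hypothetical, not part of any annotation tool.

```python
def to_review_tasks(line: str, entities: list[dict]) -> list[dict]:
    """Turn predicted spans into yes/no questions for an annotator."""
    tasks = []
    for ent in entities:
        span = line[ent["start"]:ent["end"]]
        tasks.append(
            {
                "text": line,
                "span": span,
                "label": ent["label"],
                "question": f'Is "{span}" a {ent["label"]}?',
                "answer": None,  # the annotator fills in True or False
            }
        )
    return tasks


def accepted(tasks: list[dict]) -> list[dict]:
    """Keep only the predictions the annotator confirmed."""
    return [task for task in tasks if task["answer"] is True]
```

The annotator only confirms or rejects each suggestion, and the confirmed ones can later serve as training data.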

3:32 Another thing I would like to emphasize with this model is that this is part of a research paper so

3:36 who knows how well the support is going to be for this model going forward.

3:39 But just the fact that you can run models like this locally that is super nice.

3:45 And it's also no surprise that I think within a month of this paper being out there

3:50 together with this package a community member actually made a plugin for spaCy. So if you were to go to theirstory/gliner-spacy

4:01 you are going to find a plugin that you can go ahead and use. And in effect it works in a very similar way.

4:07 spaCy will want some configurations so you're able to set some settings like "Hey, what labels would I like to have?" etc.

4:14 You're able to add that as a pipeline step to the NLP pipeline which is also just

4:19 kind of nice. And then you also get these entities but they're now part of your normal spaCy workflow.
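A hedged sketch of what adding the plugin as a pipeline step might look like. The factory name `gliner_spacy` and the config keys `gliner_model`, `labels` and `style` are assumptions based on the plugin's README, so check them against the version you install; the checkpoint name and the `make_config` and `build_nlp` helpers are illustrative only.

```python
def make_config(labels, model_name="urchade/gliner_small-v2.1"):
    """Settings for the pipeline step (key names assumed from the plugin README)."""
    return {
        "gliner_model": model_name,  # assumed checkpoint name
        "labels": list(labels),      # "what labels would I like to have?"
        "style": "ent",              # assumed to write results to doc.ents
    }


def build_nlp(labels):
    """Create a blank English pipeline with the GLiNER step added.

    Requires `pip install spacy gliner-spacy`.
    """
    import spacy
    from gliner_spacy.pipeline import GlinerSpacy  # noqa: F401  (registers the factory)

    nlp = spacy.blank("en")
    nlp.add_pipe("gliner_spacy", config=make_config(labels))
    return nlp
```

With both packages installed, `doc = build_nlp(["python tool"])("I build sites with Django.")` should give you a regular spaCy `Doc`, and the detections can then be read from `doc.ents` like any other entities.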

4:25 But I hope you agree this is kind of a nice example to maybe end with because it shows that despite the fact that NLP is a somewhat fast-moving field

4:34 the fact that we have different components that can always click into a spaCy pipeline

4:38 that's definitely kind of nice and it really fits the vibe of spaCy.

4:42 The goal is to have a somewhat general pipeline for NLP projects and spaCy is just a really really useful tool for that because it's nice and flexible.
