Getting Started with NLP and spaCy Transcripts
Chapter: Part 4: NLP with huggingface and LLMs
Lecture: Language support in spaCy

0:00 So let's discuss some related things you might find interesting about spaCy.
0:06 And before talking about all the different plugins, I feel that maybe I should spend some time
0:11 talking about these different models. Now, because this course is in English,
0:16 what I've effectively done is I've taken English models. These models are great, but there are also lots of other languages you might be interested in.
0:26 If you go to the models section of the spaCy documentation, you can have a look at all these different pipelines
0:31 that have been trained beforehand. So, for example, let's go to Dutch.
0:36 spaCy doesn't support every language, but when it supports a language with machine learning tools,
0:41 then typically there will be a small, medium, and large model.
0:46 There's lots of details that might matter here, but in general, the smaller models are a little bit less performant
0:51 in terms of entity recognition abilities, but they are really nice and fast and definitely lightweight.
0:56 In this case, the small model is only 12 megabytes, which is pretty nice.
1:01 If I were to contrast that with the large model for Dutch, then we can see that it comes in at about 500 megabytes, which is a whole lot bigger.
1:11 The medium model sits somewhere in the middle, and I can generally recommend going for the medium model
1:16 when you're just trying to get started. The medium model will usually
1:21 work just fine, but know that there is a large model available, too. Some languages, though not all of them, will also have a TRF model attached,
1:31 which is an abbreviation for transformer. Now, the interesting thing with those models is that they might not be the biggest download in terms of megabytes,
1:41 but they use a so-called transformer architecture under the hood, which is very heavyweight
1:51 in terms of compute. It does depend on what you're doing exactly, but you may need a GPU
1:56 in order to run these models comfortably. These models have pretty good performance statistics,
2:01 and again, if you're just getting started, I would definitely go with the medium model instead,
2:06 but it is good to know that these models exist as well, in case you're interested in them, and in case you've got access to a GPU.
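To make the naming scheme concrete, here is a small sketch of how spaCy's pipeline names fit together: `[lang]_core_[genre]_[size]`, where the size suffix is `sm`, `md`, `lg`, or `trf`. The genre part of the name varies per language (Dutch pipelines are trained on news text, English ones on web text), so treat the mapping in this helper as an illustrative assumption rather than a complete rule; check the spaCy models page for the actual name of any given pipeline.

```python
def pipeline_name(lang: str, size: str) -> str:
    """Build a spaCy pipeline name like 'nl_core_news_md'.

    Assumption for illustration: English pipelines use 'web', the
    other languages shown in the docs use 'news'. Verify the exact
    name on the spaCy models page before downloading.
    """
    genre = "web" if lang == "en" else "news"
    return f"{lang}_core_{genre}_{size}"

print(pipeline_name("nl", "sm"))   # small Dutch pipeline, the lightweight one
print(pipeline_name("nl", "md"))   # the medium pipeline recommended above
print(pipeline_name("en", "trf"))  # transformer pipeline; may need a GPU

# Once a pipeline is downloaded, e.g. with
#   python -m spacy download nl_core_news_md
# you would load it in code with spacy.load("nl_core_news_md").
```

The helper only builds the name string; downloading and loading still happen through the `spacy download` CLI and `spacy.load`, as sketched in the closing comment.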

