Getting Started with NLP and spaCy Transcripts
Chapter: Part 3: spaCy Projects
Lecture: What is an NLP project

0:00 So I'm just going to draw out schematically what kind of things I need in my NLP project, just to get the project structure going.

0:10 So one thing I've got, let's draw that over here, are my transcripts.

0:16 These are the things that were spoken inside of a podcast, and there's stuff in here that I would like to predict.

0:22 However, if I'm going to have a machine learning model learn anything, then I will also need to have some labels.

0:29 I will need to figure out some sort of way to turn at least a subset of these transcripts into a subset that I will call annotated.

0:39 And just to give a quick example, if I have a sentence, something like "Python is nice",

0:46 then this annotated subset would have that sentence, but also something that indicates that Python over here, that is a tech tool, let's say.

0:53 And I need to have some sort of data set where my machine learning model is able to learn from these annotated patterns.
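
To make the "Python is nice" example concrete, here is a minimal sketch of what one annotated record could look like. The dictionary layout and the label name "TECH_TOOL" are assumptions made for illustration, not a format taken from the lecture.

```python
# Hypothetical annotated example: the text plus character offsets for each labeled span.
# The field names and the "TECH_TOOL" label are illustrative assumptions.
text = "Python is nice"
annotated_example = {
    "text": text,
    "spans": [{"start": 0, "end": 6, "label": "TECH_TOOL"}],
}

# Recover the labeled substring from its offsets.
for span in annotated_example["spans"]:
    print(text[span["start"]:span["end"]], span["label"])
```

Storing offsets instead of the bare word keeps the annotation unambiguous when the same word appears more than once in a sentence.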

1:01 Once I've got my annotated subset, there's actually another step, and that is to maybe prepare this data set for training.

1:10 There's a little bit of a detail here. Typically what you want to do is have one set of data that you are going to

1:16 train on, and another set of data that you're going to use for evaluation.
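
The split described here could be sketched roughly as follows. The function name `split_examples`, the 20% evaluation fraction, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_examples(examples, eval_fraction=0.2, seed=0):
    """Shuffle the examples and split them into (train, evaluation) lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


train, evaluation = split_examples([f"sentence {i}" for i in range(10)])
print(len(train), len(evaluation))
```

Keeping the evaluation examples out of training is what lets the evaluation score say something about text the model has not seen.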

1:23 Then this training data set over here, that can be used to train a machine learning model.

1:29 And that machine learning model, maybe we want to be able to package that.

1:33 And as you can see from this little overview, I do hope that you appreciate that there are actually a bunch of steps here that depend on each other.

1:41 And it'd be nice if we can structure our project accordingly.
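
One way to picture steps that depend on each other is as a small dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the step names are assumptions taken loosely from the diagram, and this is not how any particular project tool is implemented.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
steps = {
    "annotate": {"transcripts"},
    "prepare": {"annotate"},
    "train": {"prepare"},
    "package": {"train"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())
print(order)
```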

1:44 Another aspect of this: suppose that I've got my annotated subset over here. Well, then I can train a machine learning model.

1:54 But if this subset doesn't change, then there's also no need to retrain this machine learning model.

1:58 So there's also something I would like to have in the system that is going to prevent unnecessary work.
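
A generic way to prevent unnecessary work is to remember a hash of a step's input and skip the step when the hash has not changed. The sketch below illustrates that idea with the standard library only; the helper names `is_up_to_date` and `record_run` and the file names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_up_to_date(input_path, state_path):
    """True if the input's hash matches the hash recorded by the last run."""
    state_file = Path(state_path)
    if not state_file.exists():
        return False
    return json.loads(state_file.read_text())["hash"] == file_hash(input_path)


def record_run(input_path, state_path):
    """Store the input's current hash after the step has finished."""
    Path(state_path).write_text(json.dumps({"hash": file_hash(input_path)}))


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "annotated.jsonl"
    state = Path(tmp) / "train.state.json"
    data.write_text('{"text": "Python is nice"}\n')
    first = is_up_to_date(data, state)    # nothing recorded yet
    record_run(data, state)               # pretend training just ran
    second = is_up_to_date(data, state)   # input unchanged
    data.write_text('{"text": "spaCy is nice"}\n')
    third = is_up_to_date(data, state)    # input changed
print(first, second, third)
```

Comparing content hashes instead of modification times means that re-saving a file without changing it does not trigger a retrain.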

2:04 So hopefully this diagram paints you a picture of what we need. We are going to need separate steps in this entire process.

2:12 But before diving into the code, what I would just like to do first is just give a glimpse of how to do this part.

2:22 Creating proper training data is an art in and of itself. But there are things that we have at our disposal to make this easier.

2:29 And I'm going to discuss that first before moving on to how I'm going to implement this project structure.
