Getting Started with NLP and spaCy in Jupyter Notebooks Course

Getting Started with NLP and spaCy

Course Summary

There is a lot of text data out there and maybe you're interested in getting structured data out of it. Maybe you can make do without machine learning, or maybe you want LLMs to help you. There are a lot of options, and this course will introduce you to the field by focusing on spaCy while also exploring other tools.

As a motivating example, we'll have a look at the transcripts from the Talk Python podcast to see if we can automatically detect Python packages in them. We'll clean the data, explore it with spaCy, and train custom models. By the end of the course, you'll understand how to set up a proper NLP project and we'll also explore techniques beyond spaCy.
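
To make the motivating example concrete, here is a minimal sketch of rule-based package detection with spaCy. It is an illustration, not the course's own code: the sample sentence, the `PACKAGE` label, and the short list of package names are all made up for this example, and a blank English pipeline is used so that no pretrained model has to be downloaded.

```python
import spacy

# Blank English pipeline: tokenizer only, no pretrained model download needed.
nlp = spacy.blank("en")

# Rule-based component that writes its matches to doc.ents.
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical label and package list, chosen only for this illustration.
package_names = ["pandas", "fastapi", "spacy"]
ruler.add_patterns(
    [{"label": "PACKAGE", "pattern": [{"LOWER": name}]} for name in package_names]
)

# Made-up, transcript-like sentence.
doc = nlp("We moved our API from Flask to FastAPI and kept pandas for reports.")

# Each entity is a span with its text and a label.
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

A fixed word list like this misses packages it has never seen, which is one reason the course goes on to train custom models.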

What students are saying

Effective PyCharm course is awesome. I have been using the IDE for a little while but you've opened up a whole world of features I never knew existed.
-- Nader S

Source code and course GitHub repository

github.com/talkpython/nlp-with-python-and-spacy-course

What's this course about and how is it different?

This course explores NLP with a focus on spaCy. You will be able to follow along by downloading the code, but the videos themselves offer plenty of whiteboarding moments that explain why the code is written the way it is.

What topics are covered

In this course, you will:

  • Learn the building blocks of spaCy including tokens, spans, documents and entities
  • See what the spaCy base models can detect on your behalf
  • Understand why generators can really make it easier to handle text files
  • Set up spaCy such that it can handle large datasets
  • Understand how spaCy entities can be interpreted with flexibility
  • Build spaCy pipelines with business rules
  • Work with spaCy projects
  • Train custom spaCy models on your own annotated data
  • Understand the value of annotating your own data
  • Learn how to prompt ChatGPT to tackle NLP tasks
  • Configure LLMs for NLP via spaCy-LLM
  • Leverage the Hugging Face ecosystem via GLiNER and SpanMarker
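
The generator point in the list above can be sketched with the standard library alone. The file name, the sample text, and the cleaning rule (trim whitespace, skip blank lines) are assumptions for illustration; the real transcripts may need different cleaning. Because the function yields one line at a time, the whole file never has to sit in memory.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def stream_lines(path: Path) -> Iterator[str]:
    """Lazily yield non-empty, stripped lines from a text file."""
    with path.open(encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line:
                yield line


# Write a tiny made-up transcript so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    transcript = Path(tmp) / "episode.txt"
    transcript.write_text(
        "Welcome to the show.\n\n  Today we talk about spaCy.  \n",
        encoding="utf-8",
    )

    lines = stream_lines(transcript)  # nothing is read yet
    first = next(lines)               # reads just enough to produce one line
    rest = list(lines)                # drains the remainder and closes the file

print(first)
print(rest)
```

A generator like this can be handed directly to spaCy's `nlp.pipe`, which accepts any iterable of texts and processes them in batches.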

Who is this course for?

This course is for anyone who knows a bit of Python and wants to dip their toes into NLP. While the course does discuss machine learning, it does not rely on any knowledge of maths. The course will focus on spaCy as a primary tool but it will also discuss how to set up projects for NLP in general. At the end of the course, we will also discuss related tools like large language models, and how these can be used effectively in NLP projects.

If you're curious about NLP or just feel like expanding your data skills in a new domain, then this course is for you!

We do assume that you know core Python syntax and concepts such as virtual environments and external packages. If you are entirely new to Python, see our Python for Beginners course.

Concepts backed by concise visuals

While exploring a topic interactively with demos and live code is very engaging, it can mean losing the forest for the trees. That's why when we hit a new topic, we stop and discuss it with concise and clear visuals.

Vincent uses a drawing tablet to explain what the code is doing by drawing over the code in real time. This allows him to add plenty of context and it will also help you understand which steps need to happen and why. Here's an animated graphic that shows what it might look like.

Example: Concepts backed by concise visuals

Get hands-on with almost every chapter

Learning new language features and concepts is an interactive experience. That's why it's important to play around with the code from this course yourself. The code for this course can be found on GitHub.

This course is delivered in very high resolution

Example of 1440p high res video

This course is delivered in 1440p (4x the pixels of 720p). When you're watching the videos for this course, it will feel like you're sitting next to the instructor looking at their screen.

Every little detail, menu item, and icon is clear and crisp. Watch the introductory video at the top of this page to see an example.

Follow along with subtitles and transcripts

Each course comes with subtitles and full transcripts. The transcripts are available as a separate searchable page for each lecture. They are also available in course-wide search results to help you find just the right lecture.

Each course has subtitles available in the video player.

Who am I? Why should you take my course?

Who is Vincent D. Warmerdam?

My name is Vincent, nice to meet you. ;) I'm a senior data professional who has worked as an engineer, researcher, team lead, and educator. I'm well known for my PyData talks as well as many side projects for machine learning practitioners. In particular, I maintain calmcode.io, where people can learn how to code … calmly.

The time to act is now

If this course is a good fit for you and your goals, you should jump right in. It goes deep but doesn't skimp on the foundational background. You'll start to see benefits for your code right away. And it comes with a 100% money back guarantee (see details), so it's 100% risk free as well.

Course Outline: Chapters and Lectures

Welcome to the Course (8:26)

  • What the course is about (1:39)
  • The outline of the course (1:19)
  • Course requirements (1:37)
  • About the instructor (2:02)
  • Git the code (1:49)

Setup (9:09)

  • Installing packages (3:26)
  • Some things about Jupyter (5:43)

Part 1: spaCy syntax (22:17)

  • Tokens (4:36)
  • Properties (4:33)
  • Displacy (4:57)
  • Document properties (3:42)
  • Spans (4:29)

Part 2: Exploring data with spaCy (28:55)

  • Diving into transcripts (2:11)
  • Cleaning transcripts (4:11)
  • Why generators? (3:17)
  • Kicking the tires (3:10)
  • Testing a product hypothesis (2:37)
  • Performance: Part 1 (4:18)
  • Performance: Part 2 (2:01)
  • Giving the setup a spin (4:38)
  • Performance: Part 3 (2:32)

Part 3: spaCy Projects (38:06)

  • Introduction (1:31)
  • What is an NLP project (2:33)
  • Annotation (5:17)
  • spaCy projects (5:00)
  • Converting data (4:17)
  • ML Config (3:43)
  • Training the model (6:09)
  • Project finals (4:37)
  • Pragmatism (4:09)
  • Beyond projects (0:50)

Part 4: NLP with huggingface and LLMs (37:31)

  • Introduction (0:46)
  • Language support in spaCy (2:13)
  • spaCy plugins (2:55)
  • Spanmarker (4:57)
  • Text classification (2:07)
  • ChatGPT for NLP (2:03)
  • Prompting (4:57)
  • spaCy-LLM (3:11)
  • Boosting spaCy-LLM performance (5:54)
  • spaCy remains super useful (3:39)
  • GliNER (4:49)

Wrap up (2:15)

  • Next steps (2:15)

Questions? Send us an email: contact@talkpython.fm
