Build An Audio AI App Course

Build An Audio AI App
4.2 hours, 100% free
Take this course for FREE
This course is carbon neutral.

Course Summary

If you work with audio and video content, a whole new world of possibilities has opened up with the advent of AI. It is now nearly instant and free to extract highly accurate text from the spoken word, and with that text we can expand the possibilities in many directions. In this course, we will build a realistic app that works with audio from podcasts. We will summarize episodes, generate transcripts, and even create specialized GPTs for that content. All of that will be done with amazing technologies such as FastAPI, Pydantic, MongoDB, HTMX and AssemblyAI.

What students are saying

I was very looking forward to this [course] and I must say that it did not disappoint me. The course is really amazing and Michael Kennedy did it very well. I highly recommend it!
-- Tomas

Source code and course GitHub repository

github.com/talkpython/audio-ai-with-assemblyai-course

What's this course about and how is it different?

You may find courses that work with text derived from audio. However, this course is about building a real-world app with full end-to-end functionality for many features. We don't just pull text from audio. We create a whole world.

Here is a screenshot of one aspect of our demo application.

What topics are covered

In this course, you will:

  • Learn about AssemblyAI and their audio/video APIs
  • See how to use HTMX in a FastAPI application
  • Use Beanie and Pydantic to model data in MongoDB
  • Create transcripts from hours of audio content
  • Build a custom search engine on top of audio data and transcripts
  • Use HTMX for no-refresh active search UIs
  • Run background jobs using asyncio and Python
  • Monitor background work with HTMX's poll event for zero-touch UI updates (and zero JavaScript)
  • Learn practical techniques for prompt engineering when working with LLMs
  • Summarize large audio content using AssemblyAI's LeMUR LLM framework
  • Generate key moments for podcasts using AssemblyAI's LeMUR LLM framework
  • Create a conversational Q&A experience with the guests and hosts of podcast episodes (a la ChatGPT)
  • And lots more
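
To give a feel for the background-job and polling topics listed above, here is a minimal sketch using only Python's standard library. It is not the course's code: `JobStore`, its methods, and the job ID `episode-1` are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever storage a real app would use. In a web app, something like `poll()` would sit behind an endpoint that the page requests on a timer (for example via an HTMX polling trigger such as `hx-trigger="every 2s"`).

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class JobStore:
    """In-memory job status table (hypothetical; a real app might persist this)."""

    status: dict[str, str] = field(default_factory=dict)
    tasks: set[asyncio.Task] = field(default_factory=set)

    def start(self, job_id: str, seconds: float) -> None:
        # Record the job as running, then schedule the work without awaiting it.
        self.status[job_id] = "running"
        task = asyncio.create_task(self._work(job_id, seconds))
        # Keep a reference so the task is not garbage collected while running.
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)

    async def _work(self, job_id: str, seconds: float) -> None:
        # Stand-in for a slow operation such as waiting on a transcript.
        await asyncio.sleep(seconds)
        self.status[job_id] = "done"

    def poll(self, job_id: str) -> str:
        # What a polling endpoint could return to the page.
        return self.status.get(job_id, "unknown")


async def demo() -> list[str]:
    store = JobStore()
    store.start("episode-1", 0.05)
    seen = [store.poll("episode-1")]      # checked right away
    await asyncio.sleep(0.2)              # the page would poll again later
    seen.append(store.poll("episode-1"))  # checked after the work finished
    seen.append(store.poll("missing"))    # a job ID that was never started
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The key design point is that `start()` returns immediately, so a request handler can respond right away while the work continues on the event loop, and the UI learns about completion by asking again rather than by holding a connection open.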

View the full course outline.

Who is this course for?

This course is for anyone who wants to work with speech to text and LLMs around audio content. You will need basic Python experience. See our Python for Beginners course if you are entirely new to Python.

Concepts backed by concise visuals

While exploring a topic interactively with demos and live code is very engaging, it can mean losing the forest for the trees. That's why when we hit a new topic, we stop and discuss it with concise and clear visuals.

Here's an example of the dramatic effect prompt engineering has on our app's output.


Get hands-on with almost every chapter

Learning a new platform / API is an interactive experience. That's why it's important to write code and explore the apps we are building during this course.

Every chapter that involves writing or reading code has a starter and finished code snapshot. We strongly encourage you to take the starter code for each chapter and, once you have completed the chapter, build out the features you saw created in it.

This course is delivered in very high resolution

Example of 1440p high res video

This course is delivered in 1440p (4x the pixels of 720p). When you're watching the videos for this course, it will feel like you're sitting next to the instructor looking at their screen.

Every little detail, menu item, and icon is clear and crisp. Watch the introductory video at the top of this page to see an example.

Follow along with subtitles and transcripts

Each course comes with subtitles and full transcripts. The transcripts are available as a separate searchable page for each lecture. They also are available in course-wide search results to help you find just the right lecture.

Each course has subtitles available in the video player.

Who am I? Why should you take my course?

Who is Michael Kennedy?

My name is Michael, nice to meet you. ;) There are a couple of reasons I'm especially qualified to teach you Python.

 1. I'm the host of the #1 podcast on Python called Talk Python To Me. Over there, I've interviewed many of the leaders and creators in the Python community. I bring that perspective to all the courses I create.

 2. I've been a professional software trainer for over 10 years. I have taught literally thousands of professional developers in hundreds of courses throughout the world.

 3. Students have loved my courses. Here are just a few quotes from past students of mine.

"Michael is super knowledgeable, loves his craft, and he conveys it all well. I would highly recommend his training class anytime." - Robert F.
"Michael is simply an outstanding instructor." - Kevin R.
"Michael was an encyclopedia for the deep inner workings of Python. Very impressive." - Neal L.

The time to act is now

If you have thought of speech to text as overly slow, inaccurate, or expensive, it's time to have another look. With this course, you'll learn how to build remarkable applications around speech APIs. And even if you aren't super interested in speech to text, the course is a true showcase of FastAPI and HTMX, and it will teach you many powerful design patterns.

The course is 100% free, so give it a try!

Course Outline: Chapters and Lectures

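Running times by chapter (chapter total, then each lecture):

  • Chapter 1 (13:09): 0:44, 2:24, 3:41, 2:11, 3:16, 0:53
  • Chapter 2 (7:35): 0:48, 0:55, 1:25, 1:07, 2:02, 1:18
  • Chapter 3 (19:30): 0:44, 6:27, 3:29, 2:09, 6:41
  • Chapter 4 (1:12:54): 2:09, 3:25, 3:56, 2:51, 2:25, 5:43, 7:42, 3:01, 3:03, 8:05, 8:54, 1:24, 16:02, 4:14
  • Chapter 5 (24:45): 0:41, 1:56, 4:34, 5:24, 1:59, 2:52, 2:42, 4:37
  • Chapter 6 (56:03): 2:00, 9:01, 5:11, 4:11, 1:30, 7:31, 2:56, 9:09, 12:11, 2:23
  • Chapter 7 (42:50): 1:45, 5:04, 3:33, 6:52, 5:59, 4:05, 4:23, 5:05, 6:04
  • Chapter 8 (2:45): 2:45
  • Chapter 9 (14:14): 6:23, 5:42, 2:09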