Build An Audio AI Web App with Python and AssemblyAI Course

Build An Audio AI App
4.2 hours, 100% free
Take this course for FREE

Course Summary

If you work with audio and video content, a whole new world of possibilities has opened up with the advent of AI. It is now nearly instant and free to extract highly accurate text from the spoken word, and with that text we can expand the possibilities in many directions. In this course, we will build a realistic app that works with audio from podcasts. We will summarize episodes, generate transcripts, and even create specialized GPTs for that content. All of that will be done with amazing technologies such as FastAPI, Pydantic, MongoDB, HTMX, and AssemblyAI.

What students are saying

Just finished going through this, a great intro to mongodb and nosql in general. The introduction to Studio 3T by itself was worth it for me, amazing tool I didn't know before. Seeing how you designed the different parts of the application while explaining the logic behind, was great. Thanks for the content Michael Kennedy!
-- Shay E.

Source code and course GitHub repository

github.com/talkpython/audio-ai-with-assemblyai-course

What's this course about and how is it different?

You may find courses that work with text derived from audio. However, this course is about building a real-world app with full end-to-end functionality for many features. We don't just pull text from audio. We build transcripts, search, summaries, and a chat experience around it.

Here is a screenshot of one aspect of our demo application.

What topics are covered

In this course, you will:

  • Learn about AssemblyAI and their audio/video APIs
  • See how to use HTMX in a FastAPI application
  • Use Beanie and Pydantic to model data in MongoDB
  • Create transcripts from hours of audio content
  • Build a custom search engine on top of audio data and transcripts
  • Use HTMX for no-refresh active search UIs
  • Run background jobs using asyncio and Python
  • Monitor background work with HTMX's poll event for zero-touch UI updates (and zero JavaScript)
  • Learn practical techniques for prompt engineering when working with LLMs
  • Summarize large audio content using AssemblyAI's LeMUR LLM framework
  • Generate key moments for podcasts using AssemblyAI's LeMUR LLM framework
  • Create a conversational Q&A experience with the guests and hosts of podcast episodes (a la ChatGPT)
  • And lots more
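
To give a flavor of the background-job topic above, here is a minimal sketch using only the standard library. It is not the course's actual code: the names `jobs`, `transcribe`, `start_job`, and `job_status` are hypothetical, the MP3 URL is a placeholder, and the transcription call is simulated with a short sleep. The idea is that a web view starts the work and returns right away, while a second endpoint reports the status each time the page polls it.

```python
import asyncio
import uuid

# Hypothetical in-memory job registry; a real app would likely persist this.
jobs: dict[str, str] = {}


async def transcribe(job_id: str, audio_url: str) -> None:
    """Stand-in for a slow transcription request (simulated with a sleep)."""
    jobs[job_id] = "running"
    await asyncio.sleep(0.05)  # placeholder for the real API call
    jobs[job_id] = "done"


def start_job(audio_url: str) -> tuple[str, asyncio.Task]:
    """Schedule the work on the running event loop and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = "queued"
    # Keep a reference to the task so it is not garbage collected early.
    task = asyncio.create_task(transcribe(job_id, audio_url))
    return job_id, task


def job_status(job_id: str) -> str:
    """What a polling endpoint could return on each request."""
    return jobs.get(job_id, "unknown")


async def main() -> list[str]:
    job_id, task = start_job("https://example.com/episode.mp3")
    seen = [job_status(job_id)]  # the task has not had a chance to run yet
    await task                   # a web app would poll instead of awaiting
    seen.append(job_status(job_id))
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['queued', 'done']
```

In a web app, a page fragment that re-requests the status endpoint on a timer would play the role of `main` here, swapping in the finished result once the status reads "done".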

View the full course outline.

Who is this course for?

This course is for anyone who wants to work with speech to text and LLMs around audio content. You will need basic Python experience. See our Python for Beginners course if you are entirely new to Python.

Concepts backed by concise visuals

While exploring a topic interactively with demos and live code is very engaging, it can mean losing the forest for the trees. That's why when we hit a new topic, we stop and discuss it with concise and clear visuals.

Here's an example of the dramatic effect prompt engineering has on our app's output.

Example: Concepts backed by concise visuals

Get hands-on with almost every chapter

Learning a new platform / API is an interactive experience. That's why it's important to write code and explore the apps we are building during this course.

Every chapter that involves writing or reading code has a starter and a finished code snapshot. After you complete each chapter, we strongly encourage you to take its starter code and build out the features you saw created in that chapter.

This course is delivered in very high resolution

Example of 1440p high res video

This course is delivered in 1440p (4x the pixels of 720p). When you're watching the videos for this course, it will feel like you're sitting next to the instructor looking at their screen.

Every little detail, menu item, and icon is clear and crisp. Watch the introductory video at the top of this page to see an example.

Follow along with subtitles and transcripts

Each course comes with subtitles and full transcripts. The transcripts are available as a separate searchable page for each lecture. They are also available in course-wide search results to help you find just the right lecture.

Each course has subtitles available in the video player.

Who am I? Why should you take my course?

Who is Michael Kennedy?

My name is Michael, nice to meet you. ;) There are a few reasons I'm especially qualified to teach you Python.

 1. I'm the host of the #1 podcast on Python called Talk Python To Me. Over there, I've interviewed many of the leaders and creators in the Python community. I bring that perspective to all the courses I create.

 2. I've been a professional software trainer for over 10 years. I have taught literally thousands of professional developers in hundreds of courses throughout the world.

 3. Students have loved my courses. Here are just a few quotes from past students of mine.

"Michael is super knowledgeable, loves his craft, and he conveys it all well. I would highly recommend his training class anytime." - Robert F.
"Michael is simply an outstanding instructor." - Kevin R.
"Michael was an encyclopedia for the deep inner workings of Python. Very impressive." - Neal L.

The time to act is now

If you have thought of speech to text as overly slow, inaccurate, or expensive, it's time to have another look. With this course, you'll learn how to build remarkable applications around speech APIs. And even if you aren't super interested in speech to text, the course is a true showcase of FastAPI and HTMX, which will teach you many powerful design patterns.

The course is 100% free, so give it a try!

Course Outline: Chapters and Lectures

Welcome to the Course (13:09)

  • Welcome (0:44)
  • AI +'s and -'s (2:24)
  • What Could We Build? (3:41)
  • Technologies Used in the Course (2:11)
  • Course Table of Contents (3:16)
  • Meet Your Instructor (0:53)

Setup (8:05)

  • Setup Introduction (0:48)
  • Git the Code To Follow Along (0:55)
  • You'll Need Python! (1:25)
  • Compatible Editors (1:07)
  • PyCharm Pro for Free (0:30)
  • Running the DB (2:02)
  • Setup in Summary (1:18)

Tour of Starter App (19:30)

  • App Tour Introduction (0:44)
  • Code and Requirements (6:27)
  • Running MongoDB (in Docker) (3:29)
  • Playing with the Live App (2:09)
  • Tour of Code (6:41)

Feature 1: Transcripts (1:12:54)

  • Transcript Feature Introduction (2:09)
  • HTMX Primer (3:25)
  • Add the Transcript Actions (3:56)
  • Adding AI Action Views (2:51)
  • Passing Data to AI Views (2:25)
  • Starting and Monitoring Background Jobs (5:43)
  • Running Background Jobs like Transcribe (7:42)
  • AssemblyAI Library and Secret Keys (3:01)
  • Getting the MP3 URL (3:03)
  • First Transcript (8:05)
  • Storing the Transcript in the DB (8:54)
  • More Output on Completion (1:24)
  • UI for Transcription (16:02)
  • Using the Transcript in the UI (4:14)

Feature 2: Search (24:45)

  • Search Introduction (0:41)
  • Survey of New Code Items (1:56)
  • The Search Engine Basics (4:34)
  • Running the Search Engine (5:24)
  • Adding Search View (1:59)
  • Wiring Search to the UI (2:52)
  • Real Search Results (2:42)
  • Search in Full Glory (4:37)

Feature 3: Summarize (56:03)

  • Introduction to LLM Summaries (2:00)
  • Prompt Engineering, Really (9:01)
  • Web UI for Summarize (5:11)
  • Do We Have a Summary Already? (4:11)
  • Get the Podcast and Episode (1:30)
  • Creating the Prompt (7:31)
  • Creating Transcript as a Single Text String (2:56)
  • Calling the LeMUR API (9:09)
  • Cleaning up LLM Prompt Leakage (12:11)
  • AI Summary Steps in Review (2:23)

Feature 4: Chat Q and A (42:50)

  • AI Chat Introduction (1:45)
  • Enable Chat UI (5:04)
  • Actually Enabling Chat (3:33)
  • Starting a New Chat (6:52)
  • Chat with Episode UI (5:59)
  • Processing the Question (4:05)
  • Asking LeMUR (4:23)
  • Using the ask_lemur method for Real Questions (5:05)
  • Exploring the Podcasts with AI Q and A (6:04)

Wrap up (2:45)

  • Conclusion (2:45)

Appendix (14:14)

  • Running in VS Code (6:23)
  • More Web Design with Tailwind and PyCharm (5:42)
  • More Web Design with Tailwind and VSCode (2:09)