13:09

0:44
Hello and welcome to build an audio AI app with Python and Assembly AI.
In this course, we're gonna explore how we can apply machine learning through the Assembly AI platform to audio to build some really, really fun and insightful applications.
We're gonna build a cool website with FastAPI that lets us turn audio into text in all different ways, and then build on top of that information to provide rich details that you otherwise wouldn't be able to get.
So if you've ever wanted to work with audio in your Python application and create a cool service around it, follow along.
It's gonna be really fun.
2:24
We're all familiar with large language models and advanced machine learning that have really come on the scene in a huge way in the last couple of years.
Now, some of these features can be kind of frivolous.
Like here's a funny joke.
Imagine this, this cool guy over there, he thinks he's beat the system.
He's like, you know what?
I gotta send these business emails.
They gotta be real formal, but I don't wanna write a lot.
I don't wanna write those formal emails.
So here's what I'm gonna do.
I'm gonna write two bullet points, and then I'm gonna hit a button and say, Hey AI, flesh this out, make it sound real professional and send it off, okay?
Off it goes, lands over on someone's desk.
They come back and check their email.
They're like, ""Hmm, really?
All of this?
I gotta read all of this stuff?
You know what?
I don't need this.
I got AI, I'm gonna push a button.
It's gonna reduce this down to two bullet points and tell me exactly what I need instead of all this fluff, right?"
Maybe we should just learn how to communicate better and no one has to go through this, right?
So a lot of AI features, they feel like this.
However, there are areas where AI and machine learning really, really do work super well in constructive ways.
One of those ways is audio.
So imagine we've got some audio file like a podcast, for example, and we apply machine learning to it, maybe through Assembly AI's platform.
And what do we get back out?
Yeah, it's interesting.
Mark Shannon definitely has vision.
He's developed a plan as of years ago, but we finally were able to put him in a position where he could do something about it.
And we've all been pitching in.
A lot of it has to do with just applying some of the general ideas that are out there regarding dynamic language and optimization.
Things have been applied to other things like LLVM or various JavaScript runtimes and so on.
So here's a transcript created through Assembly AI from an episode of the Talk Python to Me podcast.
That is word for word perfect of what was said on that show.
How incredible is this?
It truly is magical.
Now, if we take this, maybe mix in a little large language models, then we can start to do really interesting things with machine learning and audio.
And this is the kind of stuff we're gonna explore, these useful AI features in this course.
3:41
So we're inspired to work with audio and do something amazing.
What could we build?
Well, the possibilities are pretty endless.
We can go this way or that, but I've got a really clear idea of something super fun that we can build for this course.
We're gonna build something I'm gonna call X-Ray Podcasts.
This is an app.
What does it do?
Well, as the name kind of indicates, it lets you look inside of a podcast.
It lets you look and see more about it.
So obviously I run a couple of podcasts, but I'm also an avid podcast listener.
Do you know what's really boring and not great? When there's one paragraph of show notes, no links, not much about the episode.
Do you wanna listen to that one or not?
That particular episode?
I don't know.
You don't have enough information to make that decision, do you?
Well, with X-Ray Podcasts, you'll be able to get a TL;DR, a little high-action-point summary of what's happened throughout the show.
We'll be able to get transcripts.
We'll be able to search in them, all of that kind of stuff.
So that's what we're gonna build, this X-Ray Podcasts app.
And here's what it's gonna look like.
Starts out, you can just enter a podcast URL here.
The app is going to download and parse the podcast through its RSS feed, import it into the system, and save it in a database. And then it will start to apply its magic through machine learning on the Assembly AI platform, plus some search engines and those kinds of things.
So here we can start by adding a podcast.
Then we can go into the details.
For example, here's one of the recent Talk Python ones.
You can see it's pulled in the show notes from the RSS feed, but more importantly, see those purple blue buttons there?
Create a transcript.
Don't have one yet.
And we actually supply one at Talk Python, but many podcasts don't have transcripts. X-Ray Podcasts is gonna let you create them even if the host didn't provide them.
Create a summary, a TLDR and important action items and even have a chat, a Q&A with the podcast content itself.
That's amazing.
Click that transcript button.
We'll get really cool looking transcripts like this.
Press the summary button.
We'll get the TLDR and other things.
Here you can see "Sydney Runkle discusses her career journey in software engineering and offers advice to aspiring programmers," et cetera, et cetera.
That's not something I wrote in the show notes, and Sydney didn't say exactly that. Assembly AI read the episode and generated that summary for us.
Super cool.
And once we have all this extra information, then we can deeply search within podcasts.
If you search for a podcast today, you get show notes, you get the title, things like that.
Very, very limited information.
This app that we're gonna build, this X-Ray Podcasts, is gonna allow you to search all of that extra information we just talked about, surfaced through AI.
So this app is gonna be super fun to build.
We're not gonna build it from File > New Project, completely from scratch. That would mean spending a lot of time on things like web design and frameworks and database design, all those things.
We're just gonna start from an empty shell of an app and we're gonna add these features.
We're gonna add search, add the transcripts and those kinds of things.
So if you're not a web developer, don't let this scare you away.
We're gonna start with a pretty decent working version of the website that you won't have to change a whole lot, and then just add these cool features. It's still gonna be cool to see how they surface in a real app.
2:11
What technologies are we gonna use, you know, frameworks and things like that in this course?
Well, there's two core ones.
We're gonna use the Assembly AI service, machine learning or AI as a service around audio, and that's gonna be really cool.
And we're gonna plug that into this web app that we just saw, which is built on FastAPI.
Yes, you didn't really see too many APIs there; you can build full websites with FastAPI as well, and there are a lot of benefits to doing so.
So we'll see how to do that.
These are the two big pieces we're gonna be working with.
There's also a bunch of supporting technologies.
We have Pydantic, which normally comes with FastAPI, and we may use it with FastAPI a little bit, but primarily what we're gonna be doing is using it with MongoDB and Beanie.
So the database backend that's gonna allow us to do all the cool stuff, like save all that information, do lightning fast queries, and build a search engine, we're gonna build that all on top of Mongo.
We're gonna run that infrastructure through Docker.
So don't worry, you don't have to install anything on your computer other than Docker, and you don't need to know how MongoDB works.
Just a line or two of commands, boom, it's up and running.
But if you do care about exploring that, well, you'll be able to go a little deeper in the app and see what's happening.
And finally, we're gonna be using HTMX.
We're gonna make this a real dynamic, cool, lively app without going to things like React and all the consequences of going down one of those front-end framework paths.
And we're just gonna keep it real simple and use FastAPI and HTMX to create a dynamic and interactive site without actually reloading the pages all the time.
Kind of a several-page app, not quite a single-page app.
Now, you don't need to know most of these.
We're gonna focus on Assembly AI, and they've got a really nice Python SDK, so there's not a lot we actually have to figure out there.
And these others, they're gonna be in the background where you need to know a thing or two.
I'll teach you about them along the way, but they're mostly gonna play a supporting role for what we're doing in this course.
3:16
So what are we gonna cover in this course?
We talked about the app that we're gonna build, but specifically, what are the chapters and how are we gonna break up our time?
We're gonna start out by welcoming you to the course.
Congratulations, you're most of the way through that actually.
Then we're gonna be focusing a little bit on setup and making sure you can get this app up and running.
You know, making sure you can get your editor configured, run the couple of commands that we're gonna need to, set up virtual environments, and get the database running so it can just chill in the background and do its magic.
So we'll talk about that just to make sure everything's good to go.
Then I'll give you a tour of that starter code, that starter application.
Remember I said we're not gonna start from complete scratch because that would add a lot of time around something that's not really central to the idea of building audio AI apps, although important, not central.
So I'll take that starter code and we'll just spend a little time understanding how all the pieces fit together so that you can jump in and kind of make the code your own as we go.
Then we're gonna add four core features to this application.
Number one, the ability to generate transcripts using machine learning.
And we'll be able to do this not just from some files that we download, but from basically anything on the internet that relates to a podcast, which is gonna be really cool. So that's the first step.
Feature number two is gonna be adding deep search.
Once you have all the information about the podcast, plus word by word content of what is inside of that audio file, well, it becomes much more interesting to add search to that situation, right?
So we're gonna add search.
We'll see there's some really cool things we can do to make that really fast and also really user-friendly using HTMX.
So this is gonna be a super fun part.
Building on search, we were excited about what we got out of transcripts.
What about summarization?
So we'll bring in some large language model features from LeMUR at Assembly AI.
And we'll be able to ask questions like, give me a TLDR summary of this podcast, or give me action items, or, you know, it's open-ended.
It's an LLM.
So we're gonna do a bunch of interesting things to get additional information on top of just the spoken word, the show notes, and those kinds of things.
And then we'll fold that, of course, back into feature two to enhance our search further.
Final thing, we're going to go and build a chat with the podcast episode.
So imagine this, there's a podcast episode that interviews one of your heroes or somebody you're a fan of, somebody who's really focused on some topic, you know, whatever it is out there in the world.
Wouldn't it be cool to kind of have a Q&A, open-ended conversation with them?
Well, we're gonna come up with something along those lines using LLMs and chats with all this information that we built up.
It's gonna be pretty awesome.
And that's it.
We'll just wrap up the course and make sure that you're good to go and build, you know, all set to go build whatever it is you want, armed with all this awesome knowledge.
So thank you for taking the course, and hopefully this roadmap looks exciting to you.
It's gonna be a lot of fun, what we're gonna build.
0:53
Finally, as we close out this chapter, let me just quickly introduce myself.
Hey there, I'm Michael Kennedy.
You may know me from other courses or from the podcasts that I run, but if you haven't, it's great to meet you.
You can find more about me, the essays I write, and links to many of my personal things over at mkennedy.codes.
I'm the founder and host of the Talk Python to Me podcast, co-host of the Python Bytes podcast, founder and one of the principal authors here at Talk Python Training.
And finally, I am a Python Software Foundation fellow, which is a really cool honor as well.
If you want to get in touch with me, find me over on Mastodon, where I'm @mkennedy@fosstodon.org.
You can also find me on X, where I'm @mkennedy, if you wish, though I spend more time at Mastodon these days.
Great to meet you.
Welcome to the course.
Let's dive into some content.
8:05

0:48
In this short chapter, I'm going to show you everything you need to know to make sure that your system is ready to run the app and edit the code and all those different things so that in case you want to follow along, which I highly recommend, you'll be able to do so.
Of course, if you just want to kick back and watch, you know, feel free.
That's great.
That's totally great.
But if you do want to follow along, you'll get more out of it.
So I encourage you to, through each chapter, watch what I'm doing for a while, pause the videos, go over and work on it yourself, and then come back to the videos and carry on.
So if that's your goal, then you definitely want to pay attention here.
And if not, you know, you'll just see what it requires to run the app anyway.
0:55
The first thing is you're going to want to clone and probably star or fork the repo for the course.
Here you can see it's at github.com/talkpython/audio-ai-with-assembly-ai-course.
And just go over there.
You can also just click the "Take me to the repo" button in your video player here.
And this is going to have all the code that you see me write as well as the starter code.
The starter is the simplified, not-yet-AI-enabled version of the app. You don't see the rest yet because we haven't written it yet.
We're going to do that together throughout this course, but I'll put the starter program in here and we'll start from there and go.
So I recommend you pause right now, go make sure you have the code for the course, and then we'll carry on.
1:25
Newsflash, you're gonna need Python to run the code in a Python course.
Actually, not too surprising, is it?
Well, there's a little bit more detail I want to give you.
You're gonna need Python 3.9 or higher.
Right now, 3.12 is the latest, so I recommend that you have 3.12. And if you're watching this later, after I've recorded it (of course you are, I'm not gonna ship it instantly, it's not a livestream), you might as well have whatever the latest version of Python is.
At a very minimum, you're gonna require 3.9.
There's certain things we're doing with Python type hints and those types of things that will not work on Python 3.8 or lower.
I've had people send me messages: "Michael, you've written bad Python code, it doesn't work. Look, it says that this thing is not understood, and I have Python 3, and I'm trying to run it and it doesn't work." Guess what? They had Python 3.8. Python 3.9 introduced new syntax, and on and on.
Every version brings a little bit something new, but the backwards compatibility story is really strong with Python.
So get the latest one.
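To make that concrete, here's a tiny illustration, my own example, not code from the course, that runs on Python 3.9 or higher but fails on 3.8, because built-in generics in annotations only became legal in 3.9.

```python
# Built-in generics like list[...] / dict[...] in annotations need 3.9+.
# On Python 3.8 this raises: TypeError: 'type' object is not subscriptable.
def total_seconds(durations: list[int]) -> int:
    return sum(durations)

def by_title(episodes: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    return {ep["title"]: ep for ep in episodes}

print(total_seconds([330, 1250, 760]))  # 2340
```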
If you don't know if you have Python, if you don't know how to check, or you want to make sure that you're just doing it in a good way, you can go over here to Talk Python Training, and we have an article that is kind of a choose-your-own-adventure.
Are you on Mac?
Have you installed it this way?
Et cetera.
And it'll help you get all set up if you don't know.
1:07
Once you can run Python, you will need to be able to write Python and edit Python.
So we're going to need an editor.
I'm going to be using PyCharm for this course.
I'm a huge fan of PyCharm.
I will show you some awesome stuff if you're not familiar with it, and maybe encourage you to be a fan as well.
So get that over at JetBrains.com/pycharm.
There's a free community edition and there's a paid edition.
The community edition should be more than enough for this course.
So you don't need to pay any money for it.
So recommend that.
If for some reason you're on the other side of the fence, and these days the other side of the fence pretty much means VS Code, you're welcome to use VS Code.
This is a great app as well.
Just make sure when you get it, you go to the extensions, that little set of boxes clicking together, and that you install the Python tooling for VS Code.
Because without that, it doesn't really do that much.
It does suggest it, I believe, if you try to open up a Python file and it doesn't have it.
So PyCharm or VS Code, but if you get VS Code, make sure it has its Python capabilities clicked into place.
0:30
Good news, I've worked with the folks over at JetBrains to get you a six-month free trial for PyCharm Professional.
It'll save you around $60 and give you a chance to use some of the paid features like support for the web frameworks, FastAPI, Flask, Django, and others like that, the JavaScript frameworks such as React, as well as the data science features.
Just use the link jetbrains.com/store/redeem and the PyCharm-for-Talk-Python code, with the capitalization you see on the screen, to redeem it.
2:02
The final piece of infrastructure or tooling that you're going to need is you're gonna need to be able to run the database.
It doesn't really matter what database we're using 'cause a lot of that's abstracted.
You won't need to see it or know how to work with it, but we're using the free and open source database MongoDB.
Now to run it, if you already have MongoDB installed and running on the default port 27017, that's fine.
Just let the app create a database over there.
It will do that on its own and it's off to the races.
You don't have to worry about that at all.
But I imagine most of you out there don't have MongoDB just running in the background.
You're not like me 'cause I work with it all the time.
I recommend that we use Docker because with Docker, it's one line to install, i.e., docker pull, and then one more line to run MongoDB without it leaving any traces on your computer.
Just run it in the most lightweight, isolated way as we can.
So you can do that if you're on Mac or Windows by getting Docker Desktop.
If you're on Linux, you can just get real Docker; go to the Docker site and it'll show you how to install it there.
But if you're on Mac or Windows, download Docker Desktop, you can see right here.
This is a free app, and as long as it's running in the background, the Docker commands will work. One thing I've been playing with lately that you might wanna check out is something called OrbStack, at orbstack.dev. This is also an alternative desktop environment for Docker containers, and it has some really cool features. For example, you can see here that this mongo-server container is running and has a little folder icon; you can click that and, in this case on a Mac, open up a Finder window and start exploring and editing the container's files, really getting a look inside. Similar things work on Windows.
So use one of these two things, if you're on Windows or Mac, to run Mongo; don't worry about how we do that yet. We'll talk about that in just a minute. But I recommend this, unless you already have Mongo, in which case you don't need Docker.
1:18
So in summary, what do we need to run our app?
What do you got to do to get your system ready?
I know there's a few steps here, but I really wanted to put together an app that felt like a real app and not just a toy.
So that's a couple of steps.
Number one, you have Python 3.9 or higher, ideally 3.12 or higher, but you know, go with the minimum version if for some reason you have restrictions there.
3.9 is the minimum.
Go star and clone the GitHub repo. If you want to make changes and push them back, maybe fork it and then clone your fork, so that you have permission to push without changing it for everyone.
Install a proper editor, an editor that understands whole projects. We'll see there are many moving pieces in web apps, that's just the nature of the web, and having a project-structure-aware editor like PyCharm or VS Code is invaluable.
We're going to need to run the database, so either have Mongo installed directly and be happy to run it that way, or if you don't, install Docker Desktop, either the standard version or OrbStack.
You can check those both out and see what you like.
And that's it.
Once you get this stuff ready, you know, don't go on to the next chapter till you get these things done.
They're all super quick.
And then let's move on and get into the code.
19:30

0:44
You know what we're gonna cover in the course, and you know how to get your computer set up, at least what's required for it.
We'll talk a little more about that in this chapter.
It's time to dig into the code, isn't it?
So in this chapter, because we're starting from a somewhat complicated app, since I want it to be realistic, I wanna walk you through a little bit of the code, some of the moving parts, before we start adding features to it. Again, take this as an FYI.
You don't need to follow along.
You don't need to be an expert in say, FastAPI or whatever it happens to be that we're doing.
We'll work in little tiny pieces as we go.
I just wanna give you the big picture when we get started, okay?
So let's dive in.
6:27
Here we are in the GitHub repo, and let's just start where I recommended you start.
Let's go up here and clone it.
Notice down here, in the code folder, we now have the 00 starter app, and there are gonna be more: 01 for this feature, 02 for that feature, and so on.
But down here, it even comes with instructions on what to do, so this is what we're gonna talk through, but it's also written down right here in the readme.
Step number one, we're gonna go and clone it.
Now, I'm just gonna clone it to the desktop for now so you can see what's happening, then I'll move it somewhere else later that you don't need to worry about.
Also, I have a non-default terminal; this one is Warp, and I recommend you check out Warp if you're on Mac or Linux.
Maybe later, they said they're working on a Windows version, but it's not out yet.
So sometimes people ask me, Michael, what is this weird terminal thing you've got going on?
That's what it is.
So we're on the desktop.
We'll just git clone, and we'll just put in the URL.
You don't need the .git, by the way.
You can have it, but it's not required.
And here it is over on the desktop.
It'll be called audio something or other.
So you can see we've got our readme; that's not the one I was talking about, though. That one just describes the overall project.
We go into code, and we'll have the starter app here.
And then this is the place that we're gonna be working.
So for each one of these, we've got, I'll open it up in Finder here for you.
Each one of these has kind of got this top-level point-in-time name, so starter app, finished with transcripts, or whatever we end up calling these.
And it's got the node modules, and the source, and the requirements.
So I recommend that you open up this folder, or whatever section of the course, or point-in-time you wanna work with, into your editor.
And because of that, I'm over here, and we're gonna do a couple of things.
First of all, notice there's no virtual environment.
It does show there's this global one, but I run with a virtual environment active all the time on my computer in case I wanna blow it away.
It doesn't mess with System Python.
So what I'm gonna do is I'm gonna create a virtual environment.
That's the first thing that we wanna do because you can see we have some requirements.
In fact, quite a few that we're gonna need for this app to run, and that's just a good practice.
So we will say python -m venv venv. Now, that would be fine, but see how this active one is named global, not just venv? We can also pass --prompt, so that while the virtual environment directory is still just named venv, when we activate it, the prompt shows which project it belongs to. Activation varies depending on your operating system, and it's also laid out in that readme right here.
On macOS and Linux, you say dot, to apply it to this shell, then venv/bin/activate, like so. Notice how the prompt now says 00starterapp, so it tells me which one it's in.
If I was on Windows, I would do venv\Scripts\activate.bat. Why, oh why, for all that is right, do these have to be named differently? I don't know, but that's the way it is. So that's one of the weird differences between Python on Windows and Mac, so we gotta do this.
Next, we need to get those requirements.
I use pip-tools, and pip-tools takes this file, which is just the top-level, unpinned requirements, and compiles it to this more specific, pinned version. You don't need to use pip-tools; you can just go here and pip install the requirements, but if you wanna mess with upgrading things, you're welcome to. So I'll say pip install -r requirements.txt with the virtual environment active. And as always, Python's bundled pip is almost always, except for one or two weeks a year, out of date; I have a script that normally creates virtual environments and automatically upgrades pip for me, but that's not required. Go ahead and upgrade it, just so it doesn't complain at you.
Okay, and there we have it.
We've got our virtual environment created, we've got it activated, the things we need to run the app are here.
So we could come over here and just say Python main and run our app, but at this point, I kinda wanna put the terminal and the shell down and go over to a proper editor.
So this folder is what I want to be the base. In PyCharm on macOS, macOS only, you can just drag and drop the folder onto PyCharm and it will open up this project. On Windows, you have to say File > Open and browse to the directory; not a huge deal, but that's how it works.
All right, here we have it in our editor, and notice we've got our source files down here; this is where all the code we're working with lives.
One more thing we have to do before we're able to run this is set things like the Assembly AI API key.
Should that stuff be committed to Git?
No, no, no, no, no, no, no, no, it should not.
So what I've done here is I've created this settings template file that has default values for a localhost setup, but then here's where your API key goes.
So what we're gonna do is we'll talk later about the API key, but in order for the app to run, it just expects this file to be there, and later we'll go and enter our key in there.
If you've already got your key, go ahead and put it in. Don't put it into this template, though. The goal is to do what the instructions say: make a copy, save it as the actual settings file, and put your API key in there.
I'm gonna put my API key there, but you know what?
I'm not gonna share it with everyone because that one's mine, you put yours there.
So with all of that in place, we should be able to run this, although without MongoDB being here yet, it won't really work, so that's one more thing I'm gonna do before we actually go and run this code.
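If you want a mental model of that settings pattern, here's a minimal hedged sketch; the file names and the key name here are assumptions for illustration, not necessarily what the course app uses.

```python
import json
from pathlib import Path

# Hypothetical sketch: settings_template.json is committed to Git with
# placeholder values; settings.json is your private copy, kept out of Git.
SETTINGS_FILE = Path(__file__).parent / "settings.json"

def load_assemblyai_key() -> str:
    if not SETTINGS_FILE.exists():
        raise FileNotFoundError(
            "Copy settings_template.json to settings.json "
            "and add your API key.")
    data = json.loads(SETTINGS_FILE.read_text())
    return data["assemblyai_api_key"]
```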
3:29
Now that we've got our virtual environment set, you can see, move out of the way here for a second, you see PyCharm down here says it's using this virtual environment, which is excellent.
The final thing to do is to set up MongoDB on our server.
And again, here are the steps.
So let's go back to our terminal here.
It doesn't matter where on your computer you do this.
So we'll say docker pull mongo, and that'll get the latest MongoDB.
It says, look, you've already got it because guess what, I've done this before.
It actually just updated a couple hours ago.
Then the next thing we're gonna do is say docker volume create; I'm gonna say docker volume create mongodb. I'm not going to actually run this because I already have this volume, but when you run it, it'll say it created the volume.
Again, the point of this is here's a persistent bit of storage.
So when Docker restarts or whatever, the container goes away, you upgrade it or something along those lines, this persists.
Otherwise the database files backing Mongo will go away every time you restart your container.
Not great for a thing that's supposed to be a persistent data storage, right?
And then finally, I'm gonna go down here to the command so I can copy it, paste it in here.
We're gonna say docker run with a handful of options: don't block, just run it detached in the background until I stop it; in a mode where, if I do stop it, it cleans up after itself; listening on localhost on Mongo's port and forwarding that traffic inside the container; using this new volume we just created, mapped to /data/db; and with a stable name so we can refer to it, all running the mongo image we just pulled.
So let's go and run that.
And look at that, it's up and running. Now we can say docker ps and see it sitting here running; if I zoom out, its name is mongo-server. It's just always gonna be named that, perfect.
So pretty much you don't ever have to do anything again unless you reboot your computer or for some reason stop Docker.
But if you don't do those things, just close this and you have a database server running.
All right, excellent.
So now our settings match up: MongoDB is installed, and it's incredibly lightweight and easy to get rid of.
I told you about OrbStack; we can go down here and see that it's running. If you go to the volumes, you can see there's the Mongo data. I've actually loaded some other data into it for other apps I have going, and some other things are just hanging around; these are unused, and we can clean them up if we wish, like so.
All right, so you can monitor what's going on with all these things, but you don't need to, just in case you're curious.
Finally, I want to run this code, so I can right-click on main and say Run. Excellent; see, this stuff came from the database, and that's 'cause I ran it before. The first time, you'll see it importing these various podcasts, and then the next time you run it, you'll see this.
So we're talking to our local MongoDB and everything looks like it was great there and yeah, the app is running.
Let's just click this and see what we get.
Here it is, yes, the app is up and running.
I'll give you a tour around the app next.
2:09
So we got our app all running.
Everything's configured: Mongo is set up, the virtual environment is set up, our settings are set up. Run it one more time.
And let's just open this up.
So here's our app, you can see it has this little pretty simplistic landing page I built, I just wanted to have something for you.
And if we go over to let's say discover, it'll say here are podcasts that you can follow that you might be interested in.
So here's a really awesome one, Talk Python; underneath, there's Darknet Diaries. And oh, look, there's also Fresh Air.
MKBHD is great.
And because we're using HTMX, we can do really cool stuff.
Like, if I first go here and look at Followed, you can see I've got these four that I follow; I must have followed them earlier when I logged in and did this.
Now, if I click this, watch right here: Follow just changes to Following. So now that means I'm following it.
And how cool is that?
So if I go back over here, you can see it's now followed and I can explore it. See here all of its episodes; there are actually more, and we can load those without a refresh right there, because of HTMX.
Super cool.
Let's see what this vision pro thing is about.
It's Vision Pro Week.
Yay.
So we can play the episode.
There's going to be a lot of cool AI, Assembly AI, and audio work that I'm going to put right into that section.
And then, you know, just on down, here's the show notes, and so on.
Alright, so we can discover new ones, we can enter them in here if we want.
If you have a podcast URL or web page, put it in there. If we can parse it and find an RSS feed, we'll include it in the library and save it to that MongoDB we just got running.
Yeah, so we can click around here.
And that's, that's what this app is about.
Okay.
Final thing: if you log out, you can log back in, or come over and create an account. I'll call this one Michael, put in a password, we'll create an account, and boom, we're back, logged in.
But now we're not following anything.
It brought us back over here to discover.
That's pretty much it.
That's the app and how it works.
6:41
All right, let's get our bearings in this code and see how everything works, and then we'll start adding features.
Notice we have our virtual environment still active on the right.
I'll dive over there and cover it back up.
Get out of the way of the project view on the left.
We have a couple of important moving parts.
We saw that we're working with MongoDB based on Beanie and Pydantic.
We're working with FastAPI and the Assembly AI SDK.
Most FastAPI apps start this way.
They have a main.py.
Close this up for a sec.
And it even includes the way to start it up here as well if you don't have the readme.
So there's a couple of things we do to configure our API app here.
Say configure routing.
We want to be able to serve web pages, not just APIs out of FastAPI, so we mount a static route for things like CSS and images and so on.
Then we have home views.
We have account views and podcast views.
This is a way to categorize and break up our code so it's not just one stupid big main.py or app.py with everything in it.
No, we're gonna have stuff to do with APIs and AI.
We'll have stuff to do with home pages and podcast views, all that stuff organized.
And these live over in the view section.
So for example, let's go to the podcast view down here.
If you just go to /podcasts, this is gonna give you a list of, I think, the popular ones, so that's Discover; and then /podcasts/followed is the ones that you follow.
And here's the host.
If you're on Discover and you submit a new one, you can see it coming in, and it says we're gonna try to find whatever URL you specified.
And we have this service that is quite intense that goes, discovers the HTML page, pulls out the RSS feed meta tag, then uses the meta tag to get the RSS feed XML itself, and then parses that with a whole bunch of variations 'cause there's standards that nobody can agree on, apparently.
So that's what we got going over there.
And then, for example, here we have this hx prefix; I'm just using hx to tell me what it's about: this is an HTMX request. Remember when I clicked the little follow button and it turned into Following? It ran this code behind the scenes, kind of magically.
Okay, so we have these different ways of organizing our code and these, you know, like the home page, this is just slash index.
And then we're using a template over here in the templates folder.
And each categorization of the views, like home or podcast or whatever, gets its own section.
So under home, we have an index; under podcast, we have details, episodes, following, discover, and so on, right?
Here's the HTML for that.
So our templates match very much close to our views.
And you also saw inside the views, like in the podcast one here, there's a complex data exchange that happens sometimes, especially when we're passing in data.
So instead of making this code super disgusting and just busy and full of details, I'm following what I'm calling a view model pattern.
And again, the view models are organized like the templates: in the view models folder we have podcasts, and since the URL is /podcasts/followed, we have a followed-podcasts view model.
And in here, this thing is storing the data that's exchanged between the HTML and the Python.
Now, normally this would probably all just happen in the constructor, but because Python cannot have asynchronous constructors, and both our data layer and FastAPI forms must be awaited, we had to break this into two steps, okay?
So it'll do things like go to the form that was posted and get the URL, and if there's no URL, it says: you can't follow or discover a new podcast that has no URL, stuff like that.
So: view models are organized like this, views drive templates and view models, and they're registered through routes in main.
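To picture the view model pattern, here's a minimal sketch in that style; the class names, the cookie name, and get_followed_podcasts are all assumed for illustration, not the app's actual code.

```python
from typing import Optional
from starlette.requests import Request

class ViewModelBase:
    def __init__(self, request: Request):
        self.request = request
        self.error: Optional[str] = None
        # e.g., read the auth cookie to see who is logged in
        self.user_id = request.cookies.get("user_id")

class FollowedPodcastsViewModel(ViewModelBase):
    def __init__(self, request: Request):
        super().__init__(request)
        self.podcasts: list = []

    async def load(self):
        # Async work can't happen in __init__, so it lives in a second step.
        self.podcasts = await get_followed_podcasts(self.user_id)  # hypothetical helper
```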
Then we have a couple other things, grab bag of stuff like cookie authentication and just convert to web URLs, store the secrets that we put in our settings file.
We have our data layer.
For example, here's what a podcast looks like in Beanie.
A Beanie document is just a Pydantic model, if you haven't seen that. So it has an ID, a title, and created and last-updated dates that have defaults.
And then it has a bunch of details about it.
And then they also have episodes, and an episode has things like an episode GUID and episode number, but those aren't always there, so they're optional, and so on.
And then we just say, look, this is where it goes in the database.
These are its indexes and so on.
So really nice, clean way for us to work with that.
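As a rough sketch of what such a Beanie document can look like, with field names assumed rather than copied from the app:

```python
import datetime
from typing import Optional
import beanie
import pydantic

class Episode(pydantic.BaseModel):
    guid: Optional[str] = None           # feeds don't always provide these
    episode_number: Optional[int] = None
    title: str

class Podcast(beanie.Document):
    title: str
    created_date: datetime.datetime = pydantic.Field(
        default_factory=datetime.datetime.now)
    last_updated_date: datetime.datetime = pydantic.Field(
        default_factory=datetime.datetime.now)
    episodes: list[Episode] = []

    class Settings:
        name = "podcasts"    # the MongoDB collection
        indexes = ["title"]  # plus whatever else needs to be fast
```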
And then the last thing is the services.
And these are not services like APIs or HTTP services.
These are just parts of our code that provide services to the rest of it.
Like there's stuff grouped around AI and there's stuff grouped around search.
And users, so for example, the user service, it can create an account.
So instead of putting that code just anywhere, we say: what do you need? You need a name, email, password, and whether or not it's an admin. Then it goes and creates one of these user objects that goes in our database, calls save, and once it's created, it gets the value back out of the database. Technically, the object we saved should be good enough to just return, but this way there's a really fresh version of it, exactly as the database sees it.
And it'll run queries like find me an account by email.
And here's what a database query looks like for MongoDB in Beanie.
User.find_one(User.email == email).
Of course, we normalize that to be lowercase and without spaces if they typed in extra spaces.
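Put together, the lookup being described is roughly this; a sketch that assumes the User model from above:

```python
from typing import Optional

async def find_account_by_email(email: str) -> Optional[User]:
    # Normalize before querying so stray capitals/spaces still match.
    email = email.strip().lower()
    return await User.find_one(User.email == email)
```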
All right, so that's the services.
And we'll play with those later as well.
Well, that's it.
And I don't know how much more there really is to show you.
We'll get into some other pieces, like how the HTMX bits work, when we actually put them into place; transcripts will probably be the first thing we use HTMX for, given the background work we're gonna have to run. There are a few things we didn't cover, but this is the code. You can start it up and play with it, or, if you'd rather, just follow along.
We'll do that really soon.
We'll get on to adding our first feature.
1:12:54

2:09
It's time.
It's time to start building.
We're gonna work on the very first feature that we're gonna add to our application: transcripts.
And you can see we have this create transcript button.
Now, when we do this at Assembly AI, it's not instantaneous.
For example, this podcast that we're showing here is 55 minutes long and we want high quality transcripts.
That's not click the button, boom, here are your transcripts.
That has to run for a while.
And it's amazingly fast.
I talked to the founder of Assembly AI about how they're doing their infrastructure and all those things behind the scenes. They're using a crazy number of really high-end GPUs, in AWS I believe, certainly in the cloud, to take this and process it for you.
For example, when we do this here, it probably takes 20, 30 seconds.
If I do that on my M2 Pro on my Mac, it takes like 15 minutes.
So it's still incredibly fast, but it's not instant.
So what we're gonna need to do is add a really cool background feature to say you've started your transcripts and we're watching.
We'll let you know as soon as your transcripts are ready, we'll pull that in for you.
So that's gonna be what happens when we click this button.
It's gonna kick off a job that's going to start the request to process the transcripts in the cloud at Assembly AI.
Then we're just gonna hang around until that's done.
And then, when they are done, we're gonna put the results into the database so we never have to do it again, right?
Save it there.
After that, we're gonna show this really cool page.
And you can see here, it even has a little green highlight over the first section.
You can actually move around here, you're gonna be able to click on it, play exactly at that point by just looking at the text and clicking on it.
So that's what we're building in this chapter.
I think it's a cool feature, and it's really gonna get us using some of the neat aspects of this web app.
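For a taste of what that Assembly AI call will look like, here's a minimal hedged sketch using their Python SDK; the API key and URL are placeholders, and in the app this will run from a background job rather than inline.

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key-here"  # placeholder

transcriber = aai.Transcriber()
# transcribe() sends the audio off and polls until the transcript is ready;
# for a ~55-minute episode, expect tens of seconds, not an instant response.
transcript = transcriber.transcribe("https://example.com/episode.mp3")

print(transcript.text[:200])  # first bit of the word-for-word transcript
```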
3:25
Now, before we get into the weeds and the details of writing all this code, I wanna give you just a quick HTMX primer.
Now, if you already know HTMX, feel free to just skip to the next video or this is short anyway, but I'm gonna just show you really quickly what this is and how it works.
It's an amazing technology.
It's an alternative to front-end frameworks like React and Vue.
And it says, instead of running everything in JavaScript on the front-end, what if we just made the front-end more dynamic by working with the server more closely?
So you write your interesting behaviors in Python and you just click that together on the front-end with HTMX.
Let me show you an example.
So down here we have, let's just do a click to edit, say.
So here is all of the client-side HTML that you need to know.
We've got a label, not a form; notice there's no form here. There's just static data in a label, with values like joeblue@joeblue.com. And then, instead of an href, the edit button has an hx-get attribute.
And when you click this button, what it does is it uses the server at that URL to figure out what happens when I click that.
And what it does in this case is it's gonna say, return a form that we can edit.
Let's see this in action down here.
Here's our static stuff, right?
Try to double-click it, typing, no changes.
But if I hit edit, notice it's downloaded that form without a page refresh.
It just, in that little fragment, sets it.
I'll just put this as, say, michael@mk.com.
And if we hit submit, it's back.
Now it's back to the static version.
And you can actually see down below, we show this, it'll show you what's happened.
So first it shows this static form as we just saw.
And when I clicked edit, it took that data, and on the server it said: you know what, the edit version is gonna be a form that has a label, a text input, another label, another text input, pre-filled with the static data we started with. And it says, via hx-put, we're gonna do a PUT back to that URL, with the target being this whole form, swapping it out when there's a change.
What do we do on the change? You can't see it on the screen, but if you click that button, it submits the PUT, makes the change on the server, and returns back the new, updated static version.
So this is HTMX.
There's a ton of different examples down here.
Another one that's fun is active search.
We're gonna see that in a little bit. So down here, I can just type: we need Michael, so the M. I can also see we've got Owen, so I could type O; there are two Owens, keep typing, and now it narrows down just to Owen.
We're gonna do that for our search as well using HTMX, super, super cool.
So there's really not much to learn at all about this.
We do have a course on it.
It is amazing if you take it to its extreme, but what we're gonna use in this course is a really, really small sliver of this to make our code super nice, allow us to add this dynamic behavior.
For example, when we kick off a transcript, we can start that, monitor the progress without just reloading the page all the time.
Really, really nice.
We won't need any JavaScript, which is also okay with me.
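Since our server side is Python, here's a hedged FastAPI sketch of that click-to-edit flow; the routes and fields are invented for illustration, not taken from htmx.org or the course app.

```python
import fastapi
from fastapi.responses import HTMLResponse

router = fastapi.APIRouter()
contact = {"email": "joe@example.com"}  # stand-in for real storage

@router.get("/contact/edit", response_class=HTMLResponse)
async def edit_form():
    # hx-get on the Edit button fetches this fragment; HTMX swaps it in place.
    return (f'<form hx-put="/contact" hx-target="this" hx-swap="outerHTML">'
            f'<input name="email" value="{contact["email"]}">'
            f'<button>Submit</button></form>')

@router.put("/contact", response_class=HTMLResponse)
async def update_contact(email: str = fastapi.Form(...)):
    contact["email"] = email  # apply the change, then return the static view
    return (f'<div>Email: {contact["email"]} '
            f'<button hx-get="/contact/edit">Edit</button></div>')
```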
3:56
Now, we want to run our main file, of course.
This may just run in PyCharm.
Let's see if I hit run.
It turns out that I had to install a security update, so I have not yet restarted MongoDB.
You must have MongoDB running.
So let's do that again real quick.
Run this command.
It's up and going.
Perfect.
Looks like everything is back and running just fine.
One thing you may need to do, and it's going to help out a lot, is to go over here, right-click, and say Mark Directory as, hiding down here, Sources Root. What that does is, basically, when you say something like from db import mongo_setup, it's going to look here and say: oh, I see there's a folder called db, and in there there's a mongo_setup.
If you don't have that set, then it's going to be a problem.
Basically you want this to be the working directory, whatever editor you're using.
This is up and running; that's great.
Go here, take this off.
We want to go just check out any old podcasts.
I'm logged in and I'm following Talk Python.
That one's going to be fine.
So let's go over to this parallel Python apps with sub-interpreters.
It says TODO: episode action buttons.
So what we want to do is we just want to put those buttons there.
They're not going to do anything at first.
We're going to add that functionality later, but let's just add that little bit of UI here so that we can have those action buttons and then click them together with HTMX and the Assembly AI API calls.
Okay.
So what we need to do is go to the details template in the podcast section.
Remember my organization over here.
So notice this is not colored.
You can sometimes do a little bit better if you mark this directory as a template folder.
And what we're using is the Chameleon template language. I know a lot of people use Jinja; we'll talk about Chameleon and Jinja in a sec, but I'm using Chameleon because I think it's much, much nicer.
All right.
And over here, we're going to go to episode because this is episode details, not podcast details.
And in here somewhere is a to-do and it says, all right, what we want is this episode actions.
We're going to replace that section with some plain old HTML.
Let's get some room here so we can see it.
So we're going to have a couple of things.
We're going to have a button, a rounded button.
It's going to be indigo.
And its href is just #transcribe, which doesn't really go anywhere, but having something in the URL means you get a pointer instead of an arrow when you hover over it. So it looks clickable; the browser treats it as clickable.
We'll wire that up to a JavaScript event in a second.
And in Chameleon, the way you do if statements is through what's called TAL, the template attribute language, where you have conditionals and loops and stuff.
So we're going to say, if there is no transcript that already exists, we want to show the button that says create transcript.
But if it's the case that there is a transcript, this will be hidden and we can just say view transcript.
So this is where the action is going to be.
And then later we're going to be able to interact with the transcript there.
And here's stuff about creating the summary and enabling AI chat or question and answer sort of thing.
We'll come back to that later.
So it's just these first two parts for now.
I just put it all together.
So we have the pieces there.
Let's see what we get if we refresh it.
Boom.
And sure enough: Create Transcript, Create Summary, and Create Q&A, and because of the hashtags, they look like little links there.
Step one, done.
2:51
Now, we're going to need somewhere for HTMX to call into when we click this button, to say they've clicked the button to create a transcript, and moreover for which episode.
So we'll use this episode variable and episode ID and those types of things to pass that over and say create a transcript for this episode.
Now, we could stick it in one of these existing views, but I like to have everything kind of broken up and isolated.
So I'm going to create a new file called AI views.
And in here, we're going to have two functions: def start_job and def check_job_status. Now, you might say start_transcript, but in fact there are different kinds of jobs: there's summarize, there's transcribe, there's chat, all these different things we're going to be able to do, and they all turn out to work exactly the same. The only real difference is which action we pass off to the background subsystem that's in here.
Okay.
Now we want to make these something that we can call from FastAPI. A lot of times you'll see @app.get; how do you get app here? Well, it's over in main; there it is. But to use it, this module would need to import main, and main imports this module, so it turns out to be a circular reference, which doesn't go well with Python. FastAPI has a cool way around that. PyCharm is suggesting we do something with Flask; thanks, but how about we do this instead: router = fastapi.APIRouter(). And then this router can stand in for app.
I'll do a get on the router here and we'll put in /url, and let's put a check-status one on the other function.
Now this is not enough information, but let's just make sure that's working well.
We'll just return 'Hello world', just for a second.
And we want to make sure just that we're clicking everything together here in the app.
So we're going to go down to this configure-routes function and add another one of these: we'll say ai_views.router, importing ai_views at the top.
Now if we run it, we should be able to go over here and just say, what did I call it again?
/url. And: Hello world.
Okay.
So we've got this new view plugged in and everything's ready to go. These are not the URLs that we're going to need, though; we'll put those in in a minute.
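In code, the wiring from this video looks roughly like this; a sketch where the module and function names follow the narration but are assumptions.

```python
# ai_views.py
import fastapi

router = fastapi.APIRouter()

@router.get("/url")
async def start_job():
    return "Hello world"   # placeholder, replaced in the next video

# main.py
# from views import ai_views
# def configure_routing():
#     app.include_router(ai_views.router)  # avoids the circular import
```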
2:25
Well, /url looks fun, but kind of useless.
What are we gonna put?
Let me just paste this in 'cause there's kind of a lot going on here.
So we're gonna have /ai to represent our AI views here, then /start for starting some action: transcribe, summarize, whatever. So that's gonna be a string I pass in here.
And we're gonna have the podcast ID: is this Talk Python, Python Bytes, Fresh Air, whatever?
PyCharm is suggesting to auto-complete that perfectly, so I'll accept that.
And then episode number, and that's an int.
Now, one thing we can do is navigate to a symbol: there's a JobAction class over here.
Where is this living?
It lives in the database under jobs.
And this is an enumeration that we can use.
You can see it's a string enumeration.
Transcribe, summarize, and chat.
And because FastAPI is awesome, that restricts what strings can be passed in: I can go over here and say this parameter is a required JobAction, importing that, not just a string. Okay, so let's just print out the things that are coming in here now.
Here we go.
Try it again.
And I'm gonna need to copy this URL 'cause I won't remember it.
So we wanna go up here and say, well, what goes in here? For the action, let's put transcribe. The podcast ID is gonna be python-bytes, and for the episode number, let's put 344. Then it prints: transcribe, python-bytes, 344. Perfect.
And what if we put something else in there, like "jump up"? Error: input for that section of the URL must be either transcribe, summarize, or chat. You said "jump up," and jump up is not one of those.
See how cool it is we're using that string enumeration and how cool FastAPI is.
Okay, so yeah, that doesn't work, but this sort of thing right there, that's exactly what we're looking for.
So we've got this data being passed over.
And while we're at it, let's just do the other one as well.
On this check status, instead of that, we're just gonna say check status for job ID.
So what in practice this is gonna do is you're gonna say we started your job.
Here's the ID.
You can ask later whenever you want.
Just come back and ask using this endpoint to see if it's done.
And if it's done, we'll do one thing.
If it's not done, we'll do another.
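Here's roughly where that leaves the two endpoints; a sketch with assumed names, though the enum validation behavior is exactly what FastAPI gives you.

```python
import enum
import fastapi

router = fastapi.APIRouter()

class JobAction(str, enum.Enum):
    transcribe = "transcribe"
    summarize = "summarize"
    chat = "chat"

@router.get("/ai/start/{action}/{podcast_id}/{episode_number}")
async def start_job(action: JobAction, podcast_id: str, episode_number: int):
    # Anything outside the enum ("jump up") is rejected with a 422 error.
    print(action, podcast_id, episode_number)
    return {"action": action, "podcast_id": podcast_id,
            "episode_number": episode_number}

@router.get("/ai/check_status/{job_id}")
async def check_job_status(job_id: str):
    return f"check status for job {job_id}"
```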
5:43
Well, this little echo of this dictionary was just so that we could have something to verify that the data exchange is working and play with that for a bit.
Let's get more real about it.
So what we're gonna do is we're gonna bring in a view model.
Remember, I told you a lot of parts of this app are already built because I don't want you to have to juggle with like, how do we create a new asynchronous background task system?
So guess what?
That's done.
So I'm gonna create a view model, and it's gonna be called StartJobViewModel; we're gonna need to import that.
And let's just go look at it real quick.
So it takes, well, what you might expect, podcast ID, episode ID, and an action, which is one of these enumerations.
It also takes the FastAPI request, or more specifically the Starlette request, because underneath there's a view model base that does things like hold the request for you and handle a couple of other things.
Like, for example, checking if somebody's logged in, which requires checking the cookies of that incoming request, that kind of stuff.
So don't worry too much about it, but we got to pass that along as well.
And then we're just gonna store different things about this.
All right, we're gonna switch on the action: say, for transcribing, we're gonna set some values.
Otherwise we'll set summarizing and so on.
And then finally, there's no asynchronous stuff we're doing here.
So this is just about the data exchange.
There are no forms to parse asynchronously like you sometimes have in FastAPI.
So this will be a pretty simple one.
And what we need to do is just pass it the information that it needs.
And we're also gonna need to pass in the request right here, so let's import that: as I said, from starlette.requests import Request.
Okay, so then we'll pass request. Did I get the argument order right? Let's see. No, I did not. Good guess, PyCharm; guess better. Okay: episode and action.
So this view model is gonna store that data.
You might wonder, well, we already have it kind of parsed here.
What are we doing?
There's HTML on the other side that expects all those different variables, those self variables from that class to work with.
And that's kind of the role of the view model.
So that's part of its job.
And the next thing we wanna do, speaking of jobs, is we wanna go to this thing called a background service over in services.
And say, create a background job.
We'll look at this in just a minute.
So we're gonna pass in one of these job actions (which is the action), the podcast ID, and the episode number, which is currently called episode ID. I'd kind of prefer to rename that to episode number, and we're gonna have to sync that up there as well.
Excellent.
So then, in order for this to do anything interesting, we have to await it, and that requires the view function itself to be async def.
And we can just print started job, job.ID.
And then, instead of returning nothing, let's go ahead and return the job ID as well, just so it's in this data.
Let's run it, see what happens.
All right, here we go.
It turns out that to get that to show up, I had to call str() on it, because it's technically an ObjectId out of the database, and its default string representation was weird. But there we go.
So we've started a new job, it's called this.
And if you look over here, you can see started job, such and such.
This theoretically is off to do something.
It's not quite yet.
That's going to be one of the core parts where we plug in our Assembly AI magic, but we're off to a good start.
All these pieces and data are passing along.
One final thing here: this check-status endpoint is going to take a job ID, and we'll just let it be a string over there.
And we may need a request as well.
So let's go ahead and take that.
And we can go to our background service and ask is_job_finished, passing in this job ID; we'll say await. Now, what does this want? Does it want an ObjectId? We can just say bson.ObjectId to parse it across like that. And since we call await, this function has to be async as well; we'll print whether the job, whatever it is, is finished. I need to start a new job here and then hit check status.
And it printed out.
Is it finished?
No, it's still running.
We didn't return the true or false from the endpoint — we didn't return anything — so it just came up empty.
But here we go.
The job is not yet finished.
And because it's not really running yet, that's not a surprise.
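As a quick sketch, the check-status endpoint ends up looking something like this (again, is_job_finished and background_service are names assumed from the course's description):

```python
# A sketch of the check-status endpoint; bson.ObjectId parses the string ID
# back into the MongoDB ID type the service expects.
import bson
from starlette.requests import Request

@router.get('/ai/check-status/{job_id}')
async def check_status(job_id: str, request: Request):
    finished = await background_service.is_job_finished(bson.ObjectId(job_id))
    print(f'Job {job_id} finished: {finished}')
    return {'finished': finished}
```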
But there we go.
We've got our two important moving parts here.
We're going to have to plug in a little more UI, but first let's work on making the jobs actually run over in the system and do their thing.
Then we'll put the UI in place, okay?
|
|
show
|
7:42 |
In this video, let's work on making this background job system work and I'll just introduce it real quick.
So what we're doing is we're creating one of these things I'm calling a background job and storing it in the database.
So it has a time it was created, started, and finished — though started and finished aren't set by default, definitely not finished.
Jobs start out waiting to be processed and not finished, but they do require an action, an episode number, and a podcast ID.
Guess what, that's what we're passing along, right?
This is being used over by this background service.
You saw it's called this function.
And what it's going to do is just literally create one of these and then save it and then give it back to you.
This save sets the database-generated ID, and that's the ID we were just playing with.
And then somewhere down here, we can ask, is the job finished?
So it just goes and gets the job by ID, which is literally find one ID equals the ID you asked for.
And either it's false if it doesn't exist, or it checks to see if it's been set to be finished.
Now that all gets kind of put in the database and chills.
And then this function, the worker function is the thing that goes around forever and ever.
Now, first of all, notice something very uncommon here: it says the background service is starting up, and it's based on async and await running on FastAPI's core asyncio event loop.
And what it does is say: while True, do a bunch of work.
You'd think that would block up the processing, but no — it's really cool.
What happens is, when you await something, that frees up the event loop to keep doing other processing.
So it'll say: get me one of these jobs; if there is nothing, just go to sleep for a second and then go through the loop again, which means asking for a job again in one second.
But if there is one, get us the job and start processing it.
There's an error.
Well, that's too bad.
Get the next job.
Sorry, job failed.
And finally, it's going to come down here and says, all right, well, what was I supposed to do with this job here?
Was I supposed to summarize or transcribe?
And then it goes to the AI service and says, transcribe episode.
This is where we're going to write our assembly AI code using their API.
But this is how it happens kind of automatically.
And then once it gets through whichever step it takes there, it marks it as complete and successful in the database.
So that when we're over here, we say, give me this thing.
Is it finished?
When we did that successfully, it'll say, yeah, it's finished.
Here's your information.
You know, whatever has to happen when that transcript gets generated and so on.
So this is how it works.
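Here's a minimal sketch of that worker loop — the job helpers and ai_service names are assumptions based on the description above, not the literal course code:

```python
# A sketch of the forever-running worker on the asyncio event loop.
import asyncio

async def worker():
    print('Background asyncio service worker starting up')
    while True:
        job = await get_next_pending_job()          # oldest job still waiting
        if job is None:
            await asyncio.sleep(1)                  # frees the event loop, then check again
            continue
        try:
            if job.action == JobActions.transcribe:
                await ai_service.transcribe_episode(job.podcast_id, job.episode_number)
            elif job.action == JobActions.summarize:
                await ai_service.summarize_episode(job.podcast_id, job.episode_number)
            await mark_job_finished(job, successful=True)
        except Exception as x:
            print(f'Sorry, job {job.id} failed: {x}')  # too bad; move on to the next job
            await mark_job_finished(job, successful=False)
```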
Now, where does this run?
If you go look in the main, you're going to see nothing to do at all with background services.
We got secrets, routing, templating.
That's the Chameleon connection there.
We'll get to that very, very soon.
And the caching — this just means that if there's a change to a static file, we don't have to do hard reloads.
There's basically no stale stuff in our cache as we develop, or even as we run in production.
But there's nothing here.
So where is this happening?
Well, FastAPI has a really interesting restriction, but also a really interesting design: anything that uses asyncio and async and await needs to share that core asyncio event loop, and FastAPI — or rather, the server running FastAPI — manages it.
That loop doesn't exist until you get down to here, where you hand the app off and something grabs it and runs it, like Uvicorn or Granian, those kinds of things — Hypercorn.
So it's not available to us yet.
And this is where it gets funny.
So over here in infrastructure, we have app setup.
So there's this thing called an app lifespan that we can plug into FastAPI, and FastAPI will say: look, I know you've got to do some async stuff — for example, initializing MongoDB — and its async setup needs to run on this core FastAPI asyncio loop.
This is going to be called when everything starts up.
Right?
So FastAPI is going to call this app lifespan for us when everything starts up.
How does it know to run it? Back over here where we're creating the FastAPI app — I'll wrap the line around so you can read it a little better — it's right here.
It says when we create the app, we say don't have any documentation in your website.
Are you in debug mode?
Right?
Or development mode?
Right now we say true.
We could switch that to false later.
And then here's the code that you run at startup.
So it runs all of this stuff.
And then it yields to run the entire application.
When the app shuts down, there's like a shutdown cleanup if you want.
There's nothing for us to do.
But the core thing is register the background jobs here.
So what we can do is we can say asyncio create a task.
And we're going to give it the background service's worker function — we call it to get the coroutine, and create_task kicks it off there.
Now there's a warning, it looks like I did something wrong.
Perhaps I have but this is saying look, this is not awaited.
I really, really think async and await in Python are awesome.
But there are some big-time mistakes in the design here.
And one of the big ones is that there's no way to just say: start this work, I don't care about the answer.
That's why you can't just call this function directly and have it run.
You've got to do this create_task — and then it also expects you to await the task it gives you.
I don't want to wait; I just want to put a bunch of stuff out into the ether and let it run as long as the app is alive.
So I'm going to suppress this warning, because it's not an error: I mean to not await this, right?
We'll come back for search when we get there as well.
And search is going to be another thing that just runs in the background periodically, it'll wake up index some things and go back to sleep.
Here we go.
This will have the background job actually kicked off and running.
It's going to run this worker function.
So this is how we get this asyncio function running continuously in the background doing a while true on the right loop in FastAPI.
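As a sketch, the lifespan hookup looks roughly like this — the exact app_setup code in the course may differ, and background_service.worker_function is my stand-in name:

```python
# A sketch of registering the fire-and-forget background task in FastAPI's
# lifespan; the startup code runs once the server's asyncio loop exists.
import asyncio
import contextlib
import fastapi

@contextlib.asynccontextmanager
async def app_lifespan(app: fastapi.FastAPI):
    # Startup: kick off the forever-running worker without awaiting it.
    asyncio.create_task(background_service.worker_function())
    yield  # the entire application runs here
    # Shutdown cleanup would go after the yield; nothing for us to do.

app = fastapi.FastAPI(lifespan=app_lifespan, docs_url=None, debug=True)
```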
The next thing we got to do is to actually transcribe the episode at assembly AI.
Fun.
Right now if we run it, it's not going to take very long.
I guess we could go ahead and just complete this loop here.
Go over here and hit go.
And we get this new job and we can check if it's done.
I imagine it probably is done.
Yeah, look at that.
So we started a new job and it's finished, because, well, when it ran, it didn't have to wait on anything.
So it marked this finish in the database.
Excellent.
It looks like everything's running and maybe one more thing we can check up here is that right there.
The background asyncio service worker is up and running, just cruising along.
You see it's getting these new jobs and then finishing them.
Yes, it is finished.
|
|
show
|
3:01 |
All right, now we're getting to the heart of the app.
We've got this cool web stuff, but the goal is to work with audio using AI and Assembly AI, right?
And here we are at their website.
I'm sure you've checked them out already, but assemblyai.com.
Turn your voice into chapters, insight, summaries, transcripts.
Here we go with their leading speech AI models, right?
They're focused and they have been for years just on doing cool stuff with audio and speech.
So what you're going to need to do is you're going to need to create an account over here to get started.
And then once you have your account, go over to /app/account — click on Account to get there.
And you're gonna need this right here, your API key.
This is not actually my API key.
So don't feel like you've got a big secret, but notice if you hover over it, you can say click to copy and it'll give you your real API key.
When I take that API key, go over to our settings and put it in there.
Remember, this is the settings template — I'm not showing you my real settings file, but the key just goes right there, right?
One, two, three, four, five, whatever it was, it goes right into there — not in the template, but in the real settings file.
I'm just not showing you mine 'cause it has my API key, which I don't wanna share with the world, 'cause I don't wanna pay for everyone's transcriptions.
So make sure you go over and put that in there.
And then once you do, you'll notice right up at the top here there's a function called init secrets.
If we go to it, it just goes to the Assembly AI settings and sets the API key.
Where does assembly AI come from?
Well, it should already be installed as one of the dependencies.
So this comes from PyPI.
And this is the official Python package for working with Assembly AI's API.
It's their Python SDK, as they call it.
And you can see they got a bunch of cool examples, how you work with it, real time stuff.
This landing page here on PyPI is actually a really good resource for what they're doing.
And we're gonna go through and use it and I'll show you how to do a lot of different things with it already, but there's more things to do if you'd like to explore more.
So in summary: assemblyai.com, create an account, go to the account page, get your API key, copy it over, and put it into the never-committed settings.json in the API key section; the app will take care of setting everything up for us from there.
We've already pip installed assemblyai and wired it together as I showed you right here in main — that's assemblyai.settings.api_key.
Then for the rest of the app, it's good to go.
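For reference, a minimal sketch of that wiring — how the settings dict is loaded and keyed here is an assumption; only the assemblyai.settings.api_key line comes straight from the SDK:

```python
# A sketch of init_secrets; the real key lives only in the untracked
# settings file, never in source control.
import assemblyai

def init_secrets(settings: dict):
    assemblyai.settings.api_key = settings['assembly_ai_key']  # key name assumed
```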
|
|
show
|
3:03 |
We've got all the supporting pieces in place.
It's time to start using Assembly AI and actually transcribe this episode here.
So we're gonna need to do a couple things real quick.
First, we're going to need to get a hold of the actual episode details, because the way it works is we're gonna end up with an MP3 URL — mp3_url equals something, like that.
Then we're gonna pass that off to Assembly AI.
We're not gonna download the audio or provide the audio in any way directly to Assembly AI, although we could.
We're just gonna say, see that out there on the internet, go grab that and transcribe it.
It can be things like MP3 files or I think even YouTube videos.
There's all sorts of stuff that we can put out there and say, go find that on the internet, get it for yourself and transcribe it.
So in order to make that happen, the first thing we need to do is say the podcast is equal to — we have this podcast service — podcast_from_id, and what do we pass in? A podcast ID.
So perfect. We'll say if not podcast — or podcast is None, depending on how you wanna look at it — there's nothing more we can do.
This is done, so we're out.
We also need the episode.
And we need the episode by number.
We'll have the podcast ID and the episode number that we're gonna get.
Now, these are themselves asynchronous, so don't forget to await them, because what you'll get back otherwise is a coroutine — which is never None, but also doesn't do anything.
So there we go.
So we have our episode.
I'll have basically the same setup here.
Something like that.
All right, so now we have our episode, and the most important thing here is gonna be going to the episode and getting what's called the enclosure URL.
That's just RSS-speak for the attached audio file.
And let's just do a little printout.
Would transcribe.
There we go.
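Here's a sketch of where we are so far — the podcast_service function names follow the course's description and may not match the repo exactly:

```python
# A sketch of the episode lookups before we call Assembly AI.
async def transcribe_episode(podcast_id: str, episode_number: int):
    podcast = await podcast_service.podcast_from_id(podcast_id)
    if not podcast:
        return  # nothing more we can do

    episode = await podcast_service.episode_by_number(podcast_id, episode_number)
    if not episode:
        return

    mp3_url = episode.enclosure_url  # "enclosure" is RSS-speak for the audio file
    print(f'Would transcribe episode {episode_number} of {podcast_id}: {mp3_url}')
```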
So let's just go and hit that and try to make one of these jobs again.
And we'll just say, let's consider doing episode 344 of Python bytes.
Gives us a job back.
And what do we see?
Would have transcribed.
Oh, look at that.
Python bytes, episode 344, AMA ask us anything from that URL.
Let's see if it works.
Sure enough, there it is.
Perfect.
Look at that.
So we're getting all the data that we need here.
We haven't yet sent this off to assembly AI, but we've got everything we need.
The next thing to do is gonna be to use assembly AI and import that and start calling functions like with the transcriber.
|
|
show
|
8:05 |
Transcribe in time, let's do it.
So, Assembly AI, we have to make sure that that's imported.
We wanna create a transcriber.
So we're gonna do things with this thing.
We'll say transcriber equals that.
And we're also gonna need to pass a configuration.
We could just call transcribe with the defaults, but you'll see there's an incredible number of options and features available.
So we'll have to come up with a config equals assemblyai.TranscriptionConfig, like this.
Now let's see what we can pass to it.
If we jump over here, yeah, there's a couple of things.
So we could pass in the language, whether or not to do punctuation.
And we're gonna wanna say true for that one.
So let's pass in some things that we'll want.
That one is true.
Format text, dual channel.
What else have we got here?
That's a lot of help text it was trying to show us.
We could also transcribe just a subset: if we had a huge long audio file but only wanted this little part of it.
Also, you can boost certain words — it doesn't necessarily make sense in a completely general app like this, but suppose you're in the medical field, or in technology, programming, Python: you can say these are words that often get confused for something else, like PyPI might come out spelled like the pie you eat instead of P-Y-P-I.
You don't want that.
So you can use word boost to say these words are really important in this particular area.
All right, and you can say filter out profanity, redact personally identifiable information, which is excellent.
You can put speaker labels on there.
None of these we're gonna put on ours yet.
Oh, I really need that to not be in the way.
Disfluencies, however — those are the "um"s and "ah"s — you probably don't want those kinds of things in your transcripts.
So if somebody says, "I, um, went to the store," it might be nice to just say, "I went to the store," right?
You don't need all that babbling stuff in there.
So we'll say disfluencies, false: we don't want them transcribed, we want them omitted.
You can do sentiment analysis, auto chapters, entity detection, summarization.
We're gonna actually use lemur and the LLMs for this later.
So that's great.
That might be all we're gonna add.
Now that I think about it, let's also set format_text to true — and speaker labels: with an arbitrary set of speakers, I'm not sure how well that's gonna work for us in this situation, so we'll say false.
So now come over here and we'll say transcript — well, almost — transcript_future equals transcriber-dot-something.
So we could say transcribe, and this is a blocking call, but we have such a nice setup here with async and await.
And if we do a blocking call, it's not just gonna clog up our transcribing service.
It's gonna actually block up the entire FastAPI event loop.
We don't want that.
So we're gonna go and use the transcribe async version here, and we'll pass for the data, we'll pass the MP3 URL and for the config, we'll just type in the config here like this.
Now you might think, Michael, you forgot your await.
Put your await there.
No, no, I'm not doing that.
What I actually get back here is, I guess I gotta just do something with it.
So the little error goes away and then I can hover over it and show you.
This is a future of T, a future of transcript, which is cool, but it's not something I can await.
It's a thing that just lets me ask questions.
That thing I started, is it done?
No.
Is it done?
No.
Is it done?
No.
If it is done, then what is its result?
So it's kind of like an intermediate working thing.
So I wrote a function down here at the bottom called run_future; it takes a concurrent.futures.Future and returns its result.
In this case, it was a Future of Transcript, so the result will be a Transcript.
And this run_future function itself is actually awaitable.
It's a little bit of a hassle, but we just say transcript — typed as assemblyai.Transcript to make it really clear — equals await run_future(transcript_future), like that.
Woo! And let's just print out the transcript's JSON response, in a way you can see really clearly.
If you pass an indent to json.dumps, it'll automatically format it.
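Putting the pieces together, here's a sketch of the whole non-blocking call — the SDK's transcribe_async and json_response are real, but the 0.5-second polling interval inside run_future is my assumption about how the course's helper works:

```python
# A sketch of the non-blocking transcription. transcribe_async returns a
# concurrent.futures.Future[Transcript]; run_future polls it so the
# FastAPI event loop stays free the whole time.
import asyncio
import concurrent.futures
import json
import assemblyai

config = assemblyai.TranscriptionConfig(
    punctuate=True,        # add punctuation
    format_text=True,      # casing, numbers, etc.
    disfluencies=False,    # omit the "um"s rather than transcribe them
    speaker_labels=False,  # arbitrary speakers, so skip labeling
)
transcriber = assemblyai.Transcriber(config=config)

async def run_future(future: concurrent.futures.Future):
    while not future.done():
        await asyncio.sleep(0.5)
    return future.result()

transcript_future = transcriber.transcribe_async(mp3_url, config=config)
transcript: assemblyai.Transcript = await run_future(transcript_future)
print(json.dumps(transcript.json_response, indent=2))
```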
Okay, this is gonna be it.
We're gonna kick it off and run our transcribe here.
Go ahead and restart it so we know that it's gonna work.
Pull this back up.
I'll hit it.
We'll go ahead and transcribe this episode 344, and it's starting.
And let's see what's going on here.
Starting new job.
"Would transcribe" that — well, it shouldn't say "would" anymore.
I changed it to say we are transcribing, right?
We are transcribing.
Dot, dot, dot.
And it's running.
The next thing it's gonna do when it gets done down here is it's gonna do this print statement, or we'll see an exception in one of those two.
There we go.
It's done.
I told you it was pretty quick, given how much data we're working with.
So here we have it.
Look at how ginormously large that file is.
In fact, it scrolled out past the buffer.
So what are we getting back?
There's some things we'll talk about, look at the pieces, but what's important here is each word comes in.
So that, it starts at this timestamp, and it ends very, very quickly after.
You get a confidence, like how sure is it that this is the word?
And if we had turned on speaker labels, it would tell us which speaker it was.
All right, let's see if we find one.
Here's one it's not so sure about.
Maybe it heard "assume" and wasn't sure it wasn't "awesome" — I don't know, just making that up.
But it's 99% sure, not 100% sure.
And then "that," "people," and so on.
So what's our end goal gonna be?
Remember I showed you that cool view with the green sections, where you can click on the sentences and play them.
We're gonna take this stuff, and we're gonna turn those into sentences.
And if you scroll through a little bit — like this one, "other shows." — we'll just look for punctuation.
We'll go through the words until we see a period, exclamation mark, or question mark, something along those lines.
And we'll consider that to be a break.
Or you could use it based on time.
You know, if it gets too long, we'll just do a line break there if it runs on.
So we'll do a little bit of that magic to turn this string of word after word with timestamps into transcript sentences that we can use for our display, for our search engine, all those kinds of things.
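A minimal sketch of that magic, assuming the word objects carry text and a millisecond start time as we just saw (the course also mentions breaking on time if a sentence runs long, which this omits):

```python
# Accumulate words until one ends in sentence-final punctuation,
# keeping each sentence's start time in seconds.
def words_to_sentences(words):
    sentences, current, start = [], [], None
    for word in words:
        if start is None:
            start = word.start / 1000  # milliseconds -> seconds
        current.append(word.text)
        if word.text and word.text[-1] in '.!?':
            sentences.append((start, ' '.join(current)))
            current, start = [], None
    if current:  # trailing words with no closing punctuation
        sentences.append((start, ' '.join(current)))
    return sentences
```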
But that's it.
And look how incredibly simple, right?
Create the transcriber, specify how you want the transcription to go, tell it to go.
We want to be able to keep our code responsive and zippy.
I didn't show you while it was running, but FastAPI is completely free to still do whatever it needs.
This is taking basically no effort.
It just kicked it off to the internet and let that thing go.
And then eventually it's gonna come back and it'll wake up and run.
So it adds almost no overhead to what's happening.
So because we're using this async version, it could just keep on cruising.
When it's done, we got our results.
Of course, we're gonna want to save this to the database and not do it over and over again.
But this is the basic way that we do transcriptions with Assembly AI from Python.
|
|
show
|
8:54 |
We look over in the database side here, we have this transcript, which is not the same transcript as you just saw right here, not the assembly AI transcript, but the one we're putting in the database, because this is both expensive in time.
And on a grand scale, if you did tons of transcripts, it would be expensive in money.
Any individual one is quite cheap.
But you know, the idea of this app is maybe you give it 1000 podcasts, and every episode that comes in, you transcribe that.
You don't want to do that every time somebody wants to view it.
Plus, you want it to be milliseconds, not seconds.
So what we're going to do is save that to the database.
And this thing will have when it was created, when it was last updated, whether there's a summary available, or just the straight words — and this transcript word model captures those little word blocks we were talking about as well.
And then you have just the full text and so on.
Okay, so we want to save it to there.
But first, let's see if it already exists.
Let's go over here and we'll say db_transcript equals — in here there's a full_transcript_for_episode, and we can give it the podcast ID and the episode number.
And we'll await it.
And we'll say: if it already exists, let's not recompute it.
Okay.
We'll print that the transcript for podcast whatever, episode number, already exists, skipping — and we'll just return db_transcript.
This thing returns one of these episode transcripts, or none if it doesn't exist.
But now in this case, we've checked it.
So this is going to be an episode transcript, like we expect.
Right now it won't find one — there's not going to be one yet, because we have yet to save any.
Next down here, instead of dumping this out, that was fun, but let's not do that.
Up here, let's go ahead and create one of these DB transcripts, because it was none from above.
So we'll go ahead and create one of these episode transcripts.
And let's pass in some values.
See what we've got to pass.
ID we don't pass, the revision ID we don't pass, and created date and updated date have defaults that get set when they're inserted into the database — now is good.
But we can start going with episode number equals, well, episode number, podcast ID, podcast ID, kind of just pass a lot of this information along here.
Words, we will set in a minute.
Summary, not now.
Error message, none.
So we want to set whether it's successful and how do we know if it was successful?
So this thing returns a status, and we can check what that status is equal to.
It's a string, but just like before, I like having these enumerations when the set of strings is very limited.
This comes out of assembly AI.
They've got queued, processing, completed, or error.
And since we know it's not processing or queued, it's going to be one of the other two.
So we're just going to check to see that it is completed here.
Okay.
So it's successful.
If it was completed, the status is going to be the same value, but just stored directly.
So if for some reason we get a weird value, we can just look and see, okay, that's what that is.
We have an assembly AI ID.
When this comes back, it's going to give us an ID.
And later, in other parts of the Assembly AI API, instead of passing all the data or re-transcribing, you can say: you've already transcribed this thing.
Here's its ID.
Let's do something with it.
Let's ask questions about it and so on.
So we're going to store that in case we need it later.
And I'm not sure this is necessary, but again, this takes a lot of time, relatively speaking, and it does cost a little bit each time.
So we want to maybe over-store the information available here.
So I'm going to say the JSON result is going to be the transcript.json response.
That way, if we ever change how we process the words and generate the additional information, we don't need to re-transcribe.
We'll have that there and we can kind of start over.
Okay.
And now one thing I did not set — maybe the most important thing — is the words, because what I actually want to put in there is a little bit different.
I want to store the text, the start time, and the confidence, and not the end time, for example.
So first, let's check and make sure this was successful.
We'll say: if not db_transcript.successful, we'll raise an exception here.
We'll say error processing this transcript, the status was that, and here's the error message — or if there's no error message, instead of saying None we'll just leave it blank.
Okay.
So we'll raise an exception.
And finally, let's store the words into our database transcript as well.
So we can just loop over the words that came back through the API.
So for word in transcript.words — the Assembly AI transcript — we're going to compute the time in seconds.
When we're working with audio players, we want to seek by seconds, not milliseconds.
So the start second is just the word's start divided by a thousand, so we don't have to do that math all the time.
Then we can say tx_word equals TranscriptWord with text equals word.text, start_in_seconds equals start_second, and confidence equals word.confidence.
And finally, we append that to db_transcript.words, which is a list.
And MongoDB, if you're not familiar with it, you can store lists of data and nested items and all those kind of things.
You're not restricted to just tabular data.
So we're going to store these as sub objects in the database.
So our database object is all initialized.
We just await db_transcript.save() and then return db_transcript.
So we get something there.
Excellent.
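Here's a condensed sketch of that save flow — EpisodeTranscript and TranscriptWord stand in for the course's database models, while the transcript fields (status, id, words, json_response, error) come from the assemblyai SDK:

```python
# A sketch of converting the API result into our own database record.
import assemblyai

db_transcript = EpisodeTranscript(
    episode_number=episode_number,
    podcast_id=podcast_id,
    successful=(transcript.status == assemblyai.TranscriptStatus.completed),
    status=str(transcript.status),
    assemblyai_id=transcript.id,
    json_result=transcript.json_response,  # over-store so we can reprocess later
)

if not db_transcript.successful:
    raise Exception(f'Error processing transcript, status: {transcript.status}, '
                    f'{transcript.error or ""}')

for word in transcript.words:
    db_transcript.words.append(TranscriptWord(
        text=word.text,
        start_in_seconds=word.start / 1000,  # players seek in seconds, not ms
        confidence=word.confidence,
    ))

await db_transcript.save()
```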
So in review, how do we add it to the database?
First, we check if it's there.
Do the transcription just as before.
No longer print it out.
We're going to create one of these objects with the data we got back.
If it's not successful, well, let them know.
And then convert these words over into our own objects.
We'd have to do something like this anyway, because we need our own model that's going to go into the database anyway.
So we're going to do this and then save it and give it back to the rest of the app to work with.
Let's do one more transcription here.
We'll do the same one.
But the last time we're ever going to do this episode's transcription because it's going in the database.
Speeding up ahead just so you don't have to wait for the transcription.
All right.
It's all done.
Let's see if we can see it in the database.
I'm going to open up Studio 3T.
You can use whatever thing you want to talk to Mongo.
But this is there's a free version of Studio 3T.
That's pretty nice.
I'm just going to connect to the local database and here you can see X-Ray podcasts.
What have we got here?
Transcripts.
And look at that.
Here it is.
There's the Assembly AI ID, and the time it was created.
Here you can see there's the JSON result.
And let's go way down.
Here we go.
These are our words: start in seconds, the text, and the confidence again that we got there.
Excellent.
So if we try to transcribe that again, let's see what happens if I go over here and say transcribe.
We look and see what the text says.
Starting a new job for this episode.
Boom.
This one with that number already exists.
Skipping.
Like we had at the top.
Perfect.
Right there.
So we don't need to transcribe that again.
We stored it in the database.
Now we just got to do something interesting with it, right?
|
|
show
|
1:24 |
Now one thing I noticed when that ran is that it said "starting job," but unless we asked, it never really said "finished."
So let's do two things real quick.
Let's go over here and track how long this takes.
So we can grab the time when this starts, like that.
And down at the bottom here, before we return that value, we can say the change in time, dt, is now minus the start time, and then print out a message that says the transcription processing is complete, and the elapsed time is however many seconds that took.
We don't need it that precise — let's format it to two decimal places at the end there.
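Something like this — a minimal sketch of the timing with the standard datetime module:

```python
# Track how long the transcription step takes.
import datetime

t0 = datetime.datetime.now()
# ... transcription and database work happens here ...
dt = datetime.datetime.now() - t0
print(f'Processing complete, elapsed time: {dt.total_seconds():.2f} sec.')
```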
Okay.
So now I have a little bit more awareness even if we're not actually looking at the UI or playing with UI to see it.
All right.
I think this transcribe is done.
Pretty cool.
Right.
I mean, there's a lot going on here because we're storing stuff in the database, we're doing data conversion and things, but the essence of it is, you know, not terribly hard, right?
Just create a transcriber.
Tell it how you want to transcribe.
Pass it the audio URL, off it goes, and we await for it to finish.
|
|
show
|
16:02 |
Well, it's awesome that we have our transcribe core functionality done, but the UI, the UI, let's go back to the UI.
We're playing with this AMA on here.
Notice already that the transcript behavior is actually kind of working, right?
Where there was no transcript — let's go to another one — it says create transcript.
And we go down here to the AMA and it says view transcript.
That is excellent.
But the way we got that to happen was typing weird URLs into there.
Let's go and make it now so that we can click on this.
Right now it just puts this hashtag transcribe at the top, not really helpful, is it?
So let's wire in our HTMX behavior here to call that function that we were previously doing by hand.
So we're going to need to go to this URL here.
I'm going to just copy that because it's going to be a good little template for us.
And then — speaking of templates — if we go to podcasts and episode, here's where we've got that create transcript button; let's wrap the markup a bit.
So what we want to do is we want to say instead of just going to the URL, we want to have some HTMX behaviors.
First of all, over here I've already installed HTMX into the site.
So if we go down, you can see right here at the bottom of the page, we're including this HTMX minified version into the website.
That's it.
It's installed.
There's no post build step.
There's none of that weirdness.
It's just included in the shared layout, which pulls it in throughout the site.
So given that it's installed, we need to say what is going to cause this URL to be called.
Okay.
Well, let's specify the URL first.
So we can do HTMX and you can see all the different things.
We're going to do a get to something like this URL, but notice this is just standard text.
The action is going to be transcribe.
It's always the same, but the podcast somewhere up here, you can see the podcast is passed in.
So we need to pass that in, and the way you do this in Chameleon is with dollar-curly-brace expressions; the podcast is an object, so we can just say podcast.id.
And then the episode, same deal: episode.episode_number, like that.
So that's going to say, here's where we go when an action is taken with HTMX, but what causes the action?
It could be a keystroke.
It could be a mouse over.
It could be a timer.
We want our trigger to just be click.
So when we click the button, go do this thing.
Let's see what that does.
Refresh this page and I'll actually move this up so you can see below if anything happens.
Click that.
Now what showed up in here is really funky, but it's working.
See the bottom?
And in a minute, it's going to say it's done.
But what is this?
Well, remember when we went to that URL, it gave us back this little fragment of JSON to say what was going on.
Okay, I'll leave this going here a little bit at the bottom so you can kind of see when it is actually done.
So what happens is when you have one of these HTMX behaviors, it just says, I'm going to go to this URL and replace this element with whatever I got back.
What I should have sent back is I should have sent back some HTML.
Not just this particular button, but I want this whole section to turn into like, hey, we're working right now.
Please don't do more stuff.
So I can say hx-target equals this.
So instead of replacing the element that caused the request, it's going to target the element these attributes live on.
And I'm going to set hx-swap to outerHTML.
So take this whole div, basically delete that and put in whatever we put back from the server.
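So the wired-up element ends up looking something like this — a sketch; the exact surrounding markup in the course template will differ, but the attribute pattern is what we just described:

```html
<!-- A sketch of the HTMX-wired button in the Chameleon template. -->
<div hx-get="/ai/start-job/transcribe/${podcast.id}/${episode.episode_number}"
     hx-trigger="click"
     hx-target="this"
     hx-swap="outerHTML">
    Create transcript
</div>
```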
And what we'll return from the server is something that says, hey, we're working on it.
Just hold on.
So let's go back to our AI view here.
And instead of returning this, we want to somehow render a template.
Now the way this has been working throughout this whole site, and if you've seen it, if you've been paying attention, you'll already know, but I'm using a library called FastAPI Chameleon that I created actually, that allows us just to wrap or assign a template here.
So let's go back.
And instead of doing this business, we're going to say FastAPI Chameleon, import this.
If you're like, I don't like Chameleon, I want Jinja.
There's also a FastAPI Jinja template that is derived from this that you can use as well.
So regardless of what template language you're using, you can do whatever you want here.
Let's go to the template section and I'll make a new folder called AI to match the views.
And in here, let's make a plain file — I don't want all the boilerplate a full page would bring — that I'll call job_running.html.
And so in the view's decorator we're going to say the template is ai, then job_running, like that.
And this is where this thing comes in.
So we're going to return that dictionary: return vm.to_dict().
We don't need a "start the job" message there, because it already started.
And the other important thing is we've got to set the job ID to the job.id that we got back from the database and the background service.
Because this ID is going to be used right there later.
Let's put something meaningful in here because a blank page is going to look pretty blank.
So let's go over here — I actually have an image in static, one of those little spinning busy-ball, "hey, the site is working" type of things.
And then let's put this into a div like that.
And then it's going to grab the job name that comes out of the view model.
Again, that's why we're using the view model — and there it is, right here.
And since we're returning this dictionary, everything in it — all those values — is accessible to the template.
So at first, we're just going to say it's running and nothing else.
Let's go ahead and just recompile and restart that.
And it says it's done.
Now it says the transcript exists.
I'm going to have to just go find another one.
Let's do this one.
And we'll go ahead and kick that off.
And look at that.
Boom.
Transcribing.
Hang tight.
That's excellent.
That is, that is really cool.
I think it tells you like, look, it's working and we'll just let it run for a second.
Keep your eye on the bottom there.
Hey, it's done.
The formatting is weird.
I got to work on that.
But it still just says this.
So we need some way to tie that other piece together.
Here: this check-job endpoint.
Okay.
Let's use another one of these here.
Say the template is job_completed.
And we're going to go into this one and check the status.
If it's done, we want to give them some kind of UI that says your job is done.
Click here to view the transcripts.
Otherwise we're just going to leave it with that spinning background thing that we got from here.
All right.
We're going to need to create this job_completed.html as well.
There we go.
And let me put a little bit different HTML there.
I'm going to say success.
The whatever-it-is — transcript, in this case — is ready.
Click here to view it.
Okay.
And it'll be like: click here.
This is all part of that view model data that gets passed back — it could be a transcript, it could be a summary, et cetera.
Okay.
So we're going to return that back when it is finished.
But the other trick is, well, how do we get to this once the running fragment goes back to that page?
Remember, we clicked the button, HTMX swapped it out, and now there's just plain HTML sitting there.
We need to somehow tell HTMX: check again, keep checking.
Just ask, is that job done?
Hey, is that job done?
When it is, then do something.
Okay.
Then show the completed.
So in order to get that to happen, we're going to go up here.
I'm going to do some more stuff here as well.
HTML target is this, and we're going to use a timer.
So when this gets sent back, it's just going to say, Hey, just check every second or so and see what's going on.
Now let's line these up so you can get a little better look at them.
For the URL, we're going to set the hx-get to just go to /ai/check-status/{job ID}.
Right there.
The ID comes from the view model data we passed over, so it can use that to loop back around.
And again, the trigger is going to be "every" — how often do you want it to run?
Every three seconds.
So just call that URL every three seconds and swap out the outer HTML with whatever comes back.
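So job_running.html ends up looking roughly like this — a sketch, with the image path made up for illustration:

```html
<!-- A sketch of the polling fragment in job_running.html. -->
<div hx-get="/ai/check-status/${job_id}"
     hx-trigger="every 3s"
     hx-target="this"
     hx-swap="outerHTML">
    <img src="/static/img/working.gif" alt="Working...">
    ${job_name} running, hang tight...
</div>
```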
So how do we make that happen here?
Right now, it just prints out something.
So we're going to need to use this template to respond, depending on whether it's finished or not.
Now over here in the view model thing, we have this check job view model already.
So what we want to do is just create one of those on this side.
Let's see — what does it need?
It needs the request and it needs the job ID.
So request is good.
We already have that.
And the job ID is going to need to be parsed over from a string — I don't think FastAPI will do that conversion for us.
If it did, that would be cool, but I'm just going to assume that it doesn't.
We'll say await VM dot load.
We go over here and look at this real quick.
Basically, it gets us the job and checks to see if it's finished and so on, as before.
Okay, so it's going to do that here.
And let's test: we'll say if not — if for some reason that failed — it returns an error message.
We're just going to use this thing in the infrastructure section called web_utils that can return an error status code.
And this will return a FastAPI — or Starlette — response with those values set.
Okay.
If it's not finished, maybe we'll just print out something — that the job is still running, something like that.
Just give us a little bit of feedback that there's something going on here.
In that case we don't want to show the completed HTML using the completed template; we want to use this one up here, job_running.
So we've got to write this a little differently than the standard code.
We go to FastAPI Chameleon, and we can build a response where we set the template name and say what template data we want to pass.
We're going to pass the template data by saying vm.to_dict(), like this, and we return that.
Otherwise, we'll just return vm.to_dict() as well, and it'll use the default job_completed template.
Okay.
It's not a lot of code here for all those little weird edge cases, right?
So when it comes into check, it's going to say, is the job done?
This check is going to be triggered by HTMX's polling trigger.
It's going to load it up, load all the details, check the job.
If there's some kind of error, like the episode doesn't even exist, well, we'll just say, sorry, no idea.
If it's still running, we'll do a little print and we'll send back the same HTML as before.
This one, it'll replace it, but then it'll start checking again.
And eventually, when it is done, we just return the details, and this will render the completed UI, which says your job is done, click here to see it.
All right, let's test this out and see how it goes.
Over here we have this one that never came back, never told us anything.
So that wasn't great, but it did finish and there's this transcript.
Let's go find one that doesn't have its transcript.
This episode where we just ship open source.
In this episode, Brian and I announced that we'd created three different open source projects that week and put them on PyPI.
That was kind of crazy.
It has a transcript as well.
Let's try this one.
Okay, no transcript here.
Fantastic.
So we should be able to do this.
And let's go ahead and hit create transcript, transcribing, hang tight.
Should see the check in two seconds.
There we go.
The job is still running.
Look at that, it's checking.
Job is still running.
Still running — you can't quite see it, because it just says the same thing over and over again.
If I go over here and erase the output, in a second it should say still running again.
It does.
Hold on.
We'll let assembly AI do their magic and see what happens.
It's done.
Oh my gosh.
And look at that.
Just a second or two later it says success.
Your transcript is ready.
View the transcript.
We're not quite ready to view the transcript yet, are we?
But it's done.
And look — if I erase the output, there's no more polling, no more "is the job done?"
Once it's done, how does it stop?
How does it stop polling?
Over here, while it's still running, it returns the HTML version that says check again.
Here's a new fragment, and the new one says check again — here's a new one, and that one says check again.
But when it's done, it just says: here is some plain HTML with no triggers or HTMX attributes or anything.
It's just, here's your HTML.
So this thing is done.
There's no more interaction.
Very, very cool.
So we've got this whole end to end.
Create this background job.
Use HTMX to kick it off and watch it.
Send it into this whole infrastructure with AsyncIO.
And then it kicks off this asynchronous transcription at Assembly AI.
And then it polls for it.
Just check in the database.
Hey, the thing you put in there, has it marked done yet?
No?
Okay.
We'll keep checking.
When it is, here we go.
The last thing to do is have a UI around this viewing of the transcript.
And they are in the database.
And if I refresh it, you can see View Transcript because it exists for this episode.
|
|
show
|
4:14 |
Are you ready for the grand finale?
So what happens if we click this transcript button?
It turns out the website already knows how to take the thing in the database and render it.
Let's give it a click.
What?
There it is.
Okay.
So a new cloud computing paradigm at Python Bytes.
In this episode, we switch everything over to like one giant Docker cluster for 15 apps instead of a bunch of servers.
It's a lot of fun, but let's look at this.
So notice the little green bar goes over this and we have a different time.
If I click, no place for podcasting.
Not a place for podcasting.
Not a place for podcasting.
So I'm hanging out here downtown in a...
How about that?
If I click here, but Michael, what do you want to tell us?
But...
Michael.
Michael, do you want to tell us why you're...
You want to tell us why?
You probably can't hear that playing back very well, but it plays right wherever you are.
You just go in here and you click it.
Boom.
And you can see how accurate the transcripts are by just clicking around.
That is a super cool feature.
Let's just review real quick how that works.
So over here you can see the URL is /podcast/transcript/{podcast name}/episode/{episode number}.
Right there.
So all it does is comes in and create one of these episode view models, loads it up, and then renders it out on this piece here.
Most of the magic is right here.
This constructor just stores the value, but it's where load data happens that it gets interesting.
So we just get the podcast by what's passed in, the episode by number that's passed in.
There's a summary.
We grab that.
We don't have the summary right now.
That's another feature coming, so for now this is empty.
And then it just says, give us the transcript text for an episode.
So let's jump over and see that.
This is going to return a sentence which has a start time and text.
So you can see basically one of those lines there in the green.
It also caches it so it can be faster over and over if it asks for the same thing again.
So it comes in and says, give us the words.
This just goes to the database and pulls back that transcript record we saved and just returns just the words in an efficient way, not selecting other things.
And then down here it just goes through and it basically says, I'm working on a sentence right now.
And then ask, is this word the end of a sentence?
If it is, we're going to finalize the sentence and store it and then start over.
Otherwise, just keep piling on active words.
So down here, is it the end of a sentence?
So it just says, does the thing end in punctuation?
So is it a period, a question mark, or an exclamation point?
If it is any of those, the last character, then it must be the end of a sentence.
There's probably a more advanced way we could do this, but actually it looks pretty darn good, doesn't it?
I mean, look at this.
It was 13, now it's 18 degrees Fahrenheit, so negative 9 Celsius if you prefer.
Boom, period, new line.
Celsius, no power, four or five days, period, new line, new sentence.
So that's what this does.
And then this UI here just loops over, show the time, show the text, and then it has a little JavaScript that says, play this thing at whatever time in seconds that is.
Remember, we can convert it from milliseconds to seconds.
End to end, I think we've got a really cool feature.
We're over here, we're checking out our podcast that we're following.
I didn't follow that one yet.
Maybe I'll go ahead and follow it.
Now it's followed.
Come over here and I see a look.
This one has a transcript.
Let's view it.
There it is.
Back out of the database straight away.
So really, really cool feature.
I really like how this came together.
It's only a little bit of what you can do with Assembly AI, but it's going to be the foundation of what we're doing.
Later on we'll do some summaries and chats, but often those are going to be based on the transcript.
Like I said, it's the foundation for adding more than just show notes and episode names to our story of our X-Ray podcast.
It's where the X-Ray comes from.
|
|
|
24:45 |
|
show
|
0:41 |
This is gonna be a really fun chapter.
We're going to add search for our app and it's gonna search through standard stuff that you would find in the podcast, like episode titles, show notes, and that sort of thing.
But we just finished that cool transcript feature.
Let's add transcripts as well into the search.
And later when we get things like AI key moments of the episode, well, we're gonna search them as well.
So we're gonna build a search engine and plug in all this extra data.
And as we get more, we're just gonna keep enhancing it.
It's a really cool feature.
It's actually super easy to do.
And we use HTMX again.
|
|
show
|
1:56 |
I've added a few things again.
We're into a new project up here, the 02 feature search.
And down here, the second feature search, that virtual environment is active again.
So we get a nice, fresh, separated copy for this project.
Now I've added a couple of things.
So I've added a search views file here, and it basically just shows us a search page.
We'll enhance this as we go.
Also, here we have this little bit of HTML.
And the most important part is this input box.
Doesn't do anything magic yet.
See down here, search results go here.
Also nothing magical yet, but pretty soon there will be.
We have a search view model that uses this thing called the search service, which is gonna actually be the search engine.
I'll show you that in just a minute.
It's a custom search engine that I wrote, where you can very carefully control how everything works.
And it's going to run in that same background set of asyncio tasks that we used for transcribing, for synchronizing the podcasts, all those things.
And search services here, we'll talk about that in a minute.
And of course, we needed to register it as well.
So if we fire up the app, find the link, we now can click on this search thing, and it says search deep within the podcast.
We use Assembly AI to generate transcripts, summaries, not yet, but we will.
And surface all that information through a lightning quick search.
Go ahead and see what you can find.
Well, nothing — nothing yet, but it will.
It's going to load that up and do really cool things once we get the search plugged in.
At the end of this chapter, that statement will be true.
So this is where we're starting, and we're gonna dive in and write some code in just a second.
|
|
show
|
4:34 |
Let's just do a quick look at this search engine that I built so you know how it works.
It's gonna be using a couple of interesting pieces.
When it starts up — I'm gonna turn this number down for now, to five seconds for this chapter; later on it'll be a little slower — the website will start, and then five seconds later the search background indexing will kick in, so we don't have to wait too long to see what's happening.
Then, how often does that run? It says every five minutes, and for this chapter I'm gonna make that every one minute, 60 seconds.
We're gonna use this thing called an nlp, which is a language model.
What the heck is that?
We're gonna go out to spaCy.
spaCy is a natural language processing system.
It's really cool.
There's a bunch of ML stuff going on in there, but we're not going to do anything fancy.
What we're gonna do is load up one of their language models, and you can tell it to do things that make search engines a little bit nicer through what are called lemmas.
So we could give it some text and say: go through every token you found in that text and give us the lemma_ — there's a regular lemma, but that's a different thing; the text you want is lemma_ with the underscore.
Anyway, what this does is normalize words.
So for example, if you had a podcast episode entitled "The Geese of Canada" but you search for "goose," and you just look at the raw keywords, well, you're not gonna find that episode — even though it's probably the most relevant thing in the entire library when you want to know about a goose.
Or if I'm looking for "friends" and they only talked about a "friend," right? You might want that to come up.
So it'll do things like normalize the pluralization — geese becomes goose — and there's a whole bunch of other little variations in there.
So we're gonna use spaCy and this NLP aspect of it to come up with those lemmas, those normalizations of the words that we find throughout this whole thing.
We'll also be able to manually trigger an indexing.
So for example, if somebody adds a new podcast, we want to instantly start indexing, not wait for the next five-minute or hourly pass.
And we can run a search: we'll search some text like "geese and birds" or whatever, and it'll break that into a bunch of independent keywords and then run a search on those keywords.
Okay, so we'll see down here it's gonna go through and do all this search work.
All right, we're gonna have a database record called a search record — this is stored in MongoDB, the one that we start with Docker.
The core essence is: every search record is gonna have an episode number, a podcast ID, and then a distinct set of words.
And that's smaller than you might think — for an hour-long conversation on a technical topic, you might end up with about 1,000 distinct words.
And we have, as you absolutely should, an index on this part of our database, so we can do an index-based search on the keywords that appear — and we're gonna use that NLP to get just the normalized ones in there as well.
That's basically how the search engine works.
Finally, there's this task here that, just like the part that did the transcribing background work, is gonna go around forever: build the index, then wait for however long it needs to be — in this case I said five minutes when we opened the file.
So it does an await asyncio.sleep for five minutes, which basically takes this thing out of the asyncio event loop for five minutes, then puts it back in to do a little bit of work; if that was an hour, it'll sleep an hour.
It's really, really low overhead when it's not running, and it's also pretty fast.
So that's basically what's going on here, and we're going to go turn it on — in the main... actually, not in the main.
Remember, this is all async and await, so it plugs into our app setup just like the other one.
So here's our to-do: we'll do asyncio.create_task of the search service's search_indexing_task.
And again, like before, it's not a problem to start this and not await it — in fact, that's the entire point.
Now we're just gonna run this while True, just like we do the other ones.
|
|
show
|
5:24 |
To get us just a little bit more familiar with the search engine, let's just do one more thing.
Let's see how we could bring in our transcript information, because the way it works now, it's only using the basic info.
So let's go down here — it says "build index for a podcast."
And it comes down here, and basically it says: give us each episode.
If it doesn't have any changed contents here, then that's fine.
We don't need to re-index it if nothing's changed.
And then it says we're going to come up with some "episode text," is what we're calling it.
And episode text is going to include the title.
And if there's no title, it uses an empty string rather than None, because that avoids a crash when you try to add the strings together.
Give it a space.
Come up with the description.
Some of the podcasts ship their show notes as HTML.
Others ship them as a PDF.
Others ship their podcast as plain text.
So this little helper's job is to take HTML and turn it into plain text, kind of like Markdown.
And then it takes all the tags that might be in the RSS and throws them in there.
And then the base text up here has to do with, what is the podcast title, podcast description, et cetera, et cetera.
So it just takes all the words it can find and makes one giant string out of it.
And then it says: hey, give me the transcript too.
Right?
Here's the full transcript for the episode.
And if there's a transcript, go to every word that appears in the transcript.
Don't turn it into sentences and stuff.
Just give me all the words.
Jam them in there as well.
They don't have to be in any order.
Remember, we're just looking for unique words that appear.
And then we add on any sort of summary information.
We'll generate that later, but eventually we'll have summary information.
And in the end, this episode text is just, what words can we find about this episode?
The ones that it comes with and the ones that we generate through Assembly AI.
And then we just say, turn that into a huge distinct set without duplication.
That's what set() does — gives us the distinct keywords.
And we're just going to save that, right?
We're just going to stash that into the database like that and then save it.
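As a sketch of that indexing step — SearchRecord, html_to_text, and full_transcript_for_episode are stand-ins for the course's helpers, so the names are illustrative:

```python
# Build the distinct keyword set for one episode and save the record.
episode_text = (episode.title or '') + ' '
episode_text += html_to_text(episode.description or '') + ' '
episode_text += ' '.join(episode.tags or [])

transcript = await full_transcript_for_episode(podcast.id, episode.episode_number)
if transcript:
    # Order doesn't matter -- we only care which words appear at all.
    episode_text += ' ' + ' '.join(w.text for w in transcript.words)

record = SearchRecord(podcast_id=podcast.id, episode_number=episode.episode_number)
record.keywords = sorted({w.lower().strip('.,!?') for w in episode_text.split()})
await record.save()
```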
So then when we do a search — it's incredibly simple, actually.
Let's go see the search.
Search episodes means SearchRecord.find, where the keywords list contains the word that you specified.
And it builds up this query to say: and the next word, and the next word, and the next word.
If you pass in "geese of Canada," that would be three different AND conditions, right?
Shows you the latest one first and then boom, off it goes.
Just iterates it and gets the results.
So that's how this works.
That's how we're able to take things like transcripts and summaries and plug them into this search engine.
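A minimal sketch of that query, assuming a Beanie/Mongo-style SearchRecord document with an indexed "keywords" list field (field and sort names are my assumptions):

```python
# AND together one keywords-contains condition per search word.
async def search_episodes(search_text: str):
    words = {w.lower().strip() for w in search_text.split() if w.strip()}
    query = {'$and': [{'keywords': w} for w in words]}
    return await SearchRecord.find(query).sort('-created_date').to_list()
```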
And with that, we should be able to run it and see stuff going into the database.
So let's give that a go.
Restart it.
And remember in five seconds right there, I'll clean this up.
It should kick off.
Let's see what happens.
One more thing that we got to make sure we've got going here.
So when I was showing you spaCy, that model we're loading right here — the small one is not super large, but I want this to work well for you.
So I chose the large English model.
So we need to download it; we can just go over here into our virtual environment, where the spacy package itself is already installed.
So we're just going to run python -m spacy download with the large English model (en_core_web_lg), which will ultimately pip install the thing.
So let it go.
You can see it's 587 megs, which is why it doesn't come with it.
So give it a second, but it's coming in nice and fast here.
Excellent.
Now that's loaded; the search engine will make sure it's present, because it needs it for that goose-to-geese trick I was talking about.
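For a feel of the trick itself, here's a tiny sketch assuming the large English model from the download step above:

```python
# A sketch of the lemma normalization with spaCy.
#   python -m spacy download en_core_web_lg
import spacy

nlp = spacy.load('en_core_web_lg')

def lemmas(text: str) -> set[str]:
    # token.lemma_ is the normalized text form, e.g. "geese" -> "goose".
    return {token.lemma_.lower() for token in nlp(text) if token.is_alpha}

print(lemmas('The geese of Canada'))  # {'the', 'goose', 'of', 'canada'}
```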
Try again.
Look at that.
It's indexing Fresh Air, Talk Python, Accidental Tech Podcast, pulling in the episodes that it knows about.
All right.
Indexing complete in 10 seconds.
If we go over to our database again, you can see here for episode number 572 for what we're doing here.
You can see here for episode number 572 for what is this?
Let's see.
Accidental tech podcasts.
We've got month, store, programming, Mac, fee, bootleg, schatzel, ferrite, and become and so on.
Right?
So those are the words and we have an index on it.
Super, super cool.
And how many search records do we have?
2,052.
That is pretty excellent.
Awesome.
So that's how the search engine goes.
That's how we've got to run it.
Make sure you install the language model through that command; the error message tells you exactly what to do.
Then let it go in the background and just live in that asyncio space and do its thing.
|
|
show
|
1:59 |
Now let's plug this search concept using HTMX into the UI.
So we saw, we go over here and we type, nothing happens, right?
So what we wanna do is make that thing come to life.
And in order to do that, we're gonna use HTMX, which means we need a server side endpoint to get the search results in the form of HTML back to them.
So we'll do an @router.get on a function; we'll set the URL in a second; we'll say async def search_hx_results, and it's gonna get a request and search_text, which is a string.
And for the URL, let's say /search/HX.
Now for this, we saw what we could do before.
Let's just say, go over here and in our templates under search, now I like to organize these so I have the whole pages here, but then just little tiny pieces under what I'm calling partials.
That really is just the stuff coming back from HTMX, so let's call this partials and I'll say search results.
Now again, we're gonna need to use our FastAPI Chameleon template, and this will be search, partials, search results.
And let's just see if we can get it to echo the search text back, okay?
So we'll say search text here is search text, and in our little HTML, we'll just say you searched for, obviously we'll put some nicely designed things in here in just a second, but let's just make sure it's hanging together, right?
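Here's a minimal sketch of that endpoint; the view function name and URL come from the narration, the template path follows the partials layout just described, and it assumes fastapi_chameleon has been initialized elsewhere with the templates folder.

```python
import fastapi
from fastapi_chameleon import template

router = fastapi.APIRouter()

@router.get('/search/hx')
@template('search/partials/search_results.pt')
async def search_hx_results(request: fastapi.Request, search_text: str = None):
    # Just echo the text back for now to prove the round trip works.
    return {'search_text': search_text}
```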
The next thing we're gonna need to do is go and HTMX-ify this thing.
give it a trigger and a behavior and where to put the results, all those things.
|
|
show
|
2:52 |
Here's our search HTML.
Let's make it go do interesting things with HTMX.
So we say hx-get equals that URL.
hx-target: well, previously we had said either this or we put that somewhere else, but this one's different.
We want the input box to stay here, but we want the search results there, okay?
So we're gonna go down and say that the search results go into the element with the ID search-results.
You say this like in CSS: hash for ID and then just the value there.
Then we need a trigger.
Now there's a couple of variations here.
We don't want every single keystroke to make a request back to the server.
It could probably handle it, but it's not ideal, right?
It's gonna overwhelm things.
So we wanna put a delay and let's say 250 milliseconds.
Now, another consideration: notice how I'm pressing the arrow keys right here.
That would count as a key up.
However, that wouldn't actually change the content.
So we really only wanna do this when the delay has happened, the key is up, and the contents of the input box have changed, so we'll add changed.
And that together is gonna give us what we want.
It's gonna come over here, hit this.
We'll get our search text, which, let's make this an optional str.
And we'll send it back.
So it's gonna go there, pass over whatever input value is in here.
And then ideally put the results, actually replace this whole section with whatever comes back, which should just say you search for something.
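Put together, the markup looks something like this; the attribute values follow the narration, and the target ID is an assumption based on what we just discussed.

```html
<!-- Sketch of the wired-up search input. -->
<input type="text" name="search_text"
       hx-get="/search/hx"
       hx-target="#search-results"
       hx-trigger="keyup changed delay:250ms">

<div id="search-results">
    <!-- The partial's HTML lands here. -->
</div>
```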
Let's see how it's going.
Let's go to search.
I'm gonna search for HTMX.
Oh, look at that folks.
You search for HTMX.
I'm excited.
I'll search for geese of; notice that it's not hitting the server on every keystroke, but once I stop, then the results come in.
Geese of Canada.
If I press the arrow key, you can't really see whether it's flickering and changing, or whether it just has no flicker.
But since it's not actually changing the contents, it doesn't do anything.
Geese of Canada, question mark.
Okay, so it looks like we've got this really nicely hooked into our page.
See how amazing HTMX is?
HTMX is amazing.
Boom, love it, love it, love it.
It makes it easy, so easy, to create really dynamic pages.
The final thing for this feature is actually, well, two things: to run the search, and then to format the results in an interesting way.
Let's run the search next.
|
|
show
|
2:42 |
We saw that we're getting really cool search queries coming in through our text-changed HTMX magic.
Let's try a search, huh?
Okay, so over here we can say results equals, and we can go to the search service, and then we can just call search, and we give it all the text.
It doesn't matter how complicated it is.
They could say, I want to know about the geese of Canada who flew over the Rocky Mountains, but it was too cold one winter, so they stayed, whatever.
We jam that into the search engine.
It uses the NLP stuff and breaks it into a bunch of keywords and then searches for it.
So all we gotta do here is pass the search text in, and we need to await this 'cause it's doing a whole bunch of async things, and this will be a raw search result for the moment.
This is not where we're gonna end up.
I just wanna show you what is in here.
So we have the search text, and let's just return a couple of things, podcasts and episodes, like that. Then in our search results partial, we'll say there's an unordered list, and we want to repeat for each podcast and show its title; let's just call them podcasts.
The way we do this in Chameleon is with tal:repeat.
It looks like an error, but don't worry about it.
It's just PyCharm not really being great with it.
So we'll say tal:repeat="p podcasts", and we'll just put ${p.title}, and then we're gonna do the same thing for the episodes just to see what we get:
tal:repeat="e episodes" and ${e.title}.
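As a sketch, the partial ends up looking something like this; tal:repeat iterates and ${...} interpolates, and PyCharm may flag the tal: attributes even though Chameleon handles them fine.

```html
<div>You searched for: ${search_text}</div>
<ul>
    <li tal:repeat="p podcasts">${p.title}</li>
</ul>
<ul>
    <li tal:repeat="e episodes">${e.title}</li>
</ul>
```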
Let's restart it.
See if I've got this all hanging together here.
Refresh, and let's just type htmx.
See if anything talks about that.
Oh my goodness, look how awesome.
You search for htmx and you got podcasts, Talk Python to Me and Python Bytes.
What if we search for iPhone?
Whoa, it looks like Accidental Tech Podcasts and Waveform.
What if we search for Postgres DB?
How insanely cool is that?
MongoDB, there you go.
There's a few of them.
Talked about it a few times.
Oh my goodness, over on Talk Python to Me.
I wonder why.
Well, that's how we can use MongoDB to put those keywords in there and create an index and do this really quick search.
I think it's not even possible to perceive how awesomely quick this is 'cause it has that 250 millisecond delay, but we can just say it's fast, very fast.
|
|
show
|
4:37 |
Well, those results were fun and it shows, oh my goodness, this really is working.
Let's go and make this look a little bit nicer.
Okay, so I have some pre-designed HTML I'm going to put in here that is basically the same thing.
Let me just show you.
It says up at the top, if there's no episodes and you are trying to search for something, well, that means there's just no results.
And we'll make sure that there's at least some results here, which is any of the episodes.
And then we'll go through each podcast.
We send over the podcast separately and we do this cool little design with the image of the podcast, the title and its subtitle.
And then same thing, episodes for each one of those, we put in a little bit of information about them.
And at the very end, we say how long it took for the search to run.
Just so you get a sense of like how fast is this cool little search engine that doesn't even need a separate server.
Okay, and now the server side to make that go is we're not going to do this raw results.
Instead, we're going to say view model equals search results view model.
And what goes in here?
Well, you guessed it: the request, which PyCharm can fill in for us, and the search text.
So we got to do that.
And then we say await vm.load(), and return vm.to_dict(), right?
This is all the data that the search results HTML needs.
Let's just real quick look at this.
It's doing the same thing.
Although it's storing the time, it runs the search just like we did stores those two values, creates a little lookup for when you're iterating over the episodes.
And you have only the episode ID, or sorry, the podcast ID, not the podcast itself.
You want the title, for example.
So it creates this little dictionary to make that super fast and easy.
And then it talks about how long it takes.
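To make that concrete, here's a hedged sketch of such a view model; ViewModelBase, search_service, and the field names are stand-ins for the course's actual classes.

```python
import time

class SearchResultsViewModel(ViewModelBase):  # ViewModelBase: the course's base class
    def __init__(self, request, search_text: str):
        super().__init__(request)
        self.search_text = search_text
        self.podcasts = []
        self.episodes = []
        self.podcast_lookup = {}
        self.elapsed_ms = 0.0

    async def load(self):
        t0 = time.time()
        results = await search_service.search(self.search_text)
        self.podcasts = results.podcasts
        self.episodes = results.episodes
        # Episodes only carry a podcast ID, so build a dict to make
        # title lookups fast while the template iterates the episodes.
        self.podcast_lookup = {p.id: p for p in self.podcasts}
        self.elapsed_ms = (time.time() - t0) * 1000
```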
Let's try it again.
This should be interesting.
I should not even need to refresh the page.
I'm just going to put FastAPI and see what happens.
It's so good.
HTMX is so good.
All right, look at that.
There's the podcast; it's got a little image.
And there's the subtitle that we've registered with iTunes or from the RSS feed.
And then these are the episodes that talk about FastAPI. Which ones talked about Postgres?
There you go.
Look at that, 26 milliseconds to search all that data.
What about Emma?
There's this best of Emma Stone.
Let's go over here and we can actually, oh, it already has the transcript.
Great.
So this will let us do something really interesting here.
We could come over and check this out.
Let's just grab a random section.
Show that we're actually using the transcripts.
The golden rule is extremely applicable to me this day.
I'm going to copy that.
And over here, you'll see Emma Stone.
If we just look down, the original show notes have nothing about this.
Let's go over to the search and we can search for Emma again.
There they are.
But let's search actually for that exact phrase that came up in the speech.
The golden rule is extremely applicable to me this day.
Check that out.
We can put Emma, not that, Emma.
There you go.
That appears in one part of the podcast.
This exact phrase appears in the transcripts.
And there you go, 36 milliseconds.
We went through everything, searched all these keywords and found it.
So good.
Yeah, really, really nice here.
One final thing I don't like: when it's empty, if I refresh it, it says search results here.
That was just while we were playing with it.
So let's just go and take that out.
Now if you type something, look at that.
And if you have no results, like that, it'll just say no results found.
Perfect.
I would say the search feature is done.
And you can see how FastAPI and HTMX make it awesome.
Well, and MongoDB too.
But the real heart of it is that we have all this data beyond just what shows up in the episode pages themselves.
Although we have pretty good show notes.
There's not that much in the show notes here though, is there?
I guess it's a little bit of a short one.
But generally, even with that, it's just a few paragraphs.
So all the extra stuff we can bring in really makes the X-Ray podcast app live up to its name, right?
|
|
|
56:03 |
|
show
|
2:00 |
We're taking it to the next level with our audio work.
We're not just asking for transcripts and feeding those into search, which already is awesome.
We're going to go and use the Lemur LLM that understands audio to work with this transcript and this audio file the way you would think of maybe something with ChatGPT.
But in fact, it's quite a bit better because it understands very large documents and it can keep a lot of tokens and have a large context for all the information it works with.
I think you're going to be pretty impressed.
So we're going to use our LLM in this chapter for two things, kind of the same two sides of the same thing.
We're going to go to an episode like this one that we've been working with throughout our examples.
And once we have the transcript, we're going to be able to say, why don't you just go ahead and summarize that for me.
We want the summary in two ways.
We want this TLDR.
This is really valuable because as a consumer of this podcast, you know, you can see it's 55 minutes and six seconds.
That's a lot of time.
Even if I listen at a double speed, that's still a lot of time and mental energy and focus.
Here's one paragraph that I can read and decide.
Is that worth going into?
Is that worth spending the time?
The other part will be the key takeaways.
You could use this before you listen or even after.
You've listened to the episode.
You're like, oh, I know there was a couple of things that gave some advice.
What was it?
Boom, boom, boom.
You can look in there and see what those are.
We're going to generate that with the Lemur LLM from Assembly AI.
And of course, we could take those things, fold them back into how we might use the transcript.
We could go into the transcript and search for those things and try to get the whole context.
And we'll take both the TLDR and the key moments and feed those into our search, making our search even better and even deeper than it was before.
It's going to be really fun.
|
|
show
|
9:01 |
Before we dive into actually using the Lemur LLM to get work done, I want to take a moment and talk to this cute little thing.
Say hi to it.
Its name is Pepper.
Now, Pepper is very smart, but you got to talk to Pepper in just the right way to really get the maximum potential from it.
Now, this is cute and all, but seriously, this idea of prompt engineering, it really matters.
I'm going to show you an example in a minute where the two results will seem like two unrelated worlds, for almost exactly the same question based on the identical inputs.
It's going to blow your mind.
So give me just a moment.
I know you probably have used AI before.
You've talked to it before, but there are some techniques and some tricks and things you can do to really make the LLMs way better.
I'm going to talk about that just for a sec.
Patrick, who's from Assembly AI, did this great video.
It's only 14 minutes long, called Prompt Engineering 101.
It comes with a Jupyter Notebook that lays out a bunch of the examples.
While it's not included in the course, the link is right here on YouTube.
I really encourage you to honestly pause the video after I show you what you're about to see.
Go to one more slide, one more graphic here.
Then go watch this video and then come back and carry on because Patrick has a bunch of great tips and techniques, and some of these techniques come from places like Anthropic.
Here's my example to help you fully rock the idea of prompt engineering and that it actually matters.
So here's an example.
We're going to take a podcast.
Now, before I show you the results of what we're going to get here: this podcast comes from the Accidental Tech Podcast.
It's a mostly-Mac enthusiast podcast.
It's really long.
It's like two hours or more.
Each episode is pretty fun.
You can check it out if you wish.
But here's the deal.
There was an episode right when the Apple Watch was banned, the week that it was banned, because Apple had basically taken the technology and infringed the patents of another company called Masimo.
So most of this episode, at least an hour, probably more, was dedicated to are patents valuable, what's going to happen to the Apple Watch, should they settle, should patents exist, all that kind of stuff, right?
Right or wrong, agree, disagree, doesn't matter, but that's what they spoke about, okay?
So given that, I went to Lemur and I said, I need you to read the transcript and summarize it for me.
Your response should be a TLDR summary of around five to eight sentences.
And I gave it the entire transcript.
Here's what I got back.
"The man tries to take an alternate route to the beach, but gets his vehicle stuck during a turn.
Despite using recovery gear like Max Trax and shoveling sand, he's unable to free the vehicle himself.
The mayor," true story, "the mayor drives by and recognizes that the man needs assistance, tows the truck out.
On the drive home, the man notices issues with the vehicle's suspension.
Later, he had to wash the wheels," et cetera, et cetera.
And it goes similarly for the key moments.
The man takes an alternate route, he's unable to get unstuck, the mayor drives by; on the drive home there are issues, and it takes washing out the wheels and resetting the suspension the next day to fix the problem, and so on.
Any mention of the Apple Watch?
No.
This was a story told at the beginning of the episode.
All right, and it was, I'm just thinking back, probably 10 minutes, 15 minutes tops, out of the two hours.
The entire thing is about this, right?
No, it's not.
Let's ask the same question of the same, literally the exact identical input, but we're gonna use a little higher end bit of prompt engineering.
First, we're gonna go to Lemur, if you wanna think of Pepper in our picture there, and put it in the right mindset, put it in the right state of mind.
So instead of just saying, I'm gonna ask you a question, say, "Look, you are playing this role, and I want to talk to you from that perspective."
You are an expert journalist.
I need you to read the transcript and summarize it for me.
Use the style of a tech reporter from Ars Technica.
That's the first part.
The next part is to provide more context about what this thing, this transcript thing is we're gonna need to summarize.
This transcript comes from a podcast entitled Accidental Tech Podcast.
Obviously, this is generated out of the database 'cause we have that as we're clicking the button, right?
And it focuses on, here's the subtitle also out of the database, "Three nerds discussing tech, Apple, programming, and loosely related matters."
The title of the episode is blank, whatever the title is.
In this case, ""Episode 567, macOS Uranus.
It's not really germane to what we've been talking about so far, but that is the title.
And then finally, ""Your response should be a TL DR summary of around five to eight sentences.
How different will this make the output?
Identical input?
Let's see.
"TLDR: The Accidental Tech Podcast crew found themselves diving down an unexpected rabbit hole discussing an ongoing patent dispute between Apple and the Italian electronics firm Anthropic."
I think it got that wrong.
The firm is Masimo, and I think it's American.
But besides that, perfect.
While John and Marco,"" those are two of the three hosts, had initially criticized so-called patent trolls, they walked back that characterization of Anthropic, Massimo.
They explored the merits of both companies' claims around patents and product development.
The hosts analyzed how this legal battle could impact innovation if certain patents were deemed valid or invalid.
Both Apple and Masimo appeared committed to seeing this through the courts, leading Marco to question whether the patent system is properly incentivizing progress.
In the end, the nerdy trio seemed unsure of who might come out on top, but they look forward to a resolution one way or another.
Oh my gosh.
Are these the same things?
How are these the same question and the same inputs and these answers?
One is the man gets stuck, and the other, it knows the guest's name, it knows the topic, it's got a sort of a chronology to this.
The only thing that wasn't perfect was the name, which you can see in the key moments, it actually does get right.
So: podcast hosts John, Marco, and a guest, who was really a third host, got off track in a previous episode discussing the patent dispute between Apple and Italian electronics firm Masimo.
I think it's also still American, but they clarified they were not directly calling Masimo a patent troll, just complaining about abusive patent practices.
You can see down at the bottom, it says, "Looking ahead, the hosts discussed rumors of potential changes to Apple Watch's band attachment to improve design usability," and other things about the capabilities of the device.
And it's Apple, and it's really quite good.
Both the key moments and the TLDR are good.
And look at the difference.
We kind of talked the LLM up, like, "Hey, you know what?
You're an expert.
You're an expert in blank," whatever this podcast is about, "you're an expert in that.
And you're a journalist, and I need you to help me."
Okay?
And then we give it more context of what the podcast is about, based on the title and those things.
I mean, I guess we could even give it show notes, although sometimes that would be really overdoing it and might mislead it.
I think this is a good balance.
We just give it a little more background information.
Look at the difference.
Again, the man tries an alternate route and the mayor gets him out.
Well, too bad the vehicle got stuck.
Here's a great summary of a two and a half hour conversation.
These are really different.
So, hopefully you're feeling motivated to take on prompt engineering and treat it a little more formally than "let me just chat at a thing that sometimes responds well and sometimes doesn't."
So, go check out that video from Patrick.
It's 14 minutes.
It'll give you some real concrete ideas on how we can do this.
Hopefully you are feeling a little motivated and inspired to go build the second version, the really good version of that, into our X-Ray Podcast app.
|
|
show
|
5:11 |
Let's go ahead and add this AI summarize feature to our app.
Now, we've already done most of the work or worked through almost an identical example previously with our transcripts and those sorts of things.
So we're gonna apply the new ideas, but in a really similar framework.
Let's go and just pick one of these.
Let's suppose we wanna follow Fresh Air.
And we go over and check it out and we're like, oh, what do Americans really think about race?
Okay, this is interesting.
And I would maybe listen; maybe it's interesting.
But I don't mean these show notes, though.
I don't know how Fresh Air gets away with calling themselves a professional organization and having two sentences, but you know, whatever.
This is not enough.
I need a summary.
What are the key things they talked about?
Is it relevant or no?
So I'm gonna click on create summary.
Wait a minute.
Remember, the link just has #summarize as its href, a placeholder so that I could wire something up later.
We could create the transcript, but we're not gonna.
We just wanna get this create summary thing going.
All right, well, we need to add the UI elements here.
Let's jump back into PyCharm.
We're in a new project here, 03 feature summary, and I've got a new virtual environment active, pip installed all the things.
So it's good to go.
Now, those three buttons we were looking at are right here.
Yes, there's four things in the markup, but only three are shown at any given moment, and it'll be fewer in a minute.
So this is the button that says create the transcript, and it goes to ai/start/transcribe with the podcast ID and the episode number.
And it does that with hx- attributes, which is HTMX, right?
So this is only shown if there's no transcript.
And if there is a transcript, it just says view it, right?
So this section, those two kind of play the role of either being able to create or navigate to that transcript.
Now we wanna do something similar down here, although even simpler still.
So let's go down here.
And right now we just have this hash summarize, which we saw does zip, nothing.
So we're going to go and add the same type of deal.
We have somebody clicks on it, we wanna run a thing.
And where are we gonna go?
Well, it turns out, let's go look over the AI views here.
The route is ai/start/{action}, not transcribe specifically, but whatever you wanna do.
Earlier we asked it to transcribe this podcast, that episode, but the action here is a job action.
Let's navigate over here.
And you can see it's transcribe, summarize, or chat.
Sweet.
So in fact, what we should be able to do is take this, change this from transcribe to summarize, and get rid of that extra stray character down there.
And this should do it.
This should actually come over here and kick off this job.
Remember, what is it gonna do?
It's going to return this job is running, it's gonna check for the job to be completed.
And in the view model, if the action comes in as summarize, it'll say it's summarizing and view summary rather than transcribing and view transcript.
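Concretely, the change amounts to something like this; the URL template and target are assumptions based on the narration, not the repo's exact markup.

```html
<!-- Same HTMX job-start pattern as the transcribe button,
     with the action swapped to summarize. -->
<button hx-get="/ai/start/summarize/${podcast_id}/${episode.episode_number}"
        hx-target="#summary-section">
    Create summary
</button>
```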
Now it's not going to do this yet.
It's not gonna do it 'cause we haven't written the code to summarize anything, but it will flow through the system theoretically.
Let's see how that works.
We go back here and refresh.
And now what happens if we click create summary?
Let me make this a little smaller down here so you can see.
Oh, before that, let's go ahead and set up the search.
Don't forget to set up the search in the new one.
Remember just over here, run this.
Skipping ahead, try again.
All right, we got it running again.
And let's see.
What we're gonna do is click create summary.
And down here you should see that the job is starting.
This UI should be replaced with hang tight.
We're summarizing it.
Let's see how it goes.
Summarizing, hang tight.
Awesome.
Guess what?
It says it's done.
No, no, it's not.
It claims the summary is done and all you gotta do is reload it.
And it's not done.
So when it's completed, what we're gonna do is before the show notes, we're gonna show the AI summary, right?
And this will go away.
But since we haven't created it, it's not stored in the database.
When we reloaded it, it looked and said, no, there is no summary here.
So keep the page like this.
So our next action will be to go and actually write the code that uses the Assembly AI SDK to make the summary happen and set it on the record in the database.
Then when we reload, we'll get the nice cool view of it.
But look how easy that was.
We already wrote this section here to how to start an arbitrary job, any of them that is a valid job action.
And then we already got the running and the checking for complete.
It's actually cool.
I told you we'd be able to leverage a ton of stuff we did before and sure enough, we can.
|
|
show
|
4:11 |
Now that we have the whole UI interaction working perfectly, all that's left is, well, you know, to use AI like magic to summarize and pull out key moments from an hour of audio.
So let's do that with Assembly AI now, huh?
All right, so here's what happens when we say start job.
And remember, it just started and immediately said, it's done because there was no actual work to do.
Let's navigate over to this create job here; that's in this background service, and it starts the job. We gotta scroll down a little bit into this timer piece that goes around and asks, which action was it?
It's summarized, so we're gonna go over here.
And I've sketched out a couple of things that I want to do in blocks.
That way I don't overwhelm you with doing a whole bunch of stuff at once, 'cause this will take a few minutes.
So the first question we need to ask here is, do we already have everything that we need?
And how does that work?
Well, maybe we've already, just like the transcripts, summarized this, and we don't want to send it over to Assembly AI again.
Remember, that costs credits or money, and it's also just slow compared to the one millisecond it takes to return it from the database.
So anytime you've already done the work, just don't do the work again if you can.
And the way we're gonna do that, we'll say db transcript.
Just like before in the transcribed section, it's real similar.
We're just gonna go to the database and get it.
So we'll say full transcript for episode, and we're passing in the podcast ID and the episode number, and PyCharm automatically auto-completed all the parameters.
Very nice, PyCharm.
But again, this is asynchronous because it talks to the database, so we've got to await it.
Now, unlike before, having the transcript is not enough.
We also have to have the transcript plus the summary, and it's stored on the transcript.
So we can say if db transcript and db transcript.summary, you see there's a summary TLDR and bullets.
It doesn't matter which one we look for.
You can see they're both either a string or none.
Just checking one should be enough.
So if we already have some kind of summary, you know what, we're good to go.
We'll just return db transcript.
The worker doesn't use this, but it might be useful somewhere else.
For example, in case something could use the return value, we'll go ahead and return it.
But most importantly, we just don't do the work.
We'll just tell the job, hey, you know what job, you're done.
Kind of like it's already been doing.
Also, there's another situation where we have no transcript whatsoever.
So if you look at the UI, there was the make a transcript, and then another create the summary button.
So they could create the summary even before the transcript existed.
And we could either say, well, first you have to click the transcript button, wait, and then click the summary button later.
But why don't we just go ahead and make one for them?
So if there's no transcript whatsoever, we can say db transcript equals, and we just scroll up here.
This is the other worker function, worker transcribe episode.
Okay, that was being used automatically, but let's go ahead and just reuse this here.
We'll await worker transcribe episode.
PyCharm, are you going to do it for us?
Not this time.
Oh, maybe it would.
It tried.
So now this is definitely going to work.
We're going to have our database transcript, but it still won't have the summary information, right?
So at least let's go ahead and put something like this.
If not db transcript, we'll do this, it won't print out.
There we go.
If there's no transcript yet, we're going to go ahead and make one, and then we're going to summarize it.
So if you click the create summary button and there's no transcript, sure, it'll take a little bit longer, but what's the alternative?
Make them click two buttons over long periods of time.
So we're just going to go ahead and do it like this.
I think that's a decent start.
So step one, if we have what we need, just give it back.
If not, we're going to need the transcript because the way this works is we're going to send the transcript text over to Lemur and ask it to summarize it, not the audio directly.
|
|
show
|
1:30 |
Now, either we said there's no more work to do because we've already done it, or at least we have the transcript created.
We're gonna need to give it a prompt, a detailed prompt as you saw in the prompt engineering section, saying here's the name and here's what it's about.
In order to do that, we're gonna need more than a string that just says the title or a short version of the title and the number of the episode.
We need details.
So we need to actually get the podcast and the episode.
That's pretty straightforward.
So we'll say podcast equals podcast_service.podcast_by_id, 'cause that's what we have.
And these are all database sort of things, so they're async.
And then we'll get the episode with episode by number, passing the episode number.
And we have these two here.
We'll say if not podcast or not episode like this.
So we'll just make sure that we have both of them here.
Oh, and for this one, I was a little quick there.
This takes the podcast ID of course, 'cause there are many podcasts that could have episode 200, right?
All right, there we go.
So we get our podcast and we get our episode.
And if either of them doesn't exist, then that's a problem.
But once we do, we'll have the information we need to create our prompt.
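Continuing the sketch, step two looks something like this; the service function names are assumptions reconstructed from the narration.

```python
    # Step 2: Pull the records that will feed the prompt.
    podcast = await podcast_service.podcast_by_id(podcast_id)
    episode = await podcast_service.episode_by_number(podcast_id, episode_number)
    if not podcast or not episode:
        raise Exception(f'Missing podcast or episode: {podcast_id} #{episode_number}')
```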
|
|
show
|
7:31 |
Now with the podcast and the episode details, we're able to write our prompt.
So let's go over here and I'll write it in sort of a naive way, then we'll make it a little more useful.
So prompt, remember, we don't want to just say, Please summarize this.
We saw how bad that did.
Remember, the man got stuck and the mayor pulled him out.
No, let's go a little bit more detailed here.
You are an expert journalist.
I need you to read the transcript and summarize it for me.
You know, by the way, just hitting enter here in the middle of the string, PyCharm will wrap it in parentheses so you don't have to do line continuations and carry it on and so on.
Use the style of a tech reporter at Ars Technica.
Great online resource.
This comes from the podcast.
Let's convert that to single quotes real quick so I can put double quotes into our prompt, podcast.title.
And it focuses on the subtitle here, period.
It's really important that all these lines have spaces on the end because this really just becomes one single string.
This is like having string concatenation in Python.
So if you don't have the spaces, things are going to get weird.
All right, the title is this.
And finally, so all of this is setting the stage, right?
You're an expert.
I need you to read it and summarize it.
You use the style of typical reporters at Ars Technica.
Here's the podcast information.
Here's the subtitle, the episode.
Oh, I realize something happened with the f-string there; let me fix that.
Now comes the actual details of how we want the command.
So we'll say, ""Your response.
Your response should be a TLR, TLDR summary of around five to eight sentences.
Excellent.
So this is the prompt that we want to send over to Lemur.
There are two little details here.
We want to have another prompt where the response is, you know, the key moments.
And the only difference between this one and the key moments is this final string here.
So let's get rid of this here.
And we're going to take this and make it its own little variable.
So we'll say, ""Extract a variable.
Or we'll just cut it out and put it here because somehow that's hard.
Let's call this ""Prompt base"" because this is going to be reused here.
So we'll say, ""Prompt base plus this bit of text.
Like that.
It says, ""You don't need an F string.
And you know what?
They are right.
They are in fact right.
So this is the prompt. Later we'll call this, let's call this the summary prompt.
Actually, how about the TLDR prompt and the moments prompt.
For the moments, we'll say, "Your response should be in the form of 10 bullet points."
So we're going to pass these two strings on.
The other little weird edge case here is that, I've said this before, but if you go and actually look at how we're parsing all these RSS feeds, RSS feeds are a standard.
There are places you can go and say, Does this RSS feed for a podcast match the expected standard or no?
And yet, somehow, even major places like NPR ship invalid or missing or somehow messed-up RSS feeds.
So, for example, sometimes there's a subtitle, sometimes there's not.
Sometimes there's an image with a URL.
Other times there's some complex XML representation of that.
Anyway, what is all that about?
That means this may not have a subtitle.
And if it doesn't have a subtitle, the prompt's going to look like "and it focuses on" followed by an empty double quote.
That's going to mess things up.
That's not good.
So let's change this in two ways.
Let's make sure that this can be... put this space over here, even though it's going to make it look a little weird, because we want this to end right there.
I guess we can put it in what we're going to generate.
But what we need to do is make this some sort of expression right here.
So let's take this out and call it subtitle_text.
It equals this, but only if podcast.subtitle exists.
Else, what is it going to be?
A period.
Again, that's where we want to make sure this goes like this: put the period right there on the end, with a space, and then we need a space right there and right there.
Again, make sure these don't get mushed together.
And the warning here is saying you can't just put two variables together in the string, so we've got to use the plus symbol on this particular one.
All right.
That's a lot of work, right, to form up this thing.
But we're ready to send this off to Lemur.
Everything is just fine.
Let's just do, really quick, a print of the TLDR prompt and the moments prompt, just as a way of making sure things work.
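Continuing the sketch, the prompt-building step comes out roughly like this; the wording is reconstructed from the narration.

```python
    # Step 3: Build the two prompts. Guard against feeds with no subtitle
    # so the prompt never ends with a dangling, empty quote.
    if podcast.subtitle:
        subtitle_text = f'and it focuses on "{podcast.subtitle}". '
    else:
        subtitle_text = '. '

    prompt_base = (
        'You are an expert journalist. I need you to read the transcript '
        'and summarize it for me. Use the style of a tech reporter at Ars Technica. '
        f'This comes from a podcast entitled "{podcast.title}" '
        + subtitle_text +
        f'The title of the episode is "{episode.title}". '
    )
    tldr_prompt = prompt_base + 'Your response should be a TLDR summary of around 5 to 8 sentences.'
    moments_prompt = prompt_base + 'Your response should be in the form of 10 bullet points.'

    print(tldr_prompt)
    print(moments_prompt)
```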
All right.
Let's go check it out.
Let's go over here and discover.
Let's go pick on fresh air.
Okay.
We can do this Best of Emma Stone and the birth of... whatever that is, psychedelic something.
So remember the way this is going to work.
Let's clear... there we go.
We're going to try to get the summary of this.
And because the transcript doesn't exist, we should go ahead and create it.
We just won't create the summary, so let's test that behavior.
So look, it's kicking it off.
It says no... awesome... no transcript yet.
Let's make one for fresh air.
And then you can see we're transcribing this.
It's running, it's running.
Oh, and we're searching while it's happening as well.
So hang tight.
This is a pretty short episode.
It should come back quickly.
Hey, hey, look at that.
Summary is ready.
That's not technically true because, remember, we haven't actually created the summary still, but we should see that down here we have a transcript.
All right.
So it looks like everything is hanging together so far, and more importantly, the reason we did that is we wanted to see what the prompt looked like.
No searching right now, please.
All right.
You are an expert journalist.
I need you to read the transcript and summarize it for me.
Use the style of a tech reporter.
This comes from a podcast entitled "Fresh Air."
The title of the episode is "Best of Emma Stone and the Birth of Psychedelic Science."
Notice right here, this period.
That comes, well, the period and the space.
That's because "Fresh Air" doesn't have a subtitle or any form of sub-description for their podcast, at least not the way we parsed it.
Your response should be a TLDR summary of around 5 to 8 sentences.
And, again, same thing down to here, and now it should be in the form of 10 bullet points.
Excellent.
It looks like our prompt is ready to roll.
|
|
show
|
2:56 |
Next up, step four.
And in step four, what we're gonna do is we're gonna create the transcript text.
You may be thinking, Michael, are you confused?
We already have the DB transcript, right?
We have this DB transcript here.
And so you should be able to just use the transcript text.
Well, let's go over and go over to our data section to our transcript and see what's in here.
We've got our created date, updated date, episode, podcast, summary.
This is what we're trying to fill out in this section.
Here's a bunch of details.
And then words, which is a list of these.
This is what we're storing in the database.
We have a list of nested objects, which has text, start time, and confidence.
What we need is the sentence as one string: word one, two, three, four, just as pure text to send to the AI, right?
So that's what we're gonna do right here.
And it's really simple.
And this is gonna be a string, right?
And a really nice way that we can do this is we can go to a string and say .join.
And if you're not familiar with this, let's go look real quick here.
That's a little short, isn't it?
All right, let's say I have this list of words and I wanna turn it into a single string.
What I can do, say we wanna put dashes in between, is just say '-'.join(words).
And what comes out is the new thing created with that.
So if we just put space here, it'll turn that back into a sentence, right?
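Here's that demo as a quick REPL session:

```python
>>> words = ['geese', 'of', 'Canada']
>>> '-'.join(words)
'geese-of-Canada'
>>> ' '.join(words)
'geese of Canada'
```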
So that's what we're gonna do.
But as part of that step, we need to actually turn our rich word object that has confidence and start time into just plain text.
So we can use a generator comprehension for that.
You may have heard me riff on this before.
I think generators and list comprehensions and set comprehensions and so on are awesome, but they're written in the wrong order because the tools cannot help you with them.
So I would say something like: something for word in db_transcript.words.
What goes in the something part?
I don't know yet, but if we write the for part first and then come back and say dot, then we get autocomplete.
Whereas, if I started by typing w. up front, it's like, well, good luck with that.
Although PyCharm, insanely, was going to help us there.
So we wanna say word.text for word.
So this turns this sequence of rich objects into a sequence of plain words.
And then we join them together in the way we just discussed to put spaces between them and off it goes.
All right, so that's gonna be our transcript text.
That's the final thing that we've needed in order to send this off to Lemur.
We need the prompt, we need the transcript text, and a couple of other choices we make as we ask it to do the summarizing.
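Continuing the sketch, step four is a one-liner; the word objects' field name comes from the data model we just looked at.

```python
    # Step 4: Flatten the rich word objects (text, start time, confidence)
    # back into plain text; Lemur only needs the words themselves.
    transcript_text: str = ' '.join(word.text for word in db_transcript.words)
```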
|
|
show
|
9:09 |
All right, that was a lot of buildup to get everything ready to go, but we're now finally ready to use the magic of AI.
Up at the top, I've added an import of assemblyai's Lemur, and let's also get LemurTaskResponse.
We're gonna need those two things here in a second.
So down here, first we're gonna create a client.
So I'll say lemur client, let's call it that to be super explicit.
That's the assemblyai Lemur class, and now we just call a function on it.
We'll say response equals lemur_client.task, and this is gonna be a LemurTaskResponse; let's add that type annotation just to make it really clear what this is, so that we get autocomplete when we click on it, right?
See, we get a response text and a response ID, couple of extra things that come back from it, all right?
But not a whole lot.
Basically, we get the response as text.
All right, now what goes in here?
I don't need that anymore.
What goes in here?
The prompt, and remember, we have two prompts.
Let's do the tldr prompt first.
I'll say the final model.
Now notice this is a lemur model.
And notice this is a string enumeration as we've seen before.
So we have a basic model.
We also have Claude.
We have a Mistral 7B and default.
The one that I was actually using for that really cool example I showed you was basic, so let's just keep going with that.
You can play with this.
You can say the max output size is 2000 as a limit here.
If you work with LLMs, you may be familiar with temperature.
Temperature is basically an expression of how creative you want this to be, versus how strictly it should go straight down the narrow path of what you've asked it to do.
I'm gonna give it a little bit of leeway, but not too much.
So we'll say 0.25.
The final thing we want is input text is gonna be this transcript that we've created.
So notice right here, these are three additional ways that you can play with what lemur is doing to see what kind of results you can get.
You can try more creativity.
Is it making up stuff?
Maybe less creativity.
Do you wanna limit how much it can return to make it more concentrated?
I think we're already doing that pretty well with the last bit of our prompt here, but nonetheless, you can change this as well.
And then also you can play with the model.
There's at least three options there.
Pretty cool.
This is what we're gonna do.
Now, let's go ahead and run this and see what the response is and when it comes back, what we're gonna do is I'm just gonna go to the database transcript.
And on here, we have a TLDR summary, and let's just say the response.response goes on there.
We're not quite finished.
There's one more thing we're gonna have to do to clean up the response, but it'll be obvious in a minute.
And let's just redo this again with the different prompt and the bullets and the moments prompt.
Okay, tighten that up a little just so it doesn't show us errors.
And then when this is done, we've done both of these, we'll just await saving that back to the database.
Right?
Active record style there.
Couple of comments.
Let's first put some print statements here.
Of course, this would be logging in a real app, but trying to keep it somewhat simple.
So we'll say starting, say summarizing with Lemur TLDR mode, and then let's go ahead and also say this here.
Key moments.
Just so when it's running, we can see what is happening here.
And we also have our timings.
We'll see how long it took.
It says transcript, but we'll put summarize right here.
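Putting the pieces we just walked through into one place, here's a hedged sketch of the call; it mirrors the assemblyai SDK as described here, so double-check the parameter names against the version you install.

```python
    # Step 5: Send both prompts to Lemur.
    import assemblyai as aai

    lemur_client = aai.Lemur()

    print('Summarizing with Lemur: TLDR mode.')
    tldr_response = lemur_client.task(          # returns a LemurTaskResponse
        tldr_prompt,
        final_model=aai.LemurModel.basic,       # others: default, mistral7b, Claude
        max_output_size=2000,                   # cap the response length
        temperature=0.25,                       # a little creativity, not too much
        input_text=transcript_text,
    )
    db_transcript.summary_tldr = tldr_response.response

    print('Summarizing with Lemur: key moments mode.')
    moments_response = lemur_client.task(
        moments_prompt,
        final_model=aai.LemurModel.basic,
        max_output_size=2000,
        temperature=0.25,
        input_text=transcript_text,
    )
    db_transcript.summary_bullets = moments_response.response

    await db_transcript.save()  # active record style
```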
All right, now it's time for the big moment.
Let's go and run it.
Let's go and see if we can learn more about this Emma Stone episode.
Sure enough, there it is.
Refresh just to see everything's good.
So we already have the transcript, so that should be set, but we don't have the summary.
And let's click that and see what we get.
Starting.
Summarizing Lemur TLDR mode.
Key moments mode.
This is real time.
I'm not speeding anything up.
I'll just let it go at real time so you can see how long this takes.
23 seconds to do both of those things, which is excellent.
And you see this is almost an hour's worth of text.
So that's pretty excellent.
Now, if we refresh, the website actually already knows how to render this.
As you've seen it a few places when it was already in the database.
Check out what we got.
First thing to notice: it starts with "Here's a five sentence summary of," restating the ask back at us.
We don't wanna show that to the user.
That's some sort of prompt leakage.
The same thing down here: "Here's a 10 bullet point..."
So let's go on from here, and we'll deal with that first sentence of both sections later.
Emma Stone discusses her Oscar nominated role in the film, Poor Things, where she plays a woman brought back to life by a mad scientist with an infant's brain.
She talks about her character's journey, developing adult cognition and sexuality, et cetera, et cetera.
Let's see about here.
Nominated for Poor Things.
In her film, her character, Bella, struggles to understand these things.
Pretty interesting.
Stone and the host discuss some of the provocative representations of female sexuality and independence.
Listeners get a fascinating look at Stone's breakthrough role in the film and her ongoing journey with mental health issues.
Excellent, right?
Look at this.
This is really cool.
Now, how much better is that than Emma Stone is nominated for Poor Things?
She spoke to Terry Gross about her relationship with anxiety and somebody reviews the thing we probably don't care about.
Also, this person talks about his book.
I really like how this came out.
Now, this bit and this bit, we're gonna want to come back and deal with, but yeah, this works really, really well.
It took about 11 seconds, because it was 23 for both and they take about equal time.
So let's look at one real quick thing before we wrap up this part.
You may have noticed this comment.
By the way, lemur.task is blocking.
Hmm, that doesn't sound good.
That 11 seconds wasn't that long, but you know what?
FastAPI could not do any more.
Like, if we go back here, you'll notice something in the search engine is kind of cluttering up our history, but notice this.
This is 23 seconds and this polling, is it done yet, is it done yet, is it done yet?
That happens every two or three seconds.
Remember on the transcript, it goes, is it done, is it done, is it done?
Here, no, it couldn't do that.
Why?
Because this blocking call is blocking up the entire asyncio event loop.
Mm-hmm, that's not our deal.
Our website has lost a tremendous amount of scalability when it can't process any other requests for basically 11 seconds at a time.
So we should call the async version, yes?
And that's what we did with the transcript.
And we used our run future at the bottom, just right down there, and we awaited that.
That was cool.
The SDK does not have anything other than that synchronous version, okay?
I've been working with the Assembly AI folks and said, hey, you know, you really should have an async version, because this is important.
And they're gonna add a task_async, most likely; not a 100% guarantee, but quite likely there will be a task_async. In that case, you can call it just like this, get the future back, and await run_future, and that'll keep FastAPI and anything doing asyncio cruising right along, 'cause the work will be running somewhere else outside of this, and you're just awaiting its completion.
I'll just put a note here; I can't leave it live this way 'cause it's not working code yet.
I'll come back and fix this code.
I will change this code if they add an async version.
Here we go.
So I'll put a little example here.
Please use future = lemur_client.task_async(...), then response = await run_future(future), or however they name it, but there should be some pattern like this that you can use eventually.
There's other things we could do to run this in a non-blocking way, but hopefully the folks over at Assembly AI are able to get that into their SDK and make things work just a little bit better for us.
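One of those other options, in the meantime, is to push the synchronous SDK call onto a worker thread so the asyncio event loop stays free. This is a sketch of that general pattern, not the course's run_future helper itself; the hypothetical task_async from above would make it unnecessary.

```python
import asyncio

async def run_blocking(func, *args, **kwargs):
    loop = asyncio.get_running_loop()
    # None -> default ThreadPoolExecutor; func runs off the event loop,
    # so other requests keep getting served while we wait.
    return await loop.run_in_executor(None, lambda: func(*args, **kwargs))

# Usage:
# response = await run_blocking(lemur_client.task, tldr_prompt,
#                               input_text=transcript_text)
```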
Here we go though.
Besides that little thing I wanna put on your radar to make sure you're aware of what's going on there, 'cause it's not entirely obvious.
That's it.
Come up with our prompt.
We send it over here.
Perfect.
|
|
show
|
12:11 |
Well, we have our data being generated by the Lemur LLM, and it's being added here just like we hoped in our summary.
But we did see that there's this confirmation statement at the beginning of both AI responses.
That's not gonna come across great for our product.
I mean, people know it's an AI summary, that's kind of the point, but you don't wanna leak the prompt engineering back.
So what to do about this?
Well, what we're gonna do is we're going to go and find a way to replace this.
And because this could vary, right?
It could say, here's a seven sentence.
I said the response could be five to eight sentences.
Maybe it'll say it's eight, who knows?
And this could vary as the LLM changes over time.
So I wanna use a regular expression.
Regular expressions can be super tricky.
If you're not familiar with them, honestly, you can go to ChatGPT or one of the local models like LM Studio and just say, I have this text, I need to match it with a regular expression.
They're actually really good at that.
So what we're gonna do is we're gonna just remove that section and we're gonna remove that as well there.
All right, here we are in that code that we just wrote.
That's not the right one.
Here we are, in the code we just wrote: worker summarize episode.
Remember down here at the end, we have this section and maybe I'll make it its own little separate thing.
What number are we on just to be consistent?
Six, step six.
So I put these examples here like this.
Now I'll just create the regular expression here and then maybe move it up so we can compile it and just use it once.
So we're gonna import RE, which I'll put at the top in a second.
And we'll say RE compile.
And here's where things get interesting.
So what I'm gonna do is match "Here's a" and then some sort of sentence count, like this.
And we'll go on to the end, down to the colon, matching the first colon.
So what we can say is that the text starts with that.
It says "Here's a," and now, this number could change.
We can say it's any digit.
And the way that you do that is you say [0-9] in brackets, so any character between zero and nine.
And there could be potentially more of them if it's a 10, which we're gonna see in just a second below.
And then we want a space and "sentence," and then we can have arbitrary text.
The way we say that is dot, any one character, and plus, one or more of them, onward.
All right, so we'll call this regex equals that.
And then let's just do a replacement here like that.
We'll say, okay, this thing is actually reg X.
Look at that.
PyCharm is saying, you know what?
You probably wanna substitute that, don't you?
regex.sub.
And in fact, that's very, very close.
So I'm gonna let it write that.
And what we want here is the summary bullets like this.
Incredible actually.
So the API here is a little bit funky.
We're gonna take this pattern, that's the regular expression, and we want to substitute the new thing into that old string.
I kind of feel like the argument order is backwards, but anyway, that's fine.
So it's summary_tldr = regex.sub applied to summary_tldr, actually.
We'll do that.
And let's do another here.
This is going to be, here's a 10 bullet point.
Bullet point, summary, right?
That should work.
Of course, we could make this more general, but I kind of want it to be really restrictive and not possibly hit the wrong thing.
Let's see how this works, and then I'll move that code a little bit around.
I wanted to keep it kind of local while we were looking at it.
And in fact, so we don't have to keep running over and over.
Let's do one more thing with this.
Let's move this code over into the view model that shows that.
So that's the episode view.
So we're gonna go over here to the view models for the podcast episode.
And let's just say, if self has the AI summary, this way we can just do it on the response here.
And the way this thing works, it stores both of these.
So let's see: this one is the TLDR, and this one will be the bullets.
All right, so that'll just do it on the already generated text.
Let's see how we did.
Oh, it looks like that didn't take.
Now, I think it's just that I'm being overly restrictive and saying it has to start with this.
Let's try that real quick.
And sure enough, there we go.
Look, that chunk is gone.
That restatement is gone.
I imagine there's like some kind of spacer or something left at the front here.
We could try .strip.
I just wanted to see whether that regex alone would have been fine.
Let's see.
Yeah, sure enough.
There was like some kind of space there at the beginning and I suppose it doesn't really hurt to do a strip on these as well.
Okay, so it looks like that regular expression is working and I'm gonna take this away.
Just wanted to make sure that, you know, we don't have to go wait a minute at a time to do this test.
All right.
So we'll go back to the generation section here and this, I'll just do strip on each of these.
In fact, let's do it where we originally set it.
It's all about the data cleaning.
Okay.
Yeah, these look good.
Let's go ahead and move this up here and we'll call this regex_tldr.
And this will be regex_moments.
And let's put this up at the top.
Here we go.
So why am I putting this up at the top here?
Well, because when you compile a regular expression, this takes a little bit of memory, takes a little bit of time.
And honestly, we don't need to do this each time.
The regular expression doesn't change.
So, you know, tldr, not misspelled.
Everything is fine here.
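Pulled together, the cleanup ends up looking roughly like this; the exact patterns in the course code may differ slightly, and the function names here are illustrative.

```python
import re

# Compiled once at module level: compiling costs a little time and memory,
# and the patterns never change.
regex_tldr = re.compile(r"Here's a [0-9]+ sentence.+?:")        # e.g. "Here's a 5 sentence summary ...:"
regex_moments = re.compile(r"Here's a [0-9]+ bullet point.+?:")  # e.g. "Here's a 10 bullet point summary ...:"

def clean_tldr(text: str) -> str:
    # Drop the LLM's restatement line plus any leading/trailing whitespace.
    return regex_tldr.sub('', text).strip()

def clean_moments(text: str) -> str:
    return regex_moments.sub('', text).strip()
```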
So let's just go and take it through its paces completely one time here.
Let's go summarize an episode.
How about Inside Azure Data Centers with Mark Russinovich?
We'll go and first create the transcript when we click this, and then we'll create the summary.
And hopefully everything looks right without those leaky LLM restatements there.
So hang tight.
I'll zoom this ahead.
When it's done, we'll have a look.
Well, here it is.
It's all finished.
Let's go and see how it came out.
Perfect.
Well, it's almost perfect.
It didn't get the host right now, did it?
I mean, maybe Peter Bright's awesome, and I should be flattered that it replaced me with Peter Bright, but I have no idea.
So anyway, let's see what we got here.
"In this episode of Talk Python to Me, the host discusses his interview with Microsoft Azure CTO Mark Russinovich from inside an Azure data center."
Now this is not good.
That's not great, but let's look at the moments here.
Microsoft Azure CTO Mark Russinovich joins the podcast to give a rare look inside Azure's massive global data center operations.
Yep, absolutely, that's good.
Azure originated in 2006 as a small incubating team within Microsoft and launched publicly in 2008 before its full commercial launch in 2010 when Russinovich joined.
Also, I believe that's correct.
In its early years, Azure ran on only a few thousand servers across two regions, but has since exploded with millions of servers, now spanning 60 worldwide region data centers.
Yep.
A new AI chip called Azure Cobalt is optimized to accelerate machine learning, and Russinovich now leverages Python and the AI assistant Copilot for his own research on the massive Azure platform.
The key moments are perfect.
TLDR, I'm not so sure about this.
So let's try another one.
We'll see which reporter I get substituted with.
There, let's look at a few more here.
How about celebrating JupyterLab?
Let's do that.
Hey, it's done.
Let's see how we did.
First of all, will the restatements be gone?
Yes, they are.
Look at that.
So what is the summary?
"On a recent Talk Python to Me podcast, developers of the Jupyter Project..."
Now, there were three or four folks from the Jupyter Project who came on, and we talked about JupyterLab 4 being released, as well as Jupyter 7, I believe, and a couple of other things.
So let's see.
JupyterLab 4.1 introduces features like code history restoration and markdown header collapsing.
Upcoming iterations with AI tools aimed to autocomplete code and provide hints to the Python coders.
All right, that looks pretty good.
What about the moments?
Jeremy, Sylvan, and Frederick, I believe that is exactly right, and the versions are right, perfect.
Darian has worked on Jupyter for a long time.
At JupyterCon, Darian and Martha introduced JupyterLab 4.
That's true, and very, very neat.
And we talked a lot about their presentation there, and then some of the things they discussed.
See, real-time collaboration is being tackled.
The tools like Google Docs show the promise for co-editing code.
One of the examples we discussed in the episode was that JupyterLab is really just a bunch of plugins or extensions made on top of Jupyter to make it what it is, and it said, "Hey, you know, another example of something you could do with that is this thing called JupyterCAD for 3D modeling."
Yeah, that's true.
JupyterLite and desktop apps.
Yeah, we talked about how you could use JupyterLite and not have to have Python or install anything or run a server.
Yeah, I think this is solid right here.
I feel like there might be something that was a chunk of the conversation that didn't get captured, but we're trying to have it limit that down, right?
So yeah, I'm happy with this.
It's better than what I put, right?
Here you can see the summary, the show notes that I gave it, and the links from the show.
So yeah, we got a lot more information about this right here.
Excellent.
But the most important part is not just that this is right; we already saw this before.
What we saw was the LLM restatement, like, "Hey, I understood you.
Now here's your answer."
That part is gone because we were able to apply our regular expressions right here.
"Here is a [some number] sentence summary," or "here are [some number] bullet points," et cetera.
All right, well, there it is.
I think we have this feature all done.
|
|
show
|
2:23 |
There were a lot of moving parts in this whole example.
So let's just simplify it back down to the essence.
Remember we talked about prompt engineering and how you can get dramatically different results if you set the stage, provide a little extra, just a little extra info like what is this podcast about?
And you kind of sweet talk the LLM.
You're an expert, you're an expert journalist.
I wanna talk to you.
Oh, I'm an expert, excellent.
Let's answer it like that.
All right, you're an expert journalist.
I need you to read the transcript and summarize it.
So orange here is the action that we're asking it to do and the style that we're asking it to use.
And then we say this transcript comes from this podcast with a title 'cause we already know that for sure out of the database, that could probably help.
We can give it the subtitle text, which was, and the podcast covers things like, and we give it the subtitle there.
Sometimes it doesn't exist.
So we got to kind of put that in a variable.
And then the title of the episode is whatever it is.
And then we give it how the response should be.
Your response should be a TL;DR summary of around five to eight sentences.
The way I was seeing the results, it was always five sentences; maybe you just wanna pick how many sentences you want it to be and just say that.
Somewhere in the middle, say six sentences, I don't know.
All right, we do this and then we need to somehow send that and the other data off to Assembly AI.
So we say lemur.task, give it this context, which we just saw.
And we say the model to use is basic, the max output size is 2000, and the variability, the creativity, is 25%, I suppose.
And then we give it the text, which is the full transcript text.
Remember we had to generate that, recompute that basically out of the database as an expression, super easy though.
And then we just get the response back.
Final thing: remember, task is synchronous and blocking.
Look for a task async variant if that shows up and prefer to use that; in the meantime, use the run future helper that we wrote together.
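If you want to see that in code rather than on a slide, here's a condensed sketch. The podcast and episode objects and the transcript_text property follow the course's conventions, but treat the exact names and SDK details as illustrative:

```python
import assemblyai as aai


async def summarize_episode(podcast, episode) -> str:
    # Prompt engineering: set the stage and hand over the metadata we
    # already have in the database (titles and the subtitle, if present).
    prompt = (f"You are an expert journalist and you will summarize a "
              f"podcast transcript. This transcript comes from the podcast "
              f"'{podcast.title}'. "
              f"The podcast covers things like: {podcast.subtitle or ''}. "
              f"The title of the episode is '{episode.title}'. "
              f"Your response should be a TL;DR summary of around "
              f"5 to 8 sentences.")

    lemur = aai.Lemur()
    # task() is synchronous and blocking; in the web app the course wraps
    # this call with its run_future helper so the event loop isn't stalled.
    response = lemur.task(
        prompt,
        final_model=aai.LemurModel.basic,  # the "basic" model
        max_output_size=2000,              # cap on the answer length
        temperature=0.25,                  # 25% "creativity"
        input_text=episode.transcript_text)  # rebuilt from the database

    return response.response.strip()
```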
All right, here it is.
Now we've got our summary text and off you go.
Do watch for that LLM restatement text to appear in there; we saw that we can use regular expressions, as tricky as they can be, to really nicely go and find and remove that chunk from the summary.
|
|
|
42:50 |
|
show
|
1:45 |
This is a really exciting final chapter where we're gonna build our last feature, the ability to use an LLM to chat with the episode.
Think about that for a minute.
There's been some expert or maybe a couple of experts having a really deep and fantastic conversation, often about a single topic.
And maybe the host asked many questions, but there's other things you wanna know or there's aspects of that that you would like to know.
Now, of course, there are limits to how much the LLM can infer from just reading the conversation.
But that said, you'll be surprised how effective it is to give a little extra information through prompt engineering to the LLM, and then the transcript and all the other stuff.
And then we'll ask a couple of questions.
It's really cool.
So final chapter, final feature we're gonna add, and I think you're gonna be impressed.
We're gonna talk to our friend Pepper and say, Pepper, tell me about...
Okay, so maybe it's not Pepper.
It's actually Lemur.
We've been using Lemur for a while.
But shouldn't Lemur get a mascot?
I think Lemur should get a mascot.
What about this character?
Could that be Lemur?
Maybe that thing could be Lemur?
I don't know.
Just being silly.
But we're gonna talk to Lemur.
I'm gonna ask open-ended questions rather than just, "Please summarize," or, "Please put this thing into bullet points."
You'll be able to ask whatever you want.
And sometimes it'll give you great answers, and other times it will say, "I don't know.
There's not enough information here.
I can't answer that."
But let's go.
It's gonna be fun.
|
|
show
|
5:04 |
Here we are back on one of the podcast episodes, and we have Create Transcript, Create Summary, both of which are well known to you at this point.
We've implemented those features, we've used them a lot, but look at this, Enable AI Q&A, Question and Answer.
So this is where we're focusing so that we can create some new part of this web app where we talk to the LLM.
It's not super different, actually, than this Create Summary one, or maybe even the Create Transcript.
So our first task here is going to be to go in and make this button dynamic, whether or not it actually is enabled for AI or whether it is not.
Right now, it doesn't matter.
It just always looks like this 'cause it's just static HTML, okay?
And what does it need to be enabled for AI?
Well, honestly, it just needs to have a transcript, but the user doesn't care about the implementation details.
They don't necessarily wanna go and say, "Well, I know that if I wanna talk to it, first I need to create a transcript."
And is it better if I make a summary or not?
I don't know.
And then we just want them to push this button if there's no transcript, and then we'll take them over to chat, okay?
So that's what we're gonna do as the very first thing here.
And that's in this episode HTML page right here.
So if we look at this, here's the create transcript one.
There's two parts, and this is it.
The first one says we have this indigo button, and it says we're gonna kick it off, and we're only gonna show this enable or create the transcript feature if there's not already a transcript that we're linking to.
On the other hand, we have this just simple straight button that says view transcript if, and there's the condition right here, we already have it done.
So same basic thing down here.
I'll copy this so it looks kind of the same.
So down here, let's put this BR at the end of either of these.
Well, first let's do this kicking off of this chat.
So it's gonna be very, very similar to transcription, isn't it?
So down here, where we had start transcribe for this podcast, this episode...
Remember, this is actually a variable we're passing.
So we'll just say chat.
That's the third action.
Okay, and this will, that should do it right there.
And then the other one is we want to, when this is actually done, and when there actually is a chat to go to, we don't wanna go to the transcript URL.
Let's just go to, we'll make it look the same as viewing the transcript; I'll just borrow it over here so it looks exactly the same.
So what do we have there?
We had podcast transcript.
So here let's have podcast/chat, and the rest goes like that.
And view transcript, no, we'll just say chat with episode.
Now, look back over here in the services.
You see the background service, I added this ability to say worker enable chat.
And if we go to there, it just says actually do the chat.
So very much like the transcript, we should be able to click this button.
It should go through the same process and then just say it's done immediately.
All right, so let's go and run that and see what happens.
Back over here, refresh, nothing happened.
So right now, we should be able to do this.
It should do the HTMX kickoff: preparing chat.
Hang tight.
Awesome, it's done.
Why did it say preparing chat?
Why did it already know that?
Let's see over here in the view models, we have our AI start job down here.
Right, if the job action is transcribing, it does that.
If it's summarizing, it does that.
If it's preparing, it does that.
And then in the check, similarly, the response that we said is start chatting and it's podcast/chat, just like we did.
So over here, it says start chatting.
Now, this link will actually take us somewhere.
I'm not gonna have you write that HTML.
It's not really worth it.
However, we don't actually have the work done 'cause if I refresh it, it doesn't stick, right?
So we didn't actually do any work.
As you saw, it says actually prepare the chat for an episode at Assembly AI.
So we've gotta do that next, but we've got our UI structure in place.
This sort of workflow here, of kick a thing off with HTMX, poll it, and then express that it's done, or reload to do it, or go view it, whatever.
We've been doing that this whole course.
This is the third feature we added.
So no mystery there, but I just didn't wanna drop that in with you without you being familiar with it.
So there we go.
That's step one to getting our AI chat kicked off.
|
|
show
|
3:33 |
Well, there's a little bit of a missing detail here in our program.
And it says, "TODO: actually prepare the episode to be chatted with."
Now, the last two times we did this, that was a lot of code we wrote there.
And up here for the transcript one as well, like both of those examples were a lot.
So let me copy that real quick.
Come back down here.
Like I said at the opening, what do we actually need?
We just need the transcript because we're going to come up with a cool prompt, share the transcript, and then use that text to actually communicate and have a conversation around the audio, not anything else in particular.
All right, so watch this.
This should be actually pretty straightforward.
All we need to do is if there's no transcript, we need to make one.
Now, we could check.
We could say, "Go to the database.
Give me the transcript."
If the transcript exists, then we don't need to do the work; otherwise, we do.
But I'm just going to say this: return await worker transcribe episode, and what goes in there, those things.
That looks pretty easy, right?
Let's see what this does.
It comes in.
Its first thing is to say, "If the transcript already exists, don't do the work," right?
So we would theoretically put that first.
I think the one thing I might want here, just so we kind of get a little sense of what part of the app it's flowing through, is some logging.
So logging equals print, again, for us right now.
And we'll just say "Preparing episode for AI chat" on this one, basically a little detail about what the heck this is.
You might have thought that was going to be a lot of work.
Oh, no, it's not too much work.
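In code, the whole worker boils down to something like this (names per the course's background-service pattern, so treat them as approximate):

```python
# A sketch of the chat-prep worker: since chatting only really needs a
# transcript, we simply reuse the existing transcription worker.
async def worker_enable_chat(podcast_id: str, episode_number: int):
    print(f"Preparing episode {episode_number} of {podcast_id} for AI chat.")
    # If the transcript already exists, the transcribe worker can skip the
    # heavy lifting; either way, the episode is chat-ready when this returns.
    return await worker_transcribe_episode(podcast_id, episode_number)
```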
Let's go and actually, though, see that this is doing its thing, because previously we got, "Hey, it looks like it works," and then I refreshed it and it looked like, "Oh, it went back."
So notice there is no transcript on this one.
And I'll click "Enable." "Preparing chat," hang tight.
Let's look over here.
All right, so "Preparing episode for AI chat" really means we're transcribing it, and it's still running.
Off it goes.
So in a second, the transcript will be done, hence it will be ready for AI chat.
Hey, look at that.
It says I can start chatting, but let's just--let's not get too ahead of ourselves here.
Let's refresh.
Hey, almost.
We're almost there.
Now we have our chat with episode.
For some reason I messed that up and it's still here, but most importantly, know that the view transcript is here.
Okay, let's go and fix that one little dangling piece.
This one previously had no condition on it, and I didn't bother to copy that over, did I?
So this is going to be not that.
All right, so if there's no transcript URL, let's just refresh.
Perfect.
So this one has a transcript, so we can chat with it.
And let's just click around just to make sure that, you know, that's true.
Here's another one.
This one also already has an AI summary, but more importantly, we can chat with it.
Awesome.
That was easy.
I mean, besides talking about it and catching that little omission there, two lines of code.
Yay for code reuse.
|
|
show
|
6:52 |
So we can prepare for chat, but let's actually start chatting, right?
The thing we wanna do is create some kind of conversation.
And if we look over real quick here in the database section, lurking here has been this ChatQA, and it will store things like: when was the chat created?
What prompt did we give it?
That's ideally invisible to the user.
What question did they ask?
What answer did Lemur give back?
What is the email address of the person who did that?
And what podcast did we ask it of?
There are a couple of reasons we might wanna store the chat in the database.
One for the user, more importantly, one for us.
They might wanna see their history.
They might wanna see what have I asked before and be able to quickly navigate back.
We are not building that part of our UI, but you're free to extend it and keep running with that idea for sure.
The part for us though is this takes time.
It makes our website slow and it costs us money.
If somebody asked the same question again, or potentially even if some other user, if this was a real high traffic site and there were really highly popular podcasts, maybe they asked the same question as another person of the same show.
So what we're gonna do is store this so that if the same questions asked again, we don't recompute it.
We just say same question, same answer, same content.
It's not always true.
Maybe we wanna expire that at some point because maybe the LLM has upgraded and it's better, but we're gonna assume that it's not significantly changing over time.
So that'll be great.
Let's go over here and we're gonna have an async def as we do called new chat.
Now this is not directly used by the rest of our app, but let's go ahead and put a reasonable bit of code because there's a few steps to this.
All right, we're gonna say we want the podcast is that, the episode number is that.
What else do we want?
We saw that we needed a prompt, which is a string.
We need a question, which is a string.
And we need an email, which is a string.
And we'll enforce that that has to exist.
Remember, you can log into the website.
If this is being associated with a user and potentially because we don't want just arbitrary people to come along and just start asking questions, you know, they have to have an account.
Ideally, maybe they have a certain number of questions they can ask, so on, or a certain plan.
We wanna associate this like in theory with this email address, okay?
So what are we gonna do?
Let's start by just putting this into the database and then I'll work on the reuse.
So we'll say chat equals chat Q&A.
That's the database thing that we just saw.
And we can just start setting stuff.
So podcast ID equals podcast ID.
Here we go.
Let's look over here one more time.
Remember, Pydantic is very particular.
If it says it's a solid string, it has to be passed in.
So create a date, we'll just get set by the default factory.
And now, that's great.
This we're passing in.
This we don't have yet.
This is the question we're starting.
We're gonna ask Lemur, then we'll have an answer.
And these three things, perfect.
It will be stored, by the way, in the chat history database or in the chat history collection within the database.
Okay, so there's this.
And then we want to save it to the database.
So we'll await chat save, return chat.
And then over here, let's give it a little more room.
Chat Q&A.
This looks great, right?
Well, it's going to create a new chat for every conversation.
Even if the same person is asking the same question.
So let's do this.
We'll say existing chat equals, and let's do a query.
It's trying to help us, but we don't exactly want that.
We'll say await chat QA dot find one.
And we'll say, just start passing in parameters like this, like the podcast ID must match and the episode number must match.
We're not going to say that the email does, however, because maybe someone else has asked this question and that's fine.
So first of all, let's see, is this actually their question?
So we'll say existing chat is not None and existing chat dot email equals email.
Then you know what?
They're asking the same question again.
We're out.
If it's not, let's create a new one, but we can also ask if, I think we could just, yeah, we'll do it like this.
We'll say if existing chat and existing chat dot answer, then let's go ahead and set the answer to be equal to this, okay?
And we'll just save it and say, you know what?
We've migrated this question over to you as well.
Let's be a little more careful here.
Let's call this first one.
Let's call this their chat for a second.
And let me put one more parameter like this.
So why did I change that?
Well, we could end up with lots of the same question and answer, you know, theoretically, and I wasn't doing enough testing here.
So let's go ahead and just say, look, if it's exactly there as before, that's great.
And down here, we'll say existing chat like this.
So we know that it's not their email like that.
All right.
That way we don't get duplicates for certain people just because we weren't checking super carefully.
All right, that should do it.
It's two database queries, but you know what?
MongoDB is blazing fast.
It doesn't matter.
All right, so that maybe seemed like more work than you might've guessed, like put this thing in the database, but remember, we're really trying to optimize the speed of our website and save this in the database for that user without duplication and potentially reuse answers across users.
And so that's what this is hopefully accomplishing for us.
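Pulled together, the caching logic looks something like this. The ChatQA document and its fields match what we just walked through, but treat the exact names as illustrative:

```python
import datetime
from typing import Optional

import beanie
import pydantic


class ChatQA(beanie.Document):
    created_date: datetime.datetime = pydantic.Field(
        default_factory=datetime.datetime.now)
    prompt: str
    question: str
    answer: Optional[str] = None
    email: str
    podcast_id: str
    episode_number: int

    class Settings:
        name = "chat_history"  # the chat-history collection


async def new_chat(podcast_id: str, episode_number: int,
                   prompt: str, question: str, email: str) -> ChatQA:
    # Has anyone asked this exact question of this episode before?
    # Note: we deliberately do NOT filter on email here.
    existing = await ChatQA.find_one(
        ChatQA.podcast_id == podcast_id,
        ChatQA.episode_number == episode_number,
        ChatQA.question == question)

    # Same user asking the same question again: just hand back their chat.
    if existing is not None and existing.email == email:
        return existing

    chat = ChatQA(podcast_id=podcast_id, episode_number=episode_number,
                  prompt=prompt, question=question, email=email)

    # Another user already got an answer? Reuse it rather than paying for
    # a second LLM round trip.
    if existing is not None and existing.answer:
        chat.answer = existing.answer

    await chat.save()
    return chat
```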
We haven't got a chance to test it yet 'cause we need one more layer in our app where basically the request can come in to the view and this will come down into the section where we run it, right?
Of course, we also have to ask Lemur to get the answer somewhere along the way.
|
|
show
|
5:59 |
Next up, let's add the URL.
Remember, over in our template on the podcast, we said, "Hey, guess what?
When this exists, you can go to this URL right there and you should be able to start chatting, podcast/chat/detailsaboutit.
But if we go to the podcast views, no, there's not actually such a thing.
So our goal here is going to be to add that functionality.
So there actually is an endpoint here.
Some of this is done for us, but not all of it.
Now, I'm gonna base it off this transcript view, 'cause it turns out to be real, real similar.
Let's save us some typing.
So this is gonna be podcast/chat, like that.
And I'll call this with chat with episode.
And notice over here, we're not doing everything from scratch, because we've already spent a lot of time working on these features, right?
It would take way longer to do all the web design and stuff.
And it's not germane really to what we're doing.
So if we look over here, there's a chatwithepisode.html.
So chat dash, chat with episode, like that.
So this will actually go ahead and pull things up.
And just like I've done the HTML, I've also created a view model that knows how to exchange the data with it.
And that is going to be a chat, episode chat view model, like that.
And we can look at it real quick.
Basically, it stores the podcast and the episode number, and then it pulls those in for us, stores the question, and it will pull the answer out of a form that we're gonna give them in a second.
Okay, so nothing new there.
We're loading the data and we're returning it.
There's nowhere to process the question, but this will let us actually show it.
So let's see what we get here.
We come over, now it says we can chat with episode.
Let's click and see if we got this right.
No, no, no, we have no podcast/chat/such-and-such.
What have we missed in our URL?
Podcasts.
Which I guess all the other ones are podcasts, aren't they?
You know what, I'm gonna change it here.
I like that to be consistent.
Go back, refresh.
Oh, it works this time, hooray.
Well, there it is.
This is our UI that I built for us.
It says AI chat with episode.
It has a little question mark because, hey, what's your question?
What do you wanna know about, notice this will navigate you around.
What do you wanna know about Python bytes and the episode, this one that we just shipped open source?
Here's your question, let's ask it.
Nothing, nothing, we got nothing.
Why?
Well, that part's not written yet.
You can see down here, there's a 404 not found to some htmx thing that we're trying to do.
All right, we're gonna work on that in a second and that will be pretty close to finishing things up.
But let's look at this chat really quick and just understand how this UI works.
So we've got this summary information above, we've got a text area for broad, larger than just a couple of words questions, and we have this ask a question.
So up here at the top, this is just the summary.
What do you wanna know about this podcast?
Don't worry about that.
But here's where it gets interesting.
We have a form and instead of having a post or an action, it says htmx is gonna handle this form for us.
So what that will do is we can type in here and say, I want to know the meaning of life.
And you ask a question and it'll come down here.
You could say, actually, I wanna change this to something else, I wanna know the meaning of fun.
And you ask it again, right?
It won't refresh, it'll just stay right there and you can't see it right now, but in the bottom, there's a little section where those results go.
So that's why we're using htmx here.
Hopefully you're finding an awesome appreciation for htmx.
The target is not this form because we want it to stay, but is in fact down here where we have results.
We also have a spinning indicator.
So instead of having the reload and then you go and click a button when the job is done effectively as we have been, this will just show an activity indicator while we're waiting.
And that's right here, the indicator is that.
And other than that, it's just a basic form, right?
So our text area just has an id of question, it has some rows, it auto focuses, all those kinds of things.
When htmx loads, we wanna delete the old answer while it's thinking.
So that's kind of a little nice UI bit.
And then the button is just a submit button and there's no htmx magic, it's the form that does everything.
So that is the UI that we're working with.
And when I clicked it, you might have seen some flicker down there, maybe there that time.
That's the activity indicator, but it so quickly says 404 that the activity indicator just goes away.
All right, so our job is going to be to write the endpoint that handles this form submission right there over to the server.
And then we're actually gonna call Lemur from there, create that new chat like we did, set the answer, and then return the answer to the user.
And that'll be the entire AI chat feature, which is going to be amazing.
I'm gonna have so much fun with it.
|
|
show
|
4:05 |
We saw we have the UI completely working, but we don't have any way for this to be processed on the server.
So let's go ahead and borrow that again.
And I'm gonna put the URL right here like that, that I copied from the form.
So it's hx-question.
You can put whatever you want for the URL, but I decided, all right, let's just kind of be consistent.
When you see hx, this means it's an htmx partial type of thing, most likely, and that you're not going to call it directly.
There's a lot of flexibility there.
The other thing is, let's look over in our views in the partial section, we have a chat response.
Now the chat response is pretty basic.
It just says, is there an answer?
Show them the answer.
Otherwise, if there's an error (and by the way, there's never both an error and an answer at the same time, but I guess we could check not error and error, whatever).
If there's an error, we just show the error.
Otherwise, we'll put a little style like we have on other parts of our assembly AI response sections.
But other than that, we're just going to say, here's the answer.
And remember the JavaScript that said, delete the prior answer, we're going to put the ID there.
Okay, so most importantly, just know there's a chat response HTML that we're going to set.
And so that goes right here.
And there we have it.
Now, in fact, it's quite similar to what we had before.
The thing we just need to do is we need to check if there's an error.
So our load data is checking things like, are you logged in?
Are you allowed to do this?
Did you actually specify a question, et cetera?
Does the podcast exist?
So we'll say if VM.error, we're just going to return VM.toDictionary.
So that will set the error message.
We don't want to go and even try to generate the answer if for some reason they're not allowed to.
So that will do that section there.
Maybe we could even test it and see what we get back.
And well, before we do that, let's go down here and just say, VM.answer, just to make sure this round tripping works.
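By the way, pulled together, the endpoint at this stage looks roughly like this. The route and template names follow what we've seen, but the @template decorator and the view-model method names are my approximations of the course's pattern:

```python
import fastapi

router = fastapi.APIRouter()


# The @template decorator and EpisodeChatViewModel are the course's own
# machinery (set up earlier); treat their exact signatures as assumptions.
@router.post('/podcasts/partials/hx-question/{podcast_id}/{episode_number}')
@template('partials/shared/chat_response.html')
async def hx_question(request: fastapi.Request,
                      podcast_id: str, episode_number: int):
    vm = EpisodeChatViewModel(podcast_id, episode_number, request)
    await vm.load_data()

    # Logged out, empty question, missing podcast, and so on: return the
    # error rather than even attempting to generate an answer.
    if vm.error:
        return vm.to_dict()

    # For now, just echo the question back to prove the round trip works.
    vm.answer = f"You've asked: {vm.question} We don't know."
    return vm.to_dict()
```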
There's no AI magic happening here.
Is AI magic?
Let me ask.
Bing.
Well, that sure wasn't magic.
Oh, this is post and where I copied it before was a get.
So I don't think we've done a post together.
There are some other parts of the app.
Is AI magic?
One more thing.
This is slash partials.
Yikes, sorry, folks.
Is AI magic?
You've asked, is AI magic?
We don't know.
Woo, amazing.
Look at that.
I think it might be like advanced tech.
Right?
So you can see, we can just ask that question and let's show the timer thing real quick as well.
Let's do a time.
Let's sleep.
Let's suppose it takes three seconds for this to run.
Now I'll put a question mark here.
Look, it's thinking, thinking.
Hmm, yes.
Is it like advanced techs?
Perhaps.
Very deep question you have asked of me, Michael.
Awesome.
So that's the workflow.
We've got it basically from a web functionality perspective, 100% done.
What we do need to do, though, is actually come up with an answer to that question using Lemur, which we'll do in just a minute.
But what do you think?
Pretty cool?
I love it.
There's so much to love about this.
Like I love the way HTMX makes this so clean and nice.
Let's just keep flowing without writing a ton of JavaScript.
Yeah, I think this is really neat.
Now we just need to make it work from a Q&A perspective.
|
|
show
|
4:23 |
It's time to go to our AI service and make it actually ask the chat.
Now we created this little thing about creating a new chat, but this is only really about juggling whether or not we've seen this before in the database.
We're gonna have another function here called ask chat.
Similar information coming in, except for notice we're not passing in the prompt.
So the user doesn't give us that.
We come up with it, you know, like right here as we have been.
So the first thing we wanna do is make sure there's a transcript.
Now, I know the UI didn't exist, but we don't want people to come along and somehow find a way to abuse this function, as the app grows or whatever.
So let's just make sure there is a transcript.
All right, there we go.
So we're gonna need the transcript anyway.
Remember, that's what we're providing to Lemur, but we're gonna check and make sure that it does exist.
Okay, the next thing we need to do is our prompt engineering and as you saw, this gets pretty interesting.
Our prompt is gonna require the podcast 'cause like before, we wanna say the podcast is about this and so on.
So we're gonna say podcast equals podcast service dot podcast by ID, podcast ID.
And again, we await it.
Now, instead of me typing out this whole thing, let me just put the one I want in here and we'll discuss it.
Like before, you are an expert journalist.
I'm gonna give you a transcript for this podcast.
I want your answer.
This is new to us.
I don't think we've done this sort of thing before, but I really like how it came out.
I want your answer to include sources and fragments from the transcript to support your response.
The more open-ended we make it, it can start to make up things and imagine things.
And obviously there's many well-known cases of this hallucination as they call it.
So I'm gonna both give it this instruction to try to be real, and I'm gonna give it permission to say, I don't know.
Rather than saying, well, they asked me a question, I have to have an answer.
Like, no, I don't want you to make anything up.
It's okay to say, I don't know.
All right?
And then my question about this podcast episode is, and then this is the question that the user has typed in.
Pretty cool, right?
So that's what we're gonna use.
We'll see how well it comes out, but actually it's gonna work pretty well.
The next thing to do, now that we have the prompt, is to see if this is already an asked and answered question.
So we'll say chat equals, and if it's not new, we wanna save it for this user, new chat, which is what we saw.
All right, let's make sure I got this all in the right order here.
Podcast ID, episode number, prompt, question, email.
Okay, and then how do we know if it's asked and answered?
We'll say if chat.answer, return chat.
We're good.
We've already asked it.
Here's the answer.
But if not, we're gonna need to ask Lemur.
Now, we've done this four times already, I think, so, or multiple times at least.
So we're just gonna reuse basically the same code.
We're gonna say, create the Lemur client.
I'm gonna say, asking Lemur about what the question is this time.
Passing the prompt, the basic model, 2000 for max output size, and 0.25, that is, 25% variability.
And remember, we have to pass in the transcript text.
And let's go up here.
Let's look at this real quick.
So what I did is I created a property on our database record that does that join statement.
Remember, we had a comprehension here that said w.text for w and self.words and join that like this.
We talked about that.
Well, we're doing it here, but that's what that's about.
In case we somehow change how we make this, I don't wanna have duplication of that little comprehension all over the place.
So here we have our response, and then it's just chat.answer equals response.response.
And we've seen that there can be whitespace in the beginning and other weird stuff.
So strip.
Now, let's save this into the database because that's always a good idea.
And return chat, right?
We didn't find one in the database before, so it's a new question in some way.
So we're gonna do that and save it.
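All right, stitched together, ask_chat comes out roughly like this. The lookup helpers (transcript_for_episode and podcast_service.podcast_by_id) are named in the spirit of the course's service layer, so treat them and the SDK details as assumptions:

```python
import assemblyai as aai


async def ask_chat(podcast_id: str, episode_number: int,
                   email: str, question: str):
    # Guard: chatting requires a transcript, even if the UI already hides
    # the button when there isn't one.
    transcript = await transcript_for_episode(podcast_id, episode_number)
    if transcript is None:
        raise Exception("Cannot chat with an episode that has no transcript.")

    podcast = await podcast_service.podcast_by_id(podcast_id)

    prompt = (f"You are an expert journalist, and I am going to give you a "
              f"transcript from the podcast '{podcast.title}'. "
              f"I want your answer to include sources and fragments from "
              f"the transcript to support your response. "
              f"It is OK to answer \"I don't know\" rather than making "
              f"something up. "
              f"My question about this podcast episode is: {question}")

    # Asked and answered? new_chat (sketched earlier) returns the cached
    # answer if this question has been seen before.
    chat = await new_chat(podcast_id, episode_number, prompt, question, email)
    if chat.answer:
        return chat

    print(f"Asking Lemur about {question!r} ...")
    lemur = aai.Lemur()
    response = lemur.task(
        prompt,
        final_model=aai.LemurModel.basic,
        max_output_size=2000,
        temperature=0.25,
        # transcript_text is the property that joins the stored words
        # back into one big string.
        input_text=transcript.transcript_text)

    chat.answer = response.response.strip()
    await chat.save()
    return chat
```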
|
|
show
|
5:05 |
I think we're ready to actually call this function in the UI.
Now, before we do, I think actually this has, I'm a little worried this is gonna limit some stuff.
So let's take this out and we'll add it back if it becomes a problem.
Okay, so let's go over to our podcast views where we have, I have a question, I have asked it.
Let's actually ask the question.
So we'll say this, we'll say chat equals await AI service, ask chat, and we put podcast episode email question.
Now you might be thinking, what if there's nobody logged in that could crash 'cause that would be none.
Well, it's not getting past there, right?
That's our test we've already done.
So this should be totally solid even with verifications. Now let's go ahead and put... there are two ways; there's a more refined way we're gonna do this, but for now let's say chat.answer goes here.
That should honestly be all we need to do to make this work.
Let's run it.
And let's try it from the beginning on one of these.
Let's go over here to reply all.
We haven't done anything with them yet.
Reply All, "The World's Most Expensive Free Watch."
No idea what this is about, but let's give it a try.
You can see I picked a few more that we're gonna play with up there.
Let's just make sure that it works.
I'll press, could have the transcript and this should change.
If you look over here, you can see if there is a transcript, the chat does change, but let's just hit enable chat right here to see that this works.
Look at that, preparing episode for AI and now it just means we're transcribing it.
So as usual, this will take just a moment and then we should get something cool here.
Success, chat is ready.
Refresh and now it says we can chat with the episode.
So a story of a man who made an extremely dubious decision to order a watch that he found on Instagram.
Well, let's ask a question here and just see what happens just to kind of round this out.
What happened on Instagram?
"Based on the transcript provided..." I'm not a fan of that, because that's sort of leaking our implementation details, but let's carry on.
It appears this episode of Reply All discusses deceptive drop shipping schemes on Instagram.
Specifically, a man named David fell for one such scheme after seeing an ad on Instagram for a free watch.
Oh, awesome, how cool is that?
The transcript provides evidence that these types of drop shipping schemes were successful using Instagram ads to mislead customers.
Very, very cool.
One thing I wanna point out here is this UI is a little condensed, compact, I don't know.
If you look at the response, it's actually paragraphs and more space in here, but 'cause we put plain text into an HTML field, all of that was lost.
So I told you there's a better way to do this as we can come down here and say vm.setAnswer, chat.answer, and this will cure some of the things that were bugging me there.
So refresh, and because this question, not totally refresh, it's ask again.
Because we asked this before, let's see what happens.
It shouldn't have to rerun it.
Sure enough, it doesn't.
Notice the "based on the transcript provided" is gone, and now it goes right to the answer.
"It appears this episode of Reply All discusses deceptive..." da da da, and here you can see it now has the concepts all broken out with a summary at the end.
Awesome, so, so good.
So I'm impressed.
This is really, this is impressive.
So let's just look real quick at what this set answer does.
It's worth checking out.
Up at the top, there are a couple of regular expressions.
And these are the ones that look for these misleading pieces.
It says if the text starts with "based on" dot dot dot, we're going to just remove that bit.
And then there's another one that we potentially could run across called transcripts.
Sometimes it thinks there's more than one if it's too long.
And it's just gonna go down here when we say set answer: for each regular expression, get rid of that junk, uppercase the starting character, and then turn what are just text newlines into line breaks in HTML.
So just so you know what's happening there.
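In code, that set_answer cleanup amounts to something like this; the patterns are illustrative stand-ins for the real ones:

```python
import re

# Illustrative versions of the cleanup patterns; the course's actual
# regular expressions may differ in detail.
unwanted_intros = [
    re.compile(r"^based on the transcripts? provided,?\s*", re.IGNORECASE),
]


class EpisodeChatViewModel:
    # ... other view-model fields and load_data() elided ...

    def set_answer(self, answer_text: str):
        text = answer_text.strip()
        for regex in unwanted_intros:
            # Drop the leaky "Based on the transcript provided..." lead-in.
            text = regex.sub("", text)
        if text:
            # Re-capitalize the first character after trimming the intro.
            text = text[0].upper() + text[1:]
        # Plain-text newlines are collapsed in HTML, so turn them into <br>.
        self.answer = text.replace("\n", "<br>")
```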
And now it makes, I think it makes a big difference.
It makes a real big difference.
So awesome, awesome, awesome, awesome.
I love it.
|
|
show
|
6:04 |
I'm so impressed with this feature.
Let's just play with it for a minute.
If this particular video doesn't inspire you, feel free to just skip ahead.
But we'll wrap things up with this chapter.
But let's just play with it a little bit more and see where it goes.
Okay.
Over here.
Let's talk.
Let's do this one first.
Okay.
Let's talk to the Python bytes.
So in this episode, you can see in the summary that Brian is going to create a Pytest suite timeout plugin for Pytest.
The title is that episode where we just ship open source.
So he releases that to PyPI.
And then it turns out I created something for umami and released that to PyPI, as well as something for Listmonk, the self-hosted mailing list service, also released to PyPI.
So three of the four things we discussed that week are actually open source projects that we created that week.
It's kind of insane.
So let's ask some questions about that.
The transcript exists, so we can chat with it already.
What open source projects did the hosts create and release?
Awesome.
The hosts, Michael and Brian.
So it's already kind of cool that it figured that out; they discuss the following open source projects they created and released.
Oh my gosh, look at this.
Brian mentioned that he created a pytest plugin called pytest-suite-timeout, which adds a timeout for the entire test suite.
The transcript states... remember, we asked it to show its work.
Brian mentions the new Pytest plugin he created called this and adds a time.
Okay, that's kind of repetitive.
It's okay.
Michael shares that he created a Python client for a self-hosted analytics tool, umami, that allows sending custom events from Python code.
The transcript says, Michael shares this thing.
Yeah.
Okay.
Perfect.
They also discuss some open source alternatives.
So, "in summary, the two open source projects the hosts created are" these two.
There are actually three, but I think it missed one.
Please let me know if you need clarification.
Awesome.
Let's ask one more question.
How can I collect web analytics on my users without violating their privacy?
That's kind of an open-ended question that wasn't exactly mentioned in the transcript, but the umami platform being self-hosted means the data doesn't leave or get shared.
And there's none of that retargeting garbage that so many people on the web seem to think is, one, reasonable and, two, the only way.
Let's, let's see what that says.
Oh, let's be a little more specific.
How can I do it without sending their data to a third party?
That made all the difference.
Just be a little more specific, I guess.
Here are the two key points on collecting web analytics without sending data to a third party.
Michael talks about setting up this analytics tool, umami on his own infrastructure, using Docker to gain insights without sharing.
Exactly.
And here's the transcript source.
The host agrees that running the services like analytics and mailing lists on one's own infrastructure gives you more control.
In summary, to collect web analytics without sending data to a third party, one option discussed in the podcast is to set up open source analytics tools like umami on your own infrastructure using Docker.
That is pretty darn good.
That is pretty good because I didn't ask a question of what did they say or summarize this.
I kind of asked for advice in the context of this episode.
Okay.
I'm happy with that one.
That's really, really cool.
Let's go over here and let's chat with this one.
Remember this is about Emma Stone.
How did Emma convince her parents to move to LA?
See if it'll know that.
Yeah.
So Emma Stone convinced her parents to move to LA so she could focus on acting.
Emma discusses experiencing anxiety from a young age, which convinced her parents to homeschool her and later move to LA.
Okay.
So according to transcript.
Yeah.
Yeah.
Okay.
One more.
Let's go over here to this one.
Remember this one is about Apple watch.
What is the legal issues?
What are the legal issues with the Apple watch?
Based on the transcript summary provided, the main legal issues discussed were related to an ongoing patent dispute between Apple and a company called Masimo. Sounds Italian.
I'm thinking it's American, though.
Here's some of the details.
Oh my gosh.
How awesome is that?
So in summary and such and such and such, I think for some reason it can't process the entire two and a half hours of text.
So it's missing some, some of the detail here, but that's okay.
It's still quite excellent.
What is getting us here?
I'm impressed.
I think I might be most impressed with this one because it talks about so many different topics.
That's kind of the point of that show is to bounce around a bunch of things, which I think might throw it off a little, but it seems to do really well.
All right, there we have it.
We've asked a bunch of questions about our podcasts.
I think we're getting there.
I think it's about living up to its name, right?
We have the X-Ray Podcasts app that doesn't just load up the podcasts and let you play them: transcripts, summaries, key moments, search, and now even ask-the-AI. Incredible stuff.
So hopefully you really enjoy this app that we built.
I'm really happy with it.
I think this is quite a neat demonstration of what can be done with AI and assembly AI and all these things.
And there's plenty of places you all could take this, or apply it to your particular area.
And honestly, there's a bunch of other things that we haven't even leveraged out of Assembly AI.
Still very fun.
Be sure to check out the source so that you have the final version.
|
|
|
2:45 |
|
show
|
2:45 |
As we wrap up this course, I want to just give you a few parting thoughts.
First, be sure to go over to the GitHub repository if you haven't already, star it, and consider forking it.
That way you'll have all these resources to take with you and all these examples.
Okay.
And if you find that there's a problem with the course, you can go over there and submit an issue.
Also use the repo for that.
The URL is at the bottom.
So feel free to jump over there.
You can also just click the take me to the repo for this course in the player.
So what's next?
You've finished this course.
Congratulations on making it to the end.
Now maybe you're inspired by what you saw and you want to learn more and you want to go deeper.
Well, in the appendix, I'll talk about how you can keep working on this project and extend it and make it your own.
But if we're talking about what do you learn next?
Well, I have three courses on FastAPI, which you can check out over at Talk Python Training.
Of course, the first one to start with is modern APIs with FastAPI.
There's also the full web apps one, which is kind of what we were doing with it here.
And if you thought the Beanie and MongoDB stuff and async database access was awesome, there's a course on that too.
And if all that async and await stuff has you thinking, wow, I never played with that before...
How do I do more?
How do I get better at it?
Well, we have again, a whole course on async and await and other types of parallelism in Python.
If you were inspired by maybe the HTMX aspects and how could you not be?
I am endlessly inspired by HTMX.
Again, a couple of courses.
We have one on HTMX and flask, another one, same thing, but Django.
And if you feel like, oh, my Python could get better, check out Python for the absolute beginner.
Sounds very, very beginner, but actually it gets pretty advanced and covers a lot of cool concepts like if you don't have a CS degree, it teaches you a lot of things you might have learned along that way.
Finally, if the editor work we've been doing, either with PyCharm or VS Code, was inspiring, there are again a couple of courses on that: a great course on PyCharm and a great course on VS Code.
All over at Talk Python Training.
Thank you for taking this course with me.
And that brings us to your turn.
You've got this cool AI sitting next to you.
Now that you know how to work with Lemur and Assembly AI, you've got all the skills you need.
What are you going to build?
Go build something awesome.
And if you do, shoot me a message, michael@talkpython.fm, or mkennedy@fosstodon.org on Mastodon, and share with me.
Let me know what you built, because I think people are going to build some pretty cool stuff from this course.
All right.
Thanks.
See you later.
|
|
|
14:14 |
|
show
|
6:23 |
In the appendix here, I wanna give you a few tips if you didn't go exactly down the path that I was going, as well as a few extra things if you wanna take this project and go further.
Okay, the first one that I wanna talk about is how do we run this project in VS Code?
You know I'm a fan of PyCharm, but I'm also actually a fan of VS Code.
I just use them for different purposes.
And so when I have big projects that need lots of coordination and lots of pieces moving together, it's all day PyCharm for me.
But there's plenty of times that I would like to use VS Code and I know many people that's their primary editor.
So I'm gonna show you first how to go and work with VS Code.
So over here, I've made a copy of the starter project.
It doesn't matter what point in time that you're working with here, could be any of them.
Okay, so I'm gonna open this up in VS Code.
In macOS, you can drop it on the icon in the dock.
I don't believe that is possible on any other operating system.
You can do files, but not folders.
Okay, now notice we do not have an interpreter for this project.
So we could do one of two things.
We could go up here and we could say create environment, right, use Python to create a virtual environment.
I kinda like to have more control over it.
So let's go, let's do it this way.
Let's open up the terminal here and notice I'm in X-Ray podcast.
That's the starter project.
Let's see what's in here.
You can see our source directory.
This is the top level thing with the requirements.
So let's go ahead and do this here.
pls is a thing you can pipx install for a nicer ls, by the way.
So we'll create a virtual environment, Python -m venv, I like to call it venv.
That's good.
And I'm gonna say --prompt is dot.
So X-Ray podcast is gonna be the name of the virtual environment.
And I have an alias to activate virtual environments, you know, the venv/bin/activate business, et cetera; it's just ae.
So now you can see the name is X-Ray podcast.
And we can ask which python, or where python I believe it is on Windows; which or where, I forget which one exists on which OS.
Okay, so we could click this, but let's just go over here and we'll pull up the command prompt and I'll say select interpreter.
Okay, it's recommending the one that is a virtual environment.
I'll just choose that.
So we should be good to go.
Why is it not happy with this?
And you know, that's the one I picked.
Let's just close it and open it again as you know, so often is the fix.
So there we go, Python 3.12.2, and it's magically happy.
Notice that this is not ideal 'cause it doesn't think a lot of these things are installed, right?
So the next thing to do in our virtual environment, make sure it's active again, right there.
Be very careful.
pip install -r requirements.txt.
And while we're here, let's go ahead and do that spacey download to get the language model working.
We've already talked about this for the canonicalization of our search indexing.
My apologies to GitHub for using up their bandwidth once again.
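If you want the terminal steps in one place, they look roughly like this; macOS/Linux paths shown, and the spaCy model name is an assumption, so check the course's setup notes:

```bash
cd xray-podcast                     # your copy of the starter project
python -m venv venv --prompt .      # "." names the venv after this folder
source venv/bin/activate            # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Grab the spaCy language model used for search canonicalization
# (model name assumed; use the one the course's setup calls for):
python -m spacy download en_core_web_sm
```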
All right, we can see many of those errors.
Not all of them, but many of them have gone away here.
So we should be good to go.
Now, how do we run it?
Well, I click this.
Let's see what happens.
It's probably not gonna love it.
Now it says secrets and static doesn't exist.
Oh boy, it's running it from the wrong location.
And also this is not a great debugging experience.
So now that we've got it kind of loaded up, let's go over here to our proper debug section.
I'm gonna say create a launch JSON, and that's gonna be Python debugger.
And what I wanna do is, let's say, I do wanna do a FastAPI.
After all, this is FastAPI.
Maybe that's the most appropriate.
And it says main.py.
Yes, sort of.
So this is pretty good.
Look, instead of just running the app, it's gonna run uvicorn, and most importantly, pass in reload, and enable Jinja, all those things.
Excellent.
Let's run it and see what happens.
Hmm, there is no main.
Huh, what is going on?
Visual Studio just assumes, of course, everything that you possibly could care about will be right in the root directory, 'cause why would you have any subfolders?
Why would you have organization?
Come on, people.
So there's zero hint or help here about how I would do this, but what I can do is I can basically say, as part of the run, go into this, set that as the working folder, and then go from there.
So the way we do that is you can come in here and put in the working folder: your ${workspaceFolder}, plus the folder down to your main, and this is src in our case.
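For reference, the resulting launch.json looks something like this; VS Code's generated file may differ a little, so treat it as a sketch:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: FastAPI",
            "type": "debugpy",
            "request": "launch",
            "module": "uvicorn",
            "args": ["main:app", "--reload"],
            "jinja": true,
            "cwd": "${workspaceFolder}/src"
        }
    ]
}
```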
So now if we just rerun it.
Awesome, that looks better, doesn't it?
Looks a lot better.
We've got our traditional startup.
Look, it's watching for changes.
That's great.
So if we make changes, it'll automatically reload.
You don't have to leave that in there, but you can.
Uvicorn is running there.
You can see it's talking to the database.
It looks like everything's working.
If we click on it, sure enough, there it is, and it should have all of our data.
There's our data out of our database.
Okay, so that's how we set this up in Visual Studio Code.
What did I do?
I went and opened up the directory or the folder in VS Code.
Second, I created, well, you could do this first, probably do it first.
Create a virtual environment, pip install the requirements into it, then open up the directory so it just discovers it, as you can see that it did right there; and then I created a launch configuration over on this thing, and I added this little bit of code here so that I could say, go into that folder and then run main:app with reload.
All right, so if you feel like you wanna work with VS Code, no problem.
|
|
show
|
5:42 |
I hope you like the web design and the feel of our app.
I put a lot of effort in to make it feel modern and beautiful and not like just plain HTML, but looking really good.
And so there were a lot of different things I did to make that happen.
One of them was to lean on Tailwind and Tailwind UI.
So for example, Tailwind UI lets you build really nice looking sites here.
I'll just go look at some of the templates.
Like, let's look at this one right here.
This is super, super pretty.
So this is pretty cool.
And it's all based on top of Tailwind, which also I was using.
Now, Tailwind is cool.
One of the things that it does though, and this is a benefit in the end, primarily, but it makes working with it a little bit of one extra step, let's say.
So for example, see here, you can say things like this.
When the size is medium, use w-48, and put the text on the left instead of in the center, as it is when it's small.
Tailwind comes with many, many classes, and there's a bunch of CSS in there.
And their goal is to say, well, if you only use a small little sliver of Tailwind, we want you to ship a very small little sliver of that over to your users.
Let's go over to our app real quick and actually look at the CSS files here.
So we don't want console, we want network.
And here you can see our site CSS right there.
Do a full refresh so we get it.
Now, let's look and see what I wrote in our site CSS.
You can see there's a couple of lines here, and it goes down to line 119, but there's directives up here, like @Tailwind and stuff, and @apply.
So Tailwind basically uses this file to generate this release CSS over here, which has a whole lot more stuff.
That's what we were looking at when we get this one over here.
So if we open that in a new tab, you can see what we're actually getting here is all of the Tailwind business.
And so, see, it's only 35K.
That's really small for CSS in general.
So what Tailwind does is it will go and say, I'm going to look at all of your HTML.
Let's go over here where they tell you how to get started.
Roll down a tad.
It says, we're going to run the CLI that looks at all of your inputs and then generates an output based on your HTML files and all the styles that you're using.
So that keeps it really, really small, which is excellent.
The drawback is if I go over here and make a change in our app, like let's get out of there and let's just go somewhere to the homepage.
And this section here, it has a padding bottom 10.
Let's look at that.
That's just this padding down here.
And let's suppose it has, put some text in there.
We can say it's this text just so you see it.
But let's suppose that it has a BG color of Amber 500.
That's a thing we could do.
Let me refresh it.
Hard refresh it.
Why is that not changing?
It's not changing because nowhere else in the website was that class used.
And so it's not included in the CSS because Tailwind said, you know what?
You don't need this.
Oh boy.
So here's what we can do; I'll show you this in VS Code and I'll show you this in PyCharm.
Start with PyCharm since we have it open.
So we can go over here and let's go to edit configurations; and, oh, you didn't see that one.
It was hanging around from before.
So I'm gonna say add a new run configuration.
Not the Python.
We'll do it up this way.
Add a new NPM run configuration.
And it already finds the package.json.
It runs and what it's going to run is watch.
You could do a build, you could build a minified version or you could watch.
Build just does it once.
Watch builds, and then if we make any changes to any HTML or CSS, it'll rerun.
So I'll say watch.
And now we can run one of these two.
I'm gonna run this one.
You can see that it did its thing.
If we go back now, ta-da, perfect.
Okay.
And if we make another change, let's just set this as amber.
Make it this really dark amber and take away this text here.
Refresh.
Notice it's automatically watching and making those changes.
So here's the deal.
If you're going to take this project and build new things with it and expand it, one, you're gonna need to use Tailwind or dramatically change the web design, take your pick.
But if you wanna go and keep carrying on with the same web design, Tailwind is how it's going.
And you're gonna need to run, basically when you start working on the project, just kick this watch off and then switch back here to work on the Python side of things, okay?
You can just leave that running down there, this watch and ignore it.
That's it.
It's not super hard to do.
Once you do it once or twice, you can see what is the command that it ran.
It just goes in there and just runs in the directory.
It just runs npm run watch.
So if you don't want to do it within PyCharm, you can run it like that.
|
|
show
|
2:09 |
So now that you know about Tailwind and that we need to run npm watch if we're going to make any changes to it, if you're on the VS Code side of things, how do you do that?
Well, it's super easy.
Let's assume you already have your launch configuration set up from the previous video.
I come over here and do this little drop down here and say, node.js right there.
Notice it even has discovered this run script watch.
Interesting.
But if we pick run node.js, it actually gives us a list of all three options that are specified in the package.json.
And watch is sure enough the one I want, so I can hit run on that.
And it comes out red, but it looks like it works just fine.
If I make a change and hit save in the HTML file somewhere, you can see that it rebuilt it right there.
So it is working even though it looks erroneous, I suppose, whatever.
And that's all good.
And now we can just come back here to our launch.
And if we want, we can run this again.
You can see run script, right?
Run script watch app is running down here just like you would expect.
Okay, so remember, if you're going to make changes to the web design using Tailwind, you have to run npm watch.
And this is how you do it in VS Code.
Pretty easy.
So again, if for some reason you don't want to do all this stuff directly in VS Code, you can also just come over here and run npm run watch.
Same type of thing, right?
Put that off to the side.
But you can do it in VS Code or PyCharm, so you can kind of have it all self-contained in there, or just, you know, kick this off somewhere, minimize it, and pretend it doesn't exist.
Okay, again, that's run in the top level folder where the package.json is.
We go back and look at package.json real quick.
You can see where it's defining what watch is, what build is, what minified build, and all those things, if you're interested.
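For reference, the scripts section in a setup like this typically looks something like the following; the input and output paths here are assumptions, so check the real file:

```json
{
    "scripts": {
        "build": "tailwindcss -i ./src/static/css/site.css -o ./src/static/css/site_release.css",
        "minified-build": "tailwindcss -i ./src/static/css/site.css -o ./src/static/css/site_release.css --minify",
        "watch": "tailwindcss -i ./src/static/css/site.css -o ./src/static/css/site_release.css --watch"
    }
}
```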
|