Build An Audio AI App Transcripts
Chapter: Feature 1: Transcripts
Lecture: First Transcript

0:00 Transcribing time, let's do it. So, AssemblyAI, we have to make sure that that's imported. We wanna create a transcriber.
0:13 So we're gonna do things with this thing. We'll say transcriber equals that. And we're also gonna need to pass a configuration object.
0:21 We could just call transcribe, but you'll see there's an incredible number of options and features available.
0:27 So we'll have to come up with a config equals assemblyai.TranscriptionConfig like this. Now let's see what we can pass to it.
0:36 If we jump over here, yeah, there's a couple of things. So we could pass in the language, whether or not to do punctuation.
0:44 And we're gonna wanna say true for that one. So let's pass in some things that we'll want. That one is true. Format text, dual channel.
0:54 What else have we got here? That's a lot of help text it was trying to show us. We could do a subset. If we knew here's a huge long thing,
1:02 but I just want this little part of it. Also, if you have certain words, and this doesn't necessarily make sense in a completely general app like this,
1:11 but let's suppose you're in, like, the medical field or in technology, programming, Python, you can say these are words that often get confused
1:20 and swapped for something else. PyPI, for example, might be spelled like the food instead of P-Y-P-I. You don't want that.
1:27 So you can use word_boost to say these words are really important in this particular area. All right, and you can say filter out profanity,
1:36 redact personally identifiable information, which is excellent. You can put speaker labels on there. None of these we're gonna put on ours yet.
1:47 Oh, I really need that to not be in the way. Disfluencies, however, are the filler words people say, the ums and uhs,
1:59 and you don't want those kinds of things in your transcripts. So if somebody says, "Um, I, uh, went to the store,"
2:07 it might be nice to just say, "I went to the store," right? You don't need all that babbling stuff in there. So we'll say disfluencies, false.
2:15 We don't want them transcribed. We want them omitted. You can do sentiment analysis, auto chapters, entity detection, summarization.
2:23 We're gonna actually use LeMUR and the LLMs for this later. So that's great. That might be all we're gonna add. Now that I think about it,
2:31 let's actually set format_text to true, and set speaker labels explicitly just so it doesn't do it, 'cause with an arbitrary set of speakers, I'm not sure how well that's gonna work
2:39 for us in this situation. So we'll say false. So now come over here and we'll say transcript_future equals transcriber.
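The settings chosen so far could be sketched roughly as follows. This is a minimal sketch, not the course's actual file: the name `TRANSCRIPTION_OPTIONS` and the helper `build_config` are hypothetical, and the AssemblyAI SDK import is deferred into the helper so the options can be inspected without the package installed.

```python
# Options mentioned in the lecture, kept in a plain dict (hypothetical name).
TRANSCRIPTION_OPTIONS = {
    "punctuate": True,        # add punctuation to the text
    "format_text": True,      # apply text formatting (casing, numbers, ...)
    "speaker_labels": False,  # do not try to tell speakers apart
    "disfluencies": False,    # leave filler words ("um", "uh") out
}


def build_config():
    """Build the SDK config object (hypothetical helper).

    Assumes the `assemblyai` package is installed; the import lives here
    so that merely defining the options does not require the SDK.
    """
    import assemblyai as aai

    return aai.TranscriptionConfig(**TRANSCRIPTION_OPTIONS)
```

Keeping the options in one place makes it easy to see at a glance what was requested, and to add something like `word_boost` later if the app becomes domain-specific.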
2:52 So we could say transcribe, and this is a blocking call, but we have such a nice setup here with async and await. And if we do a blocking call,
3:02 it's not just gonna clog up our transcribing service. It's gonna actually block up the entire FastAPI event loop. We don't want that.
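To make the event-loop point concrete, here is a small standard-library sketch. It does not use AssemblyAI or FastAPI at all: `time.sleep` stands in for a slow transcription call, and `ticker`, `measure`, `blocking_work`, and `non_blocking_work` are hypothetical names. It counts how often another task gets to run while the "transcription" is in progress.

```python
import asyncio
import time


async def ticker(ticks: list, stop: asyncio.Event) -> None:
    """Stand-in for other requests: records a tick whenever it gets to run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_work() -> None:
    # A blocking call inside a coroutine: nothing else can run meanwhile.
    time.sleep(0.2)


async def non_blocking_work() -> None:
    # The same wait pushed to a worker thread: the event loop stays free.
    await asyncio.to_thread(time.sleep, 0.2)


async def measure(work) -> int:
    """Return how many ticks happened while `work` was running."""
    ticks: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(ticks, stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    before = len(ticks)
    await work()
    during = len(ticks) - before
    stop.set()
    await task
    return during


blocked_ticks = asyncio.run(measure(blocking_work))
free_ticks = asyncio.run(measure(non_blocking_work))
print("ticks while blocking:", blocked_ticks)
print("ticks while not blocking:", free_ticks)
```

The blocking version starves the other task completely, which is what would happen to every other request if the web app made a blocking transcribe call.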
3:11 So we're gonna go and use the transcribe_async version here, and we'll pass for the data, we'll pass the MP3 URL and for the config,
3:20 we'll just type in the config here like this. Now you might think, Michael, you forgot your await. Put your await there. No, no, I'm not doing that.
3:31 What I actually get back here is, I guess I gotta just do something with it. So the little error goes away and then I can hover over it and show you.
3:38 This is a future of T, a future of transcript, which is cool, but it's not something I can await. It's a thing that just lets me ask questions.
3:47 That thing I started, is it done? No. Is it done? No. Is it done? No. If it is done, then what is its result?
3:56 So it's kind of like an intermediate working thing. So I wrote a function down here at the bottom called run_future,
4:03 and it runs a concurrent.futures.Future, and it returns the result. In this case, it was a future of transcript.
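The lecture does not show the body of that helper, so the following is only one plausible sketch of it, built on the standard library's `asyncio.wrap_future`. The `fake_transcribe` function and the thread pool are stand-ins for the real SDK call, which is assumed to hand back a `concurrent.futures.Future`.

```python
import asyncio
import concurrent.futures
import time


async def run_future(future: concurrent.futures.Future):
    """Await a concurrent.futures.Future without blocking the event loop.

    One possible implementation; the course's own helper may differ.
    """
    return await asyncio.wrap_future(future)


def fake_transcribe(url: str) -> dict:
    """Stand-in for a slow transcription job (hypothetical)."""
    time.sleep(0.1)
    return {"audio_url": url, "status": "completed"}


async def main() -> dict:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Like the real call, this returns immediately with a future.
        transcript_future = pool.submit(
            fake_transcribe, "https://example.com/episode.mp3"
        )
        # The future can be polled ("is it done?") but not awaited directly.
        print("done yet?", transcript_future.done())
        # Wrapping it gives us something we can await.
        return await run_future(transcript_future)


result = asyncio.run(main())
print(result)
```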
4:09 So this result will return a transcript. And this thing itself is actually awaitable. So it's a little bit of a hassle,
4:15 but we just have to say transcript, which is a transcript like that from, let's say, assemblyai.Transcript to make it really clear.
4:25 We'll await run_future of transcript_future, like that. Woo, and let's just print out transcript.json_response,
4:35 and let's print it in a way you can see it really clearly. If you put an indent into json.dumps, it'll automatically format it.
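As a quick illustration of the `indent` argument, here is a sketch using a tiny made-up dictionary in place of the real response. The keys and values in `sample_response` are invented for the example and are far smaller than what the service actually returns.

```python
import json

# Made-up stand-in for transcript.json_response (hypothetical values).
sample_response = {
    "id": "hypothetical-id",
    "status": "completed",
    "words": [
        {"text": "Hello", "start": 100, "end": 350, "confidence": 0.99},
    ],
}

# indent=2 spreads the JSON over multiple lines with nested indentation.
pretty = json.dumps(sample_response, indent=2)
print(pretty)
```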
4:44 Okay, this is gonna be it. We're gonna kick it off and run our transcribe here. Go ahead and restart it so we know that it's gonna work.
4:53 Pull this back up. I'll hit it. We'll go ahead and transcribe this episode 344, and it's starting. And let's see what's going on here.
5:05 Starting new job. Would transcribe that. It shouldn't say would. It said we are transcribing, right? We are transcribing. Dot, dot, dot.
5:22 And it's running. The next thing it's gonna do when it gets done down here is it's gonna do this print statement,
5:32 or we'll see an exception, one of those two. There we go. It's done. I told you it was pretty quick, given how much data we're working with.
5:40 So here we have it. Look at how ginormously large that file is. In fact, it scrolled out past the buffer. So what are we getting back?
5:50 There's some things we'll talk about, look at the pieces, but what's important here is each word comes in. So that, it starts at this timestamp,
6:00 and it ends very, very quickly after. You get a confidence, like how sure is it that this is the word? And if we had turned on speaker labels,
6:11 it would tell us which speaker it was. All right, let's see if we find one. Here's one, it's not so sure. Assume wasn't awesome.
6:18 I don't know, just making that up. But it's 99% sure, not 100% sure. And then that people, this. So what's our end goal gonna be?
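To picture the per-word data, here is a small sketch that picks out the words the service was less sure about. The sample words and numbers are invented, and the field names `text`, `start`, `end`, and `confidence` are assumed from what the lecture describes; `uncertain_words` is a hypothetical helper.

```python
# Invented sample in the shape described: one entry per word, with
# start/end timestamps (assumed to be milliseconds) and a confidence score.
sample_words = [
    {"text": "We", "start": 1200, "end": 1310, "confidence": 1.0},
    {"text": "assume", "start": 1310, "end": 1650, "confidence": 0.62},
    {"text": "that", "start": 1650, "end": 1790, "confidence": 0.99},
    {"text": "people", "start": 1790, "end": 2100, "confidence": 1.0},
]


def uncertain_words(words, threshold=0.9):
    """Return the words whose confidence falls below the threshold."""
    return [word for word in words if word["confidence"] < threshold]


for word in uncertain_words(sample_words):
    print(word["text"], word["confidence"])
```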
6:29 Remember I showed you that cool printout, or that cool view with the green sections. You can click on the sentences and view it.
6:36 We're gonna take this stuff, and we're gonna turn those into sentences. And if you scroll through a little bit, like this one, other shows dot,
6:47 we'll just look for punctuation. We'll just say, we're gonna go through until we see a period or exclamation mark,
6:52 or question mark, or something along those lines. And we'll consider that to be a break. Or you could break it based on time. You know, if it gets too long,
6:59 we'll just do a line break there if it runs on. So we'll do a little bit of that magic to turn this string of word after word
7:07 with timestamps into transcript sentences that we can use for our display, for our search engine, all those kinds of things. But that's it.
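The word-to-sentence idea described here can be sketched as below. This is not the course's implementation: the `Sentence` class, `words_to_sentences`, the `max_ms` cutoff, and the sample words are all hypothetical, and timestamps are assumed to be in milliseconds.

```python
from dataclasses import dataclass

SENTENCE_END = (".", "!", "?")


@dataclass
class Sentence:
    text: str
    start: int  # start of the first word
    end: int    # end of the last word


def _build(words) -> Sentence:
    text = " ".join(word["text"] for word in words)
    return Sentence(text=text, start=words[0]["start"], end=words[-1]["end"])


def words_to_sentences(words, max_ms=30_000):
    """Group words into sentences, breaking on punctuation or on length."""
    sentences = []
    current = []
    for word in words:
        current.append(word)
        too_long = word["end"] - current[0]["start"] >= max_ms
        if word["text"].endswith(SENTENCE_END) or too_long:
            sentences.append(_build(current))
            current = []
    if current:  # keep any trailing words that never hit a break
        sentences.append(_build(current))
    return sentences


# Invented sample words for illustration.
sample_words = [
    {"text": "I", "start": 0, "end": 100},
    {"text": "love", "start": 100, "end": 300},
    {"text": "other", "start": 300, "end": 500},
    {"text": "shows.", "start": 500, "end": 900},
    {"text": "Do", "start": 1000, "end": 1100},
    {"text": "you?", "start": 1100, "end": 1400},
    {"text": "Sure", "start": 1500, "end": 1700},
]

for sentence in words_to_sentences(sample_words):
    print(sentence)
```

Each resulting sentence keeps the start of its first word and the end of its last word, which is what a clickable, time-linked display or a search index would need.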
7:17 And look how incredibly simple, right? Create the transcriber, specify how you want the transcription to go, tell it to go.
7:28 We want to be able to keep our code responsive and zippy. I didn't show you while it was running, but FastAPI is completely free
7:36 to still do whatever it needs. This is taking basically no effort. It just kicked it off to the internet and let that thing go.
7:44 And then eventually it's gonna come back and it'll wake up and run. So it adds almost no overhead to what's happening.
7:50 So because we're using this async version, it could just keep on cruising. When it's done, we got our results.
7:56 Of course, we're gonna want to save this to the database and not do it over and over again. But this is the basic way that we do transcriptions
8:03 with AssemblyAI from Python.

