Build An Audio AI App Transcripts
Chapter: Feature 3: Summarize
Lecture: Creating Transcript as a Single Text String

0:00 Next up, step four. And in step four, what we're gonna do is we're gonna create the transcript text. You may be thinking, Michael, are you confused?
0:11 We already have the DB transcript, right? We have this DB transcript here. And so you should be able to just use the transcript text.
0:19 Well, let's go over to our data section, to our transcript, and see what's in here. We've got our created date, updated date,
0:28 episode, podcast, summary. This is what we're trying to fill out in this section. Here's a bunch of details.
0:35 And then words, which is a list of these. This is what we're storing in the database. We have a list of nested objects,
0:43 which have text, start time, and confidence. What we need is "this is the sentence", word one, two, three, four,
0:51 just as pure text to send to the AI, right? So that's what we're gonna do right here. And it's really simple. And this is gonna be a string, right?
1:01 And a really nice way that we can do this is we can go to a string and say .join. And if you're not familiar with this, let's go look real quick here.
1:11 That's a little short, isn't it? All right, let's say I have this and I wanna turn it into a single string. What I can do is I can just go and say,
1:28 let's say if we wanna put dashes in between, I could say "-".join(words). And what comes out is a new string created with that.
1:35 So if we just put space here, it'll turn that back into a sentence, right? So that's what we're gonna do. But as part of that step,
1:41 we need to actually turn our rich word object that has confidence and start time into just plain text.
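A minimal sketch of the join idea just described. The list of words here is made up for illustration and is not the data from the video.

```python
# Hypothetical list of plain-text words, just for the demo.
words = ["this", "is", "the", "sentence"]

# The string you call .join() on is the separator placed between items.
dashed = "-".join(words)
print(dashed)

# With a single space as the separator, the words read as a sentence again.
sentence = " ".join(words)
print(sentence)
```

Note that join only accepts strings, which is why the rich word objects need to be reduced to their text first.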
1:50 So we can use a generator comprehension for that. You may have heard me riff on this before. I think generators and list comprehensions
1:58 and set comprehensions and so on are awesome, but they're written in the wrong order because the tools cannot help you with them.
2:05 So I would say something like: something for word in dbtranscript.words. What goes here? I don't know, but if we write that first
2:16 and then come back and say word dot, then we get autocomplete. As opposed to if I said it like this, w dot. It's like, well, good luck with that.
2:24 Although PyCharm, insanely, was going to help us there. So we wanna say the text, w.text for w.
2:31 So this turns this set of rich objects into a set of words. And then we join them together in the way we just discussed
2:37 to put spaces between them and off it goes. All right, so that's gonna be our transcript text. That's the final thing that we've needed
2:45 in order to send this off to LeMUR. We need the prompt, we need the transcript text, and a couple of other choices we make
2:53 as we ask it to do the summarizing.
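Putting the step together, here is a rough sketch of building the transcript text with a generator expression inside join. The Word and Transcript classes, their field names, and the db_transcript variable are stand-ins assumed for this example. They only mirror the text, start time, and confidence fields mentioned above and are not the course's actual database models.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for the nested word object stored in the database.
@dataclass
class Word:
    text: str
    start_in_sec: float  # assumed field name for the start time
    confidence: float


# Hypothetical stand-in for the transcript document; only words matters here.
@dataclass
class Transcript:
    words: list[Word] = field(default_factory=list)


db_transcript = Transcript(
    words=[
        Word("hello", 0.0, 0.99),
        Word("world", 0.4, 0.97),
        Word("from", 0.8, 0.95),
        Word("the", 1.0, 0.98),
        Word("podcast", 1.2, 0.93),
    ]
)

# The generator expression pulls just the text out of each rich word object,
# and join puts a single space between the pieces.
transcript_text = " ".join(w.text for w in db_transcript.words)
print(transcript_text)
```

Passing the generator expression straight to join avoids building a separate list by hand, and the resulting plain string is what would be handed to the summarization call along with the prompt.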

