Build An Audio AI App Transcripts
Chapter: Feature 2: Search
Lecture: The Search Engine Basics
0:00
Let's just do a quick look at this search engine that I built so you know how it works. It's gonna be using a couple of interesting pieces.
0:09
When it starts up, I'm gonna turn this number down for now.
0:12
I'm gonna put that to five seconds in this chapter, and then later on it's gonna be a little bit slower.
0:17
But the website will start, and then five seconds later the search background indexing will kick in, so we don't have to wait too long
0:24
to see what's happening. Then how often does that run? It says every five minutes, and for this chapter, I'm gonna make that every
0:32
one minute, 60 seconds. We're gonna use this thing called nlp, which is a Language. What the heck is that?
0:39
We're gonna go out to spaCy. spaCy is a natural language processing system. It's really cool. There's a bunch of
0:48
ML stuff going on here, but we're not going to do anything fancy.
0:52
What we're gonna do is we're gonna load up one of their language models, and you can tell it to do things that make search
0:59
engines a little bit nicer through what are called lemmas. So we could give it, like, this text and say,
1:04
go through every element that you found in that text and give us the lemma_ (lemma underscore). There's a regular lemma. That's a different thing.
1:13
The text you want is the lemma_. Anyway, what this does is it
1:19
normalizes words, undoing things like pluralization. So for example, if you had a podcast episode that was entitled The Geese of Canada,
1:26
but you search for goose, if you just look for the keywords,
1:30
well, you're not gonna find that episode, even though that's probably the most relevant thing in the entire library,
1:36
The Geese of Canada, when you want to know about a goose. Or
1:39
if I'm looking for friends and they only talked about a friend, right? You might want to have that come up.
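To make the geese and goose example concrete, here is a minimal sketch of matching on lemmas instead of raw keywords. The tiny `TOY_LEMMAS` table and the `lemmatize` helper are hypothetical stand-ins so the snippet runs on its own; in the course app, spaCy supplies the lemma for each token through `token.lemma_`.

```python
import re

# Hypothetical stand-in for spaCy: a tiny hand-written lemma table.
TOY_LEMMAS = {"geese": "goose", "friends": "friend"}


def lemmatize(text: str) -> set[str]:
    """Lowercase the text, split it into words, and map each word to its lemma."""
    words = re.findall(r"[a-z]+", text.lower())
    return {TOY_LEMMAS.get(word, word) for word in words}


title = "The Geese of Canada"

# Raw keyword matching misses the episode because "goose" never appears.
raw_hit = "goose" in title.lower().split()

# Matching on lemmas finds it, because "geese" is stored as "goose".
lemma_hit = bool(lemmatize("goose") & lemmatize(title))

print(raw_hit, lemma_hit)
```

The same idea covers searching for friends when the episode only mentions a friend: both sides are reduced to the same base form before they are compared.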
1:46
So it'll do things like undo the pluralization, geese to goose, and there's a whole bunch of other little
1:53
variations in there. So we're gonna use spaCy, and this nlp aspect of it, to come up with those lemmas, those
2:00
normalizations of the words that we find throughout this whole thing. We'll be able to manually trigger an indexing.
2:09
So for example, if somebody adds a new podcast, we want to instantly start indexing, not wait for every five minutes or every hour.
2:16
We can run a search. We'll search some text like "geese and birds" or whatever, and it'll break that into a bunch of independent
2:25
keywords, and then it's gonna run a search on those keywords. Okay, so we'll see down here
2:32
it's gonna go through and say, look, we're doing all this search. All right, we're gonna have a database record called a search record.
2:40
This is stored in MongoDB, the one that we start with Docker. The core essence is every search record
2:45
is gonna have an episode number, a podcast ID, and then a distinct set of words. And that's smaller than you might think, so for like an hour-long
2:56
conversation on a technical topic, you might end up with 1,000 words. And we have, as you
3:03
absolutely should have, an index on this part of our database,
3:07
so we can do an index-based search on the keywords that appear. And we're gonna use that nlp to get just the normalized
3:14
ones in there as well. That's basically how the search engine works.
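As a rough sketch of the search record just described, assuming plain Python objects instead of the MongoDB documents the app actually stores: each record holds an episode number, a podcast ID, and a distinct set of words, and a search splits the query into keywords and keeps the matching records. The names `SearchRecord`, `build_record`, `keywords`, and `search` are illustrative, not the course's code. Requiring every keyword to match is also an assumption, and the real app would normalize words with spaCy lemmas and rely on a database index rather than a Python loop.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SearchRecord:
    episode_number: int
    podcast_id: str
    words: set[str] = field(default_factory=set)  # distinct words only


def keywords(text: str) -> set[str]:
    """Split text into distinct lowercase keywords (no lemmas in this sketch)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_record(episode_number: int, podcast_id: str, transcript: str) -> SearchRecord:
    """Reduce a whole transcript to its distinct words."""
    return SearchRecord(episode_number, podcast_id, keywords(transcript))


def search(records: list[SearchRecord], query: str) -> list[SearchRecord]:
    """Return records containing every keyword in the query (an assumption)."""
    wanted = keywords(query)
    if not wanted:
        return []
    return [r for r in records if wanted <= r.words]


records = [
    build_record(1, "birds-weekly", "Geese and birds, birds and more geese."),
    build_record(2, "birds-weekly", "All about owls."),
]
hits = search(records, "geese and birds")
print([(r.podcast_id, r.episode_number) for r in hits])
```

Storing a set rather than the full transcript is what keeps the record small: repeated words collapse to a single entry.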
3:18
Finally, there's this task here that's just like the part that did the transcribing, the background work stuff.
3:26
It's just gonna go down here and say run, run, run, while True, just build the index and then wait for however long it needs to be. In this case,
3:37
I said five minutes when it opened, so this does an await sleep for five minutes. That basically takes this thing out of the asyncio
3:45
event processing for five minutes, then puts it back in to do a little bit of work. And if that was an hour, it'll sleep an hour.
3:51
It's really, really low overhead when it's not running, and it's also pretty fast. So that's basically what
3:58
is going on here, and we're going to go and turn it on in the main. Actually, not in the main.
4:05
Remember, this is all async and await, so it plugs into our app setup just like the other one. So here's our to-do. We'll do asyncio
4:14
create_task, search service dot, and there's one called a search indexing task. And again, like before,
4:23
it's not a problem to start this. In fact, that's the entire point. So we're just gonna run this while True, just like we do the other ones.
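The background task described here can be sketched roughly as follows. This is a minimal sketch, not the course's code: `build_index`, `search_indexing_task`, the `max_runs` parameter, and the very short delays are assumptions chosen so the example finishes quickly. The asyncio calls (`asyncio.create_task`, `asyncio.sleep`, `asyncio.run`) are the standard library ones.

```python
import asyncio
from typing import Optional

index_runs = 0


async def build_index() -> None:
    """Placeholder for the real indexing work."""
    global index_runs
    index_runs += 1


async def search_indexing_task(
    startup_delay: float, interval: float, max_runs: Optional[int] = None
) -> None:
    # Give the website a moment to start before the first indexing pass.
    await asyncio.sleep(startup_delay)
    runs = 0
    # With max_runs=None this behaves like a plain `while True` loop.
    while max_runs is None or runs < max_runs:
        await build_index()
        runs += 1
        # While sleeping, the task is suspended and the event loop is free for other work.
        await asyncio.sleep(interval)


async def main() -> None:
    # In app setup the task would be created and left running;
    # it is awaited here only so the demo terminates.
    task = asyncio.create_task(search_indexing_task(0.01, 0.01, max_runs=3))
    await task


asyncio.run(main())
print(index_runs)
```

In the app itself the delays would be the five seconds and 60 seconds mentioned for this chapter, and the loop would run without a limit.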