Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 8: File Searcher App
Lecture: Searching single files

0:01 Ok, let's implement the actual search. The first thing that we need to do is actually go to this folder and find all the files.
0:09 So again, our friend os will help. We can say os.listdir() and give it a folder,
0:15 and here this is going to return all of the items in here so let's say items, notice I am not calling them files because
0:22 sometimes they are folders, sometimes they are files. So, we are going to do our loop, we are going to say for item in items: and we want to check,
0:30 well if this is a folder we don't want to do this so we'll say if os.path.isdir(item):
0:37 we are going to not process this item but we want to keep going to the loop and the perfect keyword for that is continue,
0:44 so basically go back to the top of the loop, pull another item and keep going. If it's not a directory, it has to be a file,
0:51 so then let's write a method that will just search the file, so what we need to do is store the results, the matches somewhere
0:57 so we'll just say matches = search single file and we'll give it the file and the text to search.
1:04 Now, one thing that we are going to want to be careful with is when we say listdir this only gives us the file name,
1:10 not the full path name so either way whether it's a directory or a file, we need to do something like this,
1:16 we'll go full item = os.path.join() and remember, we made sure that this is an absolute path and then we want to join up the subitem, perfect,
1:27 so here we got the check full item and down here we are going to pass full item, somehow we are going to- we'll get the search method in a minute,
1:35 somehow we got to do something with these and that's going to be like a collection of some sorts, so let's say all matches,
1:42 initially it's an empty list and I'm going to put some stuff into it in the end we will return all matches,
1:47 so here we are going to say something like this, if this was one item we would say append,
1:52 but if it's a collection we want to add the items from that collection individually here which is what we want, we'll say extend matches.
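The folder loop described up to this point might look roughly like the sketch below. The function names search_folders and search_file are assumed for illustration, and search_file here is only a placeholder (the real per-file search is written later in the lecture); it returns the path it was given so the sketch runs and shows which items were visited.

```python
import os


def search_file(filename, search_text):
    # Placeholder only (assumption): the real per-file search comes later.
    # Returning the path lets the sketch run and shows which files were visited.
    return [filename]


def search_folders(folder, text):
    all_matches = []
    # listdir returns bare names (files and folders alike), not full paths
    items = os.listdir(folder)

    for item in items:
        # Build the full path before testing it; a bare name would be
        # resolved against the current working directory instead.
        full_item = os.path.join(folder, item)
        if os.path.isdir(full_item):
            continue  # skip folders for now and move on to the next item

        matches = search_file(full_item, text)
        # extend adds each match individually; append would nest the whole list
        all_matches.extend(matches)

    return all_matches
```

Using extend also means an empty list of matches simply adds nothing to all_matches.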
1:59 Cool, so last bit to make this work for round one is to implement this file, so now we know this is a full path to a file name
2:09 so we'll just go with that and here let's be a little more clear we'll call this search text.
2:14 The first thing to do is open the file so we'll just use our little context manager to make sure we close it under all circumstances
2:20 we'll say file name read only and let's treat this as text, you'll see there are potential problems when we get into things like binary files,
2:27 so if I give it a folder and it just starts go through all of the subfolders, what if there is like images in there or something,
2:32 then we want to just go through each line in the file and check it to see if the search text appears in it,
2:39 so it turns out that these file streams are iterable so I could say for line in fin: and in a really nice way
2:46 just sort of smoothly stream over them without loading the whole file into memory as like an array of strings or something like this,
2:53 this is also going to be really key way to use these generator methods later on but we are not there yet.
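As a small illustration of streaming over a file, the sketch below writes a throwaway file and then iterates over it one line at a time. The file name, its contents, and the explicit utf-8 encoding are assumptions made so the example is runnable; they are not from the lecture.

```python
import os
import tempfile

# Throwaway sample file (hypothetical content) so the sketch can run anywhere
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as fout:
    fout.write("first line\nsecond line\nthird line\n")

line_count = 0
# The with block closes the file even if an error occurs inside it
with open(path, "r", encoding="utf-8") as fin:
    # The file object is iterable: each pass through the loop reads one line,
    # so the whole file is never held in memory as a list of strings.
    for line in fin:
        line_count += 1

print(line_count)
```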
2:59 So the first question is we want to search to see if this substring is in the line and we probably don't care about case sensitivity,
3:06 or things like that, so let's say if line.find- there are two ways to look for a substring, I could say index or I could say find,
3:15 if I ask for the index and it doesn't exist it will actually throw an exception, but find will just return negative one if it's not found,
3:23 so I'll search for search text and before we do we want to create a lower case version of that string,
3:28 we'll say if the return value is greater than or equal to zero, so find actually found the substring.
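The difference between find and index can be seen in a few lines. The sample line and search terms below are made up for illustration.

```python
line = "Your Friends are inside"   # illustrative text, not from a real file
search_text = "friends"            # assumed to be lower case already

# Lower-casing the line makes the comparison case-insensitive
found_at = line.lower().find(search_text)   # position of the match
missing = line.lower().find("dracula")      # find returns -1 when absent

# index raises ValueError instead of returning -1
try:
    line.lower().index("dracula")
    index_raised = False
except ValueError:
    index_raised = True

print(found_at, missing, index_raised)
```

Because find reports a miss as -1, the check for a match is that the return value is greater than or equal to zero.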
3:36 The other thing we got to do is make sure that this is lower case, we could do that once above here instead of every time we had a file,
3:45 so let's do something like return text.lower(). Cool, so if this is the case let's for now just somehow collect up this line of text,
3:54 we are going to see that this is not the ideal way to do this and we'll fix it just in a moment,
3:58 but let's come over here and create a little submatches, matches just for this file, and if that's the case we'll say append and we'll just,
4:06 for now this is not the final answer, we'll just append the line of text that we are going to match. And in the end we'll return matches.
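Putting the per-file steps together, a first-pass version of the file search might look like the sketch below. The name search_file and the explicit utf-8 encoding are assumptions. The sketch also lower-cases the search text inside the function so it stands on its own, whereas the lecture does that once, earlier, before any file is searched.

```python
def search_file(filename, search_text):
    # Matches for this one file only
    matches = []
    # Assumption: lower-case here so the sketch is self-contained
    search_text = search_text.lower()

    # Open read-only as text; the with block closes the file in all cases
    with open(filename, "r", encoding="utf-8") as fin:
        for line in fin:
            # find returns -1 when the substring is absent
            if line.lower().find(search_text) >= 0:
                matches.append(line)  # interim result: just the matching line

    return matches
```

This returns only the matching lines, without the file name or line number.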
4:15 So a very limited version of our search is I think working, let's give it a test so we want to come to create a list to store all of our matches,
4:23 we are going to get all of the directories and files in the current folder,
4:29 we'll go to them, we'll build up the full path which is to join the full path with just the names of the subitems,
4:36 then we'll see if it's a directory for now we are going to skip it, if it's a file we'll search it and if there is a match,
4:41 we'll add those in there if it's empty then it will just have no effect. Awesome, let's go and try to run this.
4:46 So on my desktop I actually have some files here that we are going to go search,
4:50 and let's see, if we go to some classics like you can see we have Dracula, The Adventures of Sherlock Holmes, things like that,
4:57 now this is the full text of the Sherlock Holmes book and I got this from Project Gutenberg and there is a bunch of other ones,
5:04 we'll go through in a little bit, but these are actually the full files so you'll see this is 160Kb that's 777 Kb of text that is a beast of a book,
5:15 Ulysses is 1.5 Mb, so we are going to go through all of these files here, notice we are not going to get into the classics yet,
5:24 because we are just looking at the files. We are skipping a case where it's a directory, so let's go and search here, right,
5:32 so we are going to search that books folder which is full of those text books and the phrase we want to search for is let's say "friends",
5:43 well, that's pretty fantastic, except we forgot to print out the results, let's do that really quick.
5:54 Ok, so we'll capture the matches and for each one in there we'll just print this out. Try again.
5:58 Search for friends and you can see we found some results, look at that. So we went through all the various text files and we have printed them out,
6:06 now this is not a very helpful answer, but at least you can see that it's working. Let's just do a little bit of a sanity check,
6:17 "And Harry of the six wives' daughters and the lady friends from-" "Your friends are inside," remember, we are just doing substrings, so friends,
6:22 if we had searched for friend it would also find friends, we are not doing anything fancy it's just substring, but it does look like it's working,
6:29 now what line in what book did that appear in? I have no idea, so our next job is to fix that so we actually return more information about our search.

