Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 8: File Searcher App
Lecture: The performance problem

0:01 So let's jump over here to Windows 10 for a minute and continue working on our app. The reason I want to come over to Windows is that

0:09 I want to explore this performance problem, and the tooling on Windows is super good

0:14 for understanding the performance characteristics of individual applications rather than the whole system.

0:21 So we are going to change the problem space a little bit. We have previously been searching this books folder,

0:26 and if you look at its properties you will see that it's about 5 MB of text. That's a serious quantity of text,

0:33 and it's blasting through it, so that's already impressive. But let's look at a different folder.

0:37 Here we have some more books, but now we have 2.27 GB of .txt files. That is a ridiculous amount of text.

0:47 You'll see that if we try to search that content, the app actually does surprisingly well:

0:53 it really does go through, and it finally has results and so on. But if we leverage the concept of generator methods and related ideas

0:59 that we'll build on in other applications further down the line, we can do much better. OK,

1:05 so just to make sure everything is working on Windows, let me search the same content here. We want to search C:\Users\mkennedy\Desktop\books

1:17 and let's search for "incredible". Excellent, it looks like we've found some matches ("...incredible age..."),

1:26 in Ulysses, A Doll's House, and so on. Not too many results there, but you can see it's working. Fantastic.

1:33 Now, I happen to know from trying this earlier that we need to change this output here.

1:39 If we print out every match, there is going to be so much output when we go through the 2 GB of files that the printing itself causes problems:

1:49 a significant part of the run time is literally that print statement right there. So instead of printing, we are just going to count the matches,

1:58 so we'll add a match count. Right now the results are a list, and I could just call len() on that list and print it out,

2:08 but once this becomes a generator, that won't work: a generator doesn't know its length, so calling len() on it raises a TypeError.
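The difference between a list and a generator here can be seen in a few lines of plain Python. This is a minimal illustration with made-up sample lines, not code from the course app: a list supports `len()`, a generator raises `TypeError`, and counting a generator means iterating over it, which also exhausts it.

```python
lines = ["What?", "Nothing here.", "Really?"]

# A list stores every match, so len() can report how many there are.
matches_list = [line for line in lines if "?" in line]
print(len(matches_list))  # 2

# A generator produces matches lazily and has no length to report.
matches_gen = (line for line in lines if "?" in line)
try:
    len(matches_gen)
except TypeError as err:
    print(err)  # object of type 'generator' has no len()

# Counting a generator means iterating it, which also exhausts it.
print(sum(1 for _ in matches_gen))  # 2
print(sum(1 for _ in matches_gen))  # 0, nothing left to produce
```

That one-shot behavior is why a separate counter, updated as each match goes by, is the safer choice once the search returns a generator.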

2:16 so let me independently keep track of the count. We'll write something like this, and let's add a comma as the thousands separator

2:23 using .format with match count. Let's run this one more time. Same place, let's search for "funny", and apparently we've found

2:34 33 matches of the word "funny" in the 5 MB of text. That was really quick, that's awesome.
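Keeping a separate counter works the same whether the results arrive as a list or a generator. A minimal sketch, assuming a hypothetical `matches` iterable standing in for the app's real search results; the `{:,}` format spec is what inserts the comma thousands separator.

```python
# Hypothetical stand-in for the app's search results. Here it is a generator,
# but the same loop works unchanged if the results are a list.
matches = ("match {}".format(n) for n in range(2_700_000))

match_count = 0
for _ in matches:
    # Instead of printing every match (slow with millions of them),
    # just count it.
    match_count += 1

# The "," in the format spec adds a comma as the thousands separator.
print("Found {:,} matches.".format(match_count))  # Found 2,700,000 matches.
```

Printing now happens once at the end instead of once per match, which removes the output cost described above.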

2:41 But let's push a little harder from a performance perspective:

2:44 let's search the 2.27 GB of text, and we are going to search for something, maybe a question mark. So let me introduce you to Process Explorer.

2:55 Process Explorer is kind of like Task Manager, or Activity Monitor on OS X, but it gives you a ton of information, both visually

3:04 and through things like performance counters on Windows, about what is going on with your apps. So

3:11 here is our Python app, waiting for us to hit go. All right, here we go, our app is running, and here is its memory usage down here,

3:18 and it's climbing off into the distance. It turns out that when searching for question marks in these files

3:25 there are tons of them, so we are accumulating them all in our list as we recursively go through these files on disk.
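The memory growth comes from collecting every match before returning anything. The following is a sketch of that list-based pattern, assuming a recursive walk with `os.listdir`; the names `search_folders`, `search_file`, and `SearchResult` are hypothetical and may not match the course app's code.

```python
import collections
import os

# Hypothetical result record: which file, which line number, and the line's text.
SearchResult = collections.namedtuple("SearchResult", "file line text")


def search_file(filename, search_text):
    """Return a list of every matching line in one file."""
    matches = []
    with open(filename, "r", encoding="utf-8", errors="ignore") as fin:
        for line_num, line in enumerate(fin, start=1):
            if search_text.lower() in line.lower():
                matches.append(SearchResult(filename, line_num, line.rstrip("\n")))
    return matches


def search_folders(folder, search_text):
    """Recursively search a folder, collecting every match into one list."""
    all_matches = []
    for item in os.listdir(folder):
        full_item = os.path.join(folder, item)
        if os.path.isdir(full_item):
            # Recurse into subfolders; their matches join the same list.
            all_matches.extend(search_folders(full_item, search_text))
        else:
            all_matches.extend(search_file(full_item, search_text))
    # Nothing is returned until every file has been read, so memory grows
    # with the total number of matches.
    return all_matches
```

With millions of matching lines, `all_matches` holds millions of `SearchResult` tuples at once, which is consistent with the steadily growing memory seen in Process Explorer.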

3:32 You can see it's pretty computationally heavy and pretty I/O intensive, but really, the memory is just growing and growing.

3:40 You'll see when we get to talking about generators that this is not necessarily the way this app has to behave:

3:46 we can make very minor changes to our app and get dramatically better performance, at least from a memory perspective.
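One way to make that kind of minor change is to turn the search functions into generators. This is a sketch under assumptions: the names `search_folders`, `search_file`, `SearchResult`, and `count_matches` are hypothetical, and the walk uses `os.listdir` recursion. Instead of appending matches to a list, each function uses `yield` (and `yield from` for the recursive call), so the caller holds only the current match rather than all of them.

```python
import collections
import os

# Hypothetical result record: which file, which line number, and the line's text.
SearchResult = collections.namedtuple("SearchResult", "file line text")


def search_file(filename, search_text):
    """Yield matching lines from one file as they are found."""
    with open(filename, "r", encoding="utf-8", errors="ignore") as fin:
        for line_num, line in enumerate(fin, start=1):
            if search_text.lower() in line.lower():
                yield SearchResult(filename, line_num, line.rstrip("\n"))


def search_folders(folder, search_text):
    """Recursively search a folder, yielding matches one at a time."""
    for item in os.listdir(folder):
        full_item = os.path.join(folder, item)
        if os.path.isdir(full_item):
            # Pass along matches from subfolders without collecting them.
            yield from search_folders(full_item, search_text)
        else:
            yield from search_file(full_item, search_text)


def count_matches(folder, search_text):
    """Consume the generator, keeping only a running count."""
    match_count = 0
    for _ in search_folders(folder, search_text):
        match_count += 1
    return match_count
```

Each match becomes garbage as soon as the loop moves past it, so memory use stays roughly flat no matter how many matches there are.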

4:00 All right, our process has finished and we found 2.7 million question marks in those files, and look at the memory,

4:09 this is not the most amazing outcome that we could have had. It turns out it took almost 400 MB the way we implemented our algorithm,

4:18 and depending on how we hold the data or the size of the data, it could be even worse.
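Process Explorer shows the memory of the whole process. For a rough check from inside Python, the standard library's `tracemalloc` module reports the peak memory allocated by Python code while tracing is on. This is a swapped-in technique, not what the lecture uses, and `matches_as_list`, `matches_as_generator`, and `peak_bytes_while_counting` are hypothetical stand-ins that generate fake matches rather than reading files.

```python
import tracemalloc


def matches_as_list(n):
    # Hypothetical stand-in for a search that stores every match before returning.
    return ["match {}".format(i) for i in range(n)]


def matches_as_generator(n):
    # The same matches, produced one at a time on demand.
    for i in range(n):
        yield "match {}".format(i)


def peak_bytes_while_counting(make_matches, n):
    """Count the matches and return (count, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    count = 0
    for _ in make_matches(n):
        count += 1
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # also clears the traces so the next run starts fresh
    return count, peak


for make_matches in (matches_as_list, matches_as_generator):
    count, peak = peak_bytes_while_counting(make_matches, 200_000)
    print("{}: {:,} matches, peak {:,} bytes".format(make_matches.__name__, count, peak))
```

The list version's peak should be far larger because it holds all 200,000 strings at once; exact numbers vary by Python version and platform.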
