Managing Python Dependencies Transcripts
Chapter: Finding Quality Python Packages
Lecture: "Rules of Thumb" for Selecting a Great Package - Part 1
0:01 When you're looking to find a quality Python package
0:03 to help you out with a problem at hand, it can be a little bit overwhelming,
0:06 having to select between all of these different options.
0:09 In my time as a Python developer, I've come up with
0:12 a series of rules of thumb for selecting a great package.
0:15 I've turned this into a seven-step workflow
0:18 that you can use to find and select quality Python packages.
0:21 Let me walk you through it now, step by step.
0:24 Think of this workflow as a funnel.
0:28 First, you're going to find a pool of candidate packages,
0:32 and just do a bunch of research and basically
0:35 collect as many packages as possible that could help you
0:38 with the problem at hand, and then, with each of the steps in this workflow,
0:41 we're going to successively refine this list by excluding packages.
0:46 Each step will help you gather more information and give you
0:50 a better understanding of the quality of each package.
0:54 The goal of this process is to make the decision about
0:56 which package to use really, really simple.
0:59 You'll be starting with this long list of candidate packages and in the beginning,
1:03 it will be almost impossible to tell
1:06 which one is the perfect package for your use case,
1:09 but as you keep narrowing down that list, by the end of this workflow,
1:12 you will have narrowed down that candidate list so much and you will have built
1:16 a great understanding of the strengths and weaknesses of each library,
1:20 that making that decision is going to be very easy for you.
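
As a rough illustration of the funnel idea, here is a minimal Python sketch. The `Candidate` class, the `run_step` helper, and the package name `hypothetical-s3-helper` are invented for this example and are not part of the lecture; the check function stands in for your own research. Keeping notes on every candidate, including the excluded ones, also gives you the raw material for the kind of report mentioned later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """One package under consideration, plus notes gathered along the way."""
    name: str
    notes: List[str] = field(default_factory=list)
    excluded_at: Optional[str] = None  # name of the step that ruled it out


def run_step(candidates, step_name, keep):
    """Apply one funnel step: annotate each remaining candidate, exclude failures."""
    for candidate in candidates:
        if candidate.excluded_at is not None:
            continue  # already ruled out by an earlier step
        ok, note = keep(candidate)
        candidate.notes.append(f"{step_name}: {note}")
        if not ok:
            candidate.excluded_at = step_name
    return [c for c in candidates if c.excluded_at is None]


def popularity_check(candidate):
    # Placeholder judgement; in practice this comes from your own research.
    seen_often = candidate.name == "boto"
    note = "recommended in several places" if seen_often else "rarely mentioned"
    return seen_often, note


candidates = [Candidate("boto"), Candidate("hypothetical-s3-helper")]
remaining = run_step(candidates, "step 2 (popularity)", popularity_check)
```

Each later step would call `run_step` again with its own check, so the list of remaining candidates keeps shrinking while the notes keep growing.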
1:24 The ability to find and identify great Python packages
1:27 is very helpful even if you're working on your own,
1:30 but it gets so much more powerful if you have to justify your dependency decisions
1:35 to a team of other developers or to your manager.
1:39 You can apply the same workflow and the same criteria
1:42 and use them to explain your decisions.
1:45 To give you a concrete example, you could just take this process
1:48 and as you go through it, take extensive notes
1:51 and basically compile a report about your decision
1:54 and after you've gone through those seven steps in the workflow,
1:57 this is going to be a pretty bulletproof report
2:00 that you can then share with your team, or your manager.
2:03 Alright, let's jump right in and then you'll learn this important skill in no time.
2:07 Let's start with step 1: finding candidate packages.
2:12 The first thing that I usually do is that I come up with a list of candidate packages
2:16 that will help me solve the problem at hand.
2:19 And there are a number of ways you can fill up that list.
2:22 In my mind, it really helps to come up with a series of options,
2:25 so that you have a base for comparison.
2:27 Now, let's talk about how you would fill up that candidate package list.
2:31 I often start out by browsing through the curated lists I told you about earlier,
2:36 so I would just open up those websites like Awesome Python,
2:39 I will try and find the matching category that is relevant to the problem
2:44 that I am trying to solve and then I'll just click through that category
2:48 checking out all the packages that are listed there.
2:51 Another option would be just to run a quick Google search
2:55 for two to five relevant keywords. Imagine you are looking for a way
2:59 to upload files to Amazon's S3 service using Python.
3:03 Here is what I would do, so for that, I just open up Google
3:06 and then I would probably search for something like S3 upload Python,
3:10 you know, very focused keywords and just kind of sprinkle
3:14 the minimal set of keywords that I could think about, and I just search for that.
3:19 And then the results here are going to give me a pretty good overview,
3:23 so I probably just click through the first three results or so,
3:27 and just check out what they have to say.
3:30 Now, this question here, pretty much is what I had in mind
3:33 and looking at the first answer points me to the boto library,
3:38 so I'll probably check that out and add it to the list,
3:41 and then I do the same thing for the other top search results,
3:44 now in this case, I know from personal experience that boto is a great choice,
3:48 so the fact that we're already seeing this result is a pretty good sign.
3:53 Honestly, I found that a quick Google search can really help you out here,
3:56 it often digs up the right content immediately, pointing you
4:00 to results on Stack Overflow or on forums like Reddit or Hacker News.
4:04 So I usually do that really early on in my research process,
4:07 when I am looking for a new Python package.
4:10 You've already seen that I looked at a Stack Overflow result here,
4:14 so Stack Overflow is another great site you can use
4:18 to find recommendations for Python packages,
4:21 if you haven't used Stack Overflow before,
4:24 it's basically a question-and-answer site for developers.
4:27 And you can search it as well, so I am just going to punch in the same keywords
4:31 that I previously searched on Google, just to see what comes up,
4:34 so by default, this will be sorted by relevance, which is kind of an opaque measure,
4:38 so often, I'll just immediately switch over to the votes tab
4:41 which will give me the most upvoted answers.
4:44 Alright, so let's check out the first answer here.
4:47 So, this is the number one upvoted answer for this question,
4:50 I am not going to read the full question,
4:53 I just want to see what kinds of libraries and tools people recommend here.
4:56 And as I scroll down, I can immediately see that okay,
4:59 boto is another library that people recommend,
5:02 so again, this will be a pretty good indicator that I should really check out
5:06 this boto library because it just keeps popping up again and again.
5:09 Other great resources for finding quality Python packages
5:15 are community forums like Reddit or Hacker News,
5:18 and sometimes you can also use Twitter like that,
5:21 let's take a look at those now.
5:24 Reddit is a community forum website that has a pretty large Python community,
5:27 you can find it at reddit.com/r/python
5:32 And Reddit has a search feature as well,
5:34 again, what I would do here is I would punch in the same keywords
5:37 and then I would limit my search.
5:41 In this case, we could probably drop the Python
5:43 because we're just searching the Python forum.
5:46 So, anything S3 related will pretty much be about Python.
5:50 Alright, let's see what we got here, so this looks pretty helpful already,
5:53 one interesting bit here is that you can see when the question was submitted,
5:58 or when the forum thread was created.
6:01 So you want to make sure you are not looking at super old content
6:04 for things that could change frequently.
6:06 But let's just check out this discussion here.
6:08 So this looks like this is not going to give me the answer immediately,
6:11 but I can still learn a lot about how people talk about the problem here,
6:15 what keywords they use and that could point me in the right direction
6:20 to actually find the library that does what I want or I actually find
6:23 a discussion where someone recommends a specific library
6:26 and then other people can respond to that discussion
6:29 and I can read what they have to say and that is going to give me
6:32 a pretty good idea of whether or not that library might be the right choice for me.
6:36 Another helpful community forum is Hacker News.
6:40 Now, by default Hacker News doesn't have a search function built in,
6:44 but you can use a third party search at hn.algolia.com
6:49 that can do a full text search on comments and stories inside Hacker News.
6:53 Again, let's punch in S3 upload Python and see what happens.
6:58 Alright, so looking at these results again I see boto popping up here
7:01 so this could be interesting, maybe this result is a little bit old,
7:05 but again, this could be a good way to fill up that candidate list
7:09 and identify libraries that other people recommend and use.
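
If you would rather script this search than type it into the web page, the same Algolia service also exposes an HTTP search API. The sketch below only builds the request URL and leaves the actual fetch in a separate function. The endpoint path and the `query` and `tags` parameters reflect my understanding of that API and should be checked against its documentation; the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Algolia-hosted Hacker News search API.
HN_SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def build_hn_search_url(keywords, tags="story"):
    """Build a search URL for a handful of focused keywords."""
    query = urlencode({"query": " ".join(keywords), "tags": tags})
    return f"{HN_SEARCH_ENDPOINT}?{query}"


def fetch_hn_results(keywords):
    """Run the search over the network and return the decoded JSON response."""
    with urlopen(build_hn_search_url(keywords), timeout=10) as response:
        return json.load(response)


url = build_hn_search_url(["S3", "upload", "Python"])
```

Calling `fetch_hn_results(["S3", "upload", "Python"])` needs network access, so it is left out of the example run above.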
7:12 Even if you're not using Twitter,
7:15 just the fact that so many people share their thoughts
7:17 on Twitter all the time, can be pretty powerful
7:20 if you're looking for an answer to your programming question,
7:23 I know it sounds a little bit crazy but this works more often than you'd think,
7:27 so let's try it out, I don't know what is going to happen.
7:30 Again, I am searching for the same set of keywords,
7:32 and then I am just going to check out some of the responses here,
7:36 alright, so sometimes it's going to reference other source material like Stack Overflow,
7:40 or blog posts, okay, so this looks pretty interesting here,
7:44 this guy is talking about a script that uploads stuff to S3, so why don't we check it out.
7:49 So just looking at the code here, it looks like this guy is not using a specific library
7:53 to talk to S3, but he is using the command line tool, this aws s3 command,
7:58 so this could be another option for us to research now, maybe it's a good choice,
8:01 I don't know, I know this process is a little bit time consuming
8:04 but it's really impressive what this process can dig up.
8:08 If you do this for an hour or two, you're going to be pretty much an expert
8:11 on what's out there in terms of libraries that could help you with this job.
8:18 If you've searched all these sites and you're still not happy
8:22 with this candidate package list that you've built up,
8:25 then it might make sense to search PyPI directly.
8:28 Personally, I find it a little bit hard to find stuff on PyPI
8:32 because the interface is pretty clunky, and there is very little curation.
8:37 But it might still make sense to spend a few minutes on that
8:40 and see if you can dig up something useful.
8:44 Now, another option to get those candidate packages would be
8:47 to actually ask a question on Stack Overflow or Reddit,
8:51 so on all of these sites you can create a free account,
8:55 and just start asking questions, of course, you want to be mindful
8:58 of questions that people have asked in the past,
9:01 so I recommend that you do some research first
9:03 to avoid running the danger of posting duplicate questions.
9:06 But usually, people are pretty receptive and helpful on these forums,
9:09 so it might make sense to give it a shot.
9:12 However, it's rather time intensive to write and post the question
9:15 and then wait a couple of hours or even days to get a response.
9:19 Now at the end of step one, you should have a list of candidate packages
9:24 that you want to do some further research on.
9:27 After you've generated a list of candidate packages,
9:31 the next step is to check out how popular these packages are.
9:36 Usually popularity is a good sign if you're looking for a Python package
9:40 because that often means that the package is well maintained, it's high quality,
9:45 and you can't really go wrong with installing it and using it for your own purposes.
9:50 Now, how can you find out if a package is popular?
9:53 One way to do it would be to check out the download stats,
9:56 now you used to be able just to go to PyPI
9:59 and check out the download stats for a package,
10:02 but this feature was removed when the PyPI architecture changed.
10:05 So right now, you can't really get those stats, they might come back in the future,
10:09 and then I think they are a really good indicator,
10:11 but right now, we'll have to go with something else.
10:14 Another good popularity indicator would be
10:17 just the number of Google results and Reddit results and Stack Overflow results
10:20 or recommendations you find for a given package.
10:24 And often, this step of the research process
10:26 happens in combination with the first one,
10:28 so as you go along and search these sites,
10:30 you can take mental notes of which packages show up frequently.
10:34 And this could be really valuable information,
10:36 when you have to make a decision which one you are going to use.
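
If mental notes are not enough, a few lines of Python can keep the tally for you. This is only a sketch: the `tally_mentions` helper is a name invented here, and the sample data simply mirrors what came up in the searches earlier in this lecture.

```python
from collections import Counter


def tally_mentions(mentions_by_source):
    """Count in how many sources each package was recommended."""
    counts = Counter()
    for packages in mentions_by_source.values():
        # Count a package once per source so one long thread cannot dominate.
        counts.update(set(packages))
    return counts


# What the searches in this lecture turned up, written down as plain data.
mentions = {
    "Google": ["boto"],
    "Stack Overflow": ["boto"],
    "Hacker News": ["boto"],
    "Twitter": ["aws s3 (command line tool)"],
}

counts = tally_mentions(mentions)
most_mentioned = counts.most_common(1)
```

Here `most_mentioned` holds the package that showed up in the most sources, together with its count.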
10:39 If a package is hosted on GitHub, you could also check out their GitHub page
10:43 and see how many stars they have on GitHub,
10:46 so the star system on GitHub is a pretty simple voting system
10:49 where people can favorite or star repositories.
10:52 Now, if you are thinking about installing a library
10:55 that has let's say 5 thousand or 10 thousand stars, it's pretty much a no-brainer.
10:59 If it only has 10 or 20 then maybe that is not a bad sign,
11:03 but it's also probably not a super popular library.
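
You can also read the star count programmatically instead of opening each GitHub page. The sketch below assumes the public GitHub REST API, where a repository's JSON description includes a `stargazers_count` field; the helper names and the `example-owner` and `example-repo` placeholders are invented for illustration. Unauthenticated requests are rate limited, so treat this as a convenience for a short candidate list.

```python
import json
from urllib.request import Request, urlopen


def repo_api_url(owner, repo):
    """URL of the GitHub REST API resource describing one repository."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def star_count(repo_payload):
    """Pull the star count out of an already decoded repository payload."""
    return int(repo_payload["stargazers_count"])


def fetch_star_count(owner, repo):
    """Fetch the repository description over the network and return its stars."""
    request = Request(
        repo_api_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=10) as response:
        return star_count(json.load(response))


# Offline example with a hand-written payload instead of a live request.
example_url = repo_api_url("example-owner", "example-repo")
example_stars = star_count({"stargazers_count": 5000})
```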
11:08 Another way to get at that information is using the python.libhunt website
11:12 and it includes a popularity indicator that is based on some other opaque values.
11:17 Sometimes it can be helpful to compare two packages
11:20 and just kind of see which one has more traction.
11:24 Now at the end of step two, you should have a pretty good understanding
11:27 of the relative popularity of your candidate packages.
11:30 Once you have narrowed down the list of candidate packages
11:34 I would start checking out the actual project homepages.
11:38 You could learn a lot from a project website,
11:41 things like does this website actually feel helpful,
11:44 is it answering my questions that I have as a new user,
11:48 does the website look actively maintained,
11:51 and how successful does this project look,
11:55 did someone actually spend the time to make the website helpful and nice?
11:58 Let's play through this with an actual Python project website.
12:03 A great example here is the Requests library,
12:07 and right away when this site loads up,
12:10 this looks like a really high quality library,
12:13 it has its own logo here, it looks like it's supporting a bunch of Python versions,
12:17 it looks like it has automated tests which is always a great thing to see,
12:22 and the project maintainer is also tracking test code coverage.
12:25 Here on the left you can see that the page has
12:30 this embedded GitHub stars indicator, and as you can tell,
12:33 the library has a high number of stars here which is usually a good sign.
12:36 What I like here as well is that the page starts with a concrete example
12:40 of what you can do with the library and what it looks like to use it.
12:44 This is great, so they even have a bunch of user testimonials
12:47 from really well known people in the Python community,
12:51 and when I scroll down further, I can see here that it has a pretty extensive user guide
12:55 that covers a number of interesting things and seems really well structured,
12:59 there is also in depth API documentation which is always a good sign.
13:04 Another sign that this is a really popular and strong library is
13:07 that it has a contributor guide with all kinds of information
13:11 about how to contribute to the project, the code style they use,
13:15 how people should report bugs, and a really small and unpopular library
13:20 is usually not going to have a need for that.
13:23 So when you see something like that, that is usually a strong sign
13:26 that the library is really popular and very successful.
13:30 And by extension that means it's usually a safe choice
13:33 for you to use that library in your own programs.
13:36 By the way, if you're wondering where to find a project's homepage,
13:41 if it has one, you can usually find the link on PyPI,
13:44 so it will be right here on the left, and for older versions of PyPI
13:48 you will typically have to scroll all the way down
13:52 and then you can find the link to the project homepage there.
13:55 There we go, this is the homepage link for the Requests library.
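
The homepage link is also available in machine-readable form. As a sketch, assuming PyPI's JSON API at `https://pypi.org/pypi/<package>/json` and its `info.home_page` and `info.project_urls` fields, the helper below picks the homepage out of an already downloaded metadata document. The function names and the sample metadata are made up for this example.

```python
def pypi_json_url(package):
    """URL of the PyPI JSON metadata document for a package."""
    return f"https://pypi.org/pypi/{package}/json"


def homepage_from_metadata(metadata):
    """Return the project homepage from decoded PyPI metadata, or None."""
    info = metadata.get("info", {})
    if info.get("home_page"):
        return info["home_page"]
    # Some projects only list their links under project_urls.
    project_urls = info.get("project_urls") or {}
    for label, url in project_urls.items():
        if label.lower() in ("homepage", "home"):
            return url
    return None


# Hand-written sample metadata; a real document has many more fields.
sample = {"info": {"home_page": "", "project_urls": {"Homepage": "https://example.com/docs"}}}
homepage = homepage_from_metadata(sample)
```

A result of `None` matches the situation described in this lecture where a project has no dedicated homepage at all.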
13:59 At the end of step 3, after you've checked a couple of project homepages,
14:03 your list should have narrowed down a little bit further,
14:06 at this point you are starting to get to know these projects a lot better
14:09 and you have a good idea of how popular they are,
14:12 how well maintained they are, and whether or not you like them.
14:15 So maybe you can already start excluding some libraries
14:18 that you are not really enjoying as much.
14:20 Of course, not all libraries are going to have a dedicated website or homepage,
14:24 that doesn't automatically mean that the library is not great quality,
14:27 many Python projects don't actually have dedicated homepages,
14:31 but if there is one, it absolutely makes sense to check it out.