Managing Python Dependencies Transcripts
Chapter: Finding Quality Python Packages
Lecture: "Rules of Thumb" for Selecting a Great Package - Part 1
0:01
When you're looking to find a quality Python package to help you out with a problem at hand, it can be a little bit overwhelming,
0:07
having to select between all of these different options. In my time as a Python developer, I've come up with
0:13
a series of rules of thumb for selecting a great package. I've turned this into a seven step workflow
0:19
that you can use to find and select quality Python packages. Let me walk you through it now, step by step. Think of this workflow as a funnel.
0:29
First, you're going to find a pool of candidate packages, and just do a bunch of research and basically
0:36
collect as many packages as possible that could help you with the problem at hand, and then, with each of the steps in this workflow,
0:42
we're going to successively refine this list by excluding packages. Each step will help you gather more information and give you
0:51
a better understanding of the quality of each package. The goal of this process is to make the decision of which package to use really, really simple.
1:00
You'll be starting with this long list of candidate packages and in the beginning, it will be almost impossible to tell
1:07
which one is the perfect package for your use case, but as you keep narrowing down that list, by the end of this workflow,
1:13
you will have narrowed down that candidate list so much and you will have built such a great understanding of the strengths and weaknesses of each library,
1:21
that making that decision is going to be very easy for you. The ability to find and identify great Python packages
1:28
is very helpful even if you're working on your own, but it gets so much more powerful if you have to justify your dependency decisions
1:36
to a team of other developers or to your manager. You can apply the same workflow and the same criteria and use them to explain your decisions;
1:46
to give you a concrete example, you could just take this process and as you go through it, take extensive notes
1:52
and basically compile a report about your decision, and after you've gone through those seven steps in the workflow,
1:58
this is going to be a pretty bulletproof report that you can then share with your team, or your manager.
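As a rough sketch of that note-taking idea, the funnel can be modeled as a list of candidates that keeps shrinking while notes pile up. The `Candidate` class, the `exclude` helper, and the report layout are hypothetical names made up for illustration; they are not part of the course material.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """One package under consideration (hypothetical structure)."""
    name: str
    notes: List[str] = field(default_factory=list)
    excluded_at: Optional[str] = None  # workflow step where it was dropped


def exclude(candidate: Candidate, step: str, reason: str) -> None:
    """Mark a candidate as dropped and record why."""
    candidate.excluded_at = step
    candidate.notes.append(f"{step}: excluded, {reason}")


def compile_report(candidates: List[Candidate]) -> str:
    """Render all notes as a plain-text report to share with a team."""
    lines = []
    for c in candidates:
        if c.excluded_at is None:
            status = "still in the running"
        else:
            status = f"dropped at {c.excluded_at}"
        lines.append(f"{c.name} ({status})")
        lines.extend(f"  - {note}" for note in c.notes)
    return "\n".join(lines)
```

Keeping the excluded packages in the report, together with the reason they were dropped, is what makes the final decision easy to justify later.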
2:04
Alright, let's jump right in and then you'll learn this important skill in no time. Let's start with step 1: finding candidate packages.
2:13
The first thing that I usually do is that I come up with a list of candidate packages that will help me solve the problem at hand.
2:20
And there are a number of ways you can fill up that list. In my mind, it really helps to come up with a series of options,
2:26
so that you have a base for comparison. Now, let's talk about how you would fill up that candidate package list.
2:32
I often start out by browsing through the curated lists I told you about earlier, so I would just open up those websites like Awesome Python,
2:40
I will try and find the matching category that is relevant to the problem that I am trying to solve and then I'll just click through that category
2:49
checking out all the packages that are listed there. Another option would be just to run a quick Google search
2:56
for two to five relevant keywords. Imagine you are looking for a way to upload files to Amazon's S3 service using Python.
3:04
Here is what I would do, so for that, I just open up Google and then I would probably search for something like S3 upload Python,
3:11
you know, very focused keywords and just kind of sprinkle the minimal set of keywords that I could think about, and I just search for that.
3:20
And then the results here are going to give me a pretty good overview, so I probably just click through the first three results or so,
3:28
and just check out what they have to say. Now, this question here, pretty much is what I had in mind
3:34
and looking at the first answer points me to the boto library, so I'll probably check that out and add it to the list,
3:42
and then I do the same thing for the other top search results, now in this case, I know from personal experience that boto is a great choice,
3:49
so the fact that we're already seeing this result is a pretty good sign. Honestly, I found that a quick Google search can really help you out here,
3:57
it often digs up the right content immediately, pointing you to results on Stack Overflow or on forums like Reddit or Hacker News.
4:05
So I usually do that really early on in my research process, when I am looking for a new Python package.
4:11
You've already seen that I looked at a Stack Overflow result here, so Stack Overflow is another great site you can use
4:19
to find recommendations for Python packages. If you haven't used Stack Overflow before, it's basically a question-and-answer site for developers.
4:28
And you can search it as well, so I am just going to punch in the same keywords that I previously searched on Google, just to see what comes up,
4:35
so by default, this will be sorted by relevance, which is kind of an opaque measure, so often, I'll just immediately switch over to the votes tab
4:42
which will give me the most upvoted answers. Alright, so let's check out the first answer here.
4:48
So, this is the number one upvoted answer for this question, I am not going to read the full question,
4:54
I just want to see what kinds of libraries and tools people recommend here. And as I scroll down, I can immediately see that okay,
5:00
boto is another library that people recommend, so again, this will be a pretty good indicator that I should really check out
5:07
this boto library because it just keeps popping up again and again. Another great recommendation for finding quality Python packages
5:16
are community forums like Reddit or Hacker News, and sometimes you can also use Twitter like that, let's take a look at those now.
5:25
Reddit is a community forum website that has a pretty large Python community, you can find it at reddit.com/r/Python
5:33
And Reddit has a search feature as well, again, what I would do here is I would punch in the same keywords and then I would limit my search.
5:42
In this case, we could probably drop the Python because we're just searching the Python forum.
5:47
So, anything S3 related will pretty much be about Python. Alright, let's see what we got here, so this looks pretty helpful already,
5:54
one interesting bit here is that you can see when the question was submitted, or when the forum thread was created.
6:02
So you want to make sure you are not looking at super old content for things that could change frequently.
6:07
But let's just check out this discussion here. So it looks like this is not going to give me the answer immediately,
6:12
but I can still learn a lot about how people talk about the problem here, what keywords they use and that could point me in the right direction
6:21
to actually find the library that does what I want or I actually find a discussion where someone recommends a specific library
6:27
and then other people can respond to that discussion and I can read what they have to say and that is going to give me
6:33
a pretty good idea of whether or not that library might be the right choice for me. Another helpful community forum is Hacker News.
6:41
Now, by default Hacker News doesn't have a search function built in, but you can use a third party search at hn.algolia.com
6:50
that can do a full text search on comments and stories inside Hacker News. Again, let's punch in S3 upload Python and see what happens.
6:59
Alright, so looking at these results again I see boto popping up here so this could be interesting, maybe this result is a little bit old,
7:06
but again, this could be a good way to fill up that candidate list and identify libraries that other people recommend and use.
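Since the same keywords get punched into several sites, here is a minimal sketch that builds the search links in one go. The URL patterns and query parameters are assumptions based on how these sites' search pages commonly look; they are not documented APIs and may change. The function name `build_search_urls` is made up for this example.

```python
from urllib.parse import urlencode


def build_search_urls(keywords):
    """Return one search URL per site for the given keywords.

    The URL patterns below are assumptions, not official APIs.
    """
    all_terms = " ".join(keywords)
    # Inside r/Python the "Python" keyword is redundant, so drop it there.
    reddit_terms = " ".join(k for k in keywords if k.lower() != "python")
    return {
        "google": "https://www.google.com/search?"
                  + urlencode({"q": all_terms}),
        "stackoverflow": "https://stackoverflow.com/search?"
                         + urlencode({"q": all_terms}),
        # restrict_sr=1 is assumed to limit the search to the subreddit.
        "reddit": "https://www.reddit.com/r/Python/search?"
                  + urlencode({"q": reddit_terms, "restrict_sr": "1"}),
        "hackernews": "https://hn.algolia.com/?"
                      + urlencode({"q": all_terms}),
    }


if __name__ == "__main__":
    for site, url in build_search_urls(["S3", "upload", "Python"]).items():
        print(f"{site}: {url}")
```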
7:13
Even if you're not using Twitter, just the fact that so many people share their thoughts on Twitter all the time, can be pretty powerful
7:21
if you're looking for an answer to your programming question, I know it sounds a little bit crazy but this works more often than you'd think,
7:28
so let's try it out, I don't know what is going to happen. Again, I am searching for the same set of keywords,
7:33
and then I am just going to check out some of the responses here,
7:37
alright, so sometimes it's going to reference other source material like Stack Overflow, or blog posts, okay, so this looks pretty interesting here,
7:45
this guy is talking about a script that uploads stuff to S3, so why don't we check it out.
7:50
So just looking at the code here, it looks like this guy is not using a specific library
7:54
to talk to S3, but he is using the command line tool, this aws s3 command,
7:59
so this could be another option for us to research now, maybe it's a good choice, I don't know, I know this process is a little bit time consuming
8:05
but it's really impressive what this process can dig up. If you do this for an hour or two, you're going to be pretty much an expert
8:12
on what's out there in terms of libraries that could help you with this job. If you've searched all these sites and you're still not happy
8:23
with this candidate package list that you've built up, then it might make sense to search PyPI directly,
8:29
personally, I find it a little bit hard to find stuff on PyPI because the interface is pretty clunky, and there is very little curation.
8:38
But it might still make sense to spend a few minutes on that and see if you can dig up something useful.
8:45
Now, another option to get those candidate packages would be to actually ask a question on Stack Overflow or Reddit,
8:52
so on all of these sites you can create a free account, and just start asking questions, of course, you want to be mindful
8:59
of questions that people have asked in the past, so I recommend that you do some research first
9:04
to avoid the danger of posting duplicate questions. But usually, people are pretty receptive and helpful on these forums,
9:10
so it might make sense to give it a shot. However, it's rather time intensive to write and post the question
9:16
and then having to wait a couple of hours or even days to get a response. Now at the end of step one, you should have a list of candidate packages
9:25
that you want to do some further research on. After you've generated a list of candidate packages,
9:32
the next step is to check out how popular these packages are. Usually popularity is a good sign if you're looking for a Python package
9:41
because that often means that the package is well maintained, it's high quality,
9:46
and you can't really go wrong with installing it and using it for your own purposes. Now, how can you find out if a package is popular?
9:54
One way to do it would be to check out the download stats, now you used to be able just to go to PyPI and check out the download stats for a package,
10:03
but this feature was removed when the PyPI architecture changed. So right now, you can't really get those stats, they might come back in the future,
10:10
and then I think they are a really good indicator, but right now, we'll have to go with something else. Another good popularity indicator would be
10:18
just the number of Google results and Reddit results and Stack Overflow results or recommendations you find for a given package.
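Those mental notes can also be kept as a simple tally. The sketch below counts in how many distinct places each package was recommended; the `tally_mentions` helper and the source labels are hypothetical, and the sample sightings just mirror what came up during the searches in this lecture.

```python
from collections import Counter


def tally_mentions(sightings):
    """Count the distinct sources that mention each package.

    `sightings` is an iterable of (source, package) pairs. A package
    mentioned twice in the same source is only counted once.
    """
    counts = Counter()
    seen = set()
    for source, package in sightings:
        key = (source, package.lower())
        if key in seen:
            continue  # already counted this source for this package
        seen.add(key)
        counts[package.lower()] += 1
    return counts


if __name__ == "__main__":
    sightings = [
        ("Stack Overflow answer found via Google", "boto"),
        ("Stack Overflow search, votes tab", "boto"),
        ("Hacker News search", "boto"),
        ("Twitter search", "aws cli"),
    ]
    for package, count in tally_mentions(sightings).most_common():
        print(package, count)
```

A raw count like this is only a rough signal; it says how often a name shows up, not whether the recommendations are recent or well informed.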
10:25
And often, this step of the research process happens in combination with the first one, so as you go along and search these sites,
10:31
you can take mental notes of which packages show up frequently. And this could be really valuable information,
10:37
when you have to make a decision about which one you are going to use. If a package is hosted on GitHub, you could also check out their GitHub page
10:44
and see how many stars they have on GitHub, so the star system on GitHub is a pretty simple voting system
10:50
where people can favorite or star repositories. Now, if you are thinking about installing a library
10:56
that has let's say 5 thousand or 10 thousand stars, it's pretty much a no brainer. If it only has 10 or 20 then maybe that is not a bad sign,
11:04
but it's also probably not a super popular library. Another way to get at that information is using the Python.libhunt website
11:13
and it includes a popularity indicator that is based on some other opaque values. Sometimes it can be helpful to compare two packages
11:21
and just kind of see which one has more traction. Now at the end of step two, you should have a pretty good understanding
11:28
of the relative popularity of your candidate packages. Once you have narrowed down the list of candidate packages
11:35
I would start checking out the actual project homepages. You could learn a lot from a project website,
11:42
things like does this website actually feel helpful, is it answering my questions that I have as a new user, does the website look actively maintained,
11:52
and how successful does this project look, did someone actually spend the time to make the website helpful and nice;
11:59
let's play through this with an actual Python project website. A great example here is the Requests library, and right away when this site loads up,
12:11
this looks like a really high quality library, it has its own logo here, it looks like it's supporting a bunch of Python versions,
12:18
it looks like it has automated tests which is always a great thing to see, and the project maintainer is also tracking test code coverage.
12:26
Here on the left you can see that the page has this embedded GitHub stars indicator, and as you can tell,
12:34
the library has a high number of stars here which is usually a good sign. What I like here as well is that the page starts with a concrete example
12:41
of what you can do with the library and what it looks like to use it. This is great, so they even have a bunch of user testimonials
12:48
from really well known people in the Python community, and when I scroll down further, I can see here that it has a pretty extensive user guide
12:56
that covers a number of interesting things and seems really well structured, there is also in depth API documentation which is always a good sign.
13:05
Another sign that this is a really popular and strong library is that it has a contributor guide with all kinds of information
13:12
about how to contribute to the project, the code style they use, how people should report bugs, and a really small and unpopular library
13:21
is usually not going to have a need for that. So when you see something like that, that is usually a strong sign
13:27
that the library is really popular and very successful. And by extension that means it's usually a safe choice
13:34
for you to use that library in your own programs. By the way, if you're wondering where to find a project's homepage,
13:42
if it has one, you can usually find the link on PyPI, so it will be right here on the left and for older versions of PyPI
13:49
you will typically have to scroll all the way down and then you can find the link to the project homepage there.
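If you would rather look the link up from a script, here is a minimal sketch that reads it from a package's metadata. It assumes PyPI's JSON endpoint at `https://pypi.org/pypi/<name>/json` and the `home_page` and `project_urls` fields in its `info` section; the helper names `fetch_metadata` and `find_homepage` are made up for this example, and the label matching is a guess, since projects name their links differently.

```python
import json
from urllib.request import urlopen


def fetch_metadata(package):
    """Download a package's metadata from PyPI (requires network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)


def find_homepage(metadata):
    """Return the homepage URL from PyPI-style metadata, or None."""
    info = metadata.get("info", {})
    if info.get("home_page"):
        return info["home_page"]
    # Some projects only list their links under project_urls.
    project_urls = info.get("project_urls") or {}
    for label, url in project_urls.items():
        if label.lower() in ("homepage", "home"):
            return url
    return None


if __name__ == "__main__":
    print(find_homepage(fetch_metadata("requests")))
```

Not every package fills in these fields, so a `None` result only means the metadata has no homepage link, not that the project has no website.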
13:56
There we go, this is the homepage link for the Requests library. At the end of step 3, after you've checked a couple of project homepages,
14:04
your list should have narrowed down a little bit further, at this point you are starting to get to know these projects a lot better
14:10
and you have a good idea of how popular they are, how well maintained they are, and whether or not you like them.
14:16
So maybe you can already start excluding some libraries that you are not really enjoying as much.
14:21
Of course, not all libraries are going to have a dedicated website or homepage. That doesn't automatically mean that the library is not great quality,
14:28
many Python projects don't actually have dedicated homepages, but if there is one, it absolutely makes sense to check it out.