Managing Python Dependencies Transcripts
Chapter: Finding Quality Python Packages
Lecture: "Rules of Thumb" for Selecting a Great Package - Part 2
0:01
Not all Python packages are going to have a dedicated project homepage. But what every project should have is some form of README file,
0:10
that introduces you to the project, so I always check those too. What I like to see here is that
0:17
the README covers the basics of the project: what does the library do, and how do I install it?
0:24
You can learn a lot about the quality of a library by looking at how the maintainers communicate the value that the library provides.
0:31
I also want to know what license the project is under, because that can really influence the circumstances in which
0:37
you can actually use the project. Then it makes sense to quickly check who the author is: is it a group of people,
0:44
is it a company, is it an individual contributor, what have they done in the past, and do they seem trustworthy?
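The README checks above (what the package does, its license, its author) can also be read from a package's metadata. Here is a minimal sketch, assuming the metadata is in the usual "Key: value" header format; the sample text, the `examplepkg` name, and the `summarize_metadata` helper are made up for illustration.

```python
from email import message_from_string

# Fields worth a quick look when vetting a package.
FIELDS = ("Name", "Version", "Summary", "License", "Author", "Home-page")


def summarize_metadata(raw_metadata):
    """Pick the vetting-relevant fields out of header-style package metadata."""
    headers = message_from_string(raw_metadata)
    return {field: headers.get(field, "(not declared)") for field in FIELDS}


# Hypothetical metadata, invented for this example.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: examplepkg
Version: 1.2.3
Summary: A hypothetical package used only for illustration
License: MIT
Author: Jane Doe
"""

summary = summarize_metadata(SAMPLE_METADATA)
for field, value in summary.items():
    print(f"{field}: {value}")
```

For a package that is already installed, `importlib.metadata.metadata("requests")` (Python 3.8+) returns a similar mapping-like object you can query the same way. Some packages declare their license only through classifiers, so a missing `License` field is a reason to look closer, not a verdict.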
0:50
Let's take a look at a real project README now. Alright, I am going to try and find the README file for the Requests library now.
0:58
So, typically, what you'd be looking for is a link to the project source repository so it already looks like this is hosted on GitHub here,
1:06
so I am just going to look for a link. Alright, there we go, requests at GitHub, so that should be the link to the GitHub project
1:15
where we can check out the project README, yes, this is it, so when I scroll down this is where GitHub displays the README file,
1:20
and for other source control websites like Bitbucket, they will either display the README in the same fashion
1:26
or you can view it by finding the README file and then clicking directly on that.
1:30
Now the first thing that I can see here is that this looks really similar to the project homepage which isn't a bad thing,
1:36
I mean, this contains all of the information that I wanted to get out of the README and it looks like it's really well structured and nicely formatted.
1:42
So this is great, this tells me how to install the library, and it looks really simple, it's pointing me to the documentation
1:49
and it also tells me how to contribute to the project. If you're wondering what should go into a great README file, I wrote this article a while ago,
1:57
about how you can write a great README for a GitHub project. I am covering a number of things here that in my opinion
2:02
should go into a great README, for example, it should talk about what the project actually does, how to install it,
2:08
some example usage, how someone could set up a development environment, some link to a change log, and then also license and author info,
2:17
you can check out the full article in the link that you see here. Now let's go back to the Requests README. I said that I'd like to know
2:29
under which license a library was published, so let's find that out now. Usually where you can find that information is in a license file,
2:35
at the root of the repository. So this tells us that Requests is under the Apache license, a popular open source license; if you're wondering
2:43
what the most common open source licenses are, and what their terms are there is a really great website you should check out.
2:49
Go to choosealicense.com/licenses and they have great simple and human readable explanations of the conditions and permissions
2:59
and limitations in the most popular open source licenses. So for example, this is the Apache license used by Requests,
3:06
and this gives us a quick overview of the terms of the license, without actually having to drill down into the nitty gritty details.
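If you have a local checkout of a repository, looking for the license file at its root can be sketched like this. The list of file names is my assumption about common conventions, not a standard, and `find_license_files` is a made-up helper name.

```python
import tempfile
from pathlib import Path

# Common base names for license files; this list is an assumption, not a standard.
LICENSE_STEMS = {"license", "licence", "copying", "unlicense"}


def find_license_files(repo_root):
    """Return the names of likely license files at the top level of repo_root."""
    return sorted(
        path.name
        for path in Path(repo_root).iterdir()
        if path.is_file() and path.stem.lower() in LICENSE_STEMS
    )


# Demo on a throwaway directory that mimics a repository root.
with tempfile.TemporaryDirectory() as repo:
    for name in ("LICENSE", "README.md", "setup.py"):
        Path(repo, name).write_text("placeholder\n")
    found = find_license_files(repo)

print(found)
```

This only tells you that a license file exists; you still need to read it, or look it up on a site like the one mentioned above, to know what it allows.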
3:14
Another thing that I'd like to know is who the authors are, who wrote a library. Now, typically, in an open source library,
3:22
you can find an AUTHORS file that will list all the contributors, again here with Requests you can get a really quick overview
3:30
of who the core maintainers are, and then there were apparently a whole bunch of people who have submitted patches over time,
3:37
and this is a great sign because it means you have a project leadership and then you also have a large group of people who are submitting patches
3:43
and contributions to the project. We could also check out the GitHub user account that hosts the Requests library,
3:50
and in this case, it's Kenneth Reitz and you can see that Kenneth has a number of very popular libraries in the Python space,
3:57
he is working for a respectable and well known company and these are all indicators that Requests is a really great library.
4:05
At the end of step 4, maybe the field has narrowed down a little bit further. Every Python library should have a good project README,
4:13
and I find it helpful to familiarize myself with the licensing terms for the project, and the team of people working on or maintaining the library.
4:24
In step 5, you're going to make sure that the project is actively maintained. In my mind, this is one of the most important quality indicators.
4:32
Now, how can you find out if a project is under active development? Usually, a great way to find that information is to check out
4:40
the project changelog and the update history. You could either do that directly on PyPI or by checking the project source repository.
4:49
Also, on the source repository you can usually find a bug tracker for the project.
4:53
Now this can tell you a lot about how the project is being maintained. Are there discussions going on, are there many open tickets for severe bugs?
5:03
If there are no tickets, then that is usually not a great sign either, because in my experience, any project that gets some traction
5:09
has a flood of bug reports coming in; now I would recommend that you skim through some of those bug reports,
5:16
just to make sure that there isn't some large problem with the project that would affect your ability to use it properly.
5:21
Another piece of information you can find directly on the source repository is when the last commit to the project happened.
5:28
Now you don't want to discount projects that do not have a lot of development activity going on at the moment,
5:34
I'd rather pick a well seasoned project that is also well maintained or at least not abandoned over one that's super maintained but also brand new,
5:44
because then you don't really know what the future holds, maybe the project is going to get abandoned in a few months,
5:50
and then you're stuck with it, whereas a seasoned library that still does its job properly but it's not getting a lot of feature updates,
5:56
could still be totally worth your while, there is nothing wrong with an older library that does its job really well.
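To put a number on "is this still maintained?", you can look at how long ago the last release was uploaded. The sketch below assumes release data shaped like the `releases` section of PyPI's JSON API (`https://pypi.org/pypi/<project>/json`), where each version maps to a list of files that carry an `upload_time_iso_8601` timestamp. The sample data and the helper names are invented for this example.

```python
from datetime import datetime, timezone


def latest_upload(releases):
    """Return the most recent upload time across all releases, or None."""
    times = [
        # Older Pythons don't accept a trailing "Z", so spell out the UTC offset.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    return max(times) if times else None


def days_since_last_release(releases, now=None):
    """Number of whole days between the latest upload and now."""
    now = now or datetime.now(timezone.utc)
    last = latest_upload(releases)
    return None if last is None else (now - last).days


# Made-up release history for a hypothetical package.
sample_releases = {
    "1.0.0": [{"upload_time_iso_8601": "2020-01-15T10:00:00.000000Z"}],
    "1.1.0": [{"upload_time_iso_8601": "2021-06-01T08:30:00.000000Z"}],
    "2.0.0rc1": [],  # a version with no uploaded files
}

checked_on = datetime(2021, 7, 1, 8, 30, tzinfo=timezone.utc)
print(days_since_last_release(sample_releases, now=checked_on))
```

A large number here is not automatically a red flag. As said above, a seasoned library that still does its job can be a better pick than a brand new one with daily releases.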
6:01
At the end of step 5, your list of candidate projects will likely have narrowed down further and this is a good thing.
6:07
The more projects you can weed out, the easier it will be to pick the perfect library for your use case. You are almost done here. In step 6,
6:17
you would spot check the source code of the project. I always like to look under the hood of a library that I am going to use in my own programs.
6:27
And usually, this is really easy to do if you're dealing with an open source project,
6:30
you just open the project repository website and browse around in the code a little bit. Here is what I like to see.
6:37
Does the code follow community best practices, for example, does it have a consistent formatting style,
6:43
are there comments in the code, are there docstrings, stuff like that? Another hugely important topic for me is whether or not
6:51
the code has automated test coverage. In my mind, a good quality Python package will always have an automated test suite.
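One of the spot checks above, "are there docstrings?", is easy to automate for a single file. This is a rough sketch using the standard library's `ast` module; the sample source and the `docstring_coverage` helper are made up, and a count like this is no substitute for actually reading the code.

```python
import ast


def docstring_coverage(source):
    """Return (documented, total) for the functions and classes in source."""
    tree = ast.parse(source)
    definitions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    documented = sum(1 for node in definitions if ast.get_docstring(node))
    return documented, len(definitions)


# Invented sample module: one class, two methods, one method undocumented.
SAMPLE_SOURCE = '''
class Jar:
    """A container for cookies."""

    def add(self, cookie):
        """Store a cookie."""
        self.items.append(cookie)

    def clear(self):
        self.items = []
'''

documented, total = docstring_coverage(SAMPLE_SOURCE)
print(f"{documented} of {total} definitions have docstrings")
```

To try it on a real file, read the file's text and pass it in, for example `docstring_coverage(open("cookies.py").read())` on a file you have downloaded.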
6:58
Looking at the code will also give you a good idea of how experienced the developers were who wrote the library;
7:04
often you can tell at a glance whether it was someone who had a deep understanding of Python who wrote a library,
7:10
or if it was someone who was maybe coming from an entirely different language background and was just kind of told to write a Python library.
7:18
Now, this doesn't automatically mean disaster, but it's still a really good quality indicator. In the end,
7:24
it all boils down to the question: would you feel comfortable making small changes to this library if you had to?
7:31
Because that is what the worst case scenario is. Imagine you are building a really successful application that is using a particular library and then
7:38
the original authors of the library stop maintaining it. Well, if you don't want to give up your project,
7:44
it will pretty much come down to you maintaining this library, at least enough so you can use it for your own purposes.
7:50
This is something that I always try to keep in the back of my head when I make a decision whether to use one library or another.
7:56
Alright, let's take a look at what this looks like in practice. So I am back here looking at the GitHub repository for the Requests library.
8:06
And that gives me a really easy way to browse through the library source code,
8:09
so I don't even have to install it, I can just use the GitHub code viewer and browse around and I don't need to pull this over into my own editor.
8:18
So what I would do here is try and find the main directory where all the source files live, and in this case,
8:24
it's the requests folder so typically this would be named after the library, and you can see here there is a bunch of Python files in there.
8:32
This seems pretty well structured already and you can also see there is a lot of activity here so these are being updated all the time.
8:40
Now let's check out one of those files. For example, the cookies.py file, that sounds tasty. And I would just spend some time reading that code,
8:48
so things that I immediately like here are that there are docstrings, the imports are nicely formatted, you can see here
8:55
the classes seem like they are named properly, again, there are extensive docstrings for everything,
9:03
this class here with these methods on it, they seem well structured, right, there aren't any crazy long, thousand-line methods here.
9:14
This is all pretty nice and tidy and when I scroll further through the file,
9:18
it all just seems like it's following a structure and it's formatted in a way that makes it easy on the eyes, and that is usually a really good sign,
9:29
like imagine you have to maintain this code, personally I would much rather work with code that looks like this, than some convoluted mess.
9:37
And you can see here it seems to adhere to the PEP 8 formatting style which I think is also a good sign
9:43
because if you are also using PEP 8 or something similar, then this library code is going to look similar to your application code,
9:48
which also helps maintenance. Yeah, so I would say this looks pretty good, let's see if we can find some tests.
9:58
Okay, so there is a tests folder here, and again, it looks like there are a whole bunch of tests here, so let's check out the test_structures, alright,
10:11
so they are using pytest which is a library that personally I like a lot so this would be a good sign for me, first of all I love the fact
10:19
they have an automated test suite here and just glancing over those tests, I mean, they seem pretty reasonable, right,
10:31
they seem like they are actually testing stuff, they are not just placeholders or dummy tests; they are actually doing some things.
10:37
Now, usually I wouldn't do like a full code review for a library that I want to use, but I just want to do some spot checking to get an idea
10:46
of the code quality for that library, because, in the worst case scenario I might actually have to do some maintenance work on this library,
10:53
if someone stops maintaining it and it's an important part of my own application, then I would be pretty much responsible for keeping this thing alive
11:01
so that I can continue to use it. So this is always something that is in the back of my head;
11:07
of course, Requests here passes that test with flying colors, and seeing how popular that library is, it's probably going to be maintained
11:13
for a really long time so I wouldn't be too worried about this, but, of course it helps that it has great code quality too.
11:20
Okay you made it all the way to step 7, and this is the last step in this workflow.
11:27
So at this point, you would have a much narrowed-down list of candidates, and now it's time to try out a few of them.
11:34
So at this point, I would go over my notes and my memories, and take this narrowed-down list of candidates and just start installing
11:40
some of them to try them out, in an interpreter session, and I am always using a fresh virtual environment for that
11:47
so that I am not cluttering up my system. I would encourage you to do the same, and then you can just launch
11:52
into an interpreter session, import the library, and play with it a little bit. Or you might write a couple of small programs
11:59
just to get a feel for how the library works, so for example, with Requests, maybe I would write a little program
12:06
that downloads a file over HTTP and then I would try and implement the same example with a different library to get a feel
12:13
for what the strengths and weaknesses are of each of them. Now actually, installing the library and then trying it out is
12:20
going to tell you something very important; it's going to tell you whether the package installs and imports cleanly, because,
12:27
at the end of the day that is super important. If you have the best library for your purpose but it's so painful to install
12:34
or it doesn't work on your system, then that is not going to help you. So I always make sure to actually get some hands on experience
12:41
with my top 3 choices or so, so that I can be confident in the decision that I make. Another very important question is
12:48
whether or not you enjoy working with the package. I strongly believe that developers should use tools that they enjoy working with,
12:54
and this also applies to third party packages and modules and libraries. So for me, this would always factor into the decision.
13:03
Now, I realize that there might be business constraints and sometimes you just have to work with something that you are not enjoying as much.
13:09
But if there is a way to get the best of both worlds, a really great library that is actually fun to work with,
13:14
I would always pick the one that is fun to work with and gets the job done.
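Going back to the install-and-import check from step 7: once you have created a fresh virtual environment (for example with `python -m venv`) and installed a candidate into it, a tiny helper like this tells you whether it imports cleanly. This is just a sketch; `imports_cleanly` is a made-up name, and `json` stands in for whatever candidate you installed.

```python
import importlib


def imports_cleanly(module_name):
    """Try to import a module and report whether it worked, and why not if it didn't."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # Broken packages can fail with more than ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


# "json" ships with Python, so it stands in for a candidate library here.
for candidate in ("json", "surely_not_a_real_module_xyz"):
    ok, detail = imports_cleanly(candidate)
    print(f"{candidate}: {'imports cleanly' if ok else 'failed'} ({detail})")
```

Run it with the virtual environment's interpreter, so that you are testing the freshly installed package and not something that happens to be on your system already.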