Managing Python Dependencies Transcripts
Chapter: Finding Quality Python Packages
Lecture: "Rules of Thumb" for Selecting a Great Package - Part 2
0:01 Not all Python packages are going to have a dedicated project homepage.
0:04 But what every project should have is some form of README file,
0:09 that introduces you to the project. So I always check those too.
0:13 And what I like to see here, is that
0:16 I want the README to cover the basics of the project,
0:19 what does the library do, and how do I install it.
0:23 You could learn a lot about the quality of a library,
0:25 by looking at how the maintainers communicate the value that the library provides,
0:30 I also want to know what license the project is under,
0:33 because that could really influence in what circumstances
0:36 you can actually use the project and then, it makes sense to quickly check
0:40 who the author is, is it a group of people,
0:43 is it a company, is it an individual contributor,
0:46 what have they done in the past, and do they seem trustworthy?
0:49 Let's take a look at a real project README now.
0:53 Alright, I am going to try and find the README file for the Reqests library now.
0:57 So, typically, what you'd be looking for is a link to the project source repository
1:02 so it already looks like this is hosted on GitHub here,
1:05 so I am just going to look for a link.
1:09 Alright, there we go, requests at GitHub,
1:12 so that should be the link to the GitHub project
1:14 where we can check out the project README, yes, this is it,
1:16 so when I scroll down this is where GitHub displays the README file,
1:19 and for other source control websites like Bitbucket,
1:22 they will either display the readme in the same fashion
1:25 or you can view it by finding the README file and then clicking directly on that.
1:29 Now the first thing that I can see here is that this looks really similar
1:32 to the project homepage which isn't a bad thing,
1:35 I mean, this contains all of the information that I wanted to get out of the README
1:38 and it looks like it's really well structured and nicely formatted.
1:41 So this is great, this tells me how to install the library, and it looks really simple,
1:46 it's pointing me to the documentation
1:48 and it also tells me how to contribute to the project.
1:52 If you're wondering what should go into a great README file,
1:54 I wrote this article a while ago,
1:56 about how you can write a great README for a github project,
1:59 I am covering a number of things here that in my opinion
2:01 should go into a great README, for example, it should talk about
2:05 what the project actually does, how to install it,
2:07 some example usage, how someone could set up a development environment,
2:11 some link to a change log, and then also license and author info,
2:16 you can check out the full article in the link that you see here.
2:24 Now let's go back to the Requests README. I said that I'd like to know
2:28 under which license a library was published, so let's find that out now.
2:31 Usually where you can find that information is in a license file,
2:34 at the root of the repository.
2:37 So this tells us that Requests is under the Apache license,
2:39 a popular open source license; if you're wondering
2:42 what the most common open source licenses are,
2:44 and what their terms are there is a really great website you should check out.
2:48 Go to choosealicence.com/licenses and they have great simple
2:54 and human readable explanations of the conditions and permissions
2:58 and limitations in the most popular open source licenses.
3:02 So for example, this is the Apache license used by Requests,
3:05 and this gives us a quick overview over the terms of the license,
3:09 without actually having to drill down into the nitty gritty details.
3:13 Another thing that I'd like to know is who the authors are,
3:17 who wrote a library. Now, typically, in an open source library,
3:21 you can find an AUTHORS file that will list all the contributors,
3:24 again here with Requests you can get a really quick overview
3:29 of who the core maintainers are, and then there was apparently
3:33 a whole bunch of people who have submitted patches over time,
3:36 and this is a great sign because it means you have a project leadership
3:39 and then you also have a large group of people who are dedicating patches
3:42 and contributions to the project.
3:45 We could also check out the GitHub user account that hosts the Request library,
3:49 and in this case, it's Kenneth Reitz and you can see that Kenneth has
3:52 a number of very popular libraries in the Python space,
3:56 he is working for respectable and well known company
3:59 and these are all indicators that Requests is a really great library.
4:04 At the end of step 4, maybe the field has narrowed down a little bit further,
4:09 every Python library should have a good project README,
4:12 and I find it helpful to familiarize myself with the licensing terms for the project,
4:17 and the team of people working on or maintaining the library.
4:23 In step 5, you're going to make sure that the project is actively maintained.
4:27 In my mind, this is one of the most important quality indicators,
4:31 now how can you find out if a project is under active development,
4:35 usually a great way to find that information is to check out
4:39 the project changelog and the update history.
4:42 You could either do that directly on PyPi
4:45 or by checking the project source repository,
4:48 also on the source repository you can usually find a bug tracker for the project.
4:52 Now this can tell you a lot about how the project is being maintained.
4:57 Are there discussions going on, are there many open tickets for severe bugs?
5:02 If there are no tickets, than that is usually not a great sign either,
5:05 because in my experience, any project that gets some traction,
5:08 has a flood of bug reports coming in; now I would recommend
5:12 that you skim through some of those bug reports,
5:15 just to make sure that there isn't some large problem with the project
5:18 that would affect your ability to use it properly.
5:20 Another piece of information you can find directly on the source repository
5:24 is when the last commit to the project happened.
5:27 Now you don't want to discount projects that do not have
5:30 a lot of development activity going on at the moment,
5:33 I'd rather pick a well seasoned project that is also well maintained
5:37 or at least not abandoned over one that's super maintained but also brand new,
5:43 because then you don't really know what the future holds,
5:46 maybe the project is going to get abandoned in a few months,
5:49 and then you're stuck with it, whereas a seasoned library
5:52 that still does its job properly but it's not getting a lot of feature updates,
5:55 could still be totally worth your while, there is nothing wrong
5:58 with an older library that does its job really well.
6:00 At the end of step 5, your list of candidates projects
6:03 will likely have narrowed down further and this is a good thing,
6:06 the more projects you can weed out, the easier it will be
6:09 to pick the perfect library for your usecase.
6:12 You are almost done here. In step 6,
6:16 you would spot check the source code of the project.
6:19 I always like to look under the hood
6:22 of a library that I am going to use in my own programs.
6:26 And usually, this is really easy to do if you're dealing with an open source project,
6:29 you just open the project repository website and browse around in the code a little bit.
6:34 Here is what I like to see.
6:36 Does the code follow community best practices,
6:39 for example, does it have a consistent formatting style,
6:42 are there comments in the code, are there docstrings, stuff like that,
6:46 another hugely important topic for me is whether or not
6:50 the code has automated test coverage, in my mind,
6:53 a good quality Python package will always have an automated test suite.
6:57 Looking at the code will also give you a good idea
7:00 of how experienced the developers were who wrote the library;
7:03 often you can tell at a glance whether it was someone
7:06 who had a deep understanding of Python who wrote a library,
7:09 or if it was someone who was maybe coming from
7:12 an entirely different language background
7:14 and was just kind of told to write a Python library.
7:17 Now, this doesn't automatically mean disaster,
7:20 but it's still a really good quality indicator. In the end,
7:23 it all boils down to the question would you feel comfortable
7:26 making small changes to this library if you had to?
7:30 Because that is what the worst case scenario is.
7:33 Imagine you are building a really successful application
7:35 that is using a particular library and then
7:37 the original authors of the library stop maintaining it.
7:40 Well, if you don't want to give up your project,
7:43 it will pretty much come down to you maintaining this library,
7:46 at least enough so you can use it for your own purposes.
7:49 This is something that I always try to keep in the back of my head
7:52 when I make a decision whether to use one library or another.
7:55 Alright, let's take a look at what this looks like in practice.
8:00 So I am back here looking at the GitHub repository for the Requests library.
8:05 And that gives me a really easy way to browse through the library source code,
8:08 so I don't even have to install it, I can just use the GitHub code viewer
8:12 and browse around and I don't need to pull this over into my own editor.
8:17 So what I would do here is try and find the main directory
8:20 where all the source files live in, and in this case,
8:23 it's the requests folder so typically this would be named after the library,
8:27 and you can see here there is a bunch of Python files in there.
8:31 This seems pretty well structured already and you can also see
8:34 there is a lot of activity here so these are being updated all the time.
8:39 Now let's check out one of those files.
8:41 For example, the cookies.py file, that sounds tasty.
8:44 And I would just spend some time reading that code,
8:47 so things that I immediately like here is that there is docstrings,
8:50 the imports are nicely formatted, you can see here
8:54 the classes seem like they are named properly,
8:58 again, there are extensive docstrings for everything,
9:02 this class here with these methods on it, they seem well structured, right,
9:09 there is not this crazy long like a thousand lines methods here.
9:13 This is all pretty nice and tidy and when I scroll further through the file,
9:17 it all just seems like it's following a structure and it's formatted in the way
9:24 that makes it easy on the eyes, and that is usually a really good sign,
9:28 like imagine you have to maintain this code,
9:31 personally I would much rather work with code that looks like this,
9:34 than some convoluted mess.
9:36 And you can see here it seems to adhere to the PEP 8 formatting style
9:40 which I think is also a good sign
9:42 because if you are also using PEP 8 or something similar,
9:45 than this library code is going to look similar to your application code,
9:47 which also helps maintenance.
9:52 Yeah, so I would say this looks pretty good, let's see if we can find some tests.
9:57 Okay, so there is a tests folder here, and again,
10:04 it looks like there are whole bunch of tests here,
10:06 so let's check out the test_structures, alright,
10:10 so they are using pytest which is a library that personally I like a lot
10:13 so this would be a good sign for me, first of all I love the fact
10:18 they have an automated test suite here and just glancing over those tests,
10:26 I mean, they seem pretty reasonable, right,
10:30 they seem like they are actually testing stuff, they are not just placeholders
10:33 or dummy tests they are actually doing some things.
10:36 Now, usually I wouldn't do like a full code review for a library that I want to use,
10:41 but I just want to do some spot checking to get an idea
10:45 of the code quality for that library, because, in the worst case scenario
10:49 I might actually have to do some maintenance work on this library,
10:52 if someone stops maintaining it and it's an important part of my own application,
10:56 then I would be pretty much responsible for keeping this thing alive
11:00 so that I can continue to use it.
11:03 So this is always something that is in the back of my head;
11:06 of course, Requests here passes that test with flying colors,
11:09 and seeing how popular that library is, it's probably going to be maintained
11:12 for a really long time so I wouldn't be too worried about this,
11:15 but, of course it helps that it has great code quality too.
11:19 Okay you made it all the way to step 7, and this is the last step in this workflow.
11:26 So at this point, you would have a much narrow down list of candidates,
11:31 and now it's time to try out a few of them.
11:33 So at this point, I would go over my notes and my memories,
11:36 and take this narrow down list of candidates and just start installing
11:39 some of them to try them out, in an interpreter session,
11:44 and I am always suing a fresh virtual environment for that
11:46 so that I am not cluttering up my system.
11:48 I would encourage you to do the same, and then you can just launch
11:51 into an interpreter session, import the library, and play with it a little bit.
11:55 Or you might write a couple of small programs
11:58 just to get a feel for how the library works,
12:01 so for example, with Requests, maybe I would write a little program
12:05 that downloads a file over HTTP and then I would try and implement
12:09 the same example with a different library to get a feel
12:12 for what the strength and weaknesses are of each of them.
12:16 Now actually, installing the library and then trying it out is
12:19 going to tell you something very important; it's going to tell you
12:22 whether the package installs and imports cleanly, because,
12:26 at the end of the day that is super important, even if you have the best library
12:30 for your purpose and it's so painful to install
12:33 or it doesn't work on your system, then that is not going to help you.
12:37 So I always make sure to actually get some hands on experience
12:40 with my top 3 choices or so, so that I can be confident into decision that I make.
12:45 Another very important question is
12:47 whether or not you enjoy working with the package.
12:50 I strongly believe that developers should use tools that they enjoy working with,
12:53 and this also applies to third party packages and modules and libraries.
12:57 So for me, this would always factor into the decision,
13:02 now I realize that there might be business constraints
13:04 and sometimes you just have to work with something
13:06 that you are not enjoying as much.
13:08 But if there is a way to get the best of both worlds,
13:11 a really great library that is actually fun to work with,
13:13 I would always pick the one that is fun to work with and gets the job done.