Python 3, an Illustrated Tour Transcripts
Chapter: Type Annotations
Lecture: mypy (type consistency verification)
0:00 So let's look at an example of using the mypy tool. I'm going to add typing to a little project I have, it's a Markov chain,
0:09 so you can check out this GitHub repository if you want to look at it, but here's how I do it, I'm in a virtual environment
0:15 and I say pip install mypy, that's going to go out and fetch the mypy tool and I'm going to clone this GitHub repository that I have,
0:21 and I'll change into that directory, and in there, there's a file called markov.py, I'm just going to run mypy,
0:27 which gets installed as a binary when I install the mypy tool and I run that on markov.py and it will return no output.
0:35 And again, the reason this returns no output is that mypy supports gradual typing:
0:40 it ignores code that doesn't have annotations, and this code didn't have any annotations, so it's not going to have any output there.
0:47 If I want to get a little bit more ambitious, I can put --strict after mypy, and that turns on a bunch of features
0:54 and I'm going to get a bunch of warnings or errors from the results here. It's going to say this function is missing a type annotation,
1:00 or we're calling some other functions in a typed context and they're not typed. And so these are the sorts of things that mypy can find for us.
1:09 Again, note that it also supports this gradual typing and so if we leave off the strict, it's just going to ignore anything that we haven't annotated.
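As a minimal sketch of gradual typing (the two functions below are hypothetical and are not taken from markov.py), compare an unannotated function with an annotated one:

```python
def untyped_add(a, b):
    # No annotations: by default mypy does not type check this body.
    # With --strict, mypy would report it as missing a type annotation.
    return a + b


def typed_add(a: int, b: int) -> int:
    # Annotated: mypy checks this body and checks calls to it
    # that are made from other checked code.
    return a + b


# Both functions behave the same at runtime; annotations only matter to the checker.
total = typed_add(1, 2)
joined = untyped_add("a", "b")
```

Running plain mypy on a file like this should stay quiet, because the only annotated function is consistent with its annotations.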
1:17 So here are a few hints for adding annotations. There are 2 ways that you can do it: you can start from the outside, the code
1:23 that calls other code, and start annotating this outer code,
1:27 or alternatively you can start from the inside, the code that gets called, and annotate that first. Either one of those will work.
1:36 What is important for me is if I've got a public interface, I want to make sure that there's typing around it
1:42 and that it's clear what comes in and out. So I'm going to start annotating something that I think is important
1:49 and I'm going to run mypy on some file. It might complain because it's going to start type checking where I've annotated
1:56 and then I might need to go in and fix things or add more annotations. And if I want to get ambitious again, I can use this --strict
2:04 and that will turn on a bunch of flags and add a bunch more checks for me. But basically, after I've gone through this process on my markov file here,
2:13 I'll have a diff that looks something like this. So I'm going to end up importing from the typing module the Dict and List types
2:22 and I'm going to make a table result variable here, or type alias, and it's going to be this structure here.
2:30 It's going to be a dictionary that maps a string to another dictionary and inside that dictionary, we map a string to a count.
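A sketch of what such an alias could look like. The spelling TableResult and the sample data are assumptions here, since the transcript only gives the spoken name "table result":

```python
from typing import Dict

# Assumed spelling of the alias: outer key is the input text,
# inner key is the character that follows it, inner value is how often it was seen.
TableResult = Dict[str, Dict[str, int]]

# Hypothetical sample data: after "a" we saw "p" twice and "n" once.
example: TableResult = {"a": {"p": 2, "n": 1}}
```

An alias like this is an ordinary assignment at runtime, so it costs nothing and can be reused in any annotation.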
2:37 So this code, if you're not familiar with it, creates a Markov chain.
2:41 A Markov chain takes input and gives you some output based on what your input is, and in this case, a Markov chain is typically used in text prediction
2:50 or if you're typing, predicting what characters come next, and so you can feed a paragraph or a bunch of text into this
2:57 and it will be able to tell you if I have a, what comes after a, after a comes maybe p because we're spelling apple or something like that.
3:06 That's the tooling that the Markov chain allows you to do. And so here in my constructor here, I've got data that's coming in
3:13 and I've got size that's an optional value here. And when I annotate that, I'm going to say data is going to be a string,
3:21 size is going to be an int and my constructor returns None. This is the way that you annotate a constructor.
3:29 Also note that I've got a variable here, an instance variable called self.tables,
3:34 and I am annotating that and that is going to be a list of table results.
3:38 So maybe you can see the reason why I made this table result variable here, or type alias: it makes it a little bit more clear.
3:45 Otherwise I would have this nested list of dictionaries of dictionaries, and now I can just clearly read that this is a list of table results.
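Here is a minimal sketch of a constructor annotated this way. The class name Markov, the default of 1 for size, and the TableResult spelling are assumptions, and the code that actually builds the tables is left out:

```python
from typing import Dict, List

TableResult = Dict[str, Dict[str, int]]  # assumed alias name


class Markov:  # hypothetical class name
    def __init__(self, data: str, size: int = 1) -> None:
        # A constructor never returns a value, so it is annotated with -> None.
        self.data = data
        self.size = size
        # Instance variable annotation: a list of table results.
        # Filling the list from data is omitted in this sketch.
        self.tables: List[TableResult] = []


chain = Markov("apple")
```

Without the alias, the annotation on self.tables would read List[Dict[str, Dict[str, int]]], which is the nested form the alias avoids.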
3:52 Here's another method that got type annotated. So predict takes a string of input.
3:58 So we've annotated that, and it returns the string that's going to come after that input. If we feed an a we should get p out, something like that,
4:06 and you'll note that I annotated just the method parameters and what the method returns, but there is one more annotation in here.
4:14 I didn't annotate a bunch of the variables inside of here because mypy didn't complain about those, but it did complain about this guy down here
4:22 and the reason is because I've got a variable called result that is looping over this options.items() collection,
4:30 and then I'm also reusing that same variable result later on to randomly choose out of my possible guys what comes next.
4:39 Because I'm looping over something that might be empty, in this case result could be None and that confuses mypy,
4:48 but what's really happening here is this actually indicated that my reuse of this variable was a bug on my part.
4:56 I shouldn't have reused this variable name and so mypy said, well, you've either got to type it or change the name.
5:02 So in this case, I add the typing and mypy doesn't complain about it anymore.
5:06 But the correct thing to do here would be to actually change that variable name.
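A sketch of the renamed version, under the assumption that the method builds a weighted list of candidates and picks one at random. The function name predict_next and the variable names char, count and possibles are hypothetical, not the names used in markov.py:

```python
import random
from typing import Dict, List


def predict_next(options: Dict[str, int]) -> str:
    possibles: List[str] = []
    # The loop variable gets its own name (char) instead of sharing "result"...
    for char, count in options.items():
        possibles.extend([char] * count)
    # ...so "result" is assigned in exactly one place and has one clear type.
    # Note: random.choice raises IndexError if options is empty.
    result = random.choice(possibles)
    return result


only_choice = predict_next({"p": 3})
```

With separate names there is nothing left for the checker to be unsure about, so no extra annotation on result is needed.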
5:10 You could call this the input and count rather than the result and count there. Here's another example of the annotation that I added:
5:20 this get table function accepts a line, which is a string and the number of characters that we're going to process as input.
5:27 So we could process a single character after a comes p, but we could also say I want to process a and p
5:32 and after a and p comes another p for apple or whatnot. If you add more memory to this Markov chain,
5:38 it makes better predictions and can make sentences or paragraphs or that sort of thing.
5:42 And we're going to also say that this get table returns a table result,
5:46 recall that I defined this table result a couple of slides back, which is a nested dictionary here.
5:51 But again, it's a lot more readable to have this table result defined
5:55 and reuse that table result rather than throwing this nested code around all over the place. Table result is very clear and should make sense.
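To make the signature concrete, here is a minimal sketch of a function with that shape. The names get_table, num_chars and TableResult are assumed spellings, and the body is an illustration rather than the code from the repository:

```python
from typing import Dict

TableResult = Dict[str, Dict[str, int]]  # assumed alias name


def get_table(line: str, num_chars: int) -> TableResult:
    results: TableResult = {}
    # Slide a window of num_chars characters over the line and
    # count which character follows each window.
    for i in range(len(line) - num_chars):
        key = line[i:i + num_chars]
        following = line[i + num_chars]
        counts = results.setdefault(key, {})
        counts[following] = counts.get(following, 0) + 1
    return results


one_char = get_table("apple", 1)
two_chars = get_table("apple", 2)
```

With one character of memory, "p" is followed by either "p" or "l"; with two characters, "ap" is followed only by "p", which is the sense in which more memory gives better predictions.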
6:06 So after doing that, I think my code is more clear, it should be more clear and people who are coming to it should have a very good understanding
6:14 of what is the input and what is the output. I also found a possible bug, where I reused the result variable.
6:20 I could annotate that, but in retrospect I should have just renamed the variable. mypy can help you find these sorts of issues.