Python 3, an Illustrated Tour Transcripts
Chapter: Numbers
Lecture: Statistics

0:01 In this video we're going to talk about the statistics module that came out in Python 3.4; it was introduced in PEP 450.
0:08 From the PEP we read: "Even simple statistical calculations contain traps for the unwary."
0:13 "This problem plagues users of many programming languages, not just Python, as coders reinvent the same numerically inaccurate code over and over again."
0:23 Here's an example of some of the issues that someone might run into when trying to implement some numerical code.
0:29 This is a simple function for calculating the variance. That's a measure of how spread out a sequence of numbers is,
0:37 how much they vary. Here we are just calculating the sum of the squares minus the square of the sum, and dividing by the number of values.
0:48 So down below here, after we've defined variance, we pass in a list of numbers and we get a variance of 2.5. It seems to be fine.
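A minimal sketch of the kind of naive function described here. The slide is not reproduced in this transcript, so the name `naive_variance`, the function body, the input `[1, 2, 3, 4, 5]`, and the `n - 1` divisor (sample variance) are all assumptions, chosen because they reproduce the 2.5 mentioned above.

```python
def naive_variance(data):
    """Sample variance via the sum-of-squares shortcut (numerically fragile)."""
    n = len(data)
    # Sum of the squares minus (square of the sum) / n.
    ss = sum(x * x for x in data) - (sum(data) ** 2) / n
    # Assumed divisor: n - 1 (sample variance), which gives 2.5 for this input.
    return ss / (n - 1)


data = [1, 2, 3, 4, 5]  # assumed input, not taken from the slide
result = naive_variance(data)
print(result)  # 2.5
```

With small values like these every intermediate result is exactly representable, which is why the shortcut looks fine at first.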
0:56 The problem comes when we add a large number to each value. Here we're adding 1e13, and we're getting numbers that should still have the same variance
1:07 because the spacing between them is unchanged. But when you run that through our calculation here, you get a large negative number,
1:16 and this illustrates some of the floating-point issues that you might run into with simple, naive calculations.
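The failure can be sketched as follows, again assuming the hypothetical `naive_variance` function and the input `[1, 2, 3, 4, 5]` (neither is taken from the slide). After the shift each square is around 1e26, far more digits than a float can hold, so the true difference of 10 between the two large sums is lost to rounding. The exact wrong value is not asserted here; the video reports a large negative number.

```python
def naive_variance(data):
    """Sample variance via the sum-of-squares shortcut (numerically fragile)."""
    n = len(data)
    ss = sum(x * x for x in data) - (sum(data) ** 2) / n
    return ss / (n - 1)  # assumed n - 1 divisor


# Shift every value by the same large constant; the true variance is still 2.5.
shifted = [x + 1e13 for x in [1, 2, 3, 4, 5]]
broken = naive_variance(shifted)
print(broken)  # not 2.5: the subtraction of two huge, rounded sums is meaningless
```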
1:22 And so the impetus of this PEP is to help deal with some of these issues and provide a pure Python implementation of some common statistical functions
1:31 that don't have these sorts of issues. Here we're showing an example of using the library. We simply import it (it's called statistics),
1:40 and inside it there are various functions. One of them is variance. We look at the variance of our same data
1:46 and we get 2.5; we add 1e13 to each of those numbers and we still get 2.5. There are various functions included in here.
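A short sketch of the usage described above. The input list is an assumption (the slide's data is not in this transcript); `statistics.variance`, `statistics.mean`, and `statistics.median` are standard-library functions.

```python
import statistics

data = [1, 2, 3, 4, 5]              # assumed input
shifted = [x + 1e13 for x in data]  # same spread, much larger magnitude

small = statistics.variance(data)
large = statistics.variance(shifted)
print(small, large)  # 2.5 2.5

# A couple of the other functions in the module.
print(statistics.mean(data))    # 3
print(statistics.median(data))  # 3
```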
1:55 I'm not going to go over them, but you can look at the functions, and if you're dealing with statistical problems, you can use this code if you need to.
2:03 Another nice thing to do is just to read the code and glean some insights on how you might write numerical processing code in Python
2:11 and deal with some of these issues. This module is written in pure Python and so you can simply load the module up and inspect it
2:20 and see what tools and techniques they're using.
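One way to do that inspection, as a sketch: because the module is pure Python, `inspect.getsource` can print a function's source directly. This assumes your Python installation ships the standard library's `.py` source files, which is typical but not guaranteed.

```python
import inspect
import statistics

# Where the module lives on disk, in case you want to open it in an editor.
print(statistics.__file__)

# Source of a single function, as text.
source = inspect.getsource(statistics.variance)
print(source)
```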

