Python 3, an Illustrated Tour Transcripts
Chapter: Numbers
Lecture: Statistics
0:01
In this video we're going to talk about the new statistics module that came out in Python 3.4. It was introduced in PEP 450.
0:08
From the PEP we read that even simple statistical calculations contain traps for the unwary.
0:13
This problem plagues users of many programming languages, not just Python, as coders reinvent the same numerically inaccurate code over and over again.
0:23
Here's an example of some of the issues that someone might run into when trying to implement some numerical code.
0:29
This is a simple function for calculating the variance. That's a measure of how spread out the values in a sequence of numbers are,
0:37
how much they vary. Here we are just calculating the sum of the squares minus the square of the sum, and dividing by the number of values.
0:48
Down below here, after we've defined variance, we pass in a list of numbers and we get a variance of 2.5. It seems to be fine.
0:56
The problem comes when we add a large number to each value. Here we're adding 1e13, 10 to the 13th, and we're getting numbers that should still have the same variance
1:07
because the differences between them are unchanged: adding a constant shifts the values without changing their spread. But when you run that through our calculation here you get a large negative number, which is impossible for a variance,
1:16
and this illustrates some of the floating-point issues that you might run into with simple naive calculations.
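The slide code is not reproduced in this transcript, so what follows is only a minimal sketch of the kind of naive function being described. The name `naive_variance`, the sample data `[1, 2, 3, 4, 5]`, and the division by `n - 1` are all assumptions, chosen to match the 2.5 result mentioned above.

```python
def naive_variance(data):
    """Sample variance via the 'sum of squares' shortcut (numerically unsafe)."""
    n = len(data)
    # Sum of the squares minus (square of the sum) / n.
    # For large inputs both terms are huge and nearly equal, so the
    # subtraction can wipe out every significant digit.
    ss = sum(x ** 2 for x in data) - (sum(data) ** 2) / n
    return ss / (n - 1)


data = [1.0, 2.0, 3.0, 4.0, 5.0]      # assumed sample data
shifted = [x + 1e13 for x in data]    # same spread, shifted by a constant

small = naive_variance(data)      # 2.5, as expected
large = naive_variance(shifted)   # should also be 2.5, but is badly wrong
print(small, large)
```

The exact wrong value depends on floating-point rounding, so it is not shown here. The point is only that it is nowhere near 2.5.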
1:22
And so the impetus of this PEP is to help deal with some of these issues and provide a pure Python implementation of some common statistical functions
1:31
that don't have these sorts of issues. Here we're showing an example of using the library. We simply import it. It's called statistics,
1:40
and inside it there are various functions. One of them is variance. We look at the variance of our same data
1:46
and we get 2.5. We add 1e13 to each of those numbers and we still get 2.5. There are various functions included in here.
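As a sketch of the usage just described, the snippet below calls `statistics.variance` on the original and the shifted data. The sample data is again an assumption, since the transcript does not show the list. The last line calls a few of the module's other functions.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 5.0]      # assumed sample data
shifted = [x + 1e13 for x in data]    # same spread, shifted by a constant

var_small = statistics.variance(data)
var_large = statistics.variance(shifted)
print(var_small, var_large)  # both 2.5

# A few of the other functions the module provides.
print(statistics.mean(data), statistics.median(data), statistics.stdev(data))
```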
1:55
I'm not going to go over them, but you can look at the functions yourself, and if you're dealing with statistical problems, you can use this code if you need to.
2:03
Another nice thing to do is just to read the code and glean some insights on how you might write numerical processing code in Python
2:11
and deal with some of these issues. This module is written in pure Python and so you can simply load the module up and inspect it
2:20
and see what tools and techniques they're using.
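One way to inspect the module, sketched here with the standard `inspect` module, is to print where the source file lives and then print the source of a single function. This assumes your Python installation ships the standard library's `.py` source files, which is the usual case.

```python
import inspect
import statistics

# Path to the module's source file on this installation.
print(statistics.__file__)

# Source code of one function, as a string.
source = inspect.getsource(statistics.variance)
print(source)
```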