Python for Entrepreneurs Transcripts
Chapter: User accounts and identity
Lecture: Demo: Hashing passwords
0:03 Why should we hash our passwords? No matter how careful you are with your database, there is a chance that it will get leaked,
0:12 whether it's from poor coding practices, like SQL injection attacks, which we're luckily relatively immune to,
0:20 because we're using an ORM with SQLAlchemy, or maybe you have a copy of the production database on your laptop
0:27 and your laptop gets stolen and you don't have your hard disk encrypted. There are a million reasons why you might lose this data.
0:35 What you'd like to do is set it up so that if you lose this data, the report might say something like this: Gawker was hacked, but don't worry,
0:42 it looks like they used all the best practices around account management and storing passwords in the database, and it's extremely unlikely
0:48 anyone can take these passwords and do anything with them other than knowing your email address, which is really not that private, is it?
0:56 That's not what this Gawker report says. Let's jump over here and have a look.
1:03 Because this is just the way these things go and I want to really make it clear that this is super important and you got to get it just right.
1:10 I think you'll really appreciate what I am going to show you afterwards because it makes getting it right way easier.
1:17 Alright, so it says it looks like about a million passwords were stolen, initial attack etc etc, I don't really care. Did they hash their passwords?
1:25 Yeah, the MySQL data contained 1.2 million accounts, and kind of half of them
1:30 didn't have a hash at all, which meant they connected through something like Facebook, where the password wasn't ever given to the site,
1:37 so that leaves about 750 thousand potentially vulnerable ones so they did hash the stored password, fabulous,
1:45 and they even salted them, good job Gawker. And so compared to places that don't, hey this is actually really good.
1:51 Unfortunately, neither the salting nor the hashing was done very well: they used some weak algorithm,
1:59 the hashes were stored in some ridiculously small size, etc etc etc. You don't want to be a part of this.
2:07 They have reported that 400 thousand were cracked, whatever. This is not the kind of press that you want, right,
2:13 all press is good press except for this kind of press I suppose, so, how do we solve this problem? Well, we hash things correctly,
2:21 luckily there is a fantastic library called passlib and I've used passlib a number of times and it's great,
2:27 so it's a password hashing library for Python 2 and Python 3 and we're going to use it now to create password hashes
2:34 in a really excellent way on this website that we're building. So here is the passlib documentation and it's really easy to create,
2:41 notice, here we have some password "toomanysecrets", and this is the kind of thing that actually we are going to store in the database.
2:49 OK, so how do we get started? Well, we're going to get started by "pip installing" and there is a number of ways, let's go,
2:57 the place we're going to need to do the hash is down here, and let's actually create another function specifically for doing this hash.
3:10 OK so we're going to write this function hash_text, and to do that we're going to use passlib,
3:18 now notice this is not part of our requirements, so we're going to add that there, that's good,
3:23 remember, that puts it over here, let's do a little cleanup on this and you know what PyCharm- passlib is not misspelled, thanks so much.
3:30 I don't know about you, but I like to be able to scan this, so alphabetic, like that, it seems really nice. OK, so we have passlib here that's great,
3:40 I guess I already installed passlib for whatever reason, playing around with something, but if you don't have it installed, remember,
3:46 you are going to want to make sure you install passlib. OK, so this is going to go away, we want to say AccountService.hash_text
3:58 and this will be the plain_text_password, OK, so if we write this correctly, we'll be in good shape.
4:04 Now, it turns out we don't need all the passlib, we can just get one thing from it, and we are going to go here to this thing called handlers,
4:11 sha2_crypt, and here we're going to import sha512_crypt.
4:15 There are a couple of options in passlib. If we go to "getting started", "walkthroughs" I think,
4:22 and find our way to the "New Application Quickstart Guide",
4:25 you'll see the first thing is choosing the hash. It says these are four good choices, four different algorithms you can choose.
4:33 bcrypt is probably the best one, although I've seen people have problems installing it on different operating systems and whatnot,
4:41 so I am not going to use bcrypt for this example. You can if you like, if you get it working, perfect. If not, we're going to use the sha512 hash
4:49 here, which is very strong, actually sorry, we're using this one. What's really cool about this is they actually give you,
4:56 they keep track of this and say all four of these hashes share the following properties:
5:00 there are no known vulnerabilities, they're widely documented and reviewed algorithms,
5:04 public domain and so on. There are a few other things that we'll come back to, right. It works across a number of OSes and applications.
5:11 Really really nice, we don't have to worry about this, and they even keep track of it for us.
5:17 What do we have to do here to write this, to implement all these best practices? We are going to need to take the password, we're going to hash it,
5:25 we don't want to just hash it once, because even though that does obscure the password,
5:29 it can be guessed really easily these days with GPU-based password crackers and all sorts of things like that. So you want to do this over and over
5:37 until it becomes computationally expensive to guess: not too expensive for you to log in, but computationally expensive to guess.
5:45 So we're going to hash it over and over and over iteratively, something like 150 thousand times,
5:52 the input is going to consist of the original password and some salt,
5:57 and we need to choose a strong algorithm. All of those things we need to keep track of, and passlib does that for us.
6:03 Let me show you how that works, let's go down to our hash password, OK, so we are going to return, let me just hold it for a minute.
6:11 Now here is what we are going to do, we are going to go to this and we are going to say "encrypt", and what are we going to give it?
6:16 plain text password. Problem solved. Alright, one thing we probably want to set is the number of rounds, and we're going to set it to 150 thousand.
6:28 In Python 3.6 we could write this with an underscore, as 150_000, and make it really obvious, but we are not running Python 3.6; it's almost out but it's not yet.
6:35 What this tells it to do is not just take the password plus the salt and hash it with this strong algorithm, but then fold that over
6:41 and do it again and again, 150 thousand times. So I would say, you know, do this until it takes a little bit of time to log in,
6:49 do it so that maybe it takes a tenth of a second computationally to do this, right,
6:55 you are going to do this when they log in as well as when we create the account.
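To make the idea of "password plus salt, hashed over and over" concrete, here is a minimal sketch that uses only the Python standard library. It is a stand-in for illustration, not what passlib does internally: it swaps in PBKDF2 with SHA-512 (`hashlib.pbkdf2_hmac`) instead of passlib's `sha512_crypt`, and the `pbkdf2-sha512$rounds$salt$digest` storage layout is made up for this example. In the lecture the real work is done by passlib's `sha512_crypt.encrypt(...)` with `rounds` set to 150 thousand; passlib 1.7 and later call that method `hash` and deprecate the `encrypt` name.

```python
import hashlib
import os
import time

ROUNDS = 150_000  # the work factor chosen in the lecture


def hash_text(plain_text_password: str, rounds: int = ROUNDS) -> str:
    """Return a salted, iterated hash in a made-up, self-describing text format.

    Assumption: PBKDF2-HMAC-SHA512 stands in for passlib's sha512_crypt here.
    """
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha512", plain_text_password.encode("utf-8"), salt, rounds
    )
    # Store everything needed to verify later: scheme, rounds, salt, digest.
    return f"pbkdf2-sha512${rounds}${salt.hex()}${digest.hex()}"


if __name__ == "__main__":
    started = time.perf_counter()
    stored = hash_text("test")
    elapsed = time.perf_counter() - started
    print(stored)
    print(f"hashing took {elapsed:.3f} seconds")
```

Running the script prints how long one hash takes on your machine, which is a simple way to tune the number of rounds toward the "tenth of a second" target mentioned above.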
6:58 And, well, we can go back and just look in our database when this works, so let's one more time create an account with proper hashing, as you saw,
7:07 I better use a different email address, I am going to put in test, just the word test, off it goes, that worked perfectly,
7:14 and very quickly, and so I didn't notice any slowness and like "why is this thing lagging",
7:19 like it felt instant, if I go over here and look at my table and I refresh this, we now have a new thing and instead of HASH:test we have this,
7:28 let me put this over here just to show you, look at that puppy, OK, so it gives you a little bit of information about the algorithm
7:37 so that it can reverify with the same number of rounds, because over time, this is great, but in five years, 150 thousand might not be enough,
7:45 maybe we want 1.5 million. So this lets you over time upgrade your account any time somebody logs in
7:52 we see oh this is actually an old one, we want to add some more rounds to it, you could recompute their hash if it validates.
7:58 And then, in here we've got the salt and the hashed password, all over there. Right, so if this gets leaked to the internet,
8:06 determining if that is the word test is not going to be simple. It's going to be much much harder
8:12 than if you had just put it in the database as plain text, of course, and it's going to be much harder than if you had just hashed it once,
8:18 because it's doing it 150 thousand times, and because it has the salt
8:22 it's not clear that it's just four characters, it was probably much more than that. And the salt was randomly generated along with this.
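The stored string is self-describing, which is what makes the "add more rounds later" upgrade possible. A `sha512_crypt` hash has the shape `$6$rounds=N$salt$checksum`. The sketch below splits a string of that shape and checks whether it was made with fewer rounds than we now want. The helper names `parse_sha512_crypt` and `needs_more_rounds` are hypothetical, the sample value is a placeholder rather than a real hash, and the parser only handles strings with an explicit `rounds=` field.

```python
from typing import NamedTuple


class ParsedHash(NamedTuple):
    rounds: int
    salt: str
    checksum: str


def parse_sha512_crypt(stored: str) -> ParsedHash:
    """Split a '$6$rounds=N$salt$checksum' string into its parts."""
    parts = stored.split("$")
    # A leading '$' yields an empty first element, then the '6' identifier.
    if len(parts) != 5 or parts[0] != "" or parts[1] != "6":
        raise ValueError("not a sha512_crypt-style hash with explicit rounds")
    if not parts[2].startswith("rounds="):
        raise ValueError("missing rounds= field")
    rounds = int(parts[2][len("rounds="):])
    return ParsedHash(rounds=rounds, salt=parts[3], checksum=parts[4])


def needs_more_rounds(stored: str, wanted_rounds: int) -> bool:
    """True when the stored hash used fewer rounds than we want today."""
    return parse_sha512_crypt(stored).rounds < wanted_rounds


if __name__ == "__main__":
    # Placeholder value for illustration only, not a real hash.
    sample = "$6$rounds=150000$examplesalt$examplechecksum"
    print(parse_sha512_crypt(sample))
    print(needs_more_rounds(sample, 1_500_000))
```

In a real application you would run a check like this right after a successful login, because that is the only moment you have the plain text password available to compute a stronger hash.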
8:30 OK, excellent, now final thing is, we've created our accounts how do we test a log in, alright so let's do that. Let's start at the controller level.
8:42 So up here we have a "signin", "hello, sign in", great, and now we need to validate this,
8:48 so let's just say this AccountService, so we are going to say get_authenticated_account
8:53 and of course, we're going to pass the email, remember, that's our user name,
8:57 and we are going to pass the password, and then we'll say "if not account", right,
9:01 either we got an account or we didn't, we'll figure out how we do this in a second,
9:04 we are going to set an error, so we'll just set the error and return to dict, stay on that page.
9:11 Otherwise, we are going to do something like return self.redirect to /account or wherever we want to go, right,
9:19 this is just going to be our indicator that everything worked, so let's try this method,
9:23 you'll see that as easy as it was to create the hash, it's just as easy to validate it.
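The controller flow described here can be sketched as a plain function. This is a framework-free approximation, not the course's actual controller: `signin_post` is a hypothetical name, the authentication function is passed in as a parameter so the example stands alone, and the redirect is modelled as a dict entry instead of a call to `self.redirect`.

```python
from typing import Callable, Optional


def signin_post(
    email: str,
    password: str,
    get_authenticated_account: Callable[[str, str], Optional[object]],
) -> dict:
    """Validate a sign-in attempt and decide what the page should do next."""
    account = get_authenticated_account(email, password)
    if not account:
        # Stay on the sign-in page and show one generic error, so we do not
        # reveal whether the email or the password was the wrong part.
        return {"email": email, "error": "Email address or password is incorrect."}
    # Stand-in for the redirect to /account in the real controller.
    return {"redirect": "/account"}


if __name__ == "__main__":
    def fake_auth(email: str, password: str) -> Optional[object]:
        # Toy authenticator for the demo: only accepts the password "test".
        return {"email": email} if password == "test" else None

    print(signin_post("user@example.com", "cat", fake_auth))
    print(signin_post("user@example.com", "test", fake_auth))
```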
9:29 Again, this is plain text password, now I can't just reencrypt the password
9:35 and then test it against the database, because it actually contains randomly generated salt,
9:40 so every time you encrypt it you get a different answer even for the same plain text,
9:45 but what I can do is this: that big string of text stores all the things it needs to basically do that internally without regenerating stuff.
9:56 So what I can come down here and do is first I have to get the account,
10:00 so I'll say find account by email address, right, so that is going to be really easy; does the account exist? Yes or no, and then if it doesn't,
10:08 then we're going to return nothing, but if the account exists, what I need to do
10:13 is actually validate that this plain text run through the same algorithm with the same salt, generates the same crazy character set.
10:20 How do I do that? I go here and I say verify, and I give it the secret, which is just going to be the plain text password,
10:28 and the hash is on the account, like so. Right, let's just remind you over here, it's this, it's that field, alright.
10:38 So we are just going to return, and this returns True or False, so let's say False. Actually, we are going to want the account from this
10:47 so we'll say "if not that...then return account". OK, let's make sure we get this right, we get the account by email address,
10:56 and if we have no account under that email, then we're done, then we're going to verify with the same information, let passlib handle that for us,
11:05 if that doesn't work, we are going to return no account. If it does, return account.
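The whole lookup-then-verify flow can be sketched end to end with the standard library. As assumptions for this example: an in-memory dict replaces the SQLAlchemy-backed account table, PBKDF2-HMAC-SHA512 replaces passlib's `sha512_crypt`, the `verify` function here plays the role of passlib's `sha512_crypt.verify(secret, hash)`, and `create_account` is a hypothetical helper added only so the demo has data. The key point it illustrates is that verification reuses the salt and rounds stored inside the hash string instead of generating new ones.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Account:
    email: str
    password_hash: str


ACCOUNTS: Dict[str, Account] = {}  # in-memory stand-in for the database


def hash_text(plain_text_password: str, rounds: int = 150_000) -> str:
    """Salted, iterated hash in a made-up 'scheme$rounds$salt$digest' format."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", plain_text_password.encode("utf-8"), salt, rounds)
    return f"pbkdf2-sha512${rounds}${salt.hex()}${digest.hex()}"


def verify(plain_text_password: str, stored: str) -> bool:
    """Recompute the hash with the STORED salt and rounds, then compare."""
    try:
        scheme, rounds_text, salt_hex, digest_hex = stored.split("$")
        rounds = int(rounds_text)
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
    except ValueError:
        return False  # not in our format, e.g. a legacy value
    if scheme != "pbkdf2-sha512" or rounds < 1:
        return False
    actual = hashlib.pbkdf2_hmac("sha512", plain_text_password.encode("utf-8"), salt, rounds)
    return hmac.compare_digest(actual, expected)  # constant-time comparison


def create_account(email: str, plain_text_password: str, rounds: int = 150_000) -> Account:
    account = Account(email=email, password_hash=hash_text(plain_text_password, rounds))
    ACCOUNTS[email] = account
    return account


def find_account_by_email(email: str) -> Optional[Account]:
    return ACCOUNTS.get(email)


def get_authenticated_account(email: str, plain_text_password: str) -> Optional[Account]:
    account = find_account_by_email(email)
    if not account:
        return None  # no account under that email
    if not verify(plain_text_password, account.password_hash):
        return None  # wrong password
    return account


if __name__ == "__main__":
    create_account("user@example.com", "test")
    print(get_authenticated_account("user@example.com", "cat"))
    print(get_authenticated_account("user@example.com", "test"))
```

One design choice differs from the demo: this `verify` returns False for a stored value it cannot parse, whereas the lecture notes that an old, differently formatted hash would crash.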
11:09 Alright, let's try it, are you ready to sign in to our first account, OK, sign in, let's see,
11:14 they both have the same password, I am going to try this one and I am going to put first something wrong, I am going to put the word cat,
11:21 cat is not the password bam, look at that, error email address or password incorrect.
11:26 Let me say test, bam, you are logged in and it felt instant to me, I mean, you know how it comes across in the video
11:33 but I am trying to be a little bit loud whacking on the keyboard so you can hear it. So let's do this one, you have to be careful, remember,
11:39 this one, its password wasn't generated with the sha or whatever, so this one I just put in #something and it is going to crash if I try it,
11:48 if I do this one, I'll just log in again, test 1, 2, 3, I'll make a lot of noise here. How quick it is, nice and speedy, it's fine, remember.
11:56 OK, so that is how we manage the accounts, the final thing that we have to see is how do we actually indicate that
12:02 in our website that people are logged in across more than just one request and the trick to that is going to be cookies.