#100DaysOfCode in Python Transcripts
Chapter: Days 37-39: Using CSV data
Lecture: Demo: Converting our CSV data to a usable form
0:00 Now, we've actually parsed the CSV file
0:02 and we're ready to go.
0:03 But we saw that everything that comes back
0:06 is actually just a bunch of strings.
0:09 And we can't really do data analysis,
0:11 numerical operations,
0:13 or date-time operations
0:14 when the data type is a string.
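To make the problem concrete, here is a minimal sketch using the standard library's csv.DictReader on a tiny in-memory sample. The sample row and its values are made up for illustration; only the column names actual_mean_temp and actual_precipitation come from this lecture.

```python
import csv
import io

# Tiny in-memory stand-in for the weather CSV (values are made up).
sample = io.StringIO(
    "date,actual_mean_temp,actual_precipitation\n"
    "2014-7-1,60,0.05\n"
)

reader = csv.DictReader(sample)
first = next(reader)

# Every value comes back as str, even the numeric-looking ones.
print(type(first["actual_mean_temp"]))  # <class 'str'>

# So "+" concatenates text instead of adding numbers.
print(first["actual_mean_temp"] + "1")  # 601
```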
0:17 So what we need to do
0:19 is write a function that will take one of these rows
0:21 and kind of upgrade it to the types
0:23 that we know exist in there.
0:25 So let's go down here.
0:26 We're going to write a function called parse_row.
0:30 We'll give it a row,
0:32 and it's going to return some kind of item.
0:35 So, first of all, let's write one
0:37 that just upgrades the values,
0:39 and then we'll go one step beyond that.
0:42 So we know, if we look over here,
0:45 that we have a date, an actual mean temperature,
0:48 and all of these other values.
0:50 So let's start by coming over here,
0:52 and we're going to upgrade the row's date.
0:56 Actually, let's upgrade the temperature first; it's a little simpler.
0:58 So we'll come over here and use actual_mean_temp;
1:02 that's the column name from the header.
1:05 Now, see, this value is set to the result
1:08 of converting the string to an integer.
1:11 We do the same for actual_min_temp.
1:13 Now, you want to be careful here, of course,
1:17 that you don't cross those over,
1:18 and also that you use int for the integer columns
1:21 and float for the columns that hold floats, and so on.
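Here is a minimal sketch of this first version of parse_row, assuming the row is a dict produced by csv.DictReader. Only three columns are shown; the real header has more, and which ones are int versus float depends on the data. The sample row values are hypothetical.

```python
def parse_row(row):
    # Replace each string value with its numeric type, in place.
    # Column subset is illustrative; the real header has more columns.
    row["actual_mean_temp"] = int(row["actual_mean_temp"])
    row["actual_min_temp"] = int(row["actual_min_temp"])
    row["actual_precipitation"] = float(row["actual_precipitation"])
    # "date" is deliberately left as a string.
    return row


row = {
    "date": "2014-7-1",
    "actual_mean_temp": "60",
    "actual_min_temp": "56",
    "actual_precipitation": "0.05",
}
parsed = parse_row(row)
print(type(parsed["actual_mean_temp"]))      # <class 'int'>
print(type(parsed["actual_precipitation"]))  # <class 'float'>
```

Because the dict is modified in place, parsed and row are the same object after the call.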
1:24 Now, it's not fun to watch me type this out for each one.
1:27 There's a lot of it,
1:28 and it's quite tedious, so let me write this out,
1:30 and then we'll come back to it.
1:35 Alright, here we are. So now we've taken everything we found in the header, except the date,
1:39 and converted it from strings to numbers.
1:42 Sometimes those are integers, sometimes they're floats;
1:44 we paid careful attention to which was the case.
1:46 If you're unsure, just use floats.
1:48 Then we return this row.
1:52 Because we got each value out
1:53 and replaced it with its converted number,
1:56 we should be able to come over here and print this.
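Why floats are the safer default when you're unsure: float() accepts both integer-looking and decimal strings, while int() rejects a decimal string. A quick check with made-up values:

```python
# float() happily parses both kinds of numeric strings.
whole = float("60")       # 60.0
fraction = float("0.05")  # 0.05

# int() refuses a decimal string, so picking int for a float column
# fails loudly with ValueError rather than silently truncating.
try:
    int("0.05")
    int_accepted_decimal = True
except ValueError:
    int_accepted_decimal = False

print(whole, fraction, int_accepted_decimal)  # 60.0 0.05 False
```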
1:57 So, kind of like what we did before,
2:00 we call type() on one of the values
2:02 and print that.
2:03 Now, if I run it...
2:04 Actually, excuse me, first I need to call parse_row
2:08 so that we upgrade this row here.
2:11 And notice these are now integers.
2:14 If I change the column to actual_precipitation,
2:19 you'll see these are floats.
2:22 Here they are.
2:23 Those look like floats, don't they?
2:25 Great, so we've upgraded this,
2:28 and it's already in really good shape.
2:31 There's one other thing I want to do to make it nicer,
2:34 and that is to use a data type that ships with Python
2:38 that lets you work directly with these values
2:43 in a really simple way. It's called a namedtuple.
2:46 So let's go to the top here and say
2:49 import collections,
2:51 and inside collections there's this thing called
2:53 namedtuple.
2:54 With it we can define a new type,
2:58 something better than this basic dictionary,
3:00 which is cumbersome to work with.
3:01 You've got to call .get(), the value might not be there,
3:04 things like that. So let's define a Record,
3:07 like a weather record, and we're going to
3:09 do that by calling collections.namedtuple.
3:11 Now, you give it two pieces of information.
3:14 First, its name.
3:17 You basically repeat the variable name, so you tell it,
3:20 "I'm calling you Record, so your name is Record,"
3:24 just so the type knows what its own name is.
3:27 The second thing you give it is simply this
3:31 giant comma-separated string of column names up here, like so.
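A minimal sketch of the two arguments to collections.namedtuple. The field list here is a hypothetical subset of the CSV header; the real call passes every column name.

```python
import collections

# First argument: the type's name, matching the variable it's assigned to.
# Second argument: the field names, here as one comma-separated string
# (a hypothetical subset of the CSV header).
Record = collections.namedtuple(
    "Record", "date,actual_mean_temp,actual_min_temp,actual_precipitation"
)

print(Record.__name__)  # Record
print(Record._fields)
# ('date', 'actual_mean_temp', 'actual_min_temp', 'actual_precipitation')
```

namedtuple also accepts the field names as a list of strings, such as ["date", "actual_mean_temp"], or as a whitespace-separated string.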
3:35 Now, PyCharm says, whoa, whoa, whoa, what're you doing?
3:37 That's, like, column 200; this is insane.
3:40 So let's do some line wrapping,
3:43 just so people can read what's going on.
3:45 Python will turn those pieces back into one long string,
3:49 because there's no comma separating the string literals.
3:52 These commas might look like they're separating them,
3:53 but they're on the inside of the quotes, not the outside.
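Here is a small sketch of that line wrapping: adjacent string literals inside parentheses are joined into a single string at compile time. The column names beyond those mentioned in the lecture (such as actual_max_temp) are assumptions for illustration.

```python
import collections

# Adjacent string literals inside parentheses are joined at compile time,
# so this wraps across lines but is still ONE string. The commas sit
# inside the quotes, so they are part of the text.
field_names = (
    "date,actual_mean_temp,actual_min_temp,"
    "actual_max_temp,actual_precipitation"
)
print(field_names)
# date,actual_mean_temp,actual_min_temp,actual_max_temp,actual_precipitation

Record = collections.namedtuple("Record", field_names)
print(len(Record._fields))  # 5
```

One thing to watch: if the comma at the end of a wrapped piece is dropped, the two neighboring names fuse into one field (actual_min_tempactual_max_temp) without any error.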
3:56 Okay, so this lets us define a type.
3:58 Now come down here, and I could actually upgrade this,
4:01 so I could say r = Record(...),
4:06 and if you look at what it takes,
4:07 it takes a date,
4:09 all the temperatures, and so on.
4:11 And I could say date equals this,
4:13 actual_mean_temp equals this,
4:15 actual_min_temp equals that.
4:17 So I could say date=row.get('date'), comma.
4:24 Did I say data?
4:25 Oh, no, date, yeah, date.
4:27 So date equals that, and actual_mean_temp equals
4:35 row.get('actual_mean_temp'). This is starting to feel tedious, isn't it?
4:39 And we'd have to do this for every one of those.
4:41 It turns out that because these rows are dictionaries,
4:44 there's a shorthand for writing this
4:47 for every element in the dictionary.
4:50 I want to say: take the value stored under date
4:52 and pass it as the date argument;
4:53 take the value stored under actual_mean_temp
4:56 and pass it as actual_mean_temp, and so on.
4:57 The way you do that is you say **row.
5:01 Star-star row just says, well, do what I erased:
5:05 for each key in the dictionary, set the argument
5:07 with that name to the value from the dictionary.
5:09 And because the keys in the dictionary
5:12 are literally the field names we put right here,
5:14 this is going to match exactly, and this will work.
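A minimal sketch comparing the long keyword-argument form with the **row shorthand. The Record fields and row values are hypothetical; the point is that ** turns each dictionary key into a keyword argument of the same name, so the keys must match the field names exactly.

```python
import collections

Record = collections.namedtuple(
    "Record", "date,actual_mean_temp,actual_precipitation"
)
row = {"date": "2014-7-1", "actual_mean_temp": 60, "actual_precipitation": 0.05}

# Long form: pass each value by keyword yourself.
long_form = Record(
    date=row.get("date"),
    actual_mean_temp=row.get("actual_mean_temp"),
    actual_precipitation=row.get("actual_precipitation"),
)

# Shorthand: ** expands the dict into keyword arguments (key -> argument name).
short_form = Record(**row)
print(long_form == short_form)  # True

# A key that isn't a field name (or a missing field) raises TypeError.
try:
    Record(**{**row, "extra_column": 1})
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```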
5:17 Alright, so now let's return
5:18 this little record thing instead of the row,
5:20 and we can give the variable a better name, record.
5:24 There we go.
5:25 And we'll call it record here, too.
5:28 Now we go over here, type record and a dot,
5:30 and notice it gives us a nice,
5:31 beautiful list of all the fields in there.
5:34 We don't have to use this dictionary style over here,
5:37 where I have to know the key as a string,
5:39 'actual_precipitation'.
5:40 I just say record.actual_precipitation.
5:45 Then we print out the value
5:47 and run it again.
5:48 So now it should work just the same, and boom, it does.
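A short sketch of what attribute access buys over dictionary lookups. The Record fields and values are hypothetical. A namedtuple is still a tuple, so indexing and unpacking work too; unlike the dict it replaces, it's also read-only.

```python
import collections

Record = collections.namedtuple(
    "Record", "date,actual_mean_temp,actual_precipitation"
)
record = Record(date="2014-7-1", actual_mean_temp=60, actual_precipitation=0.05)

# Attribute access: no string keys to get right, and editors can autocomplete it.
print(record.actual_precipitation)  # 0.05

# Still a tuple underneath: indexing and unpacking also work.
print(record[0])  # 2014-7-1
date, mean_temp, precip = record

# Unlike the dict, a namedtuple is immutable.
try:
    record.actual_precipitation = 1.0
    is_read_only = False
except AttributeError:
    is_read_only = True
print(is_read_only)  # True
```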
5:52 Okay, whew, so we've now converted our data.
5:56 The last thing to do is to store it.
5:58 So instead of printing it out, which is kind of fun
6:00 but generally not helpful,
6:02 we're going to go to our data list and say data.append(record).
6:07 Alright, and just in case somebody goes
6:09 and calls this a second time,
6:12 we don't want to load everything twice. Whoops, not record;
6:15 we'll say data.clear().
6:17 So we reset the data and then load the new data
6:20 from this file, just in case you run it twice,
6:23 which probably isn't going to happen.
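Putting the pieces together, here is a minimal sketch of the loading step. The function name load, its file-like argument, and the three-column header are hypothetical; the lecture's own loader reads the CSV file from disk into a module-level data list. Calling data.clear() first is what makes a second call safe.

```python
import collections
import csv
import io

# Hypothetical three-column subset of the header.
Record = collections.namedtuple(
    "Record", "date,actual_mean_temp,actual_precipitation"
)

# Module-level list, like the data list the lecture prints as research.data.
data = []


def parse_row(row):
    # Upgrade the numeric columns, then build a Record from the dict.
    row["actual_mean_temp"] = int(row["actual_mean_temp"])
    row["actual_precipitation"] = float(row["actual_precipitation"])
    return Record(**row)


def load(csv_file):
    # Hypothetical loader taking any file-like object.
    # Clear first so a second call doesn't duplicate every record.
    data.clear()
    for row in csv.DictReader(csv_file):
        data.append(parse_row(row))


SAMPLE = (
    "date,actual_mean_temp,actual_precipitation\n"
    "2014-7-1,60,0.0\n"
    "2014-7-2,62,0.05\n"
)

load(io.StringIO(SAMPLE))
load(io.StringIO(SAMPLE))  # calling twice still leaves just two records
print(len(data))  # 2
```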
6:25 Alright, so now we're basically ready;
6:28 let's just check that this worked.
6:29 Let's go over here and print research.data,
6:33 just to see that we got something that looks meaningful.
6:37 And look at that, we did.
6:38 There's Record, then the date,
6:41 which is still a string because we didn't parse it;
6:42 we're just keeping it simple.
6:44 Then actual_mean_temp and so on;
6:46 you can see this goes on to the right for a very long way.
6:49 But it did exactly what we wanted.
6:51 So it looks like we're off to the races.
6:53 Now for the final bit, which is actually the easy part:
6:56 let's answer the questions, now that the data
6:57 is super structured and easy to work with.