#100DaysOfCode in Python Transcripts
Chapter: Days 37-39: Using CSV data
Lecture: Demo: Converting our CSV data to a usable form

0:00 Now, we've actually parsed the CSV file and we're ready to go. But we saw that everything that comes back is actually just a bunch of strings.
0:10 And we can't really do data analysis that way: we want numerical operations or date-time operations, but the data type of every value is a string.
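
To make the problem concrete, here is a minimal sketch of what the csv module hands back. The two-line sample and its values are made up for illustration, and the underscore column names are assumed from the names spoken in the lecture; the real file has more columns.

```python
import csv
import io

# Hypothetical two-line sample standing in for the weather CSV file.
sample = io.StringIO(
    "date,actual_mean_temp,actual_precipitation\n"
    "2014-7-1,77,0.00\n"
)

rows = list(csv.DictReader(sample))
first = rows[0]

# Every value is a str, even the ones that look numeric.
print(type(first["actual_mean_temp"]))      # <class 'str'>
print(type(first["actual_precipitation"]))  # <class 'str'>

# "Adding" two of these values concatenates text instead of summing numbers.
print(first["actual_mean_temp"] + first["actual_mean_temp"])  # 7777
```

The last line shows why analysis on the raw rows goes wrong: string concatenation succeeds silently, so nothing warns you that the numbers were never numbers.
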
0:18 So what we need to do is write a function that takes one of these rows and upgrades its values to the types
0:24 that we know exist in there. So let's go down here and write a function called parse_row. We'll give it a row
0:33 and it's going to return some kind of item. So, first of all, let's write one that just actually upgrades the values.
0:40 And we'll do one more step beyond that. So we know, if we look over here, that we have a date, an actual mean temperature, and all of these things.
0:51 So let's start by coming over here. We could upgrade the row's date, but let's upgrade the temperature first; it's a little simpler.
0:59 So we'll come over here and we'll say actual mean temp, that's from the header. Now we set this value to the result
1:09 of converting it to an integer. And we also have the actual min temp. Now, you want to be careful here, of course, that you don't cross those over,
1:19 but also that you're using integers where the values are integers and floats where they are floats, and so on.
1:25 Now, this is not fun to watch me type this out for each one, and there's a lot of it, it's quite tedious so let me write this out.
1:31 And then we'll come back to it. Alright, here we are. So now we've taken everything that we found in the header
1:40 and we've done a conversion from strings to numbers. Sometimes those are integers, sometimes those are floats;
1:45 we paid careful attention to which was the case. If you're unsure, just use floats. Then we're going to return this row,
1:53 and because we got each value out and replaced it with its numeric version, we should be able to come over here and print this.
1:58 So if I do this and, like we did before, print the type of each value, and then run it...
2:05 Actually, excuse me, I need to call parse_row first so that we upgrade this row here. Now notice these are integers,
2:15 and if I change the column to the actual precipitation, you'll see these are floats. Here they are. Those look like floats, don't they?
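
As a rough sketch of the function described so far, parse_row might look like the following. Only the columns named in the lecture are converted here, and the underscore column names and the sample values are assumptions; the real function does this for every column in the header.

```python
def parse_row(row):
    """Replace the string values in a CSV row with numbers, in place."""
    # Whole-number columns become int.
    row["actual_mean_temp"] = int(row["actual_mean_temp"])
    row["actual_min_temp"] = int(row["actual_min_temp"])
    # Columns with decimals become float. If unsure, float is the safe choice.
    row["actual_precipitation"] = float(row["actual_precipitation"])
    # The date is left as a string, as in the lecture.
    return row


# Hypothetical row, shaped like what csv.DictReader yields.
row = {
    "date": "2014-7-1",
    "actual_mean_temp": "77",
    "actual_min_temp": "60",
    "actual_precipitation": "0.00",
}

parsed = parse_row(row)
print(type(parsed["actual_mean_temp"]), parsed["actual_mean_temp"])
print(type(parsed["actual_precipitation"]), parsed["actual_precipitation"])
```

Note that this version modifies and returns the same dictionary rather than building a new one, which matches the "get the value out and replace it" approach in the demo.
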
2:26 Great, so we've upgraded this, and it's already in really good shape. There's one other thing I want to do to make it nicer.
2:35 And that is to use a data type that's built into Python that helps you work directly with sort of these values
2:44 in a really simple way, and it's called a namedtuple. So let's go to the top here and say import collections,
2:52 and inside collections there's this thing called namedtuple. So we can actually define another type, something better than this basic dictionary,
3:01 which is cumbersome to work with: you have to do the .get, the value might not be there, things like that. So let's define a record,
3:08 like a weather record, and we're going to do that by saying collections.namedtuple. Now, you give it two pieces of information.
3:18 First, you basically repeat the name: you've assigned the type to a variable, and you pass that same name in as a string so the type knows what it's called.
3:28 And then the next thing you give it is simply this giant list of column names from up here, okay, like so. Now, PyCharm says, whoa whoa whoa, what're you doing,
3:38 that's like line, or column 200, this is insane. So why don't you do some line wrappings, just so people can read what's going on, right.
3:46 And Python will turn that back into one long string, because there's no comma between the pieces. The commas you see might look like they're separating them,
3:54 but they're inside the quotes, not outside. Okay, so this is going to let us define a type. So come down here, and I could actually upgrade this,
4:02 so I could say, r equals record, and if you look and see what it takes, it takes a date, a temperature, all the temperatures and so on.
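
The definition described above can be sketched like this. The type name Record is taken from the output shown later in the lecture; the field list is a shortened, assumed subset of the real header, which is much longer.

```python
import collections

# First argument: the type's own name, repeated as a string.
# Second argument: the field names. The adjacent string literals below are
# joined by Python into one long string, because no comma sits between them;
# the commas are inside the quotes.
Record = collections.namedtuple(
    "Record",
    "date, actual_mean_temp, actual_min_temp, "
    "actual_precipitation",
)

print(Record.__name__)
print(Record._fields)
```

Wrapping the string this way keeps each line short without changing what namedtuple receives.
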
4:12 And I could say date equals this, mean temperature equals this, min temperature equals that. So I could say date equals row.get('date'), comma.
4:25 Did I say data? Oh, no, date, yeah, date. So date equals that, and actual mean temperature equals row.get... this is starting to feel tedious, isn't it?
4:40 And we've got to do this for every one of those. It turns out that because these rows are dictionaries, there's a shorthand that makes this kind of statement
4:48 for every element in the dictionary. So what I want to say is: if date is in there, go get its value and assign it as date.
4:54 If actual mean temp is in there, go get that value and assign it to this. And the way you do that is you say **row.
5:02 **row just says, well, do what I erased, right. Set this argument to its value from the dictionary,
5:08 set that argument to its value from the dictionary, and because the keys in the dictionary are literally the field names we put right here,
5:15 this is going to match exactly and this will work. Alright, so now let's return this little record thing instead, and we can give it a better name: record.
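
Here is a small sketch of why the shorthand is equivalent to the tedious version. The three-field Record and the row values are assumptions made to keep the example short.

```python
import collections

# Hypothetical, shortened record type; the real one has more fields.
Record = collections.namedtuple(
    "Record", "date, actual_mean_temp, actual_precipitation"
)

# A row after parse_row has converted the numeric values.
row = {"date": "2014-7-1", "actual_mean_temp": 77, "actual_precipitation": 0.0}

# The long way: one keyword argument per field.
longhand = Record(
    date=row.get("date"),
    actual_mean_temp=row.get("actual_mean_temp"),
    actual_precipitation=row.get("actual_precipitation"),
)

# The shorthand: ** turns each key/value pair into a keyword argument.
shorthand = Record(**row)

print(longhand == shorthand)  # True
```

The match has to be exact: if the dictionary has a key that is not a field name, or is missing one, the call raises a TypeError instead of building the record.
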
5:25 There we go. And so we'll call it record here. Now we go over here, type record and then a dot, and notice it gives us a nice,
5:32 beautiful list of all the stuff that's in there. We don't have to use this dictionary style over here; I don't have to know the key is a string
5:40 and that it's the actual precipitation, I just say record.actual_precipitation. Then we print out the value, and we'll run it again.
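
A short sketch comparing the two access styles, again with an assumed three-field Record and made-up values. The misspelled name is deliberate.

```python
import collections

# Hypothetical, shortened record type; the real one has more fields.
Record = collections.namedtuple(
    "Record", "date, actual_mean_temp, actual_precipitation"
)

row = {"date": "2014-7-2", "actual_mean_temp": 80, "actual_precipitation": 0.25}
record = Record(**row)

# Dictionary style: the key is a string you have to remember and type out.
print(row.get("actual_precipitation"))
# namedtuple style: a real attribute, so the editor can autocomplete it.
print(record.actual_precipitation)

# A misspelled key quietly gives None with .get ...
print(row.get("actual_precipitaton"))  # None

# ... while a misspelled attribute fails loudly.
try:
    record.actual_precipitaton
except AttributeError as error:
    print("AttributeError:", error)
```

Failing loudly on a typo is part of what makes the record type less cumbersome than the plain dictionary.
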
5:49 So now it should work just the same, boom, it does. Okay, whew, so we've now converted our data, the last thing to do is to store it.
5:59 So instead of printing it out, which is kind of fun but generally not helpful, we're going to go to our data and say data.append(record).
6:08 Alright, and just in case somebody goes and calls this a second time, we don't want to load everything twice, so before the loop we'll say data.clear().
6:18 So we'll reset the data and then we'll load the new data from this file just in case you run it twice, probably not going to happen.
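
Putting the pieces together, the load step might look roughly like this. The function name init, the in-memory sample, and the shortened field list are all assumptions for illustration; the lecture reads from a real CSV file with more columns.

```python
import collections
import csv
import io

# Hypothetical, shortened record type; the real one has more fields.
Record = collections.namedtuple(
    "Record", "date, actual_mean_temp, actual_precipitation"
)

data = []  # module-level list that callers read after loading


def parse_row(row):
    """Convert the numeric columns, then build a Record from the row."""
    row["actual_mean_temp"] = int(row["actual_mean_temp"])
    row["actual_precipitation"] = float(row["actual_precipitation"])
    return Record(**row)


def init(csv_file):
    """Load records from an open CSV file (the name init is assumed)."""
    data.clear()  # reset first, so a second call does not duplicate rows
    for row in csv.DictReader(csv_file):
        data.append(parse_row(row))


# Made-up sample data standing in for the weather file.
SAMPLE = (
    "date,actual_mean_temp,actual_precipitation\n"
    "2014-7-1,77,0.00\n"
    "2014-7-2,80,0.25\n"
)

init(io.StringIO(SAMPLE))
init(io.StringIO(SAMPLE))  # calling twice still leaves two records, not four
print(len(data))  # 2
print(data[0])
```

Using data.clear() instead of rebinding data to a new list keeps any existing references to the list valid.
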
6:26 Alright, so now we're basically ready. Let's just check and see that this worked. Let's go over here and just print research.data,
6:34 just to see that we got something that looks meaningful. And look at that, we did. Record, the date is set, the mean is... we actually didn't parse the date,
6:43 we're just keeping it simple. Actual mean temperature and so on; you can see this goes on to the right for a long way. But it did exactly what we wanted.
6:52 So it looks like we're off to the races. Now the final bit, and actually this is the easy part: let's answer the questions, now that the data
6:58 is nicely structured to work with.

