#100DaysOfCode in Python Transcripts
Chapter: Days 37-39: Using CSV data
Lecture: Demo: Converting our CSV data to a usable form
0:00
Now, we've actually parsed the CSV file
0:02
and we're ready to go.
0:03
But we saw that everything that comes back
0:06
is actually just a bunch of strings.
0:09
And we can't really do data analysis
0:11
when we want to do numerical operations
0:13
or date-time operations
0:14
but the data type is a string.
0:17
So what we need to do
0:19
is write a function that will take one of these rows
0:21
and kind of upgrade its values to the types
0:23
that we know exist in there.
0:25
So let's go down here,
0:26
we're going to write a function called parse_row.
0:30
And we'll give it a row
0:32
and it's going to return some kind of item.
0:35
So, first of all, let's write one
0:37
that just actually upgrades the values.
0:39
And we'll do one more step beyond that.
0:42
So, we know, if we look over here,
0:45
that we have a date, we have an actual mean
0:48
and all of these things.
0:50
So, let's start by coming over here
0:52
and we're going to upgrade the row's date.
0:56
Let's upgrade the temperature first, it's a little simpler.
0:58
So we'll come over here and we'll say actual mean temp,
1:02
that's from the header.
1:05
Now, see, this value is actually the result
1:08
of converting it to an integer.
1:11
And we also have the actual min temp.
1:13
Now, you want to be careful here, of course
1:17
that you don't cross those over
1:18
but also, that you're using integers with the integers
1:21
and the floats where there are floats and so on.
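A minimal sketch of the conversion being typed here. The column names are assumed from the header as it is read out in the video, and the sample row is made up for illustration.

```python
def parse_row(row):
    # Each value arrives from the CSV reader as a string,
    # so replace it with the converted number under the same key.
    row['actual_mean_temp'] = int(row['actual_mean_temp'])
    row['actual_min_temp'] = int(row['actual_min_temp'])
    return row


# Hypothetical sample row; real rows come from the CSV reader.
sample = {'date': '2014-7-1', 'actual_mean_temp': '81', 'actual_min_temp': '72'}
parsed = parse_row(sample)
print(parsed['actual_mean_temp'] + parsed['actual_min_temp'])
```

Because the values are now integers, the addition is arithmetic rather than string concatenation.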
1:24
Now, this is not fun to watch me type this out for each one,
1:27
and there's a lot of it,
1:28
it's quite tedious so let me write this out.
1:30
And then we'll come back to it.
1:35
Alright, here we are. So now we've taken everything that we found in the header
1:39
and we've done a conversion from strings to numbers.
1:42
Sometimes those are integers, sometimes those are floats;
1:44
we paid careful attention to when that was the case.
1:46
If you're unsure, just use floats.
1:48
Then we're going to return this row,
1:52
and by getting each value out
1:53
and replacing it with the converted number,
1:56
we should be able to come over here and print this.
1:57
So if I do this, kind of like what we did before,
2:00
we print the type of that value.
2:02
We print those.
2:03
Now, if I run it...
2:04
Actually, excuse me, I need to call parse_row first
2:08
so that we upgrade this row here.
2:11
And notice these are now integers.
2:14
If I change the column to the actual precipitation,
2:19
you'll see these are floats.
2:22
Here they are.
2:23
Those look like floats, don't they?
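The type check shown on screen could look roughly like this. It is a sketch under assumptions: the column names come from the spoken header, and the sample values are invented.

```python
def parse_row(row):
    # Whole-number columns become ints, fractional columns become floats.
    row['actual_mean_temp'] = int(row['actual_mean_temp'])
    row['actual_precipitation'] = float(row['actual_precipitation'])
    return row


# Hypothetical row as the CSV reader would deliver it: all strings.
row = parse_row({
    'date': '2014-7-1',
    'actual_mean_temp': '81',
    'actual_precipitation': '0.12',
})

# Print the type next to the value to confirm the upgrade happened.
print(type(row['actual_mean_temp']), row['actual_mean_temp'])
print(type(row['actual_precipitation']), row['actual_precipitation'])
```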
2:25
Great, so we've upgraded this
2:28
and it's already in really good shape.
2:31
There's one other thing I want to do to make it nicer.
2:34
And that is to use a data type that's built into Python
2:38
that helps you work directly with sort of these values
2:43
in a really simple way, and it's called a namedtuple.
2:46
So let's go to the top here and we'll say
2:49
import collections,
2:51
and inside collections there's this thing called
2:53
namedtuple.
2:54
So we can actually define another type
2:58
something better than this basic dictionary
3:00
which is cumbersome to work with.
3:01
You got to do the .get, the value might not be there,
3:04
things like that, so let's define a record,
3:07
like a weather record, and we're going to
3:09
do that by saying, collections.namedtuple.
3:11
Now, you give it two pieces of information.
3:14
First, you...
3:17
You basically repeat the name, so you tell it,
3:20
I'm assigning you to this variable, and your name is the same thing,
3:24
just so it knows what its own name is.
3:27
And then the next thing you give it is simply this
3:31
giant header string up here, okay, like so.
3:35
Now, PyCharm says, whoa whoa whoa, what're you doing,
3:37
that's like line, or column 200, this is insane.
3:40
So let's do some line wrapping,
3:43
just so people can read what's going on, right?
3:45
And Python will turn that back into one long string
3:49
because there's no comma separating the pieces.
3:52
The commas might look like they're separating them,
3:53
but they're inside the quotes, not outside.
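As a sketch of the definition described here, using a shortened field list (the full header in the video is longer, and these particular names are an assumption based on what is read out):

```python
import collections

# Two adjacent string literals with no comma between them are joined
# by Python into one string. The commas live inside the quotes, so they
# separate field names rather than arguments.
Record = collections.namedtuple(
    'Record',
    'date,actual_mean_temp,actual_min_temp,'
    'actual_max_temp,actual_precipitation'
)

print(Record._fields)
```

The first argument repeats the name the type is assigned to, and the second is the single comma-separated string of field names.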
3:56
Okay, so this is going to let us define a type,
3:58
so come down here, and I could actually upgrade this,
4:01
so I could say, r equals record,
4:06
and if you look and see what it takes,
4:07
it takes a date, a temperature,
4:09
all the temperatures and so on.
4:11
And I could say date equals this,
4:13
mean temperature equals this,
4:15
min temperature equals that.
4:17
So I could say date equals row.get('date'), comma.
4:24
Did I say data?
4:25
Oh, no, date, yeah, date.
4:27
So date equals that, and actual mean temperature equals
4:35
row.get... this is starting to feel tedious, isn't it?
4:39
And we've got to do this for every one of those.
4:41
It turns out that, because these rows are dictionaries,
4:44
there's a shorthand that makes this assignment
4:47
for every element in the dictionary.
4:50
So if I want to say, if date is in there,
4:52
go get the value and assign it as date.
4:53
If actual mean temp is in there,
4:56
go get that value and assign it to this,
4:57
and the way you do that is you say **row.
5:01
Star star row just says, well, do what I erased, right.
5:05
Set this value to the value from the dictionary,
5:07
set this argument value to the value of the dictionary,
5:09
and because what's in the dictionary
5:12
is literally what we put right here,
5:14
this is going to match exactly and this will work.
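A small sketch of the `**row` shorthand, with a reduced set of fields assumed for brevity:

```python
import collections

Record = collections.namedtuple(
    'Record', 'date,actual_mean_temp,actual_min_temp')


def parse_row(row):
    row['actual_mean_temp'] = int(row['actual_mean_temp'])
    row['actual_min_temp'] = int(row['actual_min_temp'])
    # Long form: Record(date=row.get('date'),
    #                   actual_mean_temp=row.get('actual_mean_temp'), ...)
    # **row passes every key/value pair as a keyword argument instead.
    return Record(**row)


record = parse_row(
    {'date': '2014-7-1', 'actual_mean_temp': '81', 'actual_min_temp': '72'})
print(record)
```

This only works because the dictionary keys match the namedtuple field names exactly. A key that is not a field, or a field with no matching key, would raise a `TypeError`.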
5:17
Alright, so now let's return
5:18
this little record thing instead
5:20
and we can rename it better to record.
5:24
There we go.
5:25
And so we'll call it record here.
5:28
Now we go over here, type record and a dot,
5:30
and notice it gives us a nice,
5:31
beautiful list of all the stuff that's in there.
5:34
We don't have to use this style over here.
5:37
I don't have to know the key is a string
5:39
and that it's the actual precipitation,
5:40
I just say record.actual_precipitation.
5:45
Then we print out the value,
5:47
then we'll do it again.
5:48
So now it should work just the same, boom, it does.
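To compare the two access styles side by side, here is a short sketch with an invented, already-converted row:

```python
import collections

Record = collections.namedtuple(
    'Record', 'date,actual_mean_temp,actual_precipitation')

# Hypothetical row whose values have already been converted.
row = {'date': '2014-7-1', 'actual_mean_temp': 81, 'actual_precipitation': 0.12}
record = Record(**row)

# Dictionary style: the key is a string you have to remember and type.
print(row['actual_precipitation'])

# namedtuple style: a plain attribute that an editor can autocomplete.
print(record.actual_precipitation)
```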
5:52
Okay, whew, so we've now converted our data,
5:56
the last thing to do is to store it.
5:58
So instead of printing out, which is kind of fun,
6:00
but generally not helpful,
6:02
we're going to go to our data and say data.append(record).
6:07
Alright, and just in case somebody goes
6:09
and calls this a second time, right,
6:12
we don't want to overdo this... whoops, not record,
6:15
we'll say data.clear().
6:17
So we'll reset the data and then we'll load the new data
6:20
from this file, just in case you run it twice,
6:23
though that's probably not going to happen.
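Putting the pieces together, the load step might be sketched like this. The function name `init`, the in-memory sample CSV, and the reduced set of columns are assumptions made so the example runs on its own; the video reads from a file on disk.

```python
import collections
import csv
import io

data = []

Record = collections.namedtuple(
    'Record', 'date,actual_mean_temp,actual_precipitation')

# Stand-in for the real CSV file so the sketch is self-contained.
SAMPLE_CSV = (
    "date,actual_mean_temp,actual_precipitation\n"
    "2014-7-1,81,0.00\n"
    "2014-7-2,85,0.12\n"
)


def parse_row(row):
    row['actual_mean_temp'] = int(row['actual_mean_temp'])
    row['actual_precipitation'] = float(row['actual_precipitation'])
    return Record(**row)


def init(source=SAMPLE_CSV):
    # Reset first so a second call does not duplicate every record.
    data.clear()
    reader = csv.DictReader(io.StringIO(source))
    for row in reader:
        data.append(parse_row(row))


init()
init()  # calling it twice still leaves one copy of each row
print(len(data))
```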
6:25
Alright, so now we're basically ready,
6:28
let's just check and see that this worked.
6:29
Let's go over here and just print research.data,
6:33
just to see that we got something that looks meaningful.
6:37
And look at that, we did.
6:38
Record, the date is there,
6:41
though we actually didn't parse the date,
6:42
we're just keeping it simple.
6:44
Actual mean temperature and so on,
6:46
you can see this goes on to the right for a long way.
6:49
But it did exactly what we wanted.
6:51
So it looks like we're off to the races.
6:53
Now the final bit, actually this is the easy part,
6:56
let's answer the questions now that the data
6:57
is nicely structured and easy to work with.