Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 9: Real Estate Analysis App
Lecture: CSV Processing From Scratch
Login or
purchase this course
to watch this video and the rest of the course contents.
0:01
So now it's time to load the data,
0:03
we've got everything just lined up for us to grab this file and run with it.
0:05
Let's do a quick little context manager
0:07
we'll do a with open file name as we'll do read only text,
0:12
and let's go and set the encoding, to utf-8,
0:17
remember, this is the file we are going after and it's right at the top
0:22
you can see we have the street then the city,
0:25
then the zip code, then the state, beds, bath and so on,
0:27
this will give us information about all the real estate purchases
0:29
that have happened over here.
0:31
When we are processing the csv, there is actually a couple of options,
0:35
one option would be to just parse this directly as strings,
0:41
now three is another option we'll use in just a minute
0:44
but let me just kind of sketch that out for you
0:47
because understanding how that might go,
0:50
would help you approach file formats that there is not a built in reader for.
0:53
The first thing we need to do is capture the header,
0:55
and that's not the thing of the top that we are printing out but you know,
0:58
the sort of column names at the top of this,
1:00
so we can say fin.readline() singular
1:03
and we could do something like print found header,
1:06
to be and then we'll just print that out.
1:08
And then the next thing we need to do is we need to go through
1:12
line by line and turn this into some kind of data structure,
1:15
we'll start simple and we'll get something more interesting in a minute,
1:18
so we can say for line in fin:, now we could have just started out this way
1:23
but remember the first line is the header
1:26
and then every subsequent line is actually the data,
1:28
so we do one read line and then we stream over it,
1:31
and we need to turn the line into the individual columns of data
1:36
and with the csv comma separated value one way let's call this line data,
1:42
one way to do this would be to say line.split() and we can just split on the comma,
1:47
now this is not perfect because you'll see that there are ways
1:50
to like escape commas and so on
1:52
and the edge cases can be a little bit tricky,
1:55
so we are not going to end up using this,
1:57
but let me just print out well, let's build up all this we'll say lines,
2:00
and we'll gather this here, and we'll say lines.append(line_data),
2:06
and then let's just print out let's say the first bunch here so we'll say print lines,
2:11
we don't want all like thousands of lines, so let's just say we'll take the first line,
2:15
remember we can slice from 0 to 5 and just get the first lines like so.
2:19
So let's run this to see what happens.
2:22
Fantastic, here is our header, that looks good
2:24
and look here is a string for High Street,
2:27
the string for Sacramento, now there is two lists here,
2:30
and we have basically a list of lists,
2:34
that was the first row, that's the second row,
2:35
now notice the very last bit has a \n
2:39
and that's because it's coming form when we read a line out of a file,
2:43
a text file in Python it keeps that on the end there,
2:46
so let's do a strip and we can strip the bits off the front and the back
2:51
and I suppose we could go and do that with the header as well,
2:54
now we try that again, perfect, our latitude is looking solid.
2:59
Ok, so now if we wanted to go work with the data, remember,
3:05
this is not where we are going to stop, this is where we are starting,
3:07
if we wanted to work with like the number of beds
3:10
if I want to just print out those,
3:12
I could somehow go over to this bit here and for each line, and the line data,
3:16
I could come over and put bed was I think the fifth one, it didn't really matter,
3:21
let's say it's a fifth one, then I would say 4 and I could say bed count equals that,
3:25
ok, now there is a couple of challenges,
3:27
because this actually comes back as a string so I need to parse it and so on,
3:31
but this is sort of the basic way of working with this data.
3:34
So that is kind of how we would attack csv files from scratch,
3:39
but Python has built in support for csv and we can do better,
3:43
so let's use the csv module and then build on that.