Python Jumpstart by Building 10 Apps Transcripts
Chapter: App 9: Real Estate Analysis App
Lecture: CSV Processing From Scratch
0:01
So now it's time to load the data, we've got everything just lined up for us to grab this file and run with it. Let's do a quick little context manager
0:08
we'll do a with open(filename, 'r', encoding='utf-8') as fin: so that's read-only text, and we set the encoding to utf-8;
0:18
remember, this is the file we are going after and it's right at the top you can see we have the street then the city,
0:26
then the zip code, then the state, beds, bath and so on, this will give us information about all the real estate purchases
0:30
that have happened over here. When we are processing the csv, there are actually a couple of options,
0:36
one option would be to just parse this directly as strings, now there is another option we'll use in just a minute
0:45
but let me just kind of sketch that out for you because understanding how that might go,
0:51
would help you approach file formats that there is not a built in reader for. The first thing we need to do is capture the header,
0:56
and that's not the thing at the top that we are printing out but, you know, the column names at the top of this file,
1:01
so we can say header = fin.readline(), singular, and we could do something like print('found header: ' + header) and just print that out.
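As a rough sketch of the steps so far, the snippet below opens a file with a context manager and reads only the header with readline(). The file name and the sample rows are made up here so the example runs on its own; they are assumptions, not the course's actual data file.

```python
import os
import tempfile

# Hypothetical sample data standing in for the course's CSV file.
sample = (
    "street,city,zip,state,beds,baths\n"
    "123 EXAMPLE ST,SACRAMENTO,95838,CA,2,1\n"
)

# Write the sample to a temporary file so there is something to open.
filename = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(filename, "w", encoding="utf-8") as fout:
    fout.write(sample)

# Read-only text mode with an explicit encoding.
with open(filename, "r", encoding="utf-8") as fin:
    header = fin.readline()  # consumes only the first line of the file
    print("found header: " + header)
```

Because readline() advances the file position, anything read from fin afterwards starts at the first data row.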
1:09
And then the next thing we need to do is we need to go through line by line and turn this into some kind of data structure,
1:16
we'll start simple and we'll get something more interesting in a minute, so we can say for line in fin:, now we could have just started out this way
1:24
but remember the first line is the header and then every subsequent line is actually the data, so we do one read line and then we stream over it,
1:32
and we need to turn the line into the individual columns of data, and since this is csv, comma separated values, let's call this line_data,
1:43
one way to do this would be to say line.split(',') and we can just split on the comma, now this is not perfect because you'll see that there are ways
1:51
to like escape commas and so on and the edge cases can be a little bit tricky, so we are not going to end up using this,
1:58
but let me just print out, well, let's build all this up: we'll say lines = [], we'll gather the rows here, and we'll say lines.append(line_data),
2:07
and then let's just print out, let's say, the first bunch here, so we'll say print(lines),
2:12
we don't want all, like, thousands of lines, so let's just say we'll take the first few lines,
2:16
remember we can slice from 0 to 5 and just get the first five lines like so. So let's run this to see what happens.
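Here is a minimal sketch of that loop, assuming a couple of invented rows. io.StringIO stands in for the opened file (fin) so the example is self-contained; the addresses and values are hypothetical.

```python
import io

# Hypothetical rows; io.StringIO behaves like the open text file.
fin = io.StringIO(
    "street,city,zip,state,beds,baths\n"
    "123 EXAMPLE ST,SACRAMENTO,95838,CA,2,1\n"
    "456 SAMPLE AVE,SACRAMENTO,95823,CA,3,1\n"
)

header = fin.readline()          # read the header first
lines = []
for line in fin:                 # then stream over the remaining lines
    line_data = line.split(",")  # naive split on every comma
    lines.append(line_data)

print(lines[:5])                 # slice: at most the first five rows
```

Note that no stripping has happened at this point, so the last item of each row still carries its trailing newline.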
2:23
Fantastic, here is our header, that looks good, and look, here is a string for High Street, the string for Sacramento, now there are two lists here,
2:31
and we have basically a list of lists, that was the first row, that's the second row, now notice the very last bit has a \n
2:40
and that's because when we read a line out of a file, a text file in Python, it keeps the newline on the end there,
2:47
so let's do a strip and we can strip the bits off the front and the back and I suppose we could go and do that with the header as well,
2:55
now we try that again, perfect, our latitude is looking solid. Ok, so now if we wanted to go work with the data, remember,
3:06
this is not where we are going to stop, this is where we are starting, if we wanted to work with like the number of beds
3:11
if I want to just print out those, I could somehow go over to this bit here and for each line, and the line data,
3:17
I could come over and index into it; beds was, I think, the fifth one, it doesn't really matter,
3:22
let's say it's the fifth one, then I would say index 4, and I could say bed_count = line_data[4], ok, now there are a couple of challenges,
3:28
because this actually comes back as a string so I need to parse it and so on, but this is sort of the basic way of working with this data.
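To make the stripping and parsing concrete, here is a small sketch under the same assumptions as before: invented rows, io.StringIO in place of the real file, and beds taken to be the fifth column (index 4).

```python
import io

# Hypothetical rows; beds is assumed to be the fifth column (index 4).
fin = io.StringIO(
    "street,city,zip,state,beds,baths\n"
    "123 EXAMPLE ST,SACRAMENTO,95838,CA,2,1\n"
    "456 SAMPLE AVE,SACRAMENTO,95823,CA,3,1\n"
)

header = fin.readline().strip()          # strip removes the trailing newline
bed_counts = []
for line in fin:
    line_data = line.strip().split(",")  # strip first, then split
    bed_count = int(line_data[4])        # the value is a string until parsed
    bed_counts.append(bed_count)

print(bed_counts)
```

Indexing by position like this is fragile: if the column order changes, index 4 silently points at the wrong field.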
3:35
So that is kind of how we would attack csv files from scratch, but Python has built in support for csv and we can do better,
3:44
so let's use the csv module and then build on that.
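As a quick illustration of why the plain split is not perfect, the sketch below compares it with the standard library's csv.reader on a single invented row whose street field contains a quoted comma. The row is hypothetical and is not taken from the course data.

```python
import csv
import io

# Hypothetical row: the street field is quoted and contains a comma.
row = '"123 EXAMPLE ST, APT 4",SACRAMENTO,95838,CA,2,1\n'

# Naive approach: splits inside the quoted field as well.
naive = row.strip().split(",")

# csv.reader understands the quoting and keeps the field intact.
parsed = next(csv.reader(io.StringIO(row)))

print(len(naive), naive[0])
print(len(parsed), parsed[0])
```

The naive split yields one field too many, while csv.reader returns the six columns with the quotes removed.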