Eve: Building RESTful APIs with MongoDB and Flask Transcripts
Chapter: Your first Eve service
Lecture: Recap

0:00 In this section we're going to build a model that predicts whether a review is positive or negative.
0:04 I'm going to use the XGBoost model. Make sure you have XGBoost installed; we're going to import it. All I'm going to do here is
0:13 say X is equal to this TF-IDF DataFrame, and our y is going to be whether each review is positive.
0:23 So what's that TF-IDF DataFrame? It's the one we stuck onto the end of the other DataFrame earlier. So let's look at X.
0:33 It's a bunch of numbers. Each cell says how important a token is in a document; this one, for example, is how important "10" is in document 3.
0:43 You can see that "zone" appeared in this document, but a lot of these cells are zeros. The matrix is sparse because no single review contains
0:50 all of the columns' tokens. Okay, what does y look like? y is just a series saying whether each review is positive or negative.
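A minimal sketch of that setup, assuming the TF-IDF weights live in a pandas DataFrame (the "TF-IDF DataFrame" mentioned above) with a boolean positive label alongside; the `reviews` data and variable names here are illustrative stand-ins, not the course's actual dataset:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the course's data: the real notebook builds
# its TF-IDF DataFrame from the review corpus covered earlier.
reviews = pd.DataFrame({
    "review": [
        "great movie, a solid 10",
        "terrible, avoid this one",
        "loved every minute, a 10 from me",
        "dull and far too long",
    ],
    "positive": [True, False, True, False],
})

# One row per review, one column per token, a TF-IDF weight in each cell.
vectorizer = TfidfVectorizer()
weights = vectorizer.fit_transform(reviews["review"])
tfidf_df = pd.DataFrame(
    weights.toarray(), columns=vectorizer.get_feature_names_out()
)

X = tfidf_df                              # features: token importance per review
y = reviews["positive"].astype(int)       # labels: 1 = positive, 0 = negative

# Most cells are 0.0 -- each review contains only a sliver of the vocabulary.
print(f"share of zero cells: {(X == 0).mean().mean():.2f}")
```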
0:58 Next I'm going to use scikit-learn to split our data into a training set and a testing set. Why do we want both? Well, we want to see how our
1:05 model would perform on data it hasn't seen before. So we hold out a subset of the data, train the model on the rest, and then use
1:11 the held-out subset to evaluate it. We already know the true positive/negative labels for those rows, so we can measure
1:18 how well the model predicts them from data it hasn't seen, which gives us some feel for how it might perform in the real world.
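One way to do that split, assuming the X and y from the sketch above; the 80/20 ratio and the random_state are my choices here, not necessarily the course's:

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows; the model never sees these during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```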
1:25 Okay, so we've split up our data; let's train our model. Training a model is really easy: you just call fit. We're going to fit it with the X and
1:32 the y, where X holds the features and y holds the labels, whether each review was positive or negative. That takes a while because we've got a
1:39 lot of columns, but it looks like we did get a model out of it.
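The fit step might look like this; using the default XGBClassifier hyperparameters is an assumption on my part:

```python
from xgboost import XGBClassifier

# Train on the held-in portion only. With thousands of TF-IDF columns,
# this is the slow step the lecture mentions.
model = XGBClassifier()
model.fit(X_train, y_train)
```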
1:46 And then let's evaluate it. One way is to use score: we pass in the testing data, the data the model hasn't seen, along with the true labels, and it
1:52 gives us back an accuracy. It looks like the model got 78% right. Is 78% good? Well, the answer to that is: it depends.
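Scoring on the held-out data, as described; on a scikit-learn-style classifier like this, score returns plain accuracy:

```python
# Accuracy on reviews the model has never seen. The 78% figure comes
# from the lecture's run; your number will differ on other data.
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.2%}")
```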
2:01 It might be good, it might not be. It's saying that roughly four out of five of the reviews we classify as positive or negative
2:11 are classified correctly. Now imagine you're building a model that predicts, say, fraud, and fraud is rare; you
2:21 could imagine fraud occurring in maybe one in a thousand transactions. You could make a highly accurate model by just always predicting "not fraud":
2:27 it would be 99.9% accurate. So accuracy in and of itself might not be a sufficient metric for deciding whether a model is good, but
2:35 it does give us a baseline. This model is better than flipping a coin; it's roughly 80% accurate.
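To make that fraud point concrete, here's a sketch using scikit-learn's DummyClassifier (my choice of tool, not the lecture's) on made-up, heavily imbalanced labels; the 1-in-1000 fraud rate mirrors the example above:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
y_fraud = rng.random(100_000) < 0.001   # ~0.1% of rows are "fraud"
X_fake = np.zeros((len(y_fraud), 1))    # features are irrelevant here

# Always predict the majority class ("not fraud")...
always_legit = DummyClassifier(strategy="most_frequent")
always_legit.fit(X_fake, y_fraud)

# ...and still score ~99.9% accuracy while catching zero fraud.
print(f"accuracy: {always_legit.score(X_fake, y_fraud):.3f}")
```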

