Data Science Jumpstart with 10 Projects Transcripts
Chapter: Project 8: Predicting Heart Disease with Machine Learning
Lecture: Tuning an XGBoost Model with Hyperopt

0:00 Okay, we're going to tune the model using a library called Hyperopt.
0:04 The idea here is that there are hyperparameters that control how the model performs,
0:08 and we're going to change some of those hyperparameters so that the model isn't memorizing the data as much.
0:13 If you think about a decision tree, you can make a decision tree that goes really deep,
0:18 and it basically memorizes the training data, and you can trace a path through to an individual row.
0:24 That seems useful, and actually is useful to describe something,
0:28 but when we want to make predictions, we actually don't want it to be overfit or super complex like that. We want to simplify it a little bit.
0:37 So one of the main levers that we have for simplifying our model is how deep each tree can go. And remember, XGBoost builds a bunch of trees.
0:45 One of the things that we can do is tweak those tree depths and make the model perform differently.
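(Not the notebook's code; a minimal sketch of that depth lever using XGBoost's scikit-learn wrapper. X_train, y_train, X_test, and y_test are assumed to exist from earlier in the project.)

```python
# Sketch: limiting tree depth so the ensemble can't memorize the training rows.
from xgboost import XGBClassifier

deep = XGBClassifier(max_depth=10)    # deep trees can nearly memorize the data
shallow = XGBClassifier(max_depth=3)  # shallower trees are simpler and usually generalize better

deep.fit(X_train, y_train)
shallow.fit(X_train, y_train)
print(deep.score(X_test, y_test), shallow.score(X_test, y_test))
```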
0:49 I've got some code here that leverages this Hyperopt library.
0:54 The nice thing about this Hyperopt library is that it allows us to define a space of parameters that we want to explore,
1:00 and then it samples from that space, sees where the model performs well, and keeps searching around the promising regions.
1:06 Oftentimes, these hyperparameters are floating point numbers,
1:09 so it would be annoying to enumerate every value between 0 and 1, like 0.001, 0.002, and so on.
1:18 So this Hyperopt library just says, here's a distribution of what those numbers are,
1:24 and it starts exploring those, and when it finds good things, it kind of focuses on those.
1:29 Every once in a while it will explore, trying some other value,
1:34 but if that doesn't give good results, it will go back to exploiting the values that do.
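(A tiny illustration of that "distribution instead of a list of values" idea; the parameter name and range here are just examples, not the notebook's.)

```python
# Sketch: a hyperparameter expressed as a distribution, not an enumerated list.
from hyperopt import hp

# Any float between 0.001 and 1.0 is fair game; Hyperopt's TPE algorithm decides
# where to sample, focusing on regions that produced good losses and occasionally exploring.
space = {"learning_rate": hp.uniform("learning_rate", 0.001, 1.0)}
```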
1:38 You can see in this, I'm actually not changing all the hyperparameters at once. I'm doing them in what I'm calling rounds.
1:45 So I've grouped together hyperparameters that behave similarly. This first one is scale positive weight (scale_pos_weight).
1:53 This is going to change the class weights for unbalanced data. The next group looks at the trees themselves: the depth of the tree and min child weight.
2:06 Min child weight is used to determine when to make a split. This next section here is for sampling.
2:12 So how many rows and columns of our data do we use when we make trees? These next ones, reg alpha, reg lambda, and gamma,
2:22 are regularization hyperparameters, and the last one is the learning rate.
2:26 Basically, if you're thinking about golfing, this is how hard you're hitting. And sometimes you want to actually not hit quite as hard,
2:34 and you get to a better result faster. Why do I want to do it in rounds rather than everything all at once?
2:39 Well, you might want to do everything all at once. But if you think about it, with 10 parameters to optimize simultaneously,
2:44 that's a very hard space to search. So I've grouped them into smaller sections so I don't have quite as big of a space to explore.
2:52 And the grouping is meant to look at hyperparameters that do similar things. Is this perfect? No, it's not.
2:58 But it tends to give quicker results because you don't have the combinatorial explosion you would get if you tuned all of them at once.
3:05 So if you don't have a weekend to spend searching this space out, I suggest using something like this for quick and easy results.
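(The notebook's actual search code isn't shown in the transcript. Below is one hedged sketch of how the grouped rounds and the round-by-round search could look; every range, and the use of X_train, y_train, X_test, and y_test from earlier in the project, is an assumption.)

```python
# Illustrative sketch: group hyperparameters into rounds, then tune one group at a time.
from hyperopt import fmin, tpe, hp, Trials
from xgboost import XGBClassifier

rounds = [
    {"scale_pos_weight": hp.uniform("scale_pos_weight", 0.5, 5)},    # class weighting
    {"max_depth": hp.quniform("max_depth", 2, 10, 1),                # tree shape
     "min_child_weight": hp.uniform("min_child_weight", 0.5, 10)},
    {"subsample": hp.uniform("subsample", 0.5, 1.0),                 # row/column sampling
     "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0)},
    {"reg_alpha": hp.uniform("reg_alpha", 0, 10),                    # regularization
     "reg_lambda": hp.uniform("reg_lambda", 1, 10),
     "gamma": hp.uniform("gamma", 0, 5)},
    {"learning_rate": hp.loguniform("learning_rate", -5, 0)},        # step size
]

best_params = {}  # carries the winning values from earlier rounds into later ones

def make_objective(fixed):
    def objective(search_params):
        params = {**fixed, **search_params}
        if "max_depth" in params:
            params["max_depth"] = int(params["max_depth"])  # quniform yields floats
        model = XGBClassifier(**params)
        model.fit(X_train, y_train)
        return -model.score(X_test, y_test)  # minimize negative accuracy
    return objective

for space in rounds:
    found = fmin(make_objective(best_params), space,
                 algo=tpe.suggest, max_evals=20, trials=Trials())
    best_params.update(found)

print(best_params)
```

Each round searches a much smaller space than all of the parameters at once, which is the trade-off described above: it's quicker, but it can miss interactions between parameters that sit in different groups.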
3:12 Okay, let's kick that off. You can see that this is progressing here. This is one of the rounds, and you can see that there's this loss score.
3:25 The best loss is negative 0.5. And this is just exploring the space, trying to lower that loss.
3:35 Okay, in this case, we're going to do 20 evaluations in this round. So we're at 14 of 20.
3:41 It's taking about two seconds per evaluation. And we've got multiple rounds, so this is going to take a couple minutes to run.
3:48 You can see in our next round, our loss has gone down a little bit, indicating that we've tweaked the hyperparameters such that it's doing better.
4:00 And here it got even better. So that's good. Okay, at this point, we're done. And here are the parameters.
4:09 So what I'm going to do is I'm going to copy this, and I'm going to paste this down here below.
4:15 One thing to be aware of is there is some randomness in this. So there's no guarantee that if you ran this multiple times,
4:21 you would get the exact same values here, especially these values that are floating point numbers.
4:26 Hopefully, for the integer-like values you would get the same results. But when you combine those with the floating point numbers,
4:33 there's no guarantee that you get exactly the same values. I do like to just copy and paste them so that when I come back to this,
4:39 I can rerun the model with the same parameters. Okay, so once we've done that, I'm going to now say, okay, let's make a model here.
4:47 And I'm going to use these parameters that I just specified. And I'm going to say, let's have 2,500 estimators, but 50 early stopping rounds.
4:57 What this means is I can hit the ball up to 2,500 times, but if I haven't improved over the past 50 swings,
5:04 then stop hitting the ball because you're not getting better. Okay, and I got an error here.
5:08 It says that max_depth expected an integer, but it wasn't an integer. So sadly, the search kicked out a non-integer value here.
5:15 I'm just going to convert that to an integer and try it again. And there's our result. It looks like it didn't need to make 2,500.
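(Roughly what this step looks like; `best_params` stands in for the copied-and-pasted values, and passing early_stopping_rounds to the constructor assumes a reasonably recent xgboost version, otherwise it goes into fit().)

```python
# Sketch: final fit with the tuned parameters, a large tree budget, and early stopping.
from xgboost import XGBClassifier

params = dict(best_params)
params["max_depth"] = int(params["max_depth"])  # Hyperopt handed back a float here

model = XGBClassifier(n_estimators=2_500, early_stopping_rounds=50, **params)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
print(model.best_iteration)  # stops well short of 2,500 trees once the score plateaus
```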
5:22 It only made 62 trees there. Let's look at the score of that, and there's the score of our original model. So in this case, our score didn't really go up.
5:34 In fact, it went down a little bit. But if you look at our model, our model is not overfitting so much on the data.
5:43 Is it acceptable if the accuracy goes down? It might be. Again, in this case, we are not overfitting as much.
5:52 So I actually feel a little bit better about this model, even though the accuracy went down.
5:56 There are other metrics we can look at, not just accuracy, to see whether we improved the model in other ways.
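(The lecture doesn't name specific metrics; this is just one hedged sketch using scikit-learn, with F1 and ROC AUC as example choices and the same assumed test split.)

```python
# Sketch: judging the tuned model on more than accuracy (X_test, y_test assumed).
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

preds = model.predict(X_test)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print("accuracy:", accuracy_score(y_test, preds))
print("f1:      ", f1_score(y_test, preds))
print("roc auc: ", roc_auc_score(y_test, probs))
```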

