Grid Search, Cross Validation and XGBoost in R

This post builds on the chapter on tree-based methods in the highly regarded Introduction to Statistical Learning, taking it to the next level with grid search, cross-validation and XGBoost.

I assume that you have read the chapter on Tree-Based Methods (chapter 8 in the copy I got) and executed the code in the lab section. Here we look into regression trees only. It should be straightforward to apply the ideas here to a classification problem by changing the objective function.
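As a quick illustration of that last point, here is a hedged sketch of the classification analogue of the workflow below. The binary target highValue (is medv above its median?) is invented purely for this example; with a factor response, caret switches gbm to a classification objective and tunes on Accuracy instead of RMSE.

```r
library(MASS)   # Boston dataset
library(caret)

# Invented binary target for illustration only
Boston$highValue <- factor(ifelse(Boston$medv > median(Boston$medv),
                                  "yes", "no"))

fitCls <- train(
  x = Boston[, !names(Boston) %in% c("medv", "highValue")],
  y = Boston$highValue,            # a factor => classification
  method = "gbm",
  metric = "Accuracy",             # classification metric, not RMSE
  trControl = trainControl(method = "cv", number = 5),
  verbose = FALSE
)
```

Everything else in this post (the tuning grid, trainControl, predict) carries over unchanged.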

The last statement of the Boosting sub-section (of the lab) states that

using lambda of 0.2 leads to a slightly lower test MSE than lambda of 0.001.

It is a pity that the authors do not teach how to select the optimal lambda via grid search and cross-validation. Here is an attempt to do that.

Grid Search with Cross Validation

Let us first split the Boston dataset into 80% training and 20% test (unlike the 50-50 split the authors do).

library(MASS)  # for the Boston dataset
set.seed(1)    # for reproducibility; your numbers will differ otherwise
trainInd <- sample(1:nrow(Boston), 0.8 * nrow(Boston))
dfTrain <- Boston[ trainInd,]
dfTest  <- Boston[-trainInd,]
dim(dfTrain) # 404 14
dim(dfTest) # 102 14

For selecting the best parameters for gbm, we will need the caret package. Install and load it using these R commands:

install.packages("caret")
library(caret)
Here is the code to do a grid search over all the hyper-parameters of gbm:


# set a grid of parameters to optimize upon;
# all these are gbm params and must be specified
gbmGrid <- expand.grid(
  n.trees = c(250, 500, 1000),
  shrinkage = c(0.001, 0.01, 0.1), # lambda
  interaction.depth = c(1, 2, 4),
  n.minobsinnode = c(10) # left at default
)

# 5-fold CV; set method to "repeatedcv" 
# and set repeats for repeated CV
gbmTrControl <- trainControl(
  method = "cv",
  number = 5,
  #repeats = 4,
  verboseIter = FALSE,
  allowParallel = TRUE
)

# do the training
gbmTrain <- train(
  x = as.matrix(dfTrain[, ! names(dfTrain)
                              %in% c("medv")]),
  y = dfTrain$medv,
  trControl = gbmTrControl,
  tuneGrid = gbmGrid,
  method = "gbm",
  verbose = FALSE # silence gbm's per-tree output
)

# get the top 5 models
head(gbmTrain$results[with(gbmTrain$results,
                           order(RMSE)), ], 5)
# MSE for best model = 3.42 ^ 2 = 11.7
# MSE for best model = 3.42 ^ 2 = 11.7

# get the best model's parameters
gbmTrain$bestTune
#   n.trees interaction.depth shrinkage n.minobsinnode
#       500                 4       0.1             10

yhatGbm <- predict(gbmTrain, newdata = dfTest)
mean((yhatGbm - dfTest$medv) ^ 2) # 7.28

# plot predictions against actuals in red,
# with the y = x reference line in blue
plot(dfTest$medv, yhatGbm, col = "red")
abline(0, 1, col = "blue")

The plot looks like this:


In this case, the test MSE (7.28) is actually much lower than the cross-validated MSE of the best model (11.7).

Feel free to expand the grid search to larger ranges and select better parameters.
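If you do expand the grid, training time grows with the number of grid cells. Since we set allowParallel = TRUE in trainControl, caret can fit grid cells in parallel once a backend is registered. Here is a sketch using the doParallel package (the choice of doParallel and the worker count are assumptions; any foreach backend works):

```r
library(doParallel)

cl <- makePSOCKcluster(4)  # 4 workers; adjust to your machine
registerDoParallel(cl)

# ... run the train(...) call from above here ...

stopCluster(cl)
```

With the backend registered, the train() call itself is unchanged.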


Next, we see how to do the same optimal parameter selection for XGBoost.

Install and load the XGBoost package using

install.packages("xgboost")
library(xgboost)
The following code then searches and gives you the best hyper-parameters for XGBoost.


xgbGrid <- expand.grid(
  nrounds = c(250, 500, 1000),
  max_depth = c(1, 2, 4),
  eta = c(0.001, 0.003, 0.01),
  gamma = c(0, 1, 2),
  colsample_bytree = c(1, 0.5, 0.25),
  min_child_weight = c(1, 2)
)

xgbTrControl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 2,
  verboseIter = FALSE,
  returnData = FALSE,
  allowParallel = TRUE
)

xgbTrain <- train(
  x = as.matrix(dfTrain[, ! names(dfTrain) %in% c("medv")]),
  y = dfTrain$medv,
  objective = "reg:linear", # "reg:squarederror" in newer xgboost versions
  trControl = xgbTrControl,
  tuneGrid = xgbGrid,
  method = "xgbTree"
)

# get the top model and its results
head(xgbTrain$results[with(xgbTrain$results,
                           order(RMSE)), ], 1)
# MSE for best model = 3.12 ^ 2 = 9.74

yhatXgb <- predict(xgbTrain, newdata = dfTest)
mean((yhatXgb - dfTest$medv) ^ 2) # 10.49

plot(dfTest$medv, yhatXgb, col = "red")
abline(0, 1, col = "blue")

# Variable Importance
# (featNames avoids shadowing the base function names())
featNames <- names(dfTrain)[! names(dfTrain) %in% c("medv")]
importanceMatrix <- xgb.importance(featNames,
                                   model = xgbTrain$finalModel)
xgb.plot.importance(importanceMatrix)

The variable importance plot looks like this:


Again, we can see that lstat and rm are the two main features.

The test MSEs for the two models are lower than the ones presented in the text. This could be because our training set is larger, and also because of the hyper-parameter tuning we did. For a fair comparison, you should rerun the code in the text with an 80% training split like we used here.

PS: I could not figure out easily how to plot the variable importance for the GBM model.
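Update: one hedged option, since the model was trained through caret, is caret's own varImp(), which for gbm wraps the relative-influence summary; summary() on the underlying gbm object should also produce the plot directly. A sketch:

```r
library(caret)

# Variable importance via caret's generic interface
gbmImp <- varImp(gbmTrain)
plot(gbmImp)

# Or go straight to the fitted gbm object:
# summary.gbm plots and returns relative influence
summary(gbmTrain$finalModel)
```

Both approaches report gbm's relative influence, which for this data should again rank lstat and rm at the top.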


One thought on “Grid Search, Cross Validation and XGBoost in R”

  1. Thanks a lot, I didn’t know we could use the output model of caret as an input to an xgb library function, as in xgb.importance(names, model = fit.xgbt$finalModel).
    Actually, it seems that this does not work for all xgb functions (xgb.plot.tree(fit.xgbt$finalModel) doesn’t work).
