A Simple and Fast Baseline for Tuning Large XGBoost Models

11/12/2021
by Sanyam Kapoor, et al.

XGBoost, a scalable tree boosting algorithm, has proven effective for many prediction tasks of practical interest, especially on tabular datasets. Hyperparameter tuning can further improve predictive performance, but unlike neural networks, which are trained on mini-batches, XGBoost trains on the full dataset, so fitting many models on large datasets can be time-consuming. Owing to the discovery that (i) there is a strong linear relation between dataset size and training time, (ii) XGBoost models satisfy the ranking hypothesis (the relative ranking of hyperparameter configurations is largely preserved across fidelities), and (iii) lower-fidelity models can discover promising hyperparameter configurations, we show that uniform subsampling makes for a simple yet fast baseline to speed up the tuning of large XGBoost models, using multi-fidelity hyperparameter optimization with data subsets as the fidelity dimension. We demonstrate the effectiveness of this baseline on large-scale tabular datasets ranging from 15 to 70 GB in size.
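To make the baseline concrete, here is a minimal sketch of multi-fidelity tuning with uniform data subsampling as the fidelity dimension, implemented as successive halving over XGBoost configurations. The dataset, search space, config count, and fidelity schedule below are illustrative assumptions for the sketch, not the paper's exact experimental setup.

```python
# Multi-fidelity HPO sketch: successive halving where the fidelity is the
# fraction of training data, drawn by uniform subsampling.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in dataset; the paper's datasets are far larger (15 to 70 GB).
X, y = make_regression(n_samples=20_000, n_features=20, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
dval = xgb.DMatrix(X_val, label=y_val)

def sample_config():
    # Illustrative XGBoost search space (an assumption of this sketch).
    return {
        "max_depth": int(rng.integers(3, 11)),
        "learning_rate": float(10 ** rng.uniform(-3, -0.5)),
        "subsample": float(rng.uniform(0.5, 1.0)),
        "reg_lambda": float(10 ** rng.uniform(-2, 2)),
    }

def evaluate(config, fraction):
    # Uniformly subsample a fraction of the training data: the fidelity.
    n = max(1, int(fraction * len(X_train)))
    idx = rng.choice(len(X_train), size=n, replace=False)
    dtrain = xgb.DMatrix(X_train[idx], label=y_train[idx])
    booster = xgb.train(
        {**config, "objective": "reg:squarederror"},
        dtrain, num_boost_round=100,
        evals=[(dval, "val")], verbose_eval=False,
    )
    return float(booster.eval(dval).split(":")[-1])  # validation RMSE

# Successive halving: rank many configs on cheap low-fidelity fits, then
# promote the better half to the next (larger) data fraction. This relies on
# the ranking hypothesis: low-fidelity rankings predict full-data rankings.
configs = [sample_config() for _ in range(16)]
for fraction in (0.1, 0.25, 0.5, 1.0):
    scores = sorted((evaluate(c, fraction), c) for c in configs)  # lower RMSE first
    configs = [c for _, c in scores[: max(1, len(scores) // 2)]]

print("best config:", configs[0])
```

Because training time scales roughly linearly with dataset size (finding (i) above), the early low-fidelity rungs cost only a small fraction of a full-data fit, which is what makes this simple baseline fast.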
