Feature Encodings for Gradient Boosting with Automunge

09/25/2022
by Nicholas J. Teague, et al.

Selecting a default feature encoding strategy for gradient boosted learning may weigh metrics of training duration and achieved predictive performance associated with the feature representations. The Automunge library for dataframe preprocessing offers a default of binarization for categoric features and z-score normalization for numeric features. The study presented here sought to validate those defaults by benchmarking encoding variations with tuned gradient boosted learning across a series of diverse data sets. We found that, on average, our chosen defaults were top performers from both a tuning duration and a model performance standpoint. Another key finding was that one-hot encoding did not perform in a manner consistent with suitability to serve as a categoric default in comparison to categoric binarization. We present these and further benchmarks here.
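To make the two defaults concrete, below is a minimal, hypothetical sketch (not the Automunge API itself) of the encodings the abstract names: categoric binarization, which maps each of n unique categories to a distinct bit pattern spread across ceil(log2(n)) binary columns (in contrast to one-hot encoding's n columns), and z-score normalization, which shifts a numeric column to zero mean and unit standard deviation. The helper names binarize_categoric and zscore_numeric and the toy dataframe are illustrative assumptions, not library functions.

import numpy as np
import pandas as pd

def binarize_categoric(series):
    # Map each unique category to a distinct bit pattern across
    # ceil(log2(n)) binary columns; missing or unseen values fall
    # back to an all-zeros row.
    categories = sorted(series.dropna().unique())
    width = max(1, int(np.ceil(np.log2(len(categories)))))
    mapping = {cat: [int(b) for b in format(i, f'0{width}b')]
               for i, cat in enumerate(categories)}
    cols = [f'{series.name}_bin{j}' for j in range(width)]
    rows = [mapping.get(v, [0] * width) for v in series]
    return pd.DataFrame(rows, columns=cols, index=series.index)

def zscore_numeric(series):
    # Shift to zero mean and scale to unit standard deviation.
    return (series - series.mean()) / series.std()

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'blue', 'red'],
                   'height': [1.2, 3.4, 2.2, 5.0, 0.7]})

encoded = pd.concat([binarize_categoric(df['color']),
                     zscore_numeric(df['height'])], axis=1)
print(encoded)  # 2 binary columns for 3 categories vs. 3 under one-hot

In this toy example the three-category 'color' column needs only two binary columns where one-hot encoding would need three; the gap widens with higher cardinality, which is one plausible factor in binarization's favorable tuning-duration results in the benchmarks.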


