How to out-perform default random forest regression: choosing hyperparameters for applications in large-sample hydrology

05/11/2023
by Divya K. Bilolikar, et al.

Predictions are a central part of water resources research. Historically, physically-based models have been preferred; however, they have largely failed at modeling hydrological processes at the catchment scale, and some important prediction problems cannot be modeled physically. As a result, machine learning (ML) models have come to be seen as a valid alternative in recent years. Although they are readily available, well-optimized state-of-the-art ML strategies are not widely used in water resources research, because using state-of-the-art ML models and optimizing their hyperparameters requires expert mathematical and statistical knowledge. Further, some analyses require so many model trainings that even expert statisticians cannot always optimize hyperparameters properly. To use data effectively and drive scientific advances in the field, it is essential to make ML models accessible to subject matter experts by improving automated machine learning resources. ML models such as XGBoost have recently been shown to outperform random forest (RF) models, which are traditionally used in water resources research. In this study, based on over 150 water-related datasets, we extensively compare XGBoost and RF. This study provides water scientists with access to quick, user-friendly RF and XGBoost model optimization.
