cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R

02/21/2020
by Christian Thiele, et al.

'Optimal cutpoints' for binary classification tasks are often established by testing which cutpoint yields the best discrimination, for example as measured by the Youden index, in a specific sample. This results in 'optimal' cutpoints that are highly variable and that systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and their out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms for using user-defined metrics and estimation methods. The package can parallelize the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, for example for compatibility with functions from the tidyverse. Various functions for plotting receiver operating characteristic curves, precision-recall graphs, bootstrap results, and other representations of the data are included. The package contains example data from a study on psychological characteristics and suicide attempts, suitable for applying binary classification algorithms.
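The idea behind the abstract can be illustrated with a minimal sketch: pick the cutpoint that maximizes the Youden index (sensitivity + specificity - 1) on a bootstrap resample, then score that same cutpoint on the observations left out of the resample. This is not the cutpointr implementation or its API. The function names (`youden_index`, `optimal_cutpoint`, `bootstrap_validate`), the synthetic data, and the convention that scores at or above the cutpoint are predicted positive are all assumptions made for illustration.

```python
import random
from statistics import mean


def youden_index(scores, labels, cutpoint):
    """Sensitivity + specificity - 1, predicting positive when score >= cutpoint."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        if y == 1:
            if s >= cutpoint:
                tp += 1
            else:
                fn += 1
        else:
            if s >= cutpoint:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1


def optimal_cutpoint(scores, labels):
    """Empirical cutpoint: the observed score that maximizes the Youden index."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: youden_index(scores, labels, c))


def bootstrap_validate(scores, labels, runs=50, seed=1):
    """Mean in-bag and out-of-bag Youden index over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    in_bag, out_of_bag = [], []
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        b_scores = [scores[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        o_scores = [scores[i] for i in oob]
        o_labels = [labels[i] for i in oob]
        # Skip resamples in which either part lacks one of the two classes.
        if len(set(b_labels)) < 2 or len(set(o_labels)) < 2:
            continue
        c = optimal_cutpoint(b_scores, b_labels)
        in_bag.append(youden_index(b_scores, b_labels, c))
        out_of_bag.append(youden_index(o_scores, o_labels, c))
    return mean(in_bag), mean(out_of_bag)


# Synthetic data (an assumption): positives tend to score higher than negatives.
data_rng = random.Random(42)
scores = [data_rng.gauss(1.0, 1.0) for _ in range(50)] + \
         [data_rng.gauss(0.0, 1.0) for _ in range(50)]
labels = [1] * 50 + [0] * 50

in_bag_j, oob_j = bootstrap_validate(scores, labels)
print("mean in-bag Youden index:    ", round(in_bag_j, 3))
print("mean out-of-bag Youden index:", round(oob_j, 3))
```

The in-bag value is computed on the same data that was used to choose the cutpoint, so it will usually look better than the out-of-bag value. That gap is the optimism the abstract refers to.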

