NesPrInDT: Nested undersampling in PrInDT

03/27/2021
by Claus Weihs, et al.

In this paper, we extend our PrInDT method (Weihs, Buschfeld 2021) with additional undersampling of one of the predictors. This allows us to handle multiply unbalanced data sets, i.e. data sets that are unbalanced not only with respect to the class variable but also with respect to one of the predictor variables. Beyond the advantages of such an approach, our study reveals that the balanced accuracy in the full data set can be much lower than in the predictor undersamples. We discuss potential reasons for this problem and draw methodological conclusions for linguistic studies.
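
To make the two central notions concrete, the following is a minimal Python sketch of nested undersampling and of balanced accuracy. It is not the authors' implementation and does not reproduce PrInDT's tree fitting. The function names (`undersample`, `nested_undersample`, `balanced_accuracy`) and the row layout (a list of dicts) are assumptions made for illustration only, and the nesting order shown (predictor first, then class) is one plausible reading of the abstract.

```python
import random
from collections import defaultdict


def undersample(rows, key, rng):
    """Randomly drop rows so that every value of `key` occurs equally often.

    Hypothetical helper: `rows` is a list of dicts, `key` a column name.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Every group is reduced to the size of the smallest group.
    n_min = min(len(group) for group in groups.values())
    sample = []
    for group in groups.values():
        sample.extend(rng.sample(group, n_min))
    return sample


def nested_undersample(rows, predictor, target, rng):
    """Outer step balances one predictor, inner step balances the class.

    Assumption: this order is illustrative. Note that the inner step can
    disturb the predictor balance created by the outer step.
    """
    outer = undersample(rows, predictor, rng)
    return undersample(outer, target, rng)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (each class counts equally)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Because balanced accuracy averages per-class recalls, a model that always predicts the majority class of a two-class problem scores only 0.5, however dominant that class is. This is why the measure is informative for unbalanced data, and why it can differ between a balanced undersample and the full data set.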


