Testing Cross-Validation Variants in Ranking Environments

by Balázs R. Sziklai, et al.

This research investigates how to determine whether two rankings could come from the same distribution. We evaluate three hybrid tests: Wilcoxon's, Dietterich's, and Alpaydin's statistical tests, each combined with cross-validation (CV) and run with 5 to 10 folds, for 18 variants in total. We used the framework of a popular comparative statistical test, the Sum of Ranking Differences, but our results are representative of all ranking environments. To compare these methods, we followed an innovative approach borrowed from economics. We designed eight scenarios for testing type I and type II errors. These represent typical situations (i.e., different data structures) that CV tests face routinely. The optimal CV method depends on the preferences regarding the minimization of type I/II errors, the size of the input, and the expected patterns in the data. The Wilcoxon method with eight folds proved to be the best under all three investigated input sizes, although there were scenarios and decision aspects where other methods, namely Wilcoxon 10 and Alpaydin 10, performed better.
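As a rough illustration of the setup described above, the sketch below enumerates the 18 test/fold combinations and computes fold-wise Sum of Ranking Differences values for one ranking against a reference. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`ranks`, `srd`, `fold_srds`), the interleaved fold assignment, and the leave-one-fold-out scheme are all illustrative choices that the abstract does not specify.

```python
from itertools import product

# The three tests and the fold counts named in the abstract (5 to 10 inclusive).
TESTS = ("Wilcoxon", "Dietterich", "Alpaydin")
FOLDS = range(5, 11)

# 3 tests x 6 fold counts = 18 variants, labelled like "Wilcoxon 8".
variants = [f"{test} {k}" for test, k in product(TESTS, FOLDS)]


def ranks(values):
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = average
        i = j + 1
    return result


def srd(column, reference):
    """Sum of absolute rank differences between a column and a reference."""
    return sum(abs(a - b) for a, b in zip(ranks(column), ranks(reference)))


def fold_srds(column, reference, k):
    """One SRD value per fold, each computed with that fold's rows left out.

    Assumption: row i belongs to fold i % k (a hypothetical assignment).
    """
    n = len(column)
    values = []
    for fold in range(k):
        kept = [i for i in range(n) if i % k != fold]
        values.append(srd([column[i] for i in kept],
                          [reference[i] for i in kept]))
    return values


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    method = [2, 1, 3, 5, 4, 6, 8, 7, 10, 9]
    print(len(variants))
    print(fold_srds(method, reference, 5))
```

The fold-wise SRD values of two methods would then be the paired samples handed to one of the three statistical tests; that final step is omitted here because the abstract does not describe how each test is adapted.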







A New Approach to Multilabel Stratified Cross Validation with Application to Large and Sparse Gene Ontology Datasets

Multilabel learning is an important topic in machine learning research. ...

Bagging cross-validated bandwidth selection in nonparametric regression estimation with applications to large-sized samples

Cross-validation is a well-known and widely used bandwidth selection met...

How to evaluate sentiment classifiers for Twitter time-ordered data?

Social media are becoming an increasingly important source of informatio...

Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation

Recent advances in statistical theory, together with advances in the com...

Automatic Passenger Counting: Introducing the t-Test Induced Equivalence Test

Automatic passenger counting in public transport has been emerging rapid...

Bootstrap Bias Corrected Cross Validation applied to Super Learning

Super learner algorithm can be applied to combine results of multiple ba...

Performance, Successes and Limitations of Deep Learning Semantic Segmentation of Multiple Defects in Transmission Electron Micrographs

In this work, we perform semantic segmentation of multiple defect types ...