Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

by Soudabeh Barghi, et al.

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method can substantially speed up reproducibility evaluations with limited accuracy loss.
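To make the collaborative filtering formulation concrete, here is a minimal sketch of a biased matrix factorization fitted by stochastic gradient descent. It assumes the reproducibility results are arranged as a subjects × files matrix where 1.0 means a file was reproducible for a subject and 0.0 means it was not, and that only some entries have actually been computed. The function name `train_biased_mf`, the hyperparameters, the 0.5 decision threshold, and the toy data are all hypothetical. This is not the authors' implementation, and the hand-picked held-out entries do not reproduce any of the six training-set strategies from the paper.

```python
import random


def train_biased_mf(observed, n_subjects, n_files, k=2, lr=0.05, reg=0.01,
                    epochs=200, seed=0):
    """Fit r(s, f) ~ mu + b_s + b_f + p_s . q_f by stochastic gradient descent.

    observed: list of (subject, file, value) triples, where value is 1.0 if the
    file was reproducible for that subject and 0.0 otherwise (assumed encoding).
    """
    rng = random.Random(seed)
    mu = sum(v for _, _, v in observed) / len(observed)  # global mean
    b_s = [0.0] * n_subjects  # per-subject bias
    b_f = [0.0] * n_files     # per-file bias
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_subjects)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_files)]

    def predict(s, f):
        # Global mean + subject bias + file bias + latent-factor interaction.
        return mu + b_s[s] + b_f[f] + sum(p * q for p, q in zip(P[s], Q[f]))

    data = list(observed)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, f, v in data:
            err = v - predict(s, f)
            # Regularized gradient steps on the biases and latent factors.
            b_s[s] += lr * (err - reg * b_s[s])
            b_f[f] += lr * (err - reg * b_f[f])
            for j in range(k):
                ps, qf = P[s][j], Q[f][j]
                P[s][j] += lr * (err * qf - reg * ps)
                Q[f][j] += lr * (err * ps - reg * qf)
    return predict


# Hypothetical toy example: 6 subjects x 5 files; file 3 is never reproducible.
truth = {(s, f): 0.0 if f == 3 else 1.0 for s in range(6) for f in range(5)}
held_out = [(0, 3), (1, 0), (2, 1)]  # entries we pretend were not computed
train = [(s, f, v) for (s, f), v in truth.items() if (s, f) not in held_out]
predict = train_biased_mf(train, n_subjects=6, n_files=5)
for s, f in held_out:
    label = "reproducible" if predict(s, f) >= 0.5 else "not reproducible"
    print((s, f), round(predict(s, f), 2), label)
```

In this toy case the file bias does most of the work. Because file 3 is consistently non-reproducible across the observed subjects, its learned bias pulls the prediction for the unobserved entry (0, 3) below the threshold. Dropping `b_s` and `b_f` from the model is one simple way to probe the relevance of the bias terms that the abstract mentions.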




SnakeLines: integrated set of computational pipelines for sequencing reads

Background: With the rapid growth of massively parallel sequencing techn...

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Reproducibility of computational studies is a hallmark of scientific met...

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

The success of graph neural network-based models (GNNs) has significantl...

An Evaluation Study of Generative Adversarial Networks for Collaborative Filtering

This work explores the reproducibility of CFGAN. CFGAN and its family of...

Algorithm Selection for Collaborative Filtering: the influence of graph metafeatures and multicriteria metatargets

To select the best algorithm for a new problem is an expensive and diffi...

Perturbation-Recovery Method for Recommendation

Collaborative filtering is one of the most influential recommender syste...

Predicting Yield Performance of Parents in Plant Breeding: A Neural Collaborative Filtering Approach

Experimental corn hybrids are created in plant breeding programs by cros...
