Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering

09/26/2018
by Soudabeh Barghi, et al.

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, because of their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose six strategies to build the training set, which we evaluate on two datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", predicts computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method can speed up reproducibility evaluations substantially, with limited accuracy loss.
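To make the formulation concrete, here is a minimal sketch of collaborative filtering with subject and file biases, assuming a subjects-by-files matrix whose entries are 1 when a file is reproduced identically and 0 otherwise. It is not the authors' implementation: it uses standard biased matrix factorization trained by stochastic gradient descent, and the function names (`fit_biased_mf`, `predict`), the hyperparameters, and the toy data are hypothetical. The training entries are picked by plain uniform sampling, which is not one of the six strategies evaluated in the paper.

```python
import numpy as np

def fit_biased_mf(triples, n_subjects, n_files, rank=2,
                  lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit score(s, f) = mu + b_s[s] + b_f[f] + P[s] . Q[f] by SGD.

    triples: list of (subject, file, label), label 1.0 = reproducible.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in triples]))  # global mean label
    b_s = np.zeros(n_subjects)                       # subject biases
    b_f = np.zeros(n_files)                          # file biases
    P = 0.01 * rng.standard_normal((n_subjects, rank))  # subject factors
    Q = 0.01 * rng.standard_normal((n_files, rank))     # file factors
    order = np.arange(len(triples))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            s, f, r = triples[k]
            err = r - (mu + b_s[s] + b_f[f] + P[s] @ Q[f])
            b_s[s] += lr * (err - reg * b_s[s])
            b_f[f] += lr * (err - reg * b_f[f])
            p_old = P[s].copy()  # use pre-update factors for Q's step
            P[s] += lr * (err * Q[f] - reg * P[s])
            Q[f] += lr * (err * p_old - reg * Q[f])
    return mu, b_s, b_f, P, Q

def predict(model, s, f):
    """Return 1 if file f is predicted reproducible for subject s, else 0."""
    mu, b_s, b_f, P, Q = model
    return int(mu + b_s[s] + b_f[f] + P[s] @ Q[f] >= 0.5)

# Toy data (hypothetical): two subject types, files 4 and 5 differ
# for the second type only.
truth = np.ones((6, 6))
truth[3:, 4:] = 0.0
rng = np.random.default_rng(1)
mask = rng.random(truth.shape) < 0.6  # uniformly sampled training entries
train = [(s, f, truth[s, f]) for s, f in zip(*np.nonzero(mask))]
model = fit_biased_mf(train, 6, 6)
held_out = [(s, f) for s, f in zip(*np.nonzero(~mask))]
accuracy = np.mean([predict(model, s, f) == truth[s, f] for s, f in held_out])
print(f"held-out accuracy: {accuracy:.2f}")
```

Dropping the `b_s` and `b_f` terms gives the unbiased variant, which is one way to probe whether the bias terms matter for a given dataset.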


research
06/25/2021

SnakeLines: integrated set of computational pipelines for sequencing reads

Background: With the rapid growth of massively parallel sequencing techn...
research
10/13/2018

Ten Simple Rules for Reproducible Research in Jupyter Notebooks

Reproducibility of computational studies is a hallmark of scientific met...
research
08/01/2023

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

The success of graph neural network-based models (GNNs) has significantl...
research
01/05/2022

An Evaluation Study of Generative Adversarial Networks for Collaborative Filtering

This work explores the reproducibility of CFGAN. CFGAN and its family of...
research
07/23/2018

Algorithm Selection for Collaborative Filtering: the influence of graph metafeatures and multicriteria metatargets

To select the best algorithm for a new problem is an expensive and diffi...
research
11/17/2022

Perturbation-Recovery Method for Recommendation

Collaborative filtering is one of the most influential recommender syste...
research
01/27/2020

Predicting Yield Performance of Parents in Plant Breeding: A Neural Collaborative Filtering Approach

Experimental corn hybrids are created in plant breeding programs by cros...
