Statistical Methods for Replicability Assessment

03/20/2019
by Kenneth Hung, et al.

Large-scale replication studies like the Reproducibility Project: Psychology (RP:P) provide invaluable systematic data on scientific replicability, but most analyses and interpretations of the data fail to agree on the definition of "replicability" and to disentangle the inexorable consequences of known selection bias from competing explanations. We discuss three concrete definitions of replicability based on (1) whether published findings about the signs of effects are mostly correct, (2) how effective replication studies are in reproducing whatever true effect size was present in the original experiment, and (3) whether true effect sizes tend to diminish in replication. We apply techniques from multiple testing and post-selection inference to develop new methods that answer these questions while explicitly accounting for selection bias. Re-analyzing the RP:P data, we estimate that 22 out of 68 (32%) of the original directional claims were false (upper confidence bound 47%); among claims significant at the stricter significance threshold 0.005, we estimate that only 2.2 out of 33 (7%) were false (upper confidence bound 18%). We also compute confidence intervals for the difference in effect size between original and replication studies and, after adjusting for multiplicity, identify five pairs (11%) whose effect sizes significantly declined in replication. We estimate that the effect size declined by at least 20% in the replication study relative to the original study in 16 of the 46 (35%) study pairs (lower confidence bound 11%). Our methods make no parametric assumptions about the true effect sizes.
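The selection adjustment described above can be sketched concretely. Below is a minimal illustration in Python; it is not the authors' code, and the unit-variance z-statistic, the publish-only-if-two-sided-p-below-0.05 selection rule, and the names truncated_cdf and selection_adjusted_ci are simplifying assumptions made here for exposition. Conditional on a result being published only because |z| exceeded the significance cutoff, the z-statistic follows a truncated normal distribution; inverting that conditional distribution yields a confidence interval that explicitly accounts for selection, in the spirit of the post-selection inference techniques the abstract mentions.

    from scipy.optimize import brentq
    from scipy.stats import norm

    # Assumed selection rule for this sketch: |z| > 1.96 (two-sided p < 0.05).
    CUTOFF = norm.ppf(0.975)

    def truncated_cdf(z, theta, c=CUTOFF):
        """P(Z <= z | |Z| > c) for Z ~ N(theta, 1): the conditional law of a
        z-statistic that was published only because it was significant."""
        denom = norm.cdf(-c - theta) + norm.sf(c - theta)  # P(|Z| > c)
        if z < -c:
            num = norm.cdf(z - theta)
        elif z <= c:
            num = norm.cdf(-c - theta)  # selected distribution has no mass in (-c, c]
        else:
            num = norm.cdf(-c - theta) + norm.cdf(z - theta) - norm.cdf(c - theta)
        return num / denom

    def selection_adjusted_ci(z_obs, alpha=0.05, lo=-20.0, hi=20.0):
        """Invert the truncated-normal pivot for a (1 - alpha) CI on theta.
        The conditional CDF is decreasing in theta, so each endpoint is the
        unique root of a monotone function and brentq applies."""
        upper = brentq(lambda t: truncated_cdf(z_obs, t) - alpha / 2, lo, hi)
        lower = brentq(lambda t: truncated_cdf(z_obs, t) - (1 - alpha / 2), lo, hi)
        return lower, upper

    z = 2.20  # a "just significant" original result
    print("selection-adjusted CI:", selection_adjusted_ci(z))
    print("naive CI:", (z - CUTOFF, z + CUTOFF))

For a barely significant original result (z = 2.20), the adjusted interval comes out near (-0.6, 3.8), much wider than the naive 2.20 +/- 1.96 and reaching below zero. This is the winner's-curse behavior that motivates accounting for selection bias explicitly when judging replications.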

research · 09/16/2020

The assessment of replication success based on relative effect size

Replication studies are increasingly conducted to confirm original findi...
research · 05/08/2023

Replication of "null results" – Absence of evidence or evidence of absence?

In several large-scale replication projects, statistically non-significa...
research · 06/20/2020

Improving the replicability of results from a single psychological experiment

We identify two aspects of selective inference as major obstacles for re...
research · 11/21/2017

Why "Redefining Statistical Significance" Will Not Improve Reproducibility and Could Make the Replication Crisis Worse

A recent proposal to "redefine statistical significance" (Benjamin, et a...
research · 05/19/2020

Identifying Statistical Bias in Dataset Replication

Dataset replication is a useful tool for assessing whether improvements ...
research · 03/10/2021

Financial factors selection with knockoffs: fund replication, explanatory and prediction networks

We apply the knockoff procedure to factor selection in finance. By build...
research · 09/03/2023

Diagnosing the role of observable distribution shift in scientific replications

Many researchers have identified distribution shift as a likely contribu...
