Statistical modeling: the three cultures

12/08/2020
by Adel Daoud, et al.
Two decades ago, Leo Breiman identified two cultures for statistical modeling. The data modeling culture (DMC) refers to practices that aim to conduct statistical inference on one or several quantities of interest. The algorithmic modeling culture (AMC) refers to practices that define a machine-learning (ML) procedure to generate accurate predictions about an event of interest. Breiman argued that statisticians should give more attention to AMC than to DMC because of the strengths of ML in adapting to data. Twenty years later, DMC has lost some of its dominant role in statistics because of the data-science revolution, yet we observe that it is still the leading practice in the natural and social sciences. DMC remains the modus operandi because of the influence of the established scientific method, the hypothetico-deductive method. Despite the incompatibilities of AMC with this scientific method, the AMC and DMC cultures mix intensely among some research groups. We argue that this mixing has formed a fertile spawning pool for a mutated culture that we call the hybrid modeling culture (HMC), in which prediction and inference have fused into new procedures that reinforce one another. This article identifies key characteristics of HMC, thereby facilitating the scientific endeavor and fueling the evolution of statistical cultures towards better practices. By better, we mean increasingly reliable, valid, and efficient statistical practices in analyzing causal relationships. Because HMC combines inference and prediction, the distinction between the two, taken to its limit, melts away. We qualify our melting-away argument by describing three HMC practices, each capturing an aspect of the scientific cycle: ML for causal inference, ML for data acquisition, and ML for theory prediction.
