A few statistical principles for data science

02/03/2021
by Noel Cressie, et al.

In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics traditionally meant statistics for agriculture in all its forms, but in Data Science it now means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scans, voice, photograph/video (facial landmarks and facial expressions), and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies it by modelling data as realisations generated from a probability space. Another approach to uncertainty quantification is to find similar data sets, and then use the variability of results between these data sets to capture the uncertainty. Both approaches allow 'error bars' to be put on estimates obtained from the original data set, although the interpretations are different. A third approach, which concentrates on giving a single answer and gives up on uncertainty quantification, could be considered Data Engineering, although it has staked a claim in the Data Science terrain. This article presents a few (actually nine) statistical principles for data scientists that have helped me, and continue to help me, when I work on complex interdisciplinary projects.
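As a rough illustration of the two routes to 'error bars' described above, the following Python sketch may help. It is not taken from the article: the function names, the choice of the sample mean as the estimate, the Gaussian synthetic data, and the sample sizes are all assumptions made purely for illustration.

```python
import random
import statistics


def model_based_error_bar(sample):
    """Approach 1 (assumed i.i.d. probability model): the data are treated as
    realisations of independent draws from one distribution, so the standard
    error of the mean is s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    return mean, standard_error


def between_datasets_error_bar(datasets):
    """Approach 2: compute the same estimate on each similar data set and use
    the spread of those estimates as the measure of uncertainty."""
    estimates = [statistics.fmean(d) for d in datasets]
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Hypothetical synthetic data: 'heights' in cm, invented for this sketch.
rng = random.Random(0)
original = [rng.gauss(170.0, 10.0) for _ in range(50)]
similar = [[rng.gauss(170.0, 10.0) for _ in range(50)] for _ in range(20)]

mean, standard_error = model_based_error_bar(original)
pooled, spread = between_datasets_error_bar([original] + similar)

print(f"model-based:      {mean:.2f} +/- {standard_error:.2f}")
print(f"between datasets: {pooled:.2f} +/- {spread:.2f}")
```

The two error bars answer different questions. The first depends on the assumed probability model being adequate for the original data set, while the second depends on the other data sets being genuinely comparable to it. A workflow that reports only `mean` and neither error bar corresponds to the third, single-answer approach.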


