The algebra and machine representation of statistical models

06/16/2020
by Evan Patterson, et al.

As the twin movements of open science and open source bring an ever greater share of the scientific process into the digital realm, new opportunities arise for the meta-scientific study of science itself, including of data science and statistics. Future science will likely see machines play an active role in processing, organizing, and perhaps even creating scientific knowledge. To make this possible, large engineering efforts must be undertaken to transform scientific artifacts into useful computational resources, and conceptual advances must be made in the organization of scientific theories, models, experiments, and data. This dissertation takes steps toward digitizing and systematizing two major artifacts of data science: statistical models and data analyses. Using tools from algebra, particularly categorical logic, it draws a precise analogy between models in statistics and models in logic, enabling statistical models to be seen as models of theories, in the logical sense. Statistical theories, being algebraic structures, are amenable to machine representation and are equipped with morphisms that formalize the relations between different statistical methods. Turning from mathematics to engineering, the dissertation then designs and implements a software system for creating machine representations of data analyses written as Python or R programs. The representations aim to capture the semantics of data analyses, independent of the programming language and libraries in which they are implemented.
