On The Problem of Relevance in Statistical Inference

04/20/2020
by Subhadeep Mukhopadhyay, et al.

How many statistical inference tools do we have for inference from massive data? A huge number, but only when we are ready to assume that the given database is homogeneous, consisting of a large cohort of "similar" cases. Why do we need the homogeneity assumption? To make "learning from the experience of others" or "borrowing strength" possible. But what if we are dealing with a massive database of heterogeneous cases (which is the norm in almost all modern data-science applications, including neuroscience, genomics, healthcare, and astronomy)? How many methods do we have in this situation? Not many, if not zero. Why? It is not obvious how to go about gathering strength when each piece of information is fuzzy. The danger is that, if we include irrelevant cases, borrowing information might heavily damage the quality of the inference. This raises some fundamental questions for big data inference: When (not) to borrow? From whom (not) to borrow? How (not) to borrow? These questions are at the heart of the "Problem of Relevance" in statistical inference, a puzzle that has received too little attention since its inception nearly half a century ago. Here we offer the first practical theory of relevance with a precisely describable statistical formulation and algorithm. Through examples, we demonstrate how our new statistical perspective answers previously unanswerable questions in a realistic and feasible way.


