Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

03/04/2012
by Michael W. Mahoney, et al.

Database theory and database practice are typically the domain of computer scientists, who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computing researchers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
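One well-known instance of the phenomenon the abstract describes is early stopping: running only a few iterations of gradient descent on a least-squares problem (an "approximate computation" of the exact solution) shrinks the components of the solution along weak singular directions, much as explicit ridge regularization would. The sketch below is illustrative only; the problem setup, dimensions, and iteration count are assumptions for demonstration, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy, ill-conditioned least-squares problem: y = X @ w_true + noise.
# The decaying column scaling gives X a rapidly decaying singular spectrum.
n, d = 100, 20
X = rng.normal(size=(n, d)) @ np.diag(np.logspace(0, -4, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Exact (unregularized) least-squares solution: noise in the weakest
# singular directions is amplified by up to 1/sigma_min, so its norm blows up.
w_exact = np.linalg.lstsq(X, y, rcond=None)[0]

# "Approximate computation": a modest number of gradient-descent steps
# from zero. Directions with eigenvalue lam of X.T @ X are only recovered
# up to the factor 1 - (1 - step*lam)**k, which stays tiny for small lam,
# so the iterate is implicitly regularized, ridge-like.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2
for _ in range(50):
    w -= step * X.T @ (X @ w - y)

print(np.linalg.norm(w_exact), np.linalg.norm(w))
# the early-stopped iterate has a much smaller norm than the exact solution
```

The point is not that early stopping is the paper's method, but that stopping an iterative algorithm short of convergence, for purely computational reasons, already has a statistical effect of the kind the article analyzes.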


Related research:

- 10/08/2010: Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
- 06/23/2014: A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares
- 11/09/2018: A Bayesian Perspective of Statistical Machine Learning for Big Data
- 06/23/2013: A Statistical Perspective on Algorithmic Leveraging
- 11/30/2021: Black box tests for algorithmic stability
- 01/28/2022: Limitation of characterizing implicit regularization by data-independent functions
- 01/05/2017: A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers
