Statistical inference for sketching algorithms

06/06/2023
by R. P. Browne, et al.

Sketching algorithms use random projections to generate a smaller sketched data set, often for the purpose of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set, or from a combination of the full and sketched data sets, respectively. Previous work has obtained the distribution of these estimators under repeated sketching, along with the first two moments of both estimators. Using a different approach, we also derive the distribution of the complete sketch estimator, but additionally consider the error term under both repeated sketching and repeated sampling. Importantly, we obtain pivotal quantities that are based solely on the sketched data set and, specifically, do not require information from the full data model fit. These pivotal quantities can be used for inference on the full data set regression estimates or on the model parameters. For partial sketching, we derive pivotal quantities for a marginal test and an approximate distribution for the partial sketch estimator under repeated sketching or repeated sampling, again avoiding reliance on a full data model fit. We extend these results to include the Hadamard and Clarkson-Woodruff sketches, and then compare them in a simulation study.
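The two estimators contrasted above can be illustrated with a minimal NumPy sketch. It assumes the standard definitions from the sketching literature: the complete sketch estimator regresses Sy on SX, giving (X^T S^T S X)^{-1} X^T S^T S y, while the partial sketch estimator keeps the full-data cross-product, giving (X^T S^T S X)^{-1} X^T y. The dense Gaussian projection, the subsampled randomized Hadamard transform (SRHT) used as the "Hadamard" sketch, and all function names, sketch sizes and simulated data are hypothetical choices for illustration. This is not the authors' code, and it does not compute their pivotal quantities.

```python
import numpy as np


def gaussian_sketch(A, k, rng):
    """Dense Gaussian projection: S is k x n with i.i.d. N(0, 1/k) entries, so E[S^T S] = I_n."""
    n = A.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))
    return S @ A


def clarkson_woodruff_sketch(A, k, rng):
    """CountSketch: each row of A is added, with a random sign, to one of k random buckets."""
    n = A.shape[0]
    buckets = rng.integers(0, k, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # unbuffered scatter-add handles repeated buckets
    return SA


def fwht(A):
    """Orthonormal fast Walsh-Hadamard transform along axis 0 (n must be a power of 2)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    h = 1
    while h < n:
        B = A.reshape(n // (2 * h), 2, h, -1)  # pair each row with the row h positions later
        x, y = B[:, 0], B[:, 1]
        A = np.stack([x + y, x - y], axis=1).reshape(n, -1)
        h *= 2
    return A / np.sqrt(n)


def hadamard_sketch(A, k, rng):
    """Assumed SRHT form of the Hadamard sketch: sqrt(n/k) * R H D A."""
    n = A.shape[0]
    if n & (n - 1):
        raise ValueError("n must be a power of 2 for this simple Hadamard sketch")
    signs = rng.choice([-1.0, 1.0], size=n)  # D: random sign flips
    HDA = fwht(signs[:, None] * A)  # H: orthonormal Hadamard mixing
    rows = rng.choice(n, size=k, replace=False)  # R: uniform row subsample
    return np.sqrt(n / k) * HDA[rows]


def complete_sketch(X, y, sketch, k, rng):
    """Complete sketch: OLS on (SX, Sy), applying the same S to X and y."""
    SA = sketch(np.column_stack([X, y]), k, rng)
    SX, Sy = SA[:, :-1], SA[:, -1]
    return np.linalg.solve(SX.T @ SX, SX.T @ Sy)


def partial_sketch(X, y, sketch, k, rng):
    """Partial sketch: sketched Gram matrix, full-data cross-product X^T y."""
    SX = sketch(X, k, rng)
    return np.linalg.solve(SX.T @ SX, X.T @ y)


# Hypothetical simulated data: n rows, p predictors, sketch size k.
rng = np.random.default_rng(0)
n, p, k = 1024, 5, 200
X = rng.normal(size=(n, p))
y = X @ np.arange(1.0, p + 1) + rng.normal(size=n)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]  # full data fit, used here only for comparison

sketches = {
    "gaussian": gaussian_sketch,
    "clarkson-woodruff": clarkson_woodruff_sketch,
    "hadamard": hadamard_sketch,
}
for name, sketch in sketches.items():
    b_c = complete_sketch(X, y, sketch, k, rng)
    b_p = partial_sketch(X, y, sketch, k, rng)
    print(name, np.round(b_c - beta_full, 3), np.round(b_p - beta_full, 3))
```

Stacking X and y before sketching guarantees that the complete sketch uses a single S for both, which is what makes it a regression on the sketched data set alone. Rerunning the loop with fresh random draws mimics the "repeated sketching" setting; redrawing X and y as well mimics repeated sampling.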


research
04/01/2021

Repeated measurements with unintended feedback: The Dutch new herring scandals

An econometric analysis of consumer research data which hit newspaper he...
research
11/09/2018

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

The Count-Min sketch is an important and well-studied data summarization...
research
07/20/2023

A Framework for Statistical Inference via Randomized Algorithms

Randomized algorithms, such as randomized sketching or projections, are ...
research
07/21/2020

A Generalized Hosmer-Lemeshow Goodness-of-Fit Test for a Family of Generalized Linear Models

Generalized linear models (GLMs) are used within a vast number of applic...
research
07/03/2019

An Econometric Perspective of Algorithmic Sampling

Datasets that are terabytes in size are increasingly common, but compute...
research
11/09/2022

Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability

A flexible method is developed to construct a confidence interval for th...
research
11/06/2018

Frank-Wolfe Algorithm for Exemplar Selection

In this paper, we consider the problem of selecting representatives from...