Fast generalised linear models by database sampling and one-step polishing

03/14/2018
by Thomas Lumley, et al.

In this note, I show how to fit a generalised linear model to N observations on p variables stored in a relational database, using one sampling query and one aggregation query, as long as N^(1/2+δ) observations can be stored in memory. The resulting estimator is fully efficient and asymptotically equivalent to the maximum likelihood estimator, so its variance can be estimated from the Fisher information in the usual way. A proof-of-concept implementation uses R with MonetDB and with SQLite, and could easily be adapted to other popular databases. I illustrate the approach with examples of taxi-trip data in New York City and factors related to car colour in New Zealand.
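The two-query idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the note (which uses R): it assumes a logistic model with an intercept and one covariate, a hypothetical SQLite table `obs(x, y)`, and that both the score and the Fisher information are accumulated in the aggregation query. The helper names (`make_demo_db`, `fit_sample`, `one_step_glm`) are invented for the example.

```python
import math
import random
import sqlite3


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve2(a11, a12, a22, b1, b2):
    """Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] d = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def fit_sample(rows, iters=25):
    """Newton-Raphson logistic fit on rows (x, y) held in memory."""
    b0 = b1 = 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in rows:
            mu = expit(b0 + b1 * x)
            w = mu * (1.0 - mu)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        d0, d1 = solve2(i00, i01, i11, u0, u1)
        b0, b1 = b0 + d0, b1 + d1
    return b0, b1


def one_step_glm(conn, n):
    """Return (initial estimate, one-step estimate) using two queries."""
    # Query 1: draw a random sample of n rows and fit the model in memory.
    rows = conn.execute(
        "SELECT x, y FROM obs ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    b0, b1 = fit_sample(rows)
    # Query 2: aggregate the score and information over the full table,
    # evaluated at the sample estimate. expit is registered as a SQL function
    # because SQLite builds do not always include exp().
    conn.create_function("expit", 1, expit)
    u0, u1, i00, i01, i11 = conn.execute(
        """
        SELECT SUM(y - mu), SUM((y - mu) * x),
               SUM(mu * (1 - mu)), SUM(mu * (1 - mu) * x),
               SUM(mu * (1 - mu) * x * x)
        FROM (SELECT x, y, expit(? + ? * x) AS mu FROM obs)
        """,
        (b0, b1),
    ).fetchone()
    # One Newton step from the sample estimate (the "polishing" step).
    d0, d1 = solve2(i00, i01, i11, u0, u1)
    return (b0, b1), (b0 + d0, b1 + d1)


def make_demo_db(N, seed=0):
    """Hypothetical demo data: logistic model with intercept -0.5, slope 1."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y INTEGER)")
    data = []
    for _ in range(N):
        x = rng.gauss(0.0, 1.0)
        data.append((x, 1 if rng.random() < expit(-0.5 + x) else 0))
    conn.executemany("INSERT INTO obs VALUES (?, ?)", data)
    return conn
```

For example, `one_step_glm(make_demo_db(20000), int(20000 ** 0.75))` uses a sample of size N^(1/2+δ) with δ = 0.25. Only the sample rows and five aggregated sums ever leave the database, which is the point of the approach.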


