Logistic regression models for aggregated data

12/09/2019
by Tom Whitaker, et al.

Logistic regression models are a popular and effective method for predicting the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarise the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full-data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost. Performance is explored through simulated examples and analyses of large supersymmetry and satellite crop classification datasets.
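To make the idea of fitting on a histogram summary concrete, the sketch below collapses a single simulated predictor into per-class bin counts and fits a binary logistic regression to the bin midpoints, weighted by those counts. This is a deliberately simplified midpoint approximation, not the composite likelihood model described in the abstract. The function names (`fit_logistic`, `histogram_summary`), the simulated data, and the choice of 40 equal-width bins are all illustrative assumptions.

```python
import numpy as np

def fit_logistic(X, y, w=None, n_iter=50):
    """Weighted binary logistic regression via Newton-Raphson.
    X must already contain an intercept column (hypothetical helper)."""
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def histogram_summary(x, y, edges):
    """Collapse (x, y) into per-class bin counts placed at bin midpoints."""
    mids = 0.5 * (edges[:-1] + edges[1:])
    xs, labels, counts = [], [], []
    for label in (0, 1):
        c, _ = np.histogram(x[y == label], bins=edges)
        keep = c > 0                                     # drop empty bins
        xs.append(mids[keep])
        labels.append(np.full(keep.sum(), float(label)))
        counts.append(c[keep].astype(float))
    xm = np.concatenate(xs)
    X = np.column_stack([np.ones_like(xm), xm])
    return X, np.concatenate(labels), np.concatenate(counts)

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.5])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = (rng.random(n) < prob).astype(float)

# Full-data fit: one row per observation.
beta_full = fit_logistic(np.column_stack([np.ones(n), x]), y)

# Summary fit: at most 2 * 40 weighted rows.
edges = np.linspace(x.min(), x.max(), 41)
X_hist, y_hist, w_hist = histogram_summary(x, y, edges)
beta_hist = fit_logistic(X_hist, y_hist, w_hist)

print("full data :", beta_full)
print("histogram :", beta_hist)
```

The summary fit touches at most 80 weighted rows instead of 200,000 observations, which is the source of the computational saving. Treating every observation in a bin as if it sat at the midpoint introduces a bias that grows with bin width, so finer bins trade speed for accuracy.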

