A model robust sub-sampling approach for Generalised Linear Models in Big data settings

07/29/2022
by Amalan Mahendran, et al.

In the modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as the basis for inference rather than the whole data set. A key question when applying sub-sampling is how to select an informative subset given the questions being asked of the data. One recent approach assigns a sub-sampling probability to each data point, but its limitation is that the appropriate probabilities depend on an assumed model for the Big data. To overcome this limitation, we propose a model robust approach in which a set of candidate models is considered and the sub-sampling probabilities are a weighted average of the probabilities that would be obtained if each model were considered individually. Theoretical support for this approach is provided. Our model robust sub-sampling approach is applied in a simulation study and in two real-world applications, where its performance is compared with current sub-sampling practices. The results show that our model robust approach outperforms the alternative approaches.
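The averaging step can be illustrated with a minimal sketch for logistic regression, one member of the GLM family. It rests on several assumptions that are not taken from the paper: the per-model score |y_i - p_i| * ||x_i|| is borrowed from the general optimal sub-sampling literature as a stand-in for whatever criterion the authors use, the pilot coefficient estimates and model weights are simply supplied as inputs, and all function names (`single_model_probs`, `model_robust_probs`, `draw_subsample`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def single_model_probs(features, y, beta):
    """Sub-sampling probabilities for ONE logistic regression model.

    Assumption: uses the score |y_i - p_i| * ||x_i|| as a stand-in
    criterion; it is not claimed to be the paper's formula.
    """
    scores = []
    for xi, yi in zip(features, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        scores.append(abs(yi - p) * math.sqrt(sum(v * v for v in xi)))
    total = sum(scores)
    if total == 0:  # degenerate case: fall back to uniform sampling
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

def model_robust_probs(x_raw, y, models, model_weights):
    """Weighted average of the per-model sub-sampling probabilities.

    models: list of (feature_map, pilot_beta) pairs (hypothetical inputs).
    model_weights: non-negative weights, one per candidate model.
    """
    w_sum = sum(model_weights)
    n = len(y)
    combined = [0.0] * n
    for (feature_map, beta), w in zip(models, model_weights):
        feats = [feature_map(x) for x in x_raw]
        probs = single_model_probs(feats, y, beta)
        for i in range(n):
            combined[i] += (w / w_sum) * probs[i]
    return combined

def draw_subsample(probs, r, seed=0):
    """Draw r indices with replacement; return them with inverse-probability weights."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(probs)), weights=probs, k=r)
    ipw = [1.0 / (r * probs[i]) for i in idx]
    return idx, ipw

# Toy example: two hypothetical candidate models for a single covariate x.
x_raw = [0.5, -1.2, 2.0, 0.1, -0.7, 1.5]
y = [1, 0, 1, 0, 0, 1]
models = [
    (lambda x: [1.0, x], [0.0, 1.0]),               # main-effects model
    (lambda x: [1.0, x, x * x], [0.2, 0.8, -0.1]),  # adds a quadratic term
]
probs = model_robust_probs(x_raw, y, models, model_weights=[0.5, 0.5])
idx, ipw = draw_subsample(probs, r=4)
print([round(p, 3) for p in probs])
print(idx)
```

With a single model and weight 1, `model_robust_probs` reduces to that model's probabilities, which mirrors the limitation described above: the probabilities are only as good as the one assumed model. The inverse-probability weights returned by `draw_subsample` would then be used in a weighted fit on the sub-sample; equal model weights are just one illustrative choice.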
