On the selection of optimal subdata for big data regression based on leverage scores

05/02/2023
by Vasilis Chasiotis, et al.

Regression can be very difficult for big datasets because of the sheer volume of data involved. The demand for computational resources in the modeling process grows with the scale of the dataset, since traditional approaches to regression involve inverting huge data matrices. The main problem lies in the large data size, so a standard remedy is subsampling, which aims to obtain the most informative portion of the big data. In this paper we consider an approach based on leverage scores that already exists in the literature. That approach was originally proposed to select subdata for linear model discrimination. However, we highlight its importance for selecting the data points that are most informative for estimating the unknown parameters. We conclude that the approach based on leverage scores improves on existing approaches, and we support this with simulation experiments as well as a real data application.
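To make the idea of leverage-based subdata selection concrete, here is a minimal Python sketch. It is not the authors' algorithm, whose exact selection rule is not stated in the abstract. It assumes the simplest possible rule (keep the k rows with the largest leverage scores), and the function names `leverage_scores`, `select_top_leverage`, and `ols`, as well as the simulated data, are hypothetical choices made only for illustration.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T.

    With a thin QR factorization X = Q R, H = Q Q^T, so the i-th
    leverage score is the squared norm of the i-th row of Q.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(Q**2, axis=1)


def select_top_leverage(X, k):
    """Indices of the k rows with the largest leverage scores.

    Assumption: a plain deterministic top-k rule, used here only as
    an illustration of leverage-based selection.
    """
    h = leverage_scores(X)
    idx = np.argpartition(h, -k)[-k:]
    return np.sort(idx)


def ols(X, y):
    """Ordinary least squares estimate via a least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated data (hypothetical): heavy-tailed covariates plus an intercept.
rng = np.random.default_rng(0)
n, p, k = 10_000, 5, 200
Z = rng.standard_t(df=3, size=(n, p))
X = np.column_stack([np.ones(n), Z])
beta_true = np.arange(1, p + 2, dtype=float)
y = X @ beta_true + rng.normal(size=n)

# Fit the model on the selected subdata only.
idx = select_top_leverage(X, k)
beta_sub = ols(X[idx], y[idx])
```

Note that computing exact leverage scores through a QR factorization costs on the order of n p^2 operations, the same order as fitting the full model, so this sketch illustrates the selection principle rather than a computational saving.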

research
04/29/2023

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard...
research
08/08/2020

Scalable model selection for spatial additive mixed modeling: application to crime analysis

A rapid growth in spatial open datasets has led to a huge demand for reg...
research
03/31/2020

An Approach for Selecting Cloud Service Adequate to Big Data Case Study: E-health Context

The expanding Cloud computing's services offers great opportunities for ...
research
05/30/2021

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data sto...
research
11/26/2017

Obtaining the coefficients of a Vector Autoregression Model through minimization of parameter criteria

VAR models are a type of multi-equation model that have been widely appl...
research
05/05/2015

On the Feasibility of Distributed Kernel Regression for Big Data

In modern scientific research, massive datasets with huge numbers of obs...
research
10/15/2018

Bounding Entities within Dense Subtensors

Group-based fraud detection is a promising methodology to catch frauds o...
