Multiple Learning for Regression in Big Data

03/03/2019
by Xiang Liu, et al.

Regression problems that have closed-form solutions are well understood and can be easily implemented when the dataset is small enough to be loaded entirely into RAM. Challenges arise when the data is too big to be stored in RAM for computing the closed-form solutions. Many techniques have been proposed to overcome or alleviate this memory barrier, but the resulting solutions are often only locally optimal. In addition, most approaches require accessing the raw data again when updating the models, and parallel computing clusters are typically required if multiple models need to be computed simultaneously. We propose multiple learning approaches that utilize an array of sufficient statistics (SS) to address this big data challenge. This memory-oblivious approach breaks the memory barrier when computing regressions with closed-form solutions, including but not limited to linear regression, weighted linear regression, linear regression with Box-Cox transformation (Box-Cox regression), and ridge regression models. The SS array can be computed and updated at the per-row level or the per-mini-batch level, and updating a model is as easy as matrix addition and subtraction. Furthermore, SS arrays for different models can be computed simultaneously, yielding multiple models in one pass through the dataset. We implemented our approaches on Spark and evaluated them over simulated datasets. Results showed our approaches can achieve closed-form solutions of multiple models at the cost of half the training time of the traditional methods for a single model.
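The core idea described in the abstract can be sketched in a few lines. For ordinary least squares, the closed-form solution β = (XᵀX)⁻¹Xᵀy depends on the data only through the pair (XᵀX, Xᵀy), which fits in O(d²) memory regardless of the number of rows and can be accumulated one mini-batch at a time by plain matrix addition. The sketch below is an illustration of this sufficient-statistics pattern in NumPy, not the authors' Spark implementation; the function names `accumulate_ss` and `solve_ols` are hypothetical.

```python
import numpy as np

def accumulate_ss(batches, d):
    """Accumulate the SS array (X^T X, X^T y) over (X, y) mini-batches.

    Only d x d + d numbers are kept in memory, so the raw data never
    needs to fit in RAM; updating with a new batch is matrix addition.
    """
    xtx = np.zeros((d, d))
    xty = np.zeros(d)
    for X, y in batches:
        xtx += X.T @ X  # per-batch update: matrix addition only
        xty += X.T @ y
    return xtx, xty

def solve_ols(xtx, xty, ridge=0.0):
    """Closed-form solution from the SS array.

    ridge > 0 gives ridge regression from the *same* SS array,
    so multiple models can be obtained from one pass over the data.
    """
    d = len(xty)
    return np.linalg.solve(xtx + ridge * np.eye(d), xty)

# Simulated example: stream the dataset as mini-batches of 1,000 rows.
rng = np.random.default_rng(0)
true_beta = np.array([2.0, -1.0, 0.5])
batches = []
for _ in range(100):
    X = rng.normal(size=(1000, 3))
    y = X @ true_beta + 0.01 * rng.normal(size=1000)
    batches.append((X, y))

xtx, xty = accumulate_ss(batches, d=3)
beta_ols = solve_ols(xtx, xty)          # ordinary least squares
beta_ridge = solve_ols(xtx, xty, 10.0)  # ridge model, no second data pass
```

Because the SS arrays for different models (e.g. OLS and several ridge penalties) are either identical or cheap to derive, a single scan of the data suffices to fit all of them, which is the "multiple learning" claim of the paper.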


