Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression

04/27/2016
by Lei Han, et al.

A major challenge in building statistical models in the big data era is that the volume of available data may exceed the available computational capacity. A common approach to this problem is to fit the model on a subsampled dataset that the available computational resources can handle. In this paper, we propose a general subsampling scheme for large-scale multi-class logistic regression and examine the variance of the resulting estimator. We show that, asymptotically, the proposed method always achieves a smaller variance than uniform random sampling. Moreover, when the classes are conditionally imbalanced, a significant improvement over uniform sampling can be achieved. The empirical performance of the proposed method is compared with that of other methods on both simulated and real-world datasets, and the results confirm our theoretical analysis.
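To make the idea of non-uniform subsampling concrete, the following is a minimal sketch of a generic pilot-based scheme. It is not the scheme proposed in the paper, whose details are not given in this abstract. The acceptance rule (keep a point with probability tied to the pilot model's uncertainty), the `floor` parameter, the inverse-probability weights, and all function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, weights=None, lr=0.5, n_iter=300):
    # Weighted multi-class logistic regression via plain gradient descent.
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    Y = np.eye(n_classes)[y]          # one-hot labels
    W = np.zeros((d, n_classes))
    for _ in range(n_iter):
        P = softmax(X @ W)
        W -= lr * (X.T @ (w[:, None] * (P - Y))) / w.sum()
    return W

def uncertainty_subsample(X, y, W_pilot, rng, floor=0.1):
    # Hypothetical rule: accept each point with probability 1 - q, where q is
    # the pilot model's top-class probability, clipped below at `floor`.
    q = softmax(X @ W_pilot).max(axis=1)
    accept = np.clip(1.0 - q, floor, 1.0)
    keep = rng.random(len(y)) < accept
    # Inverse-probability weights offset the non-uniform selection.
    return X[keep], y[keep], 1.0 / accept[keep]

# Simulated three-class data with an intercept column (illustrative only).
rng = np.random.default_rng(0)
n, n_classes = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
W_true = np.array([[0.0, 0.5, -0.5], [2.0, -1.0, -1.0], [-1.0, 2.0, -1.0]])
logits = X @ W_true
y = np.argmax(logits + rng.gumbel(size=logits.shape), axis=1)  # Gumbel-max draw

# Pilot fit on a small uniform subsample, then uncertainty-based subsampling.
pilot_idx = rng.choice(n, size=500, replace=False)
W_pilot = fit_softmax(X[pilot_idx], y[pilot_idx], n_classes)
X_s, y_s, w_s = uncertainty_subsample(X, y, W_pilot, rng)
W_sub = fit_softmax(X_s, y_s, n_classes, weights=w_s)
print(f"kept {len(y_s)} of {n} points")
```

The design choice here is the usual one for such sketches: points the pilot model already classifies confidently are kept rarely, so the retained sample concentrates on uncertain regions, and the weights compensate for the uneven selection when the final model is fit.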

research
05/07/2019

F-measure Maximizing Logistic Regression

Logistic regression is a widely used method in several fields. When appl...
research
05/22/2018

On Coresets for Logistic Regression

Coresets are one of the central methods to facilitate the analysis of la...
research
05/17/2021

Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression

Using big data to analyze consumer behavior can provide effective decisi...
research
01/13/2015

Random Bits Regression: a Strong General Predictor for Big Data

To improve accuracy and speed of regressions and classifications, we pre...
research
07/31/2020

Performance of Multi-group DIF Methods in Assessing Cross-Country Score Comparability of International Large-Scale Assessments

Standardized large-scale testing can be a debatable topic, in which test...
research
06/16/2013

Local case-control sampling: Efficient subsampling in imbalanced data sets

For classification problems with significant class imbalance, subsamplin...
research
04/16/2013

Learning Heteroscedastic Models by Convex Programming under Group Sparsity

Popular sparse estimation methods based on ℓ_1-relaxation, such as the L...
