Distributed Logistic Regression for Massive Data with Rare Events

04/05/2023
by   Xuetong Li, et al.

Large-scale rare events data are commonly encountered in practice. To tackle massive rare events data, we propose a novel distributed estimation method for logistic regression. A distributed framework poses two challenges. The first is how to distribute the data; in this regard, two distribution strategies (the RANDOM strategy and the COPY strategy) are investigated. The second is how to select an appropriate type of objective function so that the best asymptotic efficiency can be achieved; to this end, the under-sampled (US) and inverse probability weighted (IPW) types of objective functions are considered. Our results suggest that the COPY strategy together with the IPW objective function is the best solution for distributed logistic regression with rare events. The finite sample performance of the distributed methods is demonstrated by simulation studies and a real-world Sweden Traffic Sign dataset.
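
The abstract names the COPY strategy and the IPW objective without defining them, so the following Python sketch rests on stated assumptions rather than on the paper's exact estimator. It assumes that COPY sends every positive (rare) instance to all workers and splits the negatives at random, that the IPW objective weights each negative by the inverse of its inclusion probability (here the number of workers K), and that the worker estimates are combined by simple averaging. The simulated data, the parameter values, and the helper names `copy_partition` and `fit_weighted_logistic` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated rare events data (illustrative values, not taken from the paper).
n, K = 200_000, 4                      # sample size and number of workers
beta_true = np.array([-5.0, 1.0, -1.0])  # large negative intercept -> rare positives
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)


def copy_partition(y, n_workers, rng):
    """Assumed COPY strategy: all positives go to every worker,
    negatives are shuffled and split into disjoint chunks."""
    pos = np.flatnonzero(y == 1)
    neg = rng.permutation(np.flatnonzero(y == 0))
    return [np.concatenate([pos, chunk]) for chunk in np.array_split(neg, n_workers)]


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Maximize the weighted logistic log-likelihood by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


parts = copy_partition(y, K, rng)
worker_estimates = []
for idx in parts:
    # Assumed IPW weights: positives are kept with probability 1 (weight 1),
    # each negative lands on a given worker with probability 1/K (weight K).
    w = np.where(y[idx] == 1, 1.0, float(K))
    worker_estimates.append(fit_weighted_logistic(X[idx], y[idx], w))

# One-shot combination by averaging (an assumption of this sketch).
beta_hat = np.mean(worker_estimates, axis=0)
print("event rate:", y.mean())
print("averaged IPW estimate:", beta_hat)
```

Under these assumptions each worker's weighted log-likelihood is an unbiased stand-in for the full-data log-likelihood, so the averaged estimate should land close to `beta_true`. Replacing `copy_partition` with a plain random split of all indices would give a comparable sketch of a RANDOM-style distribution.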


