Scalable Econometrics on Big Data – The Logistic Regression on Spark

06/18/2021
by   Aurélien Ouattara, et al.

Extra-large datasets are becoming increasingly accessible, and computing tools designed to handle huge amounts of data efficiently are rapidly becoming widely available. However, conventional statistical and econometric tools still struggle with such large datasets. This paper examines econometrics on big datasets, focusing specifically on logistic regression on Spark. We review the robustness of the functions available in Spark for fitting logistic regression and introduce a package, developed in PySpark, that returns the statistical summary of the logistic regression needed for statistical inference.
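To make "statistical summary" concrete, the sketch below shows the quantities such a summary usually contains for a logistic regression: coefficients, standard errors from the inverse Fisher information, z-statistics, and two-sided p-values. It is not the authors' PySpark package and does not use Spark. It is a small single-machine illustration in standard-library Python, and the function names (`invert`, `score_and_information`, `logit_summary`) and the toy data are assumptions made for this example.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def score_and_information(X, y, beta):
    """Return the score vector X'(y - mu) and the Fisher information X'WX at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, target in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        for i in range(k):
            score[i] += (target - mu) * row[i]
            for j in range(k):
                info[i][j] += mu * (1.0 - mu) * row[i] * row[j]
    return score, info

def logit_summary(X, y, max_iter=50, tol=1e-10):
    """Fit by Newton-Raphson; rows of X must already contain a leading 1.0 for the intercept."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(X, y, beta)
        cov = invert(info)
        step = [sum(c * s for c, s in zip(cov_row, score)) for cov_row in cov]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    # Covariance of the estimates: inverse Fisher information at the final beta.
    _, info = score_and_information(X, y, beta)
    cov = invert(info)
    std_err = [math.sqrt(cov[i][i]) for i in range(len(beta))]
    z = [b / s for b, s in zip(beta, std_err)]
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]
    return beta, std_err, z, p

# Toy, non-separable data (illustrative only, not from the paper).
x_values = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
X = [[1.0, float(v)] for v in x_values]
coef, std_err, z_stat, p_value = logit_summary(X, y)
for name, row in zip(["intercept", "x"], zip(coef, std_err, z_stat, p_value)):
    print(name, ["%.4f" % v for v in row])
```

On a cluster, the per-row sums inside `score_and_information` are the part that would be distributed across partitions, since the score and information matrix are plain sums over observations; only the small k-by-k inversion needs to happen on the driver.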


research
09/07/2021

SIHR: An R Package for Statistical Inference in High-dimensional Linear and Logistic Regression Models

We introduce and illustrate through numerical examples the R package wh...
research
12/09/2019

Logistic regression models for aggregated data

Logistic regression models are a popular and effective method to predict...
research
05/17/2021

Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression

Using big data to analyze consumer behavior can provide effective decisi...
research
12/14/2021

Data-driven chimney fire risk prediction using machine learning and point process tools

Chimney fires constitute one of the most commonly occurring fire types. ...
research
03/11/2022

Sampling Bias Correction for Supervised Machine Learning: A Bayesian Inference Approach with Practical Applications

Given a supervised machine learning problem where the training set has b...
research
08/07/2018

A distributed regression analysis application based on SAS software. Part I: Linear and logistic regression

Previous work has demonstrated the feasibility and value of conducting d...
research
08/19/2023

High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison

We present a versatile GPU-based parallel version of Logistic Regression...
