Distributed Parameter Map-Reduce

10/03/2015
by Qi Li, et al.

This paper describes how to convert a machine learning problem into a series of map-reduce tasks, using logistic regression as the case study. Logistic regression assumes that samples are independent and assigns each sample a probability; the parameters are obtained by maximizing the product of all sample probabilities. The rapid growth of training sets challenges this method: the samples are so numerous that they can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression is inference. In the spirit of map-reduce, each sample performs inference in a separate map procedure, but this presupposes that the map procedure holds the parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only the samples but also the parameters are distributed across the nodes of the distributed file system. Through a series of map-reduce tasks, we assign each sample the parameters for its features, perform inference for the sample, and update the parameters of the model. These steps are executed in a loop until convergence. We test the proposed algorithm in a production Hadoop environment. Experiments show that the speedup of the algorithm scales linearly with the number of cluster nodes.
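The loop described in the abstract (attach parameters to samples, infer, update) can be sketched as three chained jobs. This is a minimal in-memory illustration, not necessarily the authors' job layout: `run_job` is a hypothetical stand-in for a real map-reduce framework, the record tags (`"sample"`, `"param"`, `"grad"`), all function names, the plain gradient-descent update, and the learning rate are assumptions made for the example.

```python
import math
from collections import defaultdict


def run_job(records, mapper, reducer):
    """In-memory stand-in for one map-reduce job (shuffle = group by key)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def join_map(record):
    # Key both samples and parameters by feature so they meet in one reducer.
    if record[0] == "param":
        _, feature, weight = record
        yield feature, ("w", weight)
    else:
        _, sample_id, label, features = record
        for feature, value in features.items():
            yield feature, ("x", sample_id, label, value)


def join_reduce(feature, values):
    # Attach this feature's weight (0.0 if unseen) to every sample using it.
    weight = next((v[1] for v in values if v[0] == "w"), 0.0)
    for v in values:
        if v[0] == "x":
            _, sample_id, label, value = v
            yield sample_id, label, feature, value, weight


def infer_map(record):
    sample_id, label, feature, value, weight = record
    yield sample_id, (label, feature, value, weight)


def infer_reduce(sample_id, values):
    # All weights for this sample are now local: predict, then emit gradients.
    z = sum(value * weight for _, _, value, weight in values)
    p = 1.0 / (1.0 + math.exp(-z))
    for label, feature, value, _ in values:
        yield "grad", feature, (p - label) * value


def update_map(record):
    tag, feature, number = record
    yield feature, (tag, number)


def update_reduce(feature, values, learning_rate):
    # Sum per-sample gradients for this feature and take one descent step.
    weight = next((v for tag, v in values if tag == "param"), 0.0)
    gradient = sum(v for tag, v in values if tag == "grad")
    yield "param", feature, weight - learning_rate * gradient


def train_step(samples, weights, learning_rate):
    """One pass: join parameters to samples, infer, then update parameters."""
    params = [("param", f, w) for f, w in weights.items()]
    tagged = [("sample", sid, y, feats) for sid, y, feats in samples]
    joined = run_job(tagged + params, join_map, join_reduce)
    grads = run_job(joined, infer_map, infer_reduce)
    updated = run_job(
        grads + params,
        update_map,
        lambda f, vs: update_reduce(f, vs, learning_rate),
    )
    return {feature: weight for _, feature, weight in updated}


def log_loss(samples, weights):
    """Average negative log-likelihood, computed locally for checking only."""
    total = 0.0
    for _, label, features in samples:
        z = sum(v * weights.get(f, 0.0) for f, v in features.items())
        p = 1.0 / (1.0 + math.exp(-z))
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(samples)
```

Calling `train_step` repeatedly, with samples given as `(sample_id, label, {feature: value})` tuples and `weights` starting as an empty dict, plays the role of the loop until convergence. No job ever needs the full parameter vector in one place: the first job keys everything by feature, the second by sample, and the third by feature again.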


research
11/24/2014

Distributed Coordinate Descent for L1-regularized Logistic Regression

Solving logistic regression with L1-regularization in distributed settin...
research
07/03/2014

Structured Learning via Logistic Regression

A successful approach to structured learning is to write the learning ob...
research
06/28/2016

A Learning Algorithm for Relational Logistic Regression: Preliminary Results

Relational logistic regression (RLR) is a representation of conditional ...
research
08/19/2022

Meta Learning for High-dimensional Ising Model Selection Using ℓ_1-regularized Logistic Regression

In this paper, we consider the meta learning problem for estimating the ...
research
05/29/2022

A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension

Identifying the relevant variables for a classification model with corre...
research
04/16/2019

An efficient stochastic Newton algorithm for parameter estimation in logistic regressions

Logistic regression is a well-known statistical model which is commonly ...
research
08/07/2018

A distributed regression analysis application based on SAS software. Part I: Linear and logistic regression

Previous work has demonstrated the feasibility and value of conducting d...
