Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

by Maurits Kaptein, et al.

In marketing, we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented mean that analyzing them takes considerable time. We propose a method for estimating a finite mixture of logistic regression models that can be used to cluster customers based on a continuous stream of responses. This method, which we call oFMLR, allows segments to be identified in data streams or in extremely large static datasets. In contrast to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, which makes this paper a high-level overview of possible methods for dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented datasets. Finally, we introduce the oFMLR [R] package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.
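To make the idea of online estimation for a mixture of logistic regressions concrete, the following is a minimal sketch and not the oFMLR algorithm or the oFMLR package API, neither of which is specified in this abstract. It assumes a generic stochastic EM-style scheme: for each incoming observation, compute each component's posterior responsibility (an E-step on a single data point), then take one responsibility-weighted gradient step on that component's coefficients and nudge the mixing weights toward the responsibilities. The class name `OnlineMixtureLogit`, the single learning rate `lr`, and the random initialization are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


class OnlineMixtureLogit:
    """Hypothetical streaming estimator (illustration only, not oFMLR)."""

    def __init__(self, n_components, n_features, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Mixing weights start uniform and always sum to one.
        self.weights = [1.0 / n_components] * n_components
        # Small random starting values break the symmetry between components.
        self.betas = [[rng.gauss(0.0, 0.5) for _ in range(n_features)]
                      for _ in range(n_components)]

    def responsibilities(self, x, y):
        # E-step on one observation: posterior probability of each component.
        likes = []
        for w, beta in zip(self.weights, self.betas):
            p = sigmoid(dot(beta, x))
            likes.append(w * (p if y == 1 else 1.0 - p))
        total = sum(likes)
        if total <= 0.0:
            # Guard against underflow: fall back to the current weights.
            return list(self.weights)
        return [like / total for like in likes]

    def update(self, x, y):
        # Process a single (features, binary response) pair, then discard it.
        r = self.responsibilities(x, y)
        for k, beta in enumerate(self.betas):
            p = sigmoid(dot(beta, x))
            # Responsibility-weighted gradient of the Bernoulli log-likelihood.
            step = self.lr * r[k] * (y - p)
            for j, xj in enumerate(x):
                beta[j] += step * xj
            # Move the mixing weight toward the observed responsibility.
            self.weights[k] += self.lr * (r[k] - self.weights[k])
        return r


if __name__ == "__main__":
    # Simulated stream: two latent segments with opposite slopes.
    rng = random.Random(1)
    model = OnlineMixtureLogit(n_components=2, n_features=2)
    for _ in range(5000):
        x = [1.0, rng.gauss(0.0, 1.0)]
        slope = 2.0 if rng.random() < 0.5 else -2.0
        y = 1 if rng.random() < sigmoid(slope * x[1]) else 0
        model.update(x, y)
    print(model.weights, model.betas)
```

Because each observation is used once and then dropped, memory use does not grow with the length of the stream, which is the property that makes this style of update attractive for continuously augmented data. Like any mixture model, the sketch is subject to label switching and may converge to a local optimum, so the estimates depend on the starting values.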




