Algorithms for Generalized Cluster-wise Linear Regression

07/05/2016
by Young Woong Park, et al.

Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, seeks clusters of entities that minimize the overall sum of squared errors from the regressions performed over these clusters, where each cluster may have a different variance. We generalize the CLR problem by allowing each entity to have more than one observation, and refer to the result as generalized CLR. To solve generalized CLR, we propose an exact mathematical programming approach based on column generation, a column generation based heuristic that clusters predefined groups of entities, a metaheuristic genetic algorithm that uses an adaptation of Lloyd's algorithm for K-means clustering, a two-stage approach, and a modified version of the algorithm of Späth (1979). We examine the performance of our algorithms on a stock keeping unit (SKU) clustering problem that arises in forecasting halo and cannibalization effects in promotions, using real-world retail data from a large supermarket chain. In the SKU clustering problem, the retailer needs to cluster SKUs based on their seasonal effects in response to promotions. The seasonal effects are the results of regressions, with promotion mechanisms and seasonal dummies as predictors, performed over the generated clusters. We compare the performance of all proposed algorithms on the SKU problem with real-world and synthetic data.
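To make the generalized CLR objective concrete, the following is a minimal sketch of a Lloyd-style alternating heuristic, assuming a single predictor with an intercept. It is not the column generation method, the genetic algorithm, or the modified Späth algorithm from the paper. It only illustrates the defining constraint that all observations of an entity must fall in the same cluster. The names `fit_line`, `sse`, `generalized_clr`, and the `sku*` demo data are hypothetical and chosen for illustration.

```python
import random

def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0  # assumption: placeholder coefficients for an empty cluster
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0.0:
        return my, 0.0  # all x identical: fall back to the mean of y
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope

def sse(points, coef):
    """Sum of squared errors of the line `coef` on `points`."""
    a, b = coef
    return sum((y - (a + b * x)) ** 2 for x, y in points)

def fit_clusters(data, labels, k):
    """One regression per cluster, pooling all observations of its entities."""
    return [fit_line([p for e in data if labels[e] == c for p in data[e]])
            for c in range(k)]

def generalized_clr(data, k, n_iter=100, seed=0):
    """data maps each entity to its list of (x, y) observations."""
    rng = random.Random(seed)
    entities = sorted(data)
    labels = {e: rng.randrange(k) for e in entities}  # random initial clusters
    for _ in range(n_iter):
        coefs = fit_clusters(data, labels, k)
        # Move each entity, with ALL of its observations, to the cluster
        # whose regression line gives it the smallest squared error.
        new_labels = {e: min(range(k), key=lambda c: sse(data[e], coefs[c]))
                      for e in entities}
        if new_labels == labels:
            break
        labels = new_labels
    coefs = fit_clusters(data, labels, k)  # final refit for the returned labels
    total = sum(sse(data[e], coefs[labels[e]]) for e in entities)
    return labels, coefs, total

# Synthetic demo: six entities with five observations each, drawn from two lines.
noise = random.Random(1)
data = {}
for e in range(6):
    a, b = (1.0, 2.0) if e < 3 else (4.0, -3.0)
    data[f"sku{e}"] = [(x, a + b * x + noise.gauss(0, 0.1)) for x in range(5)]

labels, coefs, total = generalized_clr(data, k=2)
print(labels, total)
```

Each pass can only lower the total sum of squared errors, so the loop terminates, but like Lloyd's algorithm it may stop at a local optimum that depends on the random start. Running it from several seeds and keeping the lowest `total` is a common remedy.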


research
04/24/2020

Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis

The goal of co-clustering is to simultaneously identify a clustering of ...
research
12/29/2022

Cluster-level Group Representativity Fairness in k-means Clustering

There has been much interest recently in developing fair clustering algo...
research
04/30/2021

Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets

This article presents the data used to evaluate the performance of evolu...
research
01/24/2023

Generating Multidimensional Clusters With Support Lines

Synthetic data is essential for assessing clustering techniques, complem...
research
05/07/2023

A Generalized Framework for Predictive Clustering and Optimization

Clustering is a powerful and extensively used data science tool. While c...
research
09/14/2020

Performance Evaluation of Linear Regression Algorithm in Cluster Environment

Cluster computing was introduced to replace the superiority of super com...
research
03/23/2011

Clustered regression with unknown clusters

We consider a collection of prediction experiments, which are clustered ...
