Dynamic User Segmentation and Usage Profiling

05/27/2023
by Animesh Mitra, et al.

Usage data for a group of users distributed across many categories, such as songs, movies, webpages, links, regular household products, mobile apps, and games, can be ultra-high dimensional and massive in size. Such data is often categorical and sparse, which makes it even harder to uncover hidden patterns such as clusters of users. If this information can be estimated accurately, however, it can have a large impact in many business areas: user recommendations for apps, songs, movies, and similar products; health analytics using electronic health record (EHR) data; and driver profiling for insurance premium estimation or fleet management. In this work, we propose a clustering strategy for such categorical big data that exploits the hidden sparsity of the dataset. Most traditional clustering methods fail to produce proper clusters for such data and end up giving one big cluster surrounded by small clusters, irrespective of the true cluster structure. We propose a feature transformation that maps the binary-valued usage vector to a lower-dimensional continuous feature space defined by groups of usage categories, termed covariate classes. These lower-dimensional covariate-class representations can then be used for clustering. We implemented the proposed strategy and validated its performance on a large, very high-dimensional song playlist dataset. The results are impressive: we achieved similar-sized user clusters with minimal between-cluster overlap in the feature space (8% on average). Because the proposed strategy is a generic framework, it can serve as the analytic engine for many of the business use cases mentioned above, enabling an intelligent, dynamic personal recommendation system or a support system for smart business decision-making.
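The abstract does not specify how covariate classes are formed or exactly how the continuous features are computed from them. As a rough illustration only, the sketch below assumes the category-to-class assignment is already known and represents each user by the share of their used categories that falls in each class, then clusters those vectors with plain k-means. The function names `covariate_class_features` and `kmeans`, the row normalization, the farthest-point initialization, and the toy data are all hypothetical choices for this sketch, not the authors' method.

```python
import numpy as np


def covariate_class_features(usage, category_to_class, n_classes):
    """Map a binary users x categories matrix to per-class usage shares.

    Hypothetical transform: feature j of a user is the fraction of that
    user's used categories that belong to covariate class j.
    """
    usage = np.asarray(usage, dtype=float)
    category_to_class = np.asarray(category_to_class)
    n_categories = usage.shape[1]
    # One-hot membership matrix: categories x classes.
    membership = np.zeros((n_categories, n_classes))
    membership[np.arange(n_categories), category_to_class] = 1.0
    # Number of used categories per class, for every user (users x classes).
    counts = usage @ membership
    totals = counts.sum(axis=1, keepdims=True)
    # Row-normalize; users with no usage at all get an all-zero feature vector.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def kmeans(x, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy example (made up for illustration): 6 users x 6 categories,
# with categories 0-2 in hypothetical class 0 and categories 3-5 in class 1.
usage = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
])
category_to_class = np.array([0, 0, 0, 1, 1, 1])
features = covariate_class_features(usage, category_to_class, n_classes=2)
labels, centers = kmeans(features, k=2)
print(labels)  # [0 0 0 1 1 1]
```

In this toy case each user's 6-dimensional binary vector shrinks to a 2-dimensional point on the simplex, and users whose usage concentrates in the same class end up in the same cluster even though they share few individual categories. In practice the class assignment itself would have to be learned, and a library k-means implementation would replace the hand-written loop.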
