Representative Selection in Non Metric Datasets

02/26/2015
by Elad Liebman, et al.

This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements. This subset needs to inherently reflect the type of information contained in the entire set, while minimizing redundancy. For such purposes, clustering may seem like a natural approach. However, existing clustering methods are not ideally suited for representative selection, especially when dealing with non-metric data, where only a pairwise similarity measure exists. In this paper we propose δ-medoids, a novel approach that can be viewed as an extension to the k-medoids algorithm and is specifically suited for sample representative selection from non-metric data. We empirically validate δ-medoids in two domains, namely music analysis and motion analysis. We also show some theoretical bounds on the performance of δ-medoids and the hardness of representative selection in general.
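
To make the problem setting concrete, here is a minimal sketch of representative selection from a pairwise dissimilarity alone. It is not the δ-medoids algorithm from the paper, whose details the abstract does not give. It is a simple greedy one-pass cover, and it assumes that δ acts as a dissimilarity threshold: every item must lie within δ of some chosen representative. The names `select_representatives`, `dissim`, and `delta` are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_representatives(
    items: Sequence[T],
    dissim: Callable[[T, T], float],
    delta: float,
) -> List[int]:
    """Greedy one-pass cover (illustrative; NOT the paper's delta-medoids).

    Returns indices of representatives such that every item has
    dissim(representative, item) <= delta for at least one representative.
    Only pairwise dissimilarities are used: no coordinates, no triangle
    inequality, and no symmetry are assumed.
    """
    reps: List[int] = []
    for i, x in enumerate(items):
        # x becomes a new representative only if no existing one covers it.
        if all(dissim(items[r], x) > delta for r in reps):
            reps.append(i)
    return reps
```

With `items = [0.0, 0.1, 1.0, 1.05, 5.0]`, absolute difference as the dissimilarity, and `delta = 0.2`, the sketch returns the indices `[0, 2, 4]`. The result depends on the order of the items and is generally not the smallest possible covering set, which is one reason a more careful method is needed.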

research
08/19/2021

Clustering-Based Subset Selection in Evolutionary Multiobjective Optimization

Subset selection is an important component in evolutionary multiobjectiv...
research
07/03/2021

Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification

The nearest prototype classification is a less computationally intensive...
research
05/07/2014

Representative Selection for Big Data via Sparse Graph and Geodesic Grassmann Manifold Distance

This paper addresses the problem of identifying a very small subset of d...
research
07/25/2014

Dissimilarity-based Sparse Subset Selection

Finding an informative subset of a large collection of data points or mo...
research
03/12/2020

A Multi-criteria Approach for Fast and Outlier-aware Representative Selection from Manifolds

The problem of representative selection amounts to sampling few informat...
research
12/06/2022

An Unsupervised Machine Learning Approach for Ground Motion Clustering and Selection

Clustering analysis of sequence data continues to address many applicati...
research
07/19/2020

GRMR: Generalized Regret-Minimizing Representatives

Extracting a small subset of representative tuples from a large database...
