A Multi-criteria Approach for Fast and Outlier-aware Representative Selection from Manifolds

03/12/2020
by Mahlagha Sedghi, et al.

The problem of representative selection amounts to sampling a small number of informative exemplars from large datasets. This paper presents MOSAIC, a representative selection approach for high-dimensional data that may exhibit non-linear structures. Resting upon a novel quadratic formulation, our method advances a multi-criteria selection approach that maximizes the global representation power of the sampled subset, ensures diversity, and rejects disruptive information by effectively detecting outliers. Through theoretical analyses, we characterize the obtained sketch and show that the sampled representatives maximize a well-defined notion of data coverage in a transformed space. In addition, we present a highly scalable randomized implementation of the proposed algorithm that is shown to bring about substantial speedups. Extensive experiments on both real and synthetic data, with comparisons to state-of-the-art algorithms, demonstrate MOSAIC's superiority in achieving all the desired characteristics of a representative subset at once, along with remarkable robustness to various outlier types.
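The abstract does not spell out MOSAIC's quadratic formulation, so the sketch below is not the authors' algorithm. It only illustrates, under stated assumptions, a generic and well-known stand-in for the same three goals: greedy maximization of a facility-location coverage objective f(S) = sum_i max_{j in S} K[i, j] in an RBF kernel (transformed) space, plus a crude density filter that keeps low-density points, treated as candidate outliers, out of the selected set. The function names `rbf_kernel` and `greedy_coverage_select`, the bandwidth `gamma`, and the `outlier_quantile` threshold are hypothetical choices for illustration.

```python
import numpy as np

# Illustrative stand-in, NOT MOSAIC: greedy facility-location coverage
# in an RBF kernel space with a simple density-based outlier filter.


def rbf_kernel(X, gamma):
    """Pairwise RBF similarities exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    # Clip tiny negative values caused by floating-point cancellation.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def greedy_coverage_select(X, k, gamma=1.0, outlier_quantile=0.05):
    """Return k row indices of X chosen greedily to maximize
    f(S) = sum_i max_{j in S} K[i, j] over the RBF kernel K.

    Points whose kernel density (row sum of K) lies in the lowest
    `outlier_quantile` fraction are never selected (hypothetical filter).
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    density = K.sum(axis=1)
    threshold = np.quantile(density, outlier_quantile) if outlier_quantile > 0 else -np.inf
    candidates = density > threshold
    if k > int(candidates.sum()):
        raise ValueError("k exceeds the number of non-outlier candidates")

    coverage = np.zeros(n)  # best similarity of each point to the selected set
    selected = []
    for _ in range(k):
        # Marginal gain of adding column j: sum_i max(0, K[i, j] - coverage[i]).
        gains = np.maximum(K - coverage[:, None], 0.0).sum(axis=0)
        gains[~candidates] = -np.inf  # exclude suspected outliers
        gains[selected] = -np.inf     # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, K[:, j])
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(0.0, 0.5, size=(20, 2))   # cluster around (0, 0)
    B = rng.normal(10.0, 0.5, size=(20, 2))  # cluster around (10, 10)
    X = np.vstack([A, B, [[100.0, 100.0]]])  # index 40 is a far outlier
    print(greedy_coverage_select(X, k=2, gamma=0.5))
```

Because a candidate's marginal gain shrinks once its neighbours are already covered, the greedy loop tends to spread picks across distinct regions, which is one simple way to encode diversity. The density filter is only a rough proxy for outlier rejection, not the paper's detection mechanism, and the sketch builds the full n-by-n kernel, so it suits small n only.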

research
12/24/2022

Unsupervised Instance and Subnetwork Selection for Network Data

Unlike tabular data, features in network data are interconnected within ...
research
04/15/2020

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

Benchmarking unsupervised outlier detection is difficult. Outliers are r...
research
06/03/2023

DOS: Diverse Outlier Sampling for Out-of-Distribution Detection

Modern neural networks are known to give overconfident prediction for ou...
research
02/26/2015

Representative Selection in Non Metric Datasets

This paper considers the problem of representative selection: choosing a...
research
09/12/2014

10,000+ Times Accelerated Robust Subset Selection (ARSS)

Subset selection from massive data with noised information is increasing...
research
11/25/2014

Similarity-based approach for outlier detection

This paper presents a new approach for detecting outliers by introducing...
research
11/18/2016

Robust and Scalable Column/Row Sampling from Corrupted Big Data

Conventional sampling techniques fall short of drawing descriptive sketc...