Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions

08/26/2022
by Leonidas Tsepenekas, et al.
Similarity functions measure how comparable pairs of elements are, and play a key role in a wide variety of applications, e.g., clustering problems and considerations of individual fairness. However, access to an accurate similarity function cannot always be taken for granted. Specifically, when the elements to be compared are produced by different distributions, or in other words belong to different “demographic” groups, knowledge of their true similarity might be very difficult to obtain. In this work, we present a sampling framework that learns these across-groups similarity functions using only a limited amount of expert feedback. We provide analytical results with rigorous bounds, and empirically validate our algorithms via a large suite of experiments.

research
04/29/2016

An expressive dissimilarity measure for relational clustering using neighbourhood trees

Clustering is an underspecified task: there are no universal criteria fo...
research
06/22/2020

Distributional Individual Fairness in Clustering

In this paper, we initiate the study of fair clustering that ensures dis...
research
06/23/2017

Query Complexity of Clustering with Side Information

Suppose, we are given a set of n elements to be clustered into k (unknow...
research
10/22/2019

Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Clustering is a difficult and widely-studied data mining task, with many...
research
01/07/2022

Generalized quantum similarity learning

The similarity between objects is significant in a broad range of areas....
research
02/18/2020

An Overview of Distance and Similarity Functions for Structured Data

The notions of distance and similarity play a key role in many machine l...
research
07/29/2022

Similarity matrix average for aggregating multiplex networks

We introduce a methodology based on averaging similarity matrices with t...
