Semi-Supervised Clustering with Inaccurate Pairwise Annotations

04/05/2021
by Daniel Gribel, et al.

Pairwise relational information is a useful way of providing partial supervision in domains where class labels are difficult to acquire. This work presents a clustering model that incorporates pairwise annotations in the form of must-link and cannot-link relations and accounts for possible annotation inaccuracies, a common situation when experts provide pairwise supervision. We propose a generative model that assumes Gaussian-distributed data samples, with must-link and cannot-link relations generated by stochastic block models. We adopt a maximum-likelihood approach and demonstrate that, even when supervision is weak and inaccurate, accounting for relational information significantly improves clustering performance. Relational information also helps detect meaningful groups in real-world datasets that do not fit the original data-distribution assumptions. Additionally, we extend the model to integrate prior knowledge of the experts' accuracy and discuss the circumstances in which using this knowledge is beneficial.
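To make the generative model concrete, here is a minimal NumPy sketch of one plausible instantiation. The abstract does not give the paper's exact parametrization, so everything here is an assumption: hard cluster assignments `z`, mixing weights `pi`, per-cluster Gaussian means and covariances, and two hypothetical K×K block matrices `omega_ml` and `omega_cl` giving the probability that a pair of points in clusters (k, l) receives a must-link or a cannot-link annotation. Annotation noise shows up as nonzero must-link probabilities across clusters and nonzero cannot-link probabilities within a cluster. All function names are illustrative, not the authors' code.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Row-wise log-density of the multivariate normal N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term via a linear solve rather than an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def sbm_loglik(A, z, omega):
    """Bernoulli SBM log-likelihood of a symmetric 0/1 adjacency matrix A.

    omega[k, l] is the (hypothetical) probability that a pair of points in
    clusters k and l carries an edge. Each unordered pair is counted once.
    """
    iu = np.triu_indices(A.shape[0], k=1)
    p = omega[z[iu[0]], z[iu[1]]]
    a = A[iu]
    return float(np.sum(a * np.log(p) + (1 - a) * np.log1p(-p)))


def sample_sbm(z, omega, rng):
    """Draw a symmetric adjacency matrix with no self-loops from the SBM."""
    n = len(z)
    iu = np.triu_indices(n, k=1)
    A = np.zeros((n, n), dtype=int)
    A[iu] = rng.random(len(iu[0])) < omega[z[iu[0]], z[iu[1]]]
    return A + A.T


def sample(n, pi, means, covs, omega_ml, omega_cl, rng):
    """Generate labels, Gaussian data, and noisy must-link/cannot-link graphs."""
    z = rng.choice(len(pi), size=n, p=pi)
    X = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in z])
    # Must-link and cannot-link graphs are drawn independently (a simplification).
    return z, X, sample_sbm(z, omega_ml, rng), sample_sbm(z, omega_cl, rng)


def complete_loglik(X, z, pi, means, covs, A_ml, A_cl, omega_ml, omega_cl):
    """log p(X, A_ml, A_cl, z) for fixed (hard) cluster assignments z."""
    ll = float(np.sum(np.log(pi[z])))
    for k in range(len(pi)):
        members = z == k
        if members.any():
            ll += float(np.sum(gaussian_logpdf(X[members], means[k], covs[k])))
    return ll + sbm_loglik(A_ml, z, omega_ml) + sbm_loglik(A_cl, z, omega_cl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0], [3.0, 3.0]])
    covs = np.array([np.eye(2), np.eye(2)])
    # Noisy experts: a few must-links across clusters, a few cannot-links within.
    omega_ml = np.array([[0.05, 0.005], [0.005, 0.05]])
    omega_cl = np.array([[0.005, 0.05], [0.05, 0.005]])
    z, X, A_ml, A_cl = sample(60, pi, means, covs, omega_ml, omega_cl, rng)
    print(complete_loglik(X, z, pi, means, covs, A_ml, A_cl, omega_ml, omega_cl))
```

A maximum-likelihood fit would search over `z` and the model parameters to maximize an objective like `complete_loglik`; the abstract does not describe the paper's estimation procedure, so this sketch stops at evaluating the objective. Drawing the two annotation graphs independently is a simplifying choice that allows a pair to receive both labels.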

research
01/18/2020

A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

In this paper, we introduce a neural network framework for semi-supervis...
research
10/29/2018

Semi-crowdsourced Clustering with Deep Generative Models

We consider the semi-supervised clustering problem where crowdsourcing p...
research
11/27/2020

Relation Clustering in Narrative Knowledge Graphs

When coping with literary texts such as novels or short stories, the ext...
research
05/26/2021

Exploring dual information in distance metric learning for clustering

Distance metric learning algorithms aim to appropriately measure similar...
research
04/26/2023

Diffsurv: Differentiable sorting for censored time-to-event data

Survival analysis is a crucial semi-supervised task in machine learning ...
research
07/28/2019

Probabilistic Models of Relational Implication

Relational data in its most basic form is a static collection of known f...
research
05/30/2023

Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach

The recent integration of deep learning and pairwise similarity annotati...