Pair-Wise Cluster Analysis

09/19/2010
by David R. Hardoon, et al.

This paper studies the problem of learning clusters that are consistently present in different (continuously valued) representations of the observed data. Our setup differs slightly from the standard approach to (co-)clustering, because it exploits the fact that some form of 'labeling' becomes available: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and analyzed in terms of the PAC-Bayesian theorem; (ii) a practical kernel-based algorithm is derived that exploits the inherent relation to Canonical Correlation Analysis (CCA), together with its extension to multiple views. A content-based information retrieval (CBIR) case study on the multilingual, aligned Europarl document dataset supports these findings.
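
The abstract names kernel CCA as the basis of the practical algorithm but gives no details. As a rough illustration only, the NumPy sketch below implements one common form of regularized kernel CCA between two paired views of the same samples. The function names, the regularization parameter kappa (which replaces K^2 by (K + kappa I)^2) and the choice of linear/RBF kernels are assumptions for this example, not the paper's algorithm; the clustering step and the multi-view extension are not shown.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix of plain inner products (assumed kernel choice)."""
    return X @ X.T


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def center_kernel(K):
    """Center a Gram matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_cca(Kx, Ky, kappa=1.0, n_components=1):
    """Regularized kernel CCA between two views of the same n samples.

    Solves K_x K_y beta = rho (K_x + kappa I)^2 alpha, and the symmetric
    equation for beta, via the SVD of R_y R_x with R = (K + kappa I)^{-1} K.
    Returns the canonical correlations and the dual weights alpha, beta,
    which refer to the *centered* kernels.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    Rx = np.linalg.solve(Kx + kappa * I, Kx)
    Ry = np.linalg.solve(Ky + kappa * I, Ky)
    # Singular values of Ry @ Rx are the regularized canonical correlations;
    # right singular vectors belong to view x, left ones to view y.
    V, rho, Ut = np.linalg.svd(Ry @ Rx)
    U = Ut.T
    alpha = np.linalg.solve(Kx + kappa * I, U[:, :n_components])
    beta = np.linalg.solve(Ky + kappa * I, V[:, :n_components])
    return rho[:n_components], alpha, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    z = rng.normal(size=n)  # latent signal shared by both views
    X = np.column_stack([z, rng.normal(size=n)])
    Y = np.column_stack([z + 0.1 * rng.normal(size=n), rng.normal(size=n)])
    rho, _, _ = kernel_cca(linear_kernel(X), linear_kernel(Y), kappa=1.0)
    print("top canonical correlation:", rho[0])
```

With a flexible kernel such as the RBF and a small kappa, kernel CCA can report near-perfect correlation even between unrelated views, so kappa is the main guard against overfitting. The projections `center_kernel(Kx) @ alpha` and `center_kernel(Ky) @ beta` give a shared space in which structure present in both views lines up; how the paper turns this into paired clusters is not described in the abstract.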

research
04/12/2018

Learning Multilingual Embeddings for Cross-Lingual Information Retrieval in the Presence of Topically Aligned Corpora

Cross-lingual information retrieval is a challenging task in the absence...
research
05/24/2018

An experimental comparison of label selection methods for hierarchical document clusters

The focus of this paper is on the evaluation of sixteen labeling methods...
research
04/15/2019

Multiple kernel learning for integrative consensus clustering of genomic datasets

Diverse applications - particularly in tumour subtyping - have demonstra...
research
11/29/2018

Robust Bayesian Cluster Enumeration

A major challenge in cluster analysis is that the number of data cluster...
research
09/02/2010

A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

We formulate weighted graph clustering as a prediction problem: given a ...
research
08/28/2012

Document Clustering Evaluation: Divergence from a Random Baseline

Divergence from a random baseline is a technique for the evaluation of d...
research
04/23/2015

svcR: An R Package for Support Vector Clustering improved with Geometric Hashing applied to Lexical Pattern Discovery

We present a new R package which takes a numerical matrix format as data...