Adjacency-constrained hierarchical clustering of a band similarity matrix with application to Genomics

02/05/2019
by   Christophe Ambroise, et al.

Motivation: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. A major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of 10^4 to 10^5 for each chromosome.

Results: By assuming that the similarity between physically distant objects is negligible, we propose an implementation of this adjacency-constrained HAC with quasi-linear complexity. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.

Availability and Implementation: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
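
To make the adjacency constraint concrete, here is a minimal Python sketch of the naive procedure the abstract describes: clusters are contiguous intervals of loci, and only neighbouring intervals may be merged. It is an illustration under stated assumptions, not the authors' quasi-linear algorithm and not the adjclust API. The function name `adjacency_constrained_hac` is hypothetical, and average linkage on the similarity matrix is swapped in for simplicity.

```python
import numpy as np


def adjacency_constrained_hac(sim):
    """Naive adjacency-constrained HAC (illustrative sketch only).

    sim: symmetric (n, n) similarity matrix whose rows and columns follow
         the positional order of the loci.
    Returns the merges in order, each as (left, right), where left and
    right are half-open intervals (start, end) of locus indices.

    Assumption: average linkage on similarities is used as a simple
    stand-in criterion; no attempt is made at efficiency.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Every locus starts as its own cluster, stored as a half-open interval.
    clusters = [(i, i + 1) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Score only pairs of neighbouring clusters: this is the constraint.
        scores = [
            sim[a0:a1, b0:b1].mean()
            for (a0, a1), (b0, b1) in zip(clusters, clusters[1:])
        ]
        k = int(np.argmax(scores))  # most similar adjacent pair
        left, right = clusters[k], clusters[k + 1]
        merges.append((left, right))
        # Replace the two intervals by their union, which is again an interval.
        clusters[k:k + 2] = [(left[0], right[1])]
    return merges
```

On a matrix with two blocks of mutually similar loci, the last merge joins the two blocks. This sketch recomputes every linkage from the full matrix at each step and never uses the assumption that similarities between distant loci are negligible, which is the property the abstract credits for the quasi-linear complexity.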

research
05/09/2015

Relations Between Adjacency and Modularity Graph Partitioning

In this paper the exact linear relation between the leading eigenvector ...
research
01/13/2022

The R Package HCV for Hierarchical Clustering from Vertex-links

The HCV package implements the hierarchical clustering for spatial data....
research
06/20/2020

Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Goal-conditioned hierarchical reinforcement learning (HRL) is a promisin...
research
01/10/2018

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets an...
research
06/15/2017

Face Clustering: Representation and Pairwise Constraints

Clustering face images according to their identity has two important app...
research
09/05/2023

Data Aggregation for Hierarchical Clustering

Hierarchical Agglomerative Clustering (HAC) is likely the earliest and m...
research
11/12/2021

A comprehensive study of clustering a class of 2D shapes

The paper concerns clustering with respect to the shape and size of 2D c...
