Probabilistically Sampled and Spectrally Clustered Plant Genotypes using Phenotypic Characteristics

09/18/2020
by Aditya A. Shastri, et al.

Clustering genotypes based on their phenotypic characteristics is used to obtain diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data, but it suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose using the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, we also adapt another sampling technique, Vector Quantization (VQ), to this data for effective comparison. VQ has recently given promising results for genome data. The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and in defining the probabilities for the sampling technique. Although our algorithm can be applied to any plant genotypes, we test it on phenotypic data obtained from about 2400 Soybean genotypes. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering-with-sampling algorithms (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because all of them involve sampling. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that we are up to 45% better than HC in terms of clustering accuracy. The computational complexity of our algorithm is more than an order of magnitude lower than that of HC.
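The pipeline described in the abstract (sample with Pivotal Sampling, cluster the sample with SC, score with Silhouette Values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state how the inclusion probabilities or the similarity matrix are defined, so the sketch assumes equal inclusion probabilities and a Gaussian similarity, and it runs on synthetic data. All function names (`pivotal_sample`, `spectral_clusters`, `kmeans`, `silhouette`) and parameters (`sigma`, `eps`) are hypothetical.

```python
import numpy as np

def pivotal_sample(pi, rng, eps=1e-9):
    """Sequential pivotal sampling; pi are inclusion probabilities summing to an integer."""
    pi = np.asarray(pi, dtype=float).copy()
    i = 0  # index of the unit whose fate is still undecided
    for j in range(1, len(pi)):
        a, b = pi[i], pi[j]
        if a < eps or a > 1 - eps:      # unit i is already decided, move on
            i = j
            continue
        if b < eps or b > 1 - eps:      # unit j is already decided
            continue
        s = a + b
        if s < 1:                       # one unit is rejected, the other keeps s
            if rng.random() < a / s:
                pi[i], pi[j] = s, 0.0
            else:
                pi[i], pi[j] = 0.0, s
                i = j
        else:                           # one unit is selected, the other keeps s - 1
            if rng.random() < (1 - b) / (2 - s):
                pi[i], pi[j] = 1.0, s - 1
                i = j
            else:
                pi[i], pi[j] = s - 1, 1.0
    return np.flatnonzero(pi > 0.5)

def kmeans(U, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(k - 1):
        dist = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = ((U[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([U[labels == c].mean(0) if np.any(labels == c) else C[c]
                      for c in range(k)])
    return labels

def spectral_clusters(X, k, sigma=1.0):
    """Normalised spectral clustering with an assumed Gaussian similarity matrix."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(deg, deg))  # symmetric normalised Laplacian
    _, vecs = np.linalg.eigh(L)                            # eigenvalues in ascending order
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

def silhouette(X, labels):
    """Mean silhouette value; assumes at least two non-empty clusters."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:             # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic stand-in for phenotypic data: 300 "genotypes", 4 traits, 3 groups.
rng = np.random.default_rng(0)
X = np.repeat([0.0, 6.0, 12.0], 100)[:, None] + rng.normal(scale=0.5, size=(300, 4))
n = 60
idx = pivotal_sample(np.full(len(X), n / len(X)), rng)  # equal probabilities (assumption)
labels = spectral_clusters(X[idx], k=3)
score = silhouette(X[idx], labels)
print(len(idx), round(score, 3))
```

Only the `n` sampled rows enter the similarity matrix and the eigendecomposition, which is where sampling reduces the cost relative to clustering all genotypes. Unequal inclusion probabilities can be passed to `pivotal_sample` as long as they sum to the desired sample size.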


