Context binning, model clustering and adaptivity for data compression of genetic data

01/13/2022
by Jarek Duda, et al.

Rapid growth of genetic databases means that improvements in their data compression can bring huge savings, which requires better yet inexpensive statistical models. This article proposes automated optimizations, e.g. of Markov-like models, especially context binning and model clustering. While it is popular to cut the low bits of a context, the proposed context binning optimizes such a reduction as a table, state = bin[context], where each state determines a probability distribution; this way nearly all useful information can be extracted even from very large contexts into a small number of states. Model clustering uses k-means clustering in the space of general statistical models, allowing a few models (the cluster centroids) to be optimized and then chosen, e.g. separately for each read. Some adaptivity techniques for handling data non-stationarity are also briefly discussed. This article is a work in progress, to be expanded in the future.
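The abstract describes context binning only at a high level: a table state = bin[context] maps each (possibly very large) context to one of a few states, each with its own probability distribution. The Python sketch below shows one way such a table could be learned. It is an assumption for illustration, not the paper's algorithm: Lloyd-style k-means over per-context symbol counts, where the "distance" from a context to a state is the number of bits needed to code that context's symbols with the state's distribution. The function names (`bin_contexts`, `fit_states`, `total_bits`), the add-0.5 smoothing, the farthest-first initialization and the toy data are all hypothetical choices.

```python
import numpy as np


def smooth(c, alpha=0.5):
    """Turn a count vector into a probability vector (add-alpha smoothing)."""
    c = np.asarray(c, dtype=float) + alpha
    return c / c.sum()


def fit_states(counts, bin_table, num_states, alpha=0.5):
    """Pool the counts of all contexts mapped to each state into one distribution."""
    return np.array([smooth(counts[bin_table == s].sum(axis=0), alpha)
                     for s in range(num_states)])


def total_bits(counts, bin_table, probs):
    """Code length in bits when context c is coded with probs[bin_table[c]]."""
    return float(-(counts * np.log2(probs[bin_table])).sum())


def bin_contexts(counts, num_states, iters=50, alpha=0.5):
    """Hypothetical sketch: learn state = bin[context] by Lloyd-style k-means,
    using code length (cross-entropy) instead of Euclidean distance.

    counts[c, a] = number of times symbol a followed context c.
    Returns (bin_table, probs), where probs[s] is the distribution of state s."""
    counts = np.asarray(counts, dtype=float)
    # farthest-first initialization: start from context 0, then repeatedly add
    # the context that the current centroids encode most expensively
    probs = [smooth(counts[0], alpha)]
    for _ in range(1, num_states):
        cost = -counts @ np.log2(np.array(probs)).T  # shape (contexts, states)
        probs.append(smooth(counts[np.argmax(cost.min(axis=1))], alpha))
    probs = np.array(probs)
    bin_table = None
    for _ in range(iters):
        # assignment step: each context goes to the state that codes it most cheaply
        new_bin = np.argmin(-counts @ np.log2(probs).T, axis=1)
        if bin_table is not None and np.array_equal(new_bin, bin_table):
            break  # assignments are stable: converged
        bin_table = new_bin
        # update step: each state's distribution becomes its pooled, smoothed counts
        probs = fit_states(counts, bin_table, num_states, alpha)
    return bin_table, probs


# Toy data (assumed): 16 order-2 DNA contexts (next symbol in A, C, G, T) whose
# statistics depend only on the lowest bit of the context index.
pA, pB = [30, 2, 2, 6], [2, 30, 6, 2]
counts = np.array([pA if c % 2 == 0 else pB for c in range(16)], dtype=float)

bins, probs = bin_contexts(counts, num_states=2)
cut = np.arange(16) >> 3  # "cut low bits" baseline: keep only the top bit, 2 states
print("binning:", bins.tolist(), round(total_bits(counts, bins, probs), 1), "bits")
print("cut bits:", cut.tolist(),
      round(total_bits(counts, cut, fit_states(counts, cut, 2)), 1), "bits")
```

On this deliberately constructed toy data, where the useful information sits in the lowest bit of the context index, the learned table recovers the alternating pattern, while cutting low bits mixes both distributions into each state and costs noticeably more bits. The same assignment/update loop, with reads as the points and whole context models as centroids, is one plausible reading of the model clustering mentioned in the abstract; the paper may optimize it differently.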
