Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

06/12/2021
by Aaron Schein, et al.

We present a new non-negative matrix factorization model for (0,1) bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is general enough to be adapted to many other domains where latent representations of (0,1) bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. On both real and synthetic DNA methylation datasets, our model improves out-of-sample predictive performance over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
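To make the DNCB distribution concrete, the following is a minimal sketch (not the authors' code) of drawing samples from it through its standard Poisson-mixture representation: latent Poisson counts k1 and k2 are added to the two beta shape parameters, and the observation is then an ordinary beta draw, which is the kind of augmentation that turns a non-conjugate likelihood into conditionally tractable pieces. The function name `sample_dncb` and its parameter names are hypothetical, and we assume the convention in which `lam1` and `lam2` are used directly as the Poisson means (some texts halve the non-centrality parameters instead).

```python
import numpy as np

def sample_dncb(alpha, beta, lam1, lam2, size=None, rng=None):
    """Hypothetical helper: draw from a doubly non-central beta distribution.

    Uses the Poisson-mixture representation
        k1 ~ Poisson(lam1), k2 ~ Poisson(lam2), x | k1, k2 ~ Beta(alpha + k1, beta + k2).
    Assumed convention: lam1 and lam2 are the Poisson means directly.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Latent counts that shift each beta shape parameter upward
    k1 = rng.poisson(lam1, size=size)
    k2 = rng.poisson(lam2, size=size)
    # Conditionally on the counts, x is an ordinary beta draw on (0, 1)
    return rng.beta(alpha + k1, beta + k2)

# Usage: with lam1 = lam2 = 0 the sampler reduces to a plain Beta(alpha, beta)
draws = sample_dncb(2.0, 3.0, 1.5, 0.5, size=1000, rng=np.random.default_rng(0))
```

Because the counts only enter through the shape parameters, setting both non-centrality parameters to zero recovers the ordinary beta distribution, and mixing over different count pairs is what lets the DNCB place mass in several regions of (0,1) at once.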

research
11/07/2014

Beta Process Non-negative Matrix Factorization with Stochastic Structured Mean-Field Variational Inference

Beta process is the standard nonparametric Bayesian prior for latent fac...
research
07/15/2020

Prediction of Cancer Microarray and DNA Methylation Data using Non-negative Matrix Factorization

Over the past few years, there has been a considerable spread of microar...
research
09/12/2022

Population-Based Hierarchical Non-negative Matrix Factorization for Survey Data

Motivated by the problem of identifying potential hierarchical populatio...
research
08/09/2023

Multi-modal Multi-view Clustering based on Non-negative Matrix Factorization

By combining related objects, unsupervised machine learning techniques a...
research
02/13/2020

On Contamination of Symbolic Datasets

Data taking values on discrete sample spaces are the embodiment of moder...
research
11/03/2022

betaclust: a family of mixture models for beta valued DNA methylation data

The DNA methylation process has been extensively studied for its role in...
research
04/20/2022

A majorization-minimization algorithm for nonnegative binary matrix factorization

This paper tackles the problem of decomposing binary data using matrix f...