betaclust: a family of mixture models for beta valued DNA methylation data

11/03/2022
by Koyel Majumdar, et al.

DNA methylation has been extensively studied for its role in cancer. Hypermethylation of promoter cytosine-guanine dinucleotide (CpG) islands has been shown to silence tumour suppressor genes. Identifying the differentially methylated CpG (DMC) sites between benign and tumour samples can help in understanding the disease. The EPIC microarray quantifies the methylation level at a CpG site as a beta value which lies within [0,1). There is a lack of suitable methods for modelling the beta values in their innate form. DMCs are identified via multiple t-tests, but this can be computationally expensive. In addition, arbitrary thresholds are often selected and used to identify the methylation state of a CpG site. We propose a family of novel beta mixture models (BMMs) which use a model-based clustering approach to cluster the CpG sites in their innate beta form to (i) objectively identify methylation state thresholds and (ii) identify the DMCs between different samples. The family of BMMs employs different parameter constraints that are applicable to different study settings. Parameter estimation proceeds via an EM algorithm, with a novel approximation during the M-step providing tractability and computational feasibility. The performance of the BMMs is assessed through a thorough simulation study, and the BMMs are used to analyse a prostate cancer dataset and an esophageal squamous cell carcinoma dataset. In both cancer datasets, the BMM approach objectively identifies methylation state thresholds and identifies more DMCs between the benign and tumour samples than conventional methods, in a computationally efficient manner. The empirical cumulative distribution function of the DMCs related to genes implicated in carcinogenesis indicates hypermethylation of CpG sites in the tumour samples in both cancer settings. An R package, betaclust, is provided to facilitate the use of the developed BMMs.
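To make the EM idea in the abstract concrete, here is a minimal sketch of fitting an unconstrained beta mixture to beta values in Python. It is an illustration under stated assumptions, not the authors' method and not the betaclust API: the M-step uses a weighted method-of-moments update as a simple stand-in for the paper's approximation, no parameter constraints are imposed, and all names (`fit_beta_mixture`, `m_step`, `beta_logpdf`) and the simulated data are hypothetical.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def m_step(x, resp_k, eps=1e-6):
    """Weighted method-of-moments update for one component.

    Assumption: this replaces the paper's M-step approximation with a
    simpler closed-form estimator, purely for illustration.
    """
    total = max(sum(resp_k), 1e-12)
    mean = sum(r * v for r, v in zip(resp_k, x)) / total
    mean = min(max(mean, eps), 1.0 - eps)
    var = sum(r * (v - mean) ** 2 for r, v in zip(resp_k, x)) / total
    var = max(var, 1e-8)
    common = max(mean * (1.0 - mean) / var - 1.0, 1e-3)
    return mean * common, (1.0 - mean) * common


def fit_beta_mixture(x, k=3, n_iter=50, eps=1e-6):
    """Fit a k-component beta mixture by EM; returns weights, params, resp."""
    # Keep values strictly inside (0, 1) so the log density is finite.
    x = [min(max(v, eps), 1.0 - eps) for v in x]
    n = len(x)
    # Initialise with hard assignments from equal-sized rank bins.
    ranks = sorted(range(n), key=lambda i: x[i])
    resp = [[0.0] * k for _ in range(n)]
    for pos, i in enumerate(ranks):
        resp[i][pos * k // n] = 1.0
    for _ in range(n_iter):
        # M-step: mixing proportions and per-component shape parameters.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        params = [m_step(x, [r[j] for r in resp]) for j in range(k)]
        # E-step: posterior membership probabilities via log-sum-exp.
        for i, v in enumerate(x):
            logp = [math.log(max(weights[j], 1e-300))
                    + beta_logpdf(v, *params[j]) for j in range(k)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            s = sum(probs)
            resp[i] = [p / s for p in probs]
    return weights, params, resp


# Simulated beta values for three hypothetical methylation states
# (low, intermediate, high); these are not data from the paper.
random.seed(0)
data = ([random.betavariate(2, 20) for _ in range(300)]
        + [random.betavariate(10, 10) for _ in range(300)]
        + [random.betavariate(20, 2) for _ in range(300)])
weights, params, resp = fit_beta_mixture(data, k=3)
means = [a / (a + b) for a, b in params]
print("mixing proportions:", [round(w, 3) for w in weights])
print("component means:", [round(m, 3) for m in means])
```

Assigning each CpG site to the component with the largest posterior probability in `resp` gives a clustering, and the points where that assignment switches between adjacent components play the role of data-driven methylation state thresholds.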


research
10/12/2019

Identifying Epigenetic Signature of Breast Cancer with Machine Learning

The research reported in this paper identifies the epigenetic biomarker ...
research
04/11/2022

Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data

DNA methylation datasets in cancer studies are comprised of sample measu...
research
12/21/2018

Pan-Cancer Epigenetic Biomarker Selection from Blood Samples Using SAS

A key focus in current cancer research is the discovery of cancer biomar...
research
09/29/2020

On a new test of fit to the beta distribution

We propose a new L^2-type goodness-of-fit test for the family of beta di...
research
06/12/2021

Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

We present a new non-negative matrix factorization model for (0,1) bound...
research
07/20/2020

i6mA-CNN: a convolution based computational approach towards identification of DNA N6-methyladenine sites in rice genome

Motivation: DNA N6-methylation (6mA) in Adenine nucleotide is a post rep...
