A flexible model-based framework for robust estimation of mutational signatures

07/06/2022
by Ragnhild Laursen, et al.

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. The estimation procedure is based on the expectation–maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework on three data sets of somatic mutation counts from cancer patients.
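
To make the mixture view concrete, the following is a minimal sketch of plain Poisson NMF on a mutation count matrix, not the authors' method. It covers only the unconstrained factorization step and does not include the di-nucleotide parametrization or the quasi-Poisson regression described in the abstract. The function names `poisson_nmf` and `gkl`, the use of 96 mutation types, and the simulated counts are all assumptions made for illustration. The updates are the standard multiplicative updates for the generalized Kullback-Leibler divergence, which coincide with EM updates under a Poisson model.

```python
import numpy as np

def poisson_nmf(V, rank, n_iter=200, seed=0):
    """Factorize counts V (patients x mutation types) as V ~ E @ S.

    E holds non-negative exposures, S holds signatures (rows sum to 1).
    Illustrative sketch only: multiplicative updates for the generalized
    KL divergence, i.e. EM-type updates for a Poisson model.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    E = rng.random((n, rank)) + 0.1   # random positive start for exposures
    S = rng.random((rank, m)) + 0.1   # random positive start for signatures
    eps = 1e-12                       # guards against division by zero
    for _ in range(n_iter):
        R = V / (E @ S + eps)                            # observed / fitted
        E *= (R @ S.T) / (S.sum(axis=1) + eps)           # update exposures
        R = V / (E @ S + eps)
        S *= (E.T @ R) / (E.sum(axis=0)[:, None] + eps)  # update signatures
    # Rescale so each signature is a probability vector; E @ S is unchanged.
    scale = S.sum(axis=1)
    return E * scale, S / scale[:, None]

def gkl(V, W):
    """Generalized Kullback-Leibler divergence between counts V and fit W."""
    eps = 1e-12
    return float(np.sum(V * np.log((V + eps) / (W + eps)) - V + W))

# Simulated data (hypothetical): 20 patients, 96 mutation types, 3 signatures.
rng = np.random.default_rng(1)
true_S = rng.dirichlet(np.ones(96), size=3)
true_E = rng.gamma(2.0, 50.0, size=(20, 3))
V = rng.poisson(true_E @ true_S)

E_hat, S_hat = poisson_nmf(V, rank=3)
print("divergence of fit:", gkl(V, E_hat @ S_hat))
```

In this unconstrained form every signature carries one free probability per mutation type. A structured parametrization such as the di-nucleotide model would instead restrict each row of `S_hat` to a lower-dimensional log-linear family.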

research
06/07/2022

Model selection for robust learning of mutational signatures using Negative Binomial non-negative matrix factorization

The spectrum of mutations in a collection of cancer genomes can be descr...
research
11/19/2021

Identifying Population Movements with Non-Negative Matrix Factorization from Wi-Fi User Counts in Smart and Connected Cities

Non-Negative Matrix Factorization (NMF) is a valuable matrix factorizati...
research
05/16/2019

Non-negative matrix factorization based on generalized dual divergence

A theoretical framework for non-negative matrix factorization based on g...
research
12/09/2020

Data embedding and prediction by sparse tropical matrix factorization

Matrix factorization methods are linear models, with limited capability ...
research
02/22/2018

The iisignature library: efficient calculation of iterated-integral signatures and log signatures

Iterated-integral signatures and log signatures are vectors calculated f...
research
05/07/2019

Somatic mutations render human exome and pathogen DNA more similar

Immunotherapy has recently shown important clinical successes in a subst...