Survival analysis of DNA mutation motifs with penalized proportional hazards

11/11/2017
by Jean Feng, et al.

Antibodies, an essential part of our immune system, develop through an intricate process that generates a broad diversity of antibodies able to bind a continually diversifying array of pathogens. This process involves randomly mutating the DNA sequences that encode antibodies to find variants with improved binding. These mutations are not distributed uniformly across sequence sites. Immunologists observe that this nonuniformity is consistent with "mutation motifs": short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: the large number of possible motifs makes the statistical problem high-dimensional, while the unobserved history of the mutation process leads to a nontrivial missing-data problem. We introduce an ℓ₁-penalized proportional hazards model to infer mutation motifs and their effects. To estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that, on simulated data, our method performs better than current methods and yields more parsimonious models. The application of proportional hazards to the analysis of mutation processes is, to our knowledge, novel; it formalizes current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
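
To make the missing-data problem concrete, here is a minimal Python sketch of how a motif-based proportional hazards likelihood might be evaluated for a single pair of sequences. It is an illustration under stated assumptions, not the authors' implementation: the function names, the centered 3-mer motif definition, the 'N' end padding, and the assumption that each site mutates at most once are all hypothetical choices made for this example. The unknown mutation order is handled here by brute-force enumeration, which is feasible only for a handful of mutations; the abstract describes a Monte Carlo EM algorithm for this step instead.

```python
import math
from itertools import permutations


def motif_at(seq, pos, k=3):
    """Return the k-mer centered at pos, padding sequence ends with 'N' (assumption)."""
    half = k // 2
    padded = "N" * half + seq + "N" * half
    return padded[pos:pos + k]


def order_log_likelihood(start, end, order, theta, k=3):
    """Log-probability of one mutation order under a proportional hazards model.

    theta maps a motif to its log hazard ratio; motifs missing from theta
    contribute 0, i.e. the baseline hazard. Each site is assumed to mutate
    at most once, so a site leaves the risk set after it mutates.
    """
    seq = list(start)
    at_risk = set(range(len(start)))
    loglik = 0.0
    for pos in order:
        current = "".join(seq)
        # Motifs are recomputed at every step because earlier mutations
        # change the sequence context of neighboring sites.
        log_h = {p: theta.get(motif_at(current, p, k), 0.0) for p in at_risk}
        log_total = math.log(sum(math.exp(v) for v in log_h.values()))
        loglik += log_h[pos] - log_total
        seq[pos] = end[pos]
        at_risk.remove(pos)
    return loglik


def marginal_log_likelihood(start, end, theta, k=3):
    """Sum the order likelihoods over every ordering of the mutated sites.

    Brute-force stand-in for a Monte Carlo approximation; cost is factorial
    in the number of mutations.
    """
    mutated = [i for i, (a, b) in enumerate(zip(start, end)) if a != b]
    total = sum(
        math.exp(order_log_likelihood(start, end, order, theta, k))
        for order in permutations(mutated)
    )
    return math.log(total)


def penalized_objective(start, end, theta, lam, k=3):
    """Marginal log-likelihood minus an l1 penalty on the motif effects."""
    penalty = lam * sum(abs(v) for v in theta.values())
    return marginal_log_likelihood(start, end, theta, k) - penalty


if __name__ == "__main__":
    # Hypothetical toy data: sites 1 and 3 differ between the two sequences.
    theta = {"ACG": math.log(2.0)}  # motif "ACG" doubles the hazard (made-up value)
    print(penalized_objective("ACGT", "ATGA", theta, lam=0.1))
```

With all motif effects set to zero, every site has the same hazard, so every set of mutated sites of a given size is equally likely. The ℓ₁ penalty pulls most entries of `theta` to exactly zero when the objective is maximized, which is what produces a parsimonious set of motifs.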
