Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection

11/26/2018
by Luke R Harries, et al.

DNA sequencing to identify genetic variants is becoming increasingly valuable in clinical settings. Assessment of variants in such sequencing data is commonly implemented through Bayesian heuristic algorithms. Machine learning has shown great promise in improving on these variant calls, but the input to these models is still a standardized "pile-up" image, which is not always the best-suited representation. In this paper, we present a novel method for generating images from DNA sequencing data, which interlaces the human reference genome with personalized sequencing output, to maximize usage of sequencing reads and improve machine learning algorithm performance. We demonstrate the success of this method in improving standard germline variant calling. We also extend this approach to somatic variant calling across tumor/normal data with Siamese networks. These approaches can be used in machine learning applications on sequencing data with the hope of improving clinical outcomes, and are freely available for noncommercial use at www.ccg.ai.
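
To make the idea of interlacing concrete, the following is a minimal sketch of one possible reading of it, not the encoding used in the paper. It assumes that each aligned read becomes one image row and that a copy of the reference row is placed directly before it, so that the reference and personal rows alternate. The integer base codes, the function names `encode` and `interlaced_image`, and the `(offset, bases)` read format are all hypothetical choices made for illustration.

```python
# Hypothetical encoding (an assumption, not from the paper):
# one integer per base, with 0 marking padding or unknown bases.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(seq):
    """Map a base string to a list of integer codes."""
    return [BASE_CODES.get(base, 0) for base in seq.upper()]


def interlaced_image(reference, reads):
    """Build a 2D grid that alternates reference rows and read rows.

    reference: reference bases covering the window.
    reads: list of (offset, bases) pairs, with offset measured
           from the start of the window.
    """
    width = len(reference)
    ref_row = encode(reference)
    image = []
    for offset, bases in reads:
        # Place the read at its aligned position; pad the rest with 0.
        read_row = [0] * width
        for i, code in enumerate(encode(bases)):
            col = offset + i
            if 0 <= col < width:
                read_row[col] = code
        # Interlace: a reference row, then the read row it pairs with.
        image.append(list(ref_row))
        image.append(read_row)
    return image


window = "ACGTAC"
reads = [(0, "ACGA"), (2, "GAAC")]
image = interlaced_image(window, reads)
for row in image:
    print(row)
```

In this sketch, every read row sits next to a reference row, so a convolutional filter spanning two rows can compare a read against the reference directly. Whether the original method arranges rows this way is an assumption here.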

research
12/06/2019

Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

The emerging field of precision oncology relies on the accurate pinpoint...
research
11/26/2018

A Framework for Implementing Machine Learning on Omics Data

The potential benefits of applying machine learning methods to -omics da...
research
09/24/2019

LitGen: Genetic Literature Recommendation Guided by Human Explanations

As genetic sequencing costs decrease, the lack of clinical interpretatio...
research
12/31/2019

Transform-Domain Classification of Human Cells based on DNA Methylation Datasets

A novel method to classify human cells is presented in this work based o...
research
10/22/2020

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Missing genotypes can affect the efficacy of machine learning approaches...
research
07/11/2012

Efficient Prediction of DNA-Binding Proteins Using Machine Learning

DNA-binding proteins are a class of proteins which have a specific or ge...
research
04/17/2023

Lossy Compressor preserving variant calling through Extended BWT

A standard format used for storing the output of high-throughput sequenc...
