Unsupervised Representation Learning of DNA Sequences

06/07/2019
by Vishal Agarwal, et al.

Recently, several deep learning models have been applied to DNA sequence-based classification tasks. Such tasks often require long, variable-length DNA sequences as input. In this work, we use a sequence-to-sequence autoencoder to learn, in an unsupervised manner, a fixed-dimensional latent representation of long, variable-length DNA sequences. We evaluate the learned representation both quantitatively and qualitatively on a supervised splice site classification task. The quantitative evaluation is carried out under two different settings. Our experiments show that these representations can be used as features or priors in closely related tasks such as splice site classification. Further, in our qualitative analysis, we use the model attribution technique Integrated Gradients to infer significant sequence signatures that influence classification accuracy. We show that the identified splice signatures agree well with existing knowledge.
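Integrated Gradients assigns each input feature the score (x_i - x'_i) multiplied by the average gradient of the model output F along the straight path from a baseline x' to the input x. The sketch below applies this to a one-hot encoded DNA sequence. It is a minimal illustration under stated assumptions, not the authors' pipeline: it uses an all-zero baseline and a hypothetical logistic scorer (`LogisticScorer`, random weights, analytic gradient) in place of a trained splice site classifier, so the printed attributions carry no biological meaning. The helpers `one_hot` and `integrated_gradients` are likewise illustrative names.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string into a (len(seq), 4) array with columns A, C, G, T."""
    idx = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in idx:  # ambiguous bases such as N stay all-zero
            x[pos, idx[base]] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LogisticScorer:
    """Hypothetical stand-in classifier: F(x) = sigmoid(sum(W * x) + b)."""

    def __init__(self, W, b=0.0):
        self.W = W
        self.b = b

    def __call__(self, x):
        return sigmoid(np.sum(self.W * x) + self.b)

    def grad(self, x):
        # For a logistic model, dF/dx = F(x) * (1 - F(x)) * W
        p = self(x)
        return p * (1.0 - p) * self.W


def integrated_gradients(model, x, baseline=None, steps=200):
    """Approximate IG with a midpoint Riemann sum along the path baseline -> x."""
    if baseline is None:
        baseline = np.zeros_like(x)  # assumed all-zero (no base) reference
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x)
    for a in alphas:
        avg_grad += model.grad(baseline + a * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = one_hot("CAGGTAAGT")  # illustrative 9-mer
    model = LogisticScorer(rng.normal(size=x.shape))
    attr = integrated_gradients(model, x)
    # Per-position score: sum over the four base columns
    print(attr.sum(axis=1).round(3))
    # Completeness: attributions sum to F(x) - F(baseline), up to discretisation error
    print(attr.sum(), model(x) - model(np.zeros_like(x)))
```

Summing the attributions over the four base columns yields one score per sequence position, which is the kind of per-nucleotide signal a sequence-signature analysis would inspect. The completeness check reflects a defining property of Integrated Gradients: the attributions add up to F(x) - F(x'), here only up to the error of the Riemann-sum approximation.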

research
12/12/2017

Encoding DNA sequences by integer chaos game representation

DNA sequences are fundamental for encoding genetic information. The gene...
research
02/22/2017

Memory Matching Networks for Genomic Sequence Classification

When analyzing the genome, researchers have discovered that proteins bin...
research
05/11/2021

Constrained Consensus Sequence Algorithm for DNA Archiving

The paper describes an algorithm to compute a consensus sequence from a ...
research
07/14/2013

Map of Life: Measuring and Visualizing Species' Relatedness with "Molecular Distance Maps"

We propose a novel combination of methods that (i) portrays quantitative...
research
11/24/2021

Deep metric learning improves lab of origin prediction of genetically engineered plasmids

Genome engineering is undergoing unprecedented development and is now be...
research
12/30/2019

A New Burrows Wheeler Transform Markov Distance

Prior work inspired by compression algorithms has described how the Burr...
research
02/07/2018

Spectral Learning of Binomial HMMs for DNA Methylation Data

We consider learning parameters of Binomial Hidden Markov Models, which ...