DeepMotif
Deep Motif (ICLR16)/ Deep Motif Dashboard (PSB17): Visualizing Genomic Sequence Classifications
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that the CNN-RNN makes predictions by modeling both motifs and dependencies among them.
Deep neural networks implemented in TensorFlow & Python for predicting whether transcription factors will bind to given DNA sequences
In recent years, there has been an explosion of deep learning models which have led to groundbreaking results in many fields such as computer vision krizhevsky2012imagenet , natural language processing sutskever2014sequence , and computational biology alipanahi2015predicting ; quang2015danq ; zhou2015predicting ; kelley2016basset ; lanchantin2016motif ; Singh01092016 . However, although these models have proven to be very accurate, they have widely been viewed as “black boxes” due to their complexity, making them hard to understand. This is particularly unfavorable in the biomedical domain, where understanding a model’s predictions is extremely important for doctors and researchers trying to use the model.

Aiming to open up the black box, we present the “Deep Motif Dashboard” ^1 (DeMo Dashboard), to understand the inner workings of deep neural network models for a genomic sequence classification task. We do this by introducing a suite of different neural models and visualization strategies to see which ones perform the best and to understand how they make their predictions. ^2

^1 Dashboard normally refers to a user interface that gives a current summary, usually in graphic, easy-to-read form, of key information relating to performance dashboard .
^2 We implemented our model in Torch, and it is made available at deepmotif.org
Understanding genetic sequences is one of the fundamental tasks of health advancements due to the high correlation of genes with diseases and drugs. An important problem within genetic sequence understanding is related to transcription factors (TFs), which are regulatory proteins that bind to DNA. Each different TF binds to specific transcription factor binding sites (TFBSs) on the genome to regulate cell machinery. Given an input DNA sequence, classifying whether or not there is a binding site for a particular TF is a core task of bioinformatics stormo2000dna .

For our task, we follow a two-step approach. First, given a particular TF of interest and a dataset containing samples of positive and negative TFBS sequences, we construct three deep learning architectures to classify the sequences. Section 2 introduces the three different DNN structures that we use: a convolutional neural network (CNN), a recurrent neural network (RNN), and a convolutional-recurrent neural network (CNN-RNN).

Once we have our trained models to predict binding sites, the second step of our approach is to understand why the models perform the way they do. As explained in section 3, we do this by introducing three different visualization strategies for interpreting the models:
Measuring nucleotide importance with Saliency Maps.
Measuring critical sequence positions for the classifier using Temporal Output Scores.
Generating class-specific motif patterns with Class Optimization.
We test and evaluate our models and visualization strategies on a large-scale benchmark TFBS dataset. Section 4 provides experimental results for understanding and visualizing the three DNN architectures. We find that the CNN-RNN outperforms the other models. From the visualizations, we observe that the CNN-RNN tends to focus its predictions on the traditional motifs, while also modeling long-range dependencies among motifs.
Chromatin immunoprecipitation sequencing (ChIP-seq) technologies and databases such as ENCODE encode2012integrated have made binding site locations available for hundreds of different TFs. Despite these advancements, there are two major drawbacks: (1) ChIP-seq experiments are slow and expensive, and (2) although ChIP-seq experiments can find the binding site locations, they cannot find patterns that are common across all of the positive binding sites, which could give insight as to why TFs bind to those locations. Thus, there is a need for large-scale computational methods that can not only make accurate binding site classifications, but also identify and understand patterns that influence the binding site locations.
In order to computationally predict TFBSs on a DNA sequence, researchers initially used consensus sequences and position weight matrices to match against a test sequence stormo2000dna . Simple neural network classifiers were then proposed to differentiate positive and negative binding sites, but did not show significant improvements over the weight matrix matching methods horton1992assessment . Later, SVM techniques outperformed the generative methods by using k-mer features ghandi2014enhanced ; setty2015seqgl , but string-kernel-based SVM systems are limited by a computational cost that grows with the number of training and testing sequences. Most recently, convolutional neural network models have shown state-of-the-art results on the TFBS task and are scalable to a large number of genomic sequences alipanahi2015predicting ; lanchantin2016motif , but it remains unclear which neural architectures work best.
To find which neural models work the best on the TFBS classification task, we examine several different types of models. Inspired by their success across different fields, we explore variations of two popular deep learning architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs have dominated the field of computer vision in recent years, obtaining state-of-the-art results in many tasks due to their ability to automatically extract translation-invariant features. On the other hand, RNNs have emerged as one of the most powerful models for sequential data tasks such as natural language processing due to their ability to learn long-range dependencies. Specifically, on the TFBS prediction task, we explore three distinct architectures: (1) CNN, (2) RNN, and (3) a combination of the two, CNN-RNN. Figure 1 shows an overview of the models.
While the bodies of the three architectures we use differ, each implemented model follows a similar end-to-end framework, which we use to easily compare and contrast results. We use the raw nucleotide characters (A,C,G,T) as inputs, where each character is converted into a one-hot encoding (a binary vector with the matching character entry being a 1 and the rest as 0s). This encoding matrix is used as the input to a convolutional, recurrent, or convolutional-recurrent module, each of which outputs a vector of fixed dimension. The output vector of each model is fed through a linear layer to a softmax function as the last layer, which learns the mapping from the hidden space to the output class label space. The final output is a probability indicating whether an input is a positive or a negative binding site (binary classification task). The parameters of the network are trained end-to-end by minimizing the negative log-likelihood over the training set. The minimization of the loss function is obtained via the stochastic gradient algorithm Adam kingma2014adam , with a mini-batch size of 256 sequences. We use dropout srivastava2014dropout as a regularization method for each model.

In genomic sequences, it is believed that regulatory mechanisms such as transcription factor binding are influenced by local sequential patterns known as “motifs”. Motifs can be viewed as the temporal equivalent of spatial patterns in images such as eyes on a face, which CNNs are able to learn automatically, achieving state-of-the-art results on computer vision tasks. As a result, a temporal convolutional neural network is a fitting model to automatically extract these motifs. A temporal convolution with filter (or kernel) size $k$ takes an input data matrix $X$ of size $T \times n_{in}$, with length $T$ and input layer size $n_{in}$, and outputs a matrix $Z$ of size $T \times n_{out}$, where $n_{out}$ is the output layer size. Specifically, $\mathrm{convolution}(X) = Z$, where

$$z_{t,i} = \sigma\Big(B_i + \sum_{j=1}^{n_{in}} \sum_{l=1}^{k} W_{i,j,l}\, x_{t+l-1,\,j}\Big) \qquad (1)$$

where $W$ and $B$ are the trainable parameters of the convolution filter, and $\sigma$ is a function enforcing element-wise nonlinearity. We use rectified linear units (ReLU) as the nonlinearity:

$$\mathrm{relu}(x) = \max(0, x) \qquad (2)$$

After the convolution and nonlinearity, CNNs typically use max-pooling, which is a dimension reduction technique to provide translation invariance and to extract higher level features from a wider range of the input sequence. Temporal max-pooling on a matrix $Z$ with a pooling size of $m$ results in output matrix $Y$. Formally, $\mathrm{maxpool}(Z) = Y$, where

$$y_{t,i} = \max_{1 \le j \le m} z_{m(t-1)+j,\; i} \qquad (3)$$
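As a rough illustration of the steps described above, the following NumPy sketch one-hot encodes a short DNA string and applies one temporal convolution, a ReLU, and a temporal max-pool. It is not the authors' Torch implementation: the function names, the A/C/G/T column order, the random filter weights, and the choice to zero-pad the end of the sequence so that the convolution output keeps length T are all assumptions made for this example.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # column order is an assumption made for this sketch


def one_hot(seq):
    """Encode a DNA string as a (T, 4) binary matrix, one row per position."""
    X = np.zeros((len(seq), len(NUCLEOTIDES)))
    for t, ch in enumerate(seq):
        X[t, NUCLEOTIDES.index(ch)] = 1.0
    return X


def temporal_conv_relu(X, W, B):
    """Temporal convolution followed by ReLU.

    X: (T, n_in) input, W: (n_out, n_in, k) filters, B: (n_out,) biases.
    Returns Z of shape (T, n_out).
    """
    n_out, n_in, k = W.shape
    T = X.shape[0]
    # Assumption: pad k - 1 zero rows at the end so the output has T rows.
    X_pad = np.vstack([X, np.zeros((k - 1, n_in))])
    Z = np.zeros((T, n_out))
    for t in range(T):
        window = X_pad[t:t + k]                      # (k, n_in) input slice
        # Sum over input channels j and filter offsets l, then add the bias.
        Z[t] = np.einsum("ijl,lj->i", W, window) + B
    return np.maximum(Z, 0.0)                        # ReLU: max(0, x)


def temporal_maxpool(Z, m):
    """Non-overlapping max-pool of size m along the time axis."""
    T_out = Z.shape[0] // m                          # trailing remainder is dropped
    return Z[:T_out * m].reshape(T_out, m, -1).max(axis=1)


rng = np.random.default_rng(0)
X = one_hot("ACGTTGCAAC")                            # T = 10, n_in = 4
W = rng.normal(size=(3, 4, 5))                       # n_out = 3 filters, k = 5
B = np.zeros(3)
Z = temporal_conv_relu(X, W, B)                      # shape (10, 3)
Y = temporal_maxpool(Z, 2)                           # shape (5, 3)
```

A real model would learn `W` and `B` by gradient descent; here they are random so that only the shapes and the order of operations are demonstrated.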
Our CNN implementation involves a progression of convolution, nonlinearity, and max-pooling. This is represented as one convolutional layer in the network, and we test CNNs up to 4 layers deep. The final layer involves a max-pool across the entire temporal domain so that we have a fixed-size vector which can be fed into a softmax classifier.
Figure 1 (a) shows our CNN model with two convolutional layers. The input one-hot encoded matrix is convolved with several filters (not shown) and fed through a ReLU nonlinearity to produce a matrix of convolution activations. We then perform a max-pool on the activation matrix. The output of the first max-pool is fed through another convolution and ReLU, and is then max-pooled across the entire length, resulting in a vector. This vector is then transposed and fed through a linear and softmax layer for classification.
Designed to handle sequential data, recurrent neural networks (RNNs) have become the main neural model for tasks such as natural language understanding. The key advantage of RNNs over CNNs is that they are able to find long-range patterns in the data which are highly dependent on the ordering of the sequence for the prediction task.
Given an input matrix $X$ of size $T \times n_{in}$, an RNN produces matrix $H$ of size $T \times d$, where $d$ is the RNN embedding size. At each timestep $t$, an RNN takes an input column vector $x_t \in \mathbb{R}^{n_{in}}$ and the previous hidden state vector $h_{t-1} \in \mathbb{R}^{d}$ and produces the next hidden state $h_t$ by applying the following recursive operation:

$$h_t = \sigma(W x_t + U h_{t-1} + b) \qquad (4)$$

where $W$, $U$, and $b$ are the trainable parameters of the model, and $\sigma$ is an element-wise nonlinearity. Due to their recursive nature, RNNs can model the full conditional distribution of any sequential data and find dependencies over time, where each position in a sequence is a timestep on an imaginary time coordinate running in a certain direction. To handle the “vanishing gradients” problem of training basic RNNs on long sequences, Hochreiter and Schmidhuber hochreiter1997long proposed an RNN variant called the Long Short-term Memory (LSTM) network (for simplicity, we refer to LSTMs as RNNs in this paper), which can handle long-term dependencies by using gating functions. These gates can control when information is written to, read from, and forgotten. Specifically, LSTM “cells” take inputs $x_t$, $h_{t-1}$, and $c_{t-1}$, and produce $h_t$ and $c_t$:

$$\begin{aligned}
i_t &= \sigma(W^{i} x_t + U^{i} h_{t-1} + b^{i}) \\
f_t &= \sigma(W^{f} x_t + U^{f} h_{t-1} + b^{f}) \\
o_t &= \sigma(W^{o} x_t + U^{o} h_{t-1} + b^{o}) \\
g_t &= \tanh(W^{g} x_t + U^{g} h_{t-1} + b^{g}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}$$

where $\sigma(\cdot)$, $\tanh(\cdot)$, and $\odot$ are element-wise sigmoid, hyperbolic tangent, and multiplication functions, respectively. $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, respectively.
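To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM update in the standard formulation with input, forget, and output gates, together with a loop that applies it over a sequence. The function names, the dictionary layout of the parameters, and the random weights are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update; params maps a gate name to its (W, U, b) triple."""
    def affine(name):
        W, U, b = params[name]
        return W @ x_t + U @ h_prev + b

    i_t = sigmoid(affine("i"))          # input gate: how much to write
    f_t = sigmoid(affine("f"))          # forget gate: how much to keep
    o_t = sigmoid(affine("o"))          # output gate: how much to expose
    g_t = np.tanh(affine("g"))          # candidate cell update
    c_t = f_t * c_prev + i_t * g_t      # new cell state
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t


def run_lstm(X, params, d):
    """Run the cell over X of shape (T, n_in); returns H of shape (T, d)."""
    h, c = np.zeros(d), np.zeros(d)
    H = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, params)
        H.append(h)
    return np.stack(H)


rng = np.random.default_rng(0)
n_in, d, T = 4, 8, 12
params = {g: (rng.normal(size=(d, n_in)), rng.normal(size=(d, d)), np.zeros(d))
          for g in "ifog"}
X = np.eye(n_in)[rng.integers(0, n_in, size=T)]   # random one-hot sequence
H = run_lstm(X, params, d)                        # one output vector per timestep
```

Because the hidden state is an output gate value in (0, 1) multiplied by a tanh, every entry of `H` stays within [-1, 1] regardless of the weights.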
RNNs produce an output vector $h_t$ at each timestep $t$ of the input sequence. In order to use them on a classification task, we take the mean of all vectors $h_t$, and use the mean vector $\bar{h} \in \mathbb{R}^{d}$ as input to the softmax layer.
Since there is no innate direction in genomic sequences, we use a bidirectional LSTM as our RNN model. In the bidirectional LSTM, the input sequence gets fed through two LSTM networks, one in each direction, and then the output vectors of each direction get concatenated together in the temporal direction and fed through a linear classifier.
Figure 1 (b) shows our RNN model. The input one-hot encoded matrix is fed through an LSTM in both the forward and backward directions, each of which produces a matrix of column vectors representing the LSTM output embedding at each timestep. These vectors are then averaged to create one vector for each direction representing the LSTM output. The forward and backward output vectors are then concatenated and fed to the softmax for classification.
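The averaging and concatenation step can be sketched as follows. This is a minimal NumPy illustration in which random matrices stand in for the outputs of the forward and backward LSTMs; the output layer `W_out`, the class ordering, and all names are hypothetical.

```python
import numpy as np


def mean_pool_bidirectional(H_fwd, H_bwd):
    """Average each direction over time, then concatenate the two means.

    H_fwd, H_bwd: (T, d) per-timestep outputs of the two LSTMs.
    Returns a vector of length 2 * d for the classifier.
    """
    return np.concatenate([H_fwd.mean(axis=0), H_bwd.mean(axis=0)])


def softmax(v):
    e = np.exp(v - v.max())                 # subtract max for numerical stability
    return e / e.sum()


rng = np.random.default_rng(1)
T, d = 101, 32
H_fwd = rng.normal(size=(T, d))             # stand-ins for real LSTM outputs
H_bwd = rng.normal(size=(T, d))
features = mean_pool_bidirectional(H_fwd, H_bwd)   # shape (64,)
W_out = rng.normal(size=(2, 2 * d))         # hypothetical linear layer, 2 classes
probs = softmax(W_out @ features)           # assumed order: [negative, positive]
```

Averaging over time gives a fixed-size vector whatever the sequence length, which is what allows a single linear and softmax layer to sit on top of the recurrent module.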
Considering convolutional networks are designed to extract motifs, and recurrent networks are designed to extract temporal features, we implement a combination of the two in order to find temporal patterns between the motifs. Given an input matrix $X \in \mathbb{R}^{T \times n_{in}}$, the output of the CNN is $Z \in \mathbb{R}^{T \times n_{out}}$. Each timestep vector $z_t$ of $Z$ gets fed into the RNN one at a time, in the same way that the one-hot encoded vectors get input to the regular RNN model. The resulting output of the RNN $H \in \mathbb{R}^{T \times d}$, where $d$ is the LSTM embedding size, is then averaged across the temporal domain (in the same way as the regular RNN), and fed to a softmax classifier.
Figure 1 (c) shows our CNN-RNN model. The input one-hot encoded matrix is fed through one layer of convolution to produce a convolution activation matrix. This matrix is then input to the LSTM, in the same way that the original one-hot matrix is input to the regular RNN model. The output of the LSTM is averaged, concatenated, and fed to the softmax, similar to the RNN.
The previous section explained the deep models we use for the TFBS classification task, where we can evaluate which models perform the best. While making accurate predictions is important in biomedical tasks, it is equally important to understand why models make their predictions. Accurate but uninterpretable models are often very slow to be adopted in practice because their predictions cannot be understood, making biomedical domain experts reluctant to use them. Consequently, we aim to obtain a better understanding of why certain models work better than others, and investigate how they make their predictions by introducing several visualization techniques. The proposed DeMo Dashboard allows us to visualize and understand DNNs in three different ways: Saliency Maps, Temporal Output Scores, and Class Optimizations.
For a certain DNA sequence and a model’s classification, a logical question may be: “which parts of the sequence are most influential for the classification?” To answer this, we seek to visualize the influence of each position (i.e. nucleotide) on the prediction. Our approach is similar to the methods used on images by Simonyan et al. simonyan2013deep and Baehrens et al. baehrens2010explain . Given a sequence $X_0$ of length $|X_0|$, and class $c \in C$, a DNN model provides a score function $S_c(X_0)$. We rank the nucleotides of $X_0$ based on their influence on the score $S_c(X_0)$. Since $S_c(X)$ is a highly nonlinear function of $X$ with deep neural nets, it is hard to directly see the influence of each nucleotide of $X$ on $S_c$. Mathematically, around the point $X_0$, $S_c(X)$ can be approximated by a linear function by computing the first-order Taylor expansion:

$$S_c(X) \approx w^{T} X + b \qquad (5)$$

where $w$ is the derivative of $S_c$ with respect to the sequence variable $X$ at the point $X_0$:

$$w = \left.\frac{\partial S_c}{\partial X}\right|_{X_0} \qquad (6)$$

This derivative is obtained with a single backpropagation pass through the DNN model, and is therefore easy to compute. We do a pointwise multiplication of the saliency map with the one-hot encoded sequence to get the derivative values for the actual nucleotide characters of the sequence (A,T,C, or G) so we can see the influence of the character at each position on the output score. Finally, we take the element-wise magnitude of the resulting derivative vector to visualize how important each character is regardless of derivative direction. We call the resulting vector a “saliency map” simonyan2013deep because it tells us which nucleotides need to be changed the least in order to affect the class score the most. As we can see from equation 5, the score is approximated by a weighted sum of the input nucleotides, where each weight, $w_i$, indicates the influence of that nucleotide position on the output score.

Since DNA is sequential (i.e. can be read in a certain direction), it can be insightful to visualize the output scores at each timestep (position) of a sequence, which we call the temporal output scores. Here we assume an imaginary time direction running from left to right on a given sequence, so each position in the sequence is a timestep in such an imagined time coordinate. In other words, we check the RNN’s prediction scores when we vary the input of the RNN. The input series is constructed by using subsequences of an input $X$ running along the imaginary time coordinate, where the subsequences start from just the first nucleotide (position) and end with the entire sequence $X$. This way we can see exactly where in the sequence the recurrent model changes its decision from negative to positive, or vice versa. Since our recurrent models are bidirectional, we also use the same technique on the reverse sequence. CNNs process the entire sequence at once, so we cannot view their output as a temporal sequence; we therefore use this visualization on just the RNN and CNN-RNN.
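Both sequence-specific visualizations described above can be sketched for any scoring function in a few lines of NumPy. This is a hedged illustration under clearly stated assumptions: `toy_score` is a made-up recurrent scorer standing in for a trained model, and the gradient is estimated with central finite differences instead of backpropagation, purely so the example needs no deep learning framework.

```python
import numpy as np

U_TOY = np.array([0.5, -1.0, 2.0, -0.5])   # made-up weights for A, C, G, T


def toy_score(X, a=0.9):
    """Hypothetical stand-in for a trained recurrent model's positive score."""
    h = 0.0
    for x_t in X:                          # one recurrent update per position
        h = np.tanh(a * h + x_t @ U_TOY)
    return float(h)


def numerical_gradient(score_fn, X, eps=1e-5):
    """Central finite-difference estimate of d score / d X (same shape as X)."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (score_fn(X + step) - score_fn(X - step)) / (2 * eps)
    return grad


def saliency_map(score_fn, X):
    """Per-position saliency: |gradient * one-hot input|, reduced over A/C/G/T."""
    grad = numerical_gradient(score_fn, X)
    return np.abs(grad * X).sum(axis=1)    # shape (T,)


def temporal_output_scores(score_fn, X):
    """Score of every prefix X[:1], X[:2], ..., X[:T] of the sequence."""
    return np.array([score_fn(X[:t]) for t in range(1, X.shape[0] + 1)])


X = np.eye(4)[[0, 2, 2, 1, 3, 0]]          # one-hot encoding of "AGGCTA"
sal = saliency_map(toy_score, X)           # importance of each nucleotide
fwd = temporal_output_scores(toy_score, X)         # left-to-right scores
bwd = temporal_output_scores(toy_score, X[::-1])   # scores on the reversed input
```

With a real network, `numerical_gradient` would be replaced by a single backward pass, which returns the same quantity far more cheaply.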
The previous two visualization methods listed are representative of a specific testing sample (i.e. sequence-specific). Now we introduce an approach to extract a class-specific visualization for a DNN model, where we attempt to find the best sequence which maximizes the probability of a positive TFBS, which we call class optimization. Formally, we optimize the following equation where $S_{+}(X)$ is the probability (or score) of an input sequence $X$ (matrix in our case) being a positive TFBS computed by the softmax equation of our trained DNN model for a specific TF:

$$\arg\max_{X} \; S_{+}(X) - \lambda \lVert X \rVert_2^2 \qquad (7)$$

where $\lambda$ is the regularization parameter. We find a locally optimal $X$ through stochastic gradient descent, where the optimization is with respect to the input sequence. In this optimization, the model weights remain unchanged. This is similar to the methods used in Simonyan et al. simonyan2013deep to optimize toward a specific image class. This visualization method depicts the notion of a positive TFBS class for a particular TF and is not specific to any test sequence.

Our three proposed visualization techniques allow us to manually inspect how the models make their predictions. In order to automatically find patterns from the techniques, we also propose methods to extract motifs, or consensus subsequences that represent the positive binding sites. We extract motifs from each of our three visualization methods in the following ways: (1) From each positive test sequence (thus, 500 total for each TF dataset) we extract a motif from the saliency map by selecting the contiguous length-9 subsequence whose saliency map values have the highest sum. (2) For each positive test sequence, we extract a motif from the temporal output scores by selecting the length-9 subsequence that shows the strongest score change from negative to positive output score. (3) For each different TF, we can directly use the class-optimized sequence as a motif.
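The sketch below illustrates two of the ideas above on a toy problem: gradient ascent on the input with an L2 penalty while the model stays fixed, and picking the contiguous window with the largest summed values. Everything named here is an assumption for illustration: `toy_score_and_grad` is a made-up logistic scorer that prefers "GATA" at positions 4 to 7, the penalty is subtracted from the score, and the step size, penalty weight, and window width of 4 (instead of the length-9 window used in the text) were chosen only to keep the example small.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def class_optimize(score_and_grad, shape, lam=0.01, lr=0.1, steps=200, seed=0):
    """Gradient ascent on S(X) - lam * ||X||^2 with respect to the input X."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.01, size=shape)     # small random starting input
    for _ in range(steps):
        _, grad = score_and_grad(X)
        X += lr * (grad - 2.0 * lam * X)       # the model weights are never updated
    return X


def best_window(values, width=9):
    """Start index of the contiguous window with the largest sum."""
    sums = np.convolve(values, np.ones(width), mode="valid")
    return int(np.argmax(sums))


# Hypothetical stand-in for a trained model: a logistic score with fixed weights.
W_TOY = np.zeros((12, 4))
for t, ch in enumerate("GATA"):
    W_TOY[4 + t, NUCLEOTIDES.index(ch)] = 3.0  # the toy model "likes" GATA at 4..7


def toy_score_and_grad(X):
    s = 1.0 / (1.0 + np.exp(-(W_TOY * X).sum()))
    return s, s * (1.0 - s) * W_TOY            # dS/dX for the logistic score


X_opt = class_optimize(toy_score_and_grad, W_TOY.shape)
start = best_window(np.abs(X_opt).sum(axis=1), width=4)
motif = "".join(NUCLEOTIDES[k] for k in X_opt[start:start + 4].argmax(axis=1))
```

The same `best_window` helper applies unchanged to a saliency vector: passing per-position saliency values with `width=9` returns the start of the highest-scoring length-9 window.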
Neural networks have produced state-of-the-art results on several important benchmark tasks related to genomic sequence classification alipanahi2015predicting ; zhou2015predicting ; quang2015danq , making them a good candidate to use. However, why these models work well has been poorly understood. Recent works have attempted to uncover the properties of these models, with most of the work focused on understanding image classifications using convolutional neural networks. Zeiler and Fergus zeiler2014visualizing used a “deconvolution” approach to map hidden layer representations back to the input space for a specific example, showing the features of the image which were important for classification. Simonyan et al. simonyan2013deep explored a similar approach by using a first-order Taylor expansion to linearly approximate the network and find the most relevant input features, and also tried optimizing image classes. Many similar techniques later followed to understand convolutional models mahendran2016visualizing ; bach2015pixel . Most importantly, researchers have found that CNNs are able to extract layers of translation-invariant feature maps, which may indicate why CNNs have been successfully used in genomic sequence predictions, which are believed to be triggered by motifs.
On text-based tasks, there have been fewer visualization studies for DNNs. Karpathy et al. karpathy2015visualizing explored the interpretability of RNNs for language modeling and found that there exist interpretable neurons which are able to focus on certain language structure such as quotes. Li et al. li2015visualizing visualized how RNNs achieve compositionality in natural language for sentiment analysis by visualizing RNN embedding vectors as well as measuring the influence of input words on classification. Both studies show examples that can be validated by our understanding of natural language linguistics. In contrast, we are interested in understanding DNA “linguistics” given DNNs (the opposite direction of Karpathy et al. karpathy2015visualizing and Li et al. li2015visualizing ).

The main difference between our work and previous works on images and natural language is that instead of trying to understand the DNNs given human understanding of such human perception tasks, we attempt to uncover critical signals in DNA sequences given our understanding of DNNs.
For TFBS prediction, Alipanahi et al. alipanahi2015predicting were the first to implement a visualization method on a DNN model. They visualize their CNN model by extracting motifs based on the input subsequence corresponding to the strongest activation location for each convolutional filter (which we call convolution activation). Since they only have one convolutional layer, it is trivial to map the activations back, but this method does not work as well with deeper models. We attempted this technique on our models and found that our approach using saliency maps outperforms it in finding motif patterns (details in section 4). Quang and Xie quang2015danq use the same visualization method on their convolutional-recurrent model for non-coding variant prediction.
In order to evaluate our DNN models and visualizations, we train and test on the 108 K562 cell ENCODE ChIP-seq TF datasets used in Alipanahi et al. alipanahi2015predicting . Each TF dataset has an average of 30,819 training sequences (with an even positive/negative split), and each sequence consists of 101 DNA-base characters (A,C,G,T). Every dataset has 1,000 testing sequences (with an even positive/negative split). Positive sequences are extracted from the hg19 genome centered at the reported ChIP-seq peak. Negative sequences are generated by a dinucleotide-preserving shuffle of the positive sequences. Due to the separate train/test data for each TF, we train a separate model for each individual TF dataset.
We implement several variations of each DNN architecture by varying hyperparameters. Table 1 shows the different hyperparameters in each architecture. We trained many different hyperparameter settings for each architecture, but we show the best performing model for each type, surrounded by a larger and smaller version to show that it isn’t underfitting or overfitting.

Table 1: Hyperparameters of each tested architecture.
Model | Conv. Layers | Conv. Output Size | Conv. Filter Sizes | Conv. Pool Size | LSTM Layers | LSTM Size
Small RNN | N/A | N/A | N/A | N/A | 1 | 16
Medium RNN | N/A | N/A | N/A | N/A | 1 | 32
Large RNN | N/A | N/A | N/A | N/A | 2 | 32
Small CNN | 2 | 64 | 9,5 | 2 | N/A | N/A
Medium CNN | 3 | 64 | 9,5,3 | 2 | N/A | N/A
Large CNN | 4 | 64 | 9,5,3,3 | 2 | N/A | N/A
Small CNN-RNN | 1 | 64 | 5 | N/A | 2 | 32
Medium CNN-RNN | 1 | 128 | 9 | N/A | 1 | 32
Large CNN-RNN | 2 | 128 | 9,5 | 2 | 1 | 32
We use the “MEME-ChIP machanick2011meme sum” results from Alipanahi et al. alipanahi2015predicting as one prediction performance baseline. These results are from applying MEME-ChIP to the top 500 positive training sequences, deriving five PWMs, and scoring test sequences using the sum of the scores from all five PWMs. We also compare against the CNN model proposed in Alipanahi et al. alipanahi2015predicting . To evaluate motif extraction, we compare against the “convolution activation” method used in Alipanahi et al. alipanahi2015predicting and Quang and Xie quang2015danq , where we map the strongest first-layer convolution filter activation back to the input sequence to find the most influential length-9 subsequence.
Table 2 shows the mean area under the ROC curve (AUC) scores for each of the tested models (from Table 1). As expected, the CNN models outperform the standard RNN models. This supports our hypothesis that positive binding sites are mainly triggered by local patterns or “motifs” that CNNs can easily find. Interestingly, the CNN-RNN achieves the best performance among the three deep architectures. To check the statistical significance of such comparisons, we apply a pairwise t-test using the AUC scores for each TF and report the two-tailed p-values in Table 3. We apply the t-test on each of the best performing (based on AUC) models for each model type. All deep models are significantly better than the MEME baseline. The CNN is significantly better than the RNN and the CNN-RNN is significantly better than the CNN. In order to understand why the CNN-RNN performs the best, we turn to the dashboard visualizations.

Table 2: AUC scores for each tested model.
Model | Mean AUC | Median AUC | STDEV
MEME-ChIP machanick2011meme | 0.834 | 0.868 | 0.127
DeepBind alipanahi2015predicting (CNN) | 0.903 | 0.931 | 0.091
Small RNN | 0.860 | 0.881 | 0.106
Med RNN | 0.876 | 0.905 | 0.116
Large RNN | 0.808 | 0.860 | 0.175
Small CNN | 0.896 | 0.918 | 0.098
Med CNN | 0.902 | 0.922 | 0.085
Large CNN | 0.880 | 0.890 | 0.093
Small CNN-RNN | 0.917 | 0.943 | 0.079
Med CNN-RNN | 0.925 | 0.947 | 0.073
Large CNN-RNN | 0.918 | 0.944 | 0.081

Table 3: Pairwise t-test p-values. To compare models, we select the best performing model for each class.
Model Comparison | p-value
RNN vs MEME | 5.15E-05
CNN vs MEME | 1.87E-19
CNN-RNN vs MEME | 4.84E-24
CNN vs RNN | 5.08E-04
CNN-RNN vs RNN | 7.99E-10
CNN-RNN vs CNN | 4.79E-22
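For readers who want to reproduce this kind of comparison, the sketch below computes the test statistic of a paired t-test from two lists of per-TF AUC scores using only the Python standard library. It assumes that the pairwise t-test mentioned above pairs the two models' scores on the same TF dataset; the AUC values in the example are invented for illustration and are not results from this paper.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a[i], b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)             # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Made-up per-TF AUC values, for illustration only (not results from the paper).
auc_model_a = [0.93, 0.91, 0.95, 0.88, 0.97]
auc_model_b = [0.90, 0.90, 0.93, 0.85, 0.96]
t_stat, dof = paired_t_statistic(auc_model_a, auc_model_b)
```

The two-tailed p-value is then read from Student's t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` computes the statistic and the p-value in one call.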
To evaluate the dashboard visualization methods, we first manually inspect the dashboard visualizations to look for interpretable signals. Figure 2 shows examples of the DeMo Dashboard for three different TFs and positive TFBS sequences. We apply the visualizations on the best performing models of each of the three DNN architectures. Each dashboard snapshot is for a specific TF and contains (1) JASPAR mathelier2015jaspar motifs for that TF, which are the “gold standard” motifs generated by biomedical researchers, (2) the positive TFBS class-optimized sequence for each architecture (for the given TF of interest), (3) the positive TFBS test sequence of interest, where the JASPAR motifs in the test sequences are highlighted using a pink box, (4) the saliency map from each DNN model on the test sequence, and (5) forward and backward temporal output scores from the recurrent architectures on the test sequence. In the saliency maps, the more red a position is, the more influential it is for the prediction. In the temporal outputs, blue indicates a negative TFBS prediction while red indicates positive. The saliency map and temporal output visualizations are on the same positive test sequence (as shown twice). The numbers next to the model names in the saliency map section indicate the score outputs of that DNN model on the specified test sequence.
By visual inspection, we can see from the saliency maps that CNNs tend to focus on short contiguous subsequences when predicting positive bindings. In other words, CNNs clearly model “motifs” that are the most influential for prediction. The saliency maps of RNNs tend to be spread out more across the entire sequence, indicating that they focus on all nucleotides together, and infer relationships among them. The CNN-RNNs have strong saliency map values around motifs, but we can also see that there are other nucleotides further away from the motifs that are influential for the model’s prediction. For example, the CNN-RNN model is 99% confident in its GATA1 TFBS prediction, but the prediction is also influenced by nucleotides outside the motif. In the MAFK saliency maps, we can see that the CNN-RNN and RNN focus on a very wide range of nucleotides to make their predictions, and the RNN doesn’t even focus on the known JASPAR motif to make its high confidence prediction.
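Table 4: Number of visualization-derived motifs matching known JASPAR motifs.
Model | Saliency Map (out of 500) | Conv. Activations alipanahi2015predicting ; quang2015danq (out of 500) | Temporal Output (out of 500) | Class Optimization (out of 57)
CNN | 243.9 | 173.4 | N/A | 19
RNN | 138.6 | N/A | 53.5 | 11
CNN-RNN | 168.1 | 74.2 | 113.2 | 13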
For most of the sequences that we tested, the positions that trigger the model to switch from a negative TFBS prediction to positive are near the JASPAR motifs. We did not observe clear differences between the forward and backward temporal output patterns.
In certain cases, it’s interesting to look at the temporal output scores and saliency maps together. An important case study from our examples is the NFYB example, where the CNN and RNN perform poorly, but the CNN-RNN makes the correct prediction. We observe that the CNN-RNN is able to switch its classification from negative to positive, while the RNN never does. To understand why this may have happened, we can see from the saliency maps that the CNN-RNN focuses on two distinct regions, one of which is where it flips its classification from negative to positive. However, the RNN doesn’t focus on either of the same areas, which may be the reason why it’s never able to classify it as a positive sequence. The fact that the CNN is not able to classify it as a positive sequence, but focuses on the same regions as the CNN-RNN (from the saliency map), may indicate that it is the temporal dependencies between these regions which influence the binding. In addition, the fact that there is no clear JASPAR motif in this sequence may show that the traditional motif approach is not always the best way to model TFBSs.
Class optimization on the CNN model generates concise representations which often resemble the known motifs for that particular TF. For the recurrent models, the TFBS positive optimizations are less clear, though some aspects stand out (like “AT” followed by “TC” in the GATA1 TF for the CNN-RNN). We notice that for certain DNN models, the class-optimized sequences optimize the reverse complement motif (e.g. NFYB CNN optimization). The class optimizations can be useful for getting a general idea of what triggers a positive TFBS for a certain TF.
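Because a class-optimized sequence may correspond to the reverse complement of a known motif, a comparison has to consider both strands. A minimal Python sketch of that check follows; the helper names and the example strings are hypothetical and do not come from the dashboard.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def occurs_on_either_strand(motif, seq):
    """True if the motif, or its reverse complement, is a substring of seq."""
    return motif in seq or reverse_complement(motif) in seq


example = reverse_complement("GATAA")
found = occurs_on_either_strand("GATAA", "CCTTATCGG")
```

Here `example` is `"TTATC"`, so `found` is `True` even though `"GATAA"` itself never appears in the searched string.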
In order to evaluate each DNN’s capability to automatically extract motifs, we compare the found motifs of each method (introduced in section 3.4) to the corresponding JASPAR motif for the TF of interest. We do the comparison using the Tomtom gupta2007quantifying tool, which searches a query motif against a given motif database (and their reverse complements), and returns significant matches ranked by a p-value indicating motif-motif similarity. Table 4 summarizes the motif matching results comparing visualization-derived motifs against known motifs in the JASPAR database. Because JASPAR has motifs for only some of the TFs, we are limited to a comparison of 57 out of our 108 TF datasets. We compare four visualization approaches: Saliency Map, Convolution Activation alipanahi2015predicting ; quang2015danq , Temporal Output Scores, and Class Optimizations. The first three techniques are sequence-specific, therefore we report the average number of motif matches out of 500 positive sequences (then averaged across 57 TF datasets). The last technique is for a particular TFBS positive class.
We can see from Table 4 that across multiple visualization techniques, the CNN finds motifs the best, followed by the CNN-RNN and the RNN. However, since CNNs perform worse than CNN-RNNs by AUC scores, we hypothesize that this demonstrates that it is also important to model sequential interactions among motifs. In the CNN-RNN combination, the CNN acts like a “motif finder” and the RNN finds dependencies among motifs. This analysis shows that visualizing the DNN classifications can lead to a better understanding of DNNs for TFBSs.
Deep neural networks (DNNs) have been shown to be the most accurate models for TFBS classification. However, DNN models are hard to interpret, and thus their adoption in practice is slow. In this work, we propose the Deep Motif (DeMo) Dashboard to explore three different DNN architectures on TFBS prediction, and introduce three visualization methods to shed light on how these models work. Although our visualization methods still require a human practitioner to examine the dashboard, it is a start toward understanding these models, and we hope that this work will inspire further studies on visualizing and understanding DNN-based genomic sequence analysis. Furthermore, DNN models have recently been shown to provide excellent results for epigenomic analysis Singh01092016 . We plan to extend our DeMo Dashboard to related applications.