Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification

10/30/2017
by Jack Lanchantin, et al.

One of the fundamental tasks in understanding genomics is predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence-based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes, known as "motifs", and (2) interactions among TFs, known as co-binding effects. In this paper, we propose a novel deep architecture, the Prototype Matching Network (PMN), to mimic the TF binding mechanisms. Our PMN model automatically extracts prototypes ("motif"-like features) for each TF through a novel prototype-matching loss. Borrowing ideas from few-shot matching models, we use the notion of a support set of prototypes and an LSTM to learn how TFs interact and bind to genomic sequences. On a reference TFBS dataset with 2.1 million genomic sequences, PMN significantly outperforms baselines and validates our design choices empirically. To our knowledge, this is the first deep learning architecture that introduces prototype learning and considers TF-TF interactions for large-scale TFBS prediction. Not only is the proposed architecture accurate, but it also models the underlying biology.
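The core idea of matching a sequence against one learned prototype per TF can be illustrated with a minimal sketch. This is not the PMN itself: the encoder (`embed`), the random projection `proj`, the cosine similarity, the `scale` parameter, and all sizes are assumptions made for illustration, and the LSTM-based TF interaction step and the prototype-matching loss described in the abstract are omitted entirely.

```python
import math
import random

BASES = "ACGT"

def embed(seq, proj):
    """Hypothetical encoder: base-composition vector times a projection matrix.
    A real model would use a learned sequence encoder instead."""
    comp = [seq.count(b) / len(seq) for b in BASES]
    dim = len(proj[0])
    return [sum(comp[i] * proj[i][j] for i in range(4)) for j in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_prototypes(x, prototypes):
    """Similarity of one sequence embedding to every TF prototype."""
    return [cosine(x, p) for p in prototypes]

def predict(seq, proj, prototypes, scale=5.0):
    """Multi-label output: an independent sigmoid per TF over scaled similarities."""
    sims = match_prototypes(embed(seq, proj), prototypes)
    return [1.0 / (1.0 + math.exp(-scale * s)) for s in sims]

# Toy setup (assumed sizes): 5 TFs, 8-dimensional embeddings, random weights.
rng = random.Random(0)
num_tfs, dim = 5, 8
proj = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
prototypes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tfs)]

probs = predict("ACGTGCATTGCA", proj, prototypes)
print(len(probs))  # one probability per TF
```

Each TF gets its own sigmoid rather than a shared softmax because several TFs can bind the same sequence, which is what makes the task multi-label.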


