
-
Cross-Thought for Sentence Encoder Pre-training
In this paper, we propose Cross-Thought, a novel approach to pre-trainin...
-
Contrastive Distillation on Intermediate Representations for Language Model Compression
Existing language model compression methods mostly use a simple L2 loss ...
-
Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Transformer has become ubiquitous in the deep learning field. One of the...
-
Accelerating Real-Time Question Answering via Question Generation
Existing approaches to real-time question answering (RTQA) rely on learn...
-
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
Large-scale cross-lingual language models (LM), such as mBERT, Unicoder ...
-
Unsupervised and Supervised Structure Learning for Protein Contact Prediction
Protein contacts provide key information for the understanding of protei...
-
Hierarchical Graph Network for Multi-hop Question Answering
In this paper, we present Hierarchical Graph Network (HGN) for multi-hop...
-
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
We present a large, tunable neural conversational response generation mo...
-
FreeLB: Enhanced Adversarial Training for Language Understanding
Adversarial training, which minimizes the maximal risk for label-preserv...
-
Patient Knowledge Distillation for BERT Model Compression
Pre-trained language models such as BERT have proven to be highly effect...
-
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Recently exciting progress has been made on protein contact prediction, ...
-
Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...
-
AUC-maximized Deep Convolutional Neural Fields for Sequence Labeling
Deep Convolutional Neural Networks (DCNN) has shown excellent performanc...
-
Learning Nonparametric Forest Graphical Models with Prior Information
We present a framework for incorporating prior information into nonparam...
-
Learning Scale-Free Networks by Dynamic Node-Specific Degree Prior
Learning the network structure underlying data is an important problem i...