
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Vision-and-language pre-training has achieved impressive success in lear...
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
This work concerns video-language pre-training and representation learni...
- The Elastic Lottery Ticket Hypothesis
Lottery Ticket Hypothesis raises keen attention to identifying sparse tr...
- Adversarial Feature Augmentation and Normalization for Visual Recognition
Recent advances in computer vision take advantage of adversarial data au...
- Space Mapping of Spline Spaces over Hierarchical T-meshes
In this paper, we construct a bijective mapping between a biquadratic sp...
- LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Multimodal pre-training has propelled great advancement in vision-and-la...
- Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly
Training generative adversarial networks (GANs) with limited data genera...
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
The canonical approach to video-and-language learning (e.g., video quest...
- EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Deep, heavily overparameterized language models such as BERT, XLNet and ...
- Wasserstein Contrastive Representation Distillation
The primary goal of knowledge distillation (KD) is to encapsulate the in...
- A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Large-scale pre-trained multimodal transformers, such as ViLBERT and UNI...
- Cross-Thought for Sentence Encoder Pre-training
In this paper, we propose Cross-Thought, a novel approach to pre-trainin...
- Multi-Fact Correction in Abstractive Text Summarization
Pre-trained neural abstractive summarization systems have dominated extr...
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
Large-scale language models such as BERT have achieved state-of-the-art ...
- Efficient Robust Training via Backward Smoothing
Adversarial training is so far the most effective strategy in defending ...
- Contrastive Distillation on Intermediate Representations for Language Model Compression
Existing language model compression methods mostly use a simple L2 loss ...
- Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Transformer has become ubiquitous in the deep learning field. One of the...
- Accelerating Real-Time Question Answering via Question Generation
Existing approaches to real-time question answering (RTQA) rely on learn...
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
Large-scale cross-lingual language models (LM), such as mBERT, Unicoder ...
- Graph Optimal Transport for Cross-Domain Alignment
Cross-domain alignment between two sets of entities (e.g., objects in an...
- Adaptive Learning Rates with Maximum Variation Averaging
Adaptive gradient methods such as RMSProp and Adam use exponential movin...
- Large-Scale Adversarial Training for Vision-and-Language Representation Learning
We present VILLA, the first known effort on large-scale adversarial trai...
- Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Recent Transformer-based large-scale pre-trained models have revolutioni...
- HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
We present HERO, a Hierarchical EncodeR for Omni-representation learning...
- Contextual Text Style Transfer
We introduce a new task, Contextual Text Style Transfer - translating a ...
- APo-VAE: Text Generation in Hyperbolic Space
Natural language often exhibits inherent hierarchical structure ingraine...
- BachGAN: High-Resolution Image Synthesis from Salient Object Layout
We propose a new task towards more practical application for image gener...
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
We introduce a new task, Video-and-Language Inference, for joint multimo...
- Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection
Unsupervised domain adaptation (UDA) has achieved unprecedented success ...
- Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching
Transformer has been successfully applied to many natural language proce...
- Distilling the Knowledge of BERT for Text Generation
Large-scale pre-trained language model, such as BERT, has recently achie...
- Hierarchical Graph Network for Multi-hop Question Answering
In this paper, we present Hierarchical Graph Network (HGN) for multi-hop...
- DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
We present a large, tunable neural conversational response generation mo...
- Discourse-Aware Neural Extractive Model for Text Summarization
Recently BERT has been adopted in state-of-the-art text summarization mo...
- Meta Module Network for Compositional Visual Reasoning
There are two main lines of research on visual reasoning: neural module ...
- FreeLB: Enhanced Adversarial Training for Language Understanding
Adversarial training, which minimizes the maximal risk for label-preserv...
- UNITER: Learning UNiversal Image-TExt Representations
Joint image-text embedding is the bedrock for most Vision-and-Language (...
- What Makes A Good Story? Designing Composite Rewards for Visual Storytelling
Previous storytelling approaches mostly focused on optimizing traditiona...
- Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation
Recent unsupervised approaches to domain adaptation primarily focus on m...
- Patient Knowledge Distillation for BERT Model Compression
Pre-trained language models such as BERT have proven to be highly effect...
- Adversarial Domain Adaptation for Machine Reading Comprehension
In this paper, we focus on unsupervised domain adaptation for Machine Re...
- A Hybrid Retrieval-Generation Neural Conversation Model
Intelligent personal assistant systems, with either text-based or voice-...
- Unsupervised Deep Structured Semantic Models for Commonsense Reasoning
Commonsense reasoning is fundamental to natural language understanding. ...
- Relation-aware Graph Attention Network for Visual Question Answering
In order to answer semantically-complicated questions about an image, a ...
- Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
We present FAST NAVIGATOR, a general framework for action decoding, whic...
- Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
This paper presents Recurrent Dual Attention Network (ReDAN) for visual ...
- Sequential Attention GAN for Interactive Image Editing via Dialogue
In this paper, we introduce a new task - interactive image editing via c...
- StoryGAN: A Sequential Conditional GAN for Story Visualization
In this work we propose a new task called Story Visualization. Given a m...
- Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Training task-completion dialogue agents with reinforcement learning usu...
- ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
We present a large-scale dataset, ReCoRD, for machine reading comprehens...