
-
BN-invariant sharpness regularizes the training model to better generalization
It is arguably believed that flatter minima can generalize better. Howev...
-
Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations
Multi-agent reinforcement learning (MARL) has been increasingly explored...
-
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
While neural-based text to speech (TTS) models can synthesize natural an...
-
Latent Causal Invariant Model
Current supervised learning can learn spurious correlation during the da...
-
Learning Causal Semantic Representation for Out-of-Distribution Prediction
Conventional supervised learning methods, especially deep ones, are foun...
-
COSEA: Convolutional Code Search with Layer-wise Attention
Semantic code search, which aims to retrieve code snippets relevant to a...
-
Qlib: An AI-oriented Quantitative Investment Platform
Quantitative investment aims to maximize the return and minimize the ris...
-
DualLip: A System for Joint Lip Reading and Generation
Lip reading aims to recognize text from talking lip, while lip generatio...
-
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Normalization plays an important role in the optimization of deep neural...
-
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
High-fidelity singing voices usually require higher sampling rate (e.g.,...
-
PopMAG: Pop Music Accompaniment Generation
In pop music, accompaniments are usually played by multiple instruments ...
-
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Speech synthesis (text to speech, TTS) and recognition (automatic speech...
-
Taking Notes on the Fly Helps BERT Pre-training
How to make unsupervised language pre-training more efficient and less r...
-
Membership Inference with Privately Augmented Data Endorses the Benign while Suppresses the Adversary
Membership inference (MI) in machine learning decides whether a given ex...
-
Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation
Non-autoregressive translation (NAT) achieves faster inference speed but...
-
Learning to Match Distributions for Domain Adaptation
When the training and test data are from different distributions, domain...
-
Learn to Use Future Information in Simultaneous Translation
Simultaneous neural machine translation (briefly, NMT) has attracted muc...
-
Neural Architecture Search with GBDT
Neural architecture search (NAS) with an accuracy predictor that predict...
-
Learning to Teach with Deep Interactions
Machine teaching uses a meta/teacher model to guide the training of a st...
-
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
In this paper, we develop DeepSinger, a multi-lingual multi-singer singi...
-
Rethinking Positional Encoding in Language Pre-training
How to explicitly encode positional information into neural networks is ...
-
Rethinking the Positional Encoding in Language Pre-training
How to explicitly encode positional information into neural networks is ...
-
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Stochastic gradient descent (SGD) and its variants are mainstream method...
-
Modeling Lost Information in Lossy Image Compression
Lossy image compression is one of the most commonly used operators for d...
-
Multi-branch Attentive Transformer
While the multi-branch architecture is one of the key ingredients to the...
-
UWSpeech: Speech to Speech Translation for Unwritten Languages
Existing speech to speech translation systems heavily rely on the text o...
-
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Pre-trained contextual representations (e.g., BERT) have become the foun...
-
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Transformer-based text to speech (TTS) model (e.g., Transformer TTS <cit...
-
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Advanced text to speech (TTS) models such as FastSpeech can synthesize s...
-
Dual Learning: Theoretical Study and an Algorithmic Extension
Dual learning has been successfully applied in many machine learning app...
-
Invertible Image Rescaling
High-resolution digital images are usually downscaled to fit various dis...
-
SEEK: Segmented Embedding of Knowledge Graphs
In recent years, knowledge graph embedding has become a pretty hot research...
-
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
While pre-training and fine-tuning, e.g., BERT <cit.>, GPT-2 <cit.>, hav...
-
A Study of Non-autoregressive Model for Sequence Generation
Non-autoregressive (NAR) models generate all the tokens of a sequence in...
-
MPNet: Masked and Permuted Pre-training for Language Understanding
BERT adopts masked language modeling (MLM) for pre-training and is one o...
-
Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator
Generative Adversarial Networks (GANs) have shown great promise in model...
-
Suphx: Mastering Mahjong with Deep Reinforcement Learning
Artificial Intelligence (AI) has achieved great success in many domains,...
-
Semi-Supervised Neural Architecture Search
Neural architecture search (NAS) relies on a good controller to generate...
-
Incorporating BERT into Neural Machine Translation
The recently proposed BERT has shown great power on a variety of natural...
-
On Layer Normalization in the Transformer Architecture
The Transformer is widely used in natural language processing tasks. To ...
-
A Study of Multilingual Neural Machine Translation
Multilingual neural machine translation (NMT) has recently been investig...
-
Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Gradient perturbation, widely used for differentially private optimizati...
-
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Non-autoregressive translation (NAT) models remove the dependence on pre...
-
Microsoft Research Asia's Systems for WMT19
We, Microsoft Research Asia, made submissions to 11 language directions in...
-
Distributional Reward Decomposition for Reinforcement Learning
Many reinforcement learning (RL) tasks have specific properties that can...
-
Hint-Based Training for Non-Autoregressive Machine Translation
Due to the unparallelizable nature of the autoregressive factorization, ...
-
Self-paced Ensemble for Highly Imbalanced Massive Data Classification
Many real-world applications reveal difficulties in learning classifiers...
-
Training Effective Ensemble on Imbalanced Data by Self-paced Harmonizing Classification Hardness
Many real-world applications reveal difficulties in learning classifiers...
-
Multilingual Neural Machine Translation with Language Clustering
Multilingual neural machine translation (NMT), which translates multiple...
-
Representation Degeneration Problem in Training Natural Language Generation Models
We study an interesting problem in training neural network-based models ...