This paper explores the effectiveness of model-generated signals in impr...
A big convergence of language, multimodal perception, action, and world ...
Position modeling plays a critical role in Transformers. In this paper, ...
Large Transformers have achieved state-of-the-art performance across man...
In this paper, we elaborate upon recipes for building multilingual repre...
A big convergence of model architectures across language, vision, speech...
Sparse mixture of experts provides larger model capacity while requiring...
We present an efficient method of pretraining large-scale autoencoding l...
We present a new framework AMOS that pretrains text encoders with an Adv...
Pretrained general-purpose language models can achieve state-of-the-art ...
This report describes Microsoft's machine translation systems for the WM...
Compared to monolingual models, cross-lingual models usually require a m...
In this paper, we introduce ELECTRA-style tasks to cross-lingual languag...
While pretrained encoders have achieved success in various natural langu...
Fine-tuning pre-trained cross-lingual language models can transfer task-...
We consider the problem of scaling automated suggested replies for Outlo...
We present COCO-LM, a new self-supervised learning framework that pretra...
Multilingual machine translation enables a single model to translate bet...
In this work, we formulate cross-lingual language model pre-training as ...
How much knowledge do pretrained language models hold? Recent research o...
This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a ...
Axiomatic information retrieval (IR) seeks a set of principle properties...
When a bilingual student learns to solve word problems in math, we expec...
Deep neural networks have recently shown promise in the ad-hoc retrieval...
This paper presents our recent work on the design and development of a n...