Recent empirical evidence indicates that transformer-based in-context learning...
Disaggregated memory is a promising approach that addresses the limitati...
Recent research in robust optimization has shown an overfitting-like phenomenon...
Effective scaling and a flexible task interface enable large language mo...
Visual Question Answering (VQA) has benefited from increasingly sophisti...
With the increasing abundance of pretrained models in recent years, the ...
Despite recent advances in its theoretical understanding, there still re...
The research community has proposed copious modifications to the Transfo...
The availability of large-scale image captioning and visual question answering...
Recent advances in automatic evaluation metrics for text have shown that...
Sequence generation models trained with teacher-forcing suffer from issu...
Models based on the Transformer architecture have achieved better accura...
We introduce "talking-heads attention" - a variation on multi-head atten...
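The "talking-heads" variation named in the entry above inserts learned linear projections across the heads dimension, mixing heads both before and after the softmax. A minimal NumPy sketch under that description (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(Q, K, V, P_logits, P_weights):
    """Q: [h, n, d], K/V: [h, m, d];
    P_logits, P_weights: [h, h] head-mixing matrices."""
    d = Q.shape[-1]
    logits = np.einsum('hnd,hmd->hnm', Q, K) / np.sqrt(d)
    # Mix information across heads before the softmax.
    logits = np.einsum('hnm,hk->knm', logits, P_logits)
    weights = softmax(logits, axis=-1)
    # Mix across heads again after the softmax.
    weights = np.einsum('hnm,hk->knm', weights, P_weights)
    return np.einsum('hnm,hmd->hnd', weights, V)
```

With identity mixing matrices this reduces to standard multi-head attention, which is a convenient sanity check.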
Pairwise sequence alignment is one of the most computationally intensive...
In this report, the method for the iQIYI submission to the task of Activ...
Supervised training of abstractive language generation models results in...
We introduce a new multi-modal task for computer systems, posed as a com...
We present a family of neural-network--inspired models for computing con...
We present a dual contribution to the task of machine reading-comprehens...
Recent advances in Bayesian learning with large-scale data have witnesse...
Stochastic gradient MCMC (SG-MCMC) has played an important role in large...
We develop dependent hierarchical normalized random measures and apply t...
This paper presents theory for Normalized Random Measures (NRMs), Normal...