Multi-query attention (MQA), which only uses a single key-value head,
dr...
Many natural language processing tasks benefit from long inputs, but
pro...
Training large, deep neural networks to convergence can be prohibitively...
We combine the capacity of sparsely gated Mixture-of-Experts (MoE) with ...
Recent neural network-based language models have benefited greatly from
...
We present ShopTalk, a multi-turn conversational faceted search system f...
We show that Transformer encoder architectures can be massively sped up,...