James Lee-Thorp


Featured Co-authors

Yi Tay: 76 publications
Colin Raffel: 57 publications
Neil Houlsby: 44 publications
Mostafa Dehghani: 43 publications
Noam Shazeer: 36 publications
Tao Lei: 34 publications
Santiago Ontañón: 32 publications
Adam Roberts: 32 publications
Joshua Ainslie: 25 publications
Sharan Narang: 24 publications
Xavier Garcia: 22 publications

research

∙ 05/22/2023

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Multi-query attention (MQA), which only uses a single key-value head, dr...

Joshua Ainslie, et al.

research

∙ 03/17/2023

CoLT5: Faster Long-Range Transformers with Conditional Computation

Many natural language processing tasks benefit from long inputs, but pro...

Joshua Ainslie, et al.

research

∙ 12/09/2022

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Training large, deep neural networks to convergence can be prohibitively...

Aran Komatsuzaki, et al.

research

∙ 05/24/2022

Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT

We combine the capacity of sparsely gated Mixture-of-Experts (MoE) with ...

James Lee-Thorp, et al.

research

∙ 03/31/2022

Scaling Up Models and Data with t5x and seqio

Recent neural network-based language models have benefited greatly from ...

Adam Roberts, et al.

research

∙ 09/02/2021

ShopTalk: A System for Conversational Faceted Search

We present ShopTalk, a multi-turn conversational faceted search system f...

Gurmeet Manku, et al.

research

∙ 05/09/2021

FNet: Mixing Tokens with Fourier Transforms

We show that Transformer encoder architectures can be massively sped up,...

James Lee-Thorp, et al.
