Recent neural network-based language models have benefited greatly from
...
Large Transformer models yield impressive results on many tasks, but are...
Sample efficiency and performance in the offline setting have emerged as...
We introduce Performers, Transformer architectures which can estimate re...