Recent studies have shown that using an external Language Model (LM) ben...
Building pretrained language models is considered expensive and data-int...
Transformer models cannot easily scale to long sequences due to their O(...
In almost all text generation applications, word sequences are construct...