Transformer-based models have had a tremendous impact on natural languag...
A Transformer model with multi-head attention requires caching intermediat...
Pre-training techniques are now ubiquitous in natural language proces...
In this paper, we propose a novel data augmentation method, referred to ...
Reading long documents to answer open-domain questions remains challengi...
In this paper, we present a new sequence-to-sequence pre-training model
...