Generative Pre-trained Transformer (GPT) models have shown remarkable ca...
This paper presents Domain-Specific Sub-network (DoSS). It uses a set of...
This paper proposes a simple yet effective method to improve direct (X-t...
This paper describes our submission to the constrained track of WMT21 sh...
The Mixture of Experts (MoE) models are an emerging class of sparsely ac...
This paper describes our submission to the WMT20 sentence filtering task...