Behavioral testing in NLP allows fine-grained evaluation of systems by e...
The Transformer architecture has two main non-embedding components: Atte...
We propose extending the Sequence-level Knowledge Distillation (Kim and ...
Self-training has been shown to be helpful in addressing data scarcity f...
Code switching (CS) refers to the phenomenon of interchangeably using wo...
The conventional paradigm in speech translation starts with a speech rec...
Variational Neural Machine Translation (VNMT) is an attractive framework...