Collecting audio-text pairs is expensive; however, it is much easier to
...
Although frame-based models, such as CTC and transducers, have an affini...
There has been an increased interest in the integration of pretrained sp...
Recently there have been efforts to introduce new benchmark tasks for sp...
This paper describes our system for the low-resource domain adaptation t...
Most human interactions occur in the form of spoken conversations where ...
Disfluency detection has mainly been solved in a pipeline approach, as
p...
Connectionist temporal classification (CTC) -based models are attractive...
Connectionist temporal classification (CTC) -based models are attractive...
In automatic speech recognition (ASR) rescoring, the hypothesis with the...
Attention-based sequence-to-sequence (seq2seq) models have achieved prom...