
-
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
Achieving super-human performance in recognizing human speech has been a...
-
Relative Positional Encoding for Speech Recognition and Direct Translation
Transformer models are powerful sequence-to-sequence architectures that ...
-
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
Recently sequence-to-sequence models have started to achieve state-of-th...
-
Low Latency ASR for Simultaneous Speech Translation
User studies have shown that reducing the latency of our simultaneous le...
-
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
Sequence-to-Sequence (S2S) models recently started to show state-of-the-...
-
Learning Shared Encoding Representation for End-to-End Speech Recognition Models
In this work, we learn a shared encoding representation for a multi-task...
-
Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models
Acoustic-to-word (A2W) models that allow direct mapping from acoustic si...
-
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop explor...