Conventional end-to-end Automatic Speech Recognition (ASR) models primar...
Sparse Mixture-of-Experts models (MoEs) have recently gained popularity ...
Conformer models maintain a large number of internal states, the vast ma...
On-device end-to-end (E2E) models have shown improvements over a convent...
We present the design of a new large scale orchestration layer for accel...
Language model fusion helps smart assistants recognize words which are r...
In learning action recognition, models are typically pre-trained on obje...
Pretraining language models with next-token prediction on massive text c...
We summarize the results of a host of efforts using giant automatic spee...
Motivated by the success of masked language modeling (MLM) in pre-traini...
We present GSPMD, an automatic, compiler-based parallelization system fo...
Building ASR models across many language families is a challenging multi...
Streaming end-to-end automatic speech recognition (ASR) systems are wide...
Neural Architecture Search (NAS), together with model scaling, has shown...
Interactive speech recognition systems must generate words quickly while...
End-to-end (E2E) models have shown to outperform state-of-the-art conven...
End-to-end (E2E) automatic speech recognition (ASR) models, by now, have...
We present an approach for unsupervised learning of speech representatio...
Streaming end-to-end automatic speech recognition (ASR) models are widel...
Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
We employ a combination of recent developments in semi-supervised learni...
Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
Recent advances of end-to-end models have outperformed conventional mode...
End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...
In automatic speech recognition (ASR), model pruning is a widely adopted...
Recently Transformer and Convolution neural network (CNN) based models h...
In recent years, all-neural end-to-end approaches have obtained state-of...
Convolutional neural networks (CNN) have shown promising results for end...
Thus far, end-to-end (E2E) models have not been shown to outperform stat...
Neural architecture search (NAS) has shown promising results discovering...
End-to-end (E2E) models have made rapid progress in automatic speech rec...
Model efficiency has become increasingly important in computer vision. I...
End-to-end automatic speech recognition (ASR) models, including both att...
The requirements for many applications of state-of-the-art speech recogn...
Simultaneous machine translation begins to translate each source sentenc...
We present the next generation of MobileNets based on a combination of c...
Current state-of-the-art convolutional architectures for object detectio...
Lingvo is a TensorFlow framework offering a complete solution for collab...
Transfer learning is a widely used method to build high performing compu...
End-to-end (E2E) models, which directly predict output character sequenc...
This paper proposes a neural end-to-end text-to-speech (TTS) model which...
Designing convolutional neural networks (CNN) models for mobile devices ...
We describe a neural network-based system for text-to-speech (TTS) synth...
This paper describes Tacotron 2, a neural network architecture for speec...