Ruoming Pang

research

∙ 09/18/2023

Instruction-Following Speech Recognition

Conventional end-to-end Automatic Speech Recognition (ASR) models primar...

0 Cheng-I Jeff Lai, et al. ∙

research

∙ 09/08/2023

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Sparse Mixture-of-Experts models (MoEs) have recently gained popularity ...

0 Floris Weers, et al. ∙

research

∙ 03/31/2023

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Conformer models maintain a large number of internal states, the vast ma...

0 Rami Botros, et al. ∙

research

∙ 08/29/2022

A Language Agnostic Multilingual Streaming On-Device ASR System

On-device end-to-end (E2E) models have shown improvements over a convent...

1 Bo Li, et al. ∙

research

∙ 03/23/2022

Pathways: Asynchronous Distributed Dataflow for ML

We present the design of a new large scale orchestration layer for accel...

14 Paul Barham, et al. ∙

research

∙ 03/09/2022

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Language model fusion helps smart assistants recognize words which are r...

5 W. Ronny Huang, et al. ∙

research

∙ 12/14/2021

Co-training Transformer with Videos and Images Improves Action Recognition

In learning action recognition, models are typically pre-trained on obje...

0 Bowen Zhang, et al. ∙

research

∙ 10/09/2021

Vector-quantized Image Modeling with Improved VQGAN

Pretraining language models with next-token prediction on massive text c...

0 Jiahui Yu, et al. ∙

research

∙ 09/27/2021

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic spee...

1 Yu Zhang, et al. ∙

research

∙ 08/07/2021

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Motivated by the success of masked language modeling (MLM) in pre-traini...

0 Yu-An Chung, et al. ∙

research

∙ 05/10/2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs

We present GSPMD, an automatic, compiler-based parallelization system fo...

4 Yuanzhong Xu, et al. ∙

research

∙ 04/30/2021

Scaling End-to-End Models for Large-Scale Multilingual ASR

Building ASR models across many language families is a challenging multi...

14 Bo Li, et al. ∙

research

∙ 04/25/2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Streaming end-to-end automatic speech recognition (ASR) systems are wide...

0 Thibault Doutre, et al. ∙

research

∙ 02/10/2021

Searching for Fast Model Families on Datacenter Accelerators

Neural Architecture Search (NAS), together with model scaling, has shown...

0 Sheng Li, et al. ∙

research

∙ 01/27/2021

Transformer Based Deliberation for Two-Pass Speech Recognition

Interactive speech recognition systems must generate words quickly while...

0 Ke Hu, et al. ∙

research

∙ 11/21/2020

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have shown to outperform state-of-the-art conven...

0 Bo Li, et al. ∙

research

∙ 10/27/2020

Cascaded encoders for unifying streaming and non-streaming ASR

End-to-end (E2E) automatic speech recognition (ASR) models, by now, have...

0 Arun Narayanan, et al. ∙

research

∙ 10/24/2020

Unsupervised Learning of Disentangled Speech Content and Style Representation

We present an approach for unsupervised learning of speech representatio...

0 Andros Tjandra, et al. ∙

research

∙ 10/22/2020

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

Streaming end-to-end automatic speech recognition (ASR) models are widel...

0 Thibault Doutre, et al. ∙

research

∙ 10/21/2020

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

Streaming automatic speech recognition (ASR) aims to emit each hypothesi...

5 Jiahui Yu, et al. ∙

research

∙ 10/20/2020

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

We employ a combination of recent developments in semi-supervised learni...

0 Yu Zhang, et al. ∙

research

∙ 10/12/2020

Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling

Streaming automatic speech recognition (ASR) aims to emit each hypothesi...

8 Jiahui Yu, et al. ∙

research

∙ 08/30/2020

Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Recent advances of end-to-end models have outperformed conventional mode...

0 Wei Li, et al. ∙

research

∙ 08/24/2020

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...

0 Cal Peyser, et al. ∙

research

∙ 05/16/2020

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

In automatic speech recognition (ASR), model pruning is a widely adopted...

0 Zhaofeng Wu, et al. ∙

research

∙ 05/16/2020

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently Transformer and Convolution neural network (CNN) based models h...

0 Anmol Gulati, et al. ∙

research

∙ 05/07/2020

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

In recent years, all-neural end-to-end approaches have obtained state-of...

0 Chung-Cheng Chiu, et al. ∙

research

∙ 05/07/2020

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end...

0 Wei Han, et al. ∙

research

∙ 03/28/2020

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Thus far, end-to-end (E2E) models have not been shown to outperform stat...

0 Tara N. Sainath, et al. ∙

research

∙ 03/24/2020

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

Neural architecture search (NAS) has shown promising results discovering...

6 Jiahui Yu, et al. ∙

research

∙ 03/17/2020

Deliberation Model Based Two-Pass End-to-End Speech Recognition

End-to-end (E2E) models have made rapid progress in automatic speech rec...

0 Ke Hu, et al. ∙

research

∙ 11/20/2019

EfficientDet: Scalable and Efficient Object Detection

Model efficiency has become increasingly important in computer vision. I...

0 Mingxing Tan, et al. ∙

research

∙ 11/06/2019

A comparison of end-to-end models for long-form speech recognition

End-to-end automatic speech recognition (ASR) models, including both att...

0 Chung-Cheng Chiu, et al. ∙

research

∙ 08/29/2019

Two-Pass End-to-End Speech Recognition

The requirements for many applications of state-of-the-art speech recogn...

0 Tara N. Sainath, et al. ∙

research

∙ 06/12/2019

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Simultaneous machine translation begins to translate each source sentenc...

0 Naveen Arivazhagan, et al. ∙

research

∙ 05/06/2019

Searching for MobileNetV3

We present the next generation of MobileNets based on a combination of c...

53 Andrew Howard, et al. ∙

research

∙ 04/16/2019

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Current state-of-the-art convolutional architectures for object detectio...

0 Golnaz Ghiasi, et al. ∙

research

∙ 02/21/2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Lingvo is a Tensorflow framework offering a complete solution for collab...

13 Jonathan Shen, et al. ∙

research

∙ 11/16/2018

Domain Adaptive Transfer Learning with Specialist Models

Transfer learning is a widely used method to build high performing compu...

0 Jiquan Ngiam, et al. ∙

research

∙ 11/15/2018

Streaming End-to-end Speech Recognition For Mobile Devices

End-to-end (E2E) models, which directly predict output character sequenc...

0 Yanzhang He, et al. ∙

research

∙ 10/16/2018

Hierarchical Generative Modeling for Controllable Speech Synthesis

This paper proposes a neural end-to-end text-to-speech (TTS) model which...

0 Wei-Ning Hsu, et al. ∙

research

∙ 07/31/2018

MnasNet: Platform-Aware Neural Architecture Search for Mobile

Designing convolutional neural networks (CNN) models for mobile devices ...

0 Mingxing Tan, et al. ∙

research

∙ 06/12/2018

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

We describe a neural network-based system for text-to-speech (TTS) synth...

0 Ye Jia, et al. ∙

research

∙ 12/16/2017

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speec...

0 Jonathan Shen, et al. ∙
