Christian Fuegen

End-to-End Speech Recognition Contextualization with Large Language Models
Egor Lakomkin, et al. (09/19/2023)
In recent years, Large Language Models (LLMs) have garnered significant ...

Prompting Large Language Models with Speech Recognition Abilities
Yassir Fathullah, et al. (07/21/2023)
Large language models have proven themselves highly flexible, able to so...

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu, et al. (03/30/2023)
Recently reported state-of-the-art results in visual speech recognition ...

Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma, et al. (11/03/2022)
Recognizing a word shortly after it is spoken is an important requiremen...

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Niko Moritz, et al. (04/19/2022)
The two most popular loss functions for streaming end-to-end automatic s...

Scaling ASR Improves Zero and Few Shot Learning
Alex Xiao, et al. (11/10/2021)
With 4.5 million hours of English speech from 10 different sources acros...

Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric
Suyoun Kim, et al. (10/11/2021)
Measuring automatic speech recognition (ASR) system quality is critical ...

Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Anurag Kumar, et al. (06/21/2021)
Transfer learning is critical for efficient information transfer across ...

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios
Jay Mahadeokar, et al. (04/06/2021)
Often, the storage and computational constraints of embedded devices dema...

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Yuan Shangguan, et al. (04/06/2021)
As speech-enabled devices such as smartphones and smart speakers become ...

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le, et al. (04/05/2021)
How to leverage dynamic contextual information in end-to-end speech reco...

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi, et al. (04/05/2021)
We propose a dynamic encoder transducer (DET) for on-device speech recog...

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Suyoun Kim, et al. (04/05/2021)
Word Error Rate (WER) has been the predominant metric used to evaluate t...

Contrastive Semi-supervised Learning for ASR
Alex Xiao, et al. (03/09/2021)
Pseudo-labeling is the most adopted method for pre-training automatic sp...

Memory-efficient Speech Recognition on Smart Devices
Ganesh Venkatesh, et al. (02/23/2021)
Recurrent transducer models have emerged as a promising solution for spe...

Deep Shallow Fusion for RNN-T Personalization
Duc Le, et al. (11/16/2020)
End-to-end models in general, and Recurrent Neural Network Transducer (R...

Alignment Restricted Streaming Recurrent Neural Network Transducer
Jay Mahadeokar, et al. (11/05/2020)
There is a growing interest in the speech community in developing Recurr...

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim, et al. (10/26/2020)
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech...

Weak-Attention Suppression For Transformer Based Speech Recognition
Yangyang Shi, et al. (05/18/2020)
Transformers, originally proposed for natural language processing (NLP) ...

Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh, et al. (05/16/2020)
Many semi- and weakly-supervised approaches have been investigated for o...

Libri-Light: A Benchmark for ASR with Limited or No Supervision
Jacob Kahn, et al. (12/17/2019)
We introduce a new collection of spoken English audio suitable for train...

Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks
Weipeng He, et al. (11/05/2019)
In this paper, we introduce spatial attention for refining the informati...

RNN-T For Latency Controlled ASR With Improved Beam Search
Mahaveer Jain, et al. (11/05/2019)
Neural transducer-based systems such as RNN Transducers (RNN-T) for auto...

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh, et al. (10/28/2019)
We explore options to use Transformer networks in neural transducer for ...

G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR
Duc Le, et al. (10/22/2019)
Grapheme-based acoustic modeling has recently been shown to outperform p...

Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang, et al. (10/22/2019)
We propose and evaluate transformer-based acoustic models (AMs) for hybr...

End-to-end contextual speech recognition using class language models and a token passing decoder
Zhehuai Chen, et al. (12/05/2018)
End-to-end modeling (E2E) of automatic speech recognition (ASR) blends a...

Towards end-to-end spoken language understanding
Dmitriy Serdyuk, et al. (02/23/2018)
Spoken language understanding system is traditionally designed as a pipe...
