Emiru Tsunoo

09/16/2023

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Collecting audio-text pairs is expensive; however, it is much easier to ...

Emiru Tsunoo, et al.

07/24/2023

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Although frame-based models, such as CTC and transducers, have an affini...

Emiru Tsunoo, et al.

07/20/2023

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

There has been an increased interest in the integration of pretrained sp...

Siddhant Arora, et al.

05/02/2023

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

Recently there have been efforts to introduce new benchmark tasks for sp...

Siddhant Arora, et al.

05/02/2023

The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge

This paper describes our system for the low-resource domain adaptation t...

Hayato Futami, et al.

05/01/2023

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

Most human interactions occur in the form of spoken conversations where ...

Siddhant Arora, et al.

11/16/2022

Streaming Joint Speech Recognition and Disfluency Detection

Disfluency detection has mainly been solved in a pipeline approach, as p...

Hayato Futami, et al.

06/15/2022

Residual Language Model for End-to-end Speech Recognition

End-to-end automatic speech recognition suffers from adaptation to unkno...

Emiru Tsunoo, et al.

02/03/2022

Joint Speech Recognition and Audio Captioning

Speech samples recorded in both indoor and outdoor environments are ofte...

Chaitanya Narisetty, et al.

01/25/2022

Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR

A streaming style inference of encoder-decoder automatic speech recognit...

Emiru Tsunoo, et al.

01/24/2022

Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end

Although end-to-end text-to-speech (TTS) models can generate natural spe...

Rem Hida, et al.

10/14/2021

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

Sound event localization and detection (SELD) involves identifying the d...

Kazuki Shimada, et al.

10/13/2021

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

Recording and annotating real sound events for a sound event localizatio...

Yuichiro Koyama, et al.

06/21/2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

This report describes our systems submitted to the DCASE2021 challenge t...

Kazuki Shimada, et al.

06/07/2021

Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

Although end-to-end automatic speech recognition (E2E ASR) has achieved ...

Emiru Tsunoo, et al.

02/18/2021

Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition

Self-attention (SA) based models have recently achieved significant perf...

Yosuke Kashiwagi, et al.

06/25/2020

Streaming Transformer ASR with Blockwise Synchronous Inference

The Transformer self-attention network has recently shown promising perf...

Emiru Tsunoo, et al.

10/25/2019

Towards Online End-to-end Transformer Automatic Speech Recognition

The Transformer self-attention network has recently shown promising perf...

Emiru Tsunoo, et al.

10/16/2019

Transformer ASR with Contextual Block Processing

The Transformer self-attention network has recently shown promising perf...

Emiru Tsunoo, et al.

05/17/2019

End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

An on-device DNN-HMM speech recognition system efficiently works with a ...

Emiru Tsunoo, et al.