Hirofumi Inaguma

research

∙ 08/22/2023

SeamlessM4T-Massively Multilingual Multimodal Machine Translation

What does it take to create the Babel Fish, a tool that can help individ...

0 Seamless Communication, et al. ∙

research

∙ 06/01/2023

Exploration on HuBERT with Multiple Resolutions

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...

0 Jiatong Shi, et al. ∙

research

∙ 05/04/2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Transducer and Attention based Encoder-Decoder (AED) are two widely used...

0 Yun Tang, et al. ∙

research

∙ 04/10/2023

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

It has been known that direct speech-to-speech translation (S2ST) models...

0 Jiatong Shi, et al. ∙

research

∙ 04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

0 Brian Yan, et al. ∙

research

∙ 12/15/2022

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Direct speech-to-speech translation (S2ST), in which all components can ...

2 Hirofumi Inaguma, et al. ∙

research

∙ 11/11/2022

Speech-to-Speech Translation For A Real-world Unwritten Language

We study speech-to-speech translation (S2ST) that translates speech from...

0 Peng-Jen Chen, et al. ∙

research

∙ 10/21/2022

Named Entity Detection and Injection for Direct Speech Translation

In a sentence, certain words are critical for its semantic. Among them, ...

1 Marco Gaido, et al. ∙

research

∙ 10/18/2022

Simple and Effective Unsupervised Speech Translation

The amount of labeled data to train models for speech tasks is limited f...

0 Changhan Wang, et al. ∙

research

∙ 09/08/2022

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

Connectionist temporal classification (CTC) -based models are attractive...

0 Hayato Futami, et al. ∙

research

∙ 09/05/2022

Distilling the Knowledge of BERT for CTC-based ASR

Connectionist temporal classification (CTC) -based models are attractive...

0 Hayato Futami, et al. ∙

research

∙ 01/14/2022

A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

In this study, we present recent developments of models trained with the...

0 Florian Boyer, et al. ∙

research

∙ 10/11/2021

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Non-autoregressive (NAR) models simultaneously generate multiple outputs...

0 Yosuke Higuchi, et al. ∙

research

∙ 10/05/2021

ASR Rescoring and Confidence Estimation with ELECTRA

In automatic speech recognition (ASR) rescoring, the hypothesis with the...

0 Hayato Futami, et al. ∙

research

∙ 09/27/2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

The multi-decoder (MD) end-to-end speech translation model has demonstra...

0 Hirofumi Inaguma, et al. ∙

research

∙ 09/09/2021

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

This article describes an efficient end-to-end speech translation (E2E-S...

0 Hirofumi Inaguma, et al. ∙

research

∙ 07/15/2021

VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording

In this work, we propose novel decoding algorithms to enable streaming a...

0 Hirofumi Inaguma, et al. ∙

research

∙ 07/01/2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System

This paper describes the ESPnet-ST group's IWSLT 2021 submission in the ...

0 Hirofumi Inaguma, et al. ∙

research

∙ 07/01/2021

StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR

While attention-based encoder-decoder (AED) models have been successfull...

0 Hirofumi Inaguma, et al. ∙

research

∙ 04/13/2021

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

A conventional approach to improving the performance of end-to-end speec...

0 Hirofumi Inaguma, et al. ∙

research

∙ 02/28/2021

Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition

This article describes an efficient training method for online streaming...

0 Hirofumi Inaguma, et al. ∙

research

∙ 12/23/2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

This paper describes the recent development of ESPnet (https://github.co...

0 Shinji Watanabe, et al. ∙

research

∙ 10/26/2020

Recent Developments on ESPnet Toolkit Boosted by Conformer

In this study, we present recent developments on ESPnet: End-to-End Spee...

0 Pengcheng Guo, et al. ∙

research

∙ 10/26/2020

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

For real-world deployment of automatic speech recognition (ASR), the sys...

0 Yosuke Higuchi, et al. ∙

research

∙ 10/25/2020

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

Fast inference speed is an important goal towards real-world deployment ...

0 Hirofumi Inaguma, et al. ∙

research

∙ 08/09/2020

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Attention-based sequence-to-sequence (seq2seq) models have achieved prom...

0 Hayato Futami, et al. ∙

research

∙ 05/19/2020

Enhancing Monotonic Multihead Attention for Streaming ASR

We investigate a monotonic multihead attention (MMA) by extending hard m...

0 Hirofumi Inaguma, et al. ∙

research

∙ 05/10/2020

CTC-synchronous Training for Monotonic Attention Model

Monotonic chunkwise attention (MoChA) has been studied for the online st...

0 Hirofumi Inaguma, et al. ∙

research

∙ 04/23/2020

End-to-end speech-to-dialog-act recognition

Spoken language understanding, which extracts intents and/or semantic co...

0 Viet-Trung Dang, et al. ∙

research

∙ 04/21/2020

ESPnet-ST: All-in-One Speech Translation Toolkit

We present ESPnet-ST, which is designed for the quick development of spe...

0 Hirofumi Inaguma, et al. ∙

research

∙ 04/10/2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

Recently, a few novel streaming attention-based sequence-to-sequence (S2...

0 Hirofumi Inaguma, et al. ∙

research

∙ 10/01/2019

Multilingual End-to-End Speech Translation

In this paper, we propose a simple yet effective framework for multiling...

0 Hirofumi Inaguma, et al. ∙

research

∙ 09/22/2019

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) sys...

0 Hirofumi Inaguma, et al. ∙

research

∙ 09/13/2019

A Comparative Study on Transformer vs RNN in Speech Applications

Sequence-to-sequence models have been widely used in end-to-end speech p...

0 Shigeki Karita, et al. ∙

research

∙ 11/06/2018

Language model integration based on memory control for sequence to sequence speech recognition

In this paper, we explore several new schemes to train a seq2seq model t...

0 Jaejin Cho, et al. ∙

research

∙ 11/06/2018

Transfer learning of language-independent end-to-end ASR with language model fusion

This work explores better adaptation methods to low-resource languages u...

0 Hirofumi Inaguma, et al. ∙
