Takuya Yoshioka

research

∙ 09/14/2023

DiariST: Streaming Speech Translation with Speaker Diarization

End-to-end speech translation (ST) for conversation recordings involves ...

0 Mu Yang, et al. ∙

research

∙ 05/23/2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI

Artificial General Intelligence (AGI) requires comprehensive understandi...

0 Yuwei Fang, et al. ∙

research

∙ 05/21/2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

The convergence of text, visual, and audio data is a key step towards hu...

0 ZiYi Yang, et al. ∙

research

∙ 11/18/2022

Exploring WavLM on Speech Enhancement

There is a surge in interest in self-supervised learning approaches for ...

0 Hyungchan Song, et al. ∙

research

∙ 11/11/2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts

Several trade-offs need to be balanced when employing monaural speech se...

0 Xiaofei Wang, et al. ∙

research

∙ 11/09/2022

Speech separation with large-scale self-supervised learning

Self-supervised learning (SSL) methods such as WavLM have shown promisin...

0 Zhuo Chen, et al. ∙

research

∙ 11/05/2022

Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Personalized speech enhancement (PSE) models achieve promising results c...

0 Hassan Taherian, et al. ∙

research

∙ 11/04/2022

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net

Personalized speech enhancement (PSE), a process of estimating a clean t...

0 Sefik Emre Eskimez, et al. ∙

research

∙ 11/04/2022

Real-Time Target Sound Extraction

We present the first neural network model to achieve real-time and strea...

0 Bandhav Veluri, et al. ∙

research

∙ 10/27/2022

Simulating realistic speech overlaps improves multi-talker ASR

Multi-talker automatic speech recognition (ASR) has been studied to gene...

0 Muqiao Yang, et al. ∙

research

∙ 09/12/2022

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

This paper presents a novel streaming automatic speech recognition (ASR)...

6 Naoyuki Kanda, et al. ∙

research

∙ 08/27/2022

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

This paper describes a speaker diarization model based on target speaker...

0 Dongmei Wang, et al. ∙

research

∙ 05/03/2022

i-Code: An Integrative and Composable Multimodal Learning Framework

Human intelligence is multimodal; we integrate visual, linguistic, and a...

1 ZiYi Yang, et al. ∙

research

∙ 04/27/2022

Ultra Fast Speech Separation Model with Teacher Student Learning

Transformer has been successfully applied to speech separation recently ...

0 Sanyuan Chen, et al. ∙

research

∙ 04/07/2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Existing multi-channel continuous speech separation (CSS) models are hea...

0 Xiaofei Wang, et al. ∙

research

∙ 04/02/2022

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

This paper investigates how to improve the runtime speed of personalized...

0 Manthan Thakker, et al. ∙

research

∙ 03/30/2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

This paper presents a streaming speaker-attributed automatic speech reco...

0 Naoyuki Kanda, et al. ∙

research

∙ 02/27/2022

ICASSP 2022 Deep Noise Suppression Challenge

The Deep Noise Suppression (DNS) challenge is designed to foster innovat...

0 Harishchandra Dubey, et al. ∙

research

∙ 02/02/2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

This paper proposes a token-level serialized output training (t-SOT), a ...

0 Naoyuki Kanda, et al. ∙

research

∙ 01/24/2022

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

This paper proposes PickNet, a neural network model for real-time channe...

0 Takuya Yoshioka, et al. ∙

research

∙ 10/28/2021

Continuous Speech Separation with Recurrent Selective Attention Network

While permutation invariant training (PIT) based continuous speech separ...

0 Yixuan Zhang, et al. ∙

research

∙ 10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn much interest f...

0 Wangyou Zhang, et al. ∙

research

∙ 10/26/2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Self-supervised learning (SSL) achieves great success in speech recognit...

0 Sanyuan Chen, et al. ∙

research

∙ 10/20/2021

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

With the recent surge of video conferencing tools usage, providing high-...

0 Hassan Taherian, et al. ∙

research

∙ 10/18/2021

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, su...

0 Sefik Emre Eskimez, et al. ∙

research

∙ 10/13/2021

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices f...

0 Zhuohuang Zhang, et al. ∙

research

∙ 10/12/2021

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Continuous speech separation using a microphone array was shown to be pr...

0 Takuya Yoshioka, et al. ∙

research

∙ 10/07/2021

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

This paper presents Transcribe-to-Diarize, a new approach for neural spe...

0 Naoyuki Kanda, et al. ∙

research

∙ 07/06/2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

Speaker-attributed automatic speech recognition (SA-ASR) is a task to re...

0 Naoyuki Kanda, et al. ∙

research

∙ 07/05/2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing...

0 Jian Wu, et al. ∙

research

∙ 04/05/2021

End-to-End Speaker-Attributed ASR with Transformer

This paper presents our recent effort on end-to-end speaker-attributed a...

0 Naoyuki Kanda, et al. ∙

research

∙ 03/31/2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

Transcribing meetings containing overlapped speech with only a single di...

0 Naoyuki Kanda, et al. ∙

research

∙ 03/03/2021

Continuous Speech Separation with Ad Hoc Microphone Arrays

Speech separation has been shown effective for multi-talker speech recog...

0 Dongmei Wang, et al. ∙

research

∙ 01/06/2021

Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings

An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-...

0 Xuankai Chang, et al. ∙

research

∙ 11/05/2020

Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription

Joint optimization of multi-channel front-end and automatic speech recog...

0 Xiaofei Wang, et al. ∙

research

∙ 11/03/2020

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

Recently, an end-to-end speaker-attributed automatic speech recognition ...

0 Naoyuki Kanda, et al. ∙

research

∙ 11/03/2020

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Multi-speaker speech recognition of unsegmented recordings has diverse a...

0 Desh Raj, et al. ∙

research

∙ 10/23/2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

With its strong modeling capacity that comes from a multi-head and multi...

0 Sanyuan Chen, et al. ∙

research

∙ 10/22/2020

Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020

This paper describes the Microsoft speaker diarization system for monaur...

0 Xiong Xiao, et al. ∙

research

∙ 09/07/2020

An End-to-end Architecture of Online Multi-channel Speech Separation

Multi-speaker speech recognition has been one of the key challenges in co...

0 Jian Wu, et al. ∙

research

∙ 08/11/2020

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

Recently, an end-to-end (E2E) speaker-attributed automatic speech recogn...

0 Naoyuki Kanda, et al. ∙

research

∙ 06/19/2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

We propose an end-to-end speaker-attributed automatic speech recognition...

0 Naoyuki Kanda, et al. ∙

research

∙ 04/28/2020

Neural Speech Separation Using Spatially Distributed Microphones

This paper proposes a neural network based speech separation method usin...

0 Dongmei Wang, et al. ∙

research

∙ 03/28/2020

Serialized Output Training for End-to-End Overlapped Speech Recognition

This paper proposes serialized output training (SOT), a novel framework ...

0 Naoyuki Kanda, et al. ∙

research

∙ 01/30/2020

Continuous speech separation: dataset and analysis

This paper describes a dataset and protocols for evaluating continuous s...

0 Zhuo Chen, et al. ∙

research

∙ 12/10/2019

Advances in Online Audio-Visual Meeting Transcription

This paper describes a system that generates speaker-annotated transcrip...

15 Takuya Yoshioka, et al. ∙

research

∙ 10/30/2019

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

An important problem in ad-hoc microphone speech separation is how to gu...

0 Yi Luo, et al. ∙

research

∙ 10/14/2019

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

Recent studies in deep learning-based speech separation have proven the ...

0 Yi Luo, et al. ∙

research

∙ 09/17/2019

DOVER: A Method for Combining Diarization Outputs

Speech recognition and other natural language tasks have long benefited ...

0 Andreas Stolcke, et al. ∙

research

∙ 05/03/2019

Meeting Transcription Using Virtual Microphone Arrays

We describe a system that generates speaker-annotated transcripts of mee...

0 Takuya Yoshioka, et al. ∙

