Meeting Transcription Using Virtual Microphone Arrays

05/03/2019
by Takuya Yoshioka, et al.

We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization using prior speaker information, and system combination. With seven input audio streams, our system achieves a word error rate (WER) of 22.3% [...] the non-overlapping speech segments. The speaker-attributed WER (SAWER) is 26.7% [...] 20.3% [...] presented system achieves a 13.6% [...] duration contains more than one speaker. The contribution of each component to the overall performance is also investigated.
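As a point of reference for the WER figures quoted above, the following is a minimal sketch of the conventional word error rate: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions. The abstract does not specify the scoring tool or text normalization that was actually used, and the speaker-attributed variant (SAWER) is not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring pipelines usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {rate:.1%}")
```

Because insertions count as errors while the denominator is the reference length, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.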


