Polish Read Speech Corpus for Speech Tools and Services

06/01/2017
by Danijel Koržinek, et al.

This paper describes the speech processing activities conducted within the Polish consortium of the CLARIN project. The goal of this part of the project was to develop tools for the automatic and semi-automatic processing of large quantities of acoustic speech data. These tools include grapheme-to-phoneme conversion, speech-to-text alignment, voice activity detection, speaker diarization, keyword spotting, and automatic speech transcription. To develop them, a large, high-quality studio speech corpus was recorded and released under an open license to encourage research on Polish speech. The corpus is also intended to serve as a reference for studies in phonetics and pronunciation. All the tools and resources have been released on the Polish CLARIN website. The paper also discusses the current status of the project and plans for its future development.
