ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

12/12/2021
by Holy Lovenia, et al.

Code-switching is a speech phenomenon in which a speaker switches languages during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data through read speech rather than spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality corpus of spontaneous, multi-turn conversational Chinese-English code-switching speech collected in Hong Kong. In this work, we report ASCEND's design and data collection procedure, including the annotations. ASCEND includes 23 bilingual speakers who are fluent in both Chinese and English and consists of 10.62 hours of clean speech.


