Syntactic and Semantic Features For Code-Switching Factored Language Models

10/04/2017
by   Heike Adel, et al.

This paper presents our latest investigations of different features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features that can be extracted from Code-Switching text data and integrate them into factored language models. Different possible factors, such as words, part-of-speech tags, Brown word clusters, open-class words and clusters of open-class word embeddings, are explored. The experimental results reveal that Brown word clusters, part-of-speech tags and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In ASR experiments, the model containing Brown word clusters and part-of-speech tags, as well as the model that additionally includes clusters of open-class word embeddings, yield the best mixed error rate results. In summary, the best language model can significantly reduce the perplexity on the SEAME evaluation set by up to 10.8% relative and the mixed error rate by up to 3.4% relative.
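As a rough illustration of the kind of preprocessing such a setup implies, the sketch below attaches the factors named in the abstract (word, part-of-speech tag, Brown word cluster, open-class word) to each token of a code-switching sentence, producing factored tokens in the colon-separated style used by factored n-gram toolkits such as SRILM's fngram tools. The factor names, the tag set and the file format here are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch: building factored tokens for a code-switching sentence.
# Factor names (W, P, C, O) and the open-class tag set are assumptions.

def load_brown_clusters(path):
    """Read a Brown-cluster file with lines of the form '<bitstring> <word> <count>'."""
    clusters = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word = line.split()[:2]
            clusters[word] = bits
    return clusters

OPEN_CLASS_TAGS = {"NN", "NNS", "VV", "VA", "JJ", "RB"}  # assumed open-class tags

def factorize(tokens, pos_tags, clusters):
    """Attach POS, Brown-cluster and open-class factors to each word."""
    factored = []
    for word, pos in zip(tokens, pos_tags):
        factors = [
            f"W-{word}",                                   # surface word
            f"P-{pos}",                                    # part-of-speech tag
            f"C-{clusters.get(word, 'UNK')}",              # Brown word cluster
            f"O-{word if pos in OPEN_CLASS_TAGS else 'closed'}",  # open-class word
        ]
        factored.append(":".join(factors))
    return " ".join(factored)
```

Text preprocessed this way could then be passed, together with a factor specification file describing the backoff over the chosen factors, to a factored n-gram training tool; the exact training command and backoff graph are not specified in the abstract and would follow the respective toolkit's documentation.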


