
BERT-LID: Leveraging BERT to Improve Spoken Language Identification

by Yuting Nie, et al.
Tsinghua University

Language identification is the task of automatically determining the identity of the language conveyed by a spoken segment. It has a profound impact on the multilingual interoperability of an intelligent speech system. Although language identification attains high accuracy on medium or long utterances (>3s), performance on short utterances (<=1s) is still far from satisfactory. We propose an effective BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short-duration speech segments. To adapt BERT to the LID pipeline, we insert a conjunction network before BERT to accommodate the frame-level Phonetic Posteriorgrams (PPG) derived from the front-end phone recognizer, and then fine-tune the conjunction network and the pre-trained BERT model together. We evaluate several variations within this pipeline framework, including combining BERT with CNN, LSTM, DPCNN, and RCNN. The experimental results demonstrate that the best-performing model is RCNN-BERT. Compared with prior work, our RCNN-BERT model improves accuracy by about 5% on long-segment identification and 18% on short-segment identification. The outperformance of our model, especially on the short-segment task, demonstrates the applicability of our proposed BERT-based approach to language identification.




KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media

In this paper, we describe our approach to utilize pre-trained BERT mode...

To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?

This paper addresses the question as to what degree a BERT-based multili...

Recognizing Arrow Of Time In The Short Stories

Recognizing arrow of time in short stories is a challenging task. i.e., ...

What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models

Experiments with transfer learning on pre-trained language models such a...

Speech BERT Embedding For Improving Prosody in Neural TTS

This paper presents a speech BERT model to extract embedded prosody info...

Better than BERT but Worse than Baseline

This paper compares BERT-SQuAD and Ab3P on the Abbreviation Definition I...

Enhance Language Identification using Dual-mode Model with Knowledge Distillation

In this paper, we propose to employ a dual-mode framework on the x-vecto...