BERT-LID: Leveraging BERT to Improve Spoken Language Identification

by   Yuting Nie, et al.

Language identification is a task of automatically determining the identity of a language conveyed by a spoken segment. It has a profound impact on the multilingual interoperability of an intelligent speech system. Despite language identification attaining high accuracy on medium or long utterances (>3s), the performance on short utterances (<=1s) is still far from satisfactory. We propose an effective BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short-duration speech segments. To adapt BERT into the LID pipeline, we drop in a conjunction network prior to BERT to accommodate the frame-level Phonetic Posteriorgrams(PPG) derived from the frontend phone recognizer and then fine-tune the conjunction network and BERT pre-trained model together. We evaluate several variations within this piped framework, including combining BERT with CNN, LSTM, DPCNN, and RCNN. The experimental results demonstrate that the best-performing model is RCNN-BERT. Compared with the prior works, our RCNN-BERT model can improve the accuracy by about 5 identification and 18 our model, especially on the short-segment task, demonstrates the applicability of our proposed BERT-based approach on language identification.


page 1

page 2

page 3

page 4


KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media

In this paper, we describe our approach to utilize pre-trained BERT mode...

To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?

This paper addresses the question as to what degree a BERT-based multili...

Recognizing Arrow Of Time In The Short Stories

Recognizing arrow of time in short stories is a challenging task. i.e., ...

Towards spoken dialect identification of Irish

The Irish language is rich in its diversity of dialects and accents. Thi...

What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models

Experiments with transfer learning on pre-trained language models such a...

Speech BERT Embedding For Improving Prosody in Neural TTS

This paper presents a speech BERT model to extract embedded prosody info...

AmberNet: A Compact End-to-End Model for Spoken Language Identification

We present AmberNet, a compact end-to-end neural network for Spoken Lang...

Please sign up or login with your details

Forgot password? Click here to reset