Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models

10/01/2020
by   Thai Binh Nguyen, et al.

Studies on the Named Entity Recognition (NER) task have shown outstanding results that reach human parity on input text with correct formatting, such as proper punctuation and capitalization. However, such conditions are not available in applications where the input is speech, because the text is generated by an automatic speech recognition (ASR) system, which does not produce text formatting. In this paper, we (1) present the first Vietnamese speech dataset for the NER task, (2) present the first public pre-trained large-scale monolingual language model for Vietnamese, which achieves a new state of the art for the Vietnamese NER task, improving by 1.3 compared to the latest study, and (3) propose a new pipeline for NER from speech that overcomes the text formatting problem by introducing a text capitalization and punctuation recovery model (CaPu) into the pipeline. The model takes input text from an ASR system and performs both tasks, capitalization and punctuation recovery, at the same time, producing properly formatted text that helps to improve NER performance. Experimental results indicated that the CaPu model helps to improve by nearly 4
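The pipeline described in the abstract (ASR output, then CaPu, then NER) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in and not the authors' implementation: `toy_capu` uses a tiny hand-written lexicon instead of a learned model, `toy_ner` is a naive tagger that relies only on capitalization, and `ner_from_speech` is an assumed name for the composition of the two stages. The point is only to show why restoring formatting before tagging can help a tagger that depends on it.

```python
from typing import Callable, List

# Hypothetical toy lexicon standing in for a learned CaPu model.
TOY_PROPER_NOUN_TOKENS = {"hà", "nội", "việt", "nam"}


def toy_capu(asr_text: str) -> str:
    """Toy capitalization and punctuation recovery (assumption, not the paper's model).

    Capitalizes the first token and any token in the toy lexicon,
    then appends a sentence-final period.
    """
    tokens = asr_text.split()
    restored = [
        tok.capitalize() if (i == 0 or tok in TOY_PROPER_NOUN_TOKENS) else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(restored) + "."


def toy_ner(text: str) -> List[str]:
    """Toy tagger: groups consecutive capitalized tokens into entity spans.

    The sentence-initial token is skipped, since it is capitalized
    regardless of whether it is a name.
    """
    entities: List[str] = []
    current: List[str] = []
    for i, raw in enumerate(text.split()):
        token = raw.strip(".,?!")
        if i > 0 and token[:1].isupper():
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def ner_from_speech(
    asr_text: str,
    capu: Callable[[str], str],
    ner: Callable[[str], List[str]],
) -> List[str]:
    """Run NER on ASR output after restoring capitalization and punctuation."""
    return ner(capu(asr_text))


if __name__ == "__main__":
    asr_output = "tôi sống ở hà nội"  # lowercase, unpunctuated, as an ASR system might emit
    print(toy_ner(asr_output))                             # tagger sees no capitalization cues
    print(ner_from_speech(asr_output, toy_capu, toy_ner))  # tagger runs on restored text
```

In this toy setting the tagger finds nothing in the raw ASR string and finds the span "Hà Nội" once formatting is restored. Passing the two stages as callables keeps the composition independent of any particular model, so real CaPu and NER models could be substituted for the stubs.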

research
05/22/2020

End-to-end Named Entity Recognition from English Speech

Named entity recognition (NER) from text has been a widely studied probl...
research
03/27/2019

ner and pos when nothing is capitalized

For those languages which use it, capitalization is an important signal ...
research
08/06/2021

Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Document digitization is essential for the digital transformation of our...
research
08/15/2021

DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants

Named entity recognition (NER) is usually developed and tested on text f...
research
04/10/2020

One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets

Named entity recognition (NER) is a fundamental component in the modern ...
research
09/02/2020

ASTRAL: Adversarial Trained LSTM-CNN for Named Entity Recognition

Named Entity Recognition (NER) is a challenging task that extracts named...
research
02/27/2019

F10-SGD: Fast Training of Elastic-net Linear Models for Text Classification and Named-entity Recognition

Voice-assistants text classification and named-entity recognition (NER) ...
