ViraPart: A Text Refinement Framework for ASR and NLP Tasks in Persian

10/18/2021
by Narges Farokhshad, et al.

Persian is an inflectional SOV language, which makes it a more ambiguous language. Techniques such as ZWNJ (zero-width non-joiner) recognition, punctuation restoration, and Persian Ezafe construction make text more understandable and precise. Most work on Persian addresses these techniques individually; we believe, however, that text refinement in Persian requires all of them. In this work, we propose ViraPart, a framework that uses embedded ParsBERT at its core for text clarification. First, we use the BERT variant for Persian followed by a classifier layer for the classification procedures. Next, we combine the models' outputs to produce clear text. Finally, the proposed models for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction achieve averaged macro F1 scores of 96.90... Experimental results show that the proposed approach is very effective for text refinement in Persian.
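
The step of combining the three models' outputs into clear text can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each model has already produced one label per token, and the function name `combine_predictions`, the label formats, and the choice to render Ezafe as a kasra diacritic are all hypothetical choices made for illustration.

```python
ZWNJ = "\u200c"   # zero-width non-joiner
KASRA = "\u0650"  # kasra diacritic; rendering Ezafe this way is an assumption


def combine_predictions(tokens, zwnj_labels, ezafe_labels, punct_labels):
    """Merge three per-token label sequences into one refined string.

    zwnj_labels[i]  -- True if token i joins token i+1 with a ZWNJ, not a space
    ezafe_labels[i] -- True if token i carries an Ezafe
    punct_labels[i] -- punctuation mark to place after token i ("" for none)
    """
    n = len(tokens)
    if not (len(zwnj_labels) == len(ezafe_labels) == len(punct_labels) == n):
        raise ValueError("every label sequence must match the token count")

    pieces = []
    for i, token in enumerate(tokens):
        piece = token
        if ezafe_labels[i]:
            piece += KASRA
        piece += punct_labels[i]
        if i < n - 1:
            # Separator before the next token: ZWNJ or an ordinary space.
            piece += ZWNJ if zwnj_labels[i] else " "
        pieces.append(piece)
    return "".join(pieces)


# Hypothetical predictions for an unrefined token sequence.
tokens = ["کتاب", "من", "را", "می", "خوانم"]
refined = combine_predictions(
    tokens,
    zwnj_labels=[False, False, False, True, False],
    ezafe_labels=[True, False, False, False, False],
    punct_labels=["", "", "", "", "."],
)
print(refined)
```

In this sketch the three label sequences are independent, so each model can be trained and evaluated separately and merged only at output time, which matches the abstract's description of combining model outputs.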

research
04/03/2018

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

Yorùbá is a widely spoken West African language with a writing system ri...
research
07/04/2020

Robust Prediction of Punctuation and Truecasing for Medical ASR

Automatic speech recognition (ASR) systems in the medical domain that fo...
research
10/01/2021

Improving Punctuation Restoration for Speech Transcripts via External Data

Automatic Speech Recognition (ASR) systems generally do not produce punc...
research
02/14/2022

Punctuation restoration in Swedish through fine-tuned KB-BERT

Presented here is a method for automatic punctuation restoration in Swed...
research
04/15/2020

Coreferential Reasoning Learning for Language Representation

Language representation models such as BERT could effectively capture co...
research
01/08/2022

Beyond modeling: NLP Pipeline for efficient environmental policy analysis

As we enter the UN Decade on Ecosystem Restoration, creating effective i...
