Uzbek Cyrillic-Latin-Cyrillic Machine Transliteration

01/13/2021
by B. Mansurov, et al.

In this paper, we introduce a data-driven approach to transliterating Uzbek dictionary words from the Cyrillic script into the Latin script, and vice versa. We heuristically align each character of a word in the source script with a substring of the corresponding word in the target script, and we train a decision tree classifier that learns these alignments. On the test set, our Cyrillic-to-Latin model achieves a character-level micro-averaged F1 score of 0.9992, and our Latin-to-Cyrillic model achieves a score of 0.9959. Our contribution is a novel method of producing machine-transliterated texts for the low-resource Uzbek language.
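
The abstract does not give the alignment heuristic, the features, or the tree parameters, so the following is only a minimal stdlib-only sketch of the general idea under stated assumptions. `CANDIDATES` is a hypothetical seed table of plausible Latin renderings per Cyrillic letter (not the authors'), the one-character context window on each side is an assumed feature set, and a back-off majority vote over contexts is swapped in as a stand-in for the decision tree classifier. The training pairs are toy examples, not the paper's data.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL seed table (not the authors'): plausible Latin renderings of each
# Cyrillic letter. It only restricts the alignment search; which option is
# correct in a given context is left to the classifier.
CANDIDATES = {
    "а": ["a"], "б": ["b"], "в": ["v"], "г": ["g"], "д": ["d"],
    "е": ["ye", "e"], "ё": ["yo"], "ж": ["j"], "з": ["z"], "и": ["i"],
    "й": ["y"], "к": ["k"], "л": ["l"], "м": ["m"], "н": ["n"],
    "о": ["o"], "п": ["p"], "р": ["r"], "с": ["s"], "т": ["t"],
    "у": ["u"], "ф": ["f"], "х": ["x"], "ц": ["ts", "s"], "ч": ["ch"],
    "ш": ["sh"], "ъ": ["ʼ"], "ь": [""], "э": ["e"], "ю": ["yu"],
    "я": ["ya"], "ў": ["oʻ"], "қ": ["q"], "ғ": ["gʻ"], "ҳ": ["h"],
}


def align(src, tgt):
    """Heuristically align each character of src with a substring of tgt.

    Returns one target substring per source character, or None when no
    segmentation of tgt is consistent with CANDIDATES.
    """
    def search(i, j):
        if i == len(src):
            return [] if j == len(tgt) else None
        for piece in CANDIDATES.get(src[i], []):
            if tgt.startswith(piece, j):
                rest = search(i + 1, j + len(piece))
                if rest is not None:
                    return [piece] + rest
        return None

    return search(0, 0)


def backoff_keys(word, i):
    """Context keys for word[i], most specific first ('#' = word boundary)."""
    padded = "#" + word + "#"
    prev, cur, nxt = padded[i], padded[i + 1], padded[i + 2]
    return [(prev, cur, nxt), (prev, cur), (cur,)]


class BackoffContextClassifier:
    """Stand-in for the decision tree: majority vote over aligned substrings,
    backing off from the full context window to the bare character."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        for src, tgt in pairs:
            pieces = align(src, tgt)
            if pieces is None:
                continue  # skip pairs the heuristic cannot align
            for i, piece in enumerate(pieces):
                for key in backoff_keys(src, i):
                    self.counts[key][piece] += 1
        return self

    def predict_char(self, word, i):
        for key in backoff_keys(word, i):
            if key in self.counts:
                return self.counts[key].most_common(1)[0][0]
        return word[i]  # unseen character: copy it through unchanged

    def transliterate(self, word):
        return "".join(self.predict_char(word, i) for i in range(len(word)))


# Toy Cyrillic/Latin pairs for illustration only (not the paper's dataset).
TRAIN = [
    ("ер", "yer"), ("етти", "yetti"), ("келди", "keldi"), ("мева", "meva"),
    ("сен", "sen"), ("шаҳар", "shahar"), ("китоб", "kitob"), ("ёз", "yoz"),
]

model = BackoffContextClassifier().fit(TRAIN)
print(model.transliterate("ел"))   # → yel (word-initial е becomes "ye")
print(model.transliterate("тез"))  # → tez (medial е stays "e")
```

The back-off order (full window, then previous plus current character, then the current character alone) imitates the kind of split a decision tree could learn on a word-boundary feature: Cyrillic е is rendered as "ye" at the start of a word and as "e" after a consonant, even for words absent from the training pairs.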
