Restoring Hebrew Diacritics Without a Dictionary

05/11/2021
by Elazar Gershuni, et al.

We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.
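
To make the "plain diacritized text only" setting concrete, the sketch below shows one way such text could be turned into per-character training pairs for a character-level tagger: each base letter is paired with the combining marks that follow it. This is an illustration under stated assumptions, not the authors' preprocessing. The function name `split_diacritics` is hypothetical, and the use of NFD normalization is an assumption made here for consistency of mark order.

```python
from __future__ import annotations

import unicodedata


def split_diacritics(text: str) -> list[tuple[str, str]]:
    """Pair each base character with the combining marks that follow it.

    Hypothetical helper (not from the paper): the undiacritized letters
    would serve as model input and the marks as per-character labels.
    """
    pairs: list[tuple[str, str]] = []
    # Assumption: NFD puts combining marks into canonical order.
    for ch in unicodedata.normalize("NFD", text):
        # Hebrew points have a non-zero canonical combining class.
        if unicodedata.combining(ch) and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


# "shalom" written with diacritics, given as escapes to avoid encoding issues.
word = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
pairs = split_diacritics(word)
plain = "".join(base for base, _ in pairs)   # model input: letters only
labels = [marks for _, marks in pairs]       # model targets: marks per letter
```

Stripping the marks gives the input sequence and keeping them gives the label sequence, so no lexicon or morphological analyzer is involved in building the training data.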

07/11/2016

Recurrent Memory Array Structures

The following report introduces ideas augmenting standard Long Short Ter...
10/09/2020

Learning to Pronounce Chinese Without a Pronunciation Dictionary

We demonstrate a program that learns to pronounce Chinese text in Mandar...
06/28/2018

Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

Due to the fact that Korean is a highly agglutinative, character-rich la...
01/11/2020

Authorship Attribution in Bangla literature using Character-level CNN

Characters are the smallest unit of text that can extract stylometric si...
07/03/2017

Multiscale sequence modeling with a learned dictionary

We propose a generalization of neural network sequence models. Instead o...
06/02/2021

BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

Adversarial attacks expose important blind spots of deep learning system...