LSTM Network for Inflected Abbreviation Expansion

08/20/2017
by Piotr Żelasko, et al.

This paper addresses the recovery of morphological information lost in abbreviated forms, with a focus on highly inflected languages. Evidence is presented that the correct inflected form of an expanded abbreviation can in many cases be deduced solely from the morphosyntactic tags of its context. The prediction model is a deep bidirectional LSTM network with tag embedding. The network is trained on over 10 million words from the Polish Sejm Corpus and achieves 74.2% prediction accuracy on the smaller but more general National Corpus of Polish. Error analysis suggests that performance on this task may improve if some prior knowledge about the abbreviated word is incorporated into the model.
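
To make the task framing concrete, the sketch below shows one way a training example could be built from a tagged sentence: the tag of the abbreviated word is hidden, and the surrounding tags form the input from which that hidden tag is to be predicted. This is only an illustration under stated assumptions, not the authors' pipeline. The `make_example` and `build_vocab` helpers, the `<abbr>` and `<pad>` symbols, the fixed context window, and the example tags are all hypothetical; the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

MASK = "<abbr>"  # hypothetical placeholder for the abbreviated word's hidden tag
PAD = "<pad>"    # hypothetical padding symbol for positions outside the sentence


def make_example(tags: List[str], abbr_index: int,
                 window: int = 3) -> Tuple[List[str], str]:
    """Return (context tags with the target masked, target tag).

    The window size is an assumption made for this sketch.
    """
    target = tags[abbr_index]
    masked = tags[:abbr_index] + [MASK] + tags[abbr_index + 1:]
    padded = [PAD] * window + masked + [PAD] * window
    center = abbr_index + window
    context = padded[center - window:center + window + 1]
    return context, target


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct tag to an integer id, e.g. for an embedding lookup."""
    vocab: Dict[str, int] = {PAD: 0, MASK: 1}
    for seq in sequences:
        for tag in seq:
            vocab.setdefault(tag, len(vocab))
    return vocab


# Illustrative tag sequence; the tag strings are examples only.
sentence_tags = ["prep:loc", "adj:sg:loc:m3:pos", "subst:sg:loc:m3", "interp"]
context, target = make_example(sentence_tags, abbr_index=2, window=2)
vocab = build_vocab([sentence_tags])
context_ids = [vocab[tag] for tag in context]
```

In a setup like this, `context_ids` would be the integer sequence passed to a tag embedding layer followed by the bidirectional LSTM, and `target` would be the class the network is trained to predict.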


