Stem-driven Language Models for Morphologically Rich Languages

10/25/2019
by Yash Shah, et al.

Neural language models (LMs) have been shown to benefit significantly from enhancing word vectors with subword-level information, especially for morphologically rich languages. This has mainly been done by providing subword-level information as input; using subword units in the output layer has been far less explored. In this work, we propose LMs that are cognizant of the underlying stem of each word. We derive stems for words using a simple unsupervised technique for stem identification. We experiment with different architectures involving multi-task learning and mixture models over words and stems. We focus on four morphologically complex languages (Hindi, Tamil, Kannada and Finnish) and observe significant perplexity gains from using our stem-driven LMs compared with other competitive baseline models.
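The abstract does not spell out how the mixture over words and stems is formed, so the following is only an illustrative sketch under assumptions, not the authors' formulation. It assumes a word can be predicted either directly or by first predicting its stem and then the word given that stem, with a hypothetical interpolation weight `lam`. All names, the stem mapping, and the toy probabilities (Finnish "talo", "talossa", "kirja") are made up for illustration.

```python
import math

# Hypothetical word -> stem mapping; the paper derives stems with an
# unsupervised technique that is not described in the abstract.
STEM_OF = {"talo": "talo", "talossa": "talo", "kirja": "kirja"}

# Toy next-token distributions for one fixed context (illustrative values only).
P_WORD = {"talo": 0.5, "talossa": 0.1, "kirja": 0.4}   # word-level prediction
P_STEM = {"talo": 0.7, "kirja": 0.3}                   # stem-level prediction
P_WORD_GIVEN_STEM = {                                  # word given its stem
    ("talo", "talo"): 0.8,
    ("talossa", "talo"): 0.2,
    ("kirja", "kirja"): 1.0,
}


def mixture_word_prob(word, lam):
    """Assumed mixture: lam * P_word(w) + (1 - lam) * P_stem(s) * P(w | s)."""
    stem = STEM_OF[word]
    direct = P_WORD.get(word, 0.0)
    via_stem = P_STEM.get(stem, 0.0) * P_WORD_GIVEN_STEM.get((word, stem), 0.0)
    return lam * direct + (1.0 - lam) * via_stem


def perplexity(probs):
    """Perplexity of a sequence from the probability assigned to each token."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


if __name__ == "__main__":
    mixed = {w: mixture_word_prob(w, lam=0.5) for w in STEM_OF}
    print(mixed)
    print(perplexity([mixed["talo"], mixed["kirja"]]))
```

Because both components are proper distributions over the same vocabulary, the interpolated probabilities still sum to one for any `lam` between 0 and 1, so perplexities computed from them remain comparable with those of a word-only model.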


