A Character-Word Compositional Neural Language Model for Finnish

12/10/2016
by Matti Lankinen, et al.

Inspired by recent research, we explore ways to model the morphologically rich Finnish language at the character level while maintaining the performance of word-level models. We propose a new Character-to-Word-to-Character (C2W2C) compositional language model that uses characters as input and output while still processing word-level embeddings internally. Our preliminary experiments on the Finnish Europarl V7 corpus indicate that C2W2C can respond well to the challenges of morphologically rich languages, such as high out-of-vocabulary rates, the prediction of novel words, and growing vocabulary size. Notably, the model is able to correctly score inflectional forms that are not present in the training data and to sample grammatically and semantically correct Finnish sentences character by character.
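To make the out-of-vocabulary problem concrete, the following minimal sketch compares OOV rates at the word level and at the character level. The toy Finnish word lists and the helper name `oov_rate` are assumptions made up for illustration; they are not taken from the Europarl corpus or from the C2W2C model itself.

```python
def oov_rate(train_units, test_units):
    """Return the fraction of test units that never occur in the training units."""
    if len(test_units) == 0:
        raise ValueError("test_units must not be empty")
    vocab = set(train_units)
    unseen = sum(1 for unit in test_units if unit not in vocab)
    return unseen / len(test_units)


# Hypothetical toy data: "taloissa" (in the houses) is an inflected form of
# "talo" (house) that does not appear among the training words.
train_words = "talo on suuri asun talossa".split()
test_words = "asun taloissa".split()

# Word level: every distinct surface form is its own vocabulary entry.
word_oov = oov_rate(train_words, test_words)

# Character level: the units are the individual characters of the same words.
char_oov = oov_rate("".join(train_words), "".join(test_words))

print(f"word-level OOV rate: {word_oov:.2f}")
print(f"character-level OOV rate: {char_oov:.2f}")
```

In this toy case the unseen inflection counts as out of vocabulary for a word-level model, while every one of its characters was already observed in training. This is the kind of gap that character-level input and output are meant to close.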

research
04/10/2017

Character-Word LSTM Language Models

We present a Character-Word Long Short-Term Memory Language Model which ...
research
08/09/2015

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

We introduce a model for constructing vector representations of words by...
research
09/24/2020

Grounded Compositional Outputs for Adaptive Language Modeling

Language models have emerged as a central component across NLP, and a gr...
research
08/18/2017

Syllable-level Neural Language Model for Agglutinative Language

Language models for agglutinative languages have always been hindered in...
research
12/03/2018

Comparing Neural- and N-Gram-Based Language Models for Word Segmentation

Word segmentation is the task of inserting or deleting word boundary cha...
research
03/12/2019

Character Eyes: Seeing Language through Character-Level Taggers

Character-level models have been used extensively in recent years in NLP...
research
07/20/2017

A Sub-Character Architecture for Korean Language Processing

We introduce a novel sub-character architecture that exploits a unique c...
