Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit

09/05/2018
by Amrith Krishna, et al.

The configurational information in sentences of a free word order language such as Sanskrit is of limited use. Thus, the context of the entire sentence is desirable even for basic processing tasks such as word segmentation. We propose a structured prediction framework that jointly solves the word segmentation and morphological tagging tasks in Sanskrit. We build an energy based model where we adopt approaches generally employed in graph based parsing techniques (McDonald et al., 2005a; Carreras, 2007). Our model outperforms the state of the art with an F-Score of 96.92 (a percentage improvement of 7.06) while using less than one-tenth of the task-specific training data. We find that the use of a graph based approach instead of a traditional lattice-based sequential labelling approach leads to a percentage gain of 12.6 for the segmentation task.

research
12/30/2018

A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing

We propose the first joint model for Vietnamese word segmentation, part-...
research
01/30/2022

Word Segmentation and Morphological Parsing for Sanskrit

We describe our participation in the Word Segmentation and Morphological...
research
11/14/2017

From Word Segmentation to POS Tagging for Vietnamese

This paper presents an empirical comparison of two strategies for Vietna...
research
04/24/2017

A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

In this paper, we introduce a trie-structured Bayesian model for unsuper...
research
03/31/2021

Joint Khmer Word Segmentation and Part-of-Speech Tagging Using Deep Learning

Khmer text is written from left to right with optional space. Space is n...
research
12/16/2015

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

Morpho-syntactic lexicons provide information about the morphological an...
research
02/17/2018

Building a Word Segmenter for Sanskrit Overnight

There is an abundance of digitised texts available in Sanskrit. However,...
