Sanskrit Segmentation Revisited

05/13/2020
by Sriram Krishnan, et al.

Computational analysis of Sanskrit texts requires accurate segmentation in the initial stages. Various tools have been developed for Sanskrit text segmentation. Of these, Gérard Huet's Reader in the Sanskrit Heritage Engine analyzes the input text and segments it based on word parameters: phases such as iic, ifc, Pr, and Subst, and the sandhi (or transition) that takes place between the end of a word and the initial part of the next word. It lists all possible solutions, differentiating them with the help of the phases. The phases and their analyses are useful in the domain of sentential parsers. In segmentation, though, they are not used beyond deciding whether the words formed with the phases are morphologically valid. This paper modifies the above segmenter by ignoring the phase details (except in a few cases), and proposes a probability function to prioritize the list of solutions so that the most valid solutions appear at the top.
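To make the idea of prioritizing candidate segmentations concrete, here is a minimal sketch in Python. It is not the probability function proposed in the paper, which the abstract does not specify. It assumes a simple add-one-smoothed unigram model, and the function names (`score`, `rank_solutions`), the transliterated toy candidates, and the word counts are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def score(segmentation, counts, total, vocab_size):
    """Log-probability of a segmentation under an add-one-smoothed
    unigram model (an assumed stand-in, not the paper's function)."""
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in segmentation
    )


def rank_solutions(solutions, counts):
    """Return the candidate segmentations sorted from most to least probable."""
    total = sum(counts.values())
    vocab_size = len(counts)
    return sorted(
        solutions,
        key=lambda seg: score(seg, counts, total, vocab_size),
        reverse=True,
    )


# Hypothetical corpus counts; a real system would estimate these from data.
toy_counts = Counter({"rAma": 50, "AlayaH": 10, "rAmA": 2, "layaH": 3})

# Two candidate splits of the same input string, as a segmenter might list them.
candidates = [["rAmA", "layaH"], ["rAma", "AlayaH"]]

ranked = rank_solutions(candidates, toy_counts)
print(ranked[0])
```

With these made-up counts, the split built from the more frequent words is ranked first. A plain unigram product also tends to favor solutions with fewer words, so a practical scoring function would likely need additional evidence, such as transition (sandhi) statistics.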


