Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus

We introduce the Contemporary Amharic Corpus, which is automatically tagged with morpho-syntactic information. The texts were collected from 25,199 documents across different domains and tokenized into about 24 million orthographic words. Because it is partly a web corpus, we applied some automatic spelling correction. We also modified the existing morphological analyzer, HornMorpho, so that it could be used for the automatic tagging.
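To make the notion of an orthographic word concrete, here is a minimal Python sketch of how such tokens might be counted. It is not the authors' tokenizer: the separator set (whitespace, the Ethiopic punctuation range U+1361 to U+1368, and a few ASCII marks) and the function names `tokenize` and `count_words` are assumptions made for illustration only.

```python
import re

# Assumed separator set: whitespace, Ethiopic punctuation U+1361..U+1368
# (wordspace, full stop, comma, semicolon, colon, preface colon,
# question mark, paragraph separator) and a few ASCII marks.
# The corpus' actual tokenization rules may differ.
_SEPARATORS = re.compile("[\\s\u1361-\u1368.,;:!?()]+")


def tokenize(text):
    """Split text into orthographic words, dropping empty strings."""
    return [tok for tok in _SEPARATORS.split(text) if tok]


def count_words(documents):
    """Total number of orthographic words over an iterable of documents."""
    return sum(len(tokenize(doc)) for doc in documents)
```

For example, `tokenize("ሰላም፡ለዓለም።")` splits on the Ethiopic wordspace and drops the final full stop, so it yields two tokens.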

