Method of noun phrase detection in Ukrainian texts

10/22/2020
by S. D. Pogorilyy, et al.

Introduction. Natural language processing deals with AI-complete tasks that cannot be solved with traditional algorithmic approaches. Such tasks are commonly addressed with machine learning methods and the tools of computational linguistics. One text preprocessing task is the detection of noun phrases. Its accuracy affects the effectiveness of many other natural language processing tasks. Despite active research in natural language processing, work on noun phrase detection in Ukrainian texts is still at an early stage.

Results. Different methods of noun phrase detection have been analyzed. The expediency of representing sentences as tree structures has been justified. The key disadvantage of many noun phrase detection methods is that their effectiveness depends heavily on the features of a particular language. Given its unified format of sentence processing and the availability of a trained model for building sentence trees for Ukrainian texts, the Universal Dependencies model has been chosen. A combined method of noun phrase detection in Ukrainian texts, which uses Universal Dependencies tools and a named-entity recognition model, has been proposed. The effectiveness of the proposed method has been verified experimentally on a corpus of Ukrainian news. Different accuracy metrics of the method have been calculated.

Conclusions. The results indicate that the proposed method can be used to find noun phrases in Ukrainian texts. The accuracy of the method can be increased by using named-entity recognition models suited to the subject area.
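The abstract does not spell out the extraction rules, so the following is only a minimal sketch of how noun phrases might be read off a Universal Dependencies tree. It assumes the sentence has already been parsed into tokens with UPOS tags, head indices, and dependency relations. The `Token` class, the `noun_phrases` function, the choice of modifier relations, and the hand-annotated example sentence are all hypothetical and are not taken from the paper. The named-entity recognition step of the combined method is not shown.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    upos: str     # Universal POS tag
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # Universal Dependencies relation


# Assumption: which tags can head a noun phrase and which relations
# attach modifiers to it. A real system would likely tune both sets.
NP_HEAD_TAGS = {"NOUN", "PROPN", "PRON"}
NP_MODIFIER_RELS = {"amod", "det", "nummod", "nmod", "flat"}


def base_rel(deprel: str) -> str:
    """Strip a language-specific subtype, e.g. 'flat:name' -> 'flat'."""
    return deprel.split(":")[0]


def noun_phrases(tokens: list[Token]) -> list[str]:
    """Collect each nominal head together with its modifier subtree."""
    by_idx = {t.idx: t for t in tokens}
    phrases = []
    for tok in tokens:
        if tok.upos not in NP_HEAD_TAGS:
            continue
        parent = by_idx.get(tok.head)
        # Skip nominals that only modify another nominal: they are
        # already covered by the phrase of their head.
        if (parent is not None and parent.upos in NP_HEAD_TAGS
                and base_rel(tok.deprel) in NP_MODIFIER_RELS):
            continue
        # Walk down the tree, following modifier relations only.
        members = {tok.idx}
        stack = [tok.idx]
        while stack:
            current = stack.pop()
            for child in tokens:
                if (child.head == current
                        and base_rel(child.deprel) in NP_MODIFIER_RELS):
                    members.add(child.idx)
                    stack.append(child.idx)
        phrases.append(" ".join(by_idx[i].form for i in sorted(members)))
    return phrases


# Hand-annotated example: "Новий президент України відвідав Київ."
sentence = [
    Token(1, "Новий", "ADJ", 2, "amod"),
    Token(2, "президент", "NOUN", 4, "nsubj"),
    Token(3, "України", "PROPN", 2, "nmod"),
    Token(4, "відвідав", "VERB", 0, "root"),
    Token(5, "Київ", "PROPN", 4, "obj"),
    Token(6, ".", "PUNCT", 4, "punct"),
]

print(noun_phrases(sentence))
```

Because the rules operate on dependency relations rather than on language-specific word order, the same sketch could in principle be applied to any language with a Universal Dependencies parser. It ignores prepositions (`case`) and coordination, so phrases with prepositional modifiers would come out incomplete.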
