Context based lemmatizer for Polish language

07/23/2022
by Michał Karwatowski, et al.

Lemmatization is the process of grouping together the inflected forms of a word so that they can be analysed as a single item, identified by the word's lemma, or dictionary form. In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence. As a result, developing an efficient lemmatization algorithm is a complex task. In recent years, deep learning models applied to this task have outperformed other methods, including traditional machine learning algorithms. This paper presents a Polish lemmatizer based on the Google T5 model. Training was run with different context lengths. The model achieves the best results for lemmatization of the Polish language.
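To make the idea of training with different context lengths concrete, here is a minimal sketch of how one token and a window of surrounding words could be packed into a single text-to-text input string. The function name `build_input`, the `lemmatize: ... context: ...` layout, and the symmetric word window are all assumptions made for illustration; the abstract does not specify the input format or how context length is measured in the paper.

```python
from typing import List


def build_input(tokens: List[str], index: int, context_len: int) -> str:
    """Format one token plus its surrounding context as a text-to-text input.

    The "lemmatize: ... context: ..." layout is hypothetical and chosen only
    for illustration; it is not taken from the paper.
    """
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")
    if context_len < 0:
        raise ValueError("context_len must be non-negative")
    # Take up to context_len tokens on each side, clipped to the sentence.
    start = max(0, index - context_len)
    end = min(len(tokens), index + context_len + 1)
    window = " ".join(tokens[start:end])
    return f"lemmatize: {tokens[index]} context: {window}"


sentence = "Ala ma kota i dwa psy".split()
for n in (0, 1, 2):
    # Show how the same target word looks with growing context.
    print(build_input(sentence, 2, n))
```

With a context length of 0 the input contains only the inflected form itself, while larger values expose more of the sentence that a sequence-to-sequence model could use to resolve ambiguous forms.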


