Evaluation of Sentence Representations in Polish

10/25/2019
by   Sławomir Dadas, et al.
0

Methods for learning sentence representations have been actively developed in recent years. However, the lack of pre-trained models and datasets annotated at the sentence level has been a problem for low-resource languages such as Polish which led to less interest in applying these methods to language-specific tasks. In this study, we introduce two new Polish datasets for evaluating sentence embeddings and provide a comprehensive evaluation of eight sentence representation methods including Polish and multilingual models. We consider classic word embedding models, recently developed contextual embeddings and multilingual sentence encoders, showing strengths and weaknesses of specific approaches. We also examine different methods of aggregating word vectors into a single sentence vector.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/25/2019

Exploring Multilingual Syntactic Sentence Representations

We study methods for learning sentence embeddings with syntactic structu...
research
08/15/2016

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

There is a lot of research interest in encoding variable length sentence...
research
06/16/2020

How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

Sentence encoders map sentences to real valued vectors for use in downst...
research
01/29/2019

No Training Required: Exploring Random Encoders for Sentence Classification

We explore various methods for computing sentence representations from p...
research
06/04/2019

Pitfalls in the Evaluation of Sentence Embeddings

Deep learning models continuously break new records across different NLP...
research
02/13/2018

Sentence Boundary Detection for French with Subword-Level Information Vectors and Convolutional Neural Networks

In this work we tackle the problem of sentence boundary detection applie...
research
02/16/2021

Revisiting Language Encoding in Learning Multilingual Representations

Transformer has demonstrated its great power to learn contextual word re...

Please sign up or login with your details

Forgot password? Click here to reset