Log In Sign Up

Exploring Multilingual Syntactic Sentence Representations

by   Chen Liu, et al.

We study methods for learning sentence embeddings with syntactic structure. We focus on methods of learning syntactic sentence-embeddings by using a multilingual parallel-corpus augmented by Universal Parts-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the ability of the method to learn syntactic sentence-embeddings for low-resource languages and demonstrate strong evidence for transfer learning. Our results show that syntactic sentence-embeddings can be learned while using less training data, fewer model parameters, and resulting in better evaluation metrics than state-of-the-art language models.


page 1

page 2

page 3

page 4


Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Existing models of multilingual sentence embeddings require large parall...

Evaluation of Sentence Representations in Polish

Methods for learning sentence representations have been actively develop...

drsphelps at SemEval-2022 Task 2: Learning idiom representations using BERTRAM

This paper describes our system for SemEval-2022 Task 2 Multilingual Idi...

A Stronger Baseline for Multilingual Word Embeddings

Levy, Søgaard and Goldberg's (2017) S-ID (sentence ID) method applies wo...

Hierarchical Document Encoder for Parallel Corpus Mining

We explore using multilingual document embeddings for nearest neighbor m...

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

We present an easy and efficient method to extend existing sentence embe...

Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax

In this paper, we present an approach to learn multilingual sentence emb...