Paraphrastic Representations at Scale

04/30/2021
by John Wieting et al.

We present a system that allows users to train their own state-of-the-art paraphrastic sentence representations in a variety of languages. We also release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese. We train these models on large amounts of data, achieving significantly improved performance over the original papers that proposed the methods on a suite of monolingual semantic similarity, cross-lingual semantic similarity, and bitext mining tasks. Moreover, the resulting models surpass all prior work on unsupervised semantic textual similarity, significantly outperforming even BERT-based models like Sentence-BERT (Reimers and Gurevych, 2019). Additionally, our models are orders of magnitude faster than prior work and can be used on CPU with little difference in inference speed (and even improved speed over GPU when more CPU cores are available), making these models an attractive choice for users without access to GPUs or for use on embedded devices. Finally, we significantly extend the code bases for training paraphrastic sentence models, easing their use both for inference and for training on any desired language with parallel data. We also include code to automatically download and preprocess training data.
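The abstract summarizes the approach at a high level; for intuition, the sketch below shows the general recipe this line of work builds on: a sentence is embedded by averaging learned subword embeddings, and the encoder is trained on parallel (paraphrase or bitext) pairs with a margin-based ranking loss over in-batch negatives. This is a minimal, hypothetical sketch in PyTorch, not the released code: the names `AvgEmbedder` and `margin_loss`, the 300-dimensional embeddings, the 0.4 margin, and the in-batch hard-negative selection are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AvgEmbedder(nn.Module):
    """Toy sentence encoder: average the embeddings of a sentence's subword ids.

    Hypothetical class for illustration; the released models use their own
    (sentencepiece-based) vocabularies and trained embedding tables.
    """

    def __init__(self, vocab_size: int, dim: int = 300):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)

    def forward(self, ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # ids, mask: (batch, seq_len); mask is 1.0 for real tokens, 0.0 for padding.
        vecs = self.emb(ids) * mask.unsqueeze(-1)
        return vecs.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1.0)


def margin_loss(src_vecs: torch.Tensor, tgt_vecs: torch.Tensor,
                margin: float = 0.4) -> torch.Tensor:
    """Margin-based ranking loss with in-batch negatives (illustrative).

    Each source sentence should be more similar (by cosine) to its aligned
    paraphrase/translation than to the hardest other target in the batch.
    """
    src = F.normalize(src_vecs, dim=-1)
    tgt = F.normalize(tgt_vecs, dim=-1)
    sims = src @ tgt.t()                       # (batch, batch) cosine similarities
    pos = sims.diag()                          # similarity of aligned pairs
    # Mask out the positives, then take the hardest in-batch negative per source.
    neg = (sims - 1e9 * torch.eye(sims.size(0))).max(dim=1).values
    return F.relu(margin - pos + neg).mean()


if __name__ == "__main__":
    # Tiny smoke test with random ids standing in for tokenized parallel pairs.
    torch.manual_seed(0)
    encoder = AvgEmbedder(vocab_size=1000, dim=300)
    src_ids = torch.randint(1, 1000, (8, 12))
    tgt_ids = torch.randint(1, 1000, (8, 12))
    mask = torch.ones(8, 12)
    loss = margin_loss(encoder(src_ids, mask), encoder(tgt_ids, mask))
    loss.backward()
    print(float(loss))
```

The sketch also makes the speed claim concrete: producing a sentence embedding is just an embedding lookup followed by an average, which is why inference runs comfortably on CPU. The actual released code bases additionally handle data download, preprocessing, and multilingual training, which are out of scope here.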

