Contextual Lensing of Universal Sentence Representations

02/20/2020
by   Jamie Kiros, et al.
0

What makes a universal sentence encoder universal? The notion of a generic encoder of text appears to be at odds with the inherent contextualization and non-permanence of language use in a dynamic world. However, mapping sentences into generic fixed-length vectors for downstream similarity and retrieval tasks has been fruitful, particularly for multilingual applications. How do we manage this dilemma? In this work we propose Contextual Lensing, a methodology for inducing context-oriented universal sentence vectors. We break the construction of universal sentence vectors into a core, variable length, sentence matrix representation equipped with an adaptable `lens' from which fixed-length vectors can be induced as a function of the lens context. We show that it is possible to focus notions of language similarity into a small number of lens parameters given a core universal matrix representation. For example, we demonstrate the ability to encode translation similarity of sentences across several languages into a single weight matrix, even when the core encoder has not seen parallel data.

READ FULL TEXT
research
02/03/2023

Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

Recently multi-lingual pre-trained language models (PLM) such as mBERT a...
research
08/15/2016

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

There is a lot of research interest in encoding variable length sentence...
research
06/22/2015

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distribu...
research
04/18/2017

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

End-to-end neural machine translation has overtaken statistical machine ...
research
04/22/2021

Universal Horn Sentences and the Joint Embedding Property

The finite models of a universal sentence Φ are the age of a structure i...
research
01/03/2018

Sentence Object Notation: Multilingual sentence notation based on Wordnet

The representation of sentences is a very important task. It can be used...
research
06/13/2019

A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics

Distributed representations of text can be used as features when trainin...

Please sign up or login with your details

Forgot password? Click here to reset