Skip-Thought Vectors

06/22/2015
by Ryan Kiros, et al.

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice. We will make our encoder publicly available.
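The abstract describes an encoder-decoder objective: a sentence is encoded into a single vector, and two decoders are trained to reconstruct the previous and the next sentence from that vector. Below is a minimal sketch of that idea in PyTorch; the module name, the teacher-forced decoding, and the hyperparameters (620-dimensional word embeddings, 2400-dimensional GRU states, matching the uni-skip configuration reported in the paper) are illustrative assumptions, not the authors' released code.

```python
# Minimal skip-thought-style sketch (illustrative only; the original model is
# trained on the BookCorpus with GRU encoder/decoders). Names and sizes here
# are assumptions for the sake of the example.
import torch
import torch.nn as nn

class SkipThoughtSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=620, hidden_dim=2400):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Two decoders: one reconstructs the previous sentence, one the next.
        self.dec_prev = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.dec_next = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def encode(self, sent):
        # sent: (batch, seq_len) token ids -> (batch, hidden_dim) sentence vector
        _, h = self.encoder(self.embed(sent))
        return h.squeeze(0)

    def decode(self, decoder, thought, target):
        # Condition the decoder on the encoded "thought" vector and score each
        # token of the target (previous or next) sentence with teacher forcing.
        h0 = thought.unsqueeze(0)
        states, _ = decoder(self.embed(target[:, :-1]), h0)
        logits = self.out(states)
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target[:, 1:].reshape(-1))

    def forward(self, prev_sent, sent, next_sent):
        thought = self.encode(sent)
        # Training objective: reconstruct both neighbouring sentences.
        return (self.decode(self.dec_prev, thought, prev_sent)
                + self.decode(self.dec_next, thought, next_sent))
```

The vocabulary expansion step mentioned in the abstract amounts to fitting a linear map from a large pre-trained word embedding space (such as word2vec) into the encoder's word embedding space, using the words the two vocabularies share, so that words never seen during training can still be fed to the encoder. A least-squares sketch with made-up shapes and variable names:

```python
# Vocabulary expansion sketch (shapes and data are placeholders, not real vectors).
import numpy as np

rng = np.random.default_rng(0)
V_shared, d_w2v, d_rnn = 20000, 300, 620        # shared vocab, word2vec dim, encoder dim
X_w2v = rng.standard_normal((V_shared, d_w2v))  # word2vec vectors for shared words
X_rnn = rng.standard_normal((V_shared, d_rnn))  # encoder embeddings for the same words

W, *_ = np.linalg.lstsq(X_w2v, X_rnn, rcond=None)  # least-squares linear map
unseen_w2v = rng.standard_normal((1, d_w2v))       # a word outside the training vocabulary
unseen_rnn = unseen_w2v @ W                        # its induced encoder embedding
```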



Related research:

- Trimming and Improving Skip-thought Vectors (06/09/2017): The skip-thought model has been proven to be effective at learning sente...
- Learning Generic Sentence Representations Using Convolutional Neural Networks (11/23/2016): We propose a new encoder-decoder approach to learn distributed sentence ...
- Decoding Sentiment from Distributed Representations of Sentences (05/17/2017): Distributed representations of sentences have been developed recently to...
- Contextual Lensing of Universal Sentence Representations (02/20/2020): What makes a universal sentence encoder universal? The notion of a gener...
- Network-Efficient Distributed Word2vec Training System for Large Vocabularies (06/27/2016): Word2vec is a popular family of algorithms for unsupervised training of ...
- Rethinking Skip-thought: A Neighborhood based Approach (06/09/2017): We study the skip-thought model with neighborhood information as weak su...
- Representing Sentences as Low-Rank Subspaces (04/18/2017): Sentences are important semantic units of natural language. A generic, d...
