Semi-supervised Sequence Learning

11/04/2015
by Andrew M. Dai, et al.

We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and then predicts the input sequence again. These two algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm; in other words, the parameters obtained from the unsupervised step can serve as a starting point for other supervised training models. In our experiments, we find that long short-term memory (LSTM) recurrent networks pretrained with the two approaches are more stable and generalize better. With pretraining, we are able to train LSTMs on sequences of up to a few hundred timesteps, thereby achieving strong performance on many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.
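The sketch below illustrates the sequence-autoencoder pretraining idea from the abstract: an LSTM encoder reads the input sequence into its final state, a decoder LSTM is trained to reconstruct the same sequence, and the pretrained embedding and encoder weights then initialize a supervised LSTM classifier. This is a minimal illustration assuming PyTorch; the module names (SeqAutoencoder, LSTMClassifier) and hyperparameters are hypothetical, not the authors' implementation.

import torch
import torch.nn as nn


class SeqAutoencoder(nn.Module):
    # Reads the input sequence into a fixed-size LSTM state and is trained
    # to predict the input sequence again (the paper's second approach).
    # For the language-model variant (the first approach), one would instead
    # train a single LSTM to predict the next token of unlabeled text.
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        emb = self.embed(tokens)                  # (batch, seq, embed_dim)
        _, state = self.encoder(emb)              # encode the sequence into (h, c)
        # Teacher forcing: the decoder sees the sequence shifted right by one
        # step (a zero vector stands in for a start-of-sequence token).
        dec_in = torch.cat([torch.zeros_like(emb[:, :1]), emb[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, state)
        return self.out(dec_out)                  # logits for reconstructing `tokens`


class LSTMClassifier(nn.Module):
    # Supervised model whose embedding and LSTM are warm-started from the
    # pretrained autoencoder instead of random initialization.
    def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.cls = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        emb = self.embed(tokens)
        _, (h, _) = self.lstm(emb)
        return self.cls(h[-1])                    # classify from the final hidden state


# Pretrain the autoencoder on unlabeled text with a cross-entropy
# reconstruction loss, then copy its weights into the classifier.
vocab_size, num_classes = 10_000, 2
ae = SeqAutoencoder(vocab_size)
clf = LSTMClassifier(vocab_size, num_classes)
clf.embed.load_state_dict(ae.embed.state_dict())
clf.lstm.load_state_dict(ae.encoder.state_dict())

The classifier is then fine-tuned on the labeled task (e.g. IMDB sentiment) starting from these pretrained parameters rather than from scratch.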

