Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks
A popular strategy to train recurrent neural networks (RNNs), known as "teacher forcing", takes the ground truth as input at each time step and makes the later predictions partly conditioned on those inputs. Such a training strategy impairs the model's ability to learn rich distributions over entire sequences, because the chosen inputs hinder the gradients from backpropagating to all previous states in an end-to-end manner. We propose a fully differentiable training algorithm for RNNs to better capture long-term dependencies by recovering the probability of the whole sequence. The key idea is that at each time step, the network takes as input a "bundle" of similar words predicted at the previous step instead of a single ground-truth word. The representations of these similar words form a convex hull, which can be viewed as a kind of regularization of the input. Smoothing the inputs in this way makes the whole process trainable and differentiable. This design makes it possible for the model to explore more feasible combinations (possibly unseen sequences), and can be interpreted as a computationally efficient approximation to beam search. Experiments on multiple sequence generation tasks yield performance improvements, especially on sequence-level metrics such as BLEU and ROUGE-2.
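The "bundle" idea described above can be sketched in a few lines: instead of feeding the embedding of a single word, the next-step input is a convex combination of the embeddings of the top-k words predicted at the previous step, with weights given by a renormalized softmax. The function below is a minimal NumPy illustration under assumed shapes; the name `bundle_input` and the choice of top-k softmax weighting are hypothetical, not taken from the paper.

```python
import numpy as np

def bundle_input(logits, embeddings, k=5):
    """Form a convex combination of the top-k predicted words' embeddings.

    logits:     (V,) unnormalized scores over the vocabulary at one time step
    embeddings: (V, d) word-embedding matrix
    Returns a (d,) vector inside the convex hull of the top-k embeddings,
    fed as the next-step input instead of a single ground-truth embedding.
    """
    topk = np.argsort(logits)[-k:]                 # indices of the k most likely words
    w = np.exp(logits[topk] - logits[topk].max())  # softmax restricted to the top-k scores
    w /= w.sum()                                   # weights are nonnegative and sum to 1
    return w @ embeddings[topk]                    # point in the convex hull of the k embeddings
```

Because the output is a smooth function of the logits, gradients flow through the predicted distribution at every step, which is what makes the whole unrolled sequence differentiable end to end.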