Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes

09/18/2019
by Noémien Kocher, et al.

In sequence modeling tasks, token order matters, but this information can be partially lost when the sequence is discretized into data points. In this paper, we study the imbalance whereby certain pairs of tokens are included together in data points while others are not. We call this token order imbalance (TOI), and we link the partial loss of sequence information to diminished performance of the system as a whole, in both text and speech processing tasks. We then provide a mechanism, Alleviated TOI, that leverages the full token order information by iteratively overlapping the token composition of data points. For recurrent networks, we use prime numbers for the batch size to avoid redundancies when building batches from overlapped data points. The proposed method achieved state-of-the-art performance in both text- and speech-related tasks.
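To make the idea of token order imbalance concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes data points are fixed-length windows over the token stream and that overlapping means repeating the split with evenly shifted starting offsets; the function names, the `repetitions` parameter, and the offset formula are all hypothetical choices for illustration. It covers only the overlapping step and does not model batch construction or the prime batch size.

```python
def split_into_data_points(tokens, seq_len, offset=0):
    """Cut tokens[offset:] into consecutive, non-overlapping windows of seq_len tokens."""
    usable = tokens[offset:]
    n_points = len(usable) // seq_len  # any incomplete trailing window is dropped
    return [usable[i * seq_len:(i + 1) * seq_len] for i in range(n_points)]


def overlapped_data_points(tokens, seq_len, repetitions):
    """Repeat the split with evenly shifted offsets (assumed overlapping scheme)."""
    points = []
    for r in range(repetitions):
        offset = (r * seq_len) // repetitions  # hypothetical choice of shift
        points.extend(split_into_data_points(tokens, seq_len, offset))
    return points


def covered_adjacent_pairs(points):
    """Collect every adjacent token pair that appears inside some data point."""
    pairs = set()
    for point in points:
        pairs.update(zip(point, point[1:]))
    return pairs


# Toy stream of unique tokens, so each adjacent pair identifies one position.
tokens = list(range(12))
all_pairs = set(zip(tokens, tokens[1:]))

plain = split_into_data_points(tokens, seq_len=4)
overlapped = overlapped_data_points(tokens, seq_len=4, repetitions=2)

# Pairs that straddle a window boundary never appear in the plain split.
missing_plain = all_pairs - covered_adjacent_pairs(plain)
missing_overlapped = all_pairs - covered_adjacent_pairs(overlapped)

print(sorted(missing_plain))       # pairs lost at window boundaries
print(sorted(missing_overlapped))  # pairs still lost after overlapping
```

In this toy setting, the plain split never shows the model the pairs that straddle a window boundary, while the shifted second pass brings them inside a data point.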

