Infomaxformer: Maximum Entropy Transformer for Long Time-Series Forecasting Problem

01/04/2023
by   Peiwang Tang, et al.

The Transformer architecture yields state-of-the-art results in many tasks, such as natural language processing (NLP) and computer vision (CV), thanks to its ability to efficiently capture precise long-range dependencies between input sequences. Despite this capability, however, quadratic time complexity and high memory usage prevent the Transformer from handling the long time-series forecasting problem (LTFP). To address these difficulties: (i) we revisit the learned attention patterns of vanilla self-attention and redesign the self-attention computation based on the Maximum Entropy Principle; (ii) we propose a new method to sparsify self-attention, which avoids losing important attention scores through random sampling; (iii) we propose a Keys/Values Distilling method, motivated by the observation that many features in the original self-attention map are redundant, which further reduces time and space complexity and makes it possible to input longer time series. Finally, we combine the encoder-decoder architecture with seasonal-trend decomposition, i.e., we use the encoder-decoder architecture to capture the more specific seasonal parts. Extensive experiments on several large-scale datasets show that our Infomaxformer clearly outperforms existing methods. We expect this work to open up a new solution for the Transformer on LTFP and to further explore the architecture's ability to capture much longer temporal dependencies.
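The abstract does not spell out the exact attention formulation, but the minimal PyTorch sketch below illustrates the general idea of entropy-based query selection as an alternative to random sampling: rank each query by the entropy of its attention distribution and keep only the most focused ones. The function name entropy_sparse_attention, the parameter u, and the fallback to the mean of V are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch only (not the paper's exact method): keep the u queries
# whose attention distributions have the lowest entropy, i.e. the queries that
# attend most sharply, instead of relying on random key sampling.
import torch
import torch.nn.functional as F


def entropy_sparse_attention(Q, K, V, u):
    """Q, K, V: (batch, length, d_model); u: number of queries to keep."""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d ** 0.5              # (B, L_q, L_k)
    probs = F.softmax(scores, dim=-1)

    # Per-query entropy of the attention distribution; low entropy means the
    # query focuses on a few keys and is treated as more informative.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)   # (B, L_q)
    keep = entropy.topk(u, dim=-1, largest=False).indices          # (B, u)

    # Full attention only for the selected queries; the remaining queries fall
    # back to the mean of V (a common cheap approximation).
    out = V.mean(dim=1, keepdim=True).expand_as(Q).clone()
    selected = probs.gather(1, keep.unsqueeze(-1).expand(-1, -1, probs.size(-1)))
    out.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, V.size(-1)), selected @ V)
    return out
```

Note that this sketch still computes the full score matrix to measure entropy; an efficient variant would estimate the entropy from a deterministic subset of keys before selecting queries.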

Related research

12/14/2020 · Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Many real-world applications require the prediction of long sequence tim...

05/21/2020 · SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
End-to-end speech recognition has become popular in recent years, since ...

10/02/2022 · Grouped self-attention mechanism for a memory-efficient Transformer
Time-series data analysis is important because numerous real-world tasks...

06/29/2019 · Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Time series forecasting is an important problem across many domains, inc...

07/19/2021 · Action Forecasting with Feature-wise Self-Attention
We present a new architecture for human action forecasting from videos. ...

07/06/2022 · Astroconformer: Inferring Surface Gravity of Stars from Stellar Light Curves with Transformer
We introduce Astroconformer, a Transformer-based model to analyze stella...

10/24/2020 · Blind Deinterleaving of Signals in Time Series with Self-attention Based Soft Min-cost Flow Learning
We propose an end-to-end learning approach to address deinterleaving of ...
