Query Twice: Dual Mixture Attention Meta Learning for Video Summarization

by Junyan Wang, et al.

Video summarization aims to select representative frames that retain high-level information, and is usually solved by predicting segment-wise importance scores via a softmax function. However, the softmax function struggles to retain high-rank representations of complex visual or sequential information, a limitation known as the softmax bottleneck problem. In this paper, we propose a novel framework, the Dual Mixture Attention model (DMASum) with Meta Learning, for video summarization that tackles the softmax bottleneck problem. Its Mixture of Attention (MoA) layer increases model capacity by applying self-query attention twice, which captures second-order changes in addition to the initial query-key attention. A novel Single Frame Meta Learning rule is then introduced to improve generalization to small datasets with limited training data. Furthermore, DMASum exploits both visual and sequential attention, connecting local key-frame and global attention in an accumulative way. We adopt the new evaluation protocol on two public datasets, SumMe and TVSum. Both qualitative and quantitative experiments demonstrate significant improvements over state-of-the-art methods.
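The softmax bottleneck mentioned above can be illustrated numerically. The sketch below is not the paper's MoA layer (the abstract does not give its equations); it assumes a generic mixture-of-softmaxes construction, and all names in it (`single_softmax_attention`, `mixture_attention`, the sizes `n`, `d`, `M`) are hypothetical. It compares the rank of the log-attention matrix for one softmax against a mixture of several: with feature dimension `d`, the log of a single softmax attention matrix has rank at most `d + 1`, whereas the log of a mixture is not bound by that limit.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def single_softmax_attention(Q, K):
    # Standard scaled dot-product attention weights, shape (n, n).
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))

def mixture_attention(Qs, Ks, pi):
    # Assumed mixture form: a row-wise convex combination of M softmax
    # attention matrices. Qs, Ks: (M, n, d); pi: (n, M), rows sum to 1.
    comps = np.stack([single_softmax_attention(Q, K) for Q, K in zip(Qs, Ks)])
    return np.einsum("im,mij->ij", pi, comps)

rng = np.random.default_rng(0)
n, d, M = 32, 4, 3  # frames, feature dimension, mixture components (illustrative)

Qs = rng.standard_normal((M, n, d))
Ks = rng.standard_normal((M, n, d))
pi = softmax(rng.standard_normal((n, M)))  # per-frame mixture weights

A_single = single_softmax_attention(Qs[0], Ks[0])
A_mix = mixture_attention(Qs, Ks, pi)

# log softmax(Q K^T) = Q K^T / sqrt(d) minus a per-row constant,
# so its rank is bounded by d + 1 regardless of n.
rank_single = np.linalg.matrix_rank(np.log(A_single), tol=1e-8)
rank_mix = np.linalg.matrix_rank(np.log(A_mix), tol=1e-8)

print("rank of log single-softmax attention:", rank_single)
print("rank of log mixture attention:", rank_mix)
```

Both matrices remain valid attention distributions (each row sums to 1); only the mixture can express a log-attention matrix of higher rank than the feature dimension allows for a single softmax.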







Meta Learning for Task-Driven Video Summarization

Existing video summarization approaches mainly concentrate on sequential...

Sigsoftmax: Reanalysis of the Softmax Bottleneck

Softmax is an output activation function for modeling categorical probab...

Query-adaptive Video Summarization via Quality-aware Relevance Estimation

Although the problem of automatic video summarization has recently recei...

Use of Affective Visual Information for Summarization of Human-Centric Videos

Increasing volume of user-generated human-centric video content and thei...

Exploring Global Diversity and Local Context for Video Summarization

Video summarization aims to automatically generate a diverse and concise...

Meta Learning Deep Visual Words for Fast Video Object Segmentation

Meta learning has attracted a lot of attention recently. In this paper, ...

Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling

The Softmax bottleneck was first identified in language modeling as a th...