TDAF: Top-Down Attention Framework for Vision Tasks

12/14/2020
by Bo Pang, et al.

Human attention mechanisms often work in a top-down manner, yet they are not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention, which can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural ones, along which bottom-up spatial features and top-down attention features are extracted respectively. Because these spatial and attention features are deeply nested, the proposed framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that TDAF captures effective stratified attention information and boosts performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, performance is improved by 2.7%. For action recognition, the 3D-ResNet adopting TDAF achieves an improvement of 1.7%.
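
To make the idea of nesting a top-down attention path with a bottom-up spatial path more concrete, here is a minimal PyTorch sketch of one such stage. The layer choices, the gating via a sigmoid attention map, and the class name TDAFStageSketch are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TDAFStageSketch(nn.Module):
    """Illustrative stage: bottom-up spatial features gated by a
    top-down attention signal from a coarser (higher-level) feature map.
    This is a sketch, not the paper's actual architecture."""

    def __init__(self, channels):
        super().__init__()
        # Bottom-up path: ordinary spatial feature extraction.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Top-down path: per-location attention derived from higher-level features.
        self.attention = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, top_down):
        # Bring the coarser top-down feature to the current resolution,
        # then turn it into an attention map.
        attn = self.attention(
            F.interpolate(top_down, size=x.shape[-2:], mode="nearest")
        )
        # Nest the two paths: spatial features modulated by top-down attention.
        return self.spatial(x) * attn


if __name__ == "__main__":
    stage = TDAFStageSketch(channels=64)
    x = torch.randn(1, 64, 56, 56)       # bottom-up feature at this stage
    higher = torch.randn(1, 64, 28, 28)  # coarser feature from a deeper stage
    print(stage(x, higher).shape)        # torch.Size([1, 64, 56, 56])
```

Stacking several such stages, with each stage feeding its output both forward (bottom-up) and back as the top-down signal for earlier stages, would give the mixed top-down and bottom-up behavior described in the abstract.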
