AutoScaler: Scale-Attention Networks for Visual Correspondence

by Shenlong Wang, et al.

Finding visual correspondence between local features is key to many computer vision problems. While defining features with larger contextual scales usually makes them more discriminative, it can also reduce their spatial accuracy. We propose AutoScaler, a scale-attention network that explicitly optimizes this trade-off in visual correspondence tasks. Our network consists of a weight-sharing feature network that computes multi-scale feature maps and an attention network that combines them optimally in scale space. This allows our network to have adaptive receptive field sizes over different scales of the input. The entire network is trained end-to-end in a Siamese framework for visual correspondence tasks. Our method achieves favorable results compared to state-of-the-art methods on challenging optical flow and semantic matching benchmarks, including Sintel, KITTI and CUB-2011. We also show that our method can generalize to improve hand-crafted descriptors (e.g., Daisy) on general visual correspondence tasks. Finally, our attention network can generate visually interpretable scale attention maps.
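The combination step described in the abstract (one feature network shared across scales, followed by per-pixel attention over those scales) can be illustrated with a small NumPy sketch. This is not the authors' architecture: the 3x3 box filter plus linear map standing in for the feature network, the linear attention head, the nearest-neighbour resizing, and every name here (`shared_features`, `scale_attention_features`, `feat_w`, `attn_w`, the `scales` tuple) are assumptions made purely for illustration.

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = x.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[rows][:, cols]


def box3(x):
    """3x3 mean filter with edge padding, so features see a local window."""
    h, w = x.shape[:2]
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def shared_features(img, feat_w):
    """Hypothetical stand-in for the feature network.

    The same weights are applied at every scale (weight sharing).
    """
    return np.maximum(box3(img) @ feat_w, 0.0)


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scale_attention_features(img, feat_w, attn_w, scales=(1.0, 0.5, 0.25)):
    """Fuse multi-scale features with per-pixel attention over scales."""
    h, w = img.shape[:2]
    feats = []
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        small = resize_nearest(img, sh, sw)
        # A 3x3 window on a coarser image covers more of the original image,
        # so coarser scales correspond to larger receptive fields.
        f = shared_features(small, feat_w)
        feats.append(resize_nearest(f, h, w))
    feats = np.stack(feats, axis=0)                   # (S, H, W, D)

    # Assumed attention head: a per-pixel linear map from the concatenated
    # multi-scale features to one logit per scale.
    stacked = np.concatenate(list(feats), axis=-1)    # (H, W, S*D)
    attn = softmax(stacked @ attn_w, axis=-1)         # (H, W, S)

    # Attention-weighted sum over the scale axis.
    fused = np.einsum("hws,shwd->hwd", attn, feats)   # (H, W, D)
    return fused, attn


# Example usage with random weights (untrained, for shape checking only).
rng = np.random.default_rng(0)
img = rng.random((16, 20, 3))
feat_w = rng.standard_normal((3, 8))
attn_w = rng.standard_normal((3 * 8, 3))
fused, attn = scale_attention_features(img, feat_w, attn_w)
```

In this sketch `attn[y, x]` is a distribution over the three scales at each pixel, which is the kind of quantity a scale attention map would visualize. In a Siamese setup, the same function and weights would be applied to both images before their fused features are compared.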




