Fine-grained Visual Categorization using PAIRS: Pose and Appearance Integration for Recognizing Subcategories

01/27/2018 ∙ by Pei Guo, et al. ∙ 0

In Fine-grained Visual Categorization (FGVC), the differences between similar categories are often highly localized to a small number of object parts, and significant pose variation therefore constitutes a great challenge for identification. To address this, we propose extracting image patches using pairs of predicted keypoint locations as anchor points. The benefits of this approach are two-fold: (1) it achieves explicit top-down visual attention on object parts, and (2) the extracted patches are pose-aligned and thus contain stable appearance features. We employ the popular Stacked Hourglass Network to predict keypoint locations, reporting state-of-the-art keypoint localization results on the challenging CUB-200-2011 dataset. Anchored by these predicted keypoints, an overcomplete basis of pose-aligned patches is extracted and a specialized appearance classification network is trained for each patch. An aggregating network is then applied to combine the patch networks' individual predictions, producing a final classification score. Our PAIRS algorithm attains an accuracy of 88.6 state-of-the-art. Enhancing the base PAIRS model with single-keypoint patches produces a further improvement, yielding a new state-of-the-art accuracy of 89.2 pose and appearance features.



There are no comments yet.


page 1

page 2

page 3

page 4

page 7

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.