Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition
Research on fine-grained recognition has recently shifted from multistage frameworks to convolutional neural networks (CNN) that are trained end-to-end. Many previous end-to-end deep approaches typically consist of a recognition network and an auxiliary localization network trained with additional part annotations to detect semantic parts shared across classes. To avoid the cost of extra semantic part annotations, we learn class-specific discriminative patches within the CNN framework. We achieve this by designing a novel asymmetric two-stream network architecture with supervision on convolutional filters and a non-random way of layer initialization. Experimental results show that our approach is able to find high-quality discriminative patches and achieves state-of-the-art on two publicly available fine-grained recognition datasets.
READ FULL TEXT