MFNet: Multi-class Few-shot Segmentation Network with Pixel-wise Metric Learning
In visual recognition tasks, few-shot learning requires the ability to learn object categories from only a few support examples. Its recent resurgence, driven by advances in deep learning, has mainly focused on image classification. This work addresses few-shot semantic segmentation, which remains largely unexplored; the few recent advances are often restricted to single-class few-shot segmentation. In this paper, we first present a novel multi-way encoding and decoding architecture that effectively fuses multi-scale query information and multi-class support information into a single query-support embedding, from which multi-class segmentation is directly decoded. To enable better feature fusion, a multi-level attention mechanism is proposed within the architecture, comprising attention for support feature modulation and attention for multi-scale combination. Finally, to enhance embedding-space learning, an additional pixel-wise metric learning module is devised, with a triplet loss formulated on the pixel-level embeddings of the input image. Extensive experiments on the standard benchmarks PASCAL-5^i and COCO-20^i show clear benefits of our method over the state of the art in few-shot segmentation.
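As context for the pixel-wise metric learning module, a standard triplet loss over pixel-level embeddings can be sketched as follows; the notation (triplet set \mathcal{T} of anchor/positive/negative pixels, embeddings f_a, f_p, f_n, and margin m) is an illustrative assumption and not necessarily the paper's exact formulation:

\mathcal{L}_{\mathrm{triplet}} = \frac{1}{|\mathcal{T}|} \sum_{(a,p,n)\in\mathcal{T}} \max\big(\lVert f_a - f_p\rVert_2 - \lVert f_a - f_n\rVert_2 + m,\ 0\big)

In such a formulation, positive pixels share the anchor pixel's class label while negative pixels are drawn from other classes; the paper's actual triplet sampling strategy and distance metric may differ.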