Mutual Query Network for Multi-Modal Product Image Segmentation

06/26/2023
by   Yun Guo, et al.
0

Product image segmentation is vital in e-commerce. Most existing methods extract the product image foreground only based on the visual modality, making it difficult to distinguish irrelevant products. As product titles contain abundant appearance information and provide complementary cues for product image segmentation, we propose a mutual query network to segment products based on both visual and linguistic modalities. First, we design a language query vision module to obtain the response of language description in image areas, thus aligning the visual and linguistic representations across modalities. Then, a vision query language module utilizes the correlation between visual and linguistic modalities to filter the product title and effectively suppress the content irrelevant to the vision in the title. To promote the research in this field, we also construct a Multi-Modal Product Segmentation dataset (MMPS), which contains 30,000 images and corresponding titles. The proposed method significantly outperforms the state-of-the-art methods on MMPS.

READ FULL TEXT

page 1

page 3

page 5

page 6

research
04/21/2021

Comprehensive Multi-Modal Interactions for Referring Image Segmentation

We investigate Referring Image Segmentation (RIS), which outputs a segme...
research
12/27/2022

Position-Aware Contrastive Alignment for Referring Image Segmentation

Referring image segmentation aims to segment the target object described...
research
06/16/2021

CMF: Cascaded Multi-model Fusion for Referring Image Segmentation

In this work, we address the task of referring image segmentation (RIS),...
research
03/28/2020

BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions

We present BiLingUNet, a state-of-the-art model for image segmentation u...
research
03/30/2022

Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation

Referring video segmentation aims to segment the corresponding video obj...
research
10/01/2020

Referring Image Segmentation via Cross-Modal Progressive Comprehension

Referring image segmentation aims at segmenting the foreground masks of ...
research
03/11/2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Referring image segmentation segments an image from a language expressio...

Please sign up or login with your details

Forgot password? Click here to reset