Cross-view Semantic Alignment for Livestreaming Product Recognition

08/09/2023
by Wenjie Yang, et al.

Live commerce is the act of selling products online through live streaming. Customers' diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing data or utilized single-modal input, which does not reflect the real-world scenario where multimodal data from various categories are present. In this paper, we present LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50× larger than the largest publicly available dataset. LPR4M contains diverse videos and noisy modality pairs while exhibiting a long-tailed distribution, resembling real-world problems. Moreover, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of the products. This is achieved through instance-level contrastive learning and cross-view patch-level feature propagation. A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches. Extensive experiments demonstrate the effectiveness of RICE and provide insights into the importance of dataset diversity and expressivity. The dataset and code are available at https://github.com/adxcreative/RICE
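The abstract does not spell out the form of the instance-level contrastive objective. As a rough illustration only, the sketch below uses a symmetric InfoNCE loss, a common choice for aligning two views, and should not be read as the RICE implementation. The function names, the temperature value of 0.07, and the toy feature vectors are all assumptions made for this example; feature vectors are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def instance_contrastive_loss(image_feats, video_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch (hypothetical helper, not from the paper).

    Row k of image_feats and row k of video_feats are assumed to show the
    same product; every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    vids = [l2_normalize(v) for v in video_feats]
    n = len(imgs)
    # Temperature-scaled cosine similarity between every image and every video.
    sim = [[sum(a * b for a, b in zip(i, v)) / temperature for v in vids]
           for i in imgs]
    # Image-to-video direction: each image should pick out its own video.
    i2v = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    # Video-to-image direction: the same check on the columns of sim.
    v2i = sum(cross_entropy_row([sim[r][k] for r in range(n)], k)
              for k in range(n)) / n
    return 0.5 * (i2v + v2i)


# Toy batch of two products with 2-D features (made-up numbers).
images = [[1.0, 0.0], [0.0, 1.0]]
videos_aligned = [[0.9, 0.1], [0.1, 0.9]]
videos_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = instance_contrastive_loss(images, videos_aligned)
loss_swapped = instance_contrastive_loss(images, videos_swapped)
```

In this toy batch, `loss_aligned` comes out smaller than `loss_swapped`, because matching image and video features yield high similarity on the diagonal. The patch-level feature propagation and the Patch Feature Reconstruction loss are not sketched here, since the abstract gives no detail on their form.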


