Bootstrap Latent Representations for Multi-modal Recommendation

07/13/2022
by Xin Zhou, et al.

This paper studies the multi-modal recommendation problem, where item multi-modality information (e.g., images and textual descriptions) is exploited to improve recommendation accuracy. Besides the user-item interaction graph, existing state-of-the-art methods usually use auxiliary graphs (e.g., user-user or item-item relation graphs) to augment the learned representations of users and/or items. These representations are often propagated and aggregated over the auxiliary graphs with graph convolutional networks, which can be prohibitively expensive in computation and memory, especially for large graphs. Moreover, existing multi-modal recommendation methods usually rely on randomly sampled negative examples in a Bayesian Personalized Ranking (BPR) loss to guide the learning of user/item representations, which increases the computational cost on large graphs and may also bring noisy supervision signals into training. To tackle these issues, we propose a novel self-supervised multi-modal recommendation model, dubbed BM3, which requires neither augmentations from auxiliary graphs nor negative samples. Specifically, BM3 first bootstraps latent contrastive views from the representations of users and items with a simple dropout augmentation. It then jointly optimizes three multi-modal objectives to learn the representations of users and items: reconstructing the user-item interaction graph and aligning modality features from both inter- and intra-modality perspectives. BM3 thereby dispenses with both contrasting against negative examples and the complex graph augmentation or additional target network otherwise needed to generate contrastive views. We show that BM3 outperforms prior recommendation models on three datasets with node counts ranging from 20K to 200K, while achieving a 2-9x reduction in training time. Our code is available at https://github.com/enoche/BM3.
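To make the negative-sample-free objective concrete, here is a minimal PyTorch sketch of how the three objectives described above could be wired together: dropout bootstraps a perturbed target view of each representation, a small predictor maps the online view toward it, and a stop-gradient on the target supplies the training signal without negative examples. This is an illustrative simplification, not the authors' implementation; the names BM3Loss and view_loss, the single shared predictor, the dropout rate, and the uniform loss weighting are all assumptions (see the linked repository for the real code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def view_loss(online, target):
    """Negative cosine similarity with a stop-gradient on the target view,
    so no negative examples are needed."""
    return 1.0 - F.cosine_similarity(online, target.detach(), dim=-1).mean()

class BM3Loss(nn.Module):
    """Hypothetical sketch of BM3's three joint objectives."""
    def __init__(self, dim=64, p_drop=0.3):
        super().__init__()
        self.predictor = nn.Linear(dim, dim)  # maps online views toward targets
        self.dropout = nn.Dropout(p_drop)     # simple dropout augmentation

    def forward(self, user, item, modal_feats):
        # Bootstrap latent target views with dropout, instead of augmenting
        # any auxiliary graph.
        user_t, item_t = self.dropout(user), self.dropout(item)
        user_p, item_p = self.predictor(user), self.predictor(item)

        # (1) Graph reconstruction: align each user with the interacted item.
        loss = view_loss(user_p, item_t) + view_loss(item_p, user_t)

        for feat in modal_feats:  # e.g., projected text and image features
            feat_t = self.dropout(feat)
            feat_p = self.predictor(feat)
            # (2) Inter-modality alignment: pull the modality feature toward
            #     the item's perturbed ID embedding.
            loss = loss + view_loss(feat_p, item_t)
            # (3) Intra-modality alignment: align the modality's online view
            #     with its own dropout-perturbed target view.
            loss = loss + view_loss(feat_p, feat_t)
        return loss

# Toy usage: a batch of 256 positive (user, item) pairs with 64-d embeddings
# and two modality features (text and vision) projected to the same space.
u, i = torch.randn(256, 64), torch.randn(256, 64)
txt, img = torch.randn(256, 64), torch.randn(256, 64)
print(BM3Loss()(u, i, [txt, img]))
```

The stop-gradient on the dropout-perturbed target view is the design choice that prevents representational collapse without negatives, in the spirit of BYOL and SimSiam, and it is what removes the need for a separately maintained target network.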

Related research

02/21/2023 · Multi-Modal Self-Supervised Learning for Recommendation
The online emergence of multi-modal sharing platforms (e.g., TikTok, Youtu...

01/28/2023 · Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation
User interaction data in recommender systems is a form of dyadic relatio...

06/01/2022 · CrossCBR: Cross-view Contrastive Learning for Bundle Recommendation
Bundle recommendation aims to recommend a bundle of related items to use...

10/13/2022 · Multi-Modal Recommendation System with Auxiliary Information
Context-aware recommendation systems improve upon classical recommender ...

12/16/2021 · Graph Augmentation-Free Contrastive Learning for Recommendation
Contrastive learning (CL) has recently received considerable attention i...

06/01/2023 · A Multi-Modal Latent-Features based Service Recommendation System for the Social Internet of Things
The Social Internet of Things (SIoT) is revolutionizing how we interact...

11/13/2022 · A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation
Multimodal recommender systems utilizing multimodal features (e.g., image...
