Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

11/29/2016
by Tom Zahavy, et al.

Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce. The high volume of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models that can reduce the cost and time of human editors. In this paper, we propose a decision-level fusion approach for multi-modal product classification using text and image inputs. We train input-specific state-of-the-art deep neural networks for each input source, show the potential of forging them together into a multi-modal architecture, and train a novel policy network that learns to choose between them. Finally, we demonstrate that our multi-modal network improves top-1 accuracy on a large-scale product classification dataset that we collected from Walmart.com. While we focus on the image-text fusion that characterizes e-commerce domains, our algorithms can be easily applied to other modalities such as audio, video, and physical sensors.
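
To make the idea of decision-level fusion concrete, here is a minimal sketch in which a policy chooses, per product, between the prediction of a text classifier and that of an image classifier. It is not the authors' architecture: the abstract does not describe the policy network's inputs or structure, so the linear policy over top-1 confidences, the function names (`softmax`, `policy_features`, `fuse`), and the hand-set weights below are all illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def policy_features(p_text, p_image):
    """Assumed policy input: the top-1 confidence of each classifier."""
    return np.stack([p_text.max(axis=-1), p_image.max(axis=-1)], axis=-1)


def fuse(p_text, p_image, w, b):
    """Choose one classifier's prediction per example.

    w, b are the parameters of a hypothetical linear policy;
    a positive score selects the image classifier, otherwise text.
    """
    score = policy_features(p_text, p_image) @ w + b
    use_image = score > 0
    pred_text = p_text.argmax(axis=-1)
    pred_image = p_image.argmax(axis=-1)
    return np.where(use_image, pred_image, pred_text), use_image


# Toy class probabilities for two products and three categories.
p_text = softmax(np.array([[2.0, 0.0, 0.0], [0.0, 0.1, 0.0]]))
p_image = softmax(np.array([[0.0, 0.1, 0.0], [0.0, 0.0, 3.0]]))

# Hand-set policy (an assumption): trust whichever classifier is more confident.
w = np.array([-1.0, 1.0])
b = 0.0

predictions, use_image = fuse(p_text, p_image, w, b)
print(predictions, use_image)
```

In this toy setup the policy falls back to the text prediction for the first product and to the image prediction for the second. In a learned setting, `w` and `b` would be replaced by a trained policy whose target is which classifier was correct on held-out examples.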

research
08/14/2020

A Multimodal Late Fusion Model for E-Commerce Product Classification

The cataloging of product listings is a fundamental problem for most e-c...
research
02/09/2021

Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

Nowadays, live-stream and short video shopping in E-commerce have grown ...
research
12/22/2022

Multi-queue Momentum Contrast for Microvideo-Product Retrieval

The booming development and huge market of micro-videos bring new e-comm...
research
12/20/2017

An Order Preserving Bilinear Model for Person Detection in Multi-Modal Data

We propose a new order preserving bilinear framework that exploits low-r...
research
06/30/2019

Multi-Label Product Categorization Using Multi-Modal Fusion Models

In this study, we investigated multi-modal approaches using images, desc...
research
02/06/2018

Efficient Large-Scale Multi-Modal Classification

While the incipient internet was largely text-based, the modern digital ...
research
07/15/2022

Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

With the prosperity of e-commerce industry, various modalities, e.g., vi...
