Who wrote this book? A challenge for e-commerce

04/19/2019
by Beranger Dumont, et al.

Modern e-commerce catalogs contain millions of references, each associated with textual and visual information that is of paramount importance for the products to be found via search or browsing. The book category is of particular significance, and its author name(s) field poses a significant challenge. Indeed, books written by a given author (such as F. Scott Fitzgerald) might be listed under different author names in a catalog because of abbreviations, spelling variants, and mistakes, among other causes. To solve this problem at scale, we design a composite system involving open data sources for books as well as machine learning components leveraging deep learning-based techniques for natural language processing. In particular, we use Siamese neural networks for an approximate match with known author names, and sequence-to-sequence learning with neural networks for direct correction of the provided author's name. We evaluate this approach on product data from the e-commerce website Rakuten France, and find that the top proposal of the system is the normalized author name with 72
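To make the approximate-matching idea concrete, here is a minimal sketch of matching a noisy author field against a list of known author names. It is not the system described above: a character-trigram Jaccard similarity stands in for the learned Siamese network, and the function names (`normalize`, `char_ngrams`, `similarity`, `best_match`) and the sample list of known authors are hypothetical, introduced only for illustration.

```python
import unicodedata
from typing import Iterable, Set, Tuple


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of the text, padded with one space per side."""
    padded = " " + text + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of character trigrams.

    Assumption: this hand-written score is a stand-in for the similarity
    that a trained Siamese network would produce from learned embeddings.
    """
    grams_a = char_ngrams(normalize(a))
    grams_b = char_ngrams(normalize(b))
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_match(query: str, known_authors: Iterable[str]) -> Tuple[str, float]:
    """Return the known author name most similar to the query, with its score."""
    return max(
        ((name, similarity(query, name)) for name in known_authors),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical reference list; a real system would draw on open book data.
    known = ["F. Scott Fitzgerald", "Ernest Hemingway", "Virginia Woolf"]
    for noisy in ["Fitzgerald, F. Scott", "F Scot Fitzgerald"]:
        print(noisy, "->", best_match(noisy, known))
```

Trigram overlap tolerates reordered name parts and small typos, but unlike a learned model it cannot relate an abbreviation to the full name it stands for, which is one reason a trained similarity is attractive for this task.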
