Contextual Multilingual Spellchecker for User Queries

05/01/2023
by Sanat Sharma, et al.

Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is also expected by users. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases, where latency is a key requirement. Furthermore, most recent innovative architectures focus on English, are not trained in a multilingual fashion, and are trained for spell correction in longer text. That is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. Our speller also outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.
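
To make the idea of a product-specific vocabulary concrete, here is a minimal sketch. It is not the method described in the paper, which the abstract does not detail. It swaps in a simple single-edit candidate generator with frequency ranking, purely for illustration. The names `edits1`, `correct`, `correct_query`, `general_vocab`, and `product_vocab`, as well as the word counts, are all hypothetical.

```python
from collections import Counter

# Assumption: a lowercase Latin alphabet. A multilingual speller would
# need a per-language character set instead.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, vocab):
    """Return the most frequent in-vocabulary word within one edit, else the input."""
    if word in vocab:
        return word
    candidates = [w for w in edits1(word) if w in vocab]
    if not candidates:
        return word  # no known neighbour, so leave the token untouched
    return max(candidates, key=lambda w: vocab[w])


def correct_query(query, vocab):
    """Correct each whitespace-separated token of a short query independently."""
    return " ".join(correct(token, vocab) for token in query.lower().split())


# Hypothetical vocabularies with made-up frequencies.
general_vocab = Counter({"photo": 50, "shop": 40, "light": 30, "room": 30})
product_vocab = general_vocab + Counter({"photoshop": 100, "lightroom": 80})

print(correct_query("photoshp", general_vocab))  # unchanged: no in-vocabulary neighbour
print(correct_query("photoshp", product_vocab))  # corrected using the product vocabulary
```

In this sketch the same query is left alone by the general vocabulary but corrected once product names are added, which is the kind of behaviour the abstract attributes to vocabulary adaptation. A production speller would also need context, multilingual handling, and latency-aware lookup, none of which this example attempts.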

research
06/14/2022

Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Improving the quality of search results can significantly enhance users ...
research
08/15/2023

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Conventional keyword search systems operate on automatic speech recognit...
research
08/30/2019

Learning to Ask: Question-based Sequential Bayesian Product Search

Product search is generally recognized as the first and foremost stage o...
research
08/05/2022

A Semantic Alignment System for Multilingual Query-Product Retrieval

This paper mainly describes our winning solution (team name: www) to Ama...
research
08/03/2023

Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces

Typographical errors are a major source of frustration for visitors of o...
research
11/14/2022

Learning to Answer Multilingual and Code-Mixed Questions

Question-answering (QA) that comes naturally to humans is a critical com...
research
04/28/2023

Training and Evaluation of a Multilingual Tokenizer for GPT-SW3

This paper provides a detailed discussion of the multilingual tokenizer ...
