Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces

08/03/2023
by Dayananda Ubrangala, et al.

Typographical errors are a major source of frustration for visitors to online marketplaces. Because these marketplaces are domain-specific and users tend to submit very short queries, traditional spell checking solutions do not perform well at correcting typos. We present a data augmentation method to address the lack of annotated typo data, and we train a recurrent neural network to learn context-limited, domain-specific embeddings. Those embeddings are deployed in a real-time inferencing API for the Microsoft AppSource marketplace to find the closest match between a misspelled user query and the available product names. Our data-efficient solution shows that controlled, high-quality synthetic data may be a powerful tool, especially considering the current climate of large language models, which rely on prohibitively huge and often uncontrolled datasets.
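
The abstract does not spell out the augmentation procedure or the network architecture, so the following is only a minimal sketch of the general idea, under stated assumptions. It generates synthetic (typo, correct name) pairs using single character edits, then matches a misspelled query to the closest product name. Character trigram counts with cosine similarity stand in for the learned recurrent embeddings; that substitution, the function names, and the sample product names are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import Counter
from math import sqrt

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def make_typo(word, rng):
    """Apply one random character edit: delete, insert, substitute, or transpose."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(ALPHABET) + word[i + 1:]
    # transpose the characters at positions i and i + 1
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def augment(names, per_name, seed=0):
    """Build synthetic (typo, correct_name) training pairs from clean names."""
    rng = random.Random(seed)
    return [(make_typo(name, rng), name) for name in names for _ in range(per_name)]


def trigram_vector(text):
    """Bag of character trigrams (assumed stand-in for a learned embedding)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def closest_match(query, names):
    """Return the product name whose vector is most similar to the query's."""
    q = trigram_vector(query)
    return max(names, key=lambda name: cosine(q, trigram_vector(name)))


# Hypothetical catalogue used only for illustration.
catalogue = ["power bi", "excel", "teams"]
pairs = augment(catalogue, per_name=3)
best = closest_match("powr bi", catalogue)
```

In a setup like the one the abstract describes, the synthetic pairs would serve as training data for the embedding model, and the nearest-neighbour lookup would run over the learned embeddings rather than over trigram counts.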

