Maps Search Misspelling Detection Leveraging Domain-Augmented Contextual Representations

08/15/2021
by Yutong Li, et al.

Building an independent misspelling detector and serving it before correction can bring multiple benefits to the speller and to other search components. This is particularly true for the most commonly deployed noisy-channel based speller systems. With the rapid development of deep learning and substantial advances in contextual representation learning such as BERTology, building a decent misspelling detector without having to rely on the hand-crafted features associated with the noisy-channel architecture has become more accessible than ever. However, BERTology models are trained on natural language corpora, while Maps Search is highly domain specific: would BERTology continue its success there? In this paper we design four stages of models for misspelling detection, ranging from a basic LSTM to a single-domain augmented fine-tuned BERT. We find that for Maps Search, in our case, more advanced BERTology-family models such as RoBERTa do not necessarily outperform BERT, and a classic cross-domain fine-tuned full BERT even underperforms a smaller single-domain fine-tuned BERT. We share further findings through comprehensive modeling experiments and analysis, and we also briefly cover a breakthrough in our data generation algorithm.
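The abstract frames misspelling detection as a standalone classifier served ahead of correction, with a fine-tuned BERT as the strongest stage. As a rough illustration only, the sketch below shows how such a binary query-level detector might be fine-tuned with the Hugging Face transformers API. The model checkpoint, hyperparameters, labels, and example queries are all assumptions for illustration, not the authors' setup.

```python
# Minimal sketch (not the paper's code): fine-tune BERT to flag
# misspelled Maps-style queries before they reach the corrector.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = well-spelled, 1 = misspelled
)

# Toy training batch; real training would loop over a large labeled set.
queries = ["starbucks near me", "starbuks near me"]
labels = torch.tensor([0, 1])

batch = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # returns classification loss
outputs.loss.backward()
optimizer.step()

# Inference: gate a query before routing it to the correction stage.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("pizzza hut", return_tensors="pt")).logits
print("misspelled" if logits.argmax(-1).item() == 1 else "well-spelled")
```

A detector like this can short-circuit the speller for well-spelled queries, which is the serving benefit the abstract attributes to running detection before correction.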
