Improving Text Normalization by Optimizing Nearest Neighbor Matching

12/27/2017
by Salman Ahmad Ansari, et al.

Text normalization is an essential task in the processing and analysis of social media text, which is dominated by informal writing. It aims to map informal words to their intended standard forms. Previously proposed text normalization approaches typically require manual selection of parameters for improved performance. In this paper, we present an automatic optimization-based nearest neighbor matching approach for text normalization. This approach is motivated by the observation that text normalization is essentially a matching problem, and nearest neighbor matching with an adaptive similarity function is the most direct procedure for it. Our similarity function incorporates weighted contributions of contextual, string, and phonetic similarity, and the nearest neighbor matching involves a minimum similarity threshold. These four parameters (the three similarity weights and the threshold) are tuned efficiently using grid search. We evaluate the performance of our approach on two benchmark datasets. The results demonstrate that parameter tuning on small labeled datasets produces state-of-the-art text normalization performance. Thus, this approach allows practically easy construction of evolving domain-specific normalization lexicons.
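
To make the matching procedure concrete, the following is a minimal Python sketch of thresholded nearest neighbor matching with a weighted similarity function, tuned by grid search. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the individual similarity measures, so string similarity is approximated here with `difflib.SequenceMatcher`, phonetic similarity with a crude vowel-dropping key, and contextual similarity is assumed to be precomputed and passed in as a hypothetical `context_sim` dictionary. All function names, the constraint that the weights sum to 1, and the grid values are likewise assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import product


def string_sim(a, b):
    # Stand-in string similarity in [0, 1] (assumption, not the paper's measure).
    return SequenceMatcher(None, a, b).ratio()


def phonetic_key(word):
    # Crude phonetic stand-in: drop non-initial vowels, collapse repeated letters.
    out = []
    for i, ch in enumerate(word.lower()):
        if i > 0 and ch in "aeiou":
            continue
        if out and out[-1] == ch:
            continue
        out.append(ch)
    return "".join(out)


def phonetic_sim(a, b):
    return string_sim(phonetic_key(a), phonetic_key(b))


def combined_sim(word, cand, context_sim, weights):
    # Weighted sum of contextual, string, and phonetic similarity.
    w_c, w_s, w_p = weights
    return (w_c * context_sim.get((word, cand), 0.0)  # hypothetical precomputed scores
            + w_s * string_sim(word, cand)
            + w_p * phonetic_sim(word, cand))


def normalize(word, vocab, context_sim, weights, threshold):
    # Nearest neighbor in the standard vocabulary; keep the word unchanged
    # if even the best match falls below the minimum similarity threshold.
    best = max(vocab, key=lambda c: combined_sim(word, c, context_sim, weights))
    if combined_sim(word, best, context_sim, weights) >= threshold:
        return best
    return word


def grid_search(labeled, vocab, context_sim,
                steps=(0.0, 0.25, 0.5, 0.75, 1.0),
                thresholds=(0.3, 0.5, 0.7)):
    # Exhaustively try weight/threshold combinations on a small labeled set.
    best_params, best_acc = None, -1.0
    for w_c, w_s, w_p, t in product(steps, steps, steps, thresholds):
        if abs(w_c + w_s + w_p - 1.0) > 1e-9:  # assumed: weights sum to 1
            continue
        hits = sum(normalize(w, vocab, context_sim, (w_c, w_s, w_p), t) == gold
                   for w, gold in labeled)
        acc = hits / len(labeled)
        if acc > best_acc:
            best_params, best_acc = (w_c, w_s, w_p, t), acc
    return best_params, best_acc
```

In this sketch, a call such as `grid_search(labeled_pairs, vocab, context_sim)` returns the best weight and threshold combination found together with its accuracy on the labeled pairs; the selected parameters can then be passed to `normalize` to build a lexicon for new informal words.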


