To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging

07/17/2017
by Rob van der Goot, et al.

Does normalization improve Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data? To the best of our knowledge, little is known about the actual impact of normalization in a real-world scenario, where gold error detection is not available. We investigate the effect of automatic normalization on POS tagging of tweets. We also compare normalization to strategies that leverage large amounts of unlabeled data kept in its raw form. Our results show that normalization helps, but does not consistently add anything beyond word embedding layer initialization alone. The latter approach yields a tagging model that is competitive with a state-of-the-art Twitter tagger.
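
The two strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup-based `normalize` function, the toy `NORM_LEXICON`, and the `init_embeddings` helper are hypothetical names introduced only for illustration, and the sketch assumes that the embedding layer is initialized from vectors pretrained on raw, unnormalized text.

```python
import random

# Hypothetical toy normalization lexicon (illustrative only, not from the paper).
NORM_LEXICON = {"u": "you", "2morrow": "tomorrow", "gr8": "great"}


def normalize(tokens, lexicon=NORM_LEXICON):
    """Strategy 1: map each noisy token to a canonical form before tagging.

    Tokens without an entry in the lexicon are passed through unchanged.
    """
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


def init_embeddings(vocab, pretrained, dim, seed=0):
    """Strategy 2: leave the text raw, but initialize the tagger's embedding
    table from pretrained vectors (assumed here to come from raw text).

    Words without a pretrained vector get small random values.
    """
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in pretrained:
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


tweet = ["c", "u", "2morrow"]
normalized_tweet = normalize(tweet)

# Toy pretrained vectors; real ones would be learned from a large raw corpus.
pretrained_vectors = {"u": [0.5, 0.5], "2morrow": [0.1, 0.9]}
embedding_table = init_embeddings(tweet, pretrained_vectors, dim=2)
```

In the first strategy the tagger sees canonical forms only where the normalizer recognizes a token. In the second, the raw forms keep their own vectors, so the tagger can learn from them directly without a separate normalization step.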
