Domain adaptation for part-of-speech tagging of noisy user-generated text

05/21/2019
by Luisa März, et al.

The performance of a part-of-speech (POS) tagger is highly dependent on the domain of the processed text, and for many domains little or no training data is available. This work addresses the problem of POS tagging noisy user-generated text using a neural network. We propose an architecture that trains an out-of-domain model on a large newswire corpus and transfers its weights by using them as a prior for a model trained on the target domain (a dataset of German tweets), for which very few annotations are available. The neural network has two standard bidirectional LSTMs at its core. However, we find it crucial to also encode a set of task-specific features and to obtain reliable (source-domain and target-domain) word representations. We conduct experiments with different regularization techniques, such as early stopping, dropout, and fine-tuning of the domain-adaptation prior weights. Our best model uses external weights from the out-of-domain model together with feature embeddings and pre-trained word and sub-word embeddings, and achieves a tagging accuracy of slightly over 90%, improving on the state of the art for this task.
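
The abstract does not say exactly how the out-of-domain weights act as a prior. One common formulation, assumed here, is an L2 penalty that pulls the target-domain weights toward the source-domain (newswire) weights instead of toward zero. The following is a minimal sketch of that idea on flattened parameter lists; the function names and the `strength` and `lr` hyperparameters are hypothetical and not taken from the paper.

```python
def prior_penalty(weights, prior_weights, strength):
    """L2 penalty pulling target-domain weights toward the source-domain prior.

    Assumed formulation: strength * sum_i (w_i - p_i)^2 over flattened parameters.
    """
    if len(weights) != len(prior_weights):
        raise ValueError("weights and prior_weights must have the same length")
    return strength * sum((w - p) ** 2 for w, p in zip(weights, prior_weights))


def regularized_loss(task_loss, weights, prior_weights, strength=0.01):
    """Total objective: target-domain tagging loss plus the prior penalty."""
    return task_loss + prior_penalty(weights, prior_weights, strength)


def gradient_step(weights, task_grads, prior_weights, strength, lr):
    """One plain SGD step on the regularized objective.

    The penalty's gradient is d/dw [strength * (w - p)^2] = 2 * strength * (w - p),
    so each weight is nudged back toward its source-domain value.
    """
    return [
        w - lr * (g + 2.0 * strength * (w - p))
        for w, g, p in zip(weights, task_grads, prior_weights)
    ]


if __name__ == "__main__":
    source = [0.5, -1.0]   # hypothetical weights from the newswire model
    target = list(source)  # target model initialized from the source model
    print(prior_penalty(target, source, strength=0.1))  # penalty starts at zero
    target = gradient_step(target, [0.2, -0.3], source, strength=0.1, lr=0.05)
    print(target)
```

When the target model is initialized from the source weights, the penalty starts at zero. A large `strength` keeps the tweet model close to the newswire model, while a small one approaches unconstrained fine-tuning on the scarce target-domain data.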
