Cross-Domain Evaluation of POS Taggers: From Wall Street Journal to Fandom Wiki

04/27/2023
by Kia Kirstein Hansen, et al.

The Wall Street Journal (WSJ) section of the Penn Treebank has long been the de facto standard for evaluating POS taggers, with reported accuracies above 97%. Less is known about out-of-domain tagger performance, especially with fine-grained label sets. Using data from Elder Scrolls Fandom, a wiki about the Elder Scrolls video game universe, we create a modest dataset for qualitatively evaluating the cross-domain performance of two POS taggers, both trained on WSJ: the Stanford tagger (Toutanova et al., 2003) and Bilty (Plank et al., 2016). Our analyses show that performance on tokens seen during training is almost as good as in-domain performance, but accuracy on unknown tokens decreases across domains: from 90.37% for Stanford and from 87.84% to 80.41% for Bilty. Both taggers struggle with proper nouns and inconsistent capitalization.
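The known/unknown split reported above can be illustrated with a small sketch. It is not the authors' evaluation code: the function name `split_accuracy`, the triple layout `(token, gold_tag, predicted_tag)`, and the case-sensitive vocabulary lookup are all assumptions made for illustration, and the example tokens are invented.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed layout for illustration: (token, gold tag, predicted tag).
TaggedToken = Tuple[str, str, str]


def split_accuracy(tagged: Iterable[TaggedToken],
                   train_vocab: Set[str]) -> Dict[str, float]:
    """Return tagging accuracy (in percent) separately for tokens that
    occur in the training vocabulary ("known") and tokens that do not
    ("unknown"). Lookup is case-sensitive, which is an assumption."""
    counts = {"known": [0, 0], "unknown": [0, 0]}  # [correct, total]
    for token, gold, pred in tagged:
        bucket = "known" if token in train_vocab else "unknown"
        counts[bucket][1] += 1
        if gold == pred:
            counts[bucket][0] += 1

    def pct(correct: int, total: int) -> float:
        # An empty bucket has no defined accuracy.
        return 100.0 * correct / total if total else float("nan")

    return {name: pct(c, t) for name, (c, t) in counts.items()}


# Invented toy data: three tokens seen in training, two unseen names.
train_vocab = {"The", "dragon", "attacked"}
tagged = [
    ("The", "DT", "DT"),
    ("dragon", "NN", "NN"),
    ("attacked", "VBD", "VBN"),   # known token, wrong tag
    ("Whiterun", "NNP", "NN"),    # unknown token, wrong tag
    ("Alduin", "NNP", "NNP"),     # unknown token, correct tag
]
result = split_accuracy(tagged, train_vocab)
```

Because the lookup is case-sensitive, a token such as "dragon" written as "Dragon" would count as unknown; whether that matches the paper's definition of an unknown token is not stated in the abstract.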


