Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

10/12/2022
by Petter Mæhlum, et al.

Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written standards (Bokmål and Nynorsk), because they contain both the typical variation of social media text and a large amount of dialectal variation. In this paper, we present a novel Norwegian Twitter dataset annotated with part-of-speech (POS) tags. We show that models trained on Universal Dependencies (UD) data perform worse when evaluated on this dataset, and that models trained on Bokmål generally perform better than those trained on Nynorsk. We also find that, for some models, performance on dialectal tweets is comparable to performance on tweets written in the standards. Finally, we perform a detailed analysis of the errors that models commonly make on this data.
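The evaluation summarized above compares tagger accuracy across written varieties and then inspects the errors models make. A minimal sketch of that kind of breakdown follows, assuming each token is stored as a hypothetical (variety, gold tag, predicted tag) triple using UD-style UPOS tags. The variety labels, the function names and the toy data are illustrative assumptions, not taken from the paper or its dataset.

```python
from collections import Counter, defaultdict

# Hypothetical token format: (variety, gold_tag, predicted_tag).
# Illustrative labels: "nb" = Bokmål, "nn" = Nynorsk, "dialect" = dialectal tweet.
Token = tuple[str, str, str]


def accuracy_by_variety(tokens: list[Token]) -> dict[str, float]:
    """Token-level tagging accuracy, computed separately for each variety."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for variety, gold, pred in tokens:
        total[variety] += 1
        if gold == pred:
            correct[variety] += 1
    return {variety: correct[variety] / total[variety] for variety in total}


def most_common_confusions(
    tokens: list[Token], n: int = 3
) -> list[tuple[tuple[str, str], int]]:
    """Count (gold, predicted) tag pairs among errors, most frequent first."""
    errors = Counter((gold, pred) for _, gold, pred in tokens if gold != pred)
    return errors.most_common(n)


# Toy data for illustration only; not taken from the dataset described above.
example_tokens: list[Token] = [
    ("nb", "PRON", "PRON"),
    ("nb", "VERB", "VERB"),
    ("nb", "NOUN", "PROPN"),
    ("nb", "ADV", "ADV"),
    ("dialect", "PRON", "NOUN"),
    ("dialect", "VERB", "VERB"),
    ("dialect", "NOUN", "PROPN"),
    ("dialect", "ADV", "ADV"),
]

if __name__ == "__main__":
    print(accuracy_by_variety(example_tokens))     # {'nb': 0.75, 'dialect': 0.5}
    print(most_common_confusions(example_tokens))  # [(('NOUN', 'PROPN'), 2), (('PRON', 'NOUN'), 1)]
```

Reporting accuracy per variety, rather than pooled over all tweets, is what makes a comparison such as standard-form versus dialectal tweets visible; counting (gold, predicted) pairs is one simple starting point for an error analysis, not necessarily the procedure used in the paper.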
