Exploring Swedish & English fastText Embeddings with the Transformer

07/23/2020
by Tosin P. Adewumi, et al.

Our main contributions in this paper are showing that embeddings from relatively small corpora can outperform those from far larger corpora, and presenting a new Swedish analogy test set. Several factors play important roles in achieving good network performance on natural language processing (NLP) downstream tasks: dataset size, the right hyper-parameters, and well-trained embeddings. We show that, with the right set of hyper-parameters, good network performance can be reached even on smaller datasets. We evaluate the embeddings at both the intrinsic and extrinsic levels, the latter by deploying them with the Transformer in a named entity recognition (NER) task, and we conduct significance tests. This is done for both Swedish and English. On the downstream task, we obtain better performance in both languages with far smaller training data than the recently released Common Crawl versions, and character n-grams appear useful for Swedish, a morphologically rich language.
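
To illustrate why character n-grams may help with a morphologically rich language such as Swedish, the following is a minimal sketch of how a word can be split into character n-grams with boundary markers, in the style used by fastText. It is not the authors' code: the function name `char_ngrams` and the example words are assumptions chosen for illustration, and the sketch omits the hashing of n-grams and the whole-word token that the real library also uses. The n-gram lengths 3 to 6 follow fastText's usual defaults.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, in order of increasing length.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from n-grams in the middle of a word.
    (Hypothetical helper for illustration only.)
    """
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# The Swedish compound "sjukhus" (hospital) shares sub-word units
# with "hus" (house), so the two words can share parameters.
shared = set(char_ngrams("sjukhus")) & set(char_ngrams("hus"))
print(sorted(shared))
```

Because a compound and its final component share n-grams such as `hus` and `hus>`, a sub-word model can relate the two words even when the compound is rare in the training corpus.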
