Evaluation Of Word Embeddings From Large-Scale French Web Content

05/05/2021
by   Hadi Abdine, et al.
0

Distributed word representations are popularly used in many tasks in natural language processing, adding that pre-trained word vectors on huge text corpus achieved high performance in many different NLP tasks. This paper introduces multiple high quality word vectors for the French language where two of them are trained on huge crawled French data and the others are trained on an already existing French corpus. We also evaluate the quality of our proposed word vectors and the existing French word vectors on the French word analogy task. In addition, we do the evaluation on multiple real NLP tasks that show the important performance enhancement of the pre-trained word vectors compared to the existing and random ones. Finally, we created a demo web application to test and visualize the obtained word embeddings. The produced French word embeddings are available to the public, along with the fine-tuning code on the NLU tasks and the demo code.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

09/30/2020

Development of Word Embeddings for Uzbek Language

In this paper, we share the process of developing word embeddings for th...
10/08/2018

Word Embeddings from Large-Scale Greek Web content

Word embeddings are undoubtedly very useful components in many NLP tasks...
02/13/2020

Comparison of Turkish Word Representations Trained on Different Morphological Forms

Increased popularity of different text representations has also brought ...
01/10/2020

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding

Word embeddings have undoubtedly revolutionized NLP. However, pre-traine...
07/11/2021

Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF

Over the last few years, neural network derived word embeddings became p...
11/03/2017

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number ...
10/28/2016

Word Embeddings for the Construction Domain

We introduce word vectors for the construction domain. Our vectors were ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.