Evaluation Of Word Embeddings From Large-Scale French Web Content

05/05/2021
by   Hadi Abdine, et al.
0

Distributed word representations are popularly used in many tasks in natural language processing, adding that pre-trained word vectors on huge text corpus achieved high performance in many different NLP tasks. This paper introduces multiple high quality word vectors for the French language where two of them are trained on huge crawled French data and the others are trained on an already existing French corpus. We also evaluate the quality of our proposed word vectors and the existing French word vectors on the French word analogy task. In addition, we do the evaluation on multiple real NLP tasks that show the important performance enhancement of the pre-trained word vectors compared to the existing and random ones. Finally, we created a demo web application to test and visualize the obtained word embeddings. The produced French word embeddings are available to the public, along with the fine-tuning code on the NLU tasks and the demo code.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/30/2020

Development of Word Embeddings for Uzbek Language

In this paper, we share the process of developing word embeddings for th...
research
10/08/2018

Word Embeddings from Large-Scale Greek Web content

Word embeddings are undoubtedly very useful components in many NLP tasks...
research
02/13/2020

Comparison of Turkish Word Representations Trained on Different Morphological Forms

Increased popularity of different text representations has also brought ...
research
01/10/2020

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding

Word embeddings have undoubtedly revolutionized NLP. However, pre-traine...
research
11/03/2017

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number ...
research
10/28/2016

Word Embeddings for the Construction Domain

We introduce word vectors for the construction domain. Our vectors were ...
research
10/29/2017

Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts

Distributed word representations have been shown to be very useful in va...

Please sign up or login with your details

Forgot password? Click here to reset