An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

07/19/2016
by Jey Han Lau, et al.

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We find that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general-purpose applications, and release source code to induce document embeddings using our trained doc2vec models.
