hyperdoc2vec: Distributed Representations of Hypertext Documents

05/10/2018
by Jialong Han, et al.

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although effective on plain documents, conventional text embedding methods suffer from information loss when directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely hyperdoc2vec, along with four criteria characterizing the information that hyper-document embedding models should preserve. We systematically compare hyperdoc2vec with several competitors on two tasks in the academic paper domain: paper classification and citation recommendation. Both analyses and experiments validate the superiority of hyperdoc2vec over other models with respect to the four criteria.
