A Topological Approach to Compare Document Semantics Based on a New Variant of Syntactic N-grams

03/08/2021
by Fanchao Meng, et al.

This paper offers a new perspective on understanding and using syntactic n-grams (sn-grams). Sn-grams are a type of non-linear n-gram that has played a critical role in many NLP tasks. Applying sn-grams to the comparison of document semantics is therefore appealing, and few studies have reported progress on it. When pursuing this application, however, we found three major issues with sn-grams: lack of significance, sensitivity to word order, and failure to capture indirect syntactic relations. To address these issues, we propose a new variant of sn-grams named generalized phrases (GPs). Based on GPs, we then propose a topological approach, named DSCoH, to compute document semantic similarities. DSCoH has been extensively tested on document semantics comparison and document clustering tasks. The experimental results show that DSCoH can outperform state-of-the-art embedding-based methods.
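
To illustrate what "non-linear" means here, the following is a minimal sketch that contrasts ordinary linear n-grams with sn-grams read off as head-to-dependent paths in a dependency tree. The example sentence, the hand-written tree, and the function names are assumptions made for illustration only; the sketch does not implement generalized phrases or DSCoH, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written dependency tree for the sentence
# "the cat chased a small mouse": each head maps to its dependents.
TREE: Dict[str, List[str]] = {
    "chased": ["cat", "mouse"],
    "cat": ["the"],
    "mouse": ["a", "small"],
}
TOKENS: List[str] = "the cat chased a small mouse".split()


def linear_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Ordinary n-grams: n words that are adjacent in the surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sn_grams(tree: Dict[str, List[str]], n: int) -> List[Tuple[str, ...]]:
    """Sn-grams as head-to-dependent paths of n words in the tree."""
    def paths_from(word: str, length: int) -> List[Tuple[str, ...]]:
        if length == 1:
            return [(word,)]
        return [
            (word,) + rest
            for child in tree.get(word, [])
            for rest in paths_from(child, length - 1)
        ]

    words = set(tree) | {dep for deps in tree.values() for dep in deps}
    return sorted(path for w in words for path in paths_from(w, n))


if __name__ == "__main__":
    print(linear_ngrams(TOKENS, 2))
    print(sn_grams(TREE, 2))
```

In this toy tree, ("chased", "mouse") is an sn-gram even though the two words are not adjacent in the sentence, while the adjacent pair ("chased", "a") is a linear bigram but not an sn-gram, because no direct dependency links the two words.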


