Semantic Document Clustering on Named Entity Features

07/20/2018
by Tru H. Cao, et al.

Keyword-based information processing is limited by its simple treatment of words. In this paper, we introduce named entities as objectives in document clustering; named entities are the key elements defining document semantics and in many cases are what users care about. First, the traditional keyword-based vector space model is adapted so that vectors are defined over spaces of entity names, types, name-type pairs, and identifiers instead of keywords. Hierarchical document clustering can then be performed using a similarity measure defined as the cosine of the angle between the vectors representing two documents. Experimental results are presented and discussed. Clustering documents by named-entity information could be useful for managing web-based learning materials with respect to related objects.
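
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sample documents and entity annotations in `DOCS`, the raw-frequency weighting, the average-linkage merge rule, and the stopping `threshold`. The sketch only shows the general idea of building document vectors over one of the four entity spaces (names, types, name-type pairs, identifiers) and clustering by cosine similarity.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical annotated documents: each entity is (name, type, identifier).
DOCS = {
    "d1": [("Paris", "City", "E1"), ("France", "Country", "E2")],
    "d2": [("Paris", "City", "E1"), ("Lyon", "City", "E3")],
    "d3": [("Paris Hilton", "Person", "E4"), ("Hollywood", "District", "E5")],
}

# One feature extractor per entity space named in the abstract.
SPACES = {
    "name": lambda e: e[0],
    "type": lambda e: e[1],
    "name_type": lambda e: (e[0], e[1]),
    "identifier": lambda e: e[2],
}


def entity_vector(entities, space):
    """Raw frequency vector over one entity space (the weighting is an assumption)."""
    return Counter(SPACES[space](e) for e in entities)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, space, threshold):
    """Average-linkage agglomerative clustering (assumed merge rule).

    Repeatedly merges the most similar pair of clusters and stops when the
    best pair's average cosine similarity falls below `threshold`.
    """
    vecs = {d: entity_vector(ents, space) for d, ents in docs.items()}
    clusters = [[d] for d in sorted(docs)]

    def link(a, b):
        total = sum(cosine(vecs[x], vecs[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

With these sample annotations, `cluster(DOCS, "identifier", 0.4)` groups `d1` with `d2`, which share the entity `E1`, and leaves `d3` on its own. Clustering over the `name` space instead would treat "Paris" and "Paris Hilton" as unrelated strings, which hints at why the choice of entity space matters.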


