Vec2GC – A Graph Based Clustering Method for Text Representations

04/15/2021
by Rajesh N Rao, et al.

NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
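
The general idea of the pipeline (vectors, then a weighted graph, then communities) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 2-D vectors stand in for learned text representations, cosine similarity with a hypothetical cutoff of 0.9 defines the weighted edges, and a simple deterministic label propagation stands in for whatever community detection algorithm Vec2GC actually uses. The helper names cosine, build_graph, and label_propagation are likewise hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(vectors, threshold):
    """Connect items whose similarity reaches the threshold (assumed rule).

    Returns an adjacency dict: node -> {neighbor: edge weight}.
    """
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation, used here as a stand-in community detector.

    Each node repeatedly adopts the label carrying the largest total edge
    weight among its neighbors; ties go to the smallest label so the result
    is deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            scores = defaultdict(float)
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(list)
    for node, label in labels.items():
        communities[label].append(node)
    return sorted(sorted(members) for members in communities.values())


# Toy "embeddings": items 0-2 point one way, items 3-5 another.
vectors = [
    (1.0, 0.0), (0.9, 0.1), (0.95, 0.05),
    (0.0, 1.0), (0.1, 0.9), (0.05, 0.95),
]
graph = build_graph(vectors, threshold=0.9)
communities = label_propagation(graph)
print(communities)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy setting the similarity cutoff removes all edges between the two groups, so each dense region of the vector space ends up as its own community. How Vec2GC sets edge weights, selects the cutoff, and derives a hierarchy is described in the full paper and is not reproduced here.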

research
07/02/2020

A Novel Graph Based Clustering Approach to Document Topic Modeling

Clustering is the task of assigning a set of objects into groups so that...
research
03/22/2019

An end-to-end Neural Network Framework for Text Clustering

The unsupervised text clustering is one of the major tasks in natural la...
research
01/16/2014

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

While traditional research on text clustering has largely focused on gro...
research
02/08/2017

Name Disambiguation in Anonymized Graphs using Network Embedding

In real-world, our DNA is unique but many people share names. This pheno...
research
10/28/2020

Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles

Production of news content is growing at an astonishing rate. To help ma...
research
07/03/2019

Clustering of Medical Free-Text Records Based on Word Embeddings

Is it true that patients with similar conditions get similar diagnoses? ...
research
09/27/2020

NN-EVCLUS: Neural Network-based Evidential Clustering

Evidential clustering is an approach to clustering based on the use of D...
