Hybrid Multisource Feature Fusion for the Text Clustering

08/24/2021
by Jiaxuan Chen, et al.

Text clustering is an unsupervised text mining technique that partitions a large collection of text documents into groups. It has been reported that text clustering algorithms rarely match the performance of supervised methods, and that their clustering quality depends heavily on the chosen text features. Many text feature generation algorithms exist, such as the vector space model (VSM) and distributed word embeddings, but each extracts features from only some specific aspects of the text. Obtaining features that cover the corpus as completely as possible is therefore key to improving clustering results. In this paper, we present a hybrid multisource feature fusion (HMFF) framework comprising three components: multimodel feature representation, mutual similarity matrices, and feature fusion. We construct a mutual similarity matrix for each feature source and fuse the discriminative features from these matrices through dimensionality reduction to generate HMFF features; a k-means clustering algorithm can then be applied to partition the input samples into groups. Experiments show that the HMFF framework outperforms other recently published algorithms on 7 of 11 public benchmark datasets and achieves leading performance on the remaining 4. Finally, we compare the HMFF framework with these competitors on a real-world COVID-19 dataset with an unknown number of clusters, where the clusters generated by HMFF group similar samples more tightly.
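The pipeline described in the abstract (several feature sources, one similarity matrix per source, fusion by dimensionality reduction, then k-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The two feature sources (TF-IDF over words and raw character-trigram counts), the use of cosine similarity, the SVD-based reduction, and all function names are hypothetical choices made for the example; the abstract does not specify them.

```python
import numpy as np
from collections import Counter

def count_matrix(docs, tokenizer):
    """Build a document-by-term count matrix for one feature source."""
    tokens = [tokenizer(d) for d in docs]
    vocab = {t: i for i, t in enumerate(sorted({t for ts in tokens for t in ts}))}
    X = np.zeros((len(docs), len(vocab)))
    for row, ts in enumerate(tokens):
        for term, n in Counter(ts).items():
            X[row, vocab[term]] = n
    return X

def tfidf(X):
    """Reweight counts by a smoothed inverse document frequency."""
    df = (X > 0).sum(axis=0)
    idf = np.log((1 + X.shape[0]) / (1 + df)) + 1
    return X * idf

def cosine_similarity_matrix(X):
    """Pairwise document similarity for one feature source (assumed measure)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    U = X / norms
    return U @ U.T

def fuse(similarity_matrices, n_components):
    """Concatenate the per-source matrices and reduce them with a PCA-style SVD."""
    stacked = np.hstack(similarity_matrices)
    centered = stacked - stacked.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

docs = [
    "the vaccine trial reported strong immune response",
    "immune response after the second vaccine dose",
    "vaccine dose schedule and immune protection",
    "stock markets fell as investors sold shares",
    "investors bought shares when markets recovered",
    "markets and shares rallied on investor optimism",
]

# Source 1: word-level TF-IDF. Source 2: character trigram counts.
words = tfidf(count_matrix(docs, lambda d: d.lower().split()))
trigrams = count_matrix(docs, lambda d: [d[i:i + 3] for i in range(len(d) - 2)])

sims = [cosine_similarity_matrix(words), cosine_similarity_matrix(trigrams)]
features = fuse(sims, n_components=2)
labels = kmeans(features, k=2)
print(labels)
```

Representing each document by its similarities to all other documents gives every source the same n-by-n shape, so sources with very different native dimensionality can be concatenated before reduction. The number of retained components and the cluster count `k` are free parameters in this sketch.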
