Real-Time Construction Algorithm of Co-Occurrence Network Based on Inverted Index

08/17/2023
by Jiahao Cheng, et al.

Co-occurrence networks are an important tool in natural language processing and text mining for discovering semantic relationships within texts. However, the traditional traversal algorithm for constructing them has high time and space complexity on large-scale text data. In this paper, we propose an optimized algorithm based on inverted indexing and breadth-first search that improves the efficiency of co-occurrence network construction and reduces memory consumption. First, we analyze the traditional traversal algorithm and identify its performance issues in constructing co-occurrence networks. We then present the implementation of the optimized algorithm in detail. Next, we validate it experimentally on CSL, a large-scale Chinese scientific literature dataset, comparing the traditional and optimized algorithms in running time and memory usage. Finally, non-parametric tests show that the optimized algorithm performs significantly better than the traditional traversal algorithm. This research provides an effective method for the rapid construction of co-occurrence networks and contributes to the further development of the field of information organization.
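The abstract does not give the algorithm itself, so the following is only a minimal sketch of how an inverted index and breadth-first search might be combined to build a co-occurrence network on demand. The function names (`build_inverted_index`, `cooccurrence_network`), the `max_depth` parameter, and the choice of "number of shared documents" as edge weight are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter, defaultdict, deque


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            index[term].add(doc_id)
    return index


def cooccurrence_network(docs, index, seed, max_depth=1):
    """Expand breadth-first from `seed` (hypothetical interface).

    Returns a dict mapping frozenset({a, b}) to the number of documents
    in which terms a and b appear together (an assumed edge weight).
    """
    edges = {}
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand terms at the depth limit
        # The inverted index restricts the scan to documents containing `term`,
        # instead of traversing the whole corpus for every term.
        counts = Counter()
        for doc_id in index.get(term, ()):
            for other in set(docs[doc_id]):
                if other != term:
                    counts[other] += 1
        for other, weight in counts.items():
            edges[frozenset((term, other))] = weight
            if other not in visited:
                visited.add(other)
                queue.append((other, depth + 1))
    return edges


# Toy corpus: each document is a list of keywords (made-up data).
docs = [["network", "index", "text"], ["network", "text"], ["index", "search"]]
index = build_inverted_index(docs)
edges = cooccurrence_network(docs, index, "network", max_depth=1)
```

With `max_depth=1` only the neighbours of the seed term are linked; raising it lets the search continue outward from those neighbours. Each expansion step reads only the documents listed under the current term in the index, which is the intuition behind avoiding a full traversal.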


