Implementation of a noisy hyperlink removal system: A semantic and relatedness approach

03/06/2023
by Kazem Taghandiki et al.

As the volume of data on the web grows, the web structure graph, a graph representation of the web, continues to evolve. The structure of this graph has gradually shifted from content-based to non-content-based. Furthermore, spam data in the web structure graph, such as noisy hyperlinks, degrade the speed and efficiency of information retrieval and link mining algorithms. Previous work in this area has focused on removing noisy hyperlinks using structural and string-based approaches. However, these approaches may incorrectly remove useful links or fail to detect noisy hyperlinks in certain circumstances. In this paper, a dataset of hyperlinks is first constructed using an interactive crawler. The semantic and relatedness structure of these hyperlinks is then studied with semantic web approaches and tools such as the DBpedia ontology. Finally, noisy hyperlinks are removed using a reasoner over the DBpedia ontology. Our experiments demonstrate the accuracy of semantic web technologies and their ability to remove noisy hyperlinks.
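The abstract names three stages (crawling, semantic annotation against DBpedia, reasoner-based removal) but does not state the decision rule itself. The following is a minimal sketch of one plausible ontology-based relatedness check, under loudly stated assumptions: it is not the authors' method, it replaces the OWL reasoner with a hand-written rule, and the class hierarchy, the `PROPERTY_PAIRS` table, and the page URLs are hypothetical toy data that only imitate DBpedia ontology naming (the `dbo:` prefix). Each crawled page is assumed to be already mapped to one ontology class. A link is kept if its source and target classes share a superclass below `owl:Thing`, or if a (hypothetical) property connects one of their ancestor classes.

```python
from typing import Dict, List, Optional, Set, Tuple

ROOT = "owl:Thing"
Link = Tuple[str, str]

# Hypothetical, hand-written class hierarchy (child -> parent). The names
# imitate DBpedia ontology classes, but this is toy data, not real DBpedia.
SUBCLASS_OF: Dict[str, str] = {
    "dbo:Person": ROOT,
    "dbo:Athlete": "dbo:Person",
    "dbo:SoccerPlayer": "dbo:Athlete",
    "dbo:Organisation": ROOT,
    "dbo:SportsTeam": "dbo:Organisation",
    "dbo:SoccerClub": "dbo:SportsTeam",
    "dbo:Work": ROOT,
    "dbo:Software": "dbo:Work",
}

# Hypothetical (domain, range) pairs of object properties: an entry means the
# toy ontology allows a direct relation between instances of the two classes.
PROPERTY_PAIRS: Set[Tuple[str, str]] = {
    ("dbo:Athlete", "dbo:SportsTeam"),
}


def ancestors(cls: str) -> List[str]:
    """Return cls followed by its superclasses, ending at ROOT.

    Classes missing from SUBCLASS_OF are treated as direct children of ROOT.
    """
    chain = [cls]
    while chain[-1] != ROOT:
        chain.append(SUBCLASS_OF.get(chain[-1], ROOT))
    return chain


def related(a: str, b: str) -> bool:
    """True if a and b share a superclass below ROOT or are linked by a property."""
    up_a, up_b = ancestors(a), ancestors(b)
    if (set(up_a) & set(up_b)) - {ROOT}:
        return True
    # Fall back to property edges between any pair of ancestor classes.
    return any(
        (x, y) in PROPERTY_PAIRS or (y, x) in PROPERTY_PAIRS
        for x in up_a
        for y in up_b
    )


def filter_links(
    page_type: Dict[str, str], links: List[Link]
) -> Tuple[List[Link], List[Link]]:
    """Split links into (kept, removed) using the relatedness rule above."""
    kept: List[Link] = []
    removed: List[Link] = []
    for src, dst in links:
        a: Optional[str] = page_type.get(src)
        b: Optional[str] = page_type.get(dst)
        # Links touching an untyped page are kept: dropping them could remove
        # useful links, the failure mode the abstract attributes to prior work.
        if a is None or b is None or related(a, b):
            kept.append((src, dst))
        else:
            removed.append((src, dst))
    return kept, removed


if __name__ == "__main__":
    page_type = {
        "/player-a": "dbo:SoccerPlayer",
        "/player-b": "dbo:SoccerPlayer",
        "/club-x": "dbo:SoccerClub",
        "/antivirus-ad": "dbo:Software",
    }
    links = [
        ("/player-a", "/player-b"),
        ("/player-a", "/club-x"),
        ("/player-a", "/antivirus-ad"),
        ("/player-a", "/untyped-page"),
    ]
    kept, removed = filter_links(page_type, links)
    print("kept:", kept)
    print("removed:", removed)
```

In this toy run, only the link to `/antivirus-ad` is flagged as noisy: its `dbo:Software` class shares no superclass below `owl:Thing` with `dbo:SoccerPlayer`, and no property connects their ancestors. A fuller implementation would presumably query the actual DBpedia ontology and delegate this inference to an OWL reasoner rather than a hard-coded rule.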

research
07/03/2019

Use of OWL and Semantic Web Technologies at Pinterest

Pinterest is a popular Web application that has over 250 million active ...
research
03/25/2022

Personalize Web Searching Strategies Classification and Comparison

Personalization is becoming very important direction in semantic web sea...
research
07/22/2011

MeLinDa: an interlinking framework for the web of data

The web of data consists of data published on the web in such a way that...
research
04/24/2012

ILexicOn: toward an ECD-compliant interlingual lexical ontology described with semantic web formalisms

We are interested in bridging the world of natural language and the worl...
research
11/08/2019

Semi-Supervised Method using Gaussian Random Fields for Boilerplate Removal in Web Browsers

Boilerplate removal refers to the problem of removing noisy content from...
research
12/17/2019

Extraction of Relevant Images for Boilerplate Removal in Web Browsers

Boilerplate refers to unwanted and repeated parts of a webpage (such as ...
research
05/31/2017

Dynamic Discovery of Type Classes and Relations in Semantic Web Data

The continuing development of Semantic Web technologies and the increasi...
