iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

12/19/2016
by Gerhard Gossen, et al.

Researchers in the Digital Humanities and journalists need to monitor, collect, and analyze fresh online content on demand about current events such as the Ebola outbreak or the Ukraine crisis. However, existing focused crawling approaches consider only topical aspects and ignore temporal ones, so they cannot produce Web collections that are both thematically coherent and fresh. Social Media in particular provide a rich source of fresh content that state-of-the-art focused crawlers do not use. In this paper we address the problem of collecting fresh and relevant Web and Social Web content for a topic of interest by seamlessly integrating Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and uses the stream of fresh Social Media content to guide the crawl.
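
As a rough illustration of what it can mean for a crawl to be guided by both topic and freshness, the sketch below keeps a crawl frontier ordered by a combined score. This is not the authors' method: the names `score` and `Frontier`, the exponential decay, and the 24-hour half-life are all assumptions made for this example, and the relevance values are taken as given rather than computed.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # Only the priority takes part in ordering; the URL is payload.
    priority: float
    url: str = field(compare=False)


def score(relevance: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Hypothetical scoring: topical relevance (0..1) scaled by a freshness
    term that halves every `half_life_hours`. An assumption for illustration,
    not the scoring function from the paper."""
    freshness = 0.5 ** (age_hours / half_life_hours)
    return relevance * freshness


class Frontier:
    """Priority queue of URLs; the highest combined score is crawled first."""

    def __init__(self) -> None:
        self._heap: list[FrontierEntry] = []
        self._seen: set[str] = set()

    def add(self, url: str, relevance: float, age_hours: float) -> None:
        # Skip URLs that were already queued once.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, FrontierEntry(-score(relevance, age_hours), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap).url

    def __len__(self) -> int:
        return len(self._heap)


frontier = Frontier()
# A highly relevant page that is three days old (hypothetical URL).
frontier.add("https://example.org/old-article", relevance=0.9, age_hours=72.0)
# A moderately relevant link seen one hour ago, e.g. in a Social Media post.
frontier.add("https://example.org/fresh-post", relevance=0.6, age_hours=1.0)
next_url = frontier.pop()
```

Under these assumed weights the one-hour-old link is popped before the older but more relevant page, which is the kind of trade-off a purely topical crawler would not make.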

