Analyzing Web Archives Through Topic and Event Focused Sub-collections

12/16/2016
by Gerhard Gossen, et al.

Web archives capture the history of the Web and are therefore an important source for studying how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose a research methodology based on extracting and studying sub-collections of the archive that focus on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating such sub-collections.

research
04/04/2018

Focused Crawl of Web Archives to Build Event Collections

Event collections are frequently built by crawling the live web on the b...
research
07/28/2017

Extracting Event-Centric Document Collections from Large-Scale Web Archives

Web archives are typically very broad in scope and extremely large in sc...
research
02/01/2017

ArchiveWeb: Collaboratively Extending and Exploring Web Archive Collections

Curated web archive collections contain focused digital contents which a...
research
02/01/2017

ArchiveWeb: collaboratively extending and exploring web archive collections - How would you like to work with your collections?

Curated web archive collections contain focused digital content which is...
research
07/05/2017

Web Video in Numbers - An Analysis of Web-Video Metadata

Web video is often used as a source of data in various fields of study. ...
research
09/07/2017

Capturing natural-colour 3D models of insects for species discovery

Collections of biological specimens are fundamental to scientific unders...
research
12/19/2016

The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification

Collections of Web documents about specific topics are needed for many a...