Curators create web archive collections, according to a particular theme or collection development policy, to preserve pages and thereby mitigate the effects of link rot and content drift. Such collections have been used by historians (Milligan, 2019), journalists (Hafner and Palmer, 2017), and other researchers (Curty and Zhang, 2011) to understand the details of particular events, subjects, or even the changes in an organization. These collections are often built using tools like the Internet Archive’s subscription-based service Archive-It (https://archive-it.org). When such collections are encountered by those who did not build them, how are these third parties to know what they contain?
Web pages exist in the “perpetual now” and are updated with new content as needed. The Memento protocol (Van de Sompel et al., 2013) uses the term original resource to refer to the current version of the web page on the live web. Curators build web archive collections by employing software known as a crawler. A crawler visits the original resource, and the representation captured at crawl time is known as a memento, a version of the page now in the archive that will no longer change even if the live web version changes. Curators create a collection by choosing seeds, URLs of original resources from which to begin the crawl. Depending on the crawl parameters, the collection could include additional original resources that are not seeds (e.g., pages linked from a seed). Figure 1 displays a simplified view of this collection building. Curators select seeds based on a theme and crawl these seeds at different points in time; thus each seed produces multiple mementos, with each memento representing the seed page at a different point in time. In addition, a curator can instruct the software to follow all links from each page, resulting in many more mementos linked from the seed and then linked from those pages. For example, if a seed has three links to pages with three links each, a single crawl can lead to 13 documents being added to the collection. If this same seed is crawled three times, then 39 documents are added to the collection. This process causes web archive collections to grow to hundreds or thousands of documents.
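The arithmetic behind this growth is a simple geometric sum. The sketch below makes it explicit; the branching factor and depth are illustrative assumptions, not Archive-It defaults:

```python
def documents_per_crawl(branching: int, depth: int) -> int:
    """Documents captured in one crawl: the seed page itself plus every
    page reached by following `branching` links per page, `depth` levels deep."""
    return sum(branching ** level for level in range(depth + 1))

# A seed whose pages each have three links, followed two levels deep:
one_crawl = documents_per_crawl(branching=3, depth=2)  # 1 + 3 + 9 = 13
three_crawls = 3 * one_crawl                           # 39 documents in total
```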
Inspired by the work of AlNoamany et al. (AlNoamany et al., 2017a), we want to provide users with a visualization that allows them to understand a collection so that they can determine if the time spent evaluating these thousands of documents is worthwhile. Rather than synthesizing additional material, we want to intelligently sample from the mementos already in the collection. These sampled mementos become a story summarizing the collection. The right side of Figure 1 displays the storytelling part of the process. AlNoamany’s work visualized mementos using the now-defunct social media service Storify (Jones, 2017; Storify, 2017), but was this the best interface? Given a sample of mementos, how do we effectively visualize these stories so that a user understands the underlying collection?
Existing information retrieval (IR) research has focused on the concept of providing each search result to a user as a surrogate of the underlying web page. Figure 1(a) displays a surrogate from a Google search engine result page. Surrogates are used by search engines to answer a user’s question of “Will this link meet my information need?” Social media uses surrogates as well. Figure 1(b) displays the same URI rendered as a Facebook social card. In social media, surrogates answer the question of “Should I click on this?” The differences in use cases are subtle. Each surrogate is a summary of the page, often providing images, text, and metadata. We wish to use surrogates as well, but our use case is different from search engines and social media. In social media, the user focuses on a single surrogate. In IR, users compare many surrogates to each other, but only to discriminate between documents. We want to provide them with a cohesive story using the combination of many surrogates together as a single unit. Using a visualization of not one, but many surrogates, we want to answer the user’s question of “What does the underlying collection contain?” The mementos in this visualization are not search results, but a product of this automatic sampling. We wish to challenge conventional wisdom beyond aesthetics. Our goal is to demonstrate the utility of a given surrogate for our web archive collection use case. There are many types of surrogates. Which one best conveys the concepts of the underlying collection?
In this pilot work, we consider six different types of surrogates and how well they convey understanding of a collection. We compare the existing Archive-It surrogates, thumbnails of page screenshots, social cards, and three combinations of social cards and thumbnails. Our hypothesis is that surrogates with more information drawn from the source document produce better results, both in terms of time and understanding. Because we are evaluating surrogates for use in collection understanding rather than search engine result performance, we consider this to be a unique contribution. Overall, our results show that the type of surrogate does not influence the time to complete the task, but that social cards and social cards presented side-by-side with thumbnails probably provide better collection understanding than the existing text-based Archive-It interface (Figure 4). We find that our participants interact most with the social card side-by-side with thumbnails and second most with screenshots alone. While Archive-It is our focus, our results can be applied to other web archiving platforms, such as Webrecorder (https://webrecorder.io/). These results are important for understanding not only which surrogate performs best for our web archiving summaries, but also which performs best for social media, live web curation platforms, and bookmarking applications.
With more than 8,000 collections (Jones et al., 2018a) by the end of 2017, the Internet Archive’s Archive-It is the largest web archive collection platform. It allows curators to easily select seeds and control crawling behavior. By default, Archive-It starts each crawl at a seed and creates mementos of other linked documents from the same web site until it reaches a preconfigured document count, storage limit, or time limit. With each curator’s subscription comes a pre-established data storage limit, bounding the size of all of their collections. Thus, it is in their best interest not to create an excessive number of mementos. Curators can change crawling behavior in a variety of ways ranging from asking Archive-It to only crawl the single page to expanding the crawling scope to include connected web sites. It is difficult for an outsider to know which crawling behavior was selected at the time of crawl without again crawling the resulting mementos.
Archive-It provides a search interface allowing a user to find collections matching certain key words. Figure 3 demonstrates that searches for topics such as “human rights” return more than 30 collections. When a third-party user accesses one of these collections, they are greeted by an interface like that shown in Figure 4. This interface is seed-centric, driving users to explore the collection first via the URLs of seeds and the metadata supplied by curators. To understand the collection, a user must follow a link from this seed interface to a list of mementos for that seed. These mementos are accessible via URIs like any other web resource. To differentiate them from original resource URIs, we refer to memento URIs with the Memento protocol (Van de Sompel et al., 2013) standard nomenclature URI-M. The user clicks on a link to a URI-M from that list to then read its contents. The user can then follow links to other mementos until they reach a page that was not archived. From there, they can select another memento from the same seed or start again with a link from the seed. This is a tedious process, requiring the user to go through thousands of documents to understand the collection. A human trying to decide between many collections would need to go through many documents one at a time using this interface. To narrow down the number of mementos to review, a user can employ the Archive-It search engine on a single collection, but they must know enough about the collection to form a query.
Each Archive-It collection has a page, shown in Figure 4, that allows end users to view metadata about the collection and search within its contents via traditional IR techniques such as facets and search forms. Metadata is optional and may not be present on seeds or even entire collections.
Each seed (not memento) has its own surrogate in the Archive-It interface. Curators can enhance these surrogates with metadata, but again it is optional. The storage of this metadata is already handled by the database backend of Archive-It. We analyze Archive-It surrogates and discuss their metadata in Section 4.
Browser thumbnails are screen captures of a web page rendered in a browser. Kopetzky demonstrated the use of thumbnails as surrogates as early as 1999 (Kopetzky and Mühlhäuser, 1999). Shown in Figure 3, the UK Web Archive uses browser thumbnails as surrogates for mementos in its collections. These browser thumbnails are also used by other collection visualization tools, such as TMVis (Weigle and Nelson, 2017) and What Did It Look Like? (Nwala, 2015), to show how seeds change over time. Even though their generation can be automated with tools like Puppeteer (https://pptr.dev), browser thumbnails require significant resources to create. Generation involves launching a browser, loading the page with all of its images and scripts, and then taking a screenshot. In addition to the memory and processing needed, thumbnails also require multiple network connections to retrieve all resources for a page. In aggregate, browser thumbnails can also be costly to store, leading the UK Web Archive to store thumbnails only for seeds, not linked pages (Jackson, 2019). This cost in time and resources has led to research focused on optimizing the selection of mementos worthy of thumbnails (AlSum and Nelson, 2014). The UK Web Archive uses thumbnails only 98 pixels wide. Because we seek to evaluate understanding, the thumbnails in this study are 208 pixels wide, established by Kaasten as the optimal size for high recognition (Kaasten and Greenberg, 2002).
A common surrogate found in social media is the social card, like the Facebook example in Figure 1(b). Social cards typically contain an image selected from the underlying web page, the title of that page, and some text sampled from the page. Social cards can require fewer HTTP requests than thumbnails. They extract existing content from the page and do not require the time and space needed to create and store new content, such as a thumbnail. The popularity of social cards has encouraged both Twitter and Facebook to recommend specific HTML metadata fields so that authors can control how cards are generated from their pages (http://ogp.me/, https://developer.twitter.com/en/docs/tweets/optimize-with-cards/guides/getting-started). We know of no web archives that currently use social cards as surrogates for their mementos.
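Card generators typically read these recommended metadata fields straight out of the page’s `<head>`. A minimal sketch using only Python’s standard library; the `og:` property names come from the Open Graph protocol, while the example page HTML is invented for illustration:

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> fields for a social card."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.fields[prop] = attrs["content"]

page = """<html><head>
<meta property="og:title" content="Example Article" />
<meta property="og:image" content="https://example.com/lead.jpg" />
<meta property="og:description" content="A short summary of the page." />
</head><body>...</body></html>"""

parser = OpenGraphParser()
parser.feed(page)
# parser.fields now holds the card's title, image, and description
```

When these fields are absent, generators fall back to heuristics such as the page `<title>` and the first large image, which is part of why card quality varies so much across pages.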
Services such as Embed.ly (https://embed.ly) exist to produce social cards of live web resources. When used to generate social cards for mementos, they create a poor or confusing experience for users (Figure 5(a)). For this reason, we have developed MementoEmbed, an archive-aware platform that accepts a URI-M and then generates either a social card or a thumbnail for that memento (Jones, 2018b). In addition to the image, title, and text provided by most social cards, MementoEmbed also provides the date and time of the observation leading to the memento, its original domain name and favicon, the name and favicon of the web archive holding it, and links to other versions of this same page (Figure 5(b)). Most of this data comes from the underlying Memento protocol supported by many web archives (Van de Sompel et al., 2013). MementoEmbed is used to generate the social cards and thumbnails used in our study.
To evaluate our surrogates in terms of understanding, we have several requirements. We must recruit participants and we must also provide a consistent environment to evaluate their understanding. We must then evaluate how well the participants demonstrate that they understand some elements of the underlying collection by viewing the surrogates.
To recruit a sufficient number of participants for this study, we turned to Mechanical Turk (MT). MT provides a web interface for participants to view information and fill out surveys. MT participants are paid for their submissions. Each assignment in MT is referred to as a Human Intelligence Task (HIT). MT has been used in other visualization studies with great success. It has allowed researchers to verify earlier results with a larger set of participants (Kosara and Ziemkiewicz, 2010; Bartneck et al., 2015), and others have used it to test the effectiveness of new visualization techniques (Heer and Bostock, 2010). As our surrogates are visualizations of underlying mementos, this past support provides confidence in MT as a recruitment tool.
Bloom’s taxonomy (Bloom et al., 1956) and Anderson and Krathwohl’s later revision (Anderson et al., 2001) provide definitions for different levels of cognitive effort with respect to learning a subject. Kelly applies these concepts to the development of search tasks (Kelly et al., 2015) for IR studies. In our study, we focus on two levels from this taxonomy. The remember process requires that the participant demonstrate the ability to identify and retrieve specific facts. The understanding process requires that the participant infer and construct additional meaning based on what they have learned already. We evaluate a user’s ability to remember by giving them 30 seconds to view a visualization before presenting them with a question. We evaluate their ability to understand by asking them to select which mementos from a list likely come from the collection that they just viewed.
3. Related Work
Summarization of Archive-It collections using surrogates was pioneered by AlNoamany et al. (AlNoamany et al., 2017a). She focused on the use of Storify as the target visualization platform, but Storify has been shut down (Jones, 2017; Storify, 2017). Storify used social cards exclusively, and AlNoamany et al. did not evaluate other surrogate types.
All of the following studies evaluated surrogates in terms of search engine result relevance. In 2001, Woodruff et al. attempted to improve upon the browser thumbnail by introducing the “enhanced thumbnail”, which also included highlighted and enlarged text to further convey aboutness (Woodruff et al., 2001). As search result surrogates, she discovered that thumbnails outperformed text alone, and enhanced thumbnails outperformed thumbnails. Unfortunately, the discovery and enlargement of text made enhanced thumbnails computationally expensive to create. In 2009, Teevan et al. further sought to replace the thumbnail with the “visual snippet” (Teevan et al., 2009). Visual snippets consist of a 120-by-120 pixel image representing the page constructed from an internal image, the title, and a logo. Her user testing showed that they performed better than thumbnails. She also demonstrated that text alone performed better than thumbnails. Dziadosz and Chandrasekar (Dziadosz and Chandrasekar, 2002) found that text alone combined with thumbnails performed better than merely text alone and that text alone performed better than thumbnails alone. Aula et al. (Aula et al., 2010) discovered no difference in performance between text alone and thumbnails. She also examined text combined with thumbnails and found no difference in performance.
In more recent years, social cards have become a topic of study. Al Maqbali et al. (Al Maqbali et al., 2010) discovered no performance difference between text combined with thumbnail, social card, or text alone. Loumakis discovered no performance difference between text snippets or social cards (Loumakis et al., 2011). Capra et al. (Capra et al., 2013) discovered that social cards were barely more performant than text snippets for search.
These studies all consider how well these surrogates perform for the purpose of relevance judgements in search results. The surrogate only needed to answer a single question for the user: “Will this link meet my information need?” We differ by considering how well the surrogates themselves convey understanding when presented together as a story summarizing a web archive collection, answering the question of “what does the underlying collection contain?” Our study also provides a unique contribution in this space, as none of these prior studies compare browser thumbnails to social cards directly.
We have chosen MT as a recruitment tool for evaluating our visualizations. Kittur et al. (Kittur et al., 2008b) evaluated using MT for complex tasks. He cautions that participants are encouraged to complete tasks quickly to increase their rate of pay, which sometimes results in nonsense answers; thus MT is “best suited for tasks in which there is a bonafide answer”. His study showed that MT could be used for the complex task of rating the quality of Wikipedia articles, producing similar results to human Wikipedia curators. Bartneck et al. (Bartneck et al., 2015) asked participants to rate expressions on LEGO minifigure faces and discovered that MT participants performed as well as participants from in-person studies. Heer et al. (Heer and Bostock, 2010) repeated a well-known visualization study using MT participants. Heer showed participants different visualizations and asked them to identify the smaller of two marked values, then to estimate what percentage the smaller was of the larger. Heer’s results were consistent with the original study, showing that position outperformed length in terms of human cognition. After establishing that MT participants were consistent with in-person studies, Heer went on to evaluate new visualizations. Micallef et al. (Micallef et al., 2012) evaluated different visualization techniques for understanding the results of Bayesian problems. She confirmed that MT participants did not perform better with any of the visualizations. Yu et al. (Yu et al., 2013) used MT participants to discover which pictograms better described at-home medical procedures. The success of these studies informs our choice of MT as a recruitment tool.
4. Evaluation of Archive-It Surrogates
Before discussing the results of evaluating different surrogates against each other, we first quantify the information available from Archive-It surrogates. Rather than accepting colloquial reports about the variation in Archive-It surrogates, we used the aiu Python package (Jones, 2018a) to collect the metadata of 5,857 public Archive-It collections in March 2019. Our goal was to understand the amount of metadata available with most Archive-It surrogates.
Archive-It surrogates contain two sources of metadata: the archiving process and the original resource. A seed’s minimal Archive-It surrogate contains information from the archiving process: the seed’s URL, the dates of the first and last memento, and the number of mementos available. The third surrogate in Figure 4 is an example of a minimal surrogate. The title is an example of metadata derived from the original resource. It may be extracted at the time the seed is added to the collection or may be manually added later by the curator. These original resource fields are optional. Curators may supply metadata for seeds, but not for their mementos (Praetzellis, 2016). The metadata available is based on a structural vocabulary provided by Archive-It. Most of these fields come from Dublin Core (Apps, 2013), with some Archive-It specific fields like group. The curator can also supply fields from their own freeform vocabulary.
In addition to often being absent, metadata is inconsistently applied among surrogates, as seen in Figure 4. As shown in Figure 7, of the 602,944 seeds gathered, 329,178 have no metadata, meaning that 54.60% of seeds are represented by the minimal Archive-It surrogate. These seeds convey only the URL and information from the archiving process. Of the 84,558/602,944 (14.02%) seeds using one metadata field, the top three fields in use are group (40,144/84,558), title (29,175/84,558), and coverage (8,665/84,558). Group allows the curator to create sub-collections of seeds. The coverage field corresponds to the Dublin Core field of the same name.
Figure 8 shows the top ten fields in use by any seed, regardless of the number of fields per seed. In this case title is the most widely used metadata field, being present in 177,680/602,944 (29.5%) seeds. The description field is in use by 110,065/602,944 (18.3%) seeds. If metadata fields are provided by the curator, such as a title or description, the Archive-It surrogate begins to resemble surrogates typically found in search engine results. The two fields together are used on 75,575/602,944 seeds, meaning that 12.53% of Archive-It seeds contain the same metadata fields as a Google surrogate.
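These tallies can be reproduced directly from the harvested seed metadata. A minimal sketch, with invented seed records standing in for the output of aiu:

```python
from collections import Counter

# Hypothetical seed records: each maps a seed URI to its metadata fields.
seeds = {
    "https://example.com/a": {"title": "A", "description": "..."},
    "https://example.com/b": {"group": "news"},
    "https://example.com/c": {},  # minimal surrogate: no metadata at all
}

# How often each metadata field appears across all seeds.
field_usage = Counter(field for meta in seeds.values() for field in meta)

# Seeds represented only by the minimal Archive-It surrogate.
minimal = sum(1 for meta in seeds.values() if not meta)

# Seeds carrying the same fields as a Google-style surrogate.
google_like = sum(1 for meta in seeds.values()
                  if "title" in meta and "description" in meta)
```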
Some collections, such as Government of Canada Publications (ID 3572, https://archive-it.org/collections/3572), have hundreds of thousands of seeds, making the addition of metadata a costly proposition in terms of manual time and effort. Does this cost affect the behavior of the curator? For each collection, regardless of size, we counted how many metadata fields were applied to all seeds in the collection. We then divided the number of fields counted by the number of seeds to produce the mean metadata field count per collection. Figure 9 shows a point for each collection where the y-axis is the mean metadata field count and the x-axis is the number of seeds. This graph displays a pattern whereby an increase in the number of seeds corresponds to a decrease in the number of metadata fields used to describe those seeds. This matches our intuition: because each metadata field requires some level of effort to maintain, the curator supplies fewer metadata fields as the number of seeds increases. The mean metadata field count for 3,096/5,857 (52.86%) collections is 0, again indicating that a majority of collections only contain minimal Archive-It surrogates.
These results appear to support our intuition that many of the Archive-It surrogates contain little information, but do they? How much information can be gathered from the seed URIs? As noted, there are many collections about the same topic, so there is some overlap in choice of seed URIs by different curators. There are 14,179 repeated seed URIs across Archive-It collections, meaning that only 588,749 unique seed URIs exist in Archive-It. From those seed URIs, we employed regular expressions from Alkwai’s work (Alkwai, 2019) to detect different forms of crude information available in the seed URIs from Archive-It. As shown in Figure 10, it is possible for a seed URI to belong to all three information classes.
Our regular expressions detected dates in the paths of only 55,924/588,749 (9.50%) seed URIs. Such dates are typically the publication dates of blog posts or news articles. Dates can provide the viewer with a concept of aboutness with respect to the time period of a collection.
Long strings may indicate the presence of phrases or sentences. We define a long string as a run of at least five alphabetic characters, followed by an underscore or other separator, followed by another run of at least five alphabetic characters. We discovered that 62,370/588,749 (10.59%) seed URIs contained long strings in their domain names.
We borrow the term slug from journalism, where it indicates a shortened title for an article. Slugs are detected in the path part of a URI using the same rules as long strings. We discovered that 177,441/588,749 (30.14%) seed URIs contained slugs in their path.
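The three information classes can be approximated with regular expressions. The patterns below are our own illustrative reconstructions, not Alkwai’s exact expressions, and the example URI is invented:

```python
import re
from urllib.parse import urlparse

DATE_IN_PATH = re.compile(r"/\d{4}/\d{1,2}(/\d{1,2})?/")   # e.g. /2011/02/14/
LONG_STRING  = re.compile(r"[a-zA-Z]{5,}[_-][a-zA-Z]{5,}") # word, separator, word

def classify(uri: str) -> set:
    """Return the information classes detected in a seed URI."""
    parsed = urlparse(uri)
    classes = set()
    if DATE_IN_PATH.search(parsed.path):
        classes.add("date")
    if LONG_STRING.search(parsed.netloc):
        classes.add("long string")      # phrase in the domain name
    if LONG_STRING.search(parsed.path):
        classes.add("slug")             # shortened title in the path
    return classes

classify("http://tahrir-diaries.example.org/2011/02/14/protest-continues-downtown")
# → {"date", "long string", "slug"}
```

As the example shows, a single URI can fall into all three classes at once, matching the overlap shown in Figure 10.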
Figure 11 displays the results of this analysis. These results indicate that, in spite of missing metadata, information can still be gleaned from the URIs found in Archive-It surrogates.
5. Comparing Surrogates
We conducted prototype studies in Fall 2018. Rather than developing reading comprehension questions or constructing artificial search tasks, we chose something more easily measurable and verifiable: a checklist of known correct or incorrect items that we could keep consistent between participants viewing the same collection. This would more directly let us compare their performance. These prototypes taught us to avoid tasks that would overly favor one surrogate over another. We also wanted to ensure that the participants did not rely on their own knowledge and instead used the information from the visualization they were presented to answer the question.
In January 2019 we presented 120 MT participants with a link to a survey hosted at Old Dominion University. We produced four stories represented by six different surrogates for 24 different combinations of surrogates and stories. This gave us five participants per story-surrogate combination, providing 20 participants per surrogate type. The MT participants were required to have the Master Turker qualification and an acceptance rate of greater than 95%. To control for the effects of learning (Kelly, 2007), we employed UniqueTurker (http://uniqueturker.myleott.com) to ensure that the same participant did not provide results for multiple surveys. Each participant was paid $0.50 to complete the task.
After reading the instructions, each participant was given 30 seconds to view a story using a given surrogate. They were then asked a question about what they had just seen. As is common practice for externally hosted surveys on MT, once they submitted their results, they were given a completion code for the MT HIT so that we could map their results to those collected by our survey.
Table 1. AlNoamany’s 2016 story dataset (AlNoamany et al., 2017b).

| Collection ID | Collection Name | Collected By | Diversity of Original Resource Domain Names | Collection Lifespan | Collection Size (# of Mementos from Seeds) | Story Size | % of Story with Good Surrogates |
|---|---|---|---|---|---|---|---|
| 694 | April 16 Archive | VT: Crisis, Tragedy, etc. | 0.8391 | 48 weeks | 374 | 17 | 88.24% |
| 1784 | Earthquake in Haiti | IA Global Events | 0.7656 | 9 weeks | 1,080 | 27 | 85.19% |
| 2017 | Wikileaks 2010 Document Release Collection | IA Global Events | 0.575 | 3 years | 3,333 | 24 | 70.83% |
| 2358 | Egypt Revolution and Politics | American University in Cairo | 0.2585 | 7 years | 80,484 | 16 | 87.50% |
| 2535 | Brazilian School Shooting | VT: Crisis, Tragedy, etc. | 0.2604 | 5 days | 1,540 | 26 | 73.08% |
| 2823 | Russia Plane Crash Sept 7,2011 | VT: Crisis, Tragedy, etc. | 0.7843 | 1 week | 603 | 27 | 77.78% |
| 2950 | Occupy Movement 2011/2012 | IA Global Events | 0.5585 | 44 weeks | 31,863 | 15 | 100.00% |
| 3649 | 2013 Boston Marathon Bombing | IA Global Events | 0.3766 | 1.9 years | 2,421 | 27 | 96.30% |
| 3936 | United States Government Shutdowns | IA Global Events | 0.1177 | 4 years | 24,583 | 16 | 93.75% |
| 4887 | Global Health Events web archive | NLM | 0.2723 | 4 years | 9,204 | 35 | 100.00% |
As a source of stories to display to the participants, we selected four stories from AlNoamany’s 2016 dataset (AlNoamany et al., 2017b). Each story consists of ordered URI-Ms selected by a human curator to describe their collection. Details of the full dataset are shown in Table 1. Some collections have mementos that are no longer available, possibly because they were removed by the curator. Some collections also have mementos that produce poor quality thumbnails. If a thumbnail failed to contain at least a heading describing some of the content within the memento, we considered it to be of poor quality. The last column in this table lists the percentage of the story that produced good quality surrogates.
We did not select the collections Global Health Events web archive (ID 4887) or United States Government Shutdowns (ID 3936) because they have been repurposed to suit a larger topic, and hence their 2016 stories no longer accurately reflect their content. Our four selections represent a variety of structural and semantic considerations. Occupy Movement 2011/2012 (ID 2950) was selected because it produces the best quality thumbnails. April 16 Archive (ID 694) has the highest diversity of original resource domain names in its URIs (Jones et al., 2018a). Egypt Revolution and Politics (ID 2358) is still being maintained and hence is the longest-lived collection in the set. Russia Plane Crash Sept 7,2011 (ID 2823) covers an event that is likely unfamiliar to American MT participants.
To compare against the as-is interface at Archive-It, we generated a facsimile of the Archive-It surrogates using Archive-It’s stylesheets as well as metadata gathered using aiu (Jones, 2018a). An example story using the Archive-It Facsimile surrogate is shown in Figure 12.
To produce the two correct answers, we randomly selected two URI-Ms from the same collection as the story shown to the participants. In choosing these URI-Ms, we discarded ones that used the same original resource domain as any memento in the story, avoiding issues where simple banners or logos might indicate that they are from the same collection.
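This selection procedure can be sketched as filtering by original resource domain before sampling. The URI-M parsing below is naive (it assumes an Archive-It style URI-M layout of replay prefix, timestamp, then the original URI, with no public-suffix handling), and all URI-Ms are invented:

```python
import random
from urllib.parse import urlparse

def original_domain(urim: str) -> str:
    """Naively recover the original resource's domain from an
    Archive-It style URI-M (.../<timestamp>/<original URI>)."""
    original = urim.split("/http", 1)[-1]
    return urlparse("http" + original).netloc

def pick_correct_answers(collection_urims, story_urims, k=2, seed=0):
    """Sample k URI-Ms from the collection whose original domains
    do not appear anywhere in the story."""
    story_domains = {original_domain(u) for u in story_urims}
    candidates = [u for u in collection_urims
                  if original_domain(u) not in story_domains]
    return random.Random(seed).sample(candidates, k)
```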
Table 2. Jaccard distances between the entity sets of each pair of collections.

| Collection | 694 | 1784 | 2017 | 2358 | 2535 | 2823 | 2950 | 3649 | 3936 | 4887 |
|---|---|---|---|---|---|---|---|---|---|---|
| 694 – April 16 Archive | 0.000 | 0.969 | 0.970 | 0.981 | 0.961 | 0.968 | 0.986 | 0.962 | 0.978 | 0.974 |
| 1784 – Earthquake in Haiti | 0.969 | 0.000 | 0.959 | 0.971 | 0.960 | 0.975 | 0.983 | 0.967 | 0.972 | 0.961 |
| 2017 – Wikileaks 2010 Document Release Collection | 0.970 | 0.959 | 0.000 | 0.962 | 0.953 | 0.977 | 0.965 | 0.959 | 0.956 | 0.966 |
| 2358 – Egypt Revolution and Politics | 0.981 | 0.971 | 0.962 | 0.000 | 0.958 | 0.985 | 0.965 | 0.971 | 0.955 | 0.970 |
| 2535 – Brazilian School Shooting | 0.961 | 0.960 | 0.953 | 0.958 | 0.000 | 0.974 | 0.967 | 0.955 | 0.952 | 0.961 |
| 2823 – Russia Plane Crash Sept 7,2011 | 0.968 | 0.975 | 0.977 | 0.985 | 0.974 | 0.000 | 0.992 | 0.978 | 0.987 | 0.977 |
| 2950 – Occupy Movement 2011/2012 | 0.986 | 0.983 | 0.965 | 0.965 | 0.967 | 0.992 | 0.000 | 0.974 | 0.942 | 0.981 |
| 3649 – 2013 Boston Marathon Bombing | 0.962 | 0.967 | 0.959 | 0.971 | 0.955 | 0.978 | 0.974 | 0.000 | 0.961 | 0.968 |
| 3936 – United States Government Shutdowns | 0.978 | 0.972 | 0.956 | 0.955 | 0.952 | 0.987 | 0.942 | 0.961 | 0.000 | 0.966 |
| 4887 – Global Health Events web archive | 0.974 | 0.961 | 0.966 | 0.970 | 0.961 | 0.977 | 0.981 | 0.968 | 0.966 | 0.000 |
To produce the four incorrect answers, we selected four other URI-Ms from semantically different collections. To determine which collections were semantically different from our story collection, we extracted entities from each collection in AlNoamany’s dataset using Stanford NLP (Manning et al., 2014). We then computed the Jaccard distance between these entity sets and selected the two collections with the greatest distance from our story collection. We randomly selected two URI-Ms from each of the most distant and second most distant collections. The distances between collections are shown in Table 2. For the question about the Egypt Revolution and Politics collection, shown in Figure 18, the collection Russia Plane Crash Sept 7,2011 (ID 2823) has the greatest distance at 0.985. With a distance of 0.981, April 16 Archive (ID 694) comes in second. Hence, two mementos were selected from each of these collections as incorrect answers.
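The distance measure itself is straightforward. A minimal sketch with invented entity sets (the real sets come from Stanford NLP entity extraction over each collection):

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint ones."""
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

# Hypothetical entity sets extracted from two collections:
egypt  = {"Cairo", "Tahrir Square", "Mubarak", "Egypt"}
russia = {"Yaroslavl", "Lokomotiv", "Russia", "Egypt"}

jaccard_distance(egypt, russia)  # → 1 - 1/7 ≈ 0.857
```

Because the measure ignores entity frequency, two collections that mention even many of the same entities a different number of times still compare on shared vocabulary alone, which suits this screening task.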
In all cases, we discarded URI-Ms that produced poor quality thumbnails to ensure that the quality of the memento did not affect the participant’s choice. We also discarded URI-Ms that were off-topic, such as maintenance pages or 404 pages, as described in (Jones et al., 2018b). If a URI-M was discarded, we redrew to ensure that there were two selections from the collection they had just viewed, two selections from the most semantically distant collection, and two selections from the second most distant collection. We then randomly sorted the six URI-Ms and generated the surrogates. MementoEmbed cards contain the name of the collection from which they were selected. To avoid giving an unfair advantage to social cards, the collection name was removed from the social cards used in the question. Appendix B contains screenshots of the questions shown to study participants.
Table 3 displays the mean and median question completion times for each surrogate. At 149.53 seconds, the Archive-It Facsimile surrogates have the highest mean time for answering the question. Browser thumbnails come in second highest at 111.22 seconds. Social cards have the lowest overall mean at 46.12 seconds. The sc+t and sc^t surrogates have means slightly greater than 62 seconds, and the sc/t surrogate comes in slightly higher at 62.86 seconds. We executed Student’s t-test on the times for all pairs of surrogates. No differences are statistically significant. Social cards compared to browser thumbnails produces the lowest p-value, followed by social cards compared to the Archive-It Facsimile. In spite of the mean values, these p-values indicate that our results provide only weak evidence that the Archive-It Facsimile or thumbnails take the most time to evaluate or that social cards take less time. The medians demonstrate that outliers are skewing these means. The Archive-It Facsimile has the lowest median at 33.46 seconds, followed by social cards at 35.89 seconds. The median completion time for browser thumbnails is highest at 53.30 seconds. The combinations of social card and thumbnail all have medians between 38 and 40 seconds. Thus, even though browser thumbnails have the highest median, the p-values demonstrate that we have not established that thumbnails take longer to process.
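A minimal sketch of one pairwise comparison, assuming the equal-variance form of the two-sample t statistic; the completion times below are hypothetical stand-ins, not our recorded data. The p-value would then come from the t distribution with n_a + n_b − 2 degrees of freedom:

```python
from math import sqrt
from statistics import mean, variance

def students_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance
    (the equal-variance form)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical completion times (seconds) for two surrogate types.
social_cards = [35.0, 40.2, 31.8, 55.1, 38.4]
thumbnails = [53.3, 61.0, 48.7, 120.5, 44.9]
t = students_t(social_cards, thumbnails)
print(t)
```

In practice one would use a library routine such as `scipy.stats.ttest_ind`, which also returns the p-value; the hand-rolled version above only shows where the statistic comes from.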
Table 4 displays the mean and median number of correct answers for each surrogate. With only two correct answers out of six, the distribution of possible scores is small. Social cards score highest with a mean correct answer score of 1.75, followed by a tie between sc+t and sc^t at 1.70. The Archive-It Facsimile mean is the lowest at 1.30. The medians are 2.0 for all surrogates except the Archive-It Facsimile at 1.5. The Archive-It Facsimile paired with the social card comes closest to statistical significance, followed by Archive-It vs. sc+t and Archive-It vs. sc^t. Within collection 2358, social cards, sc+t, and sc/t all fare better than the Archive-It Facsimile. Within collection 2950, social cards and sc+t both fare better than the Archive-It Facsimile. Familiarity with the topics of some collections may have influenced the results, which is why we selected different collections for this evaluation. The close p-values indicate that our general results comparing social cards to the Archive-It surrogate are similar to those of Capra et al. (Capra et al., 2013), even though Capra et al. focus on information retrieval rather than summarization.
The variation in the quality of Archive-It Facsimile surrogates may also have shaped the results. Some of the Archive-It surrogates in the story for the Egypt collection contained as many as 12 additional metadata fields, while others from the same collection were minimal Archive-It surrogates. Almost all of the surrogates in the story for the Occupy collection contained only the additional metadata field Group. In those cases, Group contained values like Social Media and News Sites and Articles, text that provides little information specific to the collection. In contrast, almost all Archive-It surrogates for stories from the Russia and VATech collections contained the additional title metadata field. For a story consisting mostly of minimal Archive-It surrogates, it is possible that a small number of metadata-rich surrogates provided enough information for the user to effectively answer the question.
Because each story has a different size, it is difficult to normalize the recorded user interactions across all stories. We chose to tally the number of users who hovered over images, hovered over links, and clicked links. The results are shown in Figure 19. This engagement gives some insight into the amount of work each participant put into interacting with the story that they viewed. Recall that there are 20 total participants for each surrogate type.
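The tally could be computed from raw interaction logs along these lines; the event tuples and labels below are illustrative assumptions about the log format:

```python
from collections import defaultdict

def tally_participants(events):
    """Count, per (surrogate type, interaction type), the number of
    distinct participants who performed that interaction at least once.
    Each event is a (participant_id, surrogate_type, interaction) tuple."""
    seen = defaultdict(set)
    for participant, surrogate, interaction in events:
        seen[(surrogate, interaction)].add(participant)
    return {key: len(ids) for key, ids in seen.items()}

# Hypothetical event log entries.
events = [
    ("p1", "social card", "link hover"),
    ("p1", "social card", "link hover"),   # repeat hovers count once
    ("p2", "social card", "link click"),
    ("p1", "thumbnail", "thumbnail hover"),
    ("p3", "thumbnail", "link click"),
]
print(tally_participants(events))
```

Counting distinct participants rather than raw events is what makes the measure comparable across stories of different sizes.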
Social cards produced interactions from the fewest participants. The most engagement with social cards involved link hovering: we recorded 10 participants who hovered over links in social cards, but only one participant clicked a link. Only two participants hovered over images. As noted above, participants also spent the least amount of time with social cards when answering the question, and typically answered the questions more accurately. It is possible that they felt that social cards provided sufficient information to answer the question quickly and correctly.
The Archive-It surrogate has no images, and hence no image hovers. We recorded 18 participants hovering over links, but only five participants clicked on them. In spite of no images being present, participants hovered over and clicked more links than with social cards. Considering their performance in both time and correct answers, it is possible that they felt that more interaction was necessary to understand the story.
With browser thumbnails the image is the anchor of the link, hence every hover over an image is also a hover over a link. To account for this, we created a separate category named “thumbnail hovers” combining link and image hovers for thumbnails. Browser thumbnails experienced the most link clicks with 18 participants choosing to open the page behind the surrogate rather than just relying upon the surrogate alone. We recorded 19 participants hovering over thumbnails for this surrogate type. We did not measure if the user magnified each thumbnail or how long they viewed the pages that they opened.
The sc+t surrogate consists of a social card with a thumbnail beside it, also 201 pixels wide, meaning that the sc+t surrogate also supports thumbnail hovers. This surrogate type produced interactions from the most participants. This level of engagement is surprising considering that the mean completion time for sc+t is shorter than that of browser thumbnails. We recorded 17 participants hovering over the thumbnail portion of the sc+t surrogate. This is two fewer participants than for the browser thumbnail surrogate, but it still indicates a lot of mouse movement around thumbnails. Only two participants chose to hover over the non-thumbnail images on the social cards. This surrogate type did not inspire as much link clicking as browser thumbnails, with only seven participants clicking links. This is still higher than social cards alone, where only one participant clicked links.
The sc/t surrogate contains a thumbnail instead of the striking image normally found in social cards, so the image hovers are actually over thumbnails. We do not count them as thumbnail hovers because these images are not also anchors for links. For sc/t, four participants hovered over images, 17 participants hovered over links, and seven participants clicked links on these surrogates. This difference in behavior, coupled with the different response times and accuracy for sc/t compared to social cards, suggests that including the thumbnail rather than a striking image drawn from the page may inspire more activity on the part of the user.
The sc^t surrogate provides a thumbnail if the user hovers over the striking image. Only four participants actually discovered this capability. In addition, 13 participants hovered over links, and seven participants clicked links.
Social cards inspired the fewest user interactions and the fewest link clicks. Perhaps the social card inspired more confidence, so fewer participants needed to view the pages behind them. In contrast, the most users clicked on thumbnails to open links. Perhaps they found the thumbnails harder to read and felt less confident about their content. The most participants interacted with the sc+t surrogate in some way, and more link clicks occurred in all cases where thumbnails were present. It is possible that our survey measured users zooming in on thumbnails to see them better. Link hovers have a strong positive Pearson correlation with completion time, but other interactions, including link clicks, had much weaker correlations to completion time. Link hovers have a weak negative correlation with answer accuracy, while other interactions had much weaker correlations to accuracy. It is possible that participants hovered over links to read the URLs in their browser status bar before making their choice.
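The correlations above are standard Pearson coefficients. A self-contained sketch, using hypothetical per-participant counts rather than our recorded data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant link-hover counts and completion times.
link_hovers = [0, 2, 3, 5, 8]
completion_times = [30.0, 45.5, 52.0, 70.2, 110.8]
r = pearson_r(link_hovers, completion_times)
print(r)
```

A value of r near +1 would mean participants who hovered more also took longer, near −1 the opposite, and near 0 no linear relationship.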
6. Future Work
In our previous work (Jones et al., 2018a), we organized Archive-It collections into four categories. The collections in this study fit into the category of type Time Bounded - Spontaneous. AlNoamany et al. (AlNoamany et al., 2017a) discuss different types of stories that can be derived from web archive collections. All of the stories used in this study are of the type sliding page, sliding time. A study examining if some surrogates perform better for other types of collections and other types of stories may be beneficial.
The type of question asked of the participant may also allow us to determine which aspects of these surrogates work best for different purposes. For example, if we present the participant with a series of images drawn from various collections, it may indicate how well images function for understanding.
How well do the contents of the surrogates compare to the underlying documents they visualize? Computing the overlap between the text present in the surrogate and the information of the documents they visualize may provide a measure of how well a surrogate is expected to perform. These results can be contrasted with how well users actually perform.
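One simple form of such an overlap measure is the fraction of the surrogate's distinct word tokens that also occur in the underlying document. This sketch, with invented example text, illustrates the idea rather than a settled metric:

```python
import re

def token_overlap(surrogate_text, document_text):
    """Fraction of the surrogate's distinct word tokens that also occur
    in the underlying document (1.0 = every surrogate token is present)."""
    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    surrogate_tokens = tokens(surrogate_text)
    if not surrogate_tokens:
        return 0.0
    return len(surrogate_tokens & tokens(document_text)) / len(surrogate_tokens)

# Hypothetical surrogate snippet and page text.
card_text = "Protests continue in Cairo as crowds gather"
page_text = ("Crowds gather in Cairo for a third day as protests continue "
             "across the city and government offices close early.")
print(token_overlap(card_text, page_text))
```

A refinement would weight tokens by tf-idf or restrict the comparison to extracted entities, so that common function words do not inflate the score.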
Our results are similar to those observed by Capra et al. (Capra et al., 2013). What other visualization elements from search engine result pages may be useful to our summarization efforts? Perhaps we should next explore concepts like entity cards (Bota et al., 2016) which summarize multiple resources from a collection that center on a specific entity.
Another area of interest to explore may be the sources of content. If users can identify sources via domain name on the social cards, full URI in the Archive-It surrogate, or recognizing layouts and logos in the browser thumbnail, then it may affect how they view the content of the story and hence the underlying collection.
Do users visually scan differently for thumbnails vs. social cards or the Archive-It like interface? Perhaps techniques like eye tracking can be introduced to evaluate their behavior to ensure that information is presented in a location optimized for their behavior.
Further measuring different interactions with other parts of the surrogate may offer additional insight. We assume that users are zooming in to better view thumbnails, but we have no way of measuring that at this time.
Determining the most effective visualization is only one important part of our work. The stories in this study were generated by human curators. We are also building on the work of AlNoamany et al. (AlNoamany et al., 2017a) by creating new algorithms to automatically select mementos that best represent the collection.
Surrogates have been used in the past to answer the question “should I click on this?” In this work, we instead consider the use of surrogates in a group to answer the question “what does the underlying collection contain?” We examined the variation in metadata present in Archive-It surrogates and found that, in spite of more than half of Archive-It surrogates missing data, information could still potentially be gleaned from the URL present in a minimal surrogate. We asked participants from MT to view a story visualized using a given surrogate. We then gave them a question with six mementos visualized using the same surrogate and asked them to choose the two of the six that they believed belonged to the same collection as the story they had just viewed. The type of surrogate does not influence the time to complete the task, but social cards and social cards side-by-side with thumbnails probably provide better collection understanding than the existing Archive-It interface. This is consistent with results from a study by Capra et al. (Capra et al., 2013) comparing the performance of social cards to text snippets in search results.
We also found that user interactions differ between surrogate types: the fewest participants interacted with social cards, while the combination of social card side-by-side with thumbnail encouraged the most participants to interact. Because participants also appear to hover and click more when thumbnails are present, we postulate that users engage more with browser thumbnails than with other surrogate elements, possibly to zoom in and see details.
For collection summarization, the overall goal of surrogates is to convey aboutness without requiring the user to click on the underlying link. In this case, social cards appear to require less interaction, provide higher accuracy, and allow the users to answer our question in less time. These results are encouraging for users of social cards. Social cards require fewer resources to generate and store than thumbnails. Archive-It surrogates require humans to construct metadata, but social cards can be generated dynamically from existing web page content. Users also appear to interact with social cards less, possibly indicating that they find them easier to use. These features indicate that social cards may be the best surrogate for use in summarizing web archive collections, displaying stories on live web curation platforms, viewing saved items in bookmarking applications, sharing on social media, and beyond.
Acknowledgements. This work has been supported in part by the Institute of Museum and Library Services (LG-71-15-0077-15).
- Al Maqbali et al. (2010) Hilal Al Maqbali, Falk Scholer, James Thom, and Mingfang Wu. 2010. Evaluating the Effectiveness of Visual Summaries for Web Search. In ADCS ’10. 1–8. http://www.cs.rmit.edu.au/adcs2010/proceedings/pdf/paper%2013.pdf
- Alkwai (2019) Lulwah Alkwai. 2019. Expanding the Usage of Web Archives by Recommending Archived Webpages Using Only the URI. Ph.D. Dissertation. Old Dominion University.
- AlNoamany et al. (2017a) Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. 2017a. Generating Stories From Archived Collections. In ACM WebSci 2017. 309–318. https://doi.org/10.1145/3091478.3091508
- AlNoamany et al. (2017b) Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson, and Shawn Jones. 2017b. Dataset: Human Stories Work for evaluating the DSA Framework. (2017). https://doi.org/10.6084/m9.figshare.5701054
- AlSum and Nelson (2014) Ahmed AlSum and Michael L. Nelson. 2014. Thumbnail Summarization Techniques for Web Archives. In ECIR ’14. 299–310.
- Anderson et al. (2001) Lorin W Anderson, David R Krathwohl, Peter W Airasian, Kathleen A Cruikshank, Richard E Mayer, Paul R Pintrich, James Raths, and Merlin C Wittrock. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives, abridged edition. Longman.
- Apps (2013) Ann Apps. 2013. Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata. http://dublincore.org/documents/dc-citation-guidelines/.
- Aula et al. (2010) Anne Aula, Rehan M. Khan, Zhiwei Guan, Paul Fontes, and Peter Hong. 2010. A comparison of visual and textual page previews in judging the helpfulness of web pages. In WWW ’10. 51–60. https://doi.org/10.1145/1772690.1772697
- Bartneck et al. (2015) Christoph Bartneck, Andreas Duenser, Elena Moltchanova, and Karolina Zawieska. 2015. Comparing the Similarity of Responses Received from Studies in Amazon’s Mechanical Turk to Studies Conducted Online and with Direct Recruitment. PLOS ONE 10, 4 (2015), 1–23. https://doi.org/10.1371/journal.pone.0121595
- Bloom et al. (1956) Benjamin S Bloom, David R Krathwohl, and Bertram S Masia. 1956. Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. David McKay Company.
- Bota et al. (2016) Horatiu Bota, Ke Zhou, and Joemon M. Jose. 2016. Playing Your Cards Right: The Effect of Entity Cards on Search Behaviour and Workload. In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval (CHIIR ’16). ACM, Carrboro, North Carolina, USA, 131–140. https://doi.org/10.1145/2854946.2854967
- Capra et al. (2013) Robert Capra, Jaime Arguello, and Falk Scholer. 2013. Augmenting Web Search Surrogates With Images. In CIKM ’13. 399–408. https://doi.org/10.1145/2505515.2505714
- Curty and Zhang (2011) Renata Gonçalves Curty and Ping Zhang. 2011. Social commerce: Looking back and forward. ASIS&T ’11 48, 1, 1–10. https://doi.org/10.1002/meet.2011.14504801096
- Dziadosz and Chandrasekar (2002) Susan Dziadosz and Raman Chandrasekar. 2002. Do Thumbnail Previews Help Users Make Better Relevance Decisions about Web Search Results?. In SIGIR ’02. 365–366. https://doi.org/10.1145/564376.564446
- Hafner and Palmer (2017) Katie Hafner and Griffin Palmer. 2017. Skin Cancers Rise, Along With Questionable Treatments. https://www.nytimes.com/2017/11/20/health/dermatology-skin-cancer.html. The New York Times (2017).
- Heer and Bostock (2010) Jeffrey Heer and Michael Bostock. 2010. Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design. In SIGCHI ’10. 203–212. https://doi.org/10.1145/1753326.1753357
- Jackson (2019) Andrew Jackson. 2019. Personal Communication with Andy Jackson of the UKWA.
- Jones et al. (2018a) Shawn Jones, Alexander Nwala, Michele Weigle, and Michael Nelson. 2018a. The Many Shapes of Archive-It. In iPres ’18. https://doi.org/10.17605/OSF.IO/EV42P
- Jones et al. (2018b) Shawn Jones, Michele Weigle, and Michael Nelson. 2018b. The Off-Topic Memento Toolkit. In iPres ’18. https://doi.org/10.17605/OSF.IO/UBW87
- Jones (2017) Shawn M. Jones. 2017. Storify Will Be Gone Soon, So How Do We Preserve The Stories? https://ws-dl.blogspot.com/2017/12/2017-12-14-storify-will-be-gone-soon-so.html.
- Jones (2018a) Shawn M. Jones. 2018a. Extracting Metadata from Archive-It Collections with Archive-It Utilities. http://ws-dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html.
- Jones (2018b) Shawn M. Jones. 2018b. A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages. http://ws-dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html.
- Kaasten and Greenberg (2002) Shaun Kaasten and Saul Greenberg. 2002. How People Recognize Previously Seen Web Pages from Titles, URLs and Thumbnails. People and Computers XVI - Memorable Yet Invisible (2002), 247–265. https://doi.org/10.1007/978-1-4471-0105-5_15
- Kelly (2007) Diane Kelly. 2007. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval 3, 1—2 (2007), 1–224. https://doi.org/10.1561/1500000012
- Kelly et al. (2015) Diane Kelly, Jaime Arguello, Ashlee Edwards, and Wan-ching Wu. 2015. Development and Evaluation of Search Tasks for IIR Experiments using a Cognitive Complexity Framework. In ICTIR ’15. 101–110. https://doi.org/10.1145/2808194.2809465
- Kittur et al. (2008a) Aniket Kittur, Ed H. Chi, and Bongwon Suh. 2008a. Crowdsourcing User Studies With Mechanical Turk. In CHI ’08. 453–456. https://doi.org/10.1145/1357054.1357127
- Kittur et al. (2008b) Aniket Kittur, Bongwon Suh, and Ed H. Chi. 2008b. Can You Ever Trust a Wiki?: Impacting Perceived Trustworthiness in Wikipedia. In CSCW ’08. 477. https://doi.org/10.1145/1460563.1460639
- Kopetzky and Mühlhäuser (1999) Theodorich Kopetzky and Max Mühlhäuser. 1999. Visual preview for link traversal on the World Wide Web. Computer Networks 31, 11-16 (1999), 1525–1532. https://doi.org/10.1016/S1389-1286(99)00050-X
- Kosara and Ziemkiewicz (2010) Robert Kosara and Caroline Ziemkiewicz. 2010. Do Mechanical Turks dream of square pie charts?. In BELIV’10. 63–70. https://doi.org/10.1145/2110192.2110202
- Loumakis et al. (2011) Faidon Loumakis, Simone Stumpf, and David Grayson. 2011. This Image Smells Good: Effects of Image Information Scent in Search Engine Results Pages. In CIKM ’11. 475–484. https://doi.org/10.1145/2063576.2063649
- Manning et al. (2014) Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In ACL ’14. 55–60. http://www.aclweb.org/anthology/P/P14/P14-5010
- Micallef et al. (2012) Luana Micallef, Pierre Dragicevic, and Jean-Daniel Fekete. 2012. Assessing the Effect of Visualizations on Bayesian Reasoning through Crowdsourcing. IEEE Transactions on Visualization and Computer Graphics 18, 12 (2012), 2536–2545. https://doi.org/10.1109/TVCG.2012.199
- Milligan (2019) Ian Milligan. 2019. History in the Age of Abundance? McGill-Queen’s University Press.
- Nwala (2015) Alexander Nwala. 2015. What Did It Look Like? http://ws-dl.blogspot.com/2015/01/2015-02-05-what-did-it-look-like.html.
- Praetzellis (2016) Maria Praetzellis. 2016. Add and edit metadata at the document level. https://support.archive-it.org/hc/en-us/articles/208012676-How-to-add-and-edit-metadata-at-the-document-level.
- Storify (2017) Storify. 2017. Storify End-of-Life. https://storify.com/faq-eol.
- Teevan et al. (2009) Jaime Teevan, Edward Cutrell, Danyel Fisher, Steven M. Drucker, Gonzalo Ramos, Paul André, and Chang Hu. 2009. Visual snippets: summarizing web pages for search and revisitation. In SIGCHI ’09. 2023–2032. https://doi.org/10.1145/1518701.1519008
- Van de Sompel et al. (2013) Herbert Van de Sompel, Michael L. Nelson, and Robert Sanderson. 2013. RFC 7089: HTTP Framework for Time-Based Access to Resource States – Memento. https://tools.ietf.org/html/rfc7089.
- Weigle and Nelson (2017) Michele C. Weigle and Michael L. Nelson. 2017. Visualizing Webpage Changes Over Time - new NEH Digital Humanities Advancement Grant. http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html.
- Woodruff et al. (2001) Allison Woodruff, Andrew Faulring, Ruth Rosenholtz, Julie Morrsion, and Peter Pirolli. 2001. Using Thumbnails to Search the Web. In SIGCHI ’01. 198–205. https://doi.org/10.1145/365024.365098
- Yu et al. (2013) Bei Yu, Matt Willis, Peiyuan Sun, and Jun Wang. 2013. Crowdsourcing Participatory Evaluation of Medical Pictograms Using Amazon Mechanical Turk. Journal of Medical Internet Research 15, 6 (2013), e108. https://doi.org/10.2196/jmir.2513
Appendix A The Current Collection Understanding Process for Archive-It Collections
Understanding an Archive-It collection is an iterative, tedious process. Figures 20 through 27 provide the steps necessary to manually achieve collection understanding for an Archive-It collection. To begin at step 1 (Figure 20), a user must first have a query. As seen in the search results from the screenshot in Figure 20, not all collections contain metadata, thus we are often left with only their collection titles to make a decision. In step 2 (Figure 21), we choose a collection from the list that we think will meet our information need. In step 3 (Figure 22), we view the collection, navigating through its seeds and choosing one in step 4 (Figure 23). Note that not all seeds have metadata. Once we have chosen a seed, we view its mementos in step 5 (Figure 24) and choose one. Note that this interface provides the dates for each memento, but no other information. In this example, there are 923 mementos for this seed, and this seed was one of 1,149 seeds in the collection. This first memento, however, is just the start of the crawl, and other mementos were captured that were linked from that page. Hence, in step 7 (Figure 26), we review the linked pages until we reach an Archive-It error page indicating that a linked page was not crawled. At this point, we understand the contents of a single crawl of a single seed of a single collection. To understand the rest of the collection, we must review the other seeds, their mementos, and their linked mementos (Figure 27).
If this collection meets our information need, then we only need to iterate from the seed level. If this collection does not meet our information need, then we have two options. We can restart at step 2 by choosing one of the other 17 collections that matched our search term; the first two collections in the list have 95 and 331 seeds to review, respectively. Alternatively, if we believe that our search terms were not successful, then we must restart at step 1 by reformulating our query.
Appendix B Screenshots Of Pages Shown to Study Participants
The following sections display the task instructions, questions, and an example completion code page shown to the study participants. Due to space limitations, we were unable to include screenshots of the stories themselves.