SHARI – An Integration of Tools to Visualize the Story of the Day

08/01/2020 ∙ by Shawn M. Jones, et al.

Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the "biggest story" for a given date. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by StoryGraph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest quality images for social media storytelling. Raintale then uses the output of these tools to produce a visualization of the news story for a given day. We name this process SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration).

1 Introduction

    {
      "config": "/files/config/polar-media-consensus-graph/f6e84be9969ecef7adb20689002608d0/",
      "connected-comps": [
        {
          "avg-degree": 4.318181818181818,
          "density": 0.10042283298097252,
          "node-details": {
            "annotation": "polarity",
            "color": "green",
            "connected-comp-type": "event"
          },
          "nodes": [
            0,
            1,
            additional node ids omitted for brevity
          ],
          "unique-source-count": 14
        },
        {
          "avg-degree": 1,
          "density": 1,
          "node-details": {
            "annotation": "polarity",
            "color": "red",
            "connected-comp-type": "cluster"
          },
          "nodes": [
            9,
            67
          ],
          "unique-source-count": 2
        }
      ],
      "links": [
        {
          "rank": 1,
          "sim": 0.57,
          "source": 2,
          "target": 21,
          "label": "1 (0.57)",
          "label-description": "rank (sim)"
        },
        additional link definitions omitted for brevity
        {
          "rank": 96,
          "sim": 0.3,
          "source": 53,
          "target": 73,
          "label": "96 (0.3)",
          "label-description": "rank (sim)"
        }
      ],
      "ner-version": "3.8.0",
      "nodes": [
        other nodes omitted for brevity
        {
          "entities": [
            {
              "class": "LOCATION",
              "entity": "Coney Island"
            },
            {
              "class": "LOCATION",
              "entity": "Brooklyn"
            },
            {
              "class": "PERSON",
              "entity": "Victor J. Blue"
            },

          ],
          "extraction-time": "2020-03-23T00:09:10.325362",
          "favicon": "https://www.nytimes.com/vi-assets/static-assets/favicon-4bf96cb6a1093748bf5b3c429accb9b4.ico",
          "id": "nytimes.com-1",
          "link": "https://www.nytimes.com/2020/03/22/health/coronavirus-restrictions-us.html",
          "node-details": {
            "annotation": "polarity",
            "color": "blue",
            "connected-comp-type": "event",
            "type": "left"
          },
          "published": "Sun, 22 Mar 2020 22:00:52 +0000",
          "rss-uri-m": "https://web.archive.org/web/20200323000609id_/https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
          "text": "Health |Harsh Steps Are Needed to Stop the Coronavirus, Experts Say\nhttps://nyti.ms/3dkfoCc\nA beach stroller in the Coney Island neighborhood of Brooklyn on Saturday.CreditVictor J. Blue for The New York Times\nHarsh Steps Are Needed to Stop the Coronavirus, Experts Say\nScientists who have fought pandemics describe difficult measures needed to defend the United States against a fast-moving pathogen.\nA beach stroller in the Coney Island neighborhood of Brooklyn on Saturday.CreditVictor J. Blue for The New York Times\nSupported by\nBy Donald G. McNeil Jr.\nMarch 22, 2020, 6:00 p.m. ET\nTerrifying though the coronavirus may be, it can be turned back. China, South Korea, Singapore and Taiwan have demonstrated that, with furious efforts, the contagion can be brought to heel.\nWhether they can keep it suppressed remains to be seen…",
          "title": "Harsh Steps Are Needed to Stop the Coronavirus, Experts Say - The New York Times"
        },
        other articles omitted for brevity
      ],
      "self": "http://storygraph.cs.odu.edu/graphs/polar-media-consensus-graph/#cursor=0&hist=1440&t=2020-03-23T00:09:10",
      "timestamp": "2020-03-23T00:09:10.707796Z",
      "graph-pointer": {
        "cursor": 0,
        "hist": 1440,
        "cur-path": "2020/03/23"
      }
    }
Figure 1: An abridged version of the JSON file generated by StoryGraph that drives the visualization in Figure 2.
Figure 2: The StoryGraph news similarity graph for March 23, 2020.
URL: http://storygraph.cs.odu.edu/graphs/polar-media-consensus-graph/#cursor=0&hist=1440&t=2020-03-23T00:09:10
Figure 3: The “biggest news story” of March 23, 2020 produced by the SHARI process.
URL: https://oduwsdl.github.io/dsa-puddles/stories/shari/2020/03/23/storygraph_biggest_story_2020-03-23/
Figure 4: Annotations detail which SHARI components provide each part of the visualization shown in Figure 3.
Figure 5: SHARI process for creating a visualization of the biggest news story for a given day

Tools such as Google News and Flipboard exist to convey daily news, but what about the news of the past? We have combined StoryGraph (http://storygraph.cs.odu.edu) with tools from the Dark and Stormy Archives Toolkit (https://oduwsdl.github.io/dsa/software.html) to produce the StoryGraph Hypercane ArchiveNow Raintale Integration (SHARI) process. These tools represent disparate research efforts in news analysis, corpus summarization, web archiving, and visualization. The integration produces a summary of the “biggest story” for a given date. SHARI combines the following components from Old Dominion University’s Web Science and Digital Libraries Research Group (https://ws-dl.cs.odu.edu): StoryGraph, Hypercane, ArchiveNow, and Raintale.

Nwala et al. [nwala_bootstrapping_2018; nwala_scraping_2018] have focused on finding seeds within search engine result pages (SERPs), social media stories, and news feeds. As part of this research, Nwala et al. also developed StoryGraph [nwala_365_dots_2020], a service that saves RSS feeds from 17 news sources (Table 1 in Appendix A) every ten minutes. With these RSS feeds, StoryGraph analyzes the lexical connections between articles across feeds to generate JSON output, which drives a graph visualization. Figure 1 displays some of this JSON output for March 23, 2020. StoryGraph then visualizes this output, as shown in Figure 2.
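To make the structure in Figure 1 concrete, the following is a minimal sketch of how one might pull article links for the largest connected component out of such a JSON document. It rests on two assumptions that the source does not confirm: that the integers under "nodes" in each connected component are positions in the top-level "nodes" array, and that "biggest" means the component with the most nodes. The function name and sample data are hypothetical and are not part of the StoryGraph Toolkit.

```python
import json


def biggest_story_links(graph):
    """Return article links for the connected component with the most nodes."""
    components = graph.get("connected-comps", [])
    if not components:
        return []
    # Assumption: the "biggest" story is the component listing the most node ids.
    biggest = max(components, key=lambda comp: len(comp["nodes"]))
    nodes = graph["nodes"]
    # Assumption: node ids are positions in the top-level "nodes" array.
    return [nodes[i]["link"] for i in biggest["nodes"] if 0 <= i < len(nodes)]


# Tiny hypothetical document using the field names visible in Figure 1.
sample = json.loads("""
{
  "connected-comps": [
    {"nodes": [0, 1, 2], "unique-source-count": 3},
    {"nodes": [3], "unique-source-count": 1}
  ],
  "nodes": [
    {"id": "a-0", "link": "https://example.com/a"},
    {"id": "b-0", "link": "https://example.com/b"},
    {"id": "c-0", "link": "https://example.com/c"},
    {"id": "d-0", "link": "https://example.com/d"}
  ]
}
""")

print(biggest_story_links(sample))
```

A real StoryGraph file may rank components differently (for example by average degree), so the selection rule here should be read as a placeholder.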

Collections on specific topics exist at various web archives [jones_many_2018]. AlNoamany et al. [alnoamany_generating_2017] introduced how to use social media storytelling to summarize web archive collections. Klein et al. [10.1145/3201064.3201085] have built collections from web archives by conducting focused crawls. Jones developed Hypercane [jones_hypercane_2020] to intelligently sample mementos from larger collections. Jones also developed Raintale [jones_raintale_2019] for generating social media stories to summarize groups of mementos, providing visualizations that employ familiar techniques, like cards, that require no training for most users to understand.

The JSON data structure from Figure 1 provides all information gathered but is difficult for humans to understand at a glance. The graph shown in Figure 2 provides an overview of the JSON through favicons and edges, but a user requires some training to fully comprehend what it represents. Figure 3 displays the largest connected component from this graph visualized via the SHARI process. Through images, text snippets, titles, cards, domain names, favicons, and other content, the SHARI output allows the viewer to intuitively understand that the biggest news story for this date consists of different reactions to the growing COVID-19 pandemic.

2 The SHARI process

The StoryGraph Hypercane ArchiveNow Raintale Integration (SHARI) [jones_shari_2020] process automatically creates stories summarizing news for a day. Figure 4 details what each tool contributes to the story. Figure 5 shows the steps of the SHARI process.

Figure 6: SHARI steps 1-2 illustrated with a single URI-R from the story shown in Figure 3. Here SHARI extracts the URI-R from StoryGraph and then creates a corresponding URI-M with ArchiveNow.
Figure 7: SHARI step 3 reporting entities from the URI-M generated in Figure 6
Figure 8: SHARI step 4 reporting sumgrams from the URI-M generated in Figure 6
Figure 9: SHARI step 5 reporting image metrics from the URI-M generated in Figure 6
Figure 10: SHARI step 6 orders all mementos first by publication date, then memento-datetime.
Figure 11: SHARI step 7 combines all data into a JSON format used by Raintale for storytelling.
Figure 12: SHARI step 8 feeds the JSON file from Step 7 and a template file into Raintale to generate the story. Raintale queries MementoEmbed for information about each memento.
Figure 13: The dsa_tweeter bot announces the availability of new SHARI stories each day.
  1. With the StoryGraph Toolkit, we query the StoryGraph service for the URI-Rs belonging to the biggest story of the day.

  2. Hypercane converts these URI-Rs to URI-Ms by first querying the LANL Memento Aggregator (https://timetravel.mementoweb.org) via the Memento Protocol [van_de_sompel_rfc_2013] for a corresponding URI-M. For each URI-R that does not have a memento, Hypercane creates one by calling ArchiveNow [aturban_archivenow:_2018] (Figure 6).

  3. Hypercane runs the mementos through spaCy (https://spacy.io/) to generate a list of named entities, sorted by frequency (Figure 7).

  4. Hypercane runs the mementos through sumgram [nwala_sumgram_2019] and generates a list of sumgrams, sorted by frequency (Figure 8).

  5. Hypercane scores all of the mementos’ embedded images. Images that article authors reference in HTML META tags are favored first, followed by MementoEmbed [jones_preview_2018] score, then pixel size, color count, the ratio of width to height, and finally position on the page (Figure 9).

  6. Hypercane runs the mementos through newspaper3k (https://newspaper.readthedocs.io/en/latest/) to extract each article’s publication date and orders the URI-Ms by that date (Figure 10).

  7. Hypercane consolidates the entities, terms, image scores, and ordered URI-Ms into a JSON file containing the structured data for the summary. During this step, Hypercane uses the highest scoring image as the striking image for the summary (Figure 11). In Figure 4, the highest-ranking image is the UK Prime Minister addressing his country about the COVID-19 pandemic.

  8. Raintale renders the output as Jekyll HTML based on the contents of this JSON file, a template file, and information on each memento provided by MementoEmbed (Figure 12).

  9. The SHARI script publishes the summary story to GitHub Pages for distribution. Figure 13 shows the output of our dsa_tweeter bot, which announces the story after publication through the @StormyArchives Twitter account.
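The data handling in steps 3 and 5 through 7 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Hypercane's implementation: every function name, dictionary key, and sample value below is hypothetical, the output dictionary is not Raintale's actual JSON schema, and the sketch assumes that larger values are better for each image criterion it includes. The width-to-height ratio and page position criteria are left out because the text does not say which direction is preferred.

```python
import json
from collections import Counter
from datetime import datetime


def rank_entities(entities_per_memento):
    """Step 3 analogue: count entities across mementos, most frequent first."""
    counts = Counter(e for ents in entities_per_memento for e in ents)
    return counts.most_common()


def rank_images(images):
    """Step 5 analogue: compare images criterion by criterion."""
    # Tuples compare left to right, so earlier criteria dominate later ones.
    # Assumption: larger values are better for every criterion used here.
    return sorted(
        images,
        key=lambda img: (img["in_meta_tag"], img["embed_score"],
                         img["pixels"], img["colors"]),
        reverse=True,
    )


def order_mementos(mementos):
    """Step 6 analogue: publication date first, memento-datetime breaks ties."""
    return sorted(
        mementos,
        key=lambda m: (datetime.fromisoformat(m["published"]),
                       datetime.fromisoformat(m["memento_datetime"])),
    )


def build_story(mementos, entities_per_memento, images):
    """Step 7 analogue: gather everything into one JSON-serializable dict."""
    ranked_images = rank_images(images)
    return {
        "entities": [name for name, _ in rank_entities(entities_per_memento)],
        "striking_image": ranked_images[0]["url"] if ranked_images else None,
        "urims": [m["urim"] for m in order_mementos(mementos)],
    }


# Hypothetical stand-in data; real values would come from the earlier steps.
mementos = [
    {"urim": "https://archive.example/2/b", "published": "2020-03-22T22:00:52",
     "memento_datetime": "2020-03-23T01:00:00"},
    {"urim": "https://archive.example/1/a", "published": "2020-03-22T18:30:00",
     "memento_datetime": "2020-03-23T02:00:00"},
    {"urim": "https://archive.example/3/c", "published": "2020-03-22T22:00:52",
     "memento_datetime": "2020-03-23T00:30:00"},
]
entities = [["Brooklyn", "Coney Island"], ["Brooklyn"],
            ["Brooklyn", "Coney Island", "Victor J. Blue"]]
images = [
    {"url": "https://archive.example/big.jpg", "in_meta_tag": False,
     "embed_score": 0.9, "pixels": 900000, "colors": 5000},
    {"url": "https://archive.example/meta.jpg", "in_meta_tag": True,
     "embed_score": 0.4, "pixels": 200000, "colors": 3000},
]

story = build_story(mementos, entities, images)
print(json.dumps(story, indent=2))
```

Sorting on a tuple key keeps the priority order explicit: the image referenced in a META tag wins even though the other image scores higher on every later criterion.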

3 Discussion

StoryGraph is a valuable resource with additional unrealized potential. We can create stories not only for today or yesterday but for any date back to August 8, 2017, when Nwala launched StoryGraph. Figures 14, 15, and 16 show how the world has evolved each year on StoryGraph’s launch date. In Figure 14, the biggest news story was that of North Korea threatening other nations with nuclear weapons. One year later, in Figure 15, we see that the biggest news story is the results of several United States Congressional and gubernatorial primaries. Two years after StoryGraph’s launch, Figure 16 shows that the biggest news story is the aftermath of the 2019 shootings in El Paso and Dayton.

Figure 14: SHARI output for August 8, 2017 - the launch date of StoryGraph
URL: https://oduwsdl.github.io/dsa-puddles/stories/shari/2017/08/08/storygraph_biggest_story_2017-08-08/
Figure 15: SHARI output for August 8, 2018 - a year after the launch date of StoryGraph
URL: https://oduwsdl.github.io/dsa-puddles/stories/shari/2018/08/08/storygraph_biggest_story_2018-08-08/
Figure 16: SHARI output for August 8, 2019 - two years after the launch date of StoryGraph
URL: https://oduwsdl.github.io/dsa-puddles/stories/shari/2019/08/08/storygraph_biggest_story_2019-08-08/
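The story URLs in Figures 3, 14, 15, and 16 share a visible date-based pattern. As a small sketch, assuming that pattern holds for other dates (the source does not state this, and a story may not exist for every day), one could build the address for a given date as follows. The function and constant names are hypothetical.

```python
from datetime import date

# Launch date of StoryGraph as stated in the text.
STORYGRAPH_LAUNCH = date(2017, 8, 8)
# Base path inferred from the figure URLs; treat it as an assumption.
BASE = "https://oduwsdl.github.io/dsa-puddles/stories/shari"


def shari_story_url(day):
    """Build the story URL for a date, following the pattern in the figures."""
    if day < STORYGRAPH_LAUNCH:
        raise ValueError("StoryGraph has no data before its launch on 2017-08-08")
    return f"{BASE}/{day:%Y/%m/%d}/storygraph_biggest_story_{day:%Y-%m-%d}/"


# Anniversaries of the launch date, matching Figures 14 through 16.
for year in (2017, 2018, 2019):
    print(shari_story_url(date(year, 8, 8)))
```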

4 Summary and Future Work

SHARI produces a familiar yet novel method of viewing news for a given day. SHARI can create stories for today, yesterday, and back to StoryGraph’s creation on August 8, 2017. It is different from other storytelling services like Wakelet (https://wakelet.com/) because SHARI is entirely automated. The stories produced by SHARI are different from services like Google News (https://news.google.com/) or Flipboard (https://flipboard.com/) because those tools focus on current events and personalized topics. Because StoryGraph samples content from multiple sides of the political spectrum, the SHARI process can provide a visualization of articles not tied to one interest area or even a single side’s terminology. This process works because each component is loosely coupled, has high cohesion, has explicit interfaces, and engages in information hiding. Each command passes data in the expected format to the next.

We are exploring how to improve striking image selection for stories. One could also use SHARI to consider how the same story is told in different venues. For instance, one could ask StoryGraph only to include left-leaning sources and produce a SHARI story. One could then do the same for only the right-leaning sources. With both stories, one could compare the striking images and sumgrams that SHARI produces. We are investigating how to produce and render other news stories for a given day and any given period of time. Finally, we are examining how to best visualize significant events that span substantial periods of time, like the entire COVID-19 news story. Though StoryGraph is an existing service that gathers current news, we also want to apply its algorithm directly to mementos and tell the news stories of past events like the Hurricane Katrina disaster.
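The left versus right comparison described above could start from the polarity label that Figure 1 shows under "node-details". The sketch below is speculative: it filters an already downloaded node list rather than asking StoryGraph to restrict its sources, it assumes the "type" field takes values such as "left" and "right" (only "left" appears in Figure 1), and the term lists stand in for sumgram output. All function names and sample values are hypothetical.

```python
from collections import defaultdict


def links_by_polarity(nodes):
    """Group article links by the polarity label in each node's details."""
    groups = defaultdict(list)
    for node in nodes:
        # Nodes without a label are kept under "unknown" rather than dropped.
        polarity = node.get("node-details", {}).get("type", "unknown")
        groups[polarity].append(node["link"])
    return dict(groups)


def distinct_terms(terms_a, terms_b):
    """Return the terms that appear in only one of two term lists."""
    a, b = set(terms_a), set(terms_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical nodes and terms for illustration only.
nodes = [
    {"link": "https://example.com/l1", "node-details": {"type": "left"}},
    {"link": "https://example.com/r1", "node-details": {"type": "right"}},
    {"link": "https://example.com/l2", "node-details": {"type": "left"}},
    {"link": "https://example.com/x1"},
]
left_terms = ["coronavirus", "stimulus bill", "social distancing"]
right_terms = ["coronavirus", "stimulus bill", "travel ban"]

print(links_by_polarity(nodes))
print(distinct_terms(left_terms, right_terms))
```

Each group of links could then be run through the SHARI steps separately, and the resulting striking images and sumgrams compared side by side.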

5 Acknowledgements

This work was supported in part by the Institute of Museum and Library Services (LG-71-15-0077-15).

References

6 Appendix A: StoryGraph News Sources

News Source | Feed URL | US Political Polarity
Politicus USA | http://www.politicususa.com/feed | Left
Vox | https://www.vox.com/rss/index.xml | Left
Huffington Post | http://www.huffingtonpost.com/section/front-page/feed | Left
MSNBC | http://www.msnbc.com/feeds/latest | Left
New York Times | http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml | Left
Washington Post | http://feeds.washingtonpost.com/rss/politics | Left
CNN | http://rss.cnn.com/rss/cnn_topstories.rss | Center
Politico | http://www.politico.com/rss/politics.xml | Center
ABC News | http://abcnews.go.com/abcnews/topstories | Center
The Hill | http://thehill.com/rss/syndicator/19109 | Center
Real Clear Politics | http://feeds.feedburner.com/realclearpolitics/qlMj | Center
Washington Examiner | http://www.washingtonexaminer.com/rss/news | Right
Fox News | http://feeds.foxnews.com/foxnews/latest | Right
Daily Caller | http://feeds.feedburner.com/dailycaller | Right
Conservative Tribune | http://conservativetribune.com/feed/ | Right
Breitbart | http://feeds.feedburner.com/breitbart | Right
The Gateway Pundit | http://www.thegatewaypundit.com/feed/ | Right
Table 1: The 17 news sources analyzed by StoryGraph