1 Design decisions
With a large dataset of messages from a social media platform, we first build a taxonomy for classifying these messages. For this particular earthquake dataset, we develop two main categories, which are “event” and “resource”. Sub-categories for “event” are: earthquake, ground damage, flooding, aftershock, and fire; sub-categories for “resources” are: water, energy, medical, shelter, transportation, and food. Each sub-category contains a set of keywords to determine if a message belong to it or not. One message can belong to more than one category, e.g., a message can indicate that there are needs for both water and food. The reasoning for this assumption comes from the possibility that the meaning of a message can be ambiguous, hence we need to take all data into account.
For time-series data, the stacked area chart is frequently utilized for visualizing changes of data over time, for multiple categories. We utilize this chart type to represent the volume of classified messages and provide an overall view of the condition. To explore detail, a sliding window on top of the area chart is provided for brushing.
In terms of exploring specific topics to characterize the condition, there are many approaches to handle text data. We utilize WordStream [wordstream], a technique for visualizing evolution of topic over time. Besides the overall or global trending of terms, this technique represents the local evolution of an individual topic. Similar approach could also be found from IoTNegViz [iotnegviz]. Regarding community, a network of users represents the existing clusters, based on the relationships among users: the source is the sender and target(s) are the users that are mentioned in the message. This strategy is used to detect who are the users that have multiple connections in the community; hence, they are likely to spread the important messages. The location and ranking of users by amount of content are provided with a map and a horizontal bar chart.
2 System model
Figure EQSA: Earthquake Situational Analytics from Social Media shows the main components of our system designed for the VAST Challenge 2019: Mini-Challenge 3 dataset111https://vast-challenge.github.io/2019/MC3.html including a stacked area chart, a WordStream, a geographical map, a network of users and a horizontal bar chart. Detail-on-demand is provided with tooltip and applied for the WordStream, network, and bar chart.
The analytics application includes active and interactive components that can work with dynamic data. The main control panel (A) is built as a stacked area chart, showing the volume of messages according to chosen categories from selection panel (A1). The vertical axis shows the number of posts, while the horizontal axis shows the timeline. A sliding window across the timeline has expandable width from 1 hour to 31 hours. For each change in main panel (A) – whether it is adjusting time frame, timestamp or category, all other four panels (B to E) are updated according to (A).
Panel (B) provides a WordStream, demonstrating the evolution of topics extracted from the messages’ content. The WordStream consists of two topics: the keywords within the content of messages and the location of the message. The thickness of the stream is proportional to the number of posts – the global trend. Users can also explore the local trend of an individual term and detail of messages.
Panel (C) is a geographical map, in which the color of each neighborhood indicates the number of posts. User can use this map for highlighting corresponding messages from a location in the WordStream and vice versa.
Panel (D) is a network of user interaction. The network demonstrates the connection between users, through the account mentioning in the content of messages. Via this network, we can spot which one is the account that has an essential role in the community.
Panel (E) is an account list for ranking content creators. This chart shows the accounts that create the largest number of posts. Users can see who writes many posts and their connection to the community; exploring these points can help detect irrelevant accounts.
The demonstration link, explanatory video of EQSA, and report for the challenge of VAST Challenge 2019: Mini-Challenge 3 can be found at our GitHub page222https://idatavisualizationlab.github.io/VAST2019mc3/.
4 Analyzing VAST Challenge 2019: MC3 dataset
Figure 1 shows the visualization with data from five hours after the third strike. The sliding window in the stacked area chart has a width of 6 hours. Inside the window, the brown stream - transportation has two major regions, one arise from right when the earthquake strikes and one towards the end of the window, showing that issues with transportation occurred within this time, we can come up with the observation: The problem happens and then it is addressed.
Three blue boxes are highlighted, from 1) The WordStream, 2) The network and 3) The ranking of content creators. In the WordStream, “bridge”, “one-lane” and “routes” are emphasized, support the observation from the main panel. Moving on to the network, the node of “dot-sthimark”, the DoT - Department of transportation of St. Himark, is in large size - an important component of the community at this time. This means that the community is connecting to the department for information. On the ranking bar chart, the department provides 25 notifications within 5 hours, a large number comparing to the next ranked account (only 3). In the map, the neighborhood of Downtown is in the darkest shade - with biggest number of posts, because the DoT is located in Downtown. By exploring the detail messages from the department, we discover that the bridges to get in and out of St.Himark are closed and then re-opened for safety inspection. This confirmation helps validate the observation mentioned above.
5 Conclusions and future works
This paper proposes an interactive exploratory visualization to analyze earthquake and characterize the condition from social media posts. The general view and brushing/linking features support users to explore and discover the patterns and relating events. The application of this technique to VAST Challenge 2019: Mini-Challenge 3 dataset also confirms the design decisions of multiple linked views help users to convey the messages from the perspectives of community, location and volume of related posts. Future work will focus on optimization of the overall layout and scalability to big data.