Visual Analytics of Anomalous User Behaviors: A Survey

05/14/2019 ∙ by Yang Shi, et al.

The increasing accessibility of data provides substantial opportunities for understanding user behaviors. Unearthing anomalies in user behaviors is of particular importance as it helps signal harmful incidents such as network intrusions, terrorist activities, and financial frauds. Many visual analytics methods have been proposed to help understand user behavior-related data in various application domains. In this work, we survey the state of the art in visual analytics of anomalous user behaviors and classify the surveyed works into four categories: social interaction, travel, network communication, and transaction. We further examine the research works in each category in terms of data types, anomaly detection techniques, visualization techniques, and interaction methods. Finally, we discuss the findings and potential research directions.


I Introduction

The increasing accessibility of data collected from various sources provides potential opportunities for understanding user behaviors. Identifying anomalies in user behaviors is of particular interest in many application domains such as cybersecurity, urban planning, and social media. For instance, detecting rumors and tracking their spreading patterns alert people to the risks of being influenced by misinformation, which is especially critical in political elections.

Detecting anomalous user behaviors is a challenging task as the boundary between abnormal and normal data cannot be clearly defined. Even when equipped with domain knowledge, analysts may find that the results of automatic machine learning approaches lack the contextual information needed to support decision-making; for example, analysts are limited in exploring who did what, when, where, and why (the 5W’s), and how. To address the issue, visualization integrates human knowledge into information processing tasks. It presents anomalous patterns intuitively to decision makers and supports a human-machine dialog as they interact with the data set. Our work aims to summarize the state of the art in visual analytics of anomalous user behaviors, with the purpose of highlighting current research trends as well as future directions.

In this survey, we contribute a taxonomy of visual analytics of anomalous user behaviors. The overview of the analytical pipeline is summarized in Figure 1.

  • We categorize four user behaviors, including social interaction, travel, network communication, and transaction based on the data collected from specific data sources. We extract four common data types from these four behaviors, including text, network, spatiotemporal information, and multidimensional data.

  • We review how research works use visualization techniques combined with interaction methods to analyze anomalous user behaviors. We extract six visualization techniques: sequence visualization, graph visualization, text visualization, geographic visualization, chart visualization, and glyph visualization. We also summarize five interaction methods: tracking & monitoring, exploration & navigation, pattern discovery, knowledge externalization, and refinement & identification.

The remainder of the survey is organized as follows. First, we describe related surveys in Section II. Then, we present the terminology, methodology, and taxonomy used in this survey in Section III. Sections IV, V, VI, and VII analyze the four user behaviors respectively, using the taxonomy explained in Section III. The analysis of each behavior follows the general visual analytics pipeline: we start by identifying data types and anomaly detection techniques, and then discuss visualization techniques and interaction methods. Finally, we discuss findings and trends acquired from the surveyed papers in Section VIII and conclude our work in Section IX.

II Related Surveys

In this section, we discuss surveys related to the visual analysis of anomalous user behaviors. Several survey papers in the literature focus on analyzing user behaviors. Jin et al. [2] categorize user behaviors in online social networks into four types: connectivity and interaction, traffic activity, mobile social behavior, and malicious behavior. Jiang et al. [3] classify anomalous behaviors in web applications (e.g., Hotmail, Facebook, Amazon) into four categories: traditional spam, fake reviews, social spam, and link farming. Surveys regarding the visualization of user behavior data explore application domains such as urban computing [4], social media [5, 6], the financial domain [7], and network security [8, 9]. In the field of anomaly detection, Chandola et al. introduce categories of anomaly detection (AD) techniques [1]. [10] and [11] examine techniques used in intrusion detection systems and for detecting graph-based anomalies, respectively. Recent work by Chalapathy and Chawla [12] presents a structured overview of research approaches in deep learning-based anomaly detection. Our survey covers a wider range of application domains than existing surveys. To the best of our knowledge, it is the first survey that explores anomalous user behaviors from the perspective of visual analytics.

III Terminology, Methodology, and Taxonomy

In this section, we first explain the terminology used in this survey and describe our methodology of selecting papers suitable for the topic of the survey. Next, we introduce the taxonomy of anomalous user behaviors regarding common data types, anomaly detection techniques, visualization, and interaction methods.

III-A Terminology

The survey aims to summarize visualization works that focus on anomalous user behaviors. Here, user behaviors can be derived directly or indirectly from user actions. For example, posting a tweet is a behavior directly related to user actions, while a cyber-attack is conducted by nodes in networks but indirectly manipulated by the perpetrator. Investigation of user behavior focuses on tracking, collecting, and assessing patterns caused by users, as opposed to information about devices and events [13, 14]. Analyzing and identifying anomalous user behaviors relies on anomaly detection techniques. According to Chandola et al. [1], anomalies are “patterns in data that do not conform to a well-defined notion of normal behavior”. As we collect research works from a diverse set of domains such as social media, finance, and cybersecurity, the scope of anomaly detection in our survey is broader than the scope identified in specific domains. For example, Chen et al. [5] identify data outside the normal ranges of attributes as anomalies in social media, while in the field of cybersecurity, anomalies refer to malware, insider threats, and targeted attacks [13, 14]. In our work, anomalies refer to frauds, spam, intrusions, sudden increases in the volume of data, periodic patterns of users, etc. In short, as long as the detected results express “interestingness of real-life relevance” [1], we consider the visualization works to be within the scope of anomaly detection.

III-B Methodology

The range of publications of interest is constrained by three conditions: user behaviors, anomaly detection, and visual analytics/visualization. We started from a core set of relevant research works known to us in advance, and followed references from their “Related Work” sections as well as papers that cite the previously identified papers. We also conducted a keyword search for papers published in visualization conferences or journals. Examples of keywords are “anomaly”, “anomalous”, “outlier”, “abnormal”, “unusual”, and “rare”. The research papers were checked to confirm that they are indeed associated with the concept of anomaly in [1]. The association with user behaviors was expected to appear in the case study sections of the publications. While investigating research works, we found that the range of pertinent papers is relatively narrow. To address the potential shortage of references, our survey also covers publications that incorporate anomaly detection as one of their visual analytics approaches, in addition to those that solely address anomaly detection. For example, we include [15] in our collection although the authors’ ultimate goal is predictive analysis of event evolution.

We also kept our exploration spectrum balanced in terms of application domains. We noticed that publications related to travel and network communication outnumber the others. The predominance of travel probably results from the long history of visualizing spatiotemporal data (in 1869 Charles Minard produced a map to illustrate Napoleon’s march to Moscow) and continuous study ever since. As for cybersecurity, the establishment of a dedicated conference, the IEEE Symposium on Visualization for Cyber Security (VizSec), encourages researchers to devote efforts to this field. As such, we allocated comparatively more time to searching for research works on the other user behaviors. We hope to capture possibly interesting relationships across user behaviors by maintaining a broad scope of investigation.

III-C Taxonomy

Based on a literature review of more than 150 papers that are relevant to visual analytics of anomalous user behaviors, we summarize four user behaviors: social interaction, travel, network communication, and transaction. For each of the four user behaviors, we attempt to identify common data types, anomaly detection techniques, visualization techniques, and interaction methods. The different categories are highlighted in the overall pipeline of visual analytics in Figure 1. The selected papers are summarized in Figure 2, with color indicating each category.

Fig. 2: The selected papers regarding visualization and visual analytics of anomalous behaviors. DTs: text, network, spatiotemporal information, and multidimensional data. ADs: classification-based, nearest neighbor-based, clustering-based, statistical, information theoretic, and spectral anomaly detection techniques. VTs: sequence, graph, text, geographic, chart, and glyph visualizations. ITs: tracking & monitoring, exploration & navigation, knowledge externalization, pattern discovery, and refinement & identification.

User Behaviors. User behaviors are seen in a variety of application domains. Based on the data collected from specific data sources, we classify user behaviors into four categories: social interaction, travel, network communication, and transaction. Social interaction describes the communication of ideas and thoughts between people. Its data is collected from publicly accessible social platforms or private telecommunication platforms. Travel is the physical movement of users between places and thus contains geographic information. Its data is collected from the Global Positioning System (GPS), mobile phones, base stations, etc. Network communication is the sending and receiving of information between machines via networks. Its data is collected from server logs. Transaction refers to monetary flows in buying and selling, whose data is collected from system logs.

We also categorize anomalous user behaviors into egocentric and collective behaviors. The categorization is inspired by the concepts of point and collective anomalies [1]. Note that our survey focuses on the investigation of anomalous user behaviors, which constitute a subset of anomalies. Egocentric behavior refers to a user behavior that distinguishes itself from the rest of the data in anomaly detection. Collective behavior is a set of user behaviors that appear anomalous as a group, even though the individual behaviors in the set may appear normal when analyzed separately. As egocentric and collective behaviors emphasize different aspects, specific visualization designs are needed for each. These designs are discussed when analyzing visualization techniques in the following sections.

Data Types. A variety of data can be extracted from user behaviors across different domains. By analyzing multiple attributes of these data, we summarize four common data types: text, network, spatiotemporal information, and multidimensional data [3, 5]. A brief explanation of each data type follows. Text provides semantic information about the identities and backgrounds of objects. Network, also called subgraph, consists of a set of nodes interlinked with a set of edges. A formal definition of a graph can be found in [16]. Spatiotemporal information captures spatial and temporal attributes of data. Multidimensional data uses multiple attributes to describe the properties of objects. A detailed explanation of the data types for each user behavior is given in the following sections.

Anomaly Detection Techniques. The categorization of anomaly detection techniques used in this survey is borrowed from the survey by Chandola et al. [1]. The six categories are classification-based, nearest neighbor-based, clustering-based, statistical, information theoretic, and spectral anomaly detection techniques. Classification-based anomaly detection techniques develop models in the training phase and distinguish anomalies from normal data instances in the testing phase. In the training phase, classifiers are learned from a set of training data instances. In the testing phase, test instances are classified into one of two classes, normal or anomalous. Nearest neighbor-based techniques compute anomaly scores from distance or relative density measures in a community. Distance-based variants, for instance, score each data instance by the distance to its nearest neighbor. Clustering-based techniques group similar data instances into clusters, and separate normal instances from anomalous instances. Statistical techniques presume probability distributions of data instances. Outliers are found in regions of low probability whilst normal instances are observed with a high probability of occurrence. Statistical techniques can be further divided into parametric and non-parametric anomaly detection techniques, depending on whether a model structure is assumed a priori.

Information theoretic techniques analyze information content using measures such as entropy, relative entropy, and Kolmogorov complexity. Spectral techniques aim to find an approximation of the data by decomposing the problem and constructing suitable attributes. The attributes or components can then be embedded into a lower-dimensional subspace in which anomalous instances can be distinguished from normal instances. A detailed explanation of the categories and sub-categories can be found in [1].
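As an illustration of the nearest neighbor-based category described above, the following minimal sketch scores each data instance by its mean distance to its k nearest neighbors. It is not taken from any surveyed system; the function name `knn_anomaly_scores`, the choice of Euclidean distance, and the toy points are assumptions made for this example.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbors.

    Larger scores suggest more isolated (potentially anomalous) instances.
    This is an illustrative sketch, not the method of any surveyed paper.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical two-dimensional feature vectors: four similar users and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_anomaly_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, [round(s, 3) for s in scores])
```

In a visual analytics setting, such scores would typically be mapped to a visual channel (e.g., hue or size) rather than thresholded automatically, leaving the final judgment to the analyst.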

We focus our discussion on visualization works that apply anomaly detection techniques. A small proportion of visual analytics tools manage to detect anomalies by using carefully designed visualizations in which anomalous data instances can be visually distinguished from normal ones [17, 18, 19, 20, 21]. The designs encode attributes and/or frequency using easily recognizable visual channels such as hues, heights of glyphs, sizes of nodes, etc. [21, 22, 23]. We exclude these papers from the discussion of anomaly detection techniques.

Visualization Techniques. We categorize the visualization techniques that have been applied to anomalous user behaviors into sequence, graph, text, geographic, chart, and glyph visualizations. Sequence visualization illustrates relations between successive events with temporal information. Anomalous sequences include spreading patterns of rumors, sudden changes in the volume of posts, and unusual business processes. Common visual representations are timeline visualization, flow visualization, and parallel coordinates. Graph visualization shows structured patterns composed of nodes and edges. Anomalous graphs indicate special communication patterns in a group or community, financial frauds conducted between employees and clients, and unauthorized network traces directed from sources to destinations. Typical graph visualizations are node-link diagrams, circular-based designs (i.e., a network topology map inside an outer ring), trees, and matrices. Text visualization focuses on textual data. Anomalous text is indicated by specific keywords, topics, and sentiments extracted/abstracted from texts. The word cloud is one of the usual visualization techniques for text. Text can also be combined with other visualization techniques such as flow visualization to present more contextual information. Geographic visualization depicts mobility patterns of people or vehicles in geographic space. Mobility patterns include discrete as well as continuous patterns. Discrete patterns describe distribution and co-occurrence, while continuous patterns depict trajectories of users when they move from one point to another. Abnormal mobility patterns include hot spots, a traveling direction opposite to that of most users, and uncommon movement when compared to history. Heat maps and flows/bubbles projected onto a geographic map are used most often for visual analysis of mobility patterns. Chart visualization and glyph visualization represent the attributes of a multidimensional data item using a chart (e.g., x-axis, y-axis, color of objects) and the features of an icon (color, size, shape), respectively. Examples of anomalies include users who only reply in a discussion board but never initiate a post, and users who send an unusual number of emails at a certain time. Typical visualization techniques include 2D/3D scatter plots, bubble charts, bar charts, Gantt charts, etc.

Interaction Methods. Interaction plays an important role in visual analytics. Based on an analysis of the interaction methods [24] used in research works on detecting anomalous user behaviors, we summarize five categories of interaction tasks: tracking & monitoring, exploration & navigation, knowledge externalization, pattern discovery, and refinement & identification. Analysts may mark data of interest via click, hover, or brush for tracking & monitoring. Analysts may observe data via panning, zooming, or drill-down/roll-up functions for exploration & navigation. Analysts may adjust attributes of data (e.g., color, size, range) to reveal interesting patterns (pattern discovery). Analysts may collect, save, and extract the current visualization (e.g., take a snapshot) for knowledge externalization. Analysts may label data with known identities (i.e., abnormal or normal data item) for refinement & identification of results.

IV Social Interaction

Social interaction describes the communication of ideas and thoughts between people. Social interaction can be further classified into private and public interaction. Private social interaction behaviors include sending and/or receiving emails, making phone calls, and sending text messages between acquaintances on a regular basis. Examples of anomalous interaction are the communication of fraudsters [25, 26] and criminals [27, 28], emailing patterns of core contributors in a working group [29, 30], and spam [31]. Public social interaction behaviors are associated with posting, sharing, and replying to content on publicly accessible social platforms. Writing reviews on e-commerce platforms and editing articles in Wikipedia are also counted as public social interaction. Anomalies related to this interaction consist of the diffusion of rumors [32, 33], social bots [34, 35], and the detection of events [36, 37, 38, 39].

We observe a few differences between private and public social interactions. The linkage between senders and receivers is less explicit in public interaction than in private one-on-one conversations. Far more information is accessible on public platforms than in private settings, leading to larger volumes of data collected about public behaviors. The differences are also reflected in the design principles of visual analytics tools, which are discussed in Section IV-C.

IV-A Data Types

Text data such as keywords, hashtags, and email contents help analysts comprehend social interaction behavior, as they provide information including sentiment, categories, and clusters of text under a certain topic. Gloor et al. [30, 40] filter emails by keywords that are known to be related to crime patterns. For example, “bonus” means the most important thing, and “investigation” refers to what is coming up for criminals. TargetVue [34] incorporates content features to detect social bot accounts. Mentioning a topic for which sudden changes in the number of relevant tags are observed is regarded as an anomalous behavior. Echeverria et al. [41] discover a bot network on Twitter by solely mining the textual features of tweets. They found that the tweets of the botnet are taken directly from “Star Wars” novels. Beagle [28] allows analysts to filter contents using a filter set as well as to construct filters from keywords that are found useful during the investigation of scamming activities.

As social interaction is concerned with passing, sharing, and exchanging information, network data is often seen when conversations are held between users. Follower relationships in social media, back-and-forth communication via emails, and amendments made by one user in Wikipedia in response to the edit of another user are considered network data. Gloor et al. [30] identify the team leader, practice leader, and practice coordinator from a visualization of social email networks. These anomalous users are placed in the center of the social network and connected to multiple nodes. Fu et al. [29] explore small-scale email networks, where a node represents an email address, and an edge between two nodes indicates an email exchange. Analysts are able to identify different email networks for specific research groups as little communication takes place across different groups. FluxFlow [32] derives user networks when exploring the process of anomalous information spreading. Indegree and outdegree are extracted from the interaction graph of a Twitter user. These measures signal the influential power of the user.
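The indegree and outdegree measures mentioned above can be computed directly from an edge list. The following is a minimal sketch under stated assumptions: the edge direction (an edge `(source, target)` means that `source` replied to or reposted `target`), the function name `degree_counts`, and the user names are all hypothetical and not taken from FluxFlow.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list.

    Assumption for this sketch: an edge (source, target) means that
    `source` replied to or reposted a message written by `target`.
    """
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


# Hypothetical interaction graph with four users.
edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]
in_degree, out_degree = degree_counts(edges)
print(in_degree["alice"], out_degree["alice"])
```

Under this assumed direction, a high indegree indicates a user whose messages attract many responses, which is one simple proxy for influence.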

Temporal information can be found in timestamps of microblogs, times and dates of emails and calls, and days when a user appears on a forum. Locations of geo-located microblogs, locations of calls, and the terrorist network of a country are spatial data. Temporal data facilitates the analysis of communication evolution, whereas spatial data explains where the behavior occurs. Elzen et al. [25, 26] detect communication bursts using dynamic network visualization. One important part is the temporal analysis of events (e.g., mobile phone calls), where trends opposite to global trends, periodic repetition, and a sudden block between homogeneous behaviors are considered abnormal. CloudLines [42] regards sudden changes in the number of specific keywords within a period as anomalies. The keywords are collected from tweets, which arrive in data streams at non-uniform time intervals. Some visualization works combine temporal and spatial analysis in event detection. ScatterBlogs [37, 43] detects events containing geographic information, such as power outages and disasters, from microblogs, and at the same time represents messages related to the events on a map.

Multidimensional data for detecting anomalous user behaviors includes the length of a tweet, the number of posts/emails, and average rating scores on e-commerce platforms. Multidimensional data not only offers comprehensive descriptions of social interaction, but also helps abstract the anomalousness of behaviors. Webga and Lu [44] detect anomalous ratings by incorporating multidimensional data into the analysis. The multidimensional data includes the scores given by every user at the corresponding time. Rating frauds are discovered by measuring differences in average ratings and in the number of rating activities between two time windows. Cao et al. [34] detect anomalous users in social media by carefully selecting communication features. To investigate the interaction aspect of a social account, features such as whether users tend to communicate within a group or spread information in public, and whether users receive responses from others, are measured. FraudVis [45] selects ten features based on the rank of the anomaly score to investigate which features contribute most to frauds on the Internet. The activity count within different time periods, for instance, is one of the features that evaluate the number of views on a video website.
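The idea of comparing average ratings and rating counts across two time windows can be sketched as follows. This is only an illustration of the comparison described above, not the detection method of [44]; the function name `window_shift`, the integer timestamps, and the way the windows are split are assumptions.

```python
from statistics import mean


def window_shift(ratings, split_time):
    """Compare ratings before `split_time` with ratings at or after it.

    `ratings` is a list of (timestamp, score) pairs. Returns the change in
    average score and the change in the number of rating activities.
    Illustrative sketch only; thresholds for raising an alert are left open.
    """
    before = [score for t, score in ratings if t < split_time]
    after = [score for t, score in ratings if t >= split_time]
    if not before or not after:
        raise ValueError("both windows need at least one rating")
    return mean(after) - mean(before), len(after) - len(before)


# Hypothetical ratings for one item: a burst of top scores starts at time 4.
ratings = [(1, 3), (2, 4), (3, 3), (4, 5), (5, 5), (6, 5), (7, 5), (8, 5)]
avg_change, count_change = window_shift(ratings, split_time=4)
print(round(avg_change, 2), count_change)
```

A simultaneous jump in both quantities is the kind of pattern an analyst might then inspect visually before deciding whether it reflects fraud or genuine popularity.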

Fig. 3: Visualizations of anomalous social interaction behaviors. (a) TargetVue [34] uses circle-based glyph visualization to encode individual users’ temporal posting/reposting behaviors, anomalousness of their behaviors, and correlation between suspicious users. (b) Leadline [46] visualizes event episodes using horizontal pulse-shaped timeline visualization. (c) FluxFlow [32] shows anomalous information spreading on social media using packed circle timeline visualization. (d) Chae et al. [47] present public behavior responses to disaster events in microblogs using a heat map and hexagons on a map. (e) MobiVis [48] visualizes the calling behavior of a network consisting of university staff and students using a node-link diagram.

IV-B Anomaly Detection Techniques

Classification-based techniques are more popular for discovering abnormal social interaction than for the other three user behaviors. The retrieval of the “Star Wars” botnet [41] is achieved with a naïve Bayesian classifier based solely on textual features. This basic technique is effective because the tweets posted by the botnet are quoted from the “Star Wars” novels. ScatterBlogs2 [38] proposes a supervised, Support Vector Machine (SVM) classification-based approach to train classifiers as user-adjustable filters. A random forest algorithm (i.e., rule-based classification) is used in a supervised approach to detect misinformation that is spread by social bots [35]. RumourLens [33] analyzes the impact of rumors during the information diffusion process. It performs iterative expansion of a query set and iterative refinement of a classifier (ReQ-ReC retriever and classifier) [49]. The output is a ranked list of tweet clusters that seem to be rumors, which can be refined by users. FluxFlow [32] utilizes one-class conditional random fields (OCCRF) to perform sequential anomaly detection. The OCCRF model assumes the highly dynamic and one-class nature of anomalies, and computes an anomaly score by measuring dissimilarity from unlabeled training samples. The dissimilarity is derived from the difference between the posterior probability of a normal label and that of an abnormal label.

Nearest neighbor-based techniques calculate anomaly scores from distance or density. Metrics such as density, betweenness centrality, and group degree centrality in networks are the ranking criteria of homogeneity/risk for Collaborative Innovation Networks [50]. MobiVis [48] incorporates semantic information of phone calls and geographic proximity into a heterogeneous graph. Through importance filtering based on variables such as node degree of a neighborhood, important nodes and edges can be pruned from interaction with the ontology graph. TargetVue [34] employs a time-adaptive local outlier factor model to quantify sudden changes in posting or emailing behaviors. A user is represented as a time-series vector in a multidimensional feature space. Each user is given an anomaly score computed from features that distinguish the user from others and from his/her own history. Kernel density estimation (KDE) is used for computing continuous distributions. It adapts the estimation parameters by letting the kernel scale vary with the distance from a point to its nearest neighbor in the data set. CloudLines [42] applies a logarithmic distortion of time that amplifies recent events. A kernel density estimator and a truncation function help focus on recent events that appear dense in the time series. KDE is also used in [39] to inspect spatiotemporal regularities of topics. Point patterns are related to continuous regions by comparative kernel density analysis.
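The idea of letting the kernel scale vary with the distance to the nearest neighbor can be sketched in one dimension as follows. This is a simplified illustration, not the estimator used in any of the cited systems; the Gaussian kernel, the function name `adaptive_kde`, the `min_bandwidth` floor, and the sample values are assumptions.

```python
import math


def adaptive_kde(samples, x, min_bandwidth=1e-3):
    """Estimate the density at x with a per-sample Gaussian kernel.

    Each sample's bandwidth is its distance to the nearest other sample,
    so kernels are narrow in dense regions and wide in sparse ones.
    `min_bandwidth` (an assumed safeguard) avoids a zero bandwidth when
    two samples coincide.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for i, s in enumerate(samples):
        nearest = min(abs(s - o) for j, o in enumerate(samples) if j != i)
        h = max(nearest, min_bandwidth)
        z = (x - s) / h
        total += math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
    return total / len(samples)


# Hypothetical event times: a dense burst around 1.0-1.6 and one isolated event.
samples = [1.0, 1.2, 1.4, 1.6, 9.0]
dense = adaptive_kde(samples, 1.3)
sparse = adaptive_kde(samples, 5.0)
print(dense > sparse)
```

The resulting density curve is what a timeline or heat map would encode visually; regions of unexpectedly high or low density are then candidates for closer inspection.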

Statistical techniques are used in event detection, where anomalousness is quantified by measuring differences from models constructed from historical behaviors. TwitInfo [36] finds peaks in time-series events by considering an exponentially weighted moving average and variance in a time window. The algorithm starts a new window if a significant increase in counts relative to the historical mean is encountered. Iterative non-parametric regression based on Loess smoothing decomposes a time series of interest into three components: trend, seasonal, and remainder. The z-scores of the remainder values serve as abnormality ratings. This method was first used in ScatterBlogs [43]. It was later applied to identify unusual topics in selected regions [47] and used as part of predictive analytics based on topic trends in historic time series [15].
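Peak detection with an exponentially weighted moving average and variance, as described above, can be sketched as follows. This is a simplified illustration rather than TwitInfo's implementation: it only flags peak positions and does not open or close windows, and the smoothing factor `alpha`, the `threshold`, and the example counts are assumptions.

```python
import math


def ewma_peaks(counts, alpha=0.125, threshold=3.0):
    """Return indices whose count exceeds the running mean by more than
    `threshold` running standard deviations.

    The running mean and variance are exponentially weighted, so recent
    history matters more than old history. Illustrative sketch only.
    """
    if not counts:
        return []
    mean = float(counts[0])
    var = 0.0
    peaks = []
    for i in range(1, len(counts)):
        count = counts[i]
        std = math.sqrt(var)
        # Compare against history before the current count is absorbed.
        if std > 0 and (count - mean) / std > threshold:
            peaks.append(i)
        diff = count - mean
        mean += alpha * diff
        var = (1.0 - alpha) * (var + alpha * diff * diff)
    return peaks


# Hypothetical message counts per time bin with one burst at index 6.
counts = [10, 14, 9, 13, 10, 12, 60, 12, 11]
peaks = ewma_peaks(counts)
print(peaks)
```

Because the burst itself inflates the running variance, the bins immediately after it are not flagged, which mirrors the intent of treating a surge as a single event rather than a series of separate anomalies.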

EventRiver [51] applies a clustering-based approach based on temporal locality in the analysis of streaming texts. The clusters are related in content regardless of time spans. ScatterBlogs [37] uses the Lloyd clustering technique to distinguish unusual events from general message clusters originating from high densities in time and space. Episogram [52] selects appropriate features for clustering, and generates clusters that are always centered at the positions with the highest densities in the data space. FraudVis [45] employs the CopyCatch algorithm, a graph-based clustering approach, to explore fraud groups who suddenly follow a user in social media on a single day.

Spectral techniques are used to detect interesting network structures in editing histories [53] and rating frauds in e-commerce systems [44]. Brandes et al. [53] abstract weighted attributes on nodes and edges from users and relationships between users, respectively. A weighted graph is projected into a controversy space where the collaboration or competition structure of two user groups is easily identified. Webga and Lu [44] adopt a dimension reduction algorithm, singular value decomposition (SVD), to detect fake ratings that are written to boost the popularity of selected items in e-commerce stores. Once the suspicion level rises above a threshold, alerts are sent to the visualization.

IV-C Visualization Techniques

Egocentric Behaviors. Egocentric social interaction behaviors concern the role of a user as revealed by his/her interaction with others. Examples of anomalous egocentric behaviors are users who only reply in a discussion board or who send an unusual number of emails at a certain time. We observe that glyph, text, and graph visualizations are favored visual representations for egocentric behaviors.

Anomalous user behaviors can be identified via glyph visualization when their glyphs differ in appearance from those of normal behaviors. Episogram [52] uses arrow-based and arc-based timelines to demonstrate posting and reposting activities, respectively. The two timelines can be aggregated to obtain overall tweeting behaviors. Users who always repost immediately after a message is posted are identified as arcs that always start from one end. TargetVue [34] (Figure 3 (a)) tackles the challenge of discovering social bots on Twitter. The circle-based glyph visualization facilitates investigation in terms of topics, sentiments, temporal dynamics of communication and its impacts, and relationships among accounts. Specifically, individual users’ temporal posting/reposting behaviors, the anomalousness of their behaviors, and the correlation between suspicious users are encoded by the behavior glyph, feature glyph, and relation glyph, respectively.

Text visualization can be used to describe egocentric communication patterns in emails [18, 54]. PostHistory [18] shows the evolution of emailing patterns. It consists of two views, with one revealing the intensity of exchanged messages with each contact in a calendar view, and the other demonstrating how email addresses evolve over time in movies. Analysts can change addresses’ positions by vertical/circular/alphabetical arrangement. Social Network Fragment [18] represents social networks in a graph where nodes are replaced by colored names of individuals. The larger the font of the name, the stronger an individual is tied to others. Viégas et al.  [54] study changes of relationships implied from changes of keywords in email contents. The frequency and distinctiveness of keywords can be inferred from the sizes of texts, and thus anomalies such as changes of relationships (e.g., from peer to boss) can be inferred.

In addition to glyph and text visualization, graph visualization, especially node-link visualization, helps detect anomalous individual behaviors from their social interactions. Li et al. [55] explore email patterns in two graphical modes: cliques and email flows. A spam bot is detected in the email flow panel when only edges originating from one node are visualized. Gloor et al. [50, 30, 40] investigate communication patterns of working groups in node-link visualization, and study the evolution of social structures over time in animation. Networks are drawn in personalized mode or subject mode to identify core contributors in groups and important messages, respectively [50]. The visualization tool TeCFlow [30, 40] detects the hidden communication structure from the Enron email corpus. The hierarchical social networks uncover how Enron employees conducted collusion and fraud by emphasizing the roles of influencers, gatekeepers, and leaders. Semantic node-link views enable investigation in terms of email addresses, keywords, or time. Shao et al. [35] evaluate the extent to which an account resembles social bots based on the diffusion patterns of tweets. In the “Hoaxy” platform, a node-link diagram represents the social networks, with brighter hues indicating higher anomaly scores.

Collective Behaviors. Collective Social Interaction behaviors are derived from users acting in a group or acting in response to each other. Anomalous collective social behaviors include temporal development of tweets, the reaction of people to special incidents, and separate group patterns of communication. Sequence, geographic, and graph visualizations are often used for collective behaviors.

Sequence visualization represents the evolution of collective behaviors in various forms such as parallel coordinates and pulses/bubbles arranged along a timeline visualization. Viégas et al. [23] visualize the revision history of Wikipedia pages in modified parallel coordinates. Each revised version of an article is represented by a vertical axis, with the axis’ length indicating the length of the article. The vertical axis is divided into parts, each corresponding to the revisions made by one author. By linking the axes together, a modified form of parallel coordinates shows the competition/mass deletion histories of articles. RumorLens [33] demonstrates the movement between different states of interaction with a rumor. The main view shows a Sankey diagram. The number of people exposed to the rumor and the associated correction is illustrated with lengths of colored segments (blue for rumors and red for corrections) in one axis. By linking different states between axes that correspond to time epochs, analysts can understand the influence of rumors and the corrections.

Pulses and bubbles arranged according to temporal sequence illustrate the anomalies of collective social interaction behaviors. Major changes in temporal development of texts are detected by highlighting unusual shapes of timelines. As one of the earliest visualizations that investigate the emergence of events, TwitInfo [36] visualizes bursts of events in a line chart. The highlighted and labeled event peaks suggest events that trigger heated discussion on Twitter. CloudLines, LeadLine and EventRiver [42, 46, 51] detect events by relating volume of text data extracted from online news within a period of time to temporal density of keywords. Horizontal pulse-shaped timeline visualization represents event episodes, with the sizes of pulses indicating the importance of events. LeadLine (Figure 3 (b)) and EventRiver [46, 51] arrange vertical positions of events according to similarity of topics. FluxFlow [32] (Figure 3 (c)) discovers temporal trends and impacts of users in information spreading process (e.g., rumors). The main view consists of packed circles arranged along a timeline. A user’s influence (i.e., the number of followers) and anomaly score are encoded by the size and color of a circle, respectively. A user can be analyzed from three perspectives simultaneously: tweet volume, sequence, and distribution of anomalous accounts. A complementary tree visualization demonstrates the correlation of user accounts in the diffusion process.
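The idea of highlighting event peaks in a volume timeline can be illustrated with a small sketch. This is not the peak detection used by TwitInfo [36] or the other tools above; it is a generic rule, assumed here for illustration, that flags a time bin when its message count exceeds the mean of the preceding window by `k` standard deviations. The function name and parameters are hypothetical.

```python
from statistics import mean, pstdev

def find_bursts(counts, window=5, k=2.0):
    """Return indices of time bins whose count is unusually high.

    A bin is flagged when it exceeds mean + k * std of the `window`
    bins before it. The std is floored at 1.0 so that a perfectly flat
    history does not flag tiny fluctuations (an illustrative choice).
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), 1.0)
        if counts[i] > mu + k * sigma:
            bursts.append(i)
    return bursts
```

For a series of per-minute tweet counts such as `[10, 12, 11, 9, 10, 11, 60, 12, 10, 11, 9, 10]`, only the bin with 60 messages is flagged; a timeline view would then highlight and label that peak.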

Geographic visualization is used to reveal events containing spatial as well as temporal references. With geographic details, anomalies can be detected from spatial intensities obtained from a collection of social interaction behaviors. Lee et al. [56] introduce one of the earliest works applying spatiotemporal analysis to social media, where flows of people are represented as arrows on a map. ScatterBlogs [37, 43] employs geographic visualization for anomaly detection of topics and events as well as their spatial and temporal marks. ScatterBlogs2 [38] uses dots on a map to portray geo-located microblog posts. It differs from its previous version in that it offers two settings: a classifier creation environment and a monitoring environment. Analysts create task-tailored filters based on messages of well-understood events in the classifier creation environment, and obtain contexts of interesting events from a filter orchestration view and a time slider in the monitoring environment. Thom et al. [37] extract terms from messages and cluster topics as tag clouds on a zoomable map. Anomalous events are labeled and positioned on a map according to their detected locations. The “Star Wars” botnet was discovered by accident when Echeverria et al. [41] observed sharp boundaries in the latitudinal and longitudinal positions of some tweets; the unusual spatial distribution indicated that these tweets were generated by bots.

The heat map, one of the geographic visualizations, is effective at illustrating geographically-marked microblog messages. Pozdnoukhov et al. [39] compute heat maps from streaming tweets. The density of the heat maps indicates the spatial variability of the population’s response to various stimuli such as large-scale sporting, political, or cultural events. The difference in density between two heat maps implies the temporal evolution of events. Chae et al. [47] (Figure 3 (d)) collect a large volume of real-time microblog messages and mine public behavior responses to disasters. A heat map and hexagons on a map identify spatiotemporal differences between crisis and normal situations.

Graph visualization, including node-link and circular-based visualization, uncovers anomalous structures of social interaction. Perer and Shneiderman [27] emphasize the need to examine social networks systematically in SocialAction. The visualization tool is designed accordingly to encourage interaction with clustered node-link visualization. Analysts can quickly direct their attention to the most anomalous networks as nodes/subgroups are colored according to their ranks of anomalousness. Fu et al. [29] examine small-world email networks using several visualizations. For example, stacked displays of graphs on a spherical surface visualize communication patterns between different groups. A hierarchical drawing emphasizes important nodes by placing them high in the hierarchy. MobiVis [48] (Figure 3 (e)) visualizes the calling behavior of a network consisting of university staff and students using a node-link diagram. The goal is to investigate information exchanges and the implicit social relationships. The researchers design a “behavior ring” for user(s), which arranges events in a radial form around a node. Analysts study structural interaction from the correlation between nodes and temporal interaction from the rings.

Circular-based representation demonstrates collective social interaction behaviors in a packed visualization. Elzen et al.  [25, 26] combine the circular hierarchical edge bundle view and massive sequence view (MSV) to detect unexpected suspicious communication patterns. The novelty of this visualization tool is that it incorporates node reordering strategies in MSV. The reordering techniques take account of closure, proximity, and similarity to ensure outliers stand out from mass data. Webga and Lu [44] project nodes (i.e., users) into a circular layout to discover rating frauds from the temporal relationship between users and items. The combination of singular value decomposition diagram, re-ordered matrix representation, and the temporal view reveals interesting group patterns of items. These patterns share a similar rating history and users of similar behaviors.

IV-D Interaction Methods

Visual analytics of social interaction behaviors applies tracking & monitoring as one of the first steps of exploratory analysis. TwitInfo [36] tracks bursts of events in time series by highlighting the event peaks in a line chart. These peaks suggest events that trigger heated discussion on Twitter. Koven et al.  [28] multi-select summaries of email contents in the main panel to keep track of important keywords regarding scamming activities. FluxFlow [32] monitors information diffusion using multiple coordinated views. As analysts select a point in tree view, the diffusion pattern generated by the user’s reposting behavior is shown in the thread view. The interaction is usually achieved in tools with multiple coordinated views [25, 18, 48, 34, 57, 58].

Exploration & navigation allows analysts to focus on different subranges of data flexibly. Viégas et al. [54] design a scrolling bar, allowing analysts to review email conversations in different periods of time. TargetVue [34] enables analysts to zoom and pan in the global and inspection views to locate anomalous areas. Exploration in Episogram [52] is not limited to zooming. Analysts can select a user of interest, and aggregate all users who perform the same posting/reposting activity. In this way, an individual’s details as well as the general trend are obtained. MobiVis [48] designs a “behavior ring”, from which analysts select different levels of granularity to arrange calling events in a radial form around a node. The length of petals corresponds to the duration of selected events.

Pattern discovery is achieved through various forms of interaction such as filtering. Gloor et al. [50] visualize email data to discern the structure of networks and identify core contributors. Emails are presented according to the type of links (i.e., “To”/“From”/“Cc”) in the email network. ScatterBlogs2 [38] supports generation of task-tailored filters in the classifier creation environment. In the monitoring setting, analysts can orchestrate the filters to detect anomalous users. Sorting visual objects also uncovers interesting patterns. CloudLines [42] visualizes online news events in timelines in either linear or logarithmic scale. The tool allows analysts to reconfigure visual objects via click and drag. Webga and Lu [44] detect rating frauds in the projection view, which contains two orthogonal axes inside a circle. Analysts can choose any two dimensions and the mapping method to uncover outlier patterns. Changing the encoding scheme is also useful. Chae et al. [47] demonstrate events detected from microblog messages with a heat map, scatters, and hexagons on a map. TargetVue [34] encodes users’ actions in a time sequence, the anomalousness of their behaviors, and their correlation in three glyph designs, so that analysts acquire various perspectives of the social accounts.

Analysts may want to save results of analysis for future study. For example, documents of interest can be saved in the evidence box of the EventRiver [51] visualization tool. This function supports hypothesis evaluation and evidence exchange. Koven et al. [28] allow analysts to share tags created during analysis of email contents. Visualization on a website tends to have more flexible applications of knowledge externalization than stand-alone tools. After one analyzes the anomalous extent of social bots in the Hoaxy platform (https://hoaxy.iuni.iu.edu/) [35], the results can be saved into CSV files for sharing.

Refinement & identification is conducted after analysts have obtained a basic understanding of social interaction behaviors. LeadLine [46] associates events with corresponding time-sensitive keywords automatically. Analysts can then annotate the events manually to provide accurate labels. There are two labeling strategies in EventRiver [51]: representative event labeling and outlier labeling. On one hand, representative labeling is for events that contribute to the biggest cluster of a story. On the other hand, outlier labeling labels outlier events in a story. Koven et al.  [28] emphasize tagging abilities in discoveries of anomalies. Analysts can label an account as a scammer, victim, service, or other categories. These tags can be used for creating filters as well as the calculation of statistics about scamming activities.

V Travel

Travel refers to the physical movement of users between places containing geographic information. Analysis of travel behaviors is meaningful for traffic monitoring, urban safety, and urban planning [59]. Travel behavior data can be collected from mobile phones and base stations, Global Positioning System (GPS), maritime search and rescue events, and medical records. Anomalous travel behaviors differ from the expected patterns indicated by individual historic records or activities of the crowd. Examples include irregular driving direction [60, 59], hotspots (e.g., crowded neighborhoods) [59, 61, 62], and characteristic travel patterns associated with groups of travelers [20, 63]. These anomalous behaviors can reveal potentially harmful events such as disease outbreaks and terrorist attacks.

V-A Data Types

Spatiotemporal data is essential for describing when and where users’ physical motion takes place. Spatial data consists of latitudes and longitudes, trajectories, pickup/drop-off locations, locations of base stations, etc. Temporal data includes timestamps of indoor activities, estimated time of arrival, and pickup/drop-off date and time. Analysis of travel behavior usually combines both spatial and temporal data. Pu et al. [64] explore mobility patterns of different user groups from mobile phone data collected from each base station and handoff data (i.e., successive calls with different base station IDs). Spatiotemporal data related to communication include the start time of calls, time duration, the city of the opposite side of calls, and the location and direction of base stations. TelCoVis [61] explores co-occurrence of people using telco data, which is a type of all-in-one mobile phone data containing activity records of calls, messages, and Internet usage. Data of each type of activity comprises timestamps, base station IDs, and the corresponding latitudes and longitudes. Kim et al. [65] create a visualization that helps comprehend flow patterns by analyzing the spatial distribution of non-directional discrete events over time.
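The notion of handoff data, i.e., successive calls with different base station IDs, can be sketched as follows. The record layout `(user_id, timestamp, station_id)` and the function name are assumptions made for illustration; the actual schema used by Pu et al. [64] is not given in this survey.

```python
from collections import defaultdict

def extract_handoffs(call_records):
    """Derive handoff events from call records.

    `call_records` is an iterable of (user_id, timestamp, station_id)
    tuples (a hypothetical layout). A handoff is emitted whenever two
    successive calls of the same user involve different base stations.
    Returns a list of (user_id, from_station, to_station, timestamp).
    """
    calls_by_user = defaultdict(list)
    for user, timestamp, station in call_records:
        calls_by_user[user].append((timestamp, station))

    handoffs = []
    for user, calls in calls_by_user.items():
        calls.sort()  # order each user's calls by time
        for (_, prev_station), (ts, station) in zip(calls, calls[1:]):
            if prev_station != station:
                handoffs.append((user, prev_station, station, ts))
    return handoffs
```

Each handoff is a coarse movement observation between two stations, which is the kind of origin-destination evidence that mobility visualizations aggregate.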

Multidimensional data enriches the analysis of travel behavior. A combination of attributes including distance traveled, speed of cars, tip amount and toll amount for taxi trips, and frequency of residents’ indoor activities provides a detailed description of travelers or vehicles. Pu et al. [64] aggregate multidimensional data associated with base stations and mobile phone users. The data includes the total number of phone calls made by each user at each station and at all stations, in addition to spatiotemporal details. Malik et al. [66] evaluate the potential risks of Coast Guard search and rescue (SAR) operations to better plan response actions to mitigate risks. The SAR data consists of two components: response cases and response sorties. Multidimensional data of each component contains the number of lives saved, lost, and assisted. Voila [59] extracts multidimensional features to detect abnormal incoming and outgoing taxi flows in a cell (a region is segmented into multiple cells). Examples of the features are the number of vehicles that flow in and out from one cell to another. The inflows and outflows of multiple cells together constitute multidimensional data.

Text associated with travel behavior is mainly used for identification and categorization. Examples include user ID, textual messages, and roam type and toll type. Pu et al.  [64] collect information of mobile phone ID, International Mobile Equipment Identity, city ID, roam city, roam type, and toll type to describe properties of mobile phones. These details help explain the nature of mobile phone users, i.e., travelers. Beecham et al.  [63] categorize people into different groups in order to summarize group-cycling behaviors. Cyclists under the cycle hire scheme are classified according to age, sex, full postcode, whether they cycle more with others or on an individual basis, and spatiotemporal information. Liao et al.  [67] study resident indoor activities. These activities include not only long-term activities such as sleep, relax, watch TV, but also short-term ones such as entering home.

Network data refers to trajectories between origins and destinations. Network data is mainly used to complement spatiotemporal analysis. Ko et al. [68] assess flight journeys that are often delayed by analyzing pairs of origin and destination airports. By aggregating the amount of delay for each flight journey (i.e., network), analysts detect anomalous airports and flights where prevalent delays are often found. Beecham et al. [63] study group-cycle journeys that link starting points and destinations.
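Aggregating delays per origin-destination pair can be sketched as below. This is a simplified illustration rather than the procedure of Ko et al. [68]; the tuple layout `(origin, destination, delay_minutes)`, the function name, and the choice of the mean as the aggregate are assumptions.

```python
from collections import defaultdict

def mean_delay_by_route(flights):
    """Average delay per (origin, destination) pair.

    `flights` is an iterable of (origin, destination, delay_minutes)
    tuples (a hypothetical layout). Early arrivals are clipped to zero
    so that they do not cancel out late ones.
    """
    totals = defaultdict(lambda: [0, 0.0])  # route -> [count, summed delay]
    for origin, destination, delay in flights:
        entry = totals[(origin, destination)]
        entry[0] += 1
        entry[1] += max(delay, 0.0)
    return {route: total / count for route, (count, total) in totals.items()}
```

Sorting the resulting dictionary by value gives a ranking of routes; the routes at the top are the candidates an analyst would inspect first in a network view.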

Fig. 4: Visualizations of anomalous travel behaviors. (a) Kim et al. [65] show origin and destination via directions of arrows in a flow map. (b) Ferreira et al. [62] investigate anomalous taxi trips in New York City in multiple coordinated views of a dot map and a line chart. (c) Voila [59] displays unusual traffic flows between a focal region and other places using a heat map. (d) Von et al. [69] visualize different types of spatiotemporal patterns by parallel coordinates. (e) Wu et al. [61] design a contour-based treemap to illustrate spatial and temporal characteristics of human mobility at a specific place.

V-B Anomaly Detection Techniques

Statistical anomaly detection techniques are the most often used to analyze travel behaviors. A data-driven approach [70] using self-organizing maps and Gaussian mixture models is applied to describe normal behaviors of vessels. By comparing data with rules and signatures, unusual traveling patterns are detected through visual analytics. A box-plot method [56] checks whether geographical regularity deviates from normal conditions by a large extent. Two visualization works [71, 69] apply cumulative sum (CUSUM) algorithms after kernel density estimation to better identify outliers in time series. One [71] calculates the density estimate for the event category as well as the density estimate for all categories, and obtains the expected number of events within a given area. Outbreaks in the temporal domain can be detected with the cumulative summation algorithm for the given location. Applying CUSUM after kernel density estimation enables analysts to quickly spot spatial areas worth investigating, and then analyze historical time series to look for unusual trends. The other work [69] also utilizes the CUSUM algorithm to trace uncommon development patterns.
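As a reference point for the CUSUM step, here is a minimal sketch of the standard one-sided (upper) CUSUM on a plain event-count series. It does not reproduce the exact formulation in [71] or [69], which operate on density estimates; the parameter names `target`, `slack`, and `threshold`, and the restart after each alarm, are illustrative assumptions.

```python
def cusum_alarms(series, target, slack, threshold):
    """One-sided upper CUSUM.

    Accumulates S_t = max(0, S_{t-1} + (x_t - target - slack)) and
    raises an alarm at every index where S_t exceeds `threshold`.
    The statistic is reset to zero after an alarm (one common choice).
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(series):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0
    return alarms
```

Because the statistic accumulates small positive deviations, a sustained moderate increase triggers an alarm even when no single observation looks extreme, which is why CUSUM suits outbreak-style anomalies in time series.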

Clustering-based anomaly detection is employed to reduce computational complexity and visual clutter for large-scale databases. Andrienko et al. [72] use k-means clustering to analyze spatiotemporal phenomena described by multiple spatial time series. The clustering approach groups spatial objects by the similarity of their corresponding time series, and thus spatially unusual events can be detected. The clustering approach is used in conjunction with statistical methods to model time series such that residuals are randomly distributed over time. High deviations from expected time values are seen as anomalies. K-means clustering is also used to detect anomalies of mobility patterns around base stations [64] and group-cycling behavior [63]. This clustering approach requires the number of output clusters to be specified before computation. Lin et al. [73] propose VizTree and Diff-Tree to mine anomalous patterns by comparing time series (e.g., yoga postures) with normal references. It uses bottom-up hierarchical clustering to produce a nested hierarchy of similar groups of objects based on a pairwise distance matrix. TelCoVis [61] applies a biclustering technique to binary matrices, where 1 indicates co-occurrence of human mobility and 0 indicates otherwise. Thus, origins and destinations of human mobility can be bundled into coordinated sets as biclusters.
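The k-means step on spatial time series can be sketched in a few lines. This is a plain Lloyd-style k-means with a naive deterministic initialization (the first `k` series), chosen here only to keep the example reproducible; it is not the implementation used in [72], [64], or [63], and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_time_series(series, k, iterations=20):
    """Cluster equal-length time series; k must be chosen in advance.

    Returns (labels, centroids). Centroids are initialized with the
    first k series (an illustrative, non-robust choice).
    """
    centroids = [list(s) for s in series[:k]]
    labels = [0] * len(series)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every series.
        labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in series
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, label in zip(series, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def deviation_from_centroid(series, labels, centroids):
    """Distance of each series to its own cluster centroid."""
    return [squared_distance(s, centroids[l]) ** 0.5 for s, l in zip(series, labels)]
```

Series with a large deviation from their own centroid are candidates for closer inspection, in the spirit of treating high deviations from expected values as anomalies.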

Nearest neighbor-based anomaly detection techniques compute continuous distributions from which anomalies are detected and anomaly scores are derived. Kernel density estimation (KDE) computes the spatial and/or temporal distribution of discrete events, which is particularly useful for detecting hotspots in density-based visualizations. Malik et al. [66] employ a modified variable KDE technique to identify spatial hotspots of search and rescue cases in the U.S. Coast Guard. Kim et al. [65] compute continuous spatiotemporal distributions of discrete events by applying the KDE approach to two-dimensional data, which is achieved without trajectory information. Local outlier factor (LOF), a density-based nearest neighbor technique, is used to calculate the anomaly degree of residents’ daily indoor activities. Duration, number of times, and start time are selected as the properties used to compute outliers.
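A minimal sketch of KDE-based hotspot detection follows. It uses a fixed-bandwidth two-dimensional Gaussian kernel evaluated on an integer grid, which is a simplifying assumption: Malik et al. [66] use a modified variable-bandwidth KDE, and the grid layout and function names here are hypothetical.

```python
import math

def kde_density(events, x, y, bandwidth):
    """Fixed-bandwidth 2D Gaussian kernel density estimate at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(events))
    total = 0.0
    for ex, ey in events:
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total

def find_hotspot(events, grid_size, bandwidth):
    """Return the integer grid cell (gx, gy) with the highest density."""
    cells = [(gx, gy) for gx in range(grid_size) for gy in range(grid_size)]
    return max(cells, key=lambda c: kde_density(events, c[0], c[1], bandwidth))
```

Evaluating the density on every cell yields the values a heat map would color; the cell with the highest value corresponds to the most prominent hotspot.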

V-C Visualization Techniques

Egocentric Behaviors. Egocentric Travel Behavior is individual physical movement in geographic space. An example of anomalies associated with egocentric travel behavior is an unexpected increase in time spent on indoor activities. Chart visualization is commonly used to represent egocentric travel behaviors.

VizTree [73] uses suffix tree visualization to indicate abnormal parts of a time series by comparing them with reference (i.e., normal) patterns. Anomaly detection is achieved by transforming a time series into a symbolic representation and visualizing it as a modified suffix tree. Weaver et al. [20] explore individual hotel visitors in a calendar view, a map view, and an arc diagram. A calendar view shows total visits on each day, with squares and circles indicating weekends and weekdays, respectively. A multi-layer map view describes paths from residences to hotels, relative to railroads and rivers. By synthesizing temporal and spatial patterns observed from multiple views, analysts discover circuitous routes taken by salesmen, cooperation between traveling merchants, the effects of weather and seasonal variations, etc. Liao et al. [67] are interested in resident behaviors recorded by smart home visual systems. A heat Gantt chart view shows the start time, duration, and number of occurrences of different activities on a daily basis. By combining the heat Gantt chart with other views, activities that deviate from daily routines are detected through comparison of different daily records.

Geographic visualization is also seen for egocentric travel behaviors. A transit map displays GPS traces [60] of moving taxis in basic mode, monitoring mode, and tagging mode. Taxis are represented by glyphs on the map, with the colors dependent on whether the taxi is loaded with passengers. A taxi with an irregular driving direction or moving at high speed, and a crowded neighborhood are egocentric anomalous travel behaviors.

Collective Behaviors. When a collection of users moves together in time and space, their travel behaviors are said to be collective. Abnormal travel behaviors can be identified from regions crowded with people. As most visualization tools studying collective travel behaviors employ geographic visualization, we analyze travel behaviors using the finer categories under geographic visualization, including flow maps, heat maps, and bubble/dot maps.

Flow maps represent trajectories by linking origins and destinations on a map. Andrienko et al. [72] propose a framework for spatiotemporal analysis and modeling. Anomalies are found in temporal line charts displaying model residuals. Spatial flows between cells are represented by directed half-arrows whose widths are proportional to the total counts of objects that move. The flows are laid upon Voronoi maps. Trajectories of cycling patterns are shown as flows on a London city map [63]. The straight and curved ends of a flow represent origin and destination, respectively. Group journeys are colored red on the map whereas non-group journeys are colored blue. One of the findings is that female cyclists are more likely to make late evening journeys when cycling in groups. Kim et al. [65] (Figure 4 (a)) extract, represent, and analyze flow maps and heat maps of spatiotemporal data without the use of trajectory information. The flow map visualizes origin and destination via directions of arrows, and the differences between flows are encoded in heat maps. Hot spots can be found with this visualization.

Heat maps display spatial densities of collective travel behaviors. Maciejewski et al. [71] develop an interactive visual environment to uncover hot spots in spatiotemporal data for crime analysis or syndromic surveillance. Bivariate and multivariate heat maps help detect spatiotemporal hot spots by combining height maps, colors, and contours. To analyze risks of Coast Guard search and rescue (SAR), Malik et al. [66] identify potential hot spots using heat maps. Risks of stations are indicated by the intensity of colors. The red heat map shows the time taken by stations to deploy an asset to an SAR accident while the green heat map indicates the SAR coverage. Ferreira et al. [62] (Figure 4 (b)) investigate anomalous taxi trips in New York City in multiple coordinated views of a dot map and chart visualizations. Dots on a map imply pickup and dropoff sites in the region. In the cases of Hurricanes Sandy and Irene, there were virtually no dots during the hurricanes, but traffic seemed to return to normal in the following days. Voila [59] (Figure 4 (c)) explores taxi trips to detect sudden changes in traffic patterns. There is an anomaly detection mode giving visual cues of regional anomalies, and a context mode providing information on volume difference, traffic flow, and expected patterns at different times. Unusual traffic flows between a focal region and two other places are highlighted by the deep red color of heat maps. Feedback from analysts can update the anomaly score and thus change the color of heat maps for the selected region.

We analyze other visualization techniques for travel behaviors including sequence and graph visualization. Von et al.  [69] (Figure 4 (d)) categorize spatiotemporal patterns into different types of locations according to home, work, tennis, etc. The main view is Dynamic Categorical Data View in a varied form of parallel coordinates, which show the evolution of all types of data. Each axis of parallel coordinates indicates a point in time. When analysts select a type of data, related geographic information is plotted in the linked map, where arrows on the map indicate physical movement of people. In TelCoVis [61], Wu et al. design a contour-based treemap to illustrate the spatial and temporal characteristics of human mobility. By combining with heat map, matrix, and parallel coordinates, analysts gain insights into co-occurrence of human mobility and correlations of co-occurrence.

V-D Interaction Methods

Analysts track and monitor data to look for anomalies. Uninteresting and expected patterns can be unmarked [73]. This improves the efficiency of detection processes and reduces false positives. TelCoVis [61] emphasizes the correlation between spatial and temporal data for exploring the co-occurrence of human mobility. When analysts hover on a sector in the contour-based treemap, all sectors corresponding to the same region will be highlighted. Moreover, analysts can mark the region for exploration. Analysts can track a set of features of categoric data [69] including location, movement pattern, group membership, and group changes. The selected data instances are highlighted in the linked map view and the categoric view.

The interactions associated with exploration & navigation piece together separate fragments of data. Panning and altering views via scrollbars facilitate detection of non-trivial patterns in large time series databases [73]. High-level outlooks and details should be accessed interchangeably when exploring travel behaviors. Different levels of aggregation in time [66, 62, 72] and space [62, 68, 59, 63] are seen in a variety of visualization tools.

Unusual travel patterns are uncovered by filtering, configuration, and encoding to various visual forms. The anomaly grading view in SHVis [67] presents anomaly scores of selected activities. Analysts click on different days and drag date intervals to compare the activities during different periods of time. In order to analyze maritime operations and assess risks associated with the allocation of resources [66], analysts generate a combination of filters which can be applied to spatial regions and temporal plots. In addition, analysts can evaluate the effects of opening/closing a station, and determine which station is suited for closing. Visualization can be altered in color and in form to reveal anomalous patterns. Andrienko et al. [72] build a framework for spatiotemporal analysis in which a rich set of interactive exploration techniques is embedded. Analysts can change the color scheme and assign colors to clusters on maps and line charts. In TelCoVis [61], analysts can choose the parameters to be mapped in the parallel coordinates, and adjust smoothing parameters as well as the time period for the contour-based treemap.

Externalization of results records analysts’ important discoveries. Voila [59] includes a snapshot panel for analysts to conveniently capture the overall and detailed map views. Ferreira et al. [62] explore taxi trips using TaxiVis, which supports exporting query results in CSV files, the same type of files as their input source. The visual analytics framework [72] models spatiotemporal data. The model description files can be stored externally along with the group membership of places and statistical details.

As analysts gradually develop basic knowledge, they recognize suspicious areas and integrate domain knowledge in anomaly detection. After a link is described as anomalous, the link is placed on the top of visualization while the other links become transparent [68]. In Voila [59], analysts incorporate their judgments about whether the region is anomalous. This feedback is taken into consideration in the recalculation of anomaly scores of all regions in the space.

VI Network Communication

Network communication is the sending and receiving of information between machines via networks. Examination of network communication has practical significance for national defense [74] and commercial enterprises [75]. Network communication behaviors include routing, network traffic, port activities, etc. Anomaly detection associated with network communication is usually concerned with cyber security, which is protecting computers and systems against malicious activities in a computer-related system. Anomalies are indicated by alarms and suspicious patterns that deviate from expectation. Investigation into these signals reveals attacks such as BGP routing instability [76, 77], virus outbreaks [78], port scans [79, 80, 81], and intrusions into systems [82, 83].

VI-A Data Types

The identified connections between sources and destinations are seen as network data. Network data is important for detecting anomalous network communication, as it is the foundation for analyzing information exchange between machines. For example, the network connections between autonomous systems (ASes) [84] and those between subnets and hosts [85] can be analyzed. VisFlowConnect [78] focuses on network traffic between an internal domain sender and an internal/external domain receiver. Liao et al. [75] represent enterprise networks consisting of hosts, users, and applications as host-user-application connectivity graphs. From the graphs, the similarity of users by applications can be assessed. VisAlert [86, 74] considers large-scale attack patterns between alerts and local networks. Analysts can obtain an overview of intrusion attempts and general situations by inspecting networks formed by alerts and a topology map of local network nodes.

Multidimensional data contains multiple numeric attributes that describe context information in network communication. Attack frequency, flow rates (i.e., number of packets and bytes for a fixed period), and system load are examples of multidimensional data in network communication behavior. Teoh et al. [76] use intensity, categorical, and counting measures to describe routing behaviors. Each measure has its corresponding degree of abnormality. The anomaly threshold is calculated from the anomaly degrees of multiple measures. SpiralView [87] presents a connection as a list of events described in terms of time, source host, application, and destination host. The details of a connection are described using multidimensional data, which are incorporated in the description of alarms. MVSec [80] uses multidimensional data including the number of connections, flow counts, and flow bytes. The statistics are combined with temporal features to explain each unit of network security data.

Spatiotemporal data of network communication is associated mainly with the addresses of receivers and/or senders, and with temporal information about activities. Spatiotemporal data provides details of timestamps and IP addresses. Investigation of spatiotemporal data is helpful for traffic monitoring, as can be seen in [88], which deals with timestamps from millisecond to year together with IP addresses from IP prefix to continent. Erbacher et al. [21] explore time and the difference in IP addresses between the external domain and that of the monitored system. The greater the difference between addresses, the more suspicious the network communication is. SpiralView [87] is interested in how alarms evolve in time, with the purpose of detecting periodic patterns. By inspecting alarms of the same level of attack severity, alarms can be segmented based on their temporal distribution to better understand network behaviors. VisTracer [77] visualizes destination ASes of traceroutes against time to assess spatiotemporal patterns of anomalies.
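
The idea that a larger address difference signals a more suspicious connection can be illustrated with a small sketch. The source does not specify how the difference is computed in [21]; the code below simply assumes the absolute difference of the two IPv4 addresses taken as 32-bit integers, and the function names `address_distance` and `normalized_distance` are hypothetical.

```python
import ipaddress


def address_distance(external_ip, monitored_ip):
    """Absolute difference of two IPv4 addresses read as 32-bit integers.

    Assumption: this numeric difference stands in for the "difference in
    IP addresses"; the surveyed tool may define it differently.
    """
    a = int(ipaddress.IPv4Address(external_ip))
    b = int(ipaddress.IPv4Address(monitored_ip))
    return abs(a - b)


def normalized_distance(external_ip, monitored_ip):
    """Scale the distance to [0, 1], e.g., to drive the length of a glyph."""
    return address_distance(external_ip, monitored_ip) / (2**32 - 1)


# Addresses from the documentation range 192.0.2.0/24 (illustrative only).
near = address_distance("192.0.2.10", "192.0.2.1")
far = address_distance("10.0.1.0", "192.0.2.1")
```

Under this assumption, hosts in the same subnet yield small values, while hosts in distant address blocks yield large ones.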

Text data provides low-level details about connections in cyber networks. Text data can be encoded into visualizations for high-level exploration, or act as evidence for confirming hypotheses regarding anomalousness. Text data includes textual logs and categories of events. Erbacher et al. [21] represent textual log information using glyphs. Textual logs contain time, locations, and types of connection. Teoh et al. [82] project connections with known classes (i.e., normal, probe, DOS, U2R, and R2L) into regions in a visualization panel. Suspicious data is found separate from normal data, facilitating further investigation.

Fig. 5: Visualizations of anomalous network communication behaviors. (a) VisTracer [77] visualizes routing anomalies in traceroutes using a matrix. (b) Tao et al. [89] design a high-order correlation graph to show collective anomalies. (c) MVSec [80] mines the correlation of events attributed by what, when, and where in a dandelion metaphor using a circular design. (d) SpiralView [87] analyzes how alarms evolve in time and detects suspicious patterns using a radar chart.

VI-B Anomaly Detection Techniques

Statistical anomaly detection techniques are widely used in the detection of abnormal network communication. Detection techniques of cyber attacks are categorized into signature-based (matching suspicious behaviors with known attack patterns based on existing statistical models or rules) and anomaly-based (comparing behaviors against a “normal” baseline) [10], both of which can be described using statistical methods.

We describe visualization works that incorporate statistical methods below. Teoh et al. [76] investigate BGP routing instability with a signature-based detection method and a statistics-based algorithm. Signatures based on bursts of sequence within a time window are matched with data. The statistics-based approach raises an alarm when current behaviors deviate from expected patterns obtained from history. VIAssist [90] highlights data instances that meet the criteria of attacks seen in the catalog and discovers unexpected patterns by interactive exploration of the visualization. Mansmann [88] applies a signature-based algorithm to detect the spread of botnets, while significant traffic changes are visualized in a readily noticeable form. VisTracer [91, 77] compares anomalies with existing scenarios of BGP hijacking. Unknown suspicious attacks are found by adapting an online change-point detection algorithm and comparing path similarity. MVSec [80] uncovers overall network state details by visualizing several statistical time series, including network traffic and the number of distinct active IPs over time. Suspicious patterns are analyzed in terms of what, when, and where from statistics (e.g., time interval, flow counts, flow bytes). Tao et al. [89] detect point anomalies with a Gaussian model-based technique for labeled data, and with a histogram-based technique for unlabeled data. Correlation analysis and propagation of anomaly scores are performed to detect collective anomalies.
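
The statistics-based idea of raising an alarm when current behavior deviates from a historical baseline can be sketched as follows. This is a minimal illustration rather than the algorithm of any surveyed tool: it assumes a single numeric measure, a z-score test, and a threshold of 3 standard deviations, and the name `deviation_alarms` is hypothetical.

```python
from statistics import mean, stdev


def deviation_alarms(history, current, threshold=3.0):
    """Return indices of current values that deviate from the historical baseline.

    A value raises an alarm when its z-score against the mean and sample
    standard deviation of `history` exceeds `threshold` (an assumed cutoff).
    `history` needs at least two values.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Constant history: any different value counts as a deviation.
        return [i for i, x in enumerate(current) if x != mu]
    return [i for i, x in enumerate(current) if abs(x - mu) / sigma > threshold]


# Hypothetical per-minute counts of routing updates.
history = [10, 12, 11, 9, 10, 13, 11, 10]
current = [11, 12, 48, 10]
alarms = deviation_alarms(history, current)  # only the burst of 48 is flagged
```

A signature-based detector would instead match the data against stored attack patterns; the two approaches are complementary.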

Classification-based methods are used in intrusion detection [82, 87]. Teoh et al. [82] utilize a user-directed drawing program, PaintingClass, to classify each object and predict the categories. Unsupervised attacks are found by comparing positions of normal instances and unlabeled data. SpiralView [87] models user behaviors using Bayesian networks, and raises anomalies for deviations from usual behaviors.

A nearest neighbor-based technique based on similarity is applied in [75], which transforms relations among hosts, users, and applications into network connectivity graphs, bipartite graphs, multidimensional scaling, and similarity graphs. The inter-graph similarity is evaluated in a top-down manner, and node similarity is analyzed based on the dynamics of node degrees. LongLine [92] uses the local outlier factor to facilitate the comparison of temporal patterns of anomalous system behaviors. The tool employs a frequency-based model which identifies each file and address in audit logs as an individual entity. Each entity is described by a feature vector constructed from its extended bag of system call models.

TVi [93] uses a spectral technique to direct users to time periods of anomalous activities. The tool derives a scalable metric (entropy from IP addresses and ports) and conducts dimension reduction using principal component analysis (PCA). NStreamAware [83] applies a DBSCAN algorithm to cluster timelines, which achieves event detection in streaming data. The possibly important temporal segments are further assessed by analysts through interactive exploration.
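
An entropy-based traffic metric of this kind can be sketched in a few lines. The exact metric used by TVi is not given here, so the code assumes plain Shannon entropy over the values of one feature (e.g., destination ports) observed in fixed time windows; the PCA step is not shown, and `entropy_per_window` is a hypothetical name.

```python
from collections import Counter, defaultdict
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_per_window(events, window_seconds=60):
    """Group (timestamp_seconds, feature_value) events into fixed windows
    and return {window_index: entropy of the feature in that window}."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[int(ts // window_seconds)].append(value)
    return {w: shannon_entropy(v) for w, v in sorted(buckets.items())}


# Hypothetical events: (seconds since start, destination port).
events = [(1, 80), (10, 80), (20, 80), (30, 80),      # window 0: one port only
          (61, 1), (70, 2), (80, 3), (90, 4)]         # window 1: four distinct ports
result = entropy_per_window(events)
```

A sudden rise in port entropy, as in the second window, is the kind of change that may accompany scanning, whereas a sharp drop may indicate traffic concentrating on a single target.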

VI-C Visualization Techniques

Egocentric Behaviors. An egocentric network communication behavior triggers alarms due to suspicious network properties of the connection between source host(s) and destination host(s). Examples of egocentric anomalous network communication behaviors are hijacking network traces by another AS, a port scan, and unusually high volume of traffic on a machine. Glyph and graph visualizations are used to represent egocentric behaviors.

Erbacher et al. [21] initiated one of the earliest visualizations to display IP addresses of alarms in a glyph-based radial form. Line glyphs surrounding a central node represent different types of connection (e.g., parallel lines indicate initial connection requests). The difference in IP addresses between the external domain and that of the monitored system is encoded in the length of line glyphs. A suspicious connection is colored red due to unexpected user activity such as expired timeouts. Teoh et al. [76] inquire into Border Gateway Protocol (BGP) routing instability. Near-real-time monitoring of Internet routing is pictured as temporal line charts and glyphs, where a suspicious event detected from statistics is illustrated with a large circle in a high position and a spike in the timeline.

Graph visualization, especially the matrix, is used to detect anomalous egocentric network communication. Goodall et al. [81] develop a matrix showing the network activity of hosts over time. Communication between hosts is superimposed on the matrix, complemented by multiple linked views detailing port activity and raw packets. NVisionIP [85] detects traces of abnormal network behaviors at multiple levels of an entire class-B IP network. NVisionIP consists of a galaxy view in matrix form, a small multiples view, and a machine view with a bar chart. Spikes in traffic volume are seen as changes in node colors in the matrix. Simple scanning attacks are discovered as clusters in the matrix, where the x- and y-axes stand for subnets and hosts, respectively. VisTracer [77] (Figure 5 (a)) tackles large traceroute data sets to distinguish legitimate routing changes from spam campaigns. Time and destination ASes are represented by the x- and y-axes in a matrix layout. Rectangular glyphs in the matrix layout are anomalies. Two nearly identical anomaly patterns at the same x-position in the matrix indicate routing anomalies in two ASes.

Collective Behaviors. Collective network communication behaviors involve more than one exchange of information between two machines or among multiple machines. Anomalous behaviors include botnet infection and periodic attacks, which are represented in graph and sequence visualizations.

Tree visualization, a type of graph visualization, helps identify anomalous network communication behaviors. Teoh et al. [84] examine the routing behavior of BGP data. Each IP address is mapped to one pixel in a quadtree visualization to detect anomalous origin AS changes. An event is represented by a line connecting the affected IP prefix and ASes. Anomalies are revealed as an area with a concentration of lines, since events that take similar paths multiple times are suspicious. Teoh et al. [82] detect intruders by allowing analysts to interactively explore activity logs in an interactive decision tree visualization layout. Complementary to this view, a three-dimensional scatter diagram pinpoints unlabeled anomalies when a high-density cluster lies in areas of sparse training data. Mansmann et al. [88] aggregate IP addresses according to prefix, autonomous system, country, and continent in treemaps based on two layout algorithms. This visualization helps monitor large-scale network data. Segments in treemaps are colored to indicate sharp changes in the number of incoming connections.

Node-link diagrams visualize structures of collective network communication. Tao et al. [89] (Figure 5 (b)) design a high-order correlation graph to show collective anomalies. When applied to software analysis, malicious attacks due to software vulnerabilities are identified as collective anomalies. In this case, a node illustrates each line of code, an event represents an execution, and a correlation link represents data flow. NIVA [94, 95] coordinates a 3D node-link view with a glyph design and circular histograms. It differs from other visualizations in that it builds attack severity into interaction inspired by the “haptic” concept. For example, when dragging nodes in the three-dimensional view, users can feel the force of “push” and “pull” motions computed based upon attack frequency.

Circular-based visualization is also used to demonstrate collective network communication behaviors. VisAlert [86, 74] identifies critical attacks on hosts by analyzing the “what, when, where” information of alerts. The alerts are allocated on segments of rings according to the severity of attacks. The “when” attribute is mapped such that the innermost ring represents the most recent activities. Inside the ring, a network topology map is used to depict the network under scrutiny. FloVis [79] observes interactions between host pairs on either side of the monitored border. A bundle diagram displays connections between entities in a radial tree layout. Scanning activities can be detected by examining bundles directed from 9000 consecutively numbered ports to the internal host. MVSec [80] presents four coordinated views to discover anomalies and retrieve stories behind subtle events. The event radar view (Figure 5 (c)) mines the correlation of events attributed by what, when, and where using a dandelion metaphor arranged in a ring. Seeds (i.e., subnets) spread from the center of the dandelion stalk, which represents the only entrance to the network. Antennas (i.e., hosts) extend from the seed, giving a two-layer hierarchical structure. The seriousness of a botnet infection, for instance, is indicated by the number of colored nodes in the dandelion metaphor.
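
The scanning pattern mentioned above, many consecutively numbered ports involved in connections to one host, can also be checked programmatically. The sketch below is not part of FloVis; it assumes the analyst has a list of port numbers observed for a given host pair, and the names `longest_consecutive_run`, `looks_like_scan`, and the cutoff `min_run` are hypothetical.

```python
def longest_consecutive_run(ports):
    """Length of the longest run of consecutively numbered ports."""
    unique = set(ports)
    best = 0
    for p in unique:
        if p - 1 not in unique:          # p is the start of a run
            length = 1
            while p + length in unique:
                length += 1
            best = max(best, length)
    return best


def looks_like_scan(ports, min_run=100):
    """Flag a host pair when a long run of consecutive ports is observed.

    `min_run` is an assumed cutoff, not a value taken from the survey.
    """
    return longest_consecutive_run(ports) >= min_run


normal_ports = [80, 443, 22, 8080]
scan_ports = list(range(1000, 1200)) + [80]
```

Here `normal_ports` contains no run longer than one port, while `scan_ports` contains a run of 200 consecutive ports and would be flagged.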

Sequence visualization uncovers abnormal trends of collective network communication. While NVisionIP [85] focuses on activities occurring on machines, its complementary tool VisFlowConnect [78] explores network flows between machines using parallel coordinates. VisFlowConnect investigates the relationship between senders and receivers. A cluster of lines originating from an external host sender indicates a virus outbreak. SpiralView [87] (Figure 5 (d)) analyzes how alarms evolve in time and detects suspicious patterns (e.g., alarms appearing every day at the same time). The alarms are scattered dots in a radar chart, which is useful for identifying periodic patterns of intrusions. The alarms are arranged from the center to the outer part so that recent events are allocated more space. NStreamAware [83] analyzes a condensed heterogeneous data stream and uses a sliding slice to provide a summary for the selected period of time. The tool supports omitting and merging normal ranges so that suspicious port activities, attack patterns, and routing behaviors are revealed.
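
The periodic pattern that such a spiral layout makes visible, alarms recurring at the same time every day, can be approximated with a simple grouping. This is only an illustrative sketch under the assumption that alarms carry timestamps with minute resolution; `recurring_alarm_slots` and `min_days` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime


def recurring_alarm_slots(timestamps, min_days=3):
    """Return (hour, minute) slots in which alarms occurred on at least
    `min_days` distinct dates, i.e., candidates for daily periodic alarms."""
    days_by_slot = defaultdict(set)
    for ts in timestamps:
        days_by_slot[(ts.hour, ts.minute)].add(ts.date())
    return sorted(slot for slot, days in days_by_slot.items()
                  if len(days) >= min_days)


# Hypothetical alarm log: an alarm at 02:30 on three consecutive days,
# plus two unrelated alarms.
alarm_times = [
    datetime(2019, 5, 1, 2, 30),
    datetime(2019, 5, 2, 2, 30),
    datetime(2019, 5, 3, 2, 30),
    datetime(2019, 5, 1, 14, 5),
    datetime(2019, 5, 3, 9, 12),
]
recurring = recurring_alarm_slots(alarm_times)
```

In a visual tool the analyst spots such alignment along a radius of the spiral; the grouping above only captures exact minute matches and would need a tolerance window for real logs.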

VI-D Interaction Methods

Detection of anomalous network communication requires tracking & monitoring. Teoh et al. [76] direct analysts’ attention to anomalies by highlighting the background in gray. In the TVi [93] visual querying system, analysts select an item in the anomaly list, and then the associated time range is highlighted in the timeline visualization. In NStreamAware [83], analysts can click the star icon to store the real-time sliding slice under investigation. The events marked with star icons are added to the same view. Analysts can determine suspicious patterns from flagged and labeled events in the starred time slices. There are four coordinated views in MVSec [80]. Interaction in one view is linked to the visualization in another view, which is helpful for digging out hidden network attacks that are hard to recognize.

Interesting network communication behaviors are found by exploring visual elements at the same scale or at multiple levels of granularity. VisAlert [86, 74] enables panning and zooming operations on the topology map in the ring. Analysts can also configure projections onto rings by collapsing and expanding alert groupings on rings. Tao et al. [89] employ the direct-walk technique (i.e., a series of mouse clicks) for exploring anomalies. When an analyst notices a suspicious node, he/she clicks another node that contributes to the anomaly of the suspicious node. That is, the analyst extends the examination to the effects of additional nodes on the suspicious node. Mansmann et al. [96, 88] aggregate IP addresses according to prefix, autonomous system, country, and continent in treemaps. Drill-down and roll-up functions can be applied to nodes of the same level of detail.

Interactive methods are used to unveil suspicious patterns of data. The filter dialogue in NVisionIP [85] restricts which data flows are visualized. Analysts visualize network traffic according to filters based upon the combination of IP address, ports, protocols, and display type. The visual analytics tool FloVis [79] has a bundle diagram that describes network flows between a source and a destination. Analysts can loosen the bundles to find suspicious attack patterns. Additionally, analysts can choose to linearly distort points on the circle of the bundle diagram. Mansmann et al. [96, 88] color data in treemaps on a linear or logarithmic scale. Coloring on the logarithmic scale makes the visualization resistant to the randomness of data. Teoh et al. [82] use a painting program to help categorize the same type of anomalies into one group. Analysts interactively arrange data instances through drawing, partitioning, and appropriate coloring.
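
The difference between linear and logarithmic color scaling can be illustrated with a small sketch. It is not the implementation of [96, 88]; it assumes non-negative values and a known positive maximum, maps each value to a color intensity in [0, 1], and uses the hypothetical names `linear_scale` and `log_scale`.

```python
from math import log1p


def linear_scale(value, vmax):
    """Map a non-negative value to [0, 1] proportionally to vmax (vmax > 0)."""
    return min(max(value / vmax, 0.0), 1.0)


def log_scale(value, vmax):
    """Map a non-negative value to [0, 1] on a logarithmic scale.

    log1p(x) = log(1 + x) keeps the mapping defined at value == 0.
    """
    return min(max(log1p(value) / log1p(vmax), 0.0), 1.0)


# Hypothetical connection counts for three treemap segments.
counts = [10, 100, 1000]
linear = [linear_scale(c, 1000) for c in counts]
logarithmic = [log_scale(c, 1000) for c in counts]
```

On the linear scale the segment with 10 connections is almost indistinguishable from an empty one, whereas the logarithmic scale spreads small and medium values over the color range and dampens the effect of a few extreme values.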

Analysts may keep a record of results for further analysis. The intrusion detection tool NIVA [95] allows analysts to export results in an ASCII format. VIAssist [90] is designed for collaborative working environments. The report builder in the visualization tool allows analysts to drag and drop graphical objects in the current display. The results with annotations can then be saved as a PowerPoint or PDF file. MVSec [80] simplifies analysts’ operations by offering frequently used configuration files for anomaly detection. Analysts can export their configurations as a new configuration file.

VIAssist [90] has an expression builder and an E-Diary to fulfill the refinement & identification task. Analysts can formulate a hypothesis about a suspicious activity as an expression. A catalog of expressions collects knowledge, i.e., hypotheses made by analysts during analysis. The E-Diary helps document hypotheses. This encourages sharing annotations with colleagues and communicating hypotheses in a group. Analysts can annotate suspicious patterns in SpiralView [87] for long-term analysis and policy assessment. The annotations can be an explanation of the anomalies and the action applied to the system.

VII Transaction

Transaction refers to monetary flows in buying and selling. The goal is to connect financial sources to companies or individuals. In a broad sense, stock market deals [97], credit card transactions [98], and business processes [99, 100] fall under this category. Frauds are the typical type of anomalies associated with transactions, as people may be lured by monetary benefits into performing illegal transactions. Clients may collude with employees in financial institutions in activities such as money laundering, unauthorized transactions, and embezzlement [101]. Other anomalies include unexpected business processes [102, 100] and high default groups in a network of guaranteed loans [103].

VII-A Data Types

Spatiotemporal data describes details of location, timestamps of transactions, and time series of events. Spatiotemporal analysis is critical in financial analysis, and thus detection of anomalous transactions often incorporates analysis of geographic locations and time series. Attributes including the time of transaction [100, 98], how often a customer executes operations [104], and geographic regions [105, 101] provide a foundation for first-step analysis. For example, the Event Tunnel [100] conducts temporal correlation to link seemingly isolated events, and thus business patterns and fraud patterns involving more than one individual [97] can be uncovered. Huang et al. [97] perform spatial correlation in addition to temporal and spectral (based on frequency) correlation to identify suspected traders and attack plans.

Multidimensional data is often used in conjunction with spatiotemporal data to detect anomalous transactions. By probing into time series along with details of the amount of money transferred [101, 98, 105], the number of transactions within a period of time [99, 105], and the number of activities that are new to the user [106], analysts can gain an overall picture of the histories of financial transactions. An example of using multidimensional and spatiotemporal information is VisImpact [105]. VisImpact correlates variables of purchase quarter (i.e., temporal details), fraud amount, and fraud count to reveal relationships among important factors. Legg [106] identifies insider threats in an organization by inspecting multidimensional data including the number of times that the user performs particular tasks, and the number of these activities that are new to this user and to any user in the same position.

Network data describes relationships among entities involved in transactions. A network can consist of links between traders [97] in trading networks, between entities such as people, companies, and banks [107], and between enterprises that take loan guarantees [103]. For example, Niu et al. [103] consider high default groups as communities in networks. A community whose members interact with each other more frequently than with those outside of it can trigger serious financial losses. Didimo et al. [107] analyze categorical networks that contain different types of entities to discover financial crimes. Indices such as node centrality (e.g., betweenness) and node degree are measured to indicate anomalousness.

When analyzing transaction behavior, categories derived from text help describe the relationship between a payer and a payee [108, 109], label different types of activities conducted by employees [106], and identify the type of state changes in a business process [100]. Text data is used to distinguish between senders, intermediaries, and receivers in financial transactions, and to build profiles for analyzing their potentially suspicious behaviors. For example, WireVis [108, 109] extracts pre-defined keywords from a set of transactions and relates the keywords that appear in the same transaction. The keyword-to-account relationship is analyzed based on the number of times the keywords appear in that transaction. Jigsaw [110] helps identify any linkages between people or companies relevant to financial frauds such as fictitious suppliers’ invoices and systematic deletion of suppliers’ invoices. These linkages are found through keyword/sentence summaries of transactions, sentiment, and word clouds of a document.

Fig. 6: Visualizations of anomalous transaction behaviors. (a) Argyriou et al.  [111] use a multi-layer radial drawing to describe activities between employees and clients. (b) Niu et al.  [103] assess risk of guaranteed loans by visualizing networks of small and medium enterprises groups using a node-link visualization. (c) Leite et al.  [101] design user-friendly views of chart visualizations and parallel coordinates to help identify anomalous transactions.

VII-B Anomaly Detection Techniques

Statistical methods applied to the detection of suspicious transactions build normal profiles of customers, and then evaluate new transactions against known anomalies in historical data. Huang et al.  [97] match suspected patterns in spatial, temporal, and spectral (i.e., frequency) domains with similar patterns seen in historical databases, which act as anomalous signatures. Leite et al.  [104] first build customer profiles from their frequency, amount, and location of transfer from histories. New transactions are then evaluated against the profiles to see if they are anomalous. The visualization tool EVA [101] generates customer profiles and provides different statistical measures for new transactions. The statistical profiles combine histograms and rules specified by experts to provide references. Sudden behavior changes in comparison to the profiles are identified as suspicious. Anomalies are highlighted if anomaly scores exceed a threshold.
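
Profile-based scoring of this kind can be sketched as follows. The code is a simplified illustration, not the method of EVA or of Leite et al.: it assumes a profile built only from past amounts and locations, two sub-scores in [0, 1] combined with equal weights, and a threshold of 0.5; all names (`build_profile`, `sub_scores`, `overall_score`) are hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Build a customer profile from past transactions (at least two)."""
    amounts = [t["amount"] for t in history]
    return {
        "amount_mean": mean(amounts),
        "amount_std": stdev(amounts),
        "locations": {t["location"] for t in history},
    }


def sub_scores(tx, profile):
    """Sub-scores in [0, 1]: deviation of the amount and novelty of the location."""
    std = profile["amount_std"] or 1.0  # avoid division by zero
    deviation = abs(tx["amount"] - profile["amount_mean"]) / (3 * std)
    return {
        "amount": min(deviation, 1.0),
        "location": 0.0 if tx["location"] in profile["locations"] else 1.0,
    }


def overall_score(tx, profile, weights=None):
    """Weighted sum of sub-scores; equal weights are an assumption."""
    weights = weights or {"amount": 0.5, "location": 0.5}
    scores = sub_scores(tx, profile)
    return sum(weights[k] * scores[k] for k in scores)


# Hypothetical transaction history of one customer.
history = [{"amount": a, "location": "Vienna"} for a in (50, 60, 55, 45, 40)]
profile = build_profile(history)
usual = {"amount": 52, "location": "Vienna"}
unusual = {"amount": 900, "location": "Tokyo"}
flagged = [tx for tx in (usual, unusual) if overall_score(tx, profile) > 0.5]
```

Keeping the sub-scores separate, rather than only the overall score, lets a visualization show why a transaction was flagged.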

Application of clustering-based techniques is based on the assumption that anomalous financial communities share common features within a group. WireVis [108, 109] implements the k-d tree algorithm to detect suspicious behaviors in wire transactions. It treats accounts as points in k-dimensional space, where k is the number of attributes. The accounts are grouped using a centroid-based clustering technique. Schaefer et al.  [112] cluster entries based on similarity of temporal event patterns so that analysts can identify suspicious patterns in a packed visualization. An event pattern refers to an event sequence or event episode that displays interesting properties. Didimo et al.  [107] apply hierarchical clustering by finding k-cores in a graph, which is effective for discovering relevant groups in networks. This graph-based clustering defines clusters of cohesive structures, in which each cluster has at least k inter-connected neighboring points. Clustering based on graph structure is used in Network Explorer [113]. Communities in the financial network can be identified as clusters converted from undifferentiated nodes and edges. Two clustering algorithms are employed to process large-scale networks on the server side and process smaller networks on the client side.
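
The k-core notion used in graph-based clustering can be made concrete with a short sketch. It is a generic peeling procedure for an undirected graph given as an adjacency dictionary, not the implementation of [107]; the function name `k_core` and the example graph are hypothetical.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph.

    `adjacency` maps each node to the set of its neighbors. Nodes with
    fewer than k remaining neighbors are removed repeatedly until every
    remaining node has at least k neighbors inside the remaining set.
    """
    degrees = {n: len(neigh) for n, neigh in adjacency.items()}
    remaining = set(adjacency)
    queue = [n for n in remaining if degrees[n] < k]
    while queue:
        node = queue.pop()
        if node not in remaining:
            continue  # already removed via an earlier queue entry
        remaining.discard(node)
        for neigh in adjacency[node]:
            if neigh in remaining:
                degrees[neigh] -= 1
                if degrees[neigh] < k:
                    queue.append(neigh)
    return remaining


# Hypothetical network: accounts A-D all transact with each other,
# E is linked only to A and F, and F only to E.
graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "F"},
    "F": {"E"},
}
core3 = k_core(graph, 3)  # the tightly connected group A, B, C, D
```

Computing the cores for increasing k yields a nested hierarchy of increasingly cohesive groups, which is what makes the approach suitable for hierarchical clustering.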

Classification-based, nearest neighbor-based, information theoretic, and spectral techniques are discussed below. Olszewski [98] uses a threshold-type binary classification technique to determine whether an account in self-organizing maps (SOM) is fraudulent or not. The threshold is computed by measuring dissimilarity between the centroid of SOM grid and the maximal value in the matrix. Accounts with values higher than the threshold are anomalous. A decision tree [111] is generated from the patterns suggested by auditors. To detect internal frauds conducted by employees, each employee is assigned an anomaly value. The value indicating the severity of anomalousness is obtained by evaluating event series of an employee against fraud patterns. Structured networks reveal anomalies. Two risk indices [114] based on neighborhood structure, i.e., pattern centrality and transaction pattern centrality, are computed by assigning weights to each edge that corresponds to a taxpayer in a transaction network. Niu et al.  [103] employ an information theoretic-based approach to uncover risk guarantee pattern and detect high default groups for loans risk management. Specifically, the proxy for information flow is the probability flow of random walks in directed weighted networks. PCA is utilized for identifying insider threat [106] due to its effectiveness in detecting users that exhibit irregular variances across the set of derived features. An interactive PCA helps comprehend relationships between the PCA space and the original higher-dimensional space in a visual interface.

VII-C Visualization Techniques

Egocentric Behaviors. An egocentric transaction is described as buying or selling behaviors conducted by an individual. An anomalous egocentric transaction can be an unauthorized transaction or a deal with an exceptionally high amount of value. Detection of these behaviors mainly uses sequence visualization.

VisImpact [99, 105] organizes attributes of transactions by allocating them onto three parts/axes of a ring: left semicircle, bisector, and right semicircle. Each axis stands for an attribute of interest (e.g., region, client, fraud amount, fraud count). Suntinger et al.  [100] display events as nodes in a cylindrical tunnel. The top view of the cylinder represents historical events, which are laid out such that more recent events are in the outer ring. Details of events are encoded by the color and size of glyphs of the Event Tunnel. Anomalous betting behaviors of a user are discovered by temporally correlating the account history events of the user to known suspicious account profiles. Argyriou et al.  [115] study the temporal relationship of transactions between a pair of client and employee in a radar chart. The nodes in the radar chart represent transactions, which are positioned according to the time of action, pre-defined periodicity, and ordering of timelines. Events/transactions related to the same client along the radius of the radar chart are considered suspicious, as the patterns suggest the employee falsifies the client’s invoices.

Graph and text visualizations are also used to demonstrate suspicious egocentric transaction behaviors. Argyriou et al. [111] (Figure 6 (a)) use a multi-layer radial drawing to describe activities between employees and clients. Each layer represents a pattern that is suspicious in different aspects (e.g., actions, systems, periodicity), with heat maps in the side view measuring anomalousness. When an employee is found to perform events that share similarity with fraud patterns, a suspicious egocentric behavior is identified. Jigsaw [110] mines relationships between entities in text documents. The parallel coordinates view reveals the correlation of selected attributes (e.g., company, person). By combining it with the heat map for sentiment/similarity analysis, the cluster view for grouping similar documents, and the document view for details, anomalous behaviors can be detected from unique text entities. Following that work, Kang et al. [116] study applications of Jigsaw in various situations including financial transactions. An employee’s egocentric behavior of creating fictitious supplier invoices was discovered.

Collective Behaviors. A collective transaction behavior involves several parties in transactions and businesses. Collective transaction behaviors include series of wire transfers and periodic transactions. Graph visualizations are popular among research works interested in transaction behaviors.

Graph visualization is popular for uncovering collective transaction anomalies. Huang et al. [97] develop two stages to inspect stock market security. Firstly, market performance is evaluated using three-dimensional treemaps, with the heights of blocks indicating the current price of stocks. Secondly, trading networks are compared against suspicious patterns in the historical database. Structured networks are regarded as collective anomalies in transactions. Several visual analytics tools [107, 114, 103, 113] develop categorical node-link visualizations where analysts can merge, split, define new subgraph structures, cluster nodes in a top-down or bottom-up manner, and adjust node sizes by a chosen measurement. Users edit networks interactively to discover communities, which are signals of suspicious financial transactions. Didimo et al. [107] detect illicit financial activities such as money laundering by representing the entities involved in transactions as nodes. The entities include banks, companies, persons, bank accounts, transactions, and report filings. Edges between nodes represent semantic connections. For instance, two disjoint clusters that indicate fraudulent patterns are revealed after clustering. The level of depth of a cluster reflects the criticality of the illegal activity. Niu et al. [103] (Figure 6 (b)) assess the risk of guaranteed loans by visualizing networks of groups of small and medium enterprises which back each other to enhance their financial security. Anomalies, i.e., high default groups, are identified as communities in the network using a node-link visualization. A complementary treemap supports navigation of labels/categories and presentation of default rates.

Chart and sequence visualizations are also used to detect collective transaction behaviors. WireVis [108, 109] uses multiple coordinated chart visualizations to analyze suspicious wire transfers from a payer to a payee via a chain of intermediaries. The overall trends of activities and individual transactions are represented by strings and beads in an x-y plot of transaction value against time. Suspicious transactions are those relevant to a keyword that is only found in the second half of the year, and those of much higher value than others. Leite et al. [101] (Figure 6 (c)) design user-friendly views of chart visualizations and parallel coordinates to help identify the anomalous connection between the amount and the suspicious transactions. If anomaly scores of transactions deviate from normal ranges, the days that contain at least one suspicious transaction are highlighted in red.

VII-D Interaction Methods

Analysts track suspicious data by highlighting and correlating relevant data. The visual analytics tool EVA [101] computes overall anomaly scores and sub-scores according to different standards. If the overall score of a transaction exceeds a threshold, the transaction is highlighted in red in the parallel coordinates view. In addition, a selection in another coordinated chart highlights the associated transactions and grays out the others in the parallel coordinates view. When analysts click a node of interest, relevant data that were originally not visualized are displayed [107]. This helps analysts discover interesting features that are not apparent from one view, and identify different relationships between data instances. A similar operation is seen in [111], where the selection of one node adds related employees (i.e., nodes) to the visualization. Thus, frauds carried out by two or more employees can be tracked.
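
The threshold-based highlighting described above can be sketched as follows. This is a minimal illustration rather than EVA's implementation: the sub-score names, the weighted-mean combination rule, and the 0.7 threshold are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical sub-score names and weights; the actual scoring rules
# of EVA are not specified in this survey.
WEIGHTS = {"amount": 0.5, "frequency": 0.3, "country": 0.2}
THRESHOLD = 0.7  # assumed cut-off above which a transaction is highlighted


@dataclass
class Transaction:
    tx_id: str
    sub_scores: dict  # sub-score name -> value in [0, 1]


def overall_score(tx: Transaction) -> float:
    """Combine sub-scores into one score with a weighted mean (an assumed rule)."""
    return sum(WEIGHTS[name] * tx.sub_scores.get(name, 0.0) for name in WEIGHTS)


def highlight_colors(transactions, threshold=THRESHOLD):
    """Map each transaction id to 'red' (suspicious) or 'gray' (de-emphasized)."""
    return {
        tx.tx_id: "red" if overall_score(tx) > threshold else "gray"
        for tx in transactions
    }


# Made-up transactions for illustration.
transactions = [
    Transaction("t1", {"amount": 0.9, "frequency": 0.8, "country": 0.6}),
    Transaction("t2", {"amount": 0.2, "frequency": 0.1, "country": 0.0}),
]
colors = highlight_colors(transactions)
```

In a coordinated-view setting, the resulting color map would be passed to the rendering layer of each view, so that the same transaction is emphasized consistently everywhere.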

VisImpact [99, 105] supports simultaneous browsing and navigation of multiple nodes. Details of a single node representing a transaction record can be obtained using the drill-down function. For the transaction of an account, transactions can be aggregated in terms of day, week, or month in WireVis [108, 109]. Zooming is enabled in the heat map and temporal chart view. One can also drill down to individuals and compare their records against each other in WireVis. Network Explorer [113] includes an overview and an egocentric mode which detects important clusters and individual nodes, respectively. In the overview mode, analysts can navigate to one cluster and compute sub-communities on demand. In the egocentric mode, analysts navigate nodes using the direct-walk from a starting point.
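
Aggregating transactions by day, week, or month amounts to grouping records by a time bucket. The sketch below illustrates the idea with made-up records; the function names and the `(date, amount)` record layout are assumptions and do not reflect how WireVis stores its data.

```python
from collections import defaultdict
from datetime import date


def period_key(day: date, level: str) -> str:
    """Return the bucket a date falls into at the chosen granularity."""
    if level == "day":
        return day.isoformat()
    if level == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown level: {level}")


def aggregate(transactions, level):
    """Sum transaction amounts per period; transactions are (date, amount) pairs."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[period_key(day, level)] += amount
    return dict(totals)


# Made-up records for one account.
records = [
    (date(2019, 1, 7), 100.0),
    (date(2019, 1, 9), 50.0),
    (date(2019, 1, 15), 25.0),
    (date(2019, 2, 1), 10.0),
]
by_week = aggregate(records, "week")
by_month = aggregate(records, "month")
```

Switching the `level` argument corresponds to the analyst changing the aggregation granularity in the interface; drilling down is the reverse step, from a bucket back to its individual records.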

Pattern discovery is often used to help identify anomalous behaviors. Filtering in WireVis [108, 109] is conducted using a set of keywords and criteria like amounts of words. Analysts can select reasonably sized subsets for re-clustering to generate clusters that exhibit interesting features. Furthermore, the color scheme of the heat map is chosen depending on the characteristic (e.g., sequential or diverging) of the measurement. Jigsaw [110] allows analysts to define clusters of text documents, remove false positives, adjust the number of words shown, and reorder the entity list. Dragging, merging, and splitting visual elements are often seen in node-link visualizations [107, 114, 103, 113]. To discover tax evasion behaviors [114], analysts can merge and split node-link representations. A selection of subgraphs is ranked according to criteria such as the total amount of economic transactions or the risk index. Additionally, analysts can define and draw suspicious graph patterns using pre-defined operators.

A few visualization tools support exporting analyzed results. The Event Tunnel [100] contains a snapshot management console that captures the current state and configuration. Argyriou et al.  [115, 111] design the exporting function in the visual analytics tools for detecting occupational frauds. The ranking results of anomalousness can be exported in separate log files. The visualization containing suspicious transaction patterns can be stored for post-analysis.

Visual analytics incorporates domain knowledge into the process of anomaly detection. Analysts can reassign labels of the “structure hole spanner” during interactive exploration [103]. The structure hole spanner interlinks different communities in a network, which can be modified through merging and splitting operations. High-default groups are found to be associated with these labels. In TAXNET [114], analysts can define graph patterns based on their understanding of tax evasion frauds. Textual labels are attached to the graphs to describe rules for nodes (i.e., taxpayers) or edges (i.e., relationships).

VIII Discussion and Outlook

In this section, we first summarize trends of research interest in the community of data visualization regarding anomalous user behaviors. We then discuss our findings regarding data types, anomaly detection techniques, visualization techniques, and interaction methods across different user behaviors.

VIII-A Visual Analytics of Anomalous User Behaviors

Visual analytics of private social interaction behaviors related to emailing received substantial attention in the 2000s but has declined significantly since then. Recent research works [117, 26] are more interested in the social network structures found in emailing and calling behaviors. A clear trend worth noting is the popularity, since 2010, of analyzing public social interaction behaviors related to posting in social media. The volume of social media data ensures wide coverage of people’s behaviors, both anomalous and normal. Application to the real world is attractive from the perspective of social science and possibly other fields. We have seen many visualization tools that address event detection from massive information, information spreading, and identification of social bots. However, to the best of our knowledge, only a few visualization works [45] focus on secretive or collusive anomalous behaviors, compared to machine learning approaches [3] that detect suspicious behaviors. Specifically, we have not seen visual analytics methods for detecting social Sybil attacks (i.e., astroturfing) [118] or private information inference [119] related to the posting behavior. We hope to see more effort put into discovering anomalous behaviors conducted in a collusive, secretive manner.

As for network communication, the research interest remains relatively strong, though the classical works [78, 85] that analyze this behavior were mostly published in the 2000s. Visual analytics of network communication focuses on aggregating different levels of data as well as real-time monitoring. Aggregation of data is often used to monitor high-level structures of networks and, at the same time, to visualize anomalies in an interface of limited space. As data sources such as audit logs and network traffic provide detailed and systematic information, attacks are often traceable to individual machines even when malicious activities originate from more than one device. In addition, the preference for real-time or near-real-time monitoring in intrusion detection [120] is emphasized, as shown by the support for analyzing streaming data in many visualizations. This results from the need for timely detection of malicious attacks. As computing abilities advance, we expect to see more visualization tools that can handle streaming data.

Travel receives continuous attention from researchers given that more data is available for analysis (mobile phones [69], geo-located messages [121], maritime search and rescue events [70]). Though the visualization techniques used for analyzing travel behaviors are similar (i.e., geographic visualization), a rich set of interaction methods is implemented in order to detect and comprehend anomalies [69, 62]. By analyzing patterns in user-specified spatial and temporal ranges, analysts move back and forth between multiple levels of granularity and gradually develop their understanding during interactive exploration. As more and more sensors become available in daily life, we hope to see finer segmentation of groups of people to offer a more accurate description of travel patterns.

Visualization works regarding anomalous transaction behaviors modernize traditional visual methods in the financial field. For example, EVA [101] integrates human decisions about fraud analysis into the existing alert system. In recent years, we have seen an increased number of visualization tools designed for detecting suspicious users involved in financial transactions. However, a comparison of the average number of citations across user behaviors shows that the overall research interest in financial transactions is lower than that in, for example, travel behaviors. Privacy issues can largely limit the resources available for research. Having said that, we hope to see more in-depth collaboration between academic researchers and financial institutions to resolve transaction frauds by recognizing fraudsters’ behaviors.

VIII-B Data Types

Application of multidimensional data to anomaly detection can be found across all four behaviors. It offers a variety of features for detecting anomalous behaviors and is often used in conjunction with other data types. Text is an important data type for detecting abnormal social interaction behaviors, whereas it is a complement in the analysis of other user behaviors. Text provides information about the identities and backgrounds of the objects involved, which is used to categorize objects. Network data is used frequently in the analysis of network communication as well as social interaction behaviors. Links exist in cyber networks between sources and destinations, and in social networks between senders and receivers. Spatiotemporal information enriches the skeleton of an analysis by incorporating contextual information about users’ travel behaviors. Detection of anomalous transaction and social interaction behaviors often incorporates temporal analysis.

Analysis based on data types helps indicate overlapping areas between user behaviors, which is a signal of borrowing analytics approaches from other behaviors. For example, exploration of rating behaviors in online e-commerce stores is similar to that of network security problems. Sensitivity to time-critical behaviors in anomaly detection is emphasized in [44], in which streaming data is processed. Network between sources and destinations is found in network communication, whilst network between users and items is also important for discovering rating frauds. We see a trend of incorporating multiple types of data. Since anomaly detection problems often encounter unknown ill-defined anomalies, usage of all four data types can create a relatively thorough picture for investigation.

VIII-C Anomaly Detection Techniques

Statistical techniques are the most widely used. The principle behind statistical techniques is more intuitive than that of the other techniques: data that are not described by the known distribution are anomalous. For example, a majority of network communication behaviors are studied using statistical techniques. Detection techniques for cyber attacks are classified into signature-based and anomaly-based [10], both of which can be combined with statistical approaches. Clustering-based techniques are often used in studying travel and transaction behaviors. Clustering is often employed to tackle the large-scale databases associated with travel behaviors. Clustering methods in transactions divide customers into groups based on the assumption that abnormal behaviors are found outside the clusters. Nearest neighbor-based techniques are applied to detecting anomalous social interaction behaviors. For example, in graphs composed of senders and/or receivers associated with emailing and calling, anomaly scores are computed from distances or densities.
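
To make the contrast between the statistical and nearest neighbor-based principles concrete, the sketch below implements one simple instance of each. It is an illustration under assumptions, not a method taken from any surveyed tool: the z-score rule presumes roughly normal data, the mean distance to the k nearest neighbors is only one of several possible distance-based scores, and the thresholds and sample data are made up.

```python
import math
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Statistical technique: flag values far from the mean, in units of
    standard deviation, under an assumed normal distribution."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def knn_scores(points, k=2):
    """Nearest neighbor-based technique: score each point by its mean
    Euclidean distance to its k nearest neighbors (higher = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Made-up data: one extreme value, and one point far from a tight group.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
flagged = zscore_anomalies(values, threshold=2.0)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_scores(points, k=2)
most_isolated = scores.index(max(scores))
```

The statistical rule needs only a distribution assumption, which is part of why it is intuitive, whereas the nearest neighbor score needs a distance measure between instances but makes no assumption about the overall distribution.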

We expect to see more visualization tools employ anomaly detection techniques such as machine learning approaches in the future. The effectiveness of machine learning methods in visualization is well recognized [122]. Though the time interval between the release of a detection technique and its implementation in visualization might be long (e.g., a five-year interval for FraudVis [45] to apply the CopyCatch [123] algorithm), we believe machine learning techniques are of great value for anomaly detection in visualization. Recently, Chalapathy and Chawla [12] surveyed deep learning techniques for anomaly detection. For example, Malhotra et al. [124] develop a Long Short Term Memory Networks based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that is able to uncover anomalies in predictable, unpredictable, periodic, and aperiodic time series, both long and short. Anomalies in multivariate time-series data are uncovered using a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) [125], which can capture dynamics and encode the inter-correlations between different pairs of time series.

VIII-D Visualization Techniques

Among graph visualizations, the node-link diagram is the most used in social interaction, transaction, and network communication. The node-link diagram is advantageous for its traceability from one node to another. It can trace down to abnormal individuals in email and call records, to individual machines in malicious cyber attacks, and to an employee-client pair in financial frauds. Text visualization is favored in the analysis of public social interaction behaviors such as posting. These visualization tools are usually equipped with views showing text data to enable interactive exploration and confirmation of suspicious events or users. For example, to complement the inspection of microblogs, original messages and keywords are often shown in a table or in tag clouds [126, 34]. Detection of anomalous transaction behaviors also uses sequence visualization such as parallel coordinates. Variations in the relationship between subsequent events can be tracked through changes in the linkage between two successive axes, which suggest that suspicious transactions have occurred. Variants of parallel coordinates include the radar chart and the Sankey diagram. To illustrate social interaction behaviors, changes in the heights and sizes of bubbles in timeline visualization are used to encode sudden and/or important changes in the volume of keywords. Geographic visualization is often used to represent travel behaviors as it has the advantage of illustrating two-dimensional physical movement. Flows and bubbles projected on a map show differences in traveling directions and in the spatial density of distribution. The heat map is popular for showing spatial densities of humans and vehicles, as it minimizes the visual occlusion that may occur when flows or bubbles are projected on maps. Chart visualization is effective in illustrating well-understood anomalies as long as the dimensions of the displays are selected properly.

We also found that visualization works addressing egocentric behavior forms are far fewer than those studying collective behavior forms. Glyph visualization is suited to visualizing egocentric behaviors, as differences in individuals’ roles can be identified more efficiently. Visualizations of collective behaviors take a variety of representations. To illustrate, we use an example from social media in which the same user behavior is viewed from the egocentric and collective perspectives, respectively. Both FluxFlow [32] and Episogram [52] analyze retweeting behaviors on Twitter. FluxFlow emphasizes the information diffusion process and visualizes the temporal evolution of a group of retweeted microblogs using packed colored circles. Episogram, on the other hand, considers whether a Twitter account is anomalous by comparing its individual retweeting patterns with those of others. A user is represented as a glyph, which we found to be a typical visualization for the egocentric behavior form.

The trend of applying visualization techniques to detecting anomalous user behaviors is summarized as follows. The node-link diagram has long been a popular choice for visualizing anomalous user behaviors. It is still a favored technique, as it effectively presents an overall structure as well as detailed information when combined with rich interaction techniques. Circular designs are gaining attention from researchers for their ability to show connections in a packed visualization, where hierarchical structure is displayed using bundles and a tree layout inside the ring. Also, circular designs usually represent structures of a larger scale than those (e.g., stars, cliques) in node-link structures.

We observed an increasing trend of using heat maps compared to flows, bubbles, or 3D projections on a map. The reason may be that flows, bubbles, and 3D maps result in visual occlusion, which can only be resolved with appropriate interaction techniques. The growing use of heat maps, by contrast, can be explained by their potential to visualize large-scale data with geographic references. A heat map is able to encode some degree of geographical information while, at the same time, variables such as the density of users or the degree of anomaly can be encoded on the map without occlusion. Interest in applying chart visualization has decreased in recent years. Chart visualization is restricted to a few variables, which makes it ineffective in anomaly detection when an analysis of multiple variables is required.
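
One reason a heat map avoids occlusion is that overlapping points are aggregated into cells before drawing, so each cell is rendered once regardless of how many points fall in it. The sketch below shows this binning step on made-up planar coordinates; the function name and the uniform square grid are assumptions chosen for illustration.

```python
from collections import Counter


def grid_density(points, cell_size):
    """Count points per square grid cell; each count becomes one heat map cell."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns every point to exactly one cell.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Made-up planar coordinates (e.g., projected positions of vehicles).
points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (1.4, 0.2), (3.7, 3.1)]
density = grid_density(points, cell_size=1.0)
```

The per-cell counts would then be mapped to a color scale; an anomaly degree could be encoded the same way by averaging scores per cell instead of counting.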

VIII-E Interaction Methods

Exploration & navigation has been the most popular interaction task in visual analytics of anomalous user behaviors. Most visualization tools let users first gain a high-level summary of large-scale data and then drill down to anomalies on request. The second most popular interaction task is tracking & monitoring. As the papers surveyed are related to anomaly detection, keeping track of suspicious spots is important during interactive exploration. Analysts also highlight data of interest to show its correlations across the coordinated views, which helps form a picture of where anomalies originate. Pattern discovery is also frequently used. During the process, the visual representation of data changes accordingly. These updates to one’s knowledge drive analysts to construct hypotheses about anomalies.

We observe trends of utilizing interaction tasks in different user behaviors. Visualization works that study travel behavior often incorporate exploration & navigation in map visualization. The reason is that panning on a map is seen often when tracking physical movement [59, 72]. Pattern discovery illustrates more than one abnormal feature of anomalies by changing color spectrum and representing traveling patterns in various forms on a map [47, 62]. Also, filtering by keywords is seen in social interaction [44, 34, 27, 127] where textual contents are important for determining anomalies. Knowledge externalization is usually seen in network communication [90, 80] and transactions [108, 111]. This interaction task enables the processed results to be outputted for further analysis and validation with domain experts.

We increasingly see visualization tools involve refinement & identification in rendering visualization. This type of interaction goes beyond the definition of interaction methods [24] because adjustments in anomaly detection algorithms are allowed (e.g., Filter technique). Several research works allow analysts to adjust parameters in constructing queries [62, 38], changing thresholds of anomalies [44, 69], and updating feedback in anomalies [59]. Visual representation is modified due to fundamental calculation rather than the adjustment of visual encoding. These works facilitate visual analytics by involving human perception and interpretation into the computation process of anomaly detection, which is a deeper level of computer-human interaction than those identified in  [24].

IX Conclusion

In this work, we present a survey of visual analytics of anomalous user behaviors. We analyze the related state of the art according to the proposed taxonomies. Our survey suggests trends and preferences in data types, anomaly detection techniques, visualization techniques, and interaction methods. With these findings, we also highlight potential research directions. We believe our work sheds light on understanding and analyzing anomalous user behaviors using visual analytics approaches.

X Acknowledgments

Nan Cao is the corresponding author. This research was sponsored in part by the Fundamental Research Funds for the Central Universities in China.

References

  • [1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, pp. 15:1–15:58, 2009.
  • [2] L. Jin, Y. Chen, T. Wang, P. Hui, and A. V. Vasilakos, “Understanding user behavior in online social networks: A survey,” IEEE Communications Magazine, vol. 51, no. 9, pp. 144–150, 2013.
  • [3] M. Jiang, P. Cui, and C. Faloutsos, “Suspicious behavior detection: Current trends and future directions,” IEEE Intelligent Systems, vol. 31, no. 1, pp. 31–39, 2016.
  • [4] Y. Zheng, W. Wu, Y. Chen, H. Qu, and L. M. Ni, “Visual analytics in urban computing: An overview,” IEEE Transactions on Big Data, vol. 2, no. 3, pp. 276–296, 2016.
  • [5] S. Chen, L. Lin, and X. Yuan, “Social media visual analytics,” in Computer Graphics Forum, vol. 36, no. 3.   Wiley Online Library, 2017, pp. 563–587.
  • [6] Y. Wu, N. Cao, D. Gotz, Y.-P. Tan, and D. A. Keim, “A survey on visual analytics of social media data,” IEEE Transactions on Multimedia, vol. 18, no. 11, pp. 2135–2148, 2016.
  • [7] S. Ko, I. Cho, S. Afzal, C. Yau, J. Chae, A. Malik, K. Beck, Y. Jang, W. Ribarsky, and D. S. Ebert, “A survey on visual analysis approaches for financial data,” in Computer Graphics Forum, vol. 35, no. 3.   Wiley Online Library, 2016, pp. 599–617.
  • [8] H. Shiravi, A. Shiravi, and A. A. Ghorbani, “A survey of visualization systems for network security,” IEEE Transactions on visualization and computer graphics, vol. 18, no. 8, pp. 1313–1329, 2012.
  • [9] V. Lavigne and D. Gouin, “Visual analytics for cyber security and intelligence,” The Journal of Defense Modeling and Simulation, vol. 11, no. 2, pp. 175–199, 2014.
  • [10] A. Patcha and J.-M. Park, “An overview of anomaly detection techniques: Existing solutions and latest technological trends,” Computer Networks, vol. 51, pp. 3448–3470, 2007.
  • [11] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, pp. 626–688, 2014.
  • [12] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407, 2019.
  • [13] A. Litan and M. Nicolett, “Market guide for user behavior analytics,” URL: https://www.gartner.com/doc/2831117/market-guide-user-behavior-analytics, 2014, accessed 2019-01-15.
  • [14] M. Rouse and M. Bacon, “user behavior analytics (UBA) search security,” URL: https://searchsecurity.techtarget.com/definition/user-behavior-analytics-UBA, 2017, accessed 2019-01-15.
  • [15] H. Yeon, S. Kim, and Y. Jang, “Predictive visual analytics of event evolution for user-created context,” Journal of Visualization, vol. 20, no. 3, pp. 471–486, 2017.
  • [16] R. Balakrishnan and K. Ranganathan, A textbook of graph theory.   Springer Science & Business Media, 2012.
  • [17] K. C. Cox, S. G. Eick, G. J. Wills, and R. J. Brachman, “Brief application description; visual data mining: Recognizing telephone calling fraud,” Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 225–231, 1997.
  • [18] F. B. Viégas, D. Boyd, D. H. Nguyen, J. Potter, and J. Donath, “Digital artifacts for remembering and storytelling: Posthistory and social network fragments,” in System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on.   IEEE, 2004, pp. 10–pp.
  • [19] P. Gatalsky, N. Andrienko, and G. Andrienko, “Interactive analysis of event data using space-time cube,” in Information Visualisation, 2004. IV 2004. Proceedings. Eighth International Conference on.   IEEE, 2004, pp. 145–152.
  • [20] C. Weaver, D. Fyfe, A. Robinson, D. Holdsworth, D. Peuquet, and A. M. MacEachren, “Visual exploration and analysis of historic hotel visits,” Information Visualization, vol. 6, no. 1, pp. 89–103, 2007.
  • [21] R. F. Erbacher, K. L. Walker, and D. A. Frincke, “Intrusion and misuse detection in large-scale systems,” IEEE Computer Graphics and Applications, vol. 22, no. 1, pp. 38–47, 2002.
  • [22] R. Xiong and J. Donath, “Peoplegarden: creating data portraits for users,” in Proceedings of the 12th annual ACM symposium on User interface software and technology.   ACM, 1999, pp. 37–44.
  • [23] F. B. Viégas, M. Wattenberg, and K. Dave, “Studying cooperation and conflict between authors with history flow visualizations,” in Proceedings of the SIGCHI conference on Human factors in computing systems.   ACM, 2004, pp. 575–582.
  • [24] J. S. Yi, Y. ah Kang, J. T. Stasko, J. A. Jacko et al., “Toward a deeper understanding of the role of interaction in information visualization,” IEEE Transactions on Visualization & Computer Graphics, no. 6, 2007.
  • [25] S. van den Elzen, D. Holten, J. Blaas, and J. J. van Wijk, “Reordering massive sequence views: Enabling temporal and structural analysis of dynamic networks,” in Visualization Symposium (PacificVis), 2013 IEEE Pacific.   IEEE, 2013, pp. 33–40.
  • [26] S. van den Elzen, D. Holten, J. Blaas, and J. J. van Wijk, “Dynamic network visualization with extended massive sequence views,” IEEE Transactions on Visualization & Computer Graphics, no. 8, pp. 1087–1099, 2014.
  • [27] A. Perer and B. Shneiderman, “Balancing systematic and flexible exploration of social networks,” IEEE transactions on visualization and computer graphics, vol. 12, no. 5, pp. 693–700, 2006.
  • [28] J. Koven, C. Felix, H. Siadati, M. Jakobsson, and E. Bertini, “Lessons learned developing a visual analytics solution for investigative analysis of scamming activities,” IEEE transactions on visualization and computer graphics, 2018.
  • [29] X. Fu, S.-H. Hong, N. S. Nikolov, X. Shen, Y. Wu, and K. Xu, “Visualization and analysis of email networks,” in Visualization, 2007. APVIS’07. 2007 6th International Asia-Pacific Symposium on.   IEEE, 2007, pp. 1–8.
  • [30] P. A. Gloor and Y. Zhao, “Tecflow-a temporal communication flow visualizer for social networks analysis,” in ACM CSCW Workshop on Social Networks, vol. 6, 2004.
  • [31] C. Muelder and K.-L. Ma, “Visualization of sanitized email logs for spam analysis,” in Visualization, 2007. APVIS’07. 2007 6th International Asia-Pacific Symposium on.   IEEE, 2007, pp. 9–16.
  • [32] J. Zhao, N. Cao, Z. Wen, Y. Song, Y.-R. Lin, and C. Collins, “#FluxFlow: Visual analysis of anomalous information spreading on social media,” IEEE transactions on visualization and computer graphics, vol. 20, no. 12, pp. 1773–1782, 2014.
  • [33] P. Resnick, S. Carton, S. Park, Y. Shen, and N. Zeffer, “Rumorlens: A system for analyzing the impact of rumors and corrections in social media,” in Proc. Computational Journalism Conference, 2014.
  • [34] N. Cao, C. Shi, S. Lin, J. Lu, Y.-R. Lin, and C.-Y. Lin, “Targetvue: Visual analysis of anomalous user behaviors in online communication systems,” IEEE transactions on visualization and computer graphics, vol. 22, no. 1, pp. 280–289, 2016.
  • [35] C. Shao, G. L. Ciampaglia, O. Varol, K. Yang, A. Flammini, and F. Menczer, “The spread of low-credibility content by social bots,” arXiv preprint arXiv:1707.07592, 2017.
  • [36] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller, “Twitinfo: aggregating and visualizing microblogs for event exploration,” in Proceedings of the SIGCHI conference on Human factors in computing systems.   ACM, 2011, pp. 227–236.
  • [37] D. Thom, H. Bosch, S. Koch, M. Wörner, and T. Ertl, “Spatiotemporal anomaly detection through visual analysis of geolocated twitter messages,” in Visualization Symposium (PacificVis), 2012 IEEE Pacific.   IEEE, 2012, pp. 41–48.
  • [38] H. Bosch, D. Thom, F. Heimerl, E. Püttmann, S. Koch, R. Krüger, M. Wörner, and T. Ertl, “Scatterblogs2: Real-time monitoring of microblog messages through user-guided filtering,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2022–2031, 2013.
  • [39] A. Pozdnoukhov and C. Kaiser, “Space-time dynamics of topics in streaming text,” in Proceedings of the 3rd ACM SIGSPATIAL international workshop on location-based social networks.   ACM, 2011, pp. 1–8.
  • [40] P. A. Gloor, S. Niepel, and Y. Li, “Identifying potential suspects by temporal link analysis,” University of Cologne, 2006.
  • [41] J. Echeverria and S. Zhou, “Discovery, retrieval, and analysis of the ‘star wars’ botnet in twitter,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017.   ACM, 2017, pp. 1–8.
  • [42] M. Krstajic, E. Bertini, and D. Keim, “Cloudlines: Compact display of event episodes in multiple time-series,” IEEE transactions on visualization and computer graphics, vol. 17, no. 12, pp. 2432–2439, 2011.
  • [43] J. Chae, D. Thom, H. Bosch, Y. Jang, R. Maciejewski, D. S. Ebert, and T. Ertl, “Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on.   IEEE, 2012, pp. 143–152.
  • [44] K. Webga and A. Lu, “Discovery of rating fraud with real-time streaming visual analytics,” in Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on.   IEEE, 2015, pp. 1–8.
  • [45] J. Sun, Q. Zhu, Z. Liu, X. Liu, J. Lee, Z. Su, L. Shi, L. Huang, and W. Xu, “Fraudvis: Understanding unsupervised fraud detection algorithms,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE.   IEEE, 2018, pp. 170–174.
  • [46] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, “Leadline: Interactive visual analysis of text data through event identification and exploration,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on.   IEEE, 2012, pp. 93–102.
  • [47] J. Chae, D. Thom, Y. Jang, S. Kim, T. Ertl, and D. S. Ebert, “Public behavior response analysis in disaster events utilizing visual analytics of microblog data,” Computers & Graphics, vol. 38, pp. 51–60, 2014.
  • [48] Z. Shen and K.-L. Ma, “Mobivis: A visualization system for exploring mobile data,” in Visualization Symposium, 2008. PacificVIS’08. IEEE Pacific.   IEEE, 2008, pp. 175–182.
  • [49] C. Li, Y. Wang, P. Resnick, and Q. Mei, “Req-rec: High recall retrieval with query pooling and interactive classification,” in Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval.   ACM, 2014, pp. 163–172.
  • [50] P. A. Gloor, R. Laubacher, S. B. Dynes, and Y. Zhao, “Visualization of communication patterns in collaborative innovation networks-analysis of some w3c working groups,” in Proceedings of the twelfth international conference on Information and knowledge management.   ACM, 2003, pp. 56–60.
  • [51] D. Luo, J. Yang, M. Krstajic, W. Ribarsky, and D. Keim, “Eventriver: Visually exploring text collections with temporal references,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 1, pp. 93–105, 2012.
  • [52] N. Cao, Y.-R. Lin, F. Du, and D. Wang, “Episogram: Visual summarization of egocentric social interactions,” IEEE computer graphics and applications, vol. 36, no. 5, pp. 72–81, 2016.
  • [53] U. Brandes, P. Kenis, J. Lerner, and D. Van Raaij, “Network analysis of collaboration structure in wikipedia,” in Proceedings of the 18th international conference on World wide web.   ACM, 2009, pp. 731–740.
  • [54] F. B. Viégas, S. Golder, and J. Donath, “Visualizing email content: portraying relationships from conversational histories,” in Proceedings of the SIGCHI conference on Human Factors in computing systems.   ACM, 2006, pp. 979–988.
  • [55] W.-J. Li, S. Hershkop, and S. J. Stolfo, “Email archive analysis through graphical visualization,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security.   ACM, 2004, pp. 128–132.
  • [56] R. Lee and K. Sumiya, “Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection,” in Proceedings of the 2nd ACM SIGSPATIAL international workshop on location based social networks.   ACM, 2010, pp. 1–10.
  • [57] F. Morstatter, S. Kumar, H. Liu, and R. Maciejewski, “Understanding twitter data with tweetxplorer,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2013, pp. 1482–1485.
  • [58] F. B. Viégas and M. Smith, “Newsgroup crowds and authorlines: Visualizing the activity of individuals in conversational cyberspaces,” in System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on.   IEEE, 2004, pp. 10–pp.
  • [59] N. Cao, C. Lin, Q. Zhu, Y.-R. Lin, X. Teng, and X. Wen, “Voila: Visual anomaly detection and monitoring with streaming spatiotemporal data,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 23–33, 2018.
  • [60] Z. Liao, Y. Yu, and B. Chen, “Anomaly detection in GPS data based on visual analytics,” in Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on.   IEEE, 2010, pp. 51–58.
  • [61] W. Wu, J. Xu, H. Zeng, Y. Zheng, H. Qu, B. Ni, M. Yuan, and L. M. Ni, “Telcovis: Visual exploration of co-occurrence in urban human mobility based on telco data,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 1, pp. 935–944, 2016.
  • [62] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva, “Visual exploration of big spatio-temporal urban data: A study of New York City taxi trips,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2149–2158, 2013.
  • [63] R. Beecham and J. Wood, “Characterising group-cycling journeys using interactive graphics,” Transportation Research Part C: Emerging Technologies, vol. 47, pp. 194–206, 2014.
  • [64] J. Pu, P. Xu, H. Qu, W. Cui, S. Liu, and L. Ni, “Visual analysis of people’s mobility pattern from mobile phone data,” in Proceedings of the 2011 Visual Information Communication-International Symposium.   ACM, 2011, p. 13.
  • [65] S. Kim, S. Jeong, I. Woo, Y. Jang, R. Maciejewski, and D. S. Ebert, “Data flow analysis and visualization for spatiotemporal statistical data without trajectory information,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 3, pp. 1287–1300, 2018.
  • [66] A. Malik, R. Maciejewski, B. Maule, and D. S. Ebert, “A visual analytics process for maritime resource allocation and risk assessment,” in Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on.   IEEE, 2011, pp. 221–230.
  • [67] Z. Liao, L. Kong, X. Wang, Y. Zhao, F. Zhou, Z. Liao, and X. Fan, “A visual analytics approach for detecting and understanding anomalous resident behaviors in smart healthcare,” Applied Sciences, vol. 7, no. 3, p. 254, 2017.
  • [68] S. Ko, S. Afzal, S. Walton, Y. Yang, J. Chae, A. Malik, Y. Jang, M. Chen, and D. Ebert, “Analyzing high-dimensional multivariate network links with integrated anomaly detection, highlighting and exploration,” in Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on.   IEEE, 2014, pp. 83–92.
  • [69] T. Von Landesberger, S. Bremm, N. Andrienko, G. Andrienko, and M. Tekusova, “Visual analytics methods for categoric spatio-temporal data,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on.   IEEE, 2012, pp. 183–192.
  • [70] M. Riveiro and G. Falkman, “Interactive visualization of normal behavioral models and expert rules for maritime anomaly detection,” in 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization.   IEEE, 2009, pp. 459–466.
  • [71] R. Maciejewski, S. Rudolph, R. Hafen, A. Abusalah, M. Yakout, M. Ouzzani, W. S. Cleveland, S. J. Grannis, and D. S. Ebert, “A visual analytics approach to understanding spatiotemporal hotspots,” IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 2, pp. 205–220, 2010.
  • [72] N. Andrienko and G. Andrienko, “A visual analytics framework for spatio-temporal analysis and modelling,” Data Mining and Knowledge Discovery, vol. 27, no. 1, pp. 55–83, 2013.
  • [73] J. Lin, E. Keogh, and S. Lonardi, “Visualizing and discovering non-trivial patterns in large time series databases,” Information Visualization, vol. 4, no. 2, pp. 61–82, 2005.
  • [74] S. Foresti, J. Agutter, Y. Livnat, S. Moon, and R. F. Erbacher, “Visual correlation of network alerts,” IEEE Computer Graphics and Applications, vol. 26, pp. 48–59, 2006.
  • [75] Q. Liao, A. Striegel, and N. Chawla, “Visualizing graph dynamics and similarity for enterprise network security and management,” in Proceedings of the seventh international symposium on visualization for cyber security.   ACM, 2010, pp. 34–45.
  • [76] S. T. Teoh, K. Zhang, S.-M. Tseng, K.-L. Ma, and S. F. Wu, “Combining visual and automated data mining for near-real-time anomaly detection and analysis in BGP,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security.   ACM, 2004, pp. 35–44.
  • [77] F. Fischer, J. Fuchs, P.-A. Vervier, F. Mansmann, and O. Thonnard, “Vistracer: a visual analytics tool to investigate routing anomalies in traceroutes,” in Proceedings of the ninth international symposium on visualization for cyber security.   ACM, 2012, pp. 80–87.
  • [78] X. Yin, W. Yurcik, M. Treaster, Y. Li, and K. Lakkaraju, “Visflowconnect: netflow visualizations of link relationships for security situational awareness,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security.   ACM, 2004, pp. 26–34.
  • [79] T. Taylor, D. Paterson, J. Glanfield, C. Gates, S. Brooks, and J. McHugh, “Flovis: Flow visualization system,” in Conference For Homeland Security, 2009. CATCH’09. Cybersecurity Applications & Technology.   IEEE, 2009, pp. 186–198.
  • [80] Y. Zhao, X. Liang, X. Fan, Y. Wang, M. Yang, and F. Zhou, “Mvsec: multi-perspective and deductive visual analytics on heterogeneous network security data,” Journal of Visualization, vol. 17, no. 3, pp. 181–196, 2014.
  • [81] J. R. Goodall, W. G. Lutters, P. Rheingans, and A. Komlodi, “Preserving the big picture: Visual network traffic analysis with tnv,” in Visualization for Computer Security, 2005 (VizSEC 05), IEEE Workshop on.   IEEE, 2005, pp. 47–54.
  • [82] S. T. Teoh, K.-L. Ma, S. F. Wu, and T. J. Jankun-Kelly, “Detecting flaws and intruders with visual data analysis,” IEEE Computer Graphics and Applications, vol. 24, pp. 27–35, 2004.
  • [83] F. Fischer and D. A. Keim, “Nstreamaware: Real-time visual analytics for data streams to enhance situational awareness,” in Proceedings of the Eleventh Workshop on Visualization for Cyber Security.   ACM, 2014, pp. 65–72.
  • [84] S. T. Teoh, K. L. Ma, S. F. Wu, and X. Zhao, “Case study: Interactive visualization for internet security,” in Proceedings of the conference on Visualization’02.   IEEE Computer Society, 2002, pp. 505–508.
  • [85] K. Lakkaraju, W. Yurcik, and A. J. Lee, “Nvisionip: netflow visualizations of system state for security situational awareness,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security.   ACM, 2004, pp. 65–72.
  • [86] Y. Livnat, J. Agutter, S. Moon, R. F. Erbacher, and S. Foresti, “A visualization paradigm for network intrusion detection,” in Information Assurance Workshop, 2005. IAW’05. Proceedings from the Sixth Annual IEEE SMC.   IEEE, 2005, pp. 92–99.
  • [87] E. Bertini, P. Hertzog, and D. Lalanne, “Spiralview: towards security policies assessment through visual correlation of network resources with evolution of alarms,” in Visual Analytics Science and Technology, 2007. VAST 2007. IEEE Symposium on.   IEEE, 2007, pp. 139–146.
  • [88] F. Mansmann, “Visual analysis of network traffic: interactive monitoring, detection, and interpretation of security threats,” Ph.D. dissertation, University of Konstanz, 2008.
  • [89] J. Tao, L. Shi, Z. Zhuang, C. Huang, R. Yu, P. Su, C. Wang, and Y. Chen, “Visual analysis of collective anomalies through high-order correlation graph,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE.   IEEE, 2018, pp. 150–159.
  • [90] A. D. D’Amico, J. R. Goodall, D. R. Tesone, and J. K. Kopylec, “Visual discovery in computer network defense,” IEEE Computer Graphics and Applications, vol. 27, no. 5, 2007.
  • [91] C. Zheng, L. Ji, D. Pei, J. Wang, and P. Francis, “A light-weight distributed scheme for detecting IP prefix hijacks in real-time,” in ACM SIGCOMM Computer Communication Review, vol. 37, no. 4.   ACM, 2007, pp. 277–288.
  • [92] S. Yoo, J. Jo, B. Kim, and J. Seo, “Longline: Visual analytics system for large-scale audit logs,” Visual Informatics, vol. 2, no. 1, pp. 82–97, 2018.
  • [93] A. Boschetti, L. Salgarelli, C. Muelder, and K.-L. Ma, “Tvi: a visual querying system for network monitoring and anomaly detection,” in Proceedings of the 8th international symposium on visualization for cyber security.   ACM, 2011, p. 1.
  • [94] K. Nyarko, T. Capers, C. Scott, and K. Ladeji-Osias, “Network intrusion visualization with niva, an intrusion detection visual analyzer with haptic integration,” in Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2002. HAPTICS 2002. Proceedings. 10th Symposium on.   IEEE, 2002, pp. 277–284.
  • [95] C. Scott, K. Nyarko, T. Capers, and J. Ladeji-Osias, “Network intrusion visualization with niva, an intrusion detection visual and haptic analyzer,” Information Visualization, vol. 2, no. 2, pp. 82–94, 2003.
  • [96] F. Mansmann and S. Vinnik, “Interactive exploration of data traffic with hierarchical network maps,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 6, pp. 1440–1449, 2006.
  • [97] M. L. Huang, J. Liang, and Q. V. Nguyen, “A visualization approach for frauds detection in financial market,” in Information Visualisation, 2009 13th International Conference.   IEEE, 2009, pp. 197–202.
  • [98] D. Olszewski, “Fraud detection using self-organizing map visualizing the user profiles,” Knowledge-Based Systems, vol. 70, pp. 324–334, 2014.
  • [99] M. C. Hao, D. A. Keim, U. Dayal, and J. Schneidewind, “Visimpact: business impact visualization,” in Visualization and Data Analysis 2005, vol. 5669.   International Society for Optics and Photonics, 2005, pp. 238–250.
  • [100] M. Suntinger, H. Obweger, J. Schiefer, and M. E. Gröller, “The event tunnel: Interactive visualization of complex event streams for business process pattern analysis,” in Visualization Symposium, 2008. PacificVIS’08. IEEE Pacific.   IEEE, 2008, pp. 111–118.
  • [101] R. A. Leite, T. Gschwandtner, S. Miksch, S. Kriglstein, M. Pohl, E. Gstrein, and J. Kuntner, “Eva: Visual analytics to identify fraudulent events,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 330–339, 2018.
  • [102] M. C. Hao, D. A. Keim, and U. Dayal, “Visbiz: A simplified visualization of business operation,” in Proceedings of the conference on Visualization’04.   IEEE Computer Society, 2004, pp. 598–1.
  • [103] Z. Niu, D. Cheng, L. Zhang, and J. Zhang, “Visual analytics for networked-guarantee loans risk management,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE.   IEEE, 2018, pp. 160–169.
  • [104] R. A. Leite, T. Gschwandtner, S. Miksch, E. Gstrein, and J. Kuntner, “Visual analytics for fraud detection: focusing on profile analysis,” in Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Posters.   Eurographics Association, 2016, pp. 45–47.
  • [105] M. C. Hao, D. A. Keim, U. Dayal, and J. Schneidewind, “Business process impact visualization and anomaly detection,” Information Visualization, vol. 5, no. 1, pp. 15–27, 2006.
  • [106] P. A. Legg, “Visualizing the insider threat: challenges and tools for identifying malicious user activity,” in Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on.   IEEE, 2015, pp. 1–7.
  • [107] W. Didimo, G. Liotta, F. Montecchiani, and P. Palladino, “An advanced network visualization system for financial crime detection,” in Visualization Symposium (PacificVis), 2011 IEEE Pacific.   IEEE, 2011, pp. 203–210.
  • [108] R. Chang, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, “Wirevis: Visualization of categorical, time-varying data from financial transactions,” in Visual Analytics Science and Technology, 2007. VAST 2007. IEEE Symposium on.   IEEE, 2007, pp. 155–162.
  • [109] R. Chang, A. Lee, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, “Scalable and interactive visual analysis of financial wire transactions for fraud detection,” Information Visualization, vol. 7, no. 1, pp. 63–76, 2008.
  • [110] C. Görg, Z. Liu, J. Kihm, J. Choo, H. Park, and J. Stasko, “Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 10, pp. 1646–1663, 2013.
  • [111] E. N. Argyriou, A. Symvonis, and V. Vassiliou, “A fraud detection visualization system utilizing radial drawings and heat-maps,” in Information Visualization Theory and Applications (IVAPP), 2014 International Conference on.   IEEE, 2014, pp. 153–160.
  • [112] M. Schaefer, F. Wanner, F. Mansmann, C. Scheible, V. Stennett, A. T. Hasselrot, and D. A. Keim, “Visual pattern discovery in timed event data,” in Visualization and Data Analysis 2011, vol. 7868.   International Society for Optics and Photonics, 2011, p. 78680K.
  • [113] J. A. Guerra-Gomez, A. Wilson, J. Liu, D. Davies, P. Jarvis, and E. Bier, “Network explorer: Design, implementation, and real world deployment of a large network visualization tool,” in Proceedings of the International Working Conference on Advanced Visual Interfaces.   ACM, 2016, pp. 108–111.
  • [114] W. Didimo, L. Giamminonni, G. Liotta, F. Montecchiani, and D. Pagliuca, “A visual analytics system to support tax evasion discovery,” Decision Support Systems, vol. 110, pp. 71–83, 2018.
  • [115] E. N. Argyriou, A. A. Sotiraki, and A. Symvonis, “Occupational fraud detection through visualization,” in Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on.   IEEE, 2013, pp. 4–6.
  • [116] Y.-a. Kang and J. Stasko, “Examining the use of a visual analytics system for sensemaking tasks: Case studies with domain experts,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2869–2878, 2012.
  • [117] D. Redondo, A. Sallaberry, D. Ienco, F. Zaidi, and P. Poncelet, “Layer-centered approach for multigraphs visualization,” in Information Visualisation (iV), 2015 19th International Conference on.   IEEE, 2015, pp. 50–55.
  • [118] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, “Sybillimit: A near-optimal social network defense against sybil attacks,” in 2008 IEEE Symposium on Security and Privacy (sp 2008).   IEEE, 2008, pp. 3–17.
  • [119] R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Preventing private information inference attacks on social networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1849–1862, 2013.
  • [120] B. Mukherjee, L. T. Heberlein, and K. L. Levitt, “Network intrusion detection,” IEEE Network, vol. 8, pp. 26–41, 1994.
  • [121] J. Thomas and J. Kielman, “Challenges for visual analytics,” Information Visualization, vol. 8, no. 4, pp. 309–314, 2009.
  • [122] S. Liu, X. Wang, M. Liu, and J. Zhu, “Towards better analysis of machine learning models: A visual analytics perspective,” Visual Informatics, vol. 1, no. 1, pp. 48–56, 2017.
  • [123] A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos, “Copycatch: stopping group attacks by spotting lockstep behavior in social networks,” in Proceedings of the 22nd international conference on World Wide Web.   ACM, 2013, pp. 119–130.
  • [124] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, “LSTM-based encoder-decoder for multi-sensor anomaly detection,” arXiv preprint arXiv:1607.00148, 2016.
  • [125] C. Zhang, D. Song, Y. Chen, X. Feng, C. Lumezanu, W. Cheng, J. Ni, B. Zong, H. Chen, and N. V. Chawla, “A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data,” arXiv preprint arXiv:1811.08055, 2018.
  • [126] Y. Sun, Y. Tao, G. Yang, and H. Lin, “Visitpedia: Wiki article visit log visualization for event exploration,” in Computer-Aided Design and Computer Graphics (CAD/Graphics), 2013 International Conference on.   IEEE, 2013, pp. 282–289.
  • [127] M. E. Joorabchi, J.-D. Yim, and C. D. Shaw, “Emailtime: Visual analytics of emails,” in Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on.   IEEE, 2010, pp. 233–234.