1 Related Work
1.1 Dataflow Visualization System (DFVS)
Dataflow systems enable the user to configure system functionality by drawing a dataflow diagram that defines how the system modules interact with each other. While dataflow systems are effective in fields other than data visualization, such as computational workflow design [53, 2, 4], we focus on dataflow systems for visualization purposes in this section. Previous DFVSs have demonstrated the effectiveness of using dataflow to render scientific data [51, 39, 28] and manage volume rendering pipelines [15, 37]. Dataflow systems that pass only data subsets (versus program method arguments) yield simpler dataflow diagrams and lower learning overhead [42, 43]. ExPlates and VisFlow present embedded visualizations in their dataflow and focus on interactive information visualization. Most dataflow systems support diagram editing in a drag-and-drop manner. However, even with drag-and-drop interfaces, users often have difficulty translating their intentions into system operations. In this work we design FlowSense to further simplify dataflow diagram construction, so that the user can intuitively use dataflow and make the most of the analytical capability of a DFVS. In particular, we build FlowSense for VisFlow, as its subset flow model supports many of the low-level visual data analysis tasks [45, 12], such as characterizing distributions and finding extrema.
1.2 NLI for Data Visualization
Extensive research has been devoted to NLIs for decades. These interfaces address NL queries that would otherwise have to be manually translated into formal query languages, e.g. SQL. A few examples are interfaces for querying XML, entity-relationship databases [13, 55], and sequence-to-SQL translators. NLIs for data visualization answer queries by presenting visual data representations. Compared with interfaces that simply return a numerical answer or a set of database entries, visualization NLIs present results that are more human-readable. Cox et al. design the Sisl service within the InfoStill data analysis framework; the service asks a series of NL questions to complete an unambiguous query. The Articulate system uses a Graph Reasoner to select proper visualizations to answer a query. DataTone addresses query ambiguity by showing ambiguity widgets along with the main visualization so that the user can switch to desired alternative views. Eviza and Evizeon further improve the user experience by allowing conversation-like follow-up questions. Fast et al. propose a conversational user interface called Iris that can perform analytical tasks and plot data upon request in dialogues. Kumar et al. also propose a dialogue system for visualization. Orko is an NLI designed for visual exploration of network data. Dhamdhere et al. design Analyza, which provides database-backed NL queries and visualizations. Srinivasan et al. provide a summary and comparison of the majority of these NLIs. Several commercial tools integrate NLIs. IBM Watson Analytics and Microsoft Power BI provide a list of relevant data and visualizations in response to an NL question, from which the user may choose to continue an analysis. Wolfram Alpha supports knowledge-based Q&A and is able to plot the results. ThoughtSpot enables interactive search in a relational database and provides multiple types of visualizations for the database.
The NLI design for data visualization faces two challenges. First, modern natural language processing (NLP) techniques cannot yet reliably understand arbitrary NL input due to the complex nature of natural language, and user queries tend to be free-form and ambiguous. Second, choosing a proper visualization to answer an analytical question is non-trivial, as there can be multiple possible visual representations.
1.3 Comparison with Other NLIs
FlowSense is distinct from other interfaces in that it is, to the best of our knowledge, the first NLI designed for a dataflow context.
We set the scope of FlowSense to assisting dataflow diagram construction, rather than directly answering free-form analytical questions or seeking the best visualization for a given query.
We believe such an approach is beneficial in several aspects:
Capability: The analytical capability of FlowSense is rooted in the design of the DFVS. The outcome of FlowSense is a complete, interactive, and iterative visual data exploration process supported by the DFVS, rather than a single visualization that answers only one particular query, as in many other interfaces. Dataflow also naturally preserves analysis provenance, allowing the user to frequently revisit and reassess the current workflow. The diagram created by FlowSense explicitly keeps the user’s preferences and intentions from previous queries, which must otherwise be maintained by a model behind the scenes [25, 44].
Usability: FlowSense integrates real-time presentation of tagged special utterances in the interface, which reflect the state of the underlying semantic parser and help the user understand the dataset and dataflow present in the system (subsection 2.2). This is a novel design that facilitates the user’s understanding of the NLI behavior; in most other NLIs, parsing feedback is only given after the query is submitted. The auto-completion suggestions of FlowSense also present special utterance tags so that the user may better understand the expected query components. Consequently, FlowSense may ease DFVS usage and make DFVSs more accessible. Our case study and user study (section 4) show that FlowSense improves DFVS usability, and that its convenience is valued by both novice and experienced VisFlow users. Moreover, the DFVS can recover from errors more easily, as the user always has full control over the system. In other interfaces, by contrast, the user mostly has to rely on the behavior of the NLI and can hardly make corrections in case of misinterpretation.
Feasibility: The scope of assisting dataflow diagram construction is well defined and practicable. Even state-of-the-art NLP techniques have limited success in understanding arbitrary queries. Because each query is expected to update the dataflow diagram, and the user decides what the system should do and what visual representation to apply, FlowSense can produce more expected results and provide a better user experience under a well-defined scope. The mixed-initiative design mitigates the ambiguity problem. The DFVS users in our case study and user study were all able to understand the scope of FlowSense and use it effectively.
[Table 1: VisFlow functions, with columns #, Function, Sample Queries, Description, and Sample Sub-Diagram; table content omitted.]
1.4 Semantic Parsing
FlowSense uses semantic parsing to process NL input and map user queries to VisFlow functions (subsection 2.1). It depends on a pre-defined grammar that captures NL input patterns. A semantic parser recursively expands the variables in the grammar to match the input query, and can interpret the input based on the rules applied and the order of their application. At a high level, the mapping performed by FlowSense can also be considered a classification task and addressed by classification algorithms. However, we prefer semantic parsing because most classification approaches are supervised algorithms that require a large corpus of labeled examples, and such training data are not available for a DFVS. Besides, compared with deep learning methods [20, 26], semantic parsing does not require heavy computational resources.
The FlowSense semantic parser is implemented within the Stanford SEMPRE framework and the CoreNLP toolkit. The CoreNLP toolkit integrates a comprehensive set of NLP tools, including a Part-of-Speech (POS) tagger, a Named-Entity Recognizer (NER), etc. A POS tagger identifies the roles of words in a sentence, e.g. verb, preposition, adverb. The SEMPRE framework employs a modular design in which different types of parsers and logical forms can be easily plugged in, so the framework can quickly be adapted for domain-specific parser design. We apply SEMPRE together with CoreNLP to the DFVS domain. In particular, the FlowSense parser utilizes the POS tags produced by CoreNLP for processing special utterances and grammar matching; the FlowSense grammar expects words with certain POS tags to appear in certain query parts.
2 Semantic Parser
In this section, we define the building blocks of the semantic parser: the VisFlow functions that can be specified by NL, the definition of the parsing grammar, and the general query pattern the parsing algorithm expects. For concept illustration we use the Auto MPG dataset (http://archive.ics.uci.edu/ml/datasets/Auto+MPG) throughout the paper, which has information about cars in 9 columns, including mpg, horsepower, origin, etc.
2.1 VisFlow Functions
To create an NLI for VisFlow, we first studied a sample diagram set that includes 60 dataflow diagrams created by 16 VisFlow users from their recorded VisFlow sessions. These diagrams cover a wide range of VisFlow usage scenarios and deal with various types of datasets. We identified a set of frequently appearing sub-diagrams and categorized them into the six major categories listed in Table 1. The construction of these sub-diagrams is defined as the VisFlow functions. By implementing the VisFlow functions, FlowSense essentially supports the building blocks of visual data exploration in VisFlow, so that analyses performed via VisFlow’s native interactions can also be carried out with FlowSense. These functions also reflect the fundamental analytical activities defined in information visualization task taxonomies [45, 12]. Table 1 explains the usage of each VisFlow function and shows several sample queries.
In addition to the six major categories, FlowSense also supports many utility functions such as adding/removing dataflow nodes/edges, undo/redo, loading datasets, etc. Though these functions also enhance the usability of the system, we omit them here as they are only indirectly related to visual data analysis.
2.2 Dataflow Context and Special Utterances
It is important to make the semantic parser aware of the dataflow context, such as the dataset loaded and the nodes in the dataflow diagram. FlowSense extracts a special group of tokens called the special utterances from NL input. Special utterances are words that refer to entities in the dataset or the dataflow diagram. They are the arguments and operands of VisFlow functions. FlowSense recognizes table column names, node labels, node types, and dataset names as special utterances. For the query shown in Figure 1, FlowSense identifies “mpg”, “horsepower”, and “origin” as table columns, “MyChart” as a node label, and “parallel coordinates” as a node type. The special utterances identified by FlowSense are shown in colored tags in the FlowSense input box (Figure 2). Each distinct color represents one special utterance type: green for table column, light green for node label, purple for node type, and light blue for dataset name. The colors are applied consistently throughout the user interface.
2.3 Grammar
FlowSense applies a semantic parser to map an NL query to one of the VisFlow functions based on an elaborate grammar designed for these functions. The grammar is context-free and formally defined as a 4-tuple (V, Σ, R, S). V is a finite set of variables. Σ is a finite set of terminals; a terminal represents an English word or phrase. R is the rule set that defines how a single variable matches an ordered list of terminals and variables (possibly including itself, in a recursive rule). Below is an example rule:

visualize_query → show_verb column_list "in" chart_type
In this rule, visualize_query is a high-level variable that matches a query that requests a visualization. show_verb matches a verb that has a meaning similar to “show”. column_list matches one or more columns from the data. chart_type stands for a phrase that describes a visualization metaphor such as scatterplot or parallel coordinates. The token “in” is a terminal symbol that comes from the NL input directly. The example rule above is simplified for the convenience of explanation; in practice, a rule often matches against generic variables rather than a specific word. S is the start variable that expands to other variables to match the whole query.
The grammar of the FlowSense semantic parser attempts to derive an input query by recursively searching for all possible matches (up to a preset limit) of the grammar rules. This procedure is called derivation . FlowSense uses the semantic parsing implementation from SEMPRE. It also uses the Stanford CoreNLP  toolkit that is built into SEMPRE for special utterance tagging. The variables and rules (i.e. SEMPRE formulas) are defined in SEMPRE grammar files.
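The recursive derivation described above can be sketched in a few lines. The toy grammar and variable names below are illustrative only, not FlowSense's actual SEMPRE rules; the sketch shows how a variable recursively expands into terminals and other variables to match a token sequence.

```python
# Toy recursive matcher for a context-free grammar. The mini-grammar and
# variable names are hypothetical, for illustration only.
GRAMMAR = {
    "$Visualize": [["$ShowVerb", "$Columns", "in", "$ChartType"]],
    "$ShowVerb": [["show"], ["visualize"], ["draw"]],
    "$Columns": [["$Column"], ["$Column", "and", "$Columns"]],
    "$Column": [["mpg"], ["horsepower"], ["origin"]],
    "$ChartType": [["a", "scatterplot"], ["parallel", "coordinates"]],
}

def derive(symbol, tokens, pos):
    """Try to expand `symbol` starting at tokens[pos].
    Yield every end position a successful derivation can reach."""
    if not symbol.startswith("$"):          # terminal: must match literally
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for rule in GRAMMAR[symbol]:            # variable: try each rule
        ends = {pos}
        for part in rule:
            ends = {e2 for e in ends for e2 in derive(part, tokens, e)}
        yield from ends

def accepts(query):
    tokens = query.lower().split()
    return len(tokens) in derive("$Visualize", tokens, 0)

print(accepts("show mpg and horsepower in a scatterplot"))   # True
print(accepts("what time is it now"))                        # False
```

A real parser additionally records which rules were applied and in which order, since that derivation tree (not just acceptance) carries the query's interpretation.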
2.3.1 Special Utterance Placeholders
The FlowSense grammar consists of static grammar rules and special utterance placeholders. The special utterance placeholders are dynamically replaced at runtime by their corresponding dataflow elements. Therefore, FlowSense semantic parsing is independent of the dataset, the dataflow diagram, and the analytical tasks, and the rules are generalizable across domains: no new rules need to be created when the system switches to new datasets or tasks.
For example, FlowSense uses a generic column variable in its grammar as a special utterance placeholder. At runtime, a real column name (e.g. “mpg”) is automatically extracted from the dataset. FlowSense identifies column names on the fly as the user types the query: “mpg” shows up as a tagged column and is then matched with the column variable by the parser. After query parsing, a reverse mapping is performed from the placeholder to the particular column so that the system may operate on that column.
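A minimal sketch of this substitution step, assuming a hypothetical placeholder naming scheme (the `$column_i` tokens and function names below are illustrative, not FlowSense's internals):

```python
# Column mentions are tagged at runtime, swapped for generic placeholders
# before parsing, and mapped back afterwards. Names are hypothetical.
COLUMNS = ["mpg", "horsepower", "origin"]   # extracted from the loaded dataset

def tag_columns(query):
    """Replace each column mention with a numbered placeholder token."""
    mapping = {}
    tokens = []
    for tok in query.lower().split():
        if tok in COLUMNS:
            placeholder = f"$column_{len(mapping)}"
            mapping[placeholder] = tok
            tokens.append(placeholder)
        else:
            tokens.append(tok)
    return " ".join(tokens), mapping

generic, mapping = tag_columns("Show mpg and horsepower in a scatterplot")
print(generic)   # show $column_0 and $column_1 in a scatterplot
print(mapping)   # {'$column_0': 'mpg', '$column_1': 'horsepower'}
```

Because the parser only ever sees placeholder tokens, the same rules work unchanged when a new dataset with different column names is loaded.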
Using special utterances in the grammar has several benefits. First, special utterances enable VisFlow functions to operate on elements that are important for dataflow diagram editing and visual data exploration. Second, they keep the rule set small, as rules may be written with generic variables rather than specific dataset or diagram content. Last but not least, the real-time tagging of special utterances provides important feedback to the user about what operations are available in the system and how the NLI interprets the query.
2.3.2 Derivation Ambiguity
Ambiguity may arise when multiple possible query derivations exist, which is known as syntactic ambiguity. For example, FlowSense uses wildcard variables to match general table row references. Over the Auto MPG dataset, the token “cars” in “Show a plot of cars” describes the user’s understanding of the data entities, but should be treated only as table rows from the NLI perspective. Meanwhile, the token “horsepower” in “Show a plot of horsepower” is a special utterance and should be treated as a column to visualize. A wildcard rule that matches “cars” as table rows may therefore also match “horsepower”, resulting in the second query being improperly executed. We could handle this case by creating a wildcard variable that rejects special utterance tokens. Nevertheless, such a design would lead to a larger number of variables and rules in the grammar, which are harder to maintain and develop. Therefore we choose to resolve certain syntactic ambiguity in the parsing phase with supervised learning on a weight vector [50], as introduced by Liang et al. in the SEMPRE framework. The objective is given by:
min_w Σ_{(x, d*)} max_d [ w · feature(x, d) + penalty(d, d*) − w · feature(x, d*) ]

In the above, x is the input query, d* is the preferred derivation, and d is a derivation choice. The pair (x, d*) is iterated over all training data. The feature of a derivation, feature(x, d), maps the pair to a b-dimensional space and is determined by the rules applied in the derivation. penalty(d, d*) is 1 if d ≠ d* and 0 otherwise. The objective function thus penalizes incorrect predictions whose scores are within a margin of one from the correct predictions. The parser fits the training examples by giving intended derivations higher probability so that they are preferred in case of ambiguity. In particular, the rule that expands to a column special utterance will be preferred over a rule that expands to a wildcard. Note that we apply this training only to facilitate the simplicity of the FlowSense grammar and reduce the number of required rules; the training cannot address ambiguity in natural language itself at large. We were able to use a small training set of fewer than twenty examples to guide the preferred derivation in case of syntactic ambiguity for a rule set of around 500 rules. This is feasible because the FlowSense rules are independent of the data and dataflow diagrams: the training set only needs to guide the semantic parser to focus on certain important grammatical features, such as special utterances or word proximity.
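The margin-based preference can be illustrated with a toy example. The features, rule names, and subgradient updates below are a simplified sketch under made-up data, not SEMPRE's actual optimizer:

```python
# Toy margin-based training: prefer the column-utterance derivation over
# the wildcard derivation for the same query. All names are illustrative.
def dot(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def hinge_loss(w, x_derivations, preferred):
    """max_d [ w.feature(x,d) + penalty(d,d*) ] - w.feature(x,d*)"""
    best = max(dot(w, f) + (0 if d == preferred else 1)
               for d, f in x_derivations.items())
    return best - dot(w, x_derivations[preferred])

# Two competing derivations of "show a plot of horsepower": one expands
# "horsepower" via the column special utterance, one via the wildcard.
derivations = {
    "column":   {"rule:column_utterance": 1},
    "wildcard": {"rule:wildcard": 1},
}

w = {"rule:column_utterance": 0.0, "rule:wildcard": 0.0}
for _ in range(5):                      # a few subgradient steps
    if hinge_loss(w, derivations, "column") <= 0:
        break
    for k, v in derivations["column"].items():     # reward preferred rules
        w[k] += 0.1 * v
    for k, v in derivations["wildcard"].items():   # penalize the violator
        w[k] -= 0.1 * v

print(w["rule:column_utterance"] > w["rule:wildcard"])   # True
```

After training, the column-utterance rule outscores the wildcard rule, so the ambiguous query derives through the special utterance as intended.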
2.4 Query Pattern
The main goal of FlowSense is to support progressive construction of dataflow diagrams. We studied the creation process of the VisFlow diagrams in our sample diagram set and empirically identified a common pattern with five key query components that all VisFlow functions may contain: function type, function options, source node(s), target node(s), and port specification. This pattern is illustrated in Figure 1 with a sample query “Visualize mpg, horsepower, and origin of the selected cars from MyChart in a parallel coordinates plot”. In this query, the verb “visualize” implies applying a visualization function. The three columns “mpg, horsepower, and origin” indicate the options (i.e. what to visualize) for the visualization function. The phrase “from MyChart” tells the system the location of the data to be plotted and provides source node information. The phrase “in a parallel coordinates plot” indicates a new visualization node of the given visualization type is to be created as the target node. As VisFlow explicitly exports interactive data selection from visualization nodes, the phrase “selected cars” is a port specification that further describes that the user wants to visualize the selection from MyChart and the new visualization node should be connected to the selection output port of MyChart.
The grammar of FlowSense includes a variable hierarchy that matches the five key components of an NL query. Figure 1 illustrates the parse tree that derives the sample query. The variables involved in the derivation are shown in the parse tree, in which rule expansions are bottom-up. A variable may carry information for multiple query components. We design a broad set of variables and rules that accept not only queries with a particular component order, but also their different arrangements. For instance, “Show mpg and horsepower in a scatterplot” is equivalent to “Show a scatterplot of mpg and horsepower”; both can be accepted by FlowSense. FlowSense is also able to derive multiple functions from one single query and execute their combination, e.g. “Show the cars with mpg greater than 15 in a scatterplot” infers both visualization and filtering functions.
A query does not necessarily contain all five components explicitly. For example, the user may simply say “Show mpg and horsepower” without mentioning any source node or target visualization type. FlowSense may automatically locate source and target nodes in its query pattern completion phase (subsection 3.3). An NL query may also contain implicit information, e.g. “Find cars with maximum mpg” intends to perform data filtering to search for the cars with the largest mpg value. The use of a filter is identified by function classification in the query execution phase (subsection 3.2).
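The five key components could be held in a simple container like the following; the class and field names are hypothetical, not FlowSense's actual data model:

```python
# Hypothetical container for the five key query components described above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ParsedQuery:
    function_type: str                                 # e.g. "visualization"
    options: dict = field(default_factory=dict)        # e.g. columns to plot
    source_nodes: List[str] = field(default_factory=list)
    target_nodes: List[str] = field(default_factory=list)
    port: Optional[str] = None                         # e.g. selection port

# The sample query "Visualize mpg, horsepower, and origin of the selected
# cars from MyChart in a parallel coordinates plot" might parse to:
q = ParsedQuery(
    function_type="visualization",
    options={"columns": ["mpg", "horsepower", "origin"],
             "chart": "parallel coordinates"},
    source_nodes=["MyChart"],
    port="selection",
)
print(q.target_nodes)   # [] -- to be filled in by pattern completion
```

Empty fields, like the target node above, are exactly what the pattern completion phase fills in with defaults or the diagram editing focus.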
The usability of an NLI is closely related to its discoverability. It is desirable that, when a query is partially completed, the system can provide hints or suggestions to the user about valid queries that include the partial input; this has been a requested feature in prior NLI user studies. We therefore develop an auto-completion algorithm in FlowSense to enhance its usability and discoverability. When the user types a partial query and pauses, the system triggers query auto-completion automatically. The auto-completion may also be invoked manually with a button press. Figure 2(a) shows the auto-completion suggestions in the FlowSense input box.
Auto-completion has been implemented in other visualization NLIs, such as Eviza. Eviza applies template-based auto-completion, in which the system attempts to align user input to available templates. Here we take a similar approach by creating a set of around one hundred query templates. Upon an auto-completion request, the algorithm searches through all possible textual matches between the user’s partial query and a prefix of some template. All matched queries are then sent to the FlowSense parser for evaluation. If a query is accepted, it becomes an auto-completion candidate. Some of the queries contain value placeholders that the user is expected to fill in ([string], [number] in Figure 2(a)).
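Prefix matching against templates can be sketched as follows; the template strings and matching granularity (whole words) are illustrative assumptions, not FlowSense's actual template set:

```python
# Sketch of template-based auto-completion: the partial query is matched
# against template prefixes; value placeholders like [number] remain for
# the user to fill in. Templates here are made up.
TEMPLATES = [
    "show a scatterplot of [column] and [column]",
    "show a histogram of [column]",
    "filter [column] between [number] and [number]",
]

def complete(partial, columns):
    """Return templates whose word-level prefix matches the partial query,
    with [column] placeholders matched against known column names."""
    words = partial.lower().split()
    results = []
    for tpl in TEMPLATES:
        tpl_words = tpl.split()
        if len(words) > len(tpl_words):
            continue
        ok = all((t == "[column]" and w in columns) or t == w
                 for w, t in zip(words, tpl_words))
        if ok:
            results.append(" ".join(words + tpl_words[len(words):]))
    return results

print(complete("filter mpg", ["mpg", "horsepower"]))
# ['filter mpg between [number] and [number]']
```

In the real system each candidate is additionally run through the semantic parser, so only suggestions that actually parse are shown.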
We also design a token completion algorithm that matches the partially typed word against available special utterances. This helps speed up query typing with respect to the dataflow context. The user may use the tab and arrow keys to select token completion candidates as in a programming IDE. For example, when “scatter” is typed it can be completed to the available visualization type “scatterplot” (Figure 2(b)). Token auto-completion reduces typing workload and helps remind the user of the DFVS capability and the current dataflow diagram elements.
3 Query Execution
FlowSense is built as an extension to VisFlow. The user may activate the NLI at any time while working with the DFVS, and may either type the query in the input box or use the speech mode, which is implemented with the HTML5 Web Speech API. In this section we introduce the query execution workflow depicted in Figure 3.
3.1 Special Utterance and POS Tagging
Special utterances play an important role in executing a VisFlow function. Their tagging is performed on the fly as the user types the query. For typo tolerance, FlowSense employs approximate matching and checks each n-gram in the query (where n ranges from 1 up to the maximum special utterance word length) against all special utterances using case-insensitive Levenshtein distance [32, 38]. We divide the distance by the string length and use the resulting ratio to mitigate the fact that longer strings are more prone to typos. We find that a small n and ratio threshold work well in practice.
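The matching step can be sketched as below. The utterance list and the ratio threshold are illustrative assumptions, not FlowSense's tuned values:

```python
# Typo-tolerant special utterance matching: every n-gram of the query is
# compared to each special utterance by Levenshtein distance normalized
# by the utterance length. Threshold and utterances are illustrative.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def tag_utterances(query, utterances, max_n=3, ratio=0.15):
    words = query.lower().split()
    tags = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            for u in utterances:
                if levenshtein(gram, u.lower()) / len(u) <= ratio:
                    tags.append((gram, u))
    return tags

tags = tag_utterances("show hp in paralel coordinates",
                      ["horsepower", "parallel coordinates"])
print(tags)   # [('paralel coordinates', 'parallel coordinates')]
```

Dividing by the utterance length means a one-character typo in "parallel coordinates" still matches, while short strings like "hp" are not spuriously matched to "horsepower".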
In addition to recognizing special utterances, FlowSense also performs POS tagging on the query with CoreNLP. Each token receives a POS tag as shown in Figure 1. POS tags are used to generalize the FlowSense grammar. For example, many prepositions can be used interchangeably, e.g. “selection of the plot” is equivalent to “selection from the plot”. Instead of having one rule for every preposition, the grammar uses a generic variable that matches any preposition. POS tagging helps analyze the basic semantic structure of a query.
3.2 Function Classification
FlowSense uses keyword classification to identify the semantic meaning of words in the NL query and uses this information to decide on a proper VisFlow function to execute. For instance, the verb “show” is a synonym of “visualize”, “draw”, etc.; these words indicate the intention to create a visualization. Meanwhile, “find” may implicitly specify a data filtering requirement and is similar to “filter”. We compute Wu-Palmer similarity scores between words and use the measured scores to classify words in the NL query whose meanings are close to a set of pre-determined VisFlow function indicators. The implementation of the similarity scores is based on WordNet and NLTK.
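The classification step can be sketched as follows. FlowSense computes Wu-Palmer scores over WordNet via NLTK (e.g. `synset1.wup_similarity(synset2)`); to keep this sketch self-contained, a toy similarity table stands in for WordNet, and the indicator lists and threshold are made up:

```python
# Keyword classification against pre-determined function indicators.
# TOY_SIMILARITY stands in for WordNet-based Wu-Palmer scores.
INDICATORS = {"visualization": ["show", "visualize", "draw"],
              "filter": ["filter", "find"]}

TOY_SIMILARITY = {("display", "show"): 0.9, ("search", "find"): 0.85}

def similarity(a, b):
    if a == b:
        return 1.0
    return max(TOY_SIMILARITY.get((a, b), 0.0),
               TOY_SIMILARITY.get((b, a), 0.0))

def classify(verb, threshold=0.8):
    """Return the VisFlow function whose indicator words are most
    similar to the verb, or None if nothing passes the threshold."""
    best_fn, best_score = None, 0.0
    for fn, words in INDICATORS.items():
        score = max(similarity(verb, w) for w in words)
        if score > best_score:
            best_fn, best_score = fn, score
    return best_fn if best_score >= threshold else None

print(classify("display"))   # visualization
print(classify("search"))    # filter
print(classify("jump"))      # None
```

The threshold prevents unrelated verbs from being forced into a function, leaving them to be rejected by the grammar instead.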
3.3 Query Pattern Completion
After the parser identifies the existing key components of a query, FlowSense attempts to fill in the blanks where information is missing using default values or the diagram editing focus.
3.3.1 Finding Default Values
Query components may be completed using default values. Function options may have defaults; for instance, FlowSense automatically chooses two numerical columns to visualize in a scatterplot triggered by the simple query “Show a scatterplot”. Note that within a DFVS, decisions like this can easily be changed by the user, so FlowSense does not necessarily need to make the best guess. Similar decisions include completing the port specification: by default, FlowSense applies a newly created filter to all the data a visualization node receives, rather than the data subset interactively selected in the visualization. Sometimes the default values may even be empty. A query like “Filter by mpg” results in FlowSense creating a range filter on the mpg column with no filtering range given (i.e. a no-op filter placeholder); the user can then follow up and fill in the filtering range via the DFVS interface.
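The scatterplot default mentioned above can be sketched in a few lines; the column metadata and selection policy (first two numerical columns) are illustrative assumptions:

```python
# Illustrative default-value completion: when "Show a scatterplot" names
# no columns, pick two numerical columns. Metadata is made up for the
# Auto MPG example.
COLUMN_TYPES = {"name": "string", "mpg": "number",
                "horsepower": "number", "origin": "string"}

def default_scatterplot_columns(column_types):
    """Choose two numerical columns as scatterplot axes, or None."""
    numeric = [c for c, t in column_types.items() if t == "number"]
    return numeric[:2] if len(numeric) >= 2 else None

print(default_scatterplot_columns(COLUMN_TYPES))   # ['mpg', 'horsepower']
```

Since the DFVS lets the user swap axes afterwards, a cheap deterministic choice like this is acceptable; the default is a starting point, not a commitment.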
3.3.2 Finding Diagram Editing Focus
Whenever the user expands the dataflow diagram there always exists an editing focus, though often the focus is implicit. For example, when the query contains a phrase like “from MyChart”, the focus (i.e. the source node of the query) is explicitly given. However, users tend to omit the source or target nodes in their queries, especially when a sequence of commands together completes a task. When a query does not have an explicit focus, FlowSense derives the user’s implicit focus based on user interaction heuristics. We compute a focus score for every node u by:

f(u) = a(u) + 1 / (1 + e^{(d(u) − μ) / s})

where a(u) is the activeness of u and d(u) is the distance from u to the mouse cursor. The activeness a(u) is re-iterated upon every user click in the system:

a(u) ← λ a(u) + δ_i

where δ_i = 1 if the i-th click is on u and δ_i = 0 otherwise, and λ ∈ (0, 1) is a decay factor. This definition measures how actively a user focuses on a node by how many times she has recently clicked on it, as well as by how close it is to the mouse cursor. The activeness derived from user clicks decreases exponentially over time, while the closeness to the mouse dominates under a small distance via the shifted sigmoid (see the appendix for more explanation of the characteristics of the diagram editing focus heuristics). We find parameter values that achieve good results in practice. FlowSense chooses the node with the highest focus score as the diagram editing focus. If multiple source nodes are required (e.g. in a merge query), FlowSense selects the nodes in order of decreasing focus score.
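The heuristics described above can be sketched as follows. The decay rate, sigmoid shift, and scale are illustrative placeholders, not FlowSense's tuned parameter values:

```python
# Sketch of the diagram editing focus heuristics: click-derived activeness
# decays exponentially, and closeness to the mouse cursor follows a
# shifted sigmoid. Parameters below are hypothetical.
import math

DECAY, SHIFT, SCALE = 0.8, 100.0, 25.0

class Node:
    def __init__(self, name):
        self.name = name
        self.activeness = 0.0

def register_click(nodes, clicked):
    """On every user click, decay all activeness and bump the clicked node."""
    for n in nodes:
        n.activeness = DECAY * n.activeness + (1.0 if n is clicked else 0.0)

def focus_score(node, cursor_distance):
    closeness = 1.0 / (1.0 + math.exp((cursor_distance - SHIFT) / SCALE))
    return node.activeness + closeness

a, b = Node("MyChart"), Node("Filter")
for _ in range(3):
    register_click([a, b], a)   # three recent clicks on MyChart
# MyChart wins despite being far from the cursor, because recent clicks
# outweigh mere proximity here.
print(focus_score(a, 300) > focus_score(b, 40))   # True
```

With these placeholder parameters, three recent clicks contribute about 2.44 to MyChart's score, which dominates the at-most-1.0 closeness term of the nearby but unclicked node.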
The focus may also be needed to resolve node type references. For instance, the user may input “show the data from the scatterplot”, in which “scatterplot” is a reference by node type to a scatterplot node existing in the dataflow diagram. In case of a tie during the node type search, e.g. when there are multiple scatterplots in the diagram, the nodes with higher focus scores are chosen.
3.3.3 Query Completion Ambiguity
There may be multiple syntactically correct ways to execute the same query. Consider the query “Show the cars with mpg greater than 15” applied on a visualization node. From the grammar perspective the parsed outcome has no ambiguity: apply an attribute filter and visualize the result. However, there are two ways to execute it: one is to create a filter and then visualize the filtered cars in a new visualization; alternatively, we may apply the filter to the input of the current visualization so that the existing visualization shows only the filtered cars. Both can be desired under some circumstances. By default, FlowSense prefers filtering the input when the source node is a visualization, which we find empirically more intuitive. Such ambiguity can often be resolved with a slightly refined query, e.g. “Show the cars with mpg greater than 15 from the plot”, which explicitly indicates that the filter should be applied to the output of the existing visualization.
3.4 Diagram Update
Once a query is successfully completed, FlowSense performs the VisFlow function(s) with the given function options. This typically results in the creation of one or more nodes, e.g. the visualization function creates one plot while the highlighting function creates three nodes (Table 1). FlowSense may also update existing nodes without creating new ones, e.g. when the user only changes rendering colors. Additionally, a query may operate on multiple existing nodes at once, e.g. linking or merging two tables creates edges between two nodes. Operating on multiple nodes together helps simplify dataflow interaction, as these operations previously required multiple drag-and-drop interactions.
After new nodes and edges are created, the diagram may become more cluttered. FlowSense therefore locally adjusts the diagram layout after each diagram update. We use a force-directed layout modified from the D3 library that manipulates the vicinity of the current diagram editing focus. We extend the force computation to take rectangular node sizes into account, so that larger nodes such as embedded visualizations exert a stronger repulsive force to avoid node overlap. User-adjusted node positions are remembered by the system, and the layout algorithm avoids moving nodes that have been positioned by the user. Currently FlowSense does not search for an optimal dataflow layout; we leave layout improvement for future work.
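The size-aware repulsion can be sketched as below; the force form and constants are illustrative, not the modified D3 force actually used:

```python
# Toy size-aware repulsion: larger rectangular nodes repel more strongly
# to avoid overlap. Force constants and the size proxy are illustrative.
import math

def repulsion(node_a, node_b, k=500.0):
    """Return the (fx, fy) force pushing node_a away from node_b, scaled
    by both nodes' half-diagonals as a rectangle-size proxy.
    Each node is (x, y, width, height)."""
    ax, ay, aw, ah = node_a
    bx, by, bw, bh = node_b
    size = math.hypot(aw, ah) / 2 + math.hypot(bw, bh) / 2
    dx, dy = ax - bx, ay - by
    dist = max(math.hypot(dx, dy), 1e-6)   # avoid division by zero
    strength = k * size / (dist * dist)    # inverse-square falloff
    return strength * dx / dist, strength * dy / dist

# A big embedded visualization pushes harder than a small node
# at the same distance.
big = repulsion((0, 0, 300, 200), (100, 0, 40, 30))
small = repulsion((0, 0, 40, 30), (100, 0, 40, 30))
print(abs(big[0]) > abs(small[0]))   # True
```

Folding node size into the repulsion strength is one simple way to keep large embedded visualizations from overlapping their neighbors without a separate collision pass.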
3.5 Error Recovery
There are several types of potential errors in executing a query:
(1) The query cannot be accepted by the grammar. For example, out-of-context input (“What time is it now”) and unsupported functionality (“Split the data into two halves”) would receive grammar rejection;
(2) The query is grammatically correct but invalid given the dataflow context, possibly due to incorrect references to dataset and diagram elements. For example, the user may attempt to show data from a non-existent node, e.g. asking to “Highlight the selected cars from the scatterplot” when there is no scatterplot in the dataflow. Such errors are captured at the query pattern completion step.
(3) The query is executed fully but does not meet the user’s expectation. For example, “Show the data” by default creates a scatterplot but the user instead wants a heatmap, or “Merge these two nodes” merges an unexpected pair of nodes when “these” appears to be a vague reference (the system chooses two nodes with the highest focus scores).
Upon the first two types of errors, the system displays a message and asks for a query correction. For the last type of error, it is up to the user to adjust the dataflow diagram. Since the user works with the underlying VisFlow DFVS while using FlowSense, she always has the flexibility to undo a FlowSense action or make partial adjustments when the NLI does not yield exactly the desired outcome.
4 Evaluation
To evaluate the effectiveness of FlowSense, we present one case study and one formal user study.
4.1 Speed Reduction Study
We invite several users to try out the FlowSense prototype in different data analysis domains and analyze their usage of our NLI. In this paper we present one case study in which we work with two domain experts in person to address a practical research task using a comprehensive set of NL queries. The analysts are researching the city regulation issued on November 7, 2014, which reduced the default speed limit on all New York City streets from 30 MPH to 25 MPH. The data contain the estimated average hourly speed for each road segment in Manhattan from January 2009 to June 2016. The speed estimation was performed based on the TLC yellow taxi records, which only have pickup and dropoff information. The analysts are familiar with the data, and the visualizations to be created are similar to those they previously generated for the project using Tableau. However, they have no prior experience with either VisFlow or FlowSense. We met the analysts in person and first introduced VisFlow and FlowSense in a 30-minute session. We then guided the analysts through how FlowSense can be used to create visualizations for studying the speed reduction. We observed in this study that almost all of the analysts’ visualization requests (excluding those that exceed the scope of the VisFlow subset flow) can be effectively supported by FlowSense. Below we summarize the NL queries applied in the speed reduction study.
Initially, the analysts would like to look at the speed reduction impact at a larger scale. They first load a pre-computed speed table (Figure 4(1)) with the FlowSense data loading utility function (the analysts know the dataset name). The table contains the monthly average speed aggregated by the speed limits of the streets. The analysts ask the system to present a histogram of speed by “Show speed distribution” (Figure 4(2)). The first histogram has no color encoding, but the analysts are able to immediately add a color scale by “Encode speed limit by color”. FlowSense inserts a color mapping node with a red-green color scale at the input of the histogram (Figure 4(3)). The histogram shows the street groups with higher speed limits in green and lower speed limits in red. To view the speed changes over time, the analysts use the query “Draw speed over time grouped by speed limit” (Figure 4(4)). The query result is a line chart showing average speed changes for the different speed limit groups. The analysts observe that overall there is a speed reduction in all speed limit groups starting around mid-2013.
Seeing the overall trend, the analysts move on to a comparative analysis between individual streets from two slow zones. They load and visualize a table of speed limit sign installations in a map (Figure 1(1)) by “Show the data in a map”. This dataset has, for each road segment in Manhattan, its speed limit, geographical location, and whether the street has speed limit signs installed (signs are shown as dots in the map). As the slow zones mostly have speed limit signs installed, the analysts narrow down the data in the map by placing a filter on the “sign” column (Figure 1(2)). The filtered map reveals two slow zone neighborhoods with densely located signs: Alphabet City and West Village. The analysts apply one map visualization for each zone to compare the two zones. They label the two maps by the slow zone names and select a few streets from each zone (marked in the maps of Figure 1). To study the speed changes of these selected streets, another table (named “segment monthly speed”, also known to the analysts) that includes the monthly average speed for each road segment is added to the diagram (Figure 1(3)). The analysts then use link queries to create a sequence of nodes that extract segment IDs from the selected streets and find their monthly average speed in the segment monthly speed table (Figure 1(4)). Blue and red colors are assigned to the streets in West Village and Alphabet City respectively to visually differentiate them (Figure 1(5)).
The two groups of streets are then merged by a subset manipulation function (Figure 1(6)). Note that the query “Merge” consists of a single word. It works because FlowSense’s query completion automatically locates the recently focused color editors as the source nodes for this query. Finally, the two groups are rendered together in a speed series visualization (Figure 1(7)), which compares the speed changes between the two groups of streets. As the visualizations produced by FlowSense are linked, the analysts can easily change the street selection in the maps to compare different groups of streets.
This case study demonstrates that FlowSense can be applied to a practical, comprehensive analytical task. The generated visualizations may guide the analysts toward further data analysis. The analysts participating in this study found FlowSense helpful, especially as it exemplifies how to build VisFlow diagrams and facilitates their learning of the DFVS.
4.2 User Study
We conduct a formal user study to evaluate the effectiveness of FlowSense together with the VisFlow framework. Through the user study we validate whether a user is able to smoothly apply FlowSense for dataflow diagram construction, and how well FlowSense’s responses meet the user’s expectations. We design an experiment that introduces FlowSense and VisFlow to the participant and assigns analytical tasks to be solved within the system.
4.2.1 Experiment Overview
The user study is carried out in a fully automated manner using an online system with step-by-step instructions. The participants join the study using a web browser on their own machines. Participants may ask the experiment assistant for help and clarification via web chat or phone call during the experiment session.
We recruited participants ( male, female, all with an age between and ) who work or study in the field of computer science. participants have a data visualization background. are graduate students, and the other are professionals (software engineer, researcher, faculty). participants have prior experience with VisFlow. No participants have prior knowledge about FlowSense. The participants are chosen to have a variety of specialities so as to represent potential DFVS users. The participant group includes visualization designers, data scientists, and software engineers who share data analysis interest but have different skill sets. The study is structured into two phases:
Tutorial Phase. The participant completes a tutorial of the VisFlow dataflow framework, and then a tutorial of the FlowSense NLI. After each tutorial, the participant is asked to complete the tutorial diagram to demonstrate familiarity with the introduced tool. Each tutorial is expected to take 10 to 20 minutes. After the tutorials there is an on-demand practice session with a flexible duration.
Task Phase. The participant explores an SDE Test dataset and constructs dataflow diagrams using FlowSense and VisFlow to answer questions about the data. The participant is encouraged to use FlowSense as much as possible. The usage of the NLI is not enforced because the goal of the NLI design is to improve the user experience of the DFVS, rather than to completely replace the traditional DFVS interactions (which is likely infeasible). The entire task phase is expected to take 30 to 60 minutes.
At the end of the study, the participant takes a survey to give comments and quantitative feedback about FlowSense and VisFlow.
4.2.2 Dataset and Tasks
The SDE Test dataset includes the test results of software development engineer (SDE) candidates, stored in two tables. The first table describes the test results for each candidate. A test consists of answering several multiple-choice questions selected by the system from a large question pool. Each question has a unique ID, a pre-determined difficulty, its supported programming language(s), and possibly a time limit. For each question, the candidate receives a result (correct, wrong, skipped, unanswered). (See the appendix for additional remarks and results of the user study.) The dataset also has a “TimeTaken” column that stores how much time a candidate took to answer a question. The second table includes background information about each candidate, such as the candidate’s highest degree level, field of study, and institution. We give three analytical tasks about this dataset. The tasks are designed to reflect common tasks performed in visual data exploration:
(T1) Overview Task. The participant is asked to visualize the overview distribution of the question answering results, and figure out the total number of questions that were skipped, and the percentage of a question being answered correctly.
(T2) Outlier Task. The participant is first asked to find a candidate with an outlier background information value (who incorrectly entered the current year “2018” in place of his own information). Then the participant is asked to investigate a data recording discrepancy regarding the “TimeTaken” column: Some of the “TimeTaken” values are erroneously large numbers when a question is unanswered.
(T3) Comprehensive Task. The participant is asked to identify one question that Masters candidates answer significantly better than Bachelors candidates. This task requires comprehensive usage of the dataflow features: attribute filtering, brushing, and heterogeneous table linking.
All three tasks have definitive correct answers to ensure that participants explore the data and draw conclusions reasonably. Each user study session is logged with an anonymized full diagram editing history. We analyze the study results based on task answers and completion time, comments and quantitative feedback, and NL query logs.
4.2.3 Task Completion Quality
Figure 5(a) shows the verdict distribution of the participants’ answers. It can be seen that the majority of the participants were able to come up with the correct answers to the tasks. Figure 5(b) shows the completion time distribution for each step of the user study.
We observe that the times taken for the tutorials and tasks are mostly as expected. Yet the time required for a task increases when the task involves heterogeneous tables and interactive data filtering to find solutions (T3). After reading the user comments in the feedback, we believe this may be because many participants are first-time VisFlow users and need to digest the concept of the VisFlow subset flow model. In particular, linking heterogeneous tables can be challenging to understand at first. However, most users were able to grasp the idea and formulate a solution. This is reflected in one of the feedback comments: “The linker functions are confusing at first. But after experimenting with the tool for a while and getting to know how they work, things become easier.” We believe such a learning curve is natural for a DFVS.
4.2.4 Quantitative Feedback
We ask for feedback on six aspects regarding FlowSense (and also VisFlow; see Sec. 4.2.2) in our survey. Each aspect is presented with a statement and a – Likert scale for the participant to express agreement () or disagreement (). Table 2 lists the feedback for the FlowSense NLI. The quantitative feedback shows that most users were able to understand the scope of FlowSense and apply it for dataflow diagram construction. The users were also asked to compare the NLI-assisted dataflow usage against their earlier experience in the tutorial phase with the standalone VisFlow framework. Twelve users agree (with a feedback score of at least 4) that FlowSense simplifies diagram construction, and ten users agree that FlowSense speeds up data exploration.
The feedback also reveals room for improving the NLI. In particular, it is unclear to most users how to update a rejected query to make it accepted. It may be helpful to design an algorithm that suggests corrections or changes to a failed query. However, this is technically challenging, as minimally changing a query to make it parse is algorithmically non-trivial. We leave query correction suggestions for future work.
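One naive baseline for such suggestions, sketched here under our own assumptions rather than taken from FlowSense, would rank known-accepted queries (e.g., the grammar's regression test queries) by edit distance to the rejected input. This handles typos and small wording slips, though not structural mismatches with the grammar:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # Minimum of deletion, insertion, and substitution costs.
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(rejected, accepted_queries, k=3):
    """Rank known-good queries by edit distance to the rejected input."""
    return sorted(accepted_queries,
                  key=lambda q: levenshtein(rejected.lower(), q.lower()))[:k]
```

For example, a rejected input with a typo such as “sho speed distribution” would rank “show speed distribution” first among the candidates.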
4.2.5 Query Log Analysis
To closely study where FlowSense does not accept a query, we manually went over the rejected queries and categorized each by its reason for rejection. Overall, we analyzed queries, out of which were accepted by FlowSense. Excluding the invalid and mistyped queries, the raw acceptance rate was . We found some of the rejection issues straightforward to resolve: the requested functionality was not implemented, there were bugs in the query execution code, etc. We were able to fix those issues in a short iteration of the NLI implementation, resolving “not implemented” queries and software bugs. The improved acceptance rate would be 76.911%. In general, systematically increasing query coverage for the “not implemented” category requires engineering effort beyond the scope of this paper. The remaining unresolved failures are summarized in Figure 6 with their counts (see the appendix for the detailed definition and examples for each category).
Table 2 survey statements on FlowSense:
- I understand what queries FlowSense may accept and execute.
- The responses of FlowSense meet my expectations.
- FlowSense simplifies dataflow diagram construction.
- FlowSense speeds up my data exploration.
- FlowSense helps me learn VisFlow features that I was not aware of.
- When my query got rejected, I can figure out how to update it to let it be accepted.
Some of those failures are more challenging to resolve. Specifically, FlowSense does not make logical inferences and deals only with the raw values in the data. If the user rephrases the query by natural language variation or implication (26 occurrences in Figure 6), the query is difficult to parse. The query “Show only segments with signs” is more natural than the one in Figure 1(2). Yet FlowSense does not infer that a segment with a “sign” value of “yes” is a segment “with signs”. In T3 the dataset has “HighestLevelOfEducation” as a column name, but if the user mentions “degree”, FlowSense does not know that it is equivalent. An additional knowledge base would need to be added to the system so that the NLI can detect concept equivalence, which is generally difficult to achieve. In a “composite” query, the user intends to perform several VisFlow functions in one query (e.g. creating nodes, applying a filter, and assigning color together). It is difficult to write concise grammar rules that accept composite queries. In practice, by informing the users of these limitations, the issues can in most cases be circumvented by rephrasing the queries; e.g. composite queries can be split into smaller steps that are easier to parse and execute.
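A minimal sketch of such a knowledge-base lookup, assuming a hand-curated synonym table (the table entries below are hypothetical, keyed on the T3 dataset's column names; FlowSense itself has no such component):

```python
# Hypothetical synonym table mapping user vocabulary to dataset column names.
SYNONYMS = {
    "degree": "HighestLevelOfEducation",
    "education": "HighestLevelOfEducation",
}

def resolve_column(token, columns):
    """Match a user token to a dataset column, falling back to synonyms."""
    for col in columns:
        if token.lower() == col.lower():
            return col  # exact (case-insensitive) column-name match
    mapped = SYNONYMS.get(token.lower())
    return mapped if mapped in columns else None
```

A static table like this only covers anticipated vocabulary; detecting arbitrary concept equivalence (e.g. that a “sign” value of “yes” means “with signs”) remains the hard part.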
When a requested operation is not supported by the DFVS, a “not supported” failure arises; e.g., VisFlow without its data mutation extension (https://visflow.org/extension; the extension is not yet supported) cannot aggregate or mutate data. When the special utterance tagging over-aggressively tags a non-special word, its placeholder fails to resolve, leading to a “tagging error”. The user may use the token dropdown in the FlowSense input box to correct tagging mistakes, or to disambiguate tokens with multiple meanings (Figure 2(c)).
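The placeholder mechanism behind such tagging can be illustrated with a minimal sketch. This greedy, regex-based tagger is our simplification, not FlowSense's actual tagger (which additionally handles ambiguity through the token dropdown):

```python
import re

def tag_special_utterances(query, columns):
    """Replace column names in the query with placeholders so the grammar
    stays independent of the dataset; returns the tagged query and mapping."""
    mapping = {}
    tagged = query
    for i, name in enumerate(columns):
        # Match the column name as a whole word, case-insensitively.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if pattern.search(tagged):
            placeholder = f"$column{i}"
            mapping[placeholder] = name
            tagged = pattern.sub(placeholder, tagged)
    return tagged, mapping
```

A “tagging error” in this sketch corresponds to a column name that coincidentally matches an ordinary query word, yielding a placeholder the parser cannot sensibly resolve.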
5 Discussion and Limitations
5.1 Scalability and Generalizability
Technically, there are many ways to write a set of rules that implement the same dataflow function. As the system’s functionality is actively developed and enhanced, grammar rules are from time to time combined and rewritten to keep the grammar concise. We keep iterating on and refining the FlowSense grammar to expand its functionality. The grammar currently includes about 200 variables and a rule set of around 500 rules. Our grammar development practice employs continuous integration and maintains a test set (currently of 131 test queries) to ensure that all categories of VisFlow functions execute properly as the grammar is iterated on and extended. Approximately 10 to 20 rules need to be added to support a new dataflow function category.
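In spirit, each rule maps a tagged query pattern to a dataflow function. A toy version might look as follows; the rule patterns, function names, and test-set format are ours for illustration, not FlowSense's actual grammar, which is far larger and not regex-based:

```python
import re

# Toy rules over tagged queries; each maps a pattern to a dataflow function.
RULES = [
    (re.compile(r"^show (\$column\d+) distribution$"), "create_histogram"),
    (re.compile(r"^encode (\$column\d+) by color$"), "insert_color_mapping"),
    (re.compile(r"^draw (\$column\d+) over (\$column\d+)$"), "create_line_chart"),
]

def parse(tagged):
    """Return (function, arguments) for the first matching rule, else None."""
    for pattern, func in RULES:
        m = pattern.match(tagged)
        if m:
            return func, m.groups()
    return None

# A tiny regression suite, in the spirit of the paper's 131-query test set.
TEST_QUERIES = {
    "show $column0 distribution": "create_histogram",
    "encode $column1 by color": "insert_color_mapping",
}
```

Running the regression suite on every grammar change is what keeps rule rewrites from silently breaking existing function categories.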
Though the grammar rules of FlowSense are coupled with the underlying VisFlow functionality, its approach of utilizing special utterance placeholders is generalizable to other dataflow systems that employ similar modular component design. Once the data- and diagram-independent dataflow elements are identified, these elements can be represented by special utterances in the grammar and dataflow implementation can subsequently be extended to process them. For example, we may extend the grammar to support more data processing power obtainable from a computational dataflow system like KNIME .
5.2 User Behavior and Engagement
The effectiveness of a grammar-based semantic parser is coupled with its grammar design. A single design flaw in the grammar may result in unexpected rejections of seemingly acceptable queries. Despite careful grammar design, users are likely to come up with questions that exceed the scope of the grammar. However, we find that users are willing and able to refine rejected queries within a small number of trial-and-error attempts. Besides, users may become more proficient with the NLI after reading query examples, which help them understand the NLI’s capability. Yet showing too many examples may constrain the user’s thinking and forfeit the benefit of using an NLI. We would like to further study user behavior regarding NLI usage in a DFVS to better identify when and which query examples should be provided.
We also observe that users tend to issue composite queries and ask for batch operations using the NLI. With traditional mouse/keyboard interaction, the results of such queries must be achieved through a sequence of interactions. FlowSense increases data exploration efficiency by naturally enabling batch operations. In fact, we noticed that some users repeated successful short queries that achieved the most batched results. The convenience of using NL to carry out multiple operations may improve the user’s engagement, provide interaction “shortcuts”, and make dataflow features more accessible by simplifying the creation of relatively complicated sub-diagrams, e.g. “highlighting”.
5.3 Technique and Scope
We prefer semantic parsing to deep learning mainly because the latter has a bottleneck of requiring a large volume of training examples. Though there are benchmark datasets for general NLP, there has not yet been a training set catered for visualization-oriented NLI or DFVS. In the future with more users working with the NLI, we may collect more user queries that constitute a rich training set in order to support methods like neural networks for text classification.
Currently FlowSense only works with dataflow diagram editing. But it may be desirable for the NLI to answer analytical questions such as “Does the vehicle speed decrease over years in NYC?” by creating a visualization like Figure 4(4). To that end we need further research on the dataflow functions and their application to answering analytical questions. One possible direction is to study how DFVS diagrams can be constructed for knowledge-based Q&A .
6 Conclusion
In this work we design FlowSense, a novel NLI for visual data exploration within a DFVS. We build FlowSense for the VisFlow framework and show that it improves the DFVS usability and simplifies diagram construction. FlowSense applies semantic parsing to map NL input to VisFlow functions. Its emphasis on special utterances and usage of special utterance placeholders make the semantic parsing independent of datasets and diagrams, but at the same time aware of the dataflow context. The real-time feedback of tagged special utterances, as well as query and token auto-completion features, largely helps the user understand the underlying parsing state. Our case study and user study results demonstrate the effectiveness of the proposed NLI, and help identify future research directions for its improvement.
Acknowledgments
We would like to thank BlindData.com for providing the user study dataset. This work was supported in part by: the Moore-Sloan Data Science Environment at NYU; NASA; NSF awards CNS-1229185, CCF-1533564, CNS-1544753, CNS-1730396, CNS-1828576. B. Yu and C. T. Silva are partially supported by the DARPA MEMEX and D3M programs. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.
-  D3: Data-Driven Documents. https://d3js.org
-  IBM SPSS Modeler. https://www.ibm.com/products/spss-modeler/
-  IBM Watson Analytics. https://www.ibm.com/watson-analytics
-  KNIME data analysis platform. http://www.knime.org/
-  Microsoft Power BI. https://powerbi.microsoft.com/
-  NLTK. http://www.nltk.org/
-  Tableau Software. http://www.tableausoftware.com/
-  Thoughtspot. http://www.thoughtspot.com/
-  TLC trip records. http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
-  Wolfram Alpha. http://www.wolframalpha.com/
-  M. Allahyari, S. A. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, and K. Kochut. A brief survey of text mining: Classification, clustering and extraction techniques. In Proc. KDD Bigdas, 2017.
-  R. Amar, J. Eagan, and J. Stasko. Low-level components of analytic activity in information visualization. In IEEE Symposium on Information Visualization (InfoVis’05), pages 111–117, 2005.
-  I. Androutsopoulos, G. D. Ritchie, and P. Thanisch. Natural language interfaces to databases – an introduction. Natural Language Engineering, 1(1):29–81, 1995.
-  C. Batini, E. Nardelli, and R. Tamassia. A layout algorithm for data flow diagrams. IEEE Trans. Software Engineering, 12(4):538–546, 1986.
-  L. Bavoil, S. P. Callahan, C. E. Scheidegger, H. T. Vo, P. Crossno, C. T. Silva, and J. Freire. VisTrails: Enabling interactive multiple-view visualizations. In Proc. IEEE Visualization Conference, pages 135–142, 2005.
-  J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on Freebase from question-answer pairs. In Proc. Empirical Methods in Natural Language Processing (EMNLP’13), pages 1533–1544, 2013.
-  P. Clark, J. Thompson, and B. Porter. A knowledge-based approach to question-answering. In Proc. AAAI Fall Symposium on Question-Answering Systems, pages 43–51, 1999.
-  P. R. Cohen. The role of natural language in a multimodal interface. In Proc. 5th Annual ACM Symposium on User Interface Software and Technology (UIST’92), pages 143–149, 1992.
-  K. Cox, R. E. Grinter, S. L. Hibino, L. J. Jagadeesan, and D. Mantilla. A multi-modal natural language interface to an information visualisation environment. International Journal of Speech Technology, 4:297–314, 2001.
-  L. Deng. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans. Signal and Information Processing, 3, 2014.
-  K. Dhamdhere, K. McCurley, M. Sundararajan, Q. Yan, and R. Nahmias. Analyza: Exploring data with conversation. In Proc. 22nd International Conference on Intelligent User Interfaces, pages 493–504, 2017.
-  E. Fast, B. Chen, J. Mendelsohn, J. Bassen, and M. S. Bernstein. Iris: A conversational agent for complex tasks. In Proc. CHI Conference on Human Factors in Computing Systems (CHI’18), 2018.
-  C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
-  J. Freire, C. T. Silva, S. P. Callahan, E. Santos, C. E. Scheidegger, and H. T. Vo. Managing rapidly-evolving scientific workflows. In Proc. Provenance and Annotation of Data: International Provenance and Annotation Workshop, pages 10–18, 2006.
-  T. Gao, M. Dontcheva, E. Adar, Z. Liu, and K. G. Karahalios. DataTone: managing ambiguity in natural language interfaces for data visualization. In Proc. 28th Annual Symposium on User Interface Software and Technology (UIST’15), pages 489–500, 2015.
-  I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT press, 2016.
-  L. Grammel, M. Tory, and M. A. Storey. How information visualization novices construct visualizations. IEEE Trans. Visualization and Computer Graphics, 16(6):943–952, 2010.
-  P. E. Haeberli. ConMan: A visual programming language for interactive graphics. ACM SigGraph Computer Graphics, 22(4):103–111, 1988.
-  E. Hoque, V. Setlur, M. Tory, and I. Dykeman. Applying pragmatics principles for interaction with visual analytics. IEEE Trans. Visualization and Computer Graphics, 24(1):309–318, 2018.
-  W. Javed and N. Elmqvist. ExPlates: Spatializing interactive analysis to scaffold visual exploration. Computer Graphics Forum, 32(2):441–450, 2013.
-  A. Kumar, J. Aurisano, B. D. Eugenio, A. Johnson, A. Gonzalez, and J. Leigh. Towards a dialogue system that supports rich visualizations of data. In Proc. 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2016.
-  V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707, 1966.
-  Y. Li, H. Yang, and H. V. Jagadish. NaLIX: A generic natural language search environment for XML data. ACM Trans. Database Systems, 32(4), 2007.
-  P. Liang and C. Potts. Bringing machine learning and compositional semantics together. Annual Review of Linguistics, 1:355–376, 2014.
-  J. Mackinlay, P. Hanrahan, and C. Stolte. Show Me: Automatic presentation for visual analysis. IEEE Trans. Visualization and Computer Graphics, 13(6):1137–1144, 2007.
-  C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, P. Inc, S. J. Bethard, and D. Mcclosky. The Stanford CoreNLP natural language processing toolkit. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics (ACL’14): System Demonstrations, pages 55–60, 2014.
-  J. Meyer-Spradow, T. Ropinski, J. Mensmann, and K. Hinrichs. Voreen: A rapid-prototyping environment for ray-casting-based volume visualizations. IEEE Computer Graphics and Applications, 29(6):6–13, 2009.
-  G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.
-  S. G. Parker and C. R. Johnson. SCIRun: A scientific programming environment for computational steering. In Proc. ACM/IEEE Conference on Supercomputing, 1995.
-  P. Pasupat and P. Liang. Compositional semantic parsing on semi-structured tables. In Proc. Annual Meeting of the Association for Computational Linguistics (ACL’15), 2015.
-  J. Poco, H. Doraiswamy, H. T. Vo, J. L. D. Comba, J. Freire, and C. T. Silva. Exploring traffic dynamics in urban environments using vector-valued functions. Computer Graphics Forum, 34(3):161–170, 2015.
-  J. C. Roberts. Waltz - an exploratory visualization tool for volume data, using multiform abstract displays. In Proc. SPIE Visual Data Exploration and Analysis V, volume 3298, pages 112–122, 1998.
-  J. C. Roberts. On encouraging coupled views for visualization exploration. In Proc. SPIE Visual Data Exploration and Analysis VI, volume 3643, pages 14–24, 1999.
-  V. Setlur, S. E. Battersby, M. Tory, R. Gossweiler, and A. X. Chang. Eviza: A natural language interface for visual analysis. In Proc. 29th Annual Symposium on User Interface Software and Technology (UIST’16), pages 365–377, 2016.
-  B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proc. IEEE Symposium on Visual Languages, pages 336–343, 1996.
-  M. Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.
-  A. Srinivasan and J. Stasko. Orko: Facilitating multimodal interaction for visual exploration and analysis of networks. IEEE Trans. Visualization and Computer Graphics, 24(1):511–521, 2018.
-  A. Srinivasan and J. T. Stasko. Natural language interfaces for data analysis with visualization: Considering what has and could be asked. In Eurographics Conference on Visualization (EuroVis’17 short paper), 2017.
-  Y. Sun, J. Leigh, A. Johnson, and S. Lee. Articulate: A semi-automated model for translating natural language queries into meaningful visualizations. In Proc. 10th International Conference on Smart Graphics, pages 184–195, 2010.
-  B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems. MIT Press, 2003.
-  C. Upson, T. Faulhaber, Jr., D. Kamins, D. Laidlaw, D. Schlegel, J. Vroom, R. Gurwitz, and A. van Dam. The application visualization system: A computational environment for scientific visualization. IEEE Computer Graphics and Applications, 9(4):30–42, 1989.
-  Y. Wang, J. Berant, and P. Liang. Building a semantic parser overnight. In Proc. Annual Meeting of the Association for Computational Linguistics (ACL’15), 2015.
-  K. Wolstencroft, R. Haines, D. Fellows, A. R. Williams, D. Withers, S. Owen, S. Soiland-Reyes, I. Dunlop, A. Nenadic, P. Fisher, J. Bhagat, K. Belhajjame, F. Bacall, A. Hardisty, A. N. de la Hidalga, M. P. B. Vargas, S. Sufi, and C. A. Goble. The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):557–561, 2013.
-  Z. Wu and M. Palmer. Verbs semantics and lexical selection. In Proc. 32nd Annual Meeting on Association for Computational Linguistics (ACL’94), pages 133–138, 1994.
-  P. Yin, Z. Lu, H. Li, and B. Kao. Neural Enquirer: Learning to query tables with natural language. In Proc. International Joint Conference on Artificial Intelligence (IJCAI’16), 2016.
-  T. Young, D. Hazarika, S. Poria, and E. Cambria. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.
-  B. Yu and C. T. Silva. VisFlow – Web-based visualization framework for tabular data with a subset flow model. IEEE Trans. Visualization and Computer Graphics, 23(1):251–260, 2017.
-  V. Zhong, C. Xiong, and R. Socher. Seq2SQL: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103, 2017.