GenNI: Human-AI Collaboration for Data-Backed Text Generation

10/19/2021
by   Hendrik Strobelt, et al.
ibm

Table2Text systems generate textual output based on structured data utilizing machine learning. These systems are essential for fluent natural language interfaces in tools such as virtual assistants; however, left to generate freely these ML systems often produce misleading or unexpected outputs. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text. The tool utilizes a deep learning model designed with explicit control states. These controls allow users to globally constrain model generations, without sacrificing the representation power of the deep learning models. The visual interface makes it possible for users to interact with AI systems following a Refine-Forecast paradigm to ensure that the generation system acts in a manner human users find suitable. We report multiple use cases on two experiments that improve over uncontrolled generation approaches, while at the same time providing fine-grained control. A demo and source code are available at https://genni.vizhub.ai.



1 Model: Table2Text with Controls

Underlying GenNI is a model designed for controllable text generation to enable visual interaction. The model extends standard ML models for text generation with explicit control states (i.e., discrete latent variables) that allow an end-user to alter the model’s output through constraints. Additionally, given an input and output, the model provides a method for inferring the control states. These control states are the main interface used by GenNI.

Table2Text generation aims to produce a textual description $y = y_1, \ldots, y_T$ consisting of word tokens from an input of data $x$ represented as a table. While one could generate this description directly from the data, we distinguish controllable systems as ones that provide intermediate control states for directing the structure of the description. We use one control state for each word, $z = z_1, \ldots, z_T$, where each $z_t$ is a generic discrete value from a small label set, for example, represented as letters A to Z. Each control state corresponds to a high-level cluster of the corresponding word’s semantics learned by the model for the problem. An underlying assumption of this work is that an end-user can craft higher-level constraints with these states than by acting directly on words.

Figure 1: The different modes (M1-M3) of the model, which are the building blocks that enable visual interaction for GenNI. In modes M1 and M2, the outputs $y$ and $z$ are inferred from $x$ by free and controlled generation. Mode M3 describes the inference of control states $z$ from data $x$ and text $y$ (see subsection 1.1).

Formally, the model outputs control states and one word at a time. Starting from data $x$, the model generates the description auto-regressively (left-to-right) by first generating a state $z_t$ and then the corresponding word $y_t$.

This process produces the probability of the description and states given the data, $p(y, z \mid x)$. Each part is implemented using a deep learning model. We utilize a recurrent neural network to predict both the $z_t$ and $y_t$ outputs and an attention-based encoder to condition on the input $x$. Specifically, the probabilities are given as:

(1)
(2)

where

is the output of a recurrent neural network over the input, previous words, and control states,

are parameters, and is a function of the current control state.
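The alternating order of generation (state first, then word) can be illustrated with a toy scoring function. This is only a sketch: the lookup tables stand in for the recurrent network, they ignore the input table and most of the history, and the names `STATE_PROBS`, `WORD_PROBS`, and `joint_log_prob` are assumptions made for illustration rather than part of GenNI.

```python
import math

# Hypothetical toy distributions standing in for the neural model.
# p(z_t | z_{t-1}): probability of the next control state.
STATE_PROBS = {
    "<s>": {"A": 0.9, "B": 0.1},
    "A": {"A": 0.2, "B": 0.8},
    "B": {"A": 0.1, "B": 0.9},
}
# p(y_t | z_t): each control state can emit several words.
WORD_PROBS = {
    "A": {"the": 0.5, "phoenix": 0.5},
    "B": {"is": 0.4, "a": 0.3, "pub": 0.3},
}


def joint_log_prob(words, states):
    """Accumulate log p(z_t | z_{t-1}) + log p(y_t | z_t), left to right."""
    assert len(words) == len(states)
    total, prev = 0.0, "<s>"
    for word, state in zip(words, states):
        total += math.log(STATE_PROBS[prev][state])  # the state is scored first
        total += math.log(WORD_PROBS[state][word])   # then the word it emits
        prev = state
    return total
```

In the real model each factor would also condition on the table and on all previous words and states; the toy tables only keep the left-to-right, state-then-word structure.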

A key aspect of the system will be the reverse process, i.e., inferring control states $z$ from example tables $x$ and sentences $y$. Unfortunately, the posterior distribution $p(z \mid x, y)$ is intractable to compute exactly. We therefore employ variational inference to approximate this distribution using a parameterized inference network $q(z \mid x, y)$. We train this approximation jointly with the forward model in the standard variational autoencoder framework [30, 41, 50].

For the inference network, we use a linear-chain conditional random field (CRF) with a neural parameterization. This family of distributions is particularly suited for labeling segments of the text with control states, and has been used effectively in similar tasks [33, 14]. To train this part of the system, we use the Gumbel-CRF method proposed by Fu et al. [14], which allows us both to learn the variational approximation and also train the rest of the model with concrete control states.
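To make the role of the linear-chain CRF more concrete, the sketch below decodes the highest-scoring state sequence with the standard Viterbi algorithm. It is not the Gumbel-CRF training procedure, and the score tables are hypothetical stand-ins for what a neural parameterization would produce; `viterbi`, `emissions`, and `transitions` are illustrative names.

```python
def viterbi(emissions, transitions):
    """Return the best state path for a linear-chain model.

    emissions: list of dicts, one per token, mapping state -> score.
    transitions: dict mapping (previous_state, state) -> score.
    """
    states = list(emissions[0])
    best = {s: emissions[0][s] for s in states}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in states:
            # Best previous state to arrive at `cur` from.
            prev = max(states, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Called with per-token scores for a sentence, the function returns one control state per token, which is the shape of output the inference mode needs.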

Finally, during training we enforce a soft correspondence between control states and table properties. These constraints enforce weak supervision in the form of a heuristic alignment between data and text, i.e., whether some part of the sentence is describing a table field using identical text as in the table. Following Li et al. [33], we use a technique known as posterior regularization, which allows the model to follow or ignore these alignments. We find this can produce more human-legible control states for some of our tasks.

1.1 Model Interaction

Directly visualizing deep learning models is challenging [20]. Instead of visualizing the internals of the model, we interact with it through its outputs and control states in three distinct modes shown in Figure 1: (M1) free generation, where the model searches for the most likely textual output given the input data; (M2) controlled generation, where the model searches for an output that obeys a constraint graph; (M3) inferred control, a reversed version of generation, where the user provides a goal text and the control states are inferred. These three modes will act as the building blocks for the interactive system.

Free Generation (M1): The most basic operation is to allow the AI model to generate freely. This mode produces the highest-scoring output from the model, formally $\arg\max_{y, z} p(y, z \mid x)$. Note, though, that high scores from the model often do not correspond to generated text that a human user would have wanted.

Practically, computing this argmax is a search problem over a very large search space. It is common to utilize an algorithm known as beam search as an approximation. Beam search works by exploring the search tree using a fixed number of hypotheses per time step, e.g., considering five different hypothesis sentences that are $t$ words long before moving on to five different hypothesis sentences of $t+1$ words. We can extend the beam search for controlled generation to alternate between exploring the best succeeding control state and the best next word. This search is shown in Figure 1 (top).
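The alternating beam search can be sketched as follows. The probability tables are hypothetical toy values rather than outputs of the actual model, and `beam_search`, `STATE_P`, and `WORD_P` are names chosen for this illustration; a beam of five mirrors the example in the text.

```python
import math

# Hypothetical toy model: next-state and word-emission probabilities.
STATE_P = {
    "<s>": {"A": 0.9, "B": 0.1},
    "A": {"A": 0.2, "B": 0.8},
    "B": {"A": 0.1, "B": 0.9},
}
WORD_P = {
    "A": {"the": 0.6, "phoenix": 0.4},
    "B": {"is": 0.6, "pub": 0.4},
}


def top_k(hypotheses, k):
    """Keep the k hypotheses with the highest log-probability."""
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)[:k]


def beam_search(length, beam_size=5):
    beams = [(0.0, [], [])]  # (log-probability, states, words)
    for _ in range(length):
        # Step 1: extend every hypothesis with a candidate control state.
        with_state = []
        for score, states, words in beams:
            prev = states[-1] if states else "<s>"
            for z, p in STATE_P[prev].items():
                with_state.append((score + math.log(p), states + [z], words))
        with_state = top_k(with_state, beam_size)
        # Step 2: extend the survivors with a word emitted by the new state.
        with_word = []
        for score, states, words in with_state:
            for w, p in WORD_P[states[-1]].items():
                with_word.append((score + math.log(p), states, words + [w]))
        beams = top_k(with_word, beam_size)
    return beams
```

The function returns the surviving hypotheses sorted from best to worst, so the first entry approximates the argmax of free generation for the toy model.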

Controlled Generation (M2): An alternative to free generation of the output text is to control the generation to fit specific cases of an end-user application. For this mode, the user provides an explicit constraint graph to the model. These constraints are applied to the control states $z$, which act to restrict the word outputs $y$. For controlled generation, we solve $\arg\max_{(y, z) \in \mathcal{C}} p(y, z \mid x)$, where $\mathcal{C}$ is the constrained set of possible outputs.

Formally, the constraint graph is equivalent to a regular expression on control states restricting the set of allowed outputs. Regular expressions allow a user to encode complex sets for shaping the space of possible outputs. To leverage this control, we use a constrained beam search algorithm where the constraint graph ensures that $z$ is valid for the model output (Figure 1 (middle)). For instance, given the constraint A.B*, during generation, we ensure that the control state of the first word is A, the second is unconstrained (.), and the remaining words must have control state B.
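The A.B* example can be checked with an ordinary regular expression engine by writing each control-state sequence as a string of letters. The sketch below filters finished hypotheses after the fact, which is simpler than pruning inside the beam search as a constrained search would; `satisfies` and `filter_hypotheses` are hypothetical helper names.

```python
import re


def satisfies(constraint, states):
    """True if the whole control-state sequence matches the constraint."""
    return re.fullmatch(constraint, "".join(states)) is not None


def filter_hypotheses(constraint, hypotheses):
    """Keep only (states, words) pairs whose states obey the constraint."""
    return [(s, w) for s, w in hypotheses if satisfies(constraint, s)]
```

For example, `satisfies("A.B*", ["A", "C", "B", "B"])` holds, while a sequence that starts with B or ends in a state other than B after the second position is rejected.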

It is important to note that these constraints do not directly constrain the generated words $y$. Each control state can generate many possible words. This can be seen in Figure 5 (g) where, given a control state, the model has options over the word to generate.

Inferred Control (M3): As described above, the system is trained to also allow for reverse computation of control states, using an inference network. This mode allows for inferred control, where the user can write an expected output sentence and the model will produce plausible control states $z$. While generation allows us to infer an output $y$ given an input $x$, this mode allows us to find the control states $z$ for an input $x$ and a human-written output $y$ (Figure 1 (bottom)).

Figure 2: Motivating use case, steps 1–3. (a) Model output during free generation (M1). The control states are indicated by color below the produced output tokens. (b) Alice provides the custom output “the phoenix is a french pub .” The matching control states are inferred and mapped to colors (M3). (c) The constraint “FFJKECT” is applied to other random inputs (M2).

2 Motivating Case Study: Collaborative Generation

We now consider a motivating case study of human-AI collaboration for text generation in GenNI. Alice is a user building an ML-based chatbot system. She is designing a module that generates a restaurant description from a table [43] such as the following:

name | eat type | food | area | near
the phoenix | pub | french | city center | cafe sicilia

Alice first tried developing a system using free generation (M1) but found that the generated text is highly variable and does not suit her specific use case, which requires high precision constraints. In particular, Alice wants the system to focus exclusively on the cuisine (food) and the type of establishment (eat type). However, she found that, under free generation, the model would use all of the fields, e.g., generating:

the phoenix is a french pub near cafe sicilia in the city centre .

To benefit from our controllable system, Alice needs to provide specific constraints on the control states to produce high-precision outputs that fulfill her specified goals. Concretely, she will want to develop a constraint graph that ensures that her system outputs are correct. We describe her use of GenNI to achieve this goal.

Step 1: Observing Control States on an Example To begin to gain intuition into model prediction and the control states, Alice starts with a specific example input $x$, e.g., the table row shown above. GenNI produces text in free model generation (M1) while also showing the control states for each word token – Figure 2a.

This step grounds the collaboration process in a specific starting point. This visual representation maps concrete textual outputs to the underlying control states. For this example, Alice infers that the model has allocated the blue state for the restaurant name, red for food type, and cyan for location.

Step 2: Inferring Control States for Manual Output

Next, Alice can actively posit a counterfactual: “What would the control states have been for a user-generated target output?” To do this, Alice can provide her own textual description (“the phoenix is a french pub .”) to the system in the form of the sequence $y$. Utilizing the model’s inference mode (M3), the model will infer the control states $z$ that would most likely have led to that output – Figure 2b.

This step is the start of the refinement procedure that allows Alice to build up a constraint graph on the model itself. Specifically, if she is happy with the control states that the model assigned, she can add this sequence of control states to the constraint graph. This refinement tells GenNI that this sequence of control states is appropriate for the AI model to generate.

Step 3: Forecasting AI Generations Under Constraints The constraint graph allows Alice to ensure that the model generates outputs from a set of acceptable control state sequences. However, it can be difficult to tell how constraints will generalize across examples. GenNI allows users to do this through forecasting, applying controlled generation (M2) across a set of different diverse inputs. In this case, GenNI allows Alice to randomly sample different inputs with different properties to observe generations from the system or, alternatively, probe a range of targeted inputs – Figure 2c.

Alice can then view all these outputs simultaneously to observe patterns and relationships. She confirms her hypothesis about the control states for the restaurant name (blue) and food type (red). The forecasting feature of GenNI makes it easy for her to see specific regions where the AI failed to generate the correct output.

Step 4: Precision Refinement of Constraints: The set of control states obtained in Step 2 can be applied as a constraint for other inputs (see Step 3). However, this constraint is specific to the input used to obtain it and may not generalize well. Alice can use a regular expression editor to refine the constraint so that it may generalize better.

When browsing through the forecast results, Alice spots a particular problem. The model copies the establishment name “strada” twice from the input and also hallucinates a food type – Figure 3a.

Alice knows from her exploration that the blue control state instructs the model to copy the establishment name. However, her current constraint is too rigid, since some restaurants like “strada” are only a single word (vs. “the phoenix”). Similarly, some tables do not reveal the food type. By switching to this example and correcting (using step 2) for the repetition, Alice can include the corrected sequence of control states into the constraint graph to allow for both outputs – Figure 3b.

Alternatively, using the constraint editor, Alice can manually replace the sequence of two blue boxes with a variable-length repeat of blue boxes and make the red box optional – Figure 3c.

After applying the newly created constraint, she can forecast again and confirm that the issue has been fixed – Figure 3d.
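The refinement in Step 4 can be illustrated with plain regular expressions over state letters. The sketch assumes that the letters of the constraint “FFJKECT” from Figure 2 align one-to-one with the seven tokens of “the phoenix is a french pub .”, so that F marks the name and E the food type; this reading, the refined pattern, and the use of `+` and `?` are assumptions for illustration and may differ from what the GenNI constraint editor accepts.

```python
import re

# Constraint inferred in Step 2 for "the phoenix is a french pub ."
RIGID = "FFJKECT"
# Assumed refinement: the name state F may repeat, the food state E is optional.
REFINED = "F+JKE?CT"


def allowed(constraint, states):
    """True if the state string matches the whole constraint."""
    return re.fullmatch(constraint, states) is not None
```

Under these assumptions, a one-word name without a food type (state string “FJKCT”) is rejected by the rigid constraint and accepted by the refined one, while the original sequence remains valid.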

Figure 3: Motivating use case, step 4. (a) Alice observes that “strada” is copied twice and no information about cuisine is available. (b) The selected example sequences of control states are merged into a combined constraint graph shown in the Constraint Editor view. (c) The constraint graph can be edited in the Constraint Editor by either using the text editor or the visual editing tool. (d) After applying the refined constraint, Alice observes the correct output: “strada” is copied once and no food type is mentioned.

Step 5: Building Constraints into a Model: Alice can repeat this process of forecasting and refinement to obtain a constraint that generalizes well. She repeats Steps 1-4, each time alternating between observing generation, inferring control states for custom outputs, transferring across input types, and merging constraints. In each iteration, the constraint graph grows with the addition of more rules. Finally, Alice can save the constraints to use the constrained model on a more extensive test set and in production.

Figure 4: Overview of GenNI domain goals, interaction tasks, and addressed user groups. GenNI aims at supporting goals for working with control state models: (G1) ensuring controlled generation, (G2) evaluating these constraints on subsets of relevant data and demonstrate model constraint generalization, and (G3) deploying the outcome as a controlled Human-AI system. From these high-level goals, a series of interaction tasks (middle part) is inferred. These tasks are the building blocks for the main interaction loop of forecasting constraint effects and refining constraints as a result. GenNI targets end-users and model architects alike for most tasks (right). (see section 3)

3 Goals, Tasks and User Groups

GenNI aims to support the collaborative development of data-backed generation systems. AI tools for generation can efficiently produce textual outputs on a variety of inputs; however, without close inspection of these free generations, it is difficult for a human user to find issues or correct errors. Alternatively, human users can produce careful example outputs, but each is slowly crafted and hard to generalize.

At each alternating round of GenNI’s use, either the user can formulate explicit constraints on the AI system, or the AI system can generate a set of outputs based on its current state. In this manner, the user can quickly observe that the system is over-constrained or under-constrained while at the same time having assistance from the AI system to help produce generalizable constraints. When the collaboration is over and both sides reach an equilibrium, the full set of constraints produced in the process can be incorporated into the AI system.

To act as a tool for reaching this human-AI equilibrium for constraints, GenNI was designed with three high-level user goals in mind (Figure 4):

G1: Ensure controlled generation AI systems with free generation can produce unexpected outputs which do not follow the guidelines that the user prefers. The goal is to provide feedback controls in the form of constraints. The language of control states may be difficult for a user to apply directly, so the tool must convey how these work and make it easy for the user to link these to specific examples. Once constraints are created, a user needs to be able to manipulate the constraints in an intuitive and precise manner.

G2: Demonstrate model constraint generalization Upon specifying constraints, the user needs to understand how the AI will apply and interpret them in a global manner. While the user may have an intuitive sense of the constraints, they will not know whether they will act consistently and naturally across any input the AI receives. As such, a tool needs to provide guidance about general outputs such that a user can build intuition and trust the system.

G3: Deploy as a controlled Human-AI system Many approaches for debugging neural models find issues but do not provide a path for remediation. The final goal of GenNI is to produce a constraint set that can be packaged and deployed as part of a production model. The constructed constraint graph contains all of the final information about the appropriate controlled use of the system, and controlled generation can be efficiently run on real systems. After deployment, if an issue comes up in production use, new constraints can be refined into the model.

The case study in section 2 takes advantage of these goals. The user is interested in the targeted use of the generation system in Step 1. Step 2 is a step towards defining constraints based on the user goals (G1). These constraints need to be explored on a larger set of examples (G2) in Step 3. However, upon observation, further refinement is needed in step 4 (return to G1). Finally, the negotiation leads to a model in Step 5 that can be deployed in a production system (G3).

These three domain goals motivate the main interaction and tasks of the system:

Task 1: Browse and Modify AI Generations by observing the textual output $y$ and control states $z$ for possible inputs $x$. The user should be able to modify outputs manually or by using alternative model predictions. The user should develop an intuition about the control states and about the variety of outputs the model generates [G1, G2].

Task 2: Derive Constraints from Control Sequences in order to produce controls on the AI system. The user should be able to define an initial constraint graph from various preferred examples. [G1]

Task 3: Modify Constraint Graph to allow for finer-grained control of the final constraints. Some generalizations cannot be derived from examples directly but require user adjustments. These adjustments should be enabled via a textual and a visual interface. [G1]

Task 4: Produce AI Outputs on General Examples to observe the generalization behavior and to confirm the correctness of the systems. A user should be able to forecast what the current change of model constraints would mean on a more global scale. This should be at close to interactive rates. [G2]

Task 5: Export controlled model to allow deployment of the model in production or for broader testing. [G3]

GenNI targets two user groups (see [55]). One group is a technically versed end-user that does not need to know about the underlying model, just about the task at hand. We call this group technical editors. The other group is model architects that want to evaluate their model under real human constraints. See Figure 4 for reference.

Figure 5: The GenNI user interface is split into a Constraint Refinement component (a) and a Generation Forecast component (b). A user can edit constraints directly in the Constraint editor (c) or derive them from a set of examples (d). Examples can be created (e) by inferring control states from a user output (f) or from the model’s beam tree (g). The effect of specific constraints can be forecast on random samples (h) or tested on data ranges (i). The controlled model code can be exported for deployment or further testing (j). Model architects can investigate the control state alignments for each control state (k). Details are given in section 4.

4 Design

GenNI is an interactive prototype for facilitating collaborative interaction for controlled generation. The visual layout and the interactions are the results of an iterative design process between visualization experts, NLP researchers developing controllable models, and users aiming to deploy these models in practice.

The GenNI interface is constructed out of two meta components that directly reflect the domain goals G1 and G2. These components are juxtaposed to facilitate the continuous iteration between forecasting constraints on global examples and refining the constraint as a result of observing global effects. Accordingly, the left half collects the Constraint Refinement views (subsection 4.1), and the right half provides the Generation Forecast views (subsection 4.2).

4.1 Constraint Refinement Component

The Constraint Refinement component (Figure 5a) facilitates the construction and exploration of constraints by the user (G1). It allows direct editing of constraints and constructing constraints from well-crafted examples. This component has three supporting views: 1) a Constraint Editor (5c) which allows direct textual and visual modification of constraints, 2) a Refine-by-Example (5d) view to collect and utilize examples for constraint refinement, 3a) an Example Creation view (5e) to construct output examples by manual edits (5f), and 3b) to generate examples utilizing alternative model decisions (5g).

The Constraint Editor (Figure 5c) encodes the entire collaborative state of the system, i.e., all information collected to constrain the generation procedure. A user can add multiple different valid control state sequences, and they will be merged into this graph [T2]. The constraint graph is represented in two ways. First, it is encoded as an editable node-link diagram. The user can modify and update the constraint by visually adding, deleting, and modifying nodes and meta-nodes (like OR) [T3]. Secondly, the constraint graph is represented by a simple language borrowing the syntax from regular expressions. At the top of the Constraint Editor, this textual representation can be modified directly [T3]. For keeping track of provenance, a history field collects previous iterations of constraint formulations in textual form. On click, they are available for re-editing.

To develop new constraints from user-preferred examples, the user needs a place to collect examples and merge them into complex constraints. The Refine-by-Example view (Figure 5d) provides these functions. Each example represents $y$ as text and $z$ as color. Their origin (i.e., human-generated or model alternative) is encoded as a postfix symbol (Figure 5d1) followed by buttons to trigger propagation to the Constraint Editor [T2] or forecasting in the Global Forecast view [T4] (see Figure 5h). If multiple examples are selected, the Merge Constraints button attempts to merge all of them into one combined constraint graph for the Constraint Editor [T2].
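One simple way to think about merging examples is as an alternation over their control-state sequences. The sketch below is a hypothetical simplification: GenNI merges examples into a constraint graph, whereas `merge_constraints` only joins the state strings into a regular expression with `|`.

```python
import re


def merge_constraints(sequences):
    """Build one regular expression that accepts any of the given sequences."""
    unique = sorted(set(sequences))  # drop duplicates, keep a stable order
    return "|".join(re.escape(seq) for seq in unique)


def accepts(constraint, states):
    """True if the state string matches the merged constraint as a whole."""
    return re.fullmatch(constraint, states) is not None
```

A merged constraint of this kind accepts exactly the collected examples; generalizing beyond them, for instance with repeats or optional states, is what the Constraint Editor is for.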

Examples are created by using the Example Creation view [T1] (Figure 5e). The user starts by first setting a reference input $x$. This input can be acquired by selecting an input ID to point to one item in the model’s test set directly or randomly. Alternatively, examples can be selected from the Global Forecast view.

To produce a matching output, the user can write a custom freehand text (Manual Output, Figure 5f) and derive the matching control states from the inference network (M3) of the model.

A second way to produce outputs is by meaningfully interacting with the model internals to modify its predictions (5g). The user can create these alternative model outputs by constraining the beam tree (lower part) from the visual tree representation or by selecting alternative tokens by clicking on a token. The beam tree tool allows the user to see the paths taken by the model during beam search, probe its decisions at specific locations and even alter its decisions (when choosing the control state) to see in real-time the effect of changes to the constraint graph. While this view might be complex for the end-user group, it provides a way to generate outputs that the model can reproduce.

In both cases, the control states sequence of the produced outputs defines a simple constraint and can be tested by forecasting it, or it can be added to the Refine-by-Example collection to create a new constraint graph.

4.2 Generation Forecast Component

The right-hand component of GenNI (Figure 5b) visualizes the model’s response to the constraints. It presents a global insight into the AI system by providing either free or controlled generation on a more extensive set of examples [T4].

The Global Forecast view (Figure 5h) conveys a global perspective on the effect of constraints. It utilizes random sampling from the test set to produce different $x$ values, which are input to constrained generation to produce $y$ and $z$ values. The sampling and generation results are shown as tuples of two rows containing the input table $x$ and the output text $y$ with color highlighting for $z$. Each tuple can become the next reference example for the Example Creation view and, in this way, contribute to the refinement of the constraint. All $z$ values are summarized in a heatmap at the top of the view to make the alignment between constrained outputs easier to see.

The Range Forecast view (Figure 5i) provides the same features as the Global Forecast view, but the $x$ values are selected from value ranges or lists of values. This allows more systematic testing in a local neighborhood of examples. For example, for the use case of producing date strings (see section 5), a user could test what influence the day value has by producing $y$ and $z$ generations for all days of a specific month.
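The idea of sweeping a field over a range can be sketched as a small generator that produces one input table per combination of values. The function name `expand_ranges` and the dictionary representation of a table are assumptions made for this illustration.

```python
from itertools import product


def expand_ranges(base, ranges):
    """Yield one table per combination of the listed field values.

    base: dict with the reference table, e.g. {"day": 1, "month": 9, "year": 2015}.
    ranges: dict mapping a field name to the values it should take.
    """
    fields = list(ranges)
    for values in product(*(ranges[f] for f in fields)):
        table = dict(base)  # copy so the reference table stays unchanged
        table.update(zip(fields, values))
        yield table
```

For instance, `expand_ranges({"day": 1, "month": 9, "year": 2015}, {"day": range(1, 31)})` yields the thirty tables for September 2015, each of which could then be passed to controlled generation.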

4.3 Encodings

The concept of a control state is central to the functioning of the collaboration and provides a shared space between the human user and the underlying AI system. All constraints are developed on these states, and they are the single unit of transfer between the two views (see  Figure 5). As such, GenNI uses a visual encoding of control states as colors in all locations, both in the constraint formulation and the evaluation side. Unlike words, which are very fine-grained, control states allow for a high-level color encoding. This visual encoding makes it easier for a user to see differences and anomalies in sequences quickly.

The use of color as a central encoding poses some design challenges. Legibility is drastically decreased if the contrast between background and text color is low. That restricts the use of colors to either very dark or very light palettes. To be less restricted in color choices, we changed our encoding for the combination of $y$ and $z$ from full background coloring to colored underlines only. Only in the very space-limited beam tree view do we use color bleeding.

Since color encoding is a core part of our prototype, we also thought about methods to support two scenarios where the color encoding might not be sufficient: 1) when modifying the constraint graph by textual input, the user has to refer to the colors in a meaningful way; 2) our color choices are not colorblind-safe. To address these issues, we added an optional representation of control states as letters. On click, the user can reveal the letters for single generation tuples. See Figure 6.

Figure 6: On click, the control states are represented by letters to support formulation of constraint graphs and to support color-blind users.

The selection and arrangement of all functional parts of the GenNI interface underwent many iterations. For example, during the experimentation phase, all views were organized as rows in a single vertical list. The idea of juxtaposing Constraint Refinement and Generation Forecast and arranging the subviews to support this twofold character results from understanding the interplay between forecast and refinement as a back-and-forth loop and not as a strict sequential order of first-a-then-b.

5 Use Cases

We apply the GenNI prototype to build controllable generation systems for two different domains. In both cases, a model architect utilized the system and explored the insights it gives for the problem and the underlying model. First, we build a model for a date conversion problem, where the model is simple enough such that all constraints can be explored. Next, we apply it to a real-world system using the E2E restaurant recommendation dataset [43].

Figure 7: Date Generation use case. (a) Model outputs during free generation [T1]. (b) Inferred control states for provided text output [T2]. (c) Constraint graph inferred and edited for this output format [T3].

Date Generation The first model is a synthetic date generation dataset where the input is a table representing a date consisting of (day, month, year). The corresponding output text is a sentence describing the date. Eight different formats of representing the same date data were created using nominal or ordinal days, changing the order of day and month, and deciding to use commas or not before the year. For example, consider the input:

day | month | year
14 | 9 | 2015

This date can be generated as “today is the fourteenth of september, 2015 .” or “today is september the fourteenth in the year 2015 .” as well as six other formats. Model control is used to select the preferred output form, e.g., the ordering of the days and months, use of commas, and ordinal vs. numerical ways of writing the day.
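To illustrate how one table can map to several surface forms, the sketch below renders a date in a few ordinal formats. It is not the dataset's generator: the flags `month_first` and `comma`, the spelling of compound ordinals as two tokens (e.g., “twenty first”), and the restriction to ordinal days are assumptions, and only a subset of the eight formats is covered.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
            "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
            "fourteenth", "fifteenth", "sixteenth", "seventeenth",
            "eighteenth", "nineteenth"]


def ordinal_day(day):
    """Spell a day of the month (1-31) as one or two word tokens."""
    if day <= 19:
        return ORDINALS[day - 1]
    if day == 20:
        return "twentieth"
    if day == 30:
        return "thirtieth"
    tens = "twenty" if day < 30 else "thirty"
    return f"{tens} {ORDINALS[day % 10 - 1]}"


def render(day, month, year, month_first=False, comma=True):
    """Render the date table as a tokenized sentence in one chosen format."""
    d, m = ordinal_day(day), MONTHS[month - 1]
    body = f"{m} the {d}" if month_first else f"the {d} of {m}"
    separator = ", " if comma else " "
    return f"today is {body}{separator}{year} ."
```

Days such as 21 produce two day tokens while 14 produces one, which is why a constraint for a fixed format still needs to allow a variable number of day states.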

Under this well-specified task, the goal is to test if GenNI allows for reasonable clusters of words for the control states. We also want to see if we can construct constraint graphs for formats that generalize as expected. In particular, the text format allows for variable day lengths, so the constraint graph must allow for this output. GenNI provides tools for performing these tests through interactions with the AI system.

Figure 8: Final constraint graph produced in the Restaurant use case. Section (a) ensures generation of a location either in the order (area, near) or (near, area). Section (b) forces the use of family-friendly allowing an optional state for not. Section (s) is the seed that allows for additional descriptive information.
Figure 9: Alignment between control states, table fields, and text. Cyan is the control state for area; orange is the control state for near.
Figure 10: Generation output for constraint graph section (a) in Figure 8. Note that the first output does not utilize the family-friendly field. This will be corrected in section (b) of the graph.
Figure 11: Outputs of Range Forecast for the constraint graph. By providing the range no,yes for family-friendly, the system generates multiple tables with different values and uses these to produce text under the constraint graph. The output shows reasonable results for both values.

Using the Forecast Generation component for free generation [T1], we can confirm from Figure 7a that the model has learned reasonable clusters for the control states. We can see that the model uses red for year, purple for month, yellow for ordinal day, light blue for nominal day, and so on. This view confirms the model structure is correct and that the system will alternate between different styles of generation.

Furthermore, the Example Creation view in Figure 7b shows us that the control states inferred by the model [T2] also agree with the clustering observed. (Anecdotally, in early testing, the model architect was able to find an error in the model implementation based on the interaction mode through this process.)

Finally, the Constraint Graph view can be used to enforce that output text obeys a specified format. We consider constraining the output to the format “the date is september the fourteenth, 2015 .”. Note that the yellow state has been edited to add a repeat loop, which allows day output text of variable length (in practice, one or two tokens). See Figure 7c.
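One way to picture the effect of the repeat loop is to treat the constraint graph as a small finite-state acceptor over per-token control states. The sketch below is illustrative only: the state names (`filler`, `month`, `article`, `day`, `comma`, `year`, `stop`) and the `accepts` helper are hypothetical, and GenNI applies its constraints during decoding rather than by checking finished sequences.

```python
# Hypothetical constraint graph for "the date is <month> the <day>, <year> ."
# Each key is a control state; each value is the set of states allowed next.
# A state that lists itself has a repeat loop.
WITH_LOOP = {
    "START": {"filler"},
    "filler": {"filler", "month"},   # "the date is" spans several tokens
    "month": {"article"},
    "article": {"day"},
    "day": {"day", "comma"},         # repeat loop: one or more day tokens
    "comma": {"year"},
    "year": {"stop"},
    "stop": set(),
}

# Same graph without the repeat loop on the day state.
WITHOUT_LOOP = {**WITH_LOOP, "day": {"comma"}}


def accepts(edges, states, start="START", final="stop"):
    """Return True if the per-token state sequence is a path from start to final."""
    current = start
    for state in states:
        if state not in edges.get(current, set()):
            return False
        current = state
    return current == final


# "the date is september the fourteenth , 2015 ."
one_token_day = ["filler"] * 3 + ["month", "article", "day", "comma", "year", "stop"]
# "the date is september the twenty first , 2015 ."
two_token_day = ["filler"] * 3 + ["month", "article", "day", "day", "comma", "year", "stop"]

if __name__ == "__main__":
    print(accepts(WITH_LOOP, one_token_day))
    print(accepts(WITH_LOOP, two_token_day))
    print(accepts(WITHOUT_LOOP, two_token_day))
```

Without the loop, only single-token days fit the graph; with it, the same format also covers two-token days.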

Restaurant Recommendation For more complex use cases, we turn to the problem of refining text for an assistive agent or bot. The E2E dataset [43] is a standard dataset for benchmarking Table2Text generation systems that simulates conversational responses in a constrained environment. Here the input is a table containing information about a restaurant (a subset of eight different possible fields). The corresponding sentence is a description of the data table.

For this use case, we assume the challenge is to constrain the output to highlight the location of the restaurant and whether it is family-friendly. Several challenges make this difficult. In the beginning, we do not know the right alignments between the control states and the relevant table fields. We also need to allow for different possible orderings of these fields. Additionally, we do not know how the text describes fields like family-friendly, whose yes/no values cannot be copied directly.

Let us start with the location of the restaurant. There are two table fields area and near related to location and we would like to include both of them. Using the Control States (see Figure 9) section of the Generation Forecast view we can determine how these fields are used. This section shows overall alignment between control states, table fields (inputs) and text (output). We can see that the model uses cyan (A) control states for area, outputting words such as city and riverside from this state, and orange (G) control states for near.

We can now ensure that these fields are both used. We observe that location fields can appear in either order: (area, near) or (near, area). We encode this in a constraint graph shown in Figure 8a. The (a) section of this graph ensures that some descriptive text comes first (seed s), and then the model can fork to generate the location in either order. Upon constructing this section of the graph, we can check that the constrained generation looks correct. The generation results in Figure 10 show that the generation with constraints works as expected, producing initial text followed by full location descriptions such as near raja indian cuisine in the city centre.

However, this constraint graph does not yet ensure that the text mentions whether the restaurant is family-friendly. To first determine how the system might encode this property, we use the Manual Output tool in the Refine Constraints section. Ignoring the rest of the text, we manually type out the phrases “it is family friendly .” and “it is not family friendly .”. The system then finds the best control states for each of the input tokens. In this case, the system produces the same control states for family friendly but differs in the not state. We can then combine these two control paths using the Refine by Examples tool. This effectively creates an optional not state in the graph, allowing both possibilities.
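To see why combining the two control paths yields an optional state, consider a simple union of their transitions. This is a sketch under stated assumptions, not the Refine by Examples implementation: the state labels (`prefix`, `not`, `family_friendly`, `stop`) and both helper functions are hypothetical, and repeated states are assumed to be collapsed into one node.

```python
def merge_paths(paths):
    """Union the transitions of several control-state paths into one graph."""
    edges = {}
    for path in paths:
        for src, dst in zip(path, path[1:]):
            edges.setdefault(src, set()).add(dst)
        edges.setdefault(path[-1], set())  # make sure the last state has an entry
    return edges


def optional_states(edges):
    """Find states that can be skipped: p -> s -> n exists alongside p -> n."""
    skippable = set()
    for successors in edges.values():
        for state in successors:
            for nxt in edges.get(state, set()):
                if nxt != state and nxt in successors:
                    skippable.add(state)
    return skippable


# Collapsed control-state paths inferred for the two typed phrases
# (labels are illustrative placeholders for the model's control states).
positive = ["prefix", "family_friendly", "stop"]          # "it is family friendly ."
negative = ["prefix", "not", "family_friendly", "stop"]   # "it is not family friendly ."

if __name__ == "__main__":
    graph = merge_paths([positive, negative])
    print(graph)
    print(optional_states(graph))
```

Because both paths share every state except the negation, the merged graph has a direct transition and a detour through the extra state, which is exactly an optional node.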

Finally, we combine this section (b) into the full constraint graph in Figure 8. Together, the two sections ensure that the output contains both the location and the description of family friendliness. We can verify this using the Range Forecast tool, which generates a range of possible input tables to test the output text. The outputs in Figure 11 for yes and no values of the family-friendly table field show that our constraint works well in both cases and even generates text in a different manner (kid friendly) than our manual input.
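The table-generation step of such a range test can be sketched as a Cartesian product over the user-supplied value ranges. The code below is an assumption-laden illustration: `expand_ranges` is a hypothetical helper, the tables are plain dictionaries, and the step that feeds each table to the constrained model is omitted.

```python
from itertools import product


def expand_ranges(base_table, ranges):
    """Create one copy of base_table per combination of the given field values."""
    fields = list(ranges)
    tables = []
    for combo in product(*(ranges[field] for field in fields)):
        table = dict(base_table)          # copy so the base table is untouched
        table.update(zip(fields, combo))
        tables.append(table)
    return tables


# Base table using field values mentioned in the use case.
base = {
    "area": "city centre",
    "near": "raja indian cuisine",
    "family-friendly": "yes",
}

if __name__ == "__main__":
    # Providing the range no,yes for family-friendly yields two test tables.
    for table in expand_ranges(base, {"family-friendly": ["no", "yes"]}):
        print(table)
```

Each resulting table would then be passed through generation under the same constraint graph, so that one graph is checked against every value in the range.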

6 Implementation

GenNI requires interaction with a live model designed to facilitate interactive visualization and refinement. To support this, it tightly integrates the model with the visual client. We based the interface between both parts on a REST API, and we built a custom generation system for the underlying model framework using Torch-Struct [51]. We designed an API to allow easy access to free generations, controlled generation, inferred control, and interaction with the model’s beam tree. The backend and frontend communicate constraints only through the control states that form the central white-box component of the model.

The model framework runs within a FastAPI server that delivers content to the client via a REST interface. The client is written in TypeScript. Most visualization components use the D3.js library. Source code, a demo instance, and a descriptive webpage are available at https://genni.vizhub.ai.

7 Related Work

7.1 Table2Text Generation Models

Methods for table2text generation are commonly divided into rule-based approaches, statistical methods, and neural models. Rule-based approaches merge domain knowledge into the text generation systems [42, 22, 49, 1, 2]. The domain knowledge can be encoded using hand crafted templates that map the data directly to language [18, 57, 37] or through rule-based transformations of semantic representations to produce the output text [48, 4, 47, 49]. Some systems combine template-based methods with standard rule-based approaches [3]. Our system does not use manual rule-based approaches; however, the learned control states are reminiscent of templates since the codes learn to align with specific characteristics of the text output. In this way, our approach has some similarities to statistical approaches that learn rules from training data [31, 12, 23]. Dou et al. [11] built a model called Data2Text Studio for automated text generation from structured data by extracting templates. Like our system, it provides the user tools to edit templates for models and APIs to generate text. However, our system uses a neural model and also constructs constraint graphs rather than hard-coded templates.

As with most tasks involving language modeling, neural network models have become popular in conditional text generation. These models have provided significant improvements in performance compared to rule-based and statistical models. The most popular models are seq2seq models that use recurrent neural networks, especially LSTMs [56], and transformer-based models that replace recurrence with multi-headed attention in a feed-forward set-up [58]. These seq2seq models have been used for conditional text generation by encoding the data as a source sequence and employing standard transduction methods [27, 38, 13, 32]. Transformer-based models that have been trained on huge corpora [45, 10, 27], such as GPT-2 and BERT, are commonly used to warm-start such models [46, 62]. Recently, transformer-based models similar to BERT [10] have been pre-trained on table datasets [19]. Our system uses recurrent network models for its different components. However, since our system depends on the probabilistic generative model (similar to a statistical modeling approach) and not on the underlying implementation, it should be able to leverage larger transformer-based models, pre-trained or not.

7.2 Controllable Text Generation

Standard neural network models trained end-to-end are black-box text generators, and it is difficult to control the generated text. To address this, recently developed methods allow injecting control into these models. The controllable attributes include topic, sentiment, politeness, tense, ordering of information, and content. These models learn control codes [29] that only moderate high-level attributes such as sentiment [24, 35] and style [44, 52], and thus can still generate text that differs at the word and phrase levels. Other models manipulate the syntactic structure of generated text [7, 26, 8, 9].

For more fine-grained properties, some models learn templates [61] or alignments between data and text [6, 53]. Our system is most similar to these approaches: it builds on the model proposed by Li et al. [33], with a linear-chain conditional random field in the inference network, and is trained using the Gumbel approximation following Fu et al. [14]. The learned control states govern some high-level semantics of the generated words and can be used to extract templates (in the form of constraint graphs). The semi-supervised training done for posterior regularization performs soft alignment between the text and the data.

7.3 Interactive Interfaces for Text Generation

Interactive interfaces for free text generation are increasingly popular. “Write with transformer” [25] completes paragraphs that have been started by user input using transformer models like GPT-2. Some commercial applications like Gmail use language models to improve their sentence completion. The Google Translate UI uses text generation for translation. TabNine [34] offers language generation for programming languages integrated into multiple IDEs. MixingBoard [15] demonstrates interfaces for knowledge-grounded stylized text generation. Text generation models can also be used to detect whether an input text was itself generated by such a model [17, 63]. Note, though, that these differ from systems that focus on conditional generation.

CSI:Summary [16] describes a system for text summarization that uses a controlled generation model. Outputs can be constrained as a response to user interactions. Data2Text Studio [11] allows formulation of constraints as a set of Boolean rules. GenNI builds on the work of CSI:Summary.

7.4 Explainable AI for Sequence Models

Visualization for explainable AI is a very active research topic with a high frequency of publications. Hohman et al. [20] provide a comprehensive introduction to this topic. Here, we focus exclusively on approaches for sequence models.

As the fundamental and earliest building block, RNNs have been the subject of study. The “unreasonable” effectiveness of RNNs for encoding sequential information [28] can be interactively explored by approaches like LSTMVis [55], RNNVis [39] or ProtoSteer [40]. Several methods utilize the model’s gradient and map them to model input for analysis [5, 36].

Current state-of-the-art deep learning NLP methods, like seq2seq models or transformers, are more complex and require interactive methods to investigate this complexity. Seq2SeqVis [54] enables interactive what-if analysis of the five parts of a seq2seq model. BertViz [59] and exBERT [21] allow a deep look into the attention mechanisms of transformer models. These tools provide interactivity to analyze single examples. GenNI extends this idea and aims to generalize from concrete examples to a set of applicable rules for the whole dataset.

8 Conclusions and Future Work

We present GenNI, a system for collaborative development of data-backed text generation systems. Unlike many systems developed for understanding deep learning models for NLP, GenNI is designed to help users produce actionable constraints that can be used with systems designed for user control. The system facilitates a collaborative interaction with users refining explicit constraints on the model and the AI system forecasting generations on new data.

This style of controllable model can be designed for many different tasks in NLP and related domains. Building models with user understandable controls opens up the ability for explicitly collaborative systems as opposed to the trade-off of rule-based systems and full AI-driven outputs. Visual interaction plays a key role for making it possible for a user to intuit, develop, and apply these constraints in a test environment as well as deploy them in real systems. In GenNI, the encodings and structure used were targeted specifically to a class of controllable generation models, but the approach of a single control state shared in a refine/forecast setting can be applied much more broadly. Future work will look to develop shared encodings that can be applied to a wide class of controllable NLP models.

9 Acknowledgements

This work was partially supported by NSF grant III-1901030 and a Google Faculty Research Award.

References

  • [1] A. Belz. Probabilistic Generation of Weather Forecast Texts. In HLT-NAACL, 2007.
  • [2] N. Bouayad-Agha, G. Casamayor, and L. Wanner. Content selection from an ontology-based knowledge base for the generation of football summaries. In ENLG, 2011.
  • [3] S. Busemann and H. Horacek. A Flexible Shallow Approach To Text Generation. INLG, 1998.
  • [4] L. Cahill, C. Doran, R. Evans, C. Mellish, D. Paiva, M. Reape, and D. Scott. In Search of a Reference Architecture for NLG Systems. 2000.
  • [5] D. Cashman, G. Patterson, A. Mosca, N. Watts, S. Robinson, and R. Chang. RNNbow: Visualizing Learning via Backpropagation Gradients in Recurrent Neural Networks. IEEE Computer Graphics and Applications, 38(6):39–50, Nov. 2018. arXiv: 1907.12545. doi: 10.1109/MCG.2018.2878902
  • [6] A. T. S. Chan, Y. Ong, B. Pung, A. Zhang, and J. Fu. CoCon: A Self-Supervised Approach for Controlled Text Generation. ArXiv, 2020.
  • [7] M. Chen, Q. Tang, S. Wiseman, and K. Gimpel. Controllable Paraphrase Generation with a Syntactic Exemplar. ACL, 2019. doi: 10.18653/v1/P19-1599
  • [8] E. Colin and C. Gardent. Generating Syntactic Paraphrases. In EMNLP, 2018. doi: 10.18653/v1/D18-1113
  • [9] J. Deriu and M. Cieliebak. Syntactic Manipulation for Generating more Diverse and Interesting Texts. In INLG, 2018. doi: 10.18653/v1/W18-6503
  • [10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT, 2019. doi: 10.18653/v1/N19-1423
  • [11] L. Dou, G. Qin, J. Wang, J. Yao, and C.-Y. Lin. Data2Text Studio: Automated Text Generation from Structured Data. In EMNLP, 2018. doi: 10.18653/v1/D18-2003
  • [12] P. A. Duboué and K. McKeown. Statistical Acquisition of Content Selection Rules for Natural Language Generation. In EMNLP, 2003. doi: 10.3115/1119355.1119371
  • [13] O. Dusek and F. Jurcícek. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings. ACL, 2016. doi: 10.18653/v1/P16-2008
  • [14] Y. Fu, C. Tan, B. Bi, M. Chen, Y. Feng, and A. M. Rush. Latent Template Induction with Gumbel-CRFs. 34th Conference on Neural Information Processing Systems, 2020. arXiv: 2011.14244.
  • [15] X. Gao, M. Galley, and B. Dolan. MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 224–231. Association for Computational Linguistics, Online, July 2020. doi: 10.18653/v1/2020.acl-demos.26
  • [16] S. Gehrmann, H. Strobelt, R. Krüger, H. Pfister, and A. M. Rush. Visual Interaction with Deep Learning Models through Collaborative Semantic Inference. IEEE Transactions on Visualization and Computer Graphics, 26(1):884–894, Jan. 2020. doi: 10.1109/TVCG.2019.2934595
  • [17] S. Gehrmann, H. Strobelt, and A. M. Rush. GLTR: Statistical Detection and Visualization of Generated Text. arXiv:1906.04043 [cs], June 2019. arXiv: 1906.04043.
  • [18] S. Geldof and W. Velde. An architecture for template based (hyper)text generation. 1997.
  • [19] J. Herzig, P. Nowak, T. Müller, F. Piccinno, and J. M. Eisenschlos. TAPAS: Weakly Supervised Table Parsing via Pre-training. In ACL, 2020. doi: 10.18653/v1/2020.acl-main.398
  • [20] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers. IEEE Transactions on Visualization and Computer Graphics, 25(8):2674–2693, Aug. 2019. doi: 10.1109/TVCG.2018.2843369
  • [21] B. Hoover, H. Strobelt, and S. Gehrmann. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 187–196. Association for Computational Linguistics, Online, July 2020. doi: 10.18653/v1/2020.acl-demos.22
  • [22] E. Hovy. Automated Discourse Generation Using Discourse Structure Relations. Artif. Intell., 1993. doi: 10.1016/0004-3702(93)90021-3
  • [23] B. Howald, R. Kondadadi, and F. Schilder. Domain Adaptable Semantic Clustering in Statistical NLG. In IWCS, 2013.
  • [24] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. Xing. Toward Controlled Generation of Text. In ICML, 2017.
  • [25] Huggingface. Write With Transformer, Mar. 2021.
  • [26] M. Iyyer, J. Wieting, K. Gimpel, and L. Zettlemoyer. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. NAACL-HLT, 2018. doi: 10.18653/v1/N18-1170
  • [27] M. Kale. Text-to-Text Pre-Training for Data-to-Text Tasks. INLG, 2020.
  • [28] A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [cs], Nov. 2015. arXiv: 1506.02078.
  • [29] N. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher. CTRL: A Conditional Transformer Language Model for Controllable Generation. ArXiv, 2019.
  • [30] D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes. 2nd International Conference on Learning Representations, 2014. arXiv: 1312.6114.
  • [31] I. Langkilde and K. Knight. Generation that Exploits Corpus-Based Statistical Knowledge. In COLING. doi: 10.3115/980451.980963
  • [32] R. Lebret, D. Grangier, and M. Auli. Neural Text Generation from Structured Data with Application to the Biography Domain. In EMNLP, 2016. doi: 10.18653/v1/D16-1128
  • [33] X. L. Li and A. M. Rush. Posterior Control of Blackbox Generation. ACL, 2020. doi: 10.18653/v1/2020.acl-main.243
  • [34] C. D. C. Ltd. TabNine - Code Faster with AI Code Completions.
  • [35] F. Luo, D. Dai, P. Yang, T. Liu, B. Chang, Z. Sui, and X. Sun. Learning to Control the Fine-grained Sentiment for Story Ending Generation. In ACL, 2019. doi: 10.18653/v1/P19-1603
  • [36] A. Madsen. Visualizing memorization in RNNs. Distill, 4(3):e16, Mar. 2019. doi: 10.23915/distill.00016
  • [37] S. W. Mcroy, S. Channarukul, and S. S. Ali. An augmented template-based approach to text realization. Natural Language Engineering, 9(4):381–420, 2003. doi: 10.1017/S1351324903003188
  • [38] H. Mei, M. Bansal, and M. R. Walter. What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment. HLT-NAACL, 2016. doi: 10.18653/v1/N16-1086
  • [39] Y. Ming, S. Cao, R. Zhang, Z. Li, Y. Chen, Y. Song, and H. Qu. Understanding Hidden Memories of Recurrent Neural Networks. In 2017 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 13–24, Oct. 2017. doi: 10.1109/VAST.2017.8585721
  • [40] Y. Ming, P. Xu, F. Cheng, H. Qu, and L. Ren. ProtoSteer: Steering Deep Sequence Model with Prototypes. IEEE Transactions on Visualization and Computer Graphics, 26(1):238–248, Jan. 2020. doi: 10.1109/TVCG.2019.2934267
  • [41] A. Mnih and K. Gregor. Neural Variational Inference and Learning in Belief Networks. CoRR, 2014. arXiv: 1402.0030.
  • [42] J. Moore and C. Paris. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Comput. Linguistics, 1993.
  • [43] J. Novikova, O. Dušek, and V. Rieser. The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pp. 201–206. Association for Computational Linguistics, Saarbrücken, Germany, 2017. doi: 10.18653/v1/W17-5525
  • [44] S. Oraby, L. I. Reed, S. Tandon, S. SharathT, S. M. Lukin, and M. Walker. Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators. SIGDIAL Conference, 2018. doi: 10.18653/v1/W18-5019
  • [45] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language Models are Unsupervised Multitask Learners. p. 24, 2019.
  • [46] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res., 2020.
  • [47] E. Reiter. NLG vs. Templates. ArXiv, 1995.
  • [48] E. Reiter and R. Dale. Building applied natural language generation systems. Natural Language Engineering, 1997. doi: 10.1017/S1351324997001502
  • [49] E. Reiter and R. Dale. Building natural language generation systems. Cambridge University Press, USA, 2000.
  • [50] D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. Proceedings of ICML, 2014. arXiv: 1401.4082.
  • [51] A. Rush. Torch-Struct: Deep Structured Prediction Library. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 335–342. Association for Computational Linguistics, Online, July 2020. doi: 10.18653/v1/2020.acl-demos.38
  • [52] T. Shen, T. Lei, R. Barzilay, and T. Jaakkola. Style Transfer from Non-Parallel Text by Cross-Alignment. In NIPS, 2017.
  • [53] X. Shen, E. Chang, H. Su, J. Zhou, and D. Klakow. Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence. ACL, 2020. doi: 10.18653/v1/2020.acl-main.641
  • [54] H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. M. Rush. Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. IEEE Transactions on Visualization and Computer Graphics, 25(1):353–363, Jan. 2019. doi: 10.1109/TVCG.2018.2865044
  • [55] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676, Jan. 2018. doi: 10.1109/TVCG.2017.2744158
  • [56] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to Sequence Learning with Neural Networks. In NIPS, 2014.
  • [57] M. Theune, E. Klabbers, J. D. Pijper, E. Krahmer, and J. Odijk. From data to speech: a general approach. Natural Language Engineering, 2001. doi: 10.1017/S1351324901002625
  • [58] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is All you Need. NIPS, 2017.
  • [59] J. Vig. A Multiscale Visualization of Attention in the Transformer Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 37–42. Association for Computational Linguistics, Florence, Italy, July 2019. doi: 10.18653/v1/P19-3007
  • [60] S. Wiseman, S. Shieber, and A. M. Rush. Challenges in Data-to-Document Generation. In EMNLP, 2017. doi: 10.18653/v1/D17-1239
  • [61] S. Wiseman, S. Shieber, and A. M. Rush. Learning Neural Templates for Text Generation. EMNLP, 2018. doi: 10.18653/v1/D18-1356
  • [62] T. Wolf, V. Sanh, J. Chaumond, and C. Delangue. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents. ArXiv, 2019.
  • [63] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi. Defending Against Neural Fake News. arXiv:1905.12616 [cs], Dec. 2020. arXiv: 1905.12616.