The ability to discover a program consistent with a given user intent (specification) is considered one of the central problems in artificial intelligence [Green1969]. While significant progress has been made in synthesizing programs in different domains [Alur et al.2013], current synthesis techniques do not scale to larger and more complex programs. Moreover, the state-of-the-art synthesis techniques [Gulwani et al.2012] require a great deal of domain expertise with manually designed heuristics and rules to develop an efficient search procedure. In this paper, we present Dapip, or Deep API Programmer, a system that aims to overcome some of these shortcomings by automatically learning a synthesis algorithm in the domain of data transformation tasks.
The process of transforming raw data into usable formats (also known as data wrangling) is a key problem faced by data scientists for any data analysis task. Some studies have reported that this data wrangling process can sometimes take up to 80% of the total data analysis time [Dasu and Johnson2003, Kandel et al.2011]. Recently, Programming-By-Example (PBE) techniques such as FlashFill [Gulwani2011, Gulwani et al.2012] and BlinkFill [Singh2016] were developed to help users perform data transformation tasks using examples instead of having to write complex programs. These techniques encode the space of programs using a domain-specific language (DSL), and then develop algorithms based on version-space algebra (VSA) [Polozov and Gulwani2015, Lau et al.2003] to efficiently search the space of programs. There are two key shortcomings of these approaches. First, the DSL is limited to only certain low-level syntactic regular expression-based operators that allow for an efficient structuring of the search space. This limits the expressiveness of the PBE systems; for example, they do not allow semantic data transformations using arbitrary transformation functions such as obtaining month names from a date or abbreviating the state name in an input address. Second, building an efficient synthesizer using VSA requires a large engineering effort with manually designed heuristic rules.
We tackle the first shortcoming by designing Dapip’s DSL to have function APIs as the core element, which allows for composition of APIs with constant strings. The DSL consists of three kinds of APIs: regular expression-based APIs, lookup APIs, and transformation APIs. The regular expression-based APIs perform a regular expression-based transformation on the input strings, which are needed for syntactic data transformations. The lookup APIs search for a particular string in the input data based on a dictionary of strings, and the transformation APIs perform some transformation on top of a lookup operation based on a predefined mapping between two sets of strings. The lookup and transformation APIs allow for semantic data transformations.
The second shortcoming is handled by learning the synthesis algorithm in Dapip automatically from data using two recently introduced neural modules [Parisotto et al.2016]. The first module, the cross-correlational encoder, encodes the input-output examples by running bidirectional LSTM networks [Hochreiter and Schmidhuber1997, Graves and Schmidhuber2005] on the input and output strings and computing their cross correlation. The second module, the recursive-reverse-recursive neural network, or R3NN, encodes a partial derivation in the DSL and, given the example encoding vector, returns a distribution over the space of possible expansions to the partial derivation. The R3NN incrementally builds a program in the DSL that is consistent with the input-output examples. The input-output encoder and the R3NN modules are trained end-to-end using thousands of programs and corresponding input-output examples, which are automatically sampled from the DSL.
We evaluate Dapip on a set of synthetic and 238 real-world FlashFill benchmarks. Our experiments indicate that our deep learning based approach is able to effectively model and predict the presence of different types of APIs. It is able to solve 45% of the FlashFill benchmarks and significantly outperforms the enumerative search based baseline.
To summarize, the key contributions of this paper are:
- We design an expressive DSL with APIs that can encode both syntactic and semantic data transformation tasks.
- We automatically learn a synthesis algorithm for synthesizing programs in the DSL using neural architectures.
- We evaluate our system Dapip on 238 real-world FlashFill benchmarks and thousands of synthetic benchmarks.
2 Motivating Examples
We present a few real-world examples to motivate the DSL.
An Excel user wanted to transform names to first initial followed by last name as shown in Figure 1. Since some input examples had optional middle names, the user was struggling to find a macro to perform the desired task.
|1||John S. Henry||J. Henry|
|2||Mike Stanley||M. Stanley|
|3||Bernie John Smith||B. Smith|
|4||Martha S Johnson||M. Johnson|
Dapip learns a program for this task that uses the GetFirstChar and GetLastWord APIs. Both belong to the class of regex APIs, which extract substrings from the input string based on regular expressions.
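The learned program itself is not reproduced in this text, so the following is only a minimal sketch of one composition that is consistent with the four examples in Figure 1. The function bodies, the regular expressions, and the constant string ". " are assumptions made for illustration; only the API names GetFirstChar and GetLastWord come from the text above.

```python
import re


def get_first_char(s: str) -> str:
    """Hypothetical stand-in for the GetFirstChar regex API."""
    match = re.search(r"\S", s)
    return match.group(0) if match else ""


def get_last_word(s: str) -> str:
    """Hypothetical stand-in for the GetLastWord regex API."""
    words = re.findall(r"[A-Za-z]+", s)
    return words[-1] if words else ""


def initial_and_last_name(s: str) -> str:
    # Assumed shape: Concat(GetFirstChar(inp), ConstStr(". "), GetLastWord(inp))
    return get_first_char(s) + ". " + get_last_word(s)


figure_1 = [
    ("John S. Henry", "J. Henry"),
    ("Mike Stanley", "M. Stanley"),
    ("Bernie John Smith", "B. Smith"),
    ("Martha S Johnson", "M. Johnson"),
]
for inp, expected in figure_1:
    print(inp, "->", initial_and_last_name(inp), "| expected:", expected)
```

Because the last word is taken regardless of how many tokens precede it, the optional middle names do not need any special handling.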
An Excel user had a list of addresses and wanted to extract the city and state values as shown in Figure 2.
|1||500 Mem Dr., Cambridge, 02139||Cambridge, MA|
|2||22 NE Street, Redmond, USA||Redmond, WA|
|3||Seattle, 98002||Seattle, WA|
|4||21 Peace Ave., Kirkland, 98034||Kirkland, WA|
This is an example of a very common task that cannot be performed by systems such as FlashFill. Since the data is in many different formats, there is no consistent regular expression that can be used to extract the city names. Moreover, to obtain the state name, the system needs to use a transformation API, GetStateFromCity. Dapip learns a program for this task that uses this API.
3 Overview of Approach
We now present an overview of our end-to-end system that learns to synthesize programs in a DSL that are consistent with a set of examples. The training phase of our system is shown in Figure 4 and the test phase is shown in Figure 3. We first design a DSL that allows for composition of nested API calls with constant strings. We designed this DSL after studying a large family of real-world string transformation tasks so that it is expressive enough to encode these tasks. During the training phase, we use a program sampler to uniformly sample a large number of programs from this DSL. For each program, we use a rule-based approach to construct 5 input strings for the program such that the prerequisites of the program are met. We obtain the output strings by executing the program on the input strings.
During training, each sampled program together with the corresponding input-output examples is used to train the R3NN model, a neural architecture that learns distributions over the expansions in the DSL conditioned on the examples. The examples are encoded using a second neural architecture called the cross-correlational encoder, which produces a fixed-dimensional vector. The R3NN system takes as input the input-output conditioning vector, the DSL, and the training program, and is trained to predict a conditional distribution over the set of DSL expansions. The next expansion is sampled from this conditional distribution, which extends the partial tree, and the procedure repeats; the respective figures show one possible order in which the tree nodes are expanded.
The trained R3NN model can then be used to synthesize programs in the DSL given a set of examples. The trained model takes the input-output conditioning vector as input, and generates a distribution over the set of DSL expansions that are likely to be the expansions required to construct the desired program. The distribution is then sampled to derive programs in the DSL, where the order of expansions is specified by the distribution, as shown in the respective figure, and the system returns the first program that is consistent with the input-output examples.
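The test-time loop described above (sample candidate programs, return the first one consistent with the examples) can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: `toy_sampler` is a hypothetical stand-in for sampling from the trained R3NN distribution, and the three candidate programs and their weights are invented for the demonstration.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Program = Callable[[str], str]
Example = Tuple[str, str]


def is_consistent(program: Program, examples: Sequence[Example]) -> bool:
    """A program is consistent if it reproduces every output exactly."""
    return all(program(inp) == out for inp, out in examples)


def synthesize(
    sample_program: Callable[[], Program],
    examples: Sequence[Example],
    max_samples: int = 100,
) -> Optional[Program]:
    """Draw up to max_samples candidates and return the first consistent one."""
    for _ in range(max_samples):
        candidate = sample_program()
        if is_consistent(candidate, examples):
            return candidate
    return None


def first_word(s: str) -> str:
    parts = s.split()
    return parts[0] if parts else ""


# Hypothetical stand-in for the learned distribution over programs.
CANDIDATES = [str.upper, str.lower, first_word]
WEIGHTS = [0.5, 0.3, 0.2]
rng = random.Random(0)


def toy_sampler() -> Program:
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]


examples = [("John Smith", "John"), ("Mike Stanley", "Mike")]
program = synthesize(toy_sampler, examples)
print(program.__name__ if program else "no consistent program found")
```

The sample budget plays the same role as the 10, 50, or 100 samples used in the stochastic search experiments: a larger budget tolerates a less accurate distribution.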
4 Domain-Specific Language
The syntax of the domain-specific language for API-based string transformations is shown in Figure 5. The top-level construct of the language is the Concat function, which returns the concatenation of its argument substrings. A substring expression can be a constant string, the input string, or the result of an API function call. The Concat operator allows for composition of API calls with constant strings. The DSL consists of 3 types of APIs: regex APIs, lookup APIs, and transformation APIs.
Regex API: The regex APIs search for certain regular expression-based patterns in the input string and return the matched string. Some examples of regex APIs are GetFirstNum, GetBetFirstAndSecondCommas, etc. Our DSL consists of 104 such regex APIs.
Lookup API: The lookup APIs look for the presence of certain strings in the input string and return the lookup string. Each lookup API consists of a dictionary holding a finite collection of strings, which are used for searching input substrings. Some examples of lookup APIs are GetCity, GetState, GetStockSymbol, etc. For example, the GetState API contains a dictionary of US state names, whereas the GetCity API contains a dictionary of US cities. Our DSL consists of 18 such lookup APIs.
Transformation API: Each transformation API consists of a dictionary that maps a finite collection of strings to another finite collection of strings. These APIs search for a key string in the input string and return the corresponding output string. Some examples of such APIs include GetStateFromCity, GetFirstNameInitial, etc. For example, the transformation API GetStateFromCity consists of a dictionary mapping a collection of US cities to the corresponding US states. Our DSL consists of 13 such functions.
The full list of all functions is provided in Appendix A.
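To make the structure of the DSL concrete, here is a minimal sketch of an abstract syntax tree and interpreter covering the three API kinds. The class names, the helper constructors, and the tiny dictionaries are all hypothetical; the real APIs use much larger dictionaries, and the program at the end is only one plausible way to produce the outputs of Figure 2.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class ConstStr:
    value: str


@dataclass(frozen=True)
class Input:
    pass


@dataclass(frozen=True)
class Api:
    name: str
    arg: "Expr"


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Expr", ...]


Expr = Union[ConstStr, Input, Api, Concat]


def regex_api(pattern: str) -> Callable[[str], str]:
    """Regex API: return the first substring matching the pattern."""
    def run(s: str) -> str:
        match = re.search(pattern, s)
        return match.group(0) if match else ""
    return run


def lookup_api(dictionary: List[str]) -> Callable[[str], str]:
    """Lookup API: return the first dictionary entry found in the input."""
    def run(s: str) -> str:
        return next((entry for entry in dictionary if entry in s), "")
    return run


def transform_api(mapping: Dict[str, str]) -> Callable[[str], str]:
    """Transformation API: find a key in the input and return its image."""
    def run(s: str) -> str:
        return next((v for k, v in mapping.items() if k in s), "")
    return run


# Toy dictionaries, assumed for illustration only.
CITY_TO_STATE = {"Cambridge": "MA", "Redmond": "WA", "Seattle": "WA", "Kirkland": "WA"}
API_TABLE: Dict[str, Callable[[str], str]] = {
    "GetFirstNum": regex_api(r"\d+"),
    "GetCity": lookup_api(list(CITY_TO_STATE)),
    "GetStateFromCity": transform_api(CITY_TO_STATE),
}


def evaluate(expr: Expr, inp: str) -> str:
    if isinstance(expr, ConstStr):
        return expr.value
    if isinstance(expr, Input):
        return inp
    if isinstance(expr, Api):
        return API_TABLE[expr.name](evaluate(expr.arg, inp))
    if isinstance(expr, Concat):
        return "".join(evaluate(part, inp) for part in expr.parts)
    raise TypeError(f"unknown expression: {expr!r}")


program = Concat((Api("GetCity", Input()), ConstStr(", "), Api("GetStateFromCity", Input())))
print(evaluate(program, "500 Mem Dr., Cambridge, 02139"))
```

Keeping every API behind the same string-to-string signature is what lets Concat compose them freely with constant strings.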
5 Neural Architecture for Search
The neural search over the programs in the DSL conditioned on the input-output examples is performed using the model outlined in [Parisotto et al.2016]. First, the input-output examples are encoded into a fixed-length feature vector that aims to capture shared patterns between the input and output strings. This example representation is then passed to a neural tree-based generative model over program trees, called R3NN, to generate the desired hidden program. We provide a high-level overview of both architectures.
5.1 Neural Input-Output Encoder
The cross-correlational encoder generates a fixed-dimensional vector representation of a set of input-output (I/O) examples. Intuitively, the encoder needs to capture three key pieces of information: parts of the output strings that are likely to be constant strings, parts of the output strings that can be computed from input strings, and some characteristics of the example strings that will help the program generator module identify the set of useful APIs for the given task. To simplify the DSL, we assume a fixed universe of possible constant strings so that we can focus on training the encoder to produce the likely set of APIs.
The I/O encoder first runs two bidirectional LSTM networks separately on the input and output strings in each example pair, which produces two matrices whose size is determined by the LSTM hidden dimension and the maximum length of the I/O strings. The encoder then slides the output matrix over the input matrix for each time step and computes the dot product between the respective matrix columns. Sliding the matrices yields one alignment per relative offset, and each alignment contributes the values computed for its overlapping time steps. Finally, the encoder concatenates these values to obtain a fixed-dimensional vector encoding for each example pair.
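The sliding step can be sketched in a few lines. This is a simplified illustration, not the encoder used in the experiments: the per-time-step feature vectors are passed in directly instead of being produced by bidirectional LSTMs, and positions without overlap are filled with zeros, which is an assumption made here so that the output has a fixed size. With sequences of length T there are 2T - 1 relative offsets, so this sketch returns (2T - 1) * T values.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_correlation_features(inp_feats: List[Vector], out_feats: List[Vector]) -> List[float]:
    """Slide out_feats over inp_feats and collect per-position dot products.

    Both arguments hold T feature vectors (one per time step). For every
    relative shift, position t of the output is compared with position
    t + shift of the input; non-overlapping positions contribute 0.0.
    """
    T = len(inp_feats)
    if len(out_feats) != T:
        raise ValueError("input and output must be padded to the same length")
    features: List[float] = []
    for shift in range(-(T - 1), T):  # 2T - 1 alignments
        for t in range(T):
            s = t + shift
            features.append(dot(inp_feats[s], out_feats[t]) if 0 <= s < T else 0.0)
    return features


# Toy features: three time steps with one-hot vectors.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vec = cross_correlation_features(identity, identity)
print(len(vec))
```

When the input and output features are identical, only the zero-shift alignment produces non-zero values, which is the kind of signal that indicates a substring copied from the input.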
5.2 Tree-Structured Generation Model
The tree generation model incrementally constructs a program tree, starting from the start symbol of the DSL grammar and expanding the tree one derivation at a time until it obtains a tree whose leaves are all terminal nodes. The R3NN network assigns posterior probabilities to every valid expansion of a partial tree to guide the search algorithm. In other words, given a partial program tree, the R3NN network decides which non-terminal node to expand in the tree and with which expansion rule in the grammar.
In the notation of [Parisotto et al.2016], the R3NN is defined by the following parameters: i) an $M$-dimensional representation $\phi(s) \in \mathbb{R}^M$ for every symbol $s$ in the grammar, ii) an $M$-dimensional representation $\omega(r) \in \mathbb{R}^M$ for each grammar rule $r$, iii) a deep neural network $f_r$ for each grammar rule $r$ that takes as input a vector $x \in \mathbb{R}^{Q \cdot M}$ (where $Q$ is the number of RHS symbols of $r$) and outputs a vector $y \in \mathbb{R}^M$, and iv) a deep neural network $g_r$ (reverse of $f_r$) which takes as input a vector $x' \in \mathbb{R}^M$ and outputs a vector $y' \in \mathbb{R}^{Q \cdot M}$.
Given a partial program tree, R3NN first assigns the representation $\phi(S(l))$ to each leaf node $l$, where $S(l)$ denotes the grammar symbol of node $l$. It then performs a standard recursive pass over the tree from bottom to top, recursively applying $f_{R(n)}$ for every non-leaf node $n$ on its RHS node representations to compute the representation of $n$, where $R(n)$ denotes the rule associated with node $n$. This pass continues until we reach the root node. The root representation encodes information about all tree nodes, but does not encode any notion of the node positions in the tree. To solve this issue, R3NN performs a reverse-recursive pass starting from the root node to compute updated representations of all child nodes using the reverse deep network $g_{R(n)}$. After performing the reverse-recursive pass, each leaf node $l$ is assigned a new distributed representation $\phi'(l)$, which intuitively captures the global information about every other node in the tree.
The scores for each expansion can now be obtained from the global leaf representations $\phi'(l)$. Let $e.r$ be the expansion type (the production rule that applies) and let $e.l$ be the leaf node that $e.r$ is applied to for an expansion $e$. The score of an expansion is calculated as $z_e = \phi'(e.l) \cdot \omega(e.r)$, and the probability of the expansion is obtained by an exponentiated normalized sum over the scores: $\pi(e) = \frac{e^{z_e}}{\sum_{e' \in E} e^{z_{e'}}}$, where $E$ is the set of all valid expansions of the partial tree.
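The two passes and the final scoring can be sketched on a toy grammar. Everything below is a simplified assumption: the four-rule grammar is invented, a single randomly initialized tanh layer stands in for each deep network, no training takes place, and the conditioning on the input-output encoding is omitted. The sketch only shows how bottom-up representations, top-down representations, and dot-product scores fit together.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

M = 4  # representation size (toy value)
rng = random.Random(0)

# Hypothetical toy grammar: rule name -> (LHS symbol, RHS symbols).
RULES: Dict[str, Tuple[str, List[str]]] = {
    "e->f": ("e", ["f"]),
    "e->f,e": ("e", ["f", "e"]),
    "f->inp": ("f", ["inp"]),
    "f->const": ("f", ["const"]),
}
SYMBOLS = ["e", "f", "inp", "const"]


def rand_vec(n: int) -> List[float]:
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def rand_mat(rows: int, cols: int) -> List[List[float]]:
    return [rand_vec(cols) for _ in range(rows)]


def layer(weights: List[List[float]], x: List[float]) -> List[float]:
    """One tanh layer, standing in for a deep network."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


symbol_rep = {s: rand_vec(M) for s in SYMBOLS}  # one vector per grammar symbol
rule_rep = {r: rand_vec(M) for r in RULES}      # one vector per grammar rule
f_net = {r: rand_mat(M, len(rhs) * M) for r, (_, rhs) in RULES.items()}
g_net = {r: rand_mat(len(rhs) * M, M) for r, (_, rhs) in RULES.items()}


@dataclass
class Node:
    symbol: str
    rule: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    up: List[float] = field(default_factory=list)    # bottom-up representation
    down: List[float] = field(default_factory=list)  # global representation


def recursive_pass(node: Node) -> List[float]:
    """Bottom-up pass: combine child representations with the rule network."""
    if not node.children:
        node.up = symbol_rep[node.symbol]
    else:
        stacked = [v for child in node.children for v in recursive_pass(child)]
        node.up = layer(f_net[node.rule], stacked)
    return node.up


def reverse_pass(node: Node, rep: List[float]) -> None:
    """Top-down pass: split the reverse network output among the children."""
    node.down = rep
    if node.children:
        out = layer(g_net[node.rule], rep)
        for i, child in enumerate(node.children):
            reverse_pass(child, out[i * M:(i + 1) * M])


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def expansion_probs(root: Node) -> List[Tuple[Node, str, float]]:
    """Softmax over dot-product scores of (leaf, rule) pairs."""
    reverse_pass(root, recursive_pass(root))
    scored = [
        (leaf, rule, sum(a * b for a, b in zip(leaf.down, rule_rep[rule])))
        for leaf in leaves(root)
        for rule, (lhs, _) in RULES.items()
        if lhs == leaf.symbol
    ]
    if not scored:
        return []
    z_max = max(z for _, _, z in scored)
    weights = [math.exp(z - z_max) for _, _, z in scored]
    total = sum(weights)
    return [(leaf, rule, w / total) for (leaf, rule, _), w in zip(scored, weights)]


# Partial tree: e expanded into (f, e), both children still unexpanded.
tree = Node("e", "e->f,e", [Node("f"), Node("e")])
for leaf, rule, prob in expansion_probs(tree):
    print(leaf.symbol, rule, round(prob, 3))
```

Subtracting the maximum score before exponentiating does not change the resulting probabilities and avoids overflow for large scores.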
6 Experiments
We now present results from two major sets of experiments and analyze the model in more detail with the goal of assessing its expressiveness. We demonstrate that our model is capable of learning to synthesize simple programs when provided with a library of over 100 API functions. We also show that the model is capable of strong generalization, where it can not only generalize across different I/O examples for a given program, but also across new, unseen programs.
6.1 Experimental Setup and Training Details
We use both synthetic benchmarks and real-world FlashFill benchmarks for evaluation. The synthetic benchmarks are obtained by sampling the programs in the DSL uniformly, and then using a rule-based approach to generate corresponding input-output examples. For example, if we sample a program consisting of the GetThirdNum and GetState APIs, the rule-based approach ensures that the input strings in the examples contain at least three numbers and one state string. For each benchmark, we sample five input strings, and the corresponding output strings are obtained by executing the sampled program on the input strings. Several examples of training data are shown in Appendix B.
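The rule-based example generation can be illustrated with the GetThirdNum and GetState case mentioned above. This is a sketch under assumptions: the `REQUIREMENTS` table, the toy state and filler vocabularies, and the shape of the sampled program are hypothetical, and the actual generator is not described in more detail in this paper.

```python
import random
import re
from typing import Dict, List, Tuple

STATES = ["Ohio", "Texas", "Utah"]                       # toy dictionary
FILLER = ["summer", "hook", "discussion", "hobbies"]     # toy noise tokens

# Hypothetical prerequisites: what an input must contain for each API.
REQUIREMENTS: Dict[str, Dict[str, int]] = {
    "GetThirdNum": {"numbers": 3},
    "GetState": {"states": 1},
}


def make_input(apis: List[str], rng: random.Random) -> str:
    """Build one input string that satisfies the prerequisites of all APIs."""
    n_numbers = max((REQUIREMENTS[a].get("numbers", 0) for a in apis), default=0)
    n_states = max((REQUIREMENTS[a].get("states", 0) for a in apis), default=0)
    tokens = [str(rng.randint(0, 999)) for _ in range(n_numbers)]
    tokens += [rng.choice(STATES) for _ in range(n_states)]
    tokens += [rng.choice(FILLER) for _ in range(2)]
    rng.shuffle(tokens)
    return " ".join(tokens)


def get_third_num(s: str) -> str:
    numbers = re.findall(r"\d+", s)
    return numbers[2] if len(numbers) >= 3 else ""


def get_state(s: str) -> str:
    return next((state for state in STATES if state in s), "")


def sampled_program(s: str) -> str:
    # Assumed shape: Concat(GetThirdNum(inp), ConstStr(" "), GetState(inp))
    return get_third_num(s) + " " + get_state(s)


rng = random.Random(0)
inputs = [make_input(["GetThirdNum", "GetState"], rng) for _ in range(5)]
examples: List[Tuple[str, str]] = [(inp, sampled_program(inp)) for inp in inputs]
for inp, out in examples:
    print(repr(inp), "->", repr(out))
```

Generating inputs from the program's prerequisites, and outputs by execution, guarantees that every training pair is consistent with its ground-truth program.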
We first train the R3NN on a DSL consisting of only one family of APIs to evaluate its effectiveness on learning an individual API family. We call the models trained on only the regex APIs (and constant strings) the FF models, and we call the corresponding DSL the regex-only DSL. We then train the R3NN with all APIs to evaluate the effectiveness of learning programs in the DSL consisting of different APIs and their composition with constant strings; we call this DSL the full DSL. The models trained on the full DSL are called the FF++ models. Since the FlashFill benchmarks can be solved using only the regex APIs and the set of constant strings, we also evaluate the FF model on the FlashFill benchmarks.
We train the cross-correlation encoder and R3NN jointly with the principle of maximum likelihood; the model produces posterior probabilities over possible expansions and we backpropagate an error signal based on the ground truth programs. We use the Adam optimizer [Kingma and Ba2014], with an initial learning rate of 0.001 and clipping gradients at 10 for both modules. We found that small learning rates are crucial for R3NN to prevent unstable learning. Every epoch consists of 1000 training batches of 10 instances, where each instance contains a ground truth program and 5 input-output pairs. The evaluation on synthetic data is performed on programs that are not seen during training. We report results when evaluating with both 1-best inference and with stochastic search (10, 50, or 100 samples), where we resample a program conditioned on the same input-output examples multiple times. This way, we allow the model to have small errors in its final posterior probabilities for selecting an expansion.
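As a small worked illustration of the training setup, the sketch below computes the maximum-likelihood loss for one ground-truth derivation and applies gradient clipping with a threshold of 10. Two points are assumptions: the text does not say whether clipping is by norm or by value (norm clipping is shown here), and the probabilities are made-up numbers rather than model outputs.

```python
import math
from typing import List

BATCHES_PER_EPOCH = 1000
INSTANCES_PER_BATCH = 10
PAIRS_PER_INSTANCE = 5

# Each epoch therefore sees 10,000 programs and 50,000 input-output pairs.
INSTANCES_PER_EPOCH = BATCHES_PER_EPOCH * INSTANCES_PER_BATCH
PAIRS_PER_EPOCH = INSTANCES_PER_EPOCH * PAIRS_PER_INSTANCE


def sequence_nll(step_probs: List[float]) -> float:
    """Negative log-likelihood of a derivation.

    step_probs[i] is the probability the model assigned to the ground-truth
    expansion at step i; minimizing the sum of negative logs maximizes the
    likelihood of the whole ground-truth program.
    """
    return -sum(math.log(p) for p in step_probs)


def clip_by_global_norm(grads: List[float], max_norm: float = 10.0) -> List[float]:
    """Rescale the gradient vector if its L2 norm exceeds max_norm (assumed variant)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print(INSTANCES_PER_EPOCH, PAIRS_PER_EPOCH)
print(sequence_nll([0.5, 0.5]))
print(clip_by_global_norm([30.0, 40.0]))
```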
6.2 Learning API Types
Each of the three classes of API functions, while much more interpretable, still poses nontrivial challenges for the model to learn to compose. The lookup API functions contain large dictionaries and the model must learn when to call such APIs given the input-output examples. For example, while the difference between names and cities may seem trivial to human practitioners, the model must learn to disambiguate each of these entities. The transformation API functions pose an additional challenge; with programs that require these types of API calls, not only does the model need to learn some encoding of the hidden dictionary, but the output string may not contain any obvious matching substring in the input string because of the nature of the API function. As a result, a simple string matching algorithm between the inputs and outputs will not work to solve this problem, and the input-output encoder must learn useful representations of pairs of them, and be expressive enough to capture the implicit string transformation. Lastly, the regex API functions do not encode dictionaries but represent syntactic substring operations, and the model must learn to recognize which API functions to call based on which parts of the output are present in the input.
We first present an ablation study of which class of APIs is the easiest to learn in isolation, and which one is the most challenging in the full DSL.
In Table 1, we report the training and validation set accuracies of different models trained on the regex-only DSL (FF model). The length column denotes the maximum length of programs that each model was trained on. The length 7 model was trained with 9000 programs, length 8 with 16000, length 9 with 616510, and length 10 with 1263000 programs. For validation, we select 1000 randomly chosen held-out programs from this set and generate new I/O examples to test the generalization power of the trained model.
Of particular note is the performance on programs of length 10. At this length, the DSL can generate programs with API nesting, API composition, and concatenation with a constant string; this represents all possible constructs in our DSL.
Lookup and Transform APIs
In this experiment, we fix the maximum size of the programs in the training and validation set to size 10 and only include the lookup and transform APIs in the DSL. The results are shown in Table 2. We find that when the DSL is restricted to these APIs, the trained models achieve a very high accuracy and are able to identify composition of APIs with very high precision.
All APIs: Regex + Transform + Lookup
We now evaluate the models trained on the full DSL. Recall that because they are trained on the full DSL, these models are referred to as the FF++ models.
The performance of the FF++ models is shown in Table 3. We observe that both training and validation accuracies decreased as compared to the FF models, which is expected since we now have an increased set of APIs that also include more complex APIs encoding large dictionaries. However, the length 10 model is still able to get 44% accuracy.
We analyze these results further to understand the learnability of different APIs when trained together, as shown in Table 4. The regex APIs seem to be the easiest for the network to learn, which may be attributed to the specific nature of the I/O encoder, as it was designed to detect patterns in substrings between the input and output examples. Interestingly, the lookup APIs are harder to learn than the transformation APIs, which can be attributed to the fact that they encode larger dictionaries than the transformation APIs do.
6.3 FlashFill using API Compositions
We now present the results of the best FF and FF++ models on the FlashFill benchmarks obtained from the authors of FlashFill [Gulwani et al.2012]. These benchmarks correspond to real-world string transformation tasks in Excel, where each benchmark comprises 5 input-output string examples.
6.3.1 FF Models
Baseline performance with uniform search
We first present the results we obtain with a baseline uniform search model on the FlashFill benchmarks in Table 5. The baseline model performs a uniform search over the DSL expansions and is biased towards small programs. We also present stochastic sampling results for a fair comparison with the performance of the FF models.
The uniform search does surprisingly well considering the large space of all possible programs because the DSL we designed with APIs allows many of the benchmarks to be solved with a single call, e.g. GetFirstWord, and the uniform search sampler is biased towards shorter programs.
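The bias toward short programs can be seen in a small simulation. The grammar below is a hypothetical three-API fragment, and expanding each non-terminal with a rule chosen uniformly at random is an assumption about how the baseline samples; it is meant only to show why uniform choices favor short derivations.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR: Dict[str, List[List[str]]] = {
    "e": [["f"], ["f", "e"]],
    "f": [["GetFirstWord(inp)"], ["GetLastWord(inp)"], ["ConstStr"]],
}


def sample_program(rng: random.Random, symbol: str = "e") -> List[str]:
    """Expand symbol by picking each production uniformly at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal
    program: List[str] = []
    for child in rng.choice(GRAMMAR[symbol]):
        program.extend(sample_program(rng, child))
    return program


rng = random.Random(0)
lengths = Counter(len(sample_program(rng)) for _ in range(10000))
for length in sorted(lengths)[:5]:
    print(length, lengths[length])
```

In this toy grammar the derivation stops with probability 1/2 at every step, so a program with k concatenated parts is sampled with probability (1/2)^k, and single-call programs such as GetFirstWord make up about half of all samples.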
FF model performance on FlashFill benchmarks
We now evaluate the trained models whose accuracies on synthetic data are reported in Table 1. Note that unlike in Table 1, each of the models is evaluated on the same dataset, so the results are comparable across rows. In this case, we not only report the results with stochastic sampling, but also report the 1-best programs under the 1 column in Table 6.
In this case, we observe that with 100 samples, the length-10 model is able to solve 45% of the benchmarks. It surpasses the performance of Neural FlashFill [Parisotto et al.2016], which achieves an accuracy of 23% with 100 samples and 34% with 1000 samples. On further inspection of the benchmarks, we find that only a subset of the benchmarks can be solved with programs of length at most 10 in our DSL. Normalized to this solvable subset, the fraction of benchmarks that the model solves is correspondingly higher. This indicates that our model is capable of learning to synthesize realistic programs.
6.3.2 FF++ Model
Baseline performance with uniform search
We first present the baseline results of uniform search. Since the DSL has expanded, the uniform search performs slightly worse and can only achieve an accuracy of about 11% with 100 samples.
FlashFill benchmark performance
The results for evaluating the FF++ model on the FlashFill benchmarks are shown in Table 8. The length 10 models can still solve 37% of the benchmarks even with the extended DSL.
7 Related Work
We describe the related work from the domains of VSA-based programming by example systems and neural program induction and synthesis systems.
Programming By Example for String Manipulations
There has been much recent work on designing version space algebra-based PBE systems for performing data transformation and extraction. FlashFill [Gulwani2011, Gulwani et al.2012] is a PBE system that performs regular expression based string transformations using examples. Given an input-output example string, FlashFill first searches over all possible ways to decompose the output string and represents the set of those sub-programs concisely using a DAG data structure. This VSA-based approach has since been extended to build PBE systems for number transformations [Singh and Gulwani2012b], table joins [Singh and Gulwani2012a], data extraction [Le and Gulwani2014], and data reshaping [Barowy et al.2015]. While these methods are interpretable and tractable, they do not scale to additions of new functionality. Dapip, unlike the VSA-based PBE systems, is trained automatically using the R3NN network by sampling several thousands of programs from arbitrary DSLs.
Neural Program Induction and Synthesis
There has been a plethora of recent work in both neural program induction and neural program synthesis. The goal in neural program induction is to teach neural networks the functional behavior of a program by augmenting the neural networks with additional computational modules such as Neural GPU [Kaiser and Sutskever2015], Neural Turing Machines [Graves et al.2014], and stack-augmented RNNs [Joulin and Mikolov2015]. One limitation of these architectures is that although they are able to learn the functional behavior, they do not expose an interpretable program back to the user. In addition, they need to be trained separately for each task, which reflects a lack of strong generality. More recent work, such as Terpret [Gaunt et al.2016] and Neural-RAM [Kurach et al.2015], seeks to mitigate the interpretability issue, but these systems need to be trained for each individual benchmark problem, which is prohibitively expensive.
A recent approach was proposed to use the R3NN-based neural architectures to synthesize programs in a DSL similar to that of FlashFill [Parisotto et al.2016]. We employ the same architecture but in a different DSL consisting of APIs at the core level of expressions. The APIs allow the program depth to be shallower than programs in a DSL with more primitives, and we investigate whether that can make the task of automatically learning a search strategy easier for the R3NN. We argue that imposing higher-order functions is much more extensible and more akin to human-like programming.
8 Future Work
There are a number of ways that we can extend the results and techniques presented in this paper to yield both improvements in the current numbers as well as allow us to scale to larger programs.
8.1 Function embeddings
We rely on the R3NN and the input-output encoder to implicitly encode the semantics of each function, and we have shown through a number of experiments that the tree model is capable of doing so. This is impressive in its own right, but in order to improve the performance further, we should extend the model to support explicit, continuous representations of each function. This can be achieved in a number of ways, the simplest of which involves encoding each function as a randomly initialized vector and allowing the model to attend to API functions that may be relevant to the input-output examples. We can freeze the embeddings, or we can elect to backpropagate errors through both the attention mechanism and the embeddings, and jointly learn these representations. This represents a principled approach to adding new functions, and the method is easy to extend to additional API functions that we may choose to add.
8.2 Divide and conquer
Function embeddings allow us to perform better on existing problems by giving the model more information as to what choices to make when generating the tree. However, this does not resolve the issue of scalability. Even with function embeddings, as the inputs and outputs grow in size and complexity, we have no scalable method of performing inference over which programs to synthesize. However, instead of viewing the problem as a whole, we can break up the problem into smaller pieces and try to solve each subpiece and concatenate the answers together. This divide-and-conquer approach allows us to treat larger problems as conglomerations of a number of smaller problems. This procedure requires two general mechanisms: one module will need to predict how to split the output string into smaller, meaningful chunks, and the second module will consume each input-output piece, synthesize the correct program, and each piece will eventually be concatenated together. This is especially convenient in this problem setting because the FlashFill language is one that is focused on concatenations, so we lose no generality in being able to solve the problem.
8.3 Extending the DSL
An interesting extension of the DSL is to add multi-argument API function calls. This could yield more general API functions, such as GetNthObj(n, o), and could replace functions like GetFirstWord and GetSecondNumber. In addition, we can also add multi-argument Concat functions; this idea goes neatly with the divide-and-conquer approach and can be used to help scale the model to synthesize larger programs.
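A multi-argument API of this kind could look like the sketch below. The signature (with the input string as an explicit first argument), the object kinds "word" and "number", and their regular expressions are all assumptions for illustration; the paper names only GetNthObj(n, o) as a possible generalization.

```python
import re
from typing import Dict

# Hypothetical object kinds and the patterns that recognize them.
PATTERNS: Dict[str, str] = {"word": r"[A-Za-z]+", "number": r"\d+"}


def get_nth_obj(s: str, n: int, o: str) -> str:
    """Return the n-th (1-based) object of kind o in s, or "" if absent."""
    matches = re.findall(PATTERNS[o], s)
    return matches[n - 1] if 1 <= n <= len(matches) else ""


def get_first_word(s: str) -> str:
    return get_nth_obj(s, 1, "word")


def get_second_number(s: str) -> str:
    return get_nth_obj(s, 2, "number")


print(get_first_word("22 NE Street, Redmond 98052"))
print(get_second_number("22 NE Street, Redmond 98052"))
```

A single parameterized function of this form would cover a whole family of the fixed regex APIs, at the cost of requiring the model to predict the arguments as well as the function.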
8.4 Batching Trees
While the divide-and-conquer approach is an algorithmic improvement to speed up the process of training the model, we can also take advantage of the model to incorporate faster batching protocols. Using a tree-based generative model allows us to batch operations together that occur at the same depth in the tree because each operation is independent of all of its siblings. Moreover, we can also batch multiple trees together for increased performance.
9 Conclusion
In this paper, we presented Dapip, a system that tries to automatically learn a synthesis algorithm given a DSL. In particular, we designed a DSL consisting of APIs as first class constructs that allows the system to perform richer tasks using small sized programs. We used the recently introduced R3NN neural architecture to automatically learn a synthesis algorithm for our DSL. The preliminary results suggest that the system is able to efficiently learn programs up to size 10 with about 45% accuracy on real-world benchmarks. We believe this direction of using neural architectures to automatically develop synthesis algorithms for PBE systems can lead to big advancements in program synthesis techniques and make it more generally applicable to many new domains.
- [Alur et al.2013] Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo MK Martin, Mukund Raghothaman, Sanjit A Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design (FMCAD), 2013, pages 1–8. IEEE, 2013.
- [Barowy et al.2015] Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin G. Zorn. Flashrelate: extracting relational data from semi-structured spreadsheets using examples. In PLDI, pages 218–228, 2015.
- [Dasu and Johnson2003] Tamraparni Dasu and Theodore Johnson. Exploratory data mining and data cleaning, volume 479. John Wiley & Sons, 2003.
- [Gaunt et al.2016] Alexander L Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. Terpret: A probabilistic programming language for program induction. arXiv preprint arXiv:1608.04428, 2016.
- [Graves and Schmidhuber2005] Alex Graves and Jürgen Schmidhuber. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks, 18(5):602–610, 2005.
- [Graves et al.2014] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
- [Green1969] Cordell Green. Application of theorem proving to problem solving. In Proceedings of the 1st International Joint Conference on Artificial Intelligence, IJCAI’69, pages 219–239, 1969.
- [Gulwani et al.2012] Sumit Gulwani, William R. Harris, and Rishabh Singh. Spreadsheet data manipulation using examples. Commun. ACM, 55(8):97–105, 2012.
- [Gulwani2011] Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In POPL, pages 317–330, 2011.
- [Hochreiter and Schmidhuber1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- [Joulin and Mikolov2015] Armand Joulin and Tomas Mikolov. Inferring algorithmic patterns with stack-augmented recurrent nets. In Advances in Neural Information Processing Systems, pages 190–198, 2015.
- [Kaiser and Sutskever2015] Łukasz Kaiser and Ilya Sutskever. Neural gpus learn algorithms. arXiv preprint arXiv:1511.08228, 2015.
- [Kandel et al.2011] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. Wrangler: interactive visual specification of data transformation scripts. In Proceedings of the International Conference on Human Factors in Computing Systems, CHI 2011, Vancouver, BC, Canada, May 7-12, 2011, pages 3363–3372, 2011.
- [Kingma and Ba2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [Kurach et al.2015] Karol Kurach, Marcin Andrychowicz, and Ilya Sutskever. Neural random-access machines. arXiv preprint arXiv:1511.06392, 2015.
- [Lau et al.2003] Tessa A. Lau, Steven A. Wolfman, Pedro M. Domingos, and Daniel S. Weld. Programming by demonstration using version space algebra. Machine Learning, 53(1-2):111–156, 2003.
- [Le and Gulwani2014] Vu Le and Sumit Gulwani. FlashExtract: a framework for data extraction by examples. In PLDI, page 55, 2014.
- [Parisotto et al.2016] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. arXiv preprint arXiv:1611.01855, 2016.
- [Polozov and Gulwani2015] Oleksandr Polozov and Sumit Gulwani. FlashMeta: a framework for inductive program synthesis. In OOPSLA, pages 107–126, 2015.
- [Singh and Gulwani2012a] Rishabh Singh and Sumit Gulwani. Learning semantic string transformations from examples. PVLDB, 5(8):740–751, 2012.
- [Singh and Gulwani2012b] Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In CAV, pages 634–651, 2012.
- [Singh2016] Rishabh Singh. BlinkFill: Semi-supervised programming by example for syntactic string transformations. PVLDB, 9(10):816–827, 2016.
Appendix A The Complete Set of APIs
|LookUp (18)||Transform (13)||Regex (104)|
Appendix B Samples of Training Data
|DAPIP Prediction: (Concat (ConstStr CONST10) (GetStreetName (arg inp)))|
|} summer Impulse St. Pellerin||Mr.Impulse St.||Mr.Impulse St.|
|Hensley Bag St. HI Rinaldo Nolan @||Mr.Bag St.||Mr.Bag St.|
|hook Gertha % Plate St. hobbies MT||Mr.Plate St.||Mr.Plate St.|
|discussion Mcfarlin . Straw St.||Mr.Straw St.||Mr.Straw St.|
|hobbies Anger St. Twitty Downing ?||Mr.Anger St.||Mr.Anger St.|
|DAPIP Prediction: (Concat (ConstStr CONST36) (GetStateName (arg inp)))|
|MA , North Carolina Zehr Gilma||North Carolina||North Carolina|
|Utah Evelia % Nancy||Utah||Utah|
|Josh skin . Missouri Agudelo||Missouri||Missouri|
|yarn drawer ‘ Indiana||Indiana||Indiana|
|Sandidge ) key Indiana||Indiana||Indiana|
|DAPIP Prediction: (Concat (GetStateAbbrFromState (arg inp)) (ConstStr CONST25))|
|Elza Foot Locker Illinois @bo.com Mollett||IL *||IL *|
|$ can Sound St. mist Nevada||NV *||NV *|
|Harpin Utah . Reali RI Laurinda Borden||UT *||UT *|
|) Connecticut Belt Mortimer||CT *||CT *|
|Danita Tennessee throat||TN *||TN *|
|DAPIP Prediction: (GetSecondToLastWS (arg (GetCEO (arg inp))))|
|Eldora John Thain Marotta||John||John|
|Marya clover Sundar Pichai||Sundar||Sundar|
|327 drawer Gregory Wasson Kristian||Gregory||Gregory|
|! AOL Inc. Rinaldo quicksand James Gorman||James||James|
|Richard Johnson Barbie Gasaway||Richard||Richard|
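To make the prediction format in these samples concrete, the following is a minimal sketch of how such a nested API program could be parsed and evaluated on an input string. It is an illustration only, not the system's implementation: the state table `STATE_ABBR`, the value assigned to `CONST25`, and the token-based body of `get_state_abbr_from_state` are assumptions inferred from the third sample table above, and multi-word state names are not handled.

```python
import re

# Hypothetical lookup table; a real LookUp API would use a complete dictionary.
STATE_ABBR = {"Illinois": "IL", "Nevada": "NV", "Utah": "UT",
              "Connecticut": "CT", "Tennessee": "TN"}

# Assumed value of the constant token, inferred from the sample outputs.
CONSTANTS = {"CONST25": " *"}


def get_state_abbr_from_state(text):
    """Return the abbreviation of the first full state name found in text."""
    for token in text.split():
        if token in STATE_ABBR:
            return STATE_ABBR[token]
    return ""


# Registry of single-argument APIs (only one is sketched here).
APIS = {"GetStateAbbrFromState": get_state_abbr_from_state}


def tokenize(program):
    """Split a program string into parentheses and symbols."""
    return re.findall(r"[()]|[^\s()]+", program)


def parse(tokens):
    """Parse a token list into nested Python lists (consumes the tokens)."""
    token = tokens.pop(0)
    if token != "(":
        return token
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node


def evaluate(node, inp):
    """Evaluate a parsed program on the input string inp."""
    head = node[0]
    if head == "Concat":
        return "".join(evaluate(child, inp) for child in node[1:])
    if head == "ConstStr":
        return CONSTANTS[node[1]]
    if head == "arg":
        # (arg inp) is the raw input; (arg (<API> ...)) is a nested call.
        return inp if isinstance(node[1], str) else evaluate(node[1], inp)
    return APIS[head](evaluate(node[1], inp))


def run(program, inp):
    return evaluate(parse(tokenize(program)), inp)


PROGRAM = "(Concat (GetStateAbbrFromState (arg inp)) (ConstStr CONST25))"
print(run(PROGRAM, "Elza Foot Locker Illinois @bo.com Mollett"))  # IL *
```

Under these assumptions, supporting a further API amounts to registering one more function in `APIS`; the parser and evaluator stay unchanged.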
Appendix C Samples of Solved FlashFill Benchmarks
|FlashFill Program: (SubStr (RegPos (RegexStr (ConstStr "0-0")) (k 1) (dir End)) (RegPos (RegexStr REGEX4) (k 4) (dir End)))|
|DAPIP Prediction: (TrimLeadingZeros (arg (GetFirstDashToSecondDash (arg Inp))))|
|FlashFill Program: (Concat (SubStr (RegPos (RegexStr REGEX8) (k 1) (dir End)) (RegPos (RegexStr REGEX1) (k 1) (dir End))) (ConstStr "@"))|
|DAPIP Prediction: (Concat (ToLowercase (arg (GetFirstWord (arg Inp)))) (ConstStr CONST13))|
|FlashFill Program: (Concat (SubStr (RegPos (RegexStr REGEX8) (k 1) (dir End)) (RegPos (RegexStr REGEX4) (k 1) (dir End))) (ConstStr "]"))|
|DAPIP Prediction: (Concat (GetStartToEndOfFirstNumber (arg (ToUppercase (arg Inp)))) (ConstStr CONST12))|
|FlashFill Program: (SubStr (RegPos (RegexStr REGEX8) (k 1) (dir End)) (RegPos (RegexStr REGEX4) (k 1) (dir End)))|
|DAPIP Prediction: (GetLastNumber (arg (TrimSpaces (arg (GetFirstAlpha (arg Inp))))))|
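As an illustration of why the API-level predictions are shorter than the corresponding FlashFill programs, the first DAPIP prediction in this appendix, `(TrimLeadingZeros (arg (GetFirstDashToSecondDash (arg Inp))))`, can be read as a composition of two small string functions. The sketch below assumes the semantics suggested by the API names; the function bodies and the sample input are hypothetical and are not taken from the benchmark.

```python
def get_first_dash_to_second_dash(text):
    """Return the substring between the first and second '-' (assumed semantics)."""
    parts = text.split("-")
    # At least two dashes are needed for the span to exist.
    return parts[1] if len(parts) >= 3 else ""


def trim_leading_zeros(text):
    """Remove leading '0' characters (assumed semantics)."""
    return text.lstrip("0")


# Hypothetical input, chosen only to exercise both functions.
print(trim_leading_zeros(get_first_dash_to_second_dash("425-0078-9911")))  # 78
```

Each API hides the position arithmetic that the FlashFill program spells out with `RegPos` and `SubStr`.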
Appendix D Samples of Unsolved FlashFill Benchmarks
|FlashFill Program: (Concat (Concat (Concat (Concat (Concat (Concat (Concat (Concat (Concat (Concat (SubStr (RegPos (RegexStr REGEX8) (k 1) (dir End)) (RegPos (RegexStr REGEX1) (k 1) (dir End))) (ConstStr ",")) (SubStr (RegPos (RegexStr REGEX7) (k 1) (dir End)) (RegPos (RegexStr REGEX1) (k 2) (dir End)))) (ConstStr ",")) (SubStr (RegPos (RegexStr REGEX7) (k 2) (dir End)) (RegPos (RegexStr REGEX1) (k 3) (dir End)))) (ConstStr ",")) (SubStr (RegPos (RegexStr REGEX7) (k 3) (dir End)) (RegPos (RegexStr REGEX1) (k 4) (dir End)))) (ConstStr ".")) (ConstStr "and")) (ConstStr ".")) (SubStr (RegPos (RegexStr REGEX7) (k 4) (dir End)) (RegPos (RegexStr REGEX10) (k 1) (dir End))))|
|Tom Mickey Minnie Donald Daffy||Tom,Mickey,Minnie,Donald.and.Daffy|
|Ben Bill Jerry Meyer Rahul||Ben,Bill,Jerry,Meyer.and.Rahul|
|Shahrukh Aamir Salman Amitabh Ajay||Shahrukh,Aamir,Salman,Amitabh.and.Ajay|
|Kobe Lebron Dwayne Chris Kevin||Kobe,Lebron,Dwayne,Chris.and.Kevin|
|Earth Fire Wind Water Sun||Earth,Fire,Wind,Water.and.Sun|
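For reference, the transformation that this benchmark asks for can be written directly in a few lines. The sketch below is not a DAPIP or FlashFill program; it only restates the input-output behaviour of the five examples above, mirroring the eleven concatenated pieces of the FlashFill program (four words, three commas, the constants ".", "and", ".", and the last word). The function name `join_with_and` is hypothetical.

```python
def join_with_and(text):
    """Join all words but the last with ',' and attach the last with '.and.'."""
    words = text.split()
    return ",".join(words[:-1]) + ".and." + words[-1]


print(join_with_and("Tom Mickey Minnie Donald Daffy"))
# Tom,Mickey,Minnie,Donald.and.Daffy
```

Expressing the same behaviour in the DSL requires ten nested `Concat` operations, which suggests why long programs of this kind are hard to synthesize.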
|FlashFill Program: (Concat (Concat (Concat (SubStr (RegPos (RegexStr REGEX8) (k 1) (dir End)) (RegPos (RegexStr REGEX4) (k 1) (dir End))) (SubStr (RegPos (RegexStr (ConstStr "1-")) (k 1) (dir End)) (RegPos (RegexStr REGEX4) (k 2) (dir End)))) (SubStr (RegPos (RegexStr (ConstStr "-")) (k 2) (dir End)) (RegPos (RegexStr REGEX4) (k 3) (dir End)))) (SubStr (RegPos (RegexStr (ConstStr "-")) (k 3) (dir End)) (RegPos (RegexStr REGEX4) (k 4) (dir End))))|