Programming is a frustrating process: because the computer executes your code literally, any error in communicating how the computer should run results in a bug. Program synthesis Solar Lezama (2008) aims to address this problem by allowing the user to specify what the program should do; given this specification, a program synthesizer infers a program that satisfies it. One of the most well-known program synthesizers is FlashFill Gulwani (2011), which synthesizes string transformations from input/output examples. For instance, given the example “Gordon Freeman” → “G”, the FlashFill system infers the program “first_letter(first_word(input))”. FlashFill works inside Microsoft Excel, and this program can then run on the rest of the spreadsheet, saving time for end-users. However, most specifications, especially those provided by a naive end-user, leave the synthesis problem ill-posed, as there may be many programs that satisfy the specification. Here we introduce a new paradigm for resolving this ambiguity. We think of program synthesis as a kind of communication between the user and the synthesizer. Framed as communication, we can deploy ideas from computational linguistics, namely pragmatics, the study of how informative speakers select their utterances, and how astute listeners infer intent from these “pragmatic” utterances Grice (1975). Intuitively, a pragmatic program synthesizer goes beyond the literal meaning of the specification, and asks why an informative user would select that specification.
Resolving the ambiguity inherent in program synthesis has received much attention. Broadly, prior work imposes some form of inductive bias over the space of programs. In a program synthesizer without any built-in inductive bias Solar Lezama (2008), given a specification $D$, the synthesizer might return any program consistent with $D$. Interacting with such a synthesizer runs the risk of getting an unintuitive program that is only “technically correct”. For instance, given the example “Richard Feynman” → “Mr Feynman”, the synthesizer might output a program that prints “Mr Feynman” verbatim on all inputs. Systems such as Singh and Gulwani (2015) introduce a notion of syntactic naturalness in the form of a prior over the set of programs: $P(\mathit{prog} \mid D) \propto \mathbb{1}[\mathit{prog} \vdash D]\, P_\theta(\mathit{prog})$, where $\mathit{prog} \vdash D$ means $\mathit{prog}$ is consistent with spec $D$, and $P_\theta(\mathit{prog})$ is a prior with parameters $\theta$. For instance, $P_\theta$ might disprefer constant strings. However, purely syntactic priors can be insufficient: the FlashFill-like system in Polozov and Gulwani (2015) penalizes constant strings, making its synthesizer explain the “r” in “Mr Feynman” with the “r” from “Richard”; when the program synthesized from “Richard Feynman” → “Mr Feynman” executes on “Stephen Wolfram”, it outputs “Ms Wolfram.” This failure in part motivated the work in Ellis and Gulwani (2017), which addresses failures such as these via handcrafted features. In this work we take a step back and ask: what are the general principles of communication from which these patterns of inductive reasoning could emerge?
We will present a qualitatively different inductive bias, drawing insights from probabilistic recursive reasoning models of pragmatics Frank and Goodman (2012). Confronted with a set of programs all satisfying the specification, the synthesizer asks the question, “why would a pragmatic speaker use this particular specification to communicate that program?” Mathematically our model works as follows. First, we model a synthesizer without any inductive bias as a literal listener $L_0$: $P_{L_0}(\mathit{prog} \mid D) \propto \mathbb{1}[\mathit{prog} \vdash D]$. Second, we model a pragmatic speaker, which is a conditional distribution over specifications, $S_1$: $P_{S_1}(D \mid \mathit{prog}) \propto P_{L_0}(\mathit{prog} \mid D)$. This “speaker” generates a specification $D$ in proportion to the probability that $L_0$ would recover the program $\mathit{prog}$ given $D$. Last, we obtain the pragmatic listener, $L_1$: $P_{L_1}(\mathit{prog} \mid D) \propto P_{S_1}(D \mid \mathit{prog})$, which is the synthesizer with the desirable inductive bias. It is worth noting that the inductive biases present in $L_1$ are derived from first principles of communication and the synthesis task, rather than trained on actual data of end-user interactions.
Algorithmically, computing these probabilities is challenging because they are given as unnormalized proportionalities. Specifically, $L_0$ requires summing over the set of consistent programs given $D$, and $S_1$ requires summing over the set of all possible specifications given $\mathit{prog}$. To this end, rather than tackling the difficult problem of searching for a correct program given a specification, a challenging research field in its own right Feser et al. (2015); Ellis et al. (2019); Nye et al. (2019); Polosukhin and Skidanov (2018); Zohar and Wolf (2018); Chen et al. (2018); Bunel et al. (2016); Balog et al. (2016); Kalyan et al. (2018), we work over a domain small enough that the search problem can be efficiently solved with a simple version space algebra Lau et al. (2003). We develop an efficient inference algorithm to compute these probabilities exactly, and then build a functioning program synthesizer with these inference algorithms. In a user study conducted on Amazon Mechanical Turk, we find that naive end-users communicate more efficiently with a pragmatic program synthesizer than with its literal variant. Concretely, this work makes the following contributions:
- a systematic formulation of recursive pragmatics within program synthesis
- an efficient implementation of an incremental pragmatic model via version space algebra
- a user study demonstrating that end-users communicate their intended program more efficiently with pragmatic synthesizers
2 Program Synthesis as a Reference Game
We now formally connect program synthesis with pragmatic communication. We describe reference games, a class of cooperative 2-player games from the linguistic literature. We then cast program synthesis as an instance of a reference game played between a human speaker and a machine listener.
2.1 Program Synthesis
In program synthesis, one would like to obtain a program without explicitly coding it. Instead, the user describes desirable properties of the program as a specification, which often takes the form of a set of examples. Given these examples, the synthesizer searches for a program that satisfies them. In an interactive setting Cohn-Gordon et al. (2018), rather than giving these examples all at once, the user gives the examples in rounds, based on the synthesizer’s feedback each round.
2.2 Reference Game
In a reference game, a speaker-listener pair $(S, L)$ cooperatively communicate a concept $h \in \mathcal{H}$ using some atomic utterances $u \in \mathcal{U}$. Given a concept $h$, the speaker chooses a set of utterances $D \subseteq \mathcal{U}$ to describe the concept. The communication is successful if the original concept is recovered by the listener, i.e. $L(S(h)) = h$. The communication is efficient if $|D|$ is small. Therefore, it should be unsurprising that, given a reference game, a human speaker-listener pair would act pragmatically Grice (1975): the speaker chooses didactic utterances that are most descriptive yet parsimonious to describe the concept, and the listener is aware that the speaker is being didactic while recovering the intended concept.
2.3 Program Synthesis as a Reference Game
It is easy to see why program synthesis is an instance of a reference game: the user would like to obtain a “concept” in the form of a “program”, and does so by using “utterances” in the form of “examples”. See Figure 1. This formulation can explain in part the frustration of using a traditional synthesizer, or machines in general: while the user naturally assumes pragmatic communication and selects the examples didactically, the machine/synthesizer is not pragmatic, letting the carefully selected examples fall on deaf ears.
2.4 Reaching Consensus in Human-Machine Communication
Two strangers who speak different languages would not perform as well in a reference game as two close friends. Clearly, there needs to be a protocol shared between the speaker and the listener for effective communication to occur. Approaches such as Mao et al. (2016); Kazemzadeh et al. (2014) use a corpus of human annotated data so that the machine can imitate the protocols of human communication directly. Works such as Andreas and Klein (2016); Monroe et al. (2017) leverage both annotated data and pragmatic inference to achieve successful human-machine communication over natural language. This work shows that, in the context of program synthesis by examples, by building the concept of pragmatic communication into the synthesizer, the user can quickly adapt to communicate with the synthesizer effectively via human learning (which is far more powerful than machine learning). This is advantageous because annotated user data is expensive to obtain. In this regard, our work is most similar to SHRDLURN Wang et al. (2016), where a pragmatic semantic parser was able to translate natural language utterances into a desirable program without being trained first on human annotated data.
3 Communicating Concepts with Pragmatics
We now describe how to operationalize pragmatics using a small, program-like reference game, where by-hand calculation is feasible. This exposition adapts formalism from Cohn-Gordon et al. (2018) for efficient implementation within program synthesizers.
Consider the following game. There are ten different concepts $h \in \mathcal{H}$ and eight atomic examples $u \in \mathcal{U}$. Each concept is a contiguous line segment on a horizontal grid of 4 cells, and each atomic example indicates whether a particular cell is occupied by the segment. One can view this example as an instance of predicate synthesis, where the program takes the form of a predicate function $h$, and the atomic examples are input-output pairs obtained by applying the predicate function to some input: i.e. $u = (x, h(x))$. We can visualise the game with a meaning matrix (Figure 2), where each entry $(u, h)$ denotes whether $h \vdash u$ ($h$ is consistent with $u$). Given a set of examples $D$, we say $h \vdash D$ if $\forall u \in D,\ h \vdash u$.
If a human speaker uses a set of two examples $D$, what is the most likely concept being communicated? We should expect it to be the segment whose end-points are marked by the two examples, even though other concepts are also consistent with $D$. We now demonstrate an incremental pragmatic model that can capture this behaviour with recursive Bayesian inference.
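To make the game concrete, the following is a minimal sketch of its meaning matrix. The encoding is an assumption made for illustration: concepts are written as inclusive `(start, end)` cell ranges and atomic examples as `(cell, occupied)` pairs, which need not match the indexing used in Figure 2.

```python
N_CELLS = 4

# Concepts: every contiguous segment [start, end] (inclusive) on the grid.
concepts = [(s, e) for s in range(N_CELLS) for e in range(s, N_CELLS)]

# Atomic examples: (cell, occupied) pairs, one positive and one negative per cell.
examples = [(c, occ) for c in range(N_CELLS) for occ in (True, False)]


def consistent(concept, example):
    """Return True if the segment agrees with the example about that cell."""
    (start, end), (cell, occupied) = concept, example
    return (start <= cell <= end) == occupied


# meaning[h][u] == 1 iff concept h is consistent with atomic example u.
meaning = {h: {u: int(consistent(h, u)) for u in examples} for h in concepts}

if __name__ == "__main__":
    for h in concepts:
        print(h, [meaning[h][u] for u in examples])
```

This encoding yields ten concepts and eight atomic examples, matching the counts in the text; every concept is consistent with exactly one of the two examples for each cell.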
3.1 Communication with Incremental Pragmatics
The recursive pragmatic model derives a probabilistic speaker and listener pair given a meaning matrix, and the resulting communication protocol is shown to be both efficient and human usable Yuan et al. (2018). Clearly, there are other ways to derive a speaker-listener pair that are highly efficient, for instance, training a pair of agents in an RL setting Lewis et al. (2017). However, agents trained this way tend to deviate from how a human would communicate, essentially coming up with a highly efficient yet obfuscated communication protocol that is usable by the agents alone.
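Literal Listener $L_0$.
We start by building the literal listener $L_0$ from the meaning matrix. Upon receiving a set of examples $D$, $L_0$ samples uniformly from the set of consistent concepts:
$$P_{L_0}(h \mid D) \propto \mathbb{1}(h \vdash D), \qquad P_{L_0}(h \mid D) = \frac{\mathbb{1}(h \vdash D)}{\sum_{h' \in \mathcal{H}} \mathbb{1}(h' \vdash D)} \tag{1}$$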
Applying to our example in Figure 2, we see that .
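As an illustration of the literal listener, here is a small sketch on the line-segment game. The `(start, end)` and `(cell, occupied)` encodings and the particular example set `D` are assumptions chosen for this sketch, not the indexing of Figure 2.

```python
from fractions import Fraction

N_CELLS = 4
CONCEPTS = [(s, e) for s in range(N_CELLS) for e in range(s, N_CELLS)]


def consistent(h, u):
    """True if segment h agrees with atomic example u = (cell, occupied)."""
    (start, end), (cell, occupied) = h, u
    return (start <= cell <= end) == occupied


def literal_listener(D):
    """P_L0(h | D): uniform over the concepts consistent with every example in D."""
    survivors = [h for h in CONCEPTS if all(consistent(h, u) for u in D)]
    if not survivors:
        return {}  # contradictory examples leave no consistent concept
    p = Fraction(1, len(survivors))
    return {h: p for h in survivors}


# Hypothetical specification: cells 1 and 2 are both occupied.
D = [(1, True), (2, True)]
dist = literal_listener(D)

if __name__ == "__main__":
    for h, p in dist.items():
        print(h, p)
```

With this choice of `D`, four segments remain consistent and the literal listener cannot prefer any of them, which is the ambiguity the pragmatic model is meant to resolve.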
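Incrementally Pragmatic Speaker $S_1$.
We now build a pragmatic speaker $S_1$ recursively from $L_0$. Here, rather than treating $D$ as an unordered set, we view it as an ordered sequence of examples, and $S_1$ models the speaker’s generation of $D$ incrementally, similar to autoregressive sequence generation in language modeling Sundermeyer et al. (2012). Let $D = u_1 \dots u_k$, then:
$$P_{S_1}(D \mid h) = P_{S_1}(u_1 \dots u_k \mid h) = \prod_{i=1}^{k} P_{S_1}(u_i \mid h, u_1 \dots u_{i-1}) \tag{2}$$
where the incremental probability $P_{S_1}(u_i \mid h, u_1 \dots u_{i-1})$ is defined recursively with $L_0$:
$$P_{S_1}(u_i \mid h, u_1 \dots u_{i-1}) = \frac{P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i)}{\sum_{u_i'} P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i')} \tag{3}$$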
Applying this reasoning to our example in Figure 2, we see that is:
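Informative Listener $L_1$.
Finally, we construct an informative listener $L_1$ which recursively reasons about the informative speaker $S_1$:
$$P_{L_1}(h \mid D) = \frac{P_{S_1}(D \mid h)}{\sum_{h'} P_{S_1}(D \mid h')} \tag{5}$$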
In our example, ,,,. As we can see, the intended concept is ranked first, in contrast to the uninformative listener .
4 Efficient Computation of Incremental Pragmatics for Synthesis
We now describe an efficient computation of incremental pragmatics tailored to program synthesis. Our approach is tractable when the meaning matrix can be tractably enumerated: i.e. the product space of hypotheses and atomic examples, $\mathcal{H} \times \mathcal{U}$, is not too large, and we describe an algorithm whose worst-case running time is polynomial in the size of this space. (When the meaning matrix is sparse, as is typical, it is faster.) State-of-the-art program synthesizers consider combinatorially large hypothesis spaces, and while our algorithm cannot yet scale to this regime, we believe the computational principles elucidated here could pave the way for pragmatic synthesizers over combinatorially large program spaces, particularly when this combinatorial space is manipulated with version space algebras, as in Polozov and Gulwani (2015); Gulwani (2011); Lau et al. (2003). To this end, we employ version space algebra with aggressive precomputation to memoize the cost of pragmatic inference.
We start by redefining some terms of pragmatics into the language of program synthesis. Let $h$ be a program and $\mathcal{H}$ be the set of programs. Let $X$ be the domain of the program and $Y$ be the range of the program: $h: X \to Y$. An example $u$ is a pair $u = (x, y) \in X \times Y$. A program is consistent with an example, $h \vdash u$, if $u = (x, y)$ and $h(x) = y$.
We use a simple form of version space algebra Lau et al. (2003) to precompute and cache two kinds of mappings. First, we iterate over the rows of the meaning matrix and store, for each atomic example $u$, the set of programs that are consistent with it: $\mathcal{M}_L[u] = \{h \mid h \vdash u\}$. Here $\mathcal{M}_L$ is a map or a dictionary data structure, which can be thought of as an atomic listener, that returns a set of consistent programs for every atomic example. Second, we iterate over the columns of the meaning matrix and store, for each program $h$, the set of atomic examples that are consistent with it: $\mathcal{M}_S[h] = \{u \mid h \vdash u\}$. $\mathcal{M}_S$ can be thought of as an atomic speaker, that returns a set of usable atomic examples for every program.
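To compute $P_{L_0}(h \mid D)$, we first compute the set intersection $\bigcap_{u \in D} \mathcal{M}_L[u]$, where $\mathcal{M}_L[u]$ is the cached set of programs consistent with the atomic example $u$; this intersection corresponds to the set of programs consistent under $D$. Note that its size equals the normalisation constant of Eq. 1: $\sum_{h'} \mathbb{1}(h' \vdash D) = \left|\bigcap_{u \in D} \mathcal{M}_L[u]\right|$. Therefore, from Eq. 1 we derive $P_{L_0}(h \mid D) = 1 / \left|\bigcap_{u \in D} \mathcal{M}_L[u]\right|$ if $h \in \bigcap_{u \in D} \mathcal{M}_L[u]$, and $0$ otherwise.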
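The two cached mappings and the intersection-based literal listener can be sketched as follows. This is a minimal illustration on the line-segment game rather than the released implementation; the names `M_L`, `M_S`, and `consistent_set` are hypothetical, and plain Python `frozenset` objects stand in for a version space data structure.

```python
from fractions import Fraction

N_CELLS = 4
PROGRAMS = [(s, e) for s in range(N_CELLS) for e in range(s, N_CELLS)]
EXAMPLES = [(c, occ) for c in range(N_CELLS) for occ in (True, False)]


def consistent(h, u):
    (start, end), (cell, occupied) = h, u
    return (start <= cell <= end) == occupied


# Atomic listener: atomic example -> set of programs consistent with it.
M_L = {u: frozenset(h for h in PROGRAMS if consistent(h, u)) for u in EXAMPLES}

# Atomic speaker: program -> set of atomic examples consistent with it.
M_S = {h: frozenset(u for u in EXAMPLES if consistent(h, u)) for h in PROGRAMS}


def consistent_set(D):
    """Intersect the cached per-example program sets for every example in D."""
    result = frozenset(PROGRAMS)
    for u in D:
        result &= M_L[u]
    return result


def l0(h, D):
    """P_L0(h | D) = 1 / |consistent set| if h is in it, else 0."""
    version_space = consistent_set(D)
    if h not in version_space:
        return Fraction(0)
    return Fraction(1, len(version_space))


if __name__ == "__main__":
    D = [(1, True), (2, True)]  # hypothetical specification
    print(sorted(consistent_set(D)), l0((1, 2), D))
```

Because both maps are built once, each query to `l0` only performs set intersections and never re-scans the meaning matrix.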
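Computing $P_{S_1}(D \mid h)$ amounts to computing a sequence of the incremental probabilities defined in Eq. 3. The brunt of computing lies in the normalisation constant, $\sum_{u_i'} P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i')$. We speed up this computation in two ways. First, we note that if $h \nvdash u_i'$, the probability $P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i')$ would be $0$. Thus, we can simplify this summation using the atomic speaker $\mathcal{M}_S[h]$ (the cached set of atomic examples consistent with $h$) like so: $\sum_{u_i' \in \mathcal{M}_S[h]} P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i')$, which vastly reduces the number of terms within the summation. Second, recall that computing $P_{L_0}(h \mid u_1 \dots u_{i-1}, u_i')$ amounts to computing the consistent set of programs under $u_1 \dots u_{i-1}, u_i'$. We note that the only varying example inside the summation is $u_i'$, while all the previous examples remain constant. This allows caching the intermediate results of the set intersection to be re-used in computing $P_{S_1}(u_j \mid h, u_1 \dots u_{j-1})$ where $j > i$.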
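Again, for $P_{L_1}(h \mid D)$, the brunt of the computation lies in the normalisation constant of Eq. 5. However, note that in case $h \nvdash D$, $P_{S_1}(D \mid h) = 0$. This allows us to leverage the consistent set to vastly reduce this summation:
$$\sum_{h' \in \mathcal{H}} P_{S_1}(D \mid h') = \sum_{h' \,:\, h' \vdash D} P_{S_1}(D \mid h')$$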
5 A Program Synthesis System with Pragmatics
To describe our program synthesis system with pragmatics, we only need to specify the space of programs, the space of atomic examples, and the meaning matrix; the rest will follow. (Code: https://github.com/evanthebouncy/program_synthesis_pragmatics)
We consider a simple domain of programs that can lay out grid-like patterns like those studied in Ellis et al. (2018); Pu et al. (2018). Specifically, each program is a function that takes in a coordinate of a grid and places a particular symbol at that location. Symbols can be one of three shapes (chicken, pig, pebble) and one of three colors (red, green, blue), with the exception that pebble is always colorless. A DSL and some of the program renderings are shown in Figure 3. The DSL includes a bounding box in which the main pattern should be placed, a function that takes two shapes and makes the outside shape wrap around the inside shape with a given thickness, and a function that takes in a shape and a color and outputs an appropriate symbol. We consider two programs equivalent if they render to the same pattern over the grid. After such de-duplication, there are a total of programs in our space of programs.
The space of atomic examples consists of tuples of form , where is a grid coordinate, and is a symbol. As there are a total of distinct symbols and the grid is , there are a total of atomic examples in our domain.
An entry of the meaning matrix denotes whether a program, once rendered onto the grid, would be consistent with an atomic example. For instance, let the upper-left pattern in Figure 3 be rendered from program , then, it will be consistent with the atomic examples and , while be inconsistent with .
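A meaning-matrix entry for this domain can be sketched as a lookup into a rendered grid. Everything concrete below is a hypothetical stand-in: the grid side length `GRID`, the `toy_program` pattern, the `((row, col), symbol)` encoding of atomic examples, and the choice to treat cells outside the pattern as holding no symbol are assumptions for illustration and are not taken from the DSL in Figure 3.

```python
from itertools import product

GRID = 7  # assumed grid side length (hypothetical)
COLORED_SHAPES = ["chicken", "pig"]
COLORS = ["red", "green", "blue"]

# Symbols: colored chickens and pigs, plus the colorless pebble.
SYMBOLS = [(s, c) for s in COLORED_SHAPES for c in COLORS] + [("pebble", None)]

COORDS = list(product(range(GRID), repeat=2))
ATOMIC_EXAMPLES = [(xy, sym) for xy in COORDS for sym in SYMBOLS]


def toy_program(i, j):
    """Hypothetical program: checkered chickens wrapped by a ring of pebbles."""
    if 2 <= i <= 4 and 2 <= j <= 4:
        return ("chicken", "red" if (i + j) % 2 == 0 else "blue")
    if 1 <= i <= 5 and 1 <= j <= 5:
        return ("pebble", None)
    return None  # assumption: nothing is placed outside the bounding box


def render(program):
    """Evaluate a program at every grid coordinate."""
    return {(i, j): program(i, j) for (i, j) in COORDS}


def consistent(rendering, example):
    """Meaning-matrix entry: does the rendering show this symbol at this cell?"""
    xy, symbol = example
    return rendering[xy] == symbol


rendering = render(toy_program)
# One column of the meaning matrix: this program against every atomic example.
column = [int(consistent(rendering, u)) for u in ATOMIC_EXAMPLES]

if __name__ == "__main__":
    print(sum(column), "of", len(column), "atomic examples are consistent")
```

Two programs that render to the same grid produce identical columns, which is why de-duplicating by rendering loses nothing for the purposes of the meaning matrix.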
6 Human Studies
We conduct a user study to evaluate how well a naive end-user interacts with a pragmatic program synthesizer ($L_1$) versus a non-pragmatic one ($L_0$). We hypothesized that to the extent that the pragmatic models capture computational principles of communication, humans should be able to communicate with them efficiently and intuitively, even if the form of communication is new to them.
Subjects (N = 55) were recruited on Amazon Mechanical Turk and paid $2.75 for 20 minutes. Subjects gave informed consent. Seven responses were omitted for failing to answer an instruction quiz. The remaining subjects (N = 48; 26 M, 22 F; age 40.9 ± 12.1, mean ± SD) were included. The study was approved by our institution’s Institutional Review Board.
Stimuli were 10 representative renderings of programs sampled from the DSL, capturing different concepts such as striped vs checkered colour patterns and solid vs hollow ring shapes.
The communication task.
The subjects were told they were communicating with two robots, either white ($L_0$) or blue ($L_1$). The subjects were given a stimulus (a rendering), and were asked to make a robot recreate this pattern by providing the robot with a few strategically placed symbols on a scratch grid (a set of examples). Each time the subject places a symbol, the robot guesses the most likely program given the examples, and displays its guess as a rendering as feedback to the subject. The subject may proceed to the next task if the pattern is successfully recreated. See Figure 6.1.
First, the subjects read the instructions, followed by a quiz. Subjects who failed the quiz twice proceeded with the experiment, but their responses were omitted. Next, the subjects practiced selecting and placing symbols. Subjects then proceeded with the communication task, presented in two blocks, one with the white robot ($L_0$) and one with the blue robot ($L_1$), in random order between subjects. Each block contained 10 trials of the 10 stimuli, also in random order. At the end of the experiment, subjects filled out a survey asking which robot was easier, with free-form feedback about their communication strategies.
We first compared the mean number of symbols subjects used to communicate with each robot. A paired t-test was significant (), with a mean difference of 2.8 moves, and a 95% confidence interval. The number of symbols used by subjects for both robots is shown in Figure 5 (a).
A linear regression model with the mean number of symbols used as the dependent variable, and robot and trial as independent variables, was significant (adjusted ), with significant coefficients for robot () and trial (). The regression equation is given by: , where robot = , and trial is the order in which the stimulus was shown to subjects. We conclude that subjects’ communication with the robots became more efficient over time. The interaction between the variables was small but not significant (), suggesting that this communication improvement might have been driven by the pragmatic listener (blue robot) (Figure 5 (b)).
A significant majority of subjects (77%, ) reported that the blue ($L_1$) robot was easier. This was true regardless of which robot they saw first (Figure 5 (c)).
Communication Efficiency Analysis.
Next, we compare communication efficiency between different speaker-listener pairs. We consider three speakers: S0 (a random speaker that uses any consistent examples, as a lower bound), S1 (the pragmatic speaker that L1 was expecting, as an upper bound), and human. We consider two listeners: L0 and L1. We first measure the probability of successful communication as a function of the number of symbols used, by sampling (instead of picking the top-1 program) from the speaker and listener distributions (Figure 6 (a)). We find that both human and S1 communicate better with an informative listener L1 than with L0. We then measure the mean number of symbols required for successful communication between a speaker-listener pair, taking the top-1 program from the listeners instead of sampling (Figure 6 (b)). A one-way ANOVA testing the effect of speaker-listener pair on number of symbols used was significant (), with significant multiple comparisons between means given by Tukey test for the following pairs: S0-L0 vs human-L0 (), S1-L0 vs human-L0 () and human-L0 vs human-L1 (). There were no significant differences between S1-L1 vs human-L1 () and between S1-L1 vs S1-L0 (). This means that human communication is significantly more efficient compared to the uninformative speaker (S0), and for the pragmatic listener, human efficiency is indistinguishable from the pragmatic speaker (S1). Further, compared to the pragmatic model S1, humans were significantly less efficient when communicating with the literal listener L0. This suggests that humans intuitively assume that a listener is pragmatic, and find communication difficult when this assumption is violated. This may have implications when engineering systems that do few-shot learning from human demonstration.
7 Looking Forward
In this work, we show that it is possible to obtain a pragmatic program synthesis system by building the principles of pragmatic communication into the synthesis algorithm rather than having it train on actual human interaction data. However, interaction data is still valuable, and we believe much benefit could be gained by building a system that can learn and adjust to a human communicator interactively. It is also interesting to see whether version space algebra approaches would scale to more complex program synthesis domains, and whether we can use a neural network to cheaply approximate the more computationally intensive $L_1$ listener. In general, we believe interactive learning systems are a prime target of future research: not only do we desire machines that learn from massive data, but also machine intelligence which can acquire knowledge from pedagogy and communication.
We hope that naive end-users would benefit from this research, as we aim for a more natural interaction between human and machine. This would democratize computation by allowing broader access to computing by non-programmers, so that we may work alongside the machines rather than being replaced by them. We believe that one can also better assess the properties of a machine learning system (such as safety) through communication as well as through dissection of its architecture (looking at its neurons firing while showing it different stimuli). One potential risk is that it may become more complicated to prove and verify whether an AI system is working as intended in a complex communication setting, which can lead to errors.
Thanks Maxwell Nye for drawing the glasses and hat figure on the board and introducing me to the wonderful world of pragmatics. Thanks MH Tessler for explaining RSA to me in detail. Thanks Beilei Ren for making the clay figures used in the user study, and designing the page layout. Thanks twitch chat for support POG. Funded by the National Science Foundation under Grant No. 1918839.
References
- Andreas and Klein (2016). Reasoning about pragmatics with neural listeners and speakers. arXiv preprint arXiv:1604.00562.
- Balog et al. (2016). DeepCoder: learning to write programs. ICLR.
- Bunel et al. (2016). Learning to superoptimize programs. arXiv preprint arXiv:1611.01787.
- Chen et al. (2018). Execution-guided neural program synthesis.
- Cohn-Gordon et al. (2018). An incremental iterated response model of pragmatics. Proceedings of the Society for Computation in Linguistics.
- Ellis and Gulwani (2017). Learning to learn programs from examples: going beyond program structure. IJCAI.
- Ellis et al. (2019). Write, execute, assess: program synthesis with a REPL. In Advances in Neural Information Processing Systems, pp. 9165–9174.
- Ellis et al. (2018). Learning to infer graphics programs from hand-drawn images. NIPS.
- Feser et al. (2015). Synthesizing data structure transformations from input-output examples. In PLDI.
- Frank and Goodman (2012). Predicting pragmatic reasoning in language games. Science 336 (6084), pp. 998–998.
- Grice (1975). Logic and conversation. In Speech acts, pp. 41–58.
- Gulwani (2011). Automating string processing in spreadsheets using input-output examples. In ACM SIGPLAN Notices, Vol. 46, pp. 317–330.
- Kalyan et al. (2018). Neural-guided deductive search for real-time program synthesis from examples. ICLR.
- Kazemzadeh et al. (2014). ReferItGame: referring to objects in photographs of natural scenes. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 787–798.
- Lau et al. (2003). Programming by demonstration using version space algebra. Machine Learning 53 (1-2), pp. 111–156.
- Lewis et al. (2017). Deal or no deal? End-to-end learning for negotiation dialogues. arXiv preprint arXiv:1706.05125.
- Mao et al. (2016). Generation and comprehension of unambiguous object descriptions. pp. 11–20.
- Monroe et al. (2017). Colors in context: a pragmatic neural model for grounded language understanding. Transactions of the Association for Computational Linguistics 5, pp. 325–338.
- Nye et al. (2019). Learning to infer program sketches. ICML.
- Polosukhin and Skidanov (2018). Neural program search: solving programming tasks from description and examples. arXiv preprint arXiv:1802.04335.
- Polozov and Gulwani (2015). FlashMeta: a framework for inductive program synthesis. ACM SIGPLAN Notices 50 (10), pp. 107–126.
- Pu et al. (2018). Selecting representative examples for program synthesis. In International Conference on Machine Learning, pp. 4158–4167.
- Singh and Gulwani (2015). Predicting a correct program in programming by example. In CAV, pp. 398–414.
- Solar Lezama (2008). Program synthesis by sketching. Ph.D. Thesis.
- Sundermeyer et al. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association.
- Wang et al. (2016). Learning language games through interaction. arXiv preprint arXiv:1606.02447.
- Yuan et al. (2018). Understanding the rational speech act model. In CogSci.
- Zohar and Wolf (2018). Automatic program synthesis of long programs with a learned garbage collector. In Advances in Neural Information Processing Systems, pp. 2094–2103.