Semantic expressive capacity with bounded memory

06/27/2019 ∙ by Antoine Venant, et al. ∙ Universität Saarland

We investigate the capacity of mechanisms for compositional semantic parsing to describe relations between sentences and semantic representations. We prove that in order to represent certain relations, mechanisms which are syntactically projective must be able to remember an unbounded number of locations in the semantic representations, whereas nonprojective mechanisms need not. This is the first result of this kind, and it has consequences both for grammar-based and for neural systems.


1 Introduction

Semantic parsers which translate a sentence into a semantic representation compositionally must recursively compute a partial semantic representation for each node of a syntax tree. These partial semantic representations usually contain placeholders at which arguments and modifiers are attached in later composition steps. Approaches to semantic parsing differ in whether they assume that the number of placeholders is bounded or not. Lambda calculus Montague (1974); Blackburn and Bos (2005) assumes that the number of placeholders (lambda-bound variables) can grow unboundedly with the length and complexity of the sentence. By contrast, many methods which are based on unification Copestake et al. (2001) or graph merging Courcelle and Engelfriet (2012); Chiang et al. (2013) assume a fixed set of placeholders, i.e. the number of placeholders is bounded.

Methods based on bounded placeholders are popular both in the design of hand-written grammars Bender et al. (2002) and in semantic parsing for graphs Peng et al. (2015); Groschwitz et al. (2018). However, it is not clear that all relations between language and semantic representations can be expressed with a bounded number of placeholders. The situation is particularly challenging when one insists that the compositional analysis is projective in the sense that each composition step must combine adjacent substrings of the input sentence. In this case, it may be impossible to combine a semantic predicate with a distant argument immediately, forcing the composition mechanism to use up a placeholder to remember the argument position. If many predicates have distant arguments, this may exceed the bounded “memory capacity” of the compositional mechanism.

In this paper, we show that there are relations between sentences and semantic representations which can be described by compositional mechanisms which are bounded and non-projective, but not by ones which are bounded and projective. To our knowledge, this is the first result on expressive capacity with respect to semantics – in contrast to the extensive literature on the expressive capacity of mechanisms which describe just the string languages.

More precisely, we prove that tree-adjoining grammars can describe string-graph relations using the HR graph algebra Courcelle and Engelfriet (2012) with two sources (bounded, non-projective) which cannot be described using linear monadic context-free tree grammars and the HR algebra with $k$ sources, for any fixed $k$ (bounded, projective). This result is especially surprising because TAG and linear monadic CFTGs describe the same string languages; thus the difference lies only in the projectivity of the syntactic analysis.

We further prove that given certain assumptions on the alignment between tokens in the sentence and edges in the graph, no generative device for projective syntax trees can simulate TAG with two sources. This has practical consequences for the design of transition-based semantic parsers (whether grammar-based or neural).

Plan of the paper. We will first explain the linguistic background in Section 2 and lay the formal foundations in Section 3. We will then prove the reduced semantic expressive capacity for aligned generative devices in Section 4 and for CFTGs in Section 5. We conclude with a discussion of the practical impact of our findings (Section 6).

2 Compositional semantic construction

The Principle of Compositionality, which is widely accepted in theoretical semantics, states that the meaning of a natural-language expression can be determined from the meanings of its immediate subexpressions and the way in which the subexpressions were combined. Implementations of this principle usually assume that there is some sort of syntax tree which describes the grammatical structure of a sentence. A semantic representation is then calculated by bottom-up evaluation of this syntax tree, starting with semantic representations of the individual words and then recursively computing a semantic representation for each node from those of its children.
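To make the bottom-up recursion concrete, here is a minimal Python sketch (ours, not part of the paper's formalism) that evaluates a toy syntax tree by first computing the meanings of the children and then combining them at the parent. The encoding of trees as (label, children) pairs and the rule name "apply" are our own illustrative choices.

    # Bottom-up semantic evaluation over a toy syntax tree (illustrative).
    def evaluate(tree, lexicon):
        """Recursively compute a semantic value for each node."""
        label, children = tree
        if not children:                  # leaf: look up the word's meaning
            return lexicon[label]
        values = [evaluate(c, lexicon) for c in children]
        if label == "apply":              # combine predicate with argument
            fn, arg = values
            return fn(arg)
        raise ValueError("unknown rule: " + label)

    lexicon = {
        "John": "john",
        "sleeps": lambda subj: ("sleep", subj),   # subject still unfilled
    }

    # "John sleeps": the placeholder of "sleeps" is filled by "John".
    print(evaluate(("apply", [("sleeps", []), ("John", [])]), lexicon))
    # -> ('sleep', 'john')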

2.1 Compositional mechanisms

Mechanisms for semantic composition will usually keep track of places at which semantic arguments are still missing or modifiers can still be attached. For instance, when combining the semantic representations for “John” and “sleeps” in a derivation of “John sleeps”, the “subject” argument of “sleeps” is filled with the meaning of “John”. The compositional mechanism therefore assigns a semantic representation to “sleeps” which has an unfilled placeholder for the subject.

The exact nature of the placeholder depends on the compositional mechanism. There are two major classes in the literature. Lambda-style compositional mechanisms use a list of placeholders. For instance, lambda calculus, as used e.g. in Montague Grammar Montague (1974), CCG Steedman (2001), or linear-logic-based approaches in LFG Dalrymple et al. (1995), might represent “sleeps” as $\lambda x.\,\mathit{sleep}(x)$. Placeholders are lambda-bound variables (here: $x$).

By contrast, unification-style compositional mechanisms use names for placeholders. For example, a simplified form of the Semantic Algebra used in HPSG Copestake et al. (2001) might represent “sleeps” as a feature structure with an unfilled hole for its subject argument, which is then unified with the feature structure for “John”. The placeholders are holes with labels from a fixed set of argument names (e.g. ARG1, ARG2). Named placeholders are also used in the HR algebra Courcelle and Engelfriet (2012) and its derivatives, like Hyperedge Replacement Grammars Drewes et al. (1997); Chiang et al. (2013) and the AM algebra Groschwitz et al. (2018).


Figure 1: (a) Nonprojective and (b) projective analysis.

2.2 Boundedness and projectivity

A fundamental difference between lambda-style and unification-style compositional mechanisms is in their “memory capacity”: the number of placeholders in a lambda-style mechanism can grow unboundedly with the length and complexity of the sentence (e.g. by functional composition of lambda terms), whereas in a unification-style mechanism, the placeholders are fixed in advance.

There is an informal intuition that unbounded memory is needed especially when an unbounded number of semantic predicates can be far away from their arguments in the sentence, and the syntax formalism does not allow these predicates to combine immediately with the arguments. For illustration, consider the two derivations in Fig. 1 of the following Swiss German sentence from Shieber (1985):

(dass) (mer) d’ chind em Hans es huus lönd hälfed aastriiche
(that) (we) the-children-ACC Hans-DAT the-house-ACC let help paint
‘(that we) let the children help Hans paint the house’

The lexical semantic representation of each verb comes with a placeholder for its object and, in the case of “lönd” and “hälfed”, also one for its verb complement. The derivation in Fig. 1a immediately combines each verb with its complements; the placeholders that are used at each node never grow beyond the ones the verbs originally had. However, this derivation combines verbs with nouns which are not adjacent in the string, which is not allowed in many grammar formalisms. If we limit ourselves to combining only adjacent substrings (projectively, see Fig. 1b), we must remember the placeholders for all the verbs at the same time if we want to obtain the correct predicate-argument structure. Thus, the number of placeholders grows with the length of the sentence; this is only possible with a lambda-style compositional mechanism.

There is scattered evidence in the literature for this tension between bounded memory and projectivity. Chiang et al. (2013) report (of a compositional mechanism based on the HR algebra, unification-style) that a bounded number of placeholders suffices to derive the graphs in the AMR version of the Geoquery corpus, but Groschwitz et al. (2018) find that this requires non-projective derivations in 37% of the AMRBank training data Banarescu et al. (2013). Approaches to semantic construction with tree-adjoining grammar either perform semantic composition along the TAG derivation tree using unification (non-projective, unification-style) Gardent and Kallmeyer (2003) or along the TAG derived tree using linear logic (projective, lambda-style) Frank and van Genabith (2001). Bender (2008) discusses the challenges involved in modeling the predicate-argument structure of a language with very free word order (Wambaya) with projective syntax. While the Wambaya noun phrase does not seem to require the projective grammar to collect unbounded numbers of unfilled arguments as in Fig. 1b, Bender notes that her projective analysis still requires a more flexible handling of semantic arguments than the HPSG Semantic Algebra (unification-style) supports.

In this paper, we define a notion of semantic expressive capacity and prove the first formal results about the relationship between projectivity and bounded memory.

3 Formal background

Let $\mathbb{N}$ be the nonnegative integers. A signature $\Sigma$ is a finite set of function symbols, each of which has been assigned a nonnegative integer called its rank. We write $\Sigma_n$ for the symbols of rank $n$. Given a signature $\Sigma$, we say that all constants $a \in \Sigma_0$ are trees over $\Sigma$; further, if $f \in \Sigma_n$ and $t_1, \dots, t_n$ are trees over $\Sigma$, then $f(t_1, \dots, t_n)$ is also a tree. We write $T_\Sigma$ for the set of all trees over $\Sigma$. We define the height of a tree to be $h(a) = 1$ for $a \in \Sigma_0$, and $h(f(t_1, \dots, t_n)) = 1 + \max_i h(t_i)$ for $n \geq 1$.

Let $\Sigma$ be a signature, and let $\Sigma_\bullet = \Sigma \cup \{\bullet\}$ (with $\bullet$ as a fresh constant of rank 0). Then we call a tree $C \in T_{\Sigma_\bullet}$ a context if it contains exactly one occurrence of $\bullet$, and write $\mathcal{C}_\Sigma$ for the set of all contexts. A context can be seen as a tree with exactly one hole. If $t \in T_\Sigma$, we write $C[t]$ for the tree in $T_\Sigma$ that is obtained by replacing $\bullet$ with $t$.

Given a string $w$ and a symbol $a$, we write $\#_a(w)$ for the number of times that $a$ occurs in $w$.
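The following short Python sketch (our own encoding, with trees as (symbol, children) pairs and "•" marking the hole) illustrates the definitions of height and context substitution given above.

    HOLE = "•"

    def height(t):
        sym, children = t
        if not children:
            return 1                      # constants have height 1
        return 1 + max(height(c) for c in children)

    def substitute(context, t):
        """C[t]: replace the unique hole in the context C with t."""
        sym, children = context
        if sym == HOLE:
            return t
        return (sym, [substitute(c, t) for c in children])

    # f(a, •) is a context; plugging in g(b) yields f(a, g(b)) of height 3.
    C = ("f", [("a", []), (HOLE, [])])
    t = ("g", [("b", [])])
    print(substitute(C, t))               # ('f', [('a', []), ('g', [('b', [])])])
    print(height(substitute(C, t)))       # 3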


Figure 2: Semantic construction with TAG: (a) TAG derivation, (b) derivation tree, (c) derived tree, (d) semantic graph. (e) s-graph interpretations of the boxed node in (c); (f,g) s-graph interpretations at the boxed nodes in (b).

3.1 Grammars for strings and trees

We take a very general view on how semantic representations for strings are constructed compositionally. To this end, we define a notion of “grammar” which encompasses more devices for describing languages than just traditional grammars, such as transition-based parsers.

We say that a tree grammar over the signature $\Sigma$ is any finite device $G$ that defines a language $L(G) \subseteq T_\Sigma$. For instance, regular tree grammars Comon et al. (2007) are tree grammars, and context-free grammars can also be seen as tree grammars defining the language of parse trees.

We say that a string grammar over the signature $\Sigma$ and the alphabet $A$ is a pair $(G, \mathrm{yd})$ consisting of a tree grammar $G$ over $\Sigma$ and a yield function $\mathrm{yd}$ which maps trees to strings over $A$ Weir (1988). A string grammar defines a language $L(G, \mathrm{yd}) = \{\mathrm{yd}(t) \mid t \in L(G)\}$. We call the trees $t \in L(G)$ derivations.

A particularly common yield function is the concatenating function $\mathrm{yd}_c$, defined as $\mathrm{yd}_c(f(t_1, \dots, t_n)) = \mathrm{yd}_c(t_1) \cdots \mathrm{yd}_c(t_n)$ if $n \geq 1$ and $\mathrm{yd}_c(a) = a$ if $a$ has rank 0. This yield function simply concatenates the words at the leaves of the tree. Applied to the phrase-structure tree in Fig. 2c, it yields the Swiss German sentence in Section 2.2. Context-free grammars can be characterized as string grammars that combine a regular tree grammar with $\mathrm{yd}_c$. By contrast, we can model tree-adjoining grammars (TAG, Joshi and Schabes, 1997) by choosing a tree grammar that describes derivation trees as in Fig. 2b. The yield function could then substitute and adjoin the elementary trees as specified by the derivation tree (see Fig. 2a) and then read off the words from the resulting derived tree in Fig. 2c.
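As a small illustration, here is the concatenating yield function in Python, using the same (symbol, children) encoding of trees as in the sketch above; it reads off the leaf symbols from left to right.

    def yield_c(t):
        """yd_c: concatenate the leaf symbols of t from left to right."""
        sym, children = t
        if not children:
            return [sym]                  # rank-0 symbol: the yield is the symbol
        out = []
        for c in children:
            out += yield_c(c)             # concatenate the children's yields
        return out

    tree = ("S", [("NP", [("John", [])]), ("VP", [("sleeps", [])])])
    print(" ".join(yield_c(tree)))        # -> "John sleeps"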

We say that a string grammar is projective if its yield function is $\mathrm{yd}_c$. Context-free grammars as construed above are clearly projective. Tree-adjoining grammars are not projective: for instance, the yield of the subtree below “aastriiche” in Fig. 2b consists of the two separate strings “es huus” and “aastriiche”, which are then wrapped around “lönd hälfed” further up in the derivation.

If the grammar is projective, then for any context $C$ there exist two strings $w_1$ and $w_2$ such that for any tree $t$, $\mathrm{yd}_c(C[t]) = w_1 \, \mathrm{yd}_c(t) \, w_2$.

3.2 Context-free tree languages

Below, we will talk about linear monadic context-free tree grammars (LM-CFTGs; Rounds, 1969; Comon et al., 2007). An LM-CFTG is a quadruple $G = (N, \Sigma, S, P)$, where $N$ is a ranked signature of nonterminals of rank at most one, $\Sigma$ is a ranked signature of terminals, $S \in N_0$ is the start symbol, and $P$ is a finite set of production rules of one of the forms

  • $A \to t$ with $A \in N_0$ and $t \in T_{N \cup \Sigma}$

  • $B(x) \to t$ with $B \in N_1$ and $t \in T_{N \cup \Sigma \cup \{x\}}$,

where $x$ is a variable of rank 0 that occurs exactly once in $t$. The trees in $L(G) \subseteq T_\Sigma$ are obtained by expanding $S$ with production rules. Nonterminals of rank zero are expanded by replacing them with trees. Nonterminals of rank one must have exactly one child in the tree; they are replaced by a context, and the variable in the context is replaced by the subtree below the child.
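As an illustrative sketch (our own encoding, not part of the formalism), the following Python snippet performs one LM-CFTG expansion step: a rank-0 nonterminal is replaced by its right-hand side directly, and a rank-1 nonterminal is replaced by its right-hand side with the variable x replaced by the nonterminal's child subtree.

    VAR = "x"

    def subst_var(t, s):
        """Replace the variable x in t with the subtree s."""
        sym, children = t
        if sym == VAR:
            return s
        return (sym, [subst_var(c, s) for c in children])

    def expand(tree, nt, rhs):
        """Expand the leftmost occurrence of nonterminal nt with rhs.
        Returns (new_tree, found)."""
        sym, children = tree
        if sym == nt:
            if not children:                          # rank 0: A -> rhs
                return rhs, True
            return subst_var(rhs, children[0]), True  # rank 1: B(x) -> rhs
        out, found = [], False
        for c in children:
            if not found:
                c, found = expand(c, nt, rhs)
            out.append(c)
        return (sym, out), found

    # S -> B(a), then B(x) -> f(b, x); note that x occurs exactly once
    # in the right-hand side (linearity).
    t = ("S", [])
    t, _ = expand(t, "S", ("B", [("a", [])]))
    t, _ = expand(t, "B", ("f", [("b", []), (VAR, [])]))
    print(t)    # ('f', [('b', []), ('a', [])])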

We can extend an LM-CFTG $G$ to a string grammar $(G, \mathrm{yd}_c)$. Then LM-CFTG is weakly equivalent to TAG Kepser and Rogers (2011); that is, LM-CFTG and TAG generate the same class of string languages. Intuitively, the weakly equivalent LM-CFTG directly describes the language of derived trees of the TAG grammar (cf. Fig. 2c). Notice that LM-CFTG is projective.

Below, we will make crucial use of the following pumping lemma for LM-CFTLs:

Lemma 1 (Maibaum, 1978).

Let $G$ be an LM-CFTG. There exists a constant $p$ such that for any $t \in L(G)$ with $h(t) > p$, there exists a decomposition $t = C[C'[t']]$ with $h(C'[t']) \leq p$ and $C' \neq \bullet$ such that for any $n \in \mathbb{N}$, $C[C'^n[t']] \in L(G)$, where we let $C'^0 = \bullet$ and $C'^{n+1} = C'[C'^n]$.

We call $p$ the pumping height of $G$.

3.3 The HR algebra

The specific unification-style semantic algebra we use in this paper is the HR algebra Courcelle and Engelfriet (2012). This choice encompasses much of the recent literature on compositional semantic parsing with graphs, based e.g. on Hyperedge Replacement Grammars Chiang et al. (2013); Peng et al. (2015); Koller (2015) and the AM algebra Groschwitz et al. (2018).

The values of the HR algebra are s-graphs: directed, edge-labeled graphs, some of whose nodes may be designated as sources, written in angle brackets. S-graphs can be combined using the forget, rename, and merge operations. Rename changes an $a$-source node into a $b$-source node. Forget removes the source status of the $a$-source node, so it is no longer a source. Merge combines two s-graphs while unifying nodes with the same source annotation; for instance, merging two s-graphs which each contain an $\langle s \rangle$-source node unifies these two nodes into one.
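The following Python sketch shows one possible encoding of these three operations; the representation of an s-graph as a pair (edges, sources) and all identifier names are our own illustrative choices, not the paper's notation.

    def rename(g, a, b):
        """Turn the a-source of g into a b-source."""
        edges, sources = g
        sources = dict(sources)
        sources[b] = sources.pop(a)
        return (edges, sources)

    def forget(g, a):
        """Remove the source status of the a-source node of g."""
        edges, sources = g
        return (edges, {n: v for n, v in sources.items() if n != a})

    def merge(g1, g2):
        """Combine two s-graphs, unifying nodes with the same source name.
        Assumes the node names of g1 and g2 are otherwise disjoint."""
        edges1, src1 = g1
        edges2, src2 = g2
        mapping = {src2[n]: src1[n] for n in src2 if n in src1}
        ren = lambda v: mapping.get(v, v)
        edges = set(edges1) | {(ren(u), lab, ren(v)) for u, lab, v in edges2}
        sources = {**{n: ren(v) for n, v in src2.items()}, **src1}
        return (edges, sources)

    g1 = ({("n1", "sleep", "n2")}, {"root": "n1", "subj": "n2"})
    g2 = (set(), {"subj": "m1"})          # a single <subj>-source node
    print(merge(g1, g2))                  # the two <subj> nodes are unified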

The HR algebra uses operation symbols from a ranked signature $\Delta$ to describe s-graphs syntactically. $\Delta$ contains symbols for merge (rank 2) and the forget and rename operations (rank 1). It also contains constants (symbols of rank 0) which denote s-graphs consisting of a single labeled edge between two source nodes, or of a single source node. Terms $\tau$ over this signature evaluate recursively to s-graphs $[\![\tau]\!]$, as usual in an algebra. Each instance of the HR algebra uses a fixed, finite set of source names which can be used in the constant s-graphs and the rename and forget operations. The class of graphs which can be expressed as values of terms over the algebra increases with the number of source names. We write $\mathrm{HR}_k$ for the HR algebra with $k$ source names (and some set of edge labels).

Let $G$ be an s-graph, and let $E$ be a subgraph of $G$, i.e. a subset of its edges. We call a node $u$ a boundary node of $E$ if it is incident both to an edge in $E$ and to an edge that is not in $E$. For instance, the s-graph in Fig. 2e is a subgraph of the one in Fig. 2d; the boundary nodes are drawn shaded in (d). The following lemma holds:
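A direct Python rendering of the boundary-node definition (same illustrative edge encoding as in the sketch above):

    def boundary_nodes(graph_edges, sub_edges):
        """Nodes incident both to an edge in the subgraph and to one outside it."""
        inside = {n for u, _, v in sub_edges for n in (u, v)}
        outside = {n for u, _, v in graph_edges - sub_edges for n in (u, v)}
        return inside & outside

    G = {("a", "x", "b"), ("b", "y", "c"), ("c", "z", "d")}
    E = {("a", "x", "b")}
    print(boundary_nodes(G, E))           # {'b'}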

Lemma 2.

Let $G = [\![\tau]\!]$ be an s-graph denoted by a term $\tau$, and let $E$ be a subgraph of $G$ such that some subterm $\sigma$ of $\tau$ evaluates to an s-graph $[\![\sigma]\!]$ which contains the same edges as $E$. Then every boundary node of $E$ is a source in $[\![\sigma]\!]$.

3.4 Grammars with semantic interpretations

Finally, we extend string grammars to compositionally relate strings with semantic representations. Let $(G, \mathrm{yd})$ be a string grammar. The tree grammar $G$ generates a language $L(G)$ of trees. We will map each tree $t \in L(G)$ into a term $h(t)$ over some algebra $\mathcal{A}$ with signature $\Delta$ using a linear tree homomorphism (LTH) $h$ Comon et al. (2007), i.e. by compositional bottom-up evaluation. This defines a relation between strings and values of $\mathcal{A}$: $R(G, \mathrm{yd}, h) = \{(\mathrm{yd}(t), [\![h(t)]\!]) \mid t \in L(G)\}$.
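As a sketch of how such a homomorphism operates (our own encoding: each symbol of the derivation signature is mapped to a term template in which the integer i stands for the image of the i-th subtree, used at most once for linearity):

    def apply_hom(templates, t):
        """Map a derivation tree to a term over the target signature."""
        sym, children = t
        images = [apply_hom(templates, c) for c in children]
        return instantiate(templates[sym], images)

    def instantiate(template, images):
        if isinstance(template, int):     # placeholder i -> image of child i
            return images[template - 1]
        sym, args = template
        return (sym, [instantiate(a, images) for a in args])

    # h(f(x1, x2)) = merge(x1, forget_s(x2)); h(a) = const_a; h(b) = const_b
    templates = {
        "f": ("merge", [1, ("forget_s", [2])]),
        "a": ("const_a", []),
        "b": ("const_b", []),
    }
    print(apply_hom(templates, ("f", [("a", []), ("b", [])])))
    # ('merge', [('const_a', []), ('forget_s', [('const_b', [])])])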

For instance, $\mathcal{A}$ could be some HR algebra $\mathrm{HR}_k$; then $R(G, \mathrm{yd}, h)$ will be a binary relation between strings and s-graphs. In this case, we abbreviate $R(G, \mathrm{yd}, h)$ as $R_k(G, \mathrm{yd}, h)$ to make the number of sources explicit.

If we look at an entire class $\mathcal{G}$ of string grammars and a fixed algebra $\mathcal{A}$, this defines a class of such relations: $\mathcal{G}(\mathcal{A}) = \{R(G, \mathrm{yd}, h) \mid (G, \mathrm{yd}) \in \mathcal{G} \text{ and } h \text{ an LTH into } \mathcal{A}\}$.

In the example in Fig. 2, we can define a linear homomorphism to map the derivation tree in (b) to a term which evaluates to the s-graph shown in (d). At the top of this term, the s-graphs at the “chind” and “hälfed” nodes (f, g) are combined into (d) by merging, renaming, and forgetting sources.

This non-projective derivation produces the s-graph in (d) using only two sources. By contrast, a homomorphic interpretation of the projective tree (c) has to use at least four sources, as the intermediate result in (e) illustrates.

4 Projective cross-serial dependencies

We will now investigate the ability of projective grammar formalisms to express string-graph relations. We will define a relation $R_{cs}$ and prove that it cannot be generated by projective grammar formalisms with a bounded number of sources. We show this first for arbitrary projective string grammars, under certain assumptions on the alignment of words and graph edges. In Section 5, we drop these assumptions, but focus on LM-CFTG.

4.1 The relation $R_{cs}$

To construct $R_{cs}$, consider the string language $L$ whose strings consist of a sequence of $a$-segments followed by a sequence of $b$-segments, where each $a$-token is followed by a chain of $\bar{a}$-tokens, and analogously for the $b$-segments. The chain lengths $k$ can be chosen independently for each position in each segment.

Every string in $L$ can be uniquely described by the number of $a$-segments, the number of $b$-segments, and a sequence of tuples of numbers specifying the chain lengths $k$ used in each segment. For instance, the sequence ((2), (1, 0), (1), (0, 0)) describes the string underlying the graph in Fig. 3.

Figure 3: The graph for ((2), (1, 0), (1), (0, 0)); blocks indicated by gray boxes.

We associate a graph $g(w)$ with each string $w \in L$ by the construction illustrated in Fig. 3. For each $i$, we define the $i$-th $a$-block to be a small weakly connected graph which contains an edge for the $i$-th $a$-token together with linear chains of edges whose lengths record the associated numbers $k$. The graph $g(w)$ consists of a linear chain of the $a$-blocks, followed by the $b$-blocks (defined analogously). We let $R_{cs} = \{(w, g(w)) \mid w \in L\}$.

Note that $L$ is a more intricate version of the cross-serial dependency language. $R_{cs}$ can be generated by a TAG grammar along the lines of the one from Section 3.4, using an HR algebra with two sources; thus $R_{cs} \in \mathrm{TAG}(\mathrm{HR}_2)$.

4.2 $R_{cs}$ with bounded blocks

The characteristic feature of $R_{cs}$ is that edges which are close together in the graph (e.g. two edges in the same block) correspond to symbols that can be distant in the string (e.g. $a$ and $b$ tokens). Projective grammars cannot combine predicates (the $b$'s) and arguments (the $a$'s) directly because of their distance in the string; intuitively, they must keep track of either the $a$'s or the $b$'s for a long time, which cannot be done with a bounded number of sources.

Figure 4: A derivation of ((0), (0, 0), (0), (0, 0)).

Before we exploit this intuition, we first note that its correctness depends on the details of the construction of $R_{cs}$, in particular on the ability to select arbitrary and independent chain lengths $k$ for the different segments. Consider the derivation on the left of Fig. 4 with its projective yield; this is the case where all $k$ are zero, corresponding to the graph shown in Fig. 4(a). We can map the derivation to this graph by applying a suitable linear tree homomorphism into the HR algebra.

Under this homomorphism, some derivation steps evaluate to the same graph regardless of one of their subtrees; the graph value of that subtree is ignored. Thus even if such a subtree of the derivation evaluates to some arbitrary graph, the complete derivation still evaluates to $g(w)$. Some intermediate results are shown on the right of Fig. 4.

If we let $R_0$ be the subset of $R_{cs}$ where all $k$ are zero, we can generalize this construction into an LM-CFTG which generates $R_0$. Thus, $R_0$ can be generated by a projective grammar that is interpreted into an HR algebra with few sources. But note that the derivation in Fig. 4 is unnatural in that the symbols in the string are not generated by the same derivation steps that generate the graph edges that intuitively correspond to them; for instance, the graphs generated for some tokens are completely irrelevant. Below, we prevent unnatural constructions like this in two ways. We will first assume that string symbols and graph edges must be aligned (Thm. 1). Then we will assume that the $k$ can be arbitrary, which allows us to drop the alignment assumption (Thm. 2).

4.3 $k$-distant trees

Let $R$ be some relation containing at least the string-graph pairs of $R_{cs}$, e.g. $R_{cs}$ itself. Assume that $R$ is generated by a projective grammar with $\mathrm{yd}_c$ and a fixed number of sources, i.e. we have $R = R(G, \mathrm{yd}_c, h)$ for some LTH $h$ into $\mathrm{HR}_k$. We will derive a contradiction.

Given a pair $(w, g(w)) \in R_{cs}$, we say that two edges in $g(w)$ are equivalent, $e \sim e'$, if they belong to the same block. We call a derivation tree $t$ $k$-distant if $t$ has a subtree $s$ such that we can find edges $e_1, \dots, e_k$ in $[\![h(s)]\!]$ with $e_i \not\sim e_j$ for all $i \neq j$, and further edges $f_1, \dots, f_k$ outside of $[\![h(s)]\!]$ such that $e_i \sim f_i$ for all $i$. For such trees, we have the following lemma.

Lemma 3.

A $k$-distant tree $t$ has a subtree $s$ such that $[\![h(s)]\!]$ has at least $k$ sources.

Proof 1.

Let $B_i$ be the $i$-th block in $g(w)$; here we do not distinguish between $a$- and $b$-blocks. Let $s$ be the subtree of $t$ claimed by the definition of distant trees. For each $i$, let $E_i$ be the edges in the $i$-th block generated by $s$, and let $\bar{E}_i = B_i \setminus E_i$.

By definition, $E_i$ and $\bar{E}_i$ are both non-empty for at least $k$ blocks. Each of these blocks is weakly connected, and thus contains at least one node $u_i$ which is incident both to an edge in $E_i$ and to an edge in $\bar{E}_i$. This node is a boundary node of the subgraph generated by $s$. Because the $u_i$ are all distinct, it follows from Lemma 2 that $[\![h(s)]\!]$ has at least $k$ sources.

We also note the following lemma about derivations of projective string grammars, which follows from the inability of projective grammars to combine distant tokens. We write $\#_a(t)$ for $\#_a(\mathrm{yd}_c(t))$.

Lemma 4.

Let $(G, \mathrm{yd}_c)$ be a projective string grammar. For any $n$ there exists $N$ such that any $t \in L(G)$ with $\#_a(t) \geq N$ and $\#_b(t) \geq N$ has a subtree $s$ such that $\mathrm{yd}_c(s)$ contains at least $n$ occurrences of $x$ and no occurrences of $y$, for some $\{x, y\} = \{a, b\}$.

4.4 Projectivity and alignments

A consequence of Lemma 3 is that if certain string-graph pairs in $R_{cs}$ can only be expressed with $(k{+}1)$-distant trees, then any relation $R$ which contains these pairs as well is not expressible over $\mathrm{HR}_k$, because $\mathrm{HR}_k$ only admits $k$ sources.

However, as we saw in Section 4.2, pairs in $R_{cs}$ can have unexpected projective derivations which make do with a low number of sources. So let us assume for now that the string grammar and the tree homomorphism produce tokens and edge labels that fit together. Let us call $(G, \mathrm{yd}, h)$ aligned if for all constants $c$, $[\![h(c)]\!]$ is a graph containing a single edge with label $c$. The derivation in Fig. 4 cannot be generated by an aligned grammar because the graph for one of its tokens contains an edge with a different label. We write $\mathcal{G}^{al}(\mathcal{A})$ for the class of string-semantics relations which can be generated with aligned grammars.
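The alignment condition is easy to state operationally. The following Python sketch (names and encoding ours) checks it for a homomorphism given as a mapping from constants to s-graphs in the (edges, sources) encoding used earlier:

    def is_aligned(constant_graphs):
        """Check: every constant's graph is a single edge labeled by the token."""
        for token, (edges, _) in constant_graphs.items():
            if len(edges) != 1:
                return False
            _, label, _ = next(iter(edges))
            if label != token:
                return False
        return True

    h = {"a": ({("u", "a", "v")}, {"root": "u"}),
         "b": ({("u", "b", "v")}, {"root": "u"})}
    print(is_aligned(h))                  # True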

Under this assumption, it is easy to see that any relation including $R_{cs}$ (hence, $R_{cs}$ itself) cannot be expressed with a projective grammar.

Theorem 1.

Let $\mathcal{G}$ be any class of projective string grammars and $R \supseteq R_{cs}$. For any $k$, $R \notin \mathcal{G}^{al}(\mathrm{HR}_k)$.

Proof 2.

Assume that there is a $(G, \mathrm{yd}_c) \in \mathcal{G}$ and an LTH $h$ such that $R = R(G, \mathrm{yd}_c, h)$. Given $n = k + 1$, choose $N$ such that every tree $t$ with $\#_a(t), \#_b(t) \geq N$ has a subtree $s$ such that $\mathrm{yd}_c(s)$ contains at least $n$ occurrences of $x$ and no occurrences of $y$, for some $\{x, y\} = \{a, b\}$. Such an $N$ exists according to Lemma 4. We can choose $(w, g(w)) \in R_{cs}$ such that $\#_a(w) \geq N$ and $\#_b(w) \geq N$.

Because $(G, \mathrm{yd}_c, h)$ is aligned, $[\![h(s)]\!]$ contains no $y$-edge and at least $n$ $x$-edges. Each of these $x$-edges is non-equivalent to all the others, and equivalent to a $y$-edge outside of $[\![h(s)]\!]$, so the derivation is $n$-distant. It follows from Lemma 3 that $[\![h(s)]\!]$ has at least $k + 1$ sources, in contradiction to the assumption that $h$ uses only $k$ sources.

5 Expressive capacity of LM-CFTG

Thm. 1 is a powerful result which shows that $R_{cs}$ cannot be generated by any device for generating projective derivations using bounded placeholder memory – if we can assume that tokens and edges are aligned. We will now drop this assumption and prove that $R_{cs}$ cannot be generated with a fixed set of placeholders using LM-CFTG, regardless of alignment. The basic proof idea is to enforce a weak form of alignment through the interaction of the pumping lemma with very long chains of bar tokens. The result is remarkable in that LM-CFTG and TAG are weakly equivalent; they only differ in whether they must derive the strings projectively or not.

Theorem 2.

$R_{cs} \notin \text{LM-CFTG}(\mathrm{HR}_k)$, for any $k$.

5.1 Asynchronous derivations

Assume that $R_{cs} = R(G, \mathrm{yd}_c, h)$ for some $k$, with $G$ an LM-CFTG and $h$ an LTH into $\mathrm{HR}_k$. Proving that this is a contradiction hinges on a somewhat technical concept of asynchronous derivations, which have to do with how the nodes generating edge labels such as $a$ and $b$ are distributed over a derivation tree. We prove that all asynchronous derivations of certain elements of $R_{cs}$ are distant (Lemma 5), and that all LM-CFTG derivations of $R_{cs}$ are asynchronous (Lemma 6), which proves Thm. 2.

In what follows, let us write, for any tree or context $t$ and symbol $x$, $\#_x(t)$ as a shorthand for $\#_x(\mathrm{yd}_c(t))$, $e_x(t)$ for the number of $x$-edges in $[\![h(t)]\!]$, and $\mu_x(t)$ for the maximum length of a string in $x^*$ which is also a substring of $\mathrm{yd}_c(t)$.

Definition 1 ($(m, n)$-asynchronous derivation).

Let $m, n \in \mathbb{N}$, $x \neq y \in \{a, b\}$, and $t \in L(G)$. We call $t$ an $(m, n)$-asynchronous derivation iff there is a decomposition $t = C[s]$ such that $[\![h(s)]\!]$ contains at most $m$ $x$-edges and at least $n$ $y$-edges. We call the pair $(C, s)$ an $(m, n)$-asynchronous split of $t$.

Lemma 5.

For any $k$, there is a pair $(m, n)$ and a word $w \in L$ such that every $(m, n)$-asynchronous derivation $t$ of $w$ is $k$-distant.

Proof 3.

Given $m$ and $n$, fix a word $w \in L$ whose blocks are sufficiently numerous and large relative to $m$, $n$ and $k$, and let $(w, g(w))$ be the unique element of $R_{cs}$ with this string component.

Let $t$ be an $(m, n)$-asynchronous derivation of $w$ with $x = a$ and $y = b$; the other choice of $x$ and $y$ is analogous. By definition, we can split $t = C[s]$ such that $[\![h(s)]\!]$ has at most $m$ $a$-edges and at least $n$ $b$-edges. Notice first that $[\![h(s)]\!]$ contains only a bounded number of different complete $a$-blocks of $g(w)$, because each $a$-block contains $a$-edges; containing more complete $a$-blocks would require more $a$-edges than $[\![h(s)]\!]$ can contain.

Next, consider the distinct $b$-blocks of $g(w)$. Each of them contains only a bounded number of $b$-edges. Hence, the at least $n$ $b$-edges of $[\![h(s)]\!]$ cannot be contained within only a few distinct blocks.

So we can find sufficiently many $b$-edges in $[\![h(s)]\!]$ which are pairwise non-equivalent. At least $k$ edges among these are equivalent to an edge outside of $[\![h(s)]\!]$, because $[\![h(s)]\!]$ contains only a bounded number of complete blocks of $g(w)$. Thus, $t$ is $k$-distant.

5.2 LM-CFTG derivations are asynchronous

So far, we have not used the assumption that $G$ is an LM-CFTG. We will now exploit the pumping lemma to show that all derivation trees of an LM-CFTG for $R_{cs}$ must be asynchronous.

Lemma 6.

If $G$ is an LM-CFTG, then there exists $m$ such that for every $n$, there exists $(w, g(w)) \in R_{cs}$ such that every derivation $t$ of $w$ is $(m, n)$-asynchronous.

We prove this lemma by appealing to a class of derivation trees in which predicate and argument tokens are generated in separate parts.

Definition 2 ($x$-separated derivation).

Let $x \in \{a, b\}$. A tree $t$ is $x$-separated if we can write $t = C[C'[s]]$ such that the $x$-tokens of $\mathrm{yd}_c(t)$ are generated by the middle part $C'$ of the decomposition. The triple $(C, C', s)$ is called an $x$-separation of $t$. We call an $x$-separation minimal if there is no other $x$-separation of $t$ with a smaller middle part $C'$.

Intuitively, we can use the pumping lemma to systematically remove some contexts from a derivation tree. From the shape of the resulting string, we can conclude certain alignments between the strings and graphs generated by these contexts and establish bounds on the number of $a$- and $b$-edges generated by the lower part of a separated derivation. The full proof is in the appendix; we sketch the main ideas here.

Let $p$ denote the pumping height of $G$. There is a maximal number of string tokens and edges that a context of height at most $p$ can generate under a given yield function and homomorphism. We call this number $M$ in the rest of the proof.

Lemma 7.

For $x \in \{a, b\}$, let $r_x(t)$ be the length of the maximal substring of $\mathrm{yd}_c(t)$ consisting of only $\bar{x}$-tokens and containing the rightmost occurrence of $\bar{x}$ in $\mathrm{yd}_c(t)$. If $t$ is $x$-separated, there exists a minimal $x$-separation $(C, C', s)$ of $t$ such that, letting $u = C'[s]$, the number of $\bar{x}$-tokens generated by $u$ is bounded in terms of $M$ and $r_x(t)$.

Moreover, for any $x$-separation $(C, C', s)$, letting $u = C'[s]$, the number of edges generated by $u$ obeys a corresponding bound.

Proof 4 (sketch).

Both statements are proved by separate inductions on the height of $t$, although they mostly follow similar steps. We therefore focus here only on the crucial parts of the (slightly trickier) bound on the number of $\bar{x}$-tokens. Let $(C, C', s)$ be a minimal $x$-separation of $t$ and $u = C'[s]$.

Base case: If $h(t) \leq p$, then $u$ has height at most $p$ and thus generates at most $M$ $\bar{x}$-tokens, so the bound holds.

Induction step: If $h(t) > p$, we apply Lemma 1 to $t$ to yield a decomposition $t = C_1[C_2[t']]$, where $h(C_2[t']) \leq p$ and $C_2 \neq \bullet$. We first observe that $C_1[t']$ is $x$-separated. By induction, there exists a minimal separation of $C_1[t']$ validating the bound. Because of pumping considerations, we need to distinguish only three configurations of $C_2$ relative to the separation. We present only the most difficult case here.

In this case, the pumped context generates only one kind of bar symbol, $\bar{x}$, and brackets. One needs to examine all possible ways the pumped context, $C'$ and $s$ may overlap. We detail the reasoning in the case where the pumped context does not overlap with $C'$ or $s$. Then, since the relevant $\bar{x}$-tokens are generated by $u$, projectivity of the yield and the definition of $r_x$ impose that the generated $\bar{x}$-tokens contribute to the rightmost $\bar{x}$-chain, i.e. they are counted by $r_x(t)$. Hence the bound holds.

Lemma 8.

For any $t \in L(G)$ and $x \in \{a, b\}$, if $t$ is $x$-separated then $t$ is $(m, n)$-asynchronous for suitable $m$ and $n$.

Proof 5.

By Lemma 7 there is a minimal $x$-separation $(C, C', s)$ such that, for $u = C'[s]$, both the bound on the $\bar{x}$-tokens and the bound on the edges obtain. Observe that, by definition and by these bounds, the lower part of the separation generates only boundedly many $x$-tokens; by projectivity it then generates at most a bounded number of $\bar{x}$-sequences (one sequence of $\bar{x}$-tokens between each occurrence of $x$ and the next, plus possibly one before the first and one after the last). Thus $t$ is $(m, n)$-asynchronous.

Lemma 9.

For any $t \in L(G)$, $t$ is $x$-separated for some $x \in \{a, b\}$.

Proof 6 (sketch).

The proof proceeds by induction on the height of $t$.

If $h(t) \leq p$, then $t$ generates only boundedly many tokens, hence $t$ is trivially $x$-separated for some $x$.

If $h(t) > p$, Lemma 1 yields a decomposition $t = C_1[C_2[t']]$, where $h(C_2[t']) \leq p$ and $C_2 \neq \bullet$. By induction, $C_1[t']$ is $x$-separated for some $x$.