Lexical and Derivational Meaning in Vector-Based Models of Relativisation

11/30/2017 ∙ Michael Moortgat (Utrecht University) and Gijs Wijnholds (Queen Mary University of London)

Sadrzadeh et al. (2013) present a compositional distributional analysis of relative clauses in English in terms of the Frobenius algebraic structure of finite dimensional vector spaces. The analysis relies on distinct type assignments and lexical recipes for subject vs object relativisation. The situation for Dutch is different: because of the verb-final nature of Dutch, relative clauses are ambiguous between a subject vs object relativisation reading. Using an extended version of the Lambek calculus, we present a compositional distributional framework that accounts for this derivational ambiguity, and that allows us to give a single meaning recipe for the relative pronoun, reconciling the Frobenius semantics with the demands of Dutch derivational syntax.


0.1 Introduction

Compositionality, as a structure-preserving mapping from a syntactic source to a target interpretation, is a fundamental design principle both for the set-theoretic models of formal semantics and for syntax-sensitive vector-based accounts of natural language meaning; see [1] for discussion. For typelogical grammar formalisms, to obtain a compositional interpretation we have to specify how the Syn-Sem homomorphism acts on types (basic and complex) and on proofs (derivations, which again are either basic, i.e. axioms, or compound, obtained by inference steps). There is a tension here between lexical and derivational aspects of meaning: the derivational aspects relate to the composition operations associated with the inference steps that put together phrases out of more elementary parts; the atoms for this composition process are the meanings of the lexical constants associated with the axioms of a derivation.

Relative clause structures form a suitable testbed to study the interaction between these two aspects of meaning, and they have been well studied in both the formal and the distributional settings. Informally, a restrictive relative clause (‘books that Alice read’) has an intersective interpretation. In the formal semantics account, this interpretation is obtained by modeling both the head noun (‘books’) and the relative clause body (‘Alice read ␣’) as (characteristic functions of) sets (type $e \to t$); the relative pronoun can then be interpreted as the intersection operation.

In distributional accounts such as [2], full noun phrases and simple common nouns are interpreted in the same semantic space, say N, distinct from the sentence space S. In this setting, element-wise multiplication, which preserves non-null context features, is a natural candidate for an intersective interpretation; in the case at hand this means element-wise multiplication of a vector in N interpreting the head noun with a vector interpretation obtained from the relative clause body. To achieve this effect, [9] rely on the Frobenius algebraic structure of FVect, which provides operations for (un)copying, insertion and deletion of vector information. A key feature of their account is that it relies on structure-specific solutions of the lexical equation: subject and object relative clauses are obtained from distinct type assignments to the relative pronoun (Lambek types $(n\backslash n)/(np\backslash s)$ vs $(n\backslash n)/(s/np)$), associated with distinct instructions for meaning assembly.
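To make the element-wise intuition concrete, here is a minimal NumPy sketch; the vectors and variable names are made up for illustration and are not from the paper.

```python
import numpy as np

# Hypothetical context-count vectors in a 4-dimensional noun space N.
books      = np.array([0.9, 0.0, 0.7, 0.2])   # head noun 'books'
alice_read = np.array([0.5, 0.8, 0.6, 0.0])   # vector for the clause body 'Alice read _'

# Element-wise multiplication keeps a feature only when it is non-null
# in BOTH vectors: the distributional analogue of set intersection.
books_that_alice_read = books * alice_read
print(books_that_alice_read)  # [0.45 0.   0.42 0.  ]
```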

For a language like Dutch, such an account is problematic. Dutch subordinate clause order has the SOV pattern Subj–Obj–TV, i.e. a transitive verb is typed as $np\backslash(np\backslash s)$, selecting its arguments uniformly to the left. As a result, example (1)(a) is ambiguous between a subject vs object relativisation interpretation: it can be translated as either (b) or (c). The challenge here is twofold: at the syntactic level, we have to provide a single type assignment to the relative pronoun that can withdraw either a subject or an object hypothesis from the relative clause body; at the semantic level, we need a uniform meaning recipe for the relative pronoun that properly interacts with the derivational semantics.

(1) (a) mannen die vrouwen haten (ambiguous)
    (b) men who hate women (subject rel)
    (c) men who(m) women hate (object rel)

The paper is structured as follows. In §0.2, we present an extended version of Lambek calculus, and show how it accounts for the derivational ambiguity of Dutch relative clauses. In §0.3.1, we define the interpretation homomorphism that associates syntactic derivations with composition operations in a vector-based semantic model. The derivational semantics thus obtained is formulated at the type level, i.e. it abstracts from the contribution of individual lexical items. In §0.3.2, we bring in the lexical semantics, and show how the Dutch relative pronoun can be given a uniform interpretation that properly interacts with the derivational semantics. The discussion in §0.4 compares the distributional and formal semantics accounts of relativisation.

Figure 1: NL♦. Residuation rules; extraction postulates.

0.2 Syntax

Our syntactic engine is NL♦ [6]: the extension of Lambek’s [3] Syntactic Calculus with an adjoint pair of control modalities $\Diamond, \Box$. The modalities play a role similar to that of the exponentials of linear logic: they allow one to introduce controlled, rather than global, forms of reordering and restructuring. In this paper, we consider the controlled associativity and commutativity postulates of [7]. One pair allows a $\Diamond$-marked formula to reposition itself on left branches of a constituent tree; we use it to model the SOV extraction patterns of Dutch. A symmetric pair would capture the non-local extraction dependencies of an SVO language such as English. Lambek [4] has shown how deductions in a syntactic calculus can be viewed as arrows in a category. Figure 1 presents NL♦ in this format.

For parsing, we want a proof search procedure that doesn’t rely on cut. Consider the rules in Figure 2, expressing the monotonicity properties of the type-forming operations, and recasting the postulates in rule form. It is routine to show that these are derived rules of inference of NL♦. In [8] it is shown that adding them to the residuation rules of Figure 1 yields a system equivalent to a display sequent calculus enjoying cut-elimination. By further restricting to focused derivations, proof search is free of spurious ambiguity.

Figure 2: NL♦. Monotonicity; leftward extraction (rule version).

We are ready to return to our example (1)(a). A type assignment $(n\backslash n)/(\Diamond\Box np\backslash s)$ to the relative pronoun ‘die’ accounts for the derivational ambiguity of the phrase. The derivations agree on the initial steps

(2)

but then diverge in how the relative clause body is derived:

(3)

In the derivation on the left, the $\Diamond\Box np$ hypothesis is linked to the subject argument of the verb; in the derivation on the right, to the object argument, reached via the controlled reordering step.

0.3 From source to target

0.3.1 Derivational semantics

Compositional distributional models are obtained by defining a homomorphism sending the types and derivations of a syntactic source system to their counterparts in a symmetric compact closed category (sCCC); the concrete model for this sCCC is then given by finite dimensional vector spaces (FVect) and (multi)linear maps. Such interpretation homomorphisms have been defined for pregroup grammars, Lambek calculus and CCG in [2, 5]. We here define the interpretation for NL♦, starting out from [10].

Recall first that a compact closed category (CCC) is monoidal, i.e. it has an associative tensor $\otimes$ with unit $I$; and for every object $A$ there is a left adjoint $A^l$ and a right adjoint $A^r$ satisfying

$$\epsilon^l : A^l \otimes A \to I \qquad \eta^l : I \to A \otimes A^l \qquad \epsilon^r : A \otimes A^r \to I \qquad \eta^r : I \to A^r \otimes A$$

In a symmetric CCC, the tensor moreover is commutative, and we can write $A^*$ for the collapsed left and right adjoints.

In the concrete instance of FVect, the unit $I$ stands for the field $\mathbb{R}$; identity maps, composition and tensor product are defined as usual. Since bases of vector spaces are fixed in concrete models, there is only one natural way of defining a basis for a dual space, so that $V^* \cong V$. In concrete models we may therefore collapse the adjoints completely.

The $\epsilon$ map takes inner products, whereas the $\eta$ map (with $I = \mathbb{R}$) introduces an identity tensor, as follows:

$$\epsilon : V \otimes V \to \mathbb{R} \quad \text{given by} \quad \sum_{ij} c_{ij}\,(\vec{v}_i \otimes \vec{v}_j) \mapsto \sum_{ij} c_{ij}\,\langle \vec{v}_i, \vec{v}_j \rangle$$
$$\eta : \mathbb{R} \to V \otimes V \quad \text{given by} \quad 1 \mapsto \sum_i \vec{v}_i \otimes \vec{v}_i$$
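As an illustration (ours, not the paper’s), the two maps can be written down directly in NumPy for a space with a fixed orthonormal basis:

```python
import numpy as np

d = 3  # dimension of the space V, chosen arbitrarily

def epsilon(t):
    """epsilon : V (x) V -> R. With a fixed orthonormal basis, contracting
    the two factors of a coefficient array t (d x d) is its trace."""
    return np.einsum('ii->', t)

def eta(c=1.0):
    """eta : R -> V (x) V. Sends the scalar 1 to the identity tensor
    sum_i v_i (x) v_i, i.e. the d x d identity matrix."""
    return c * np.eye(d)

# Sanity check: epsilon after eta multiplies by the dimension of V.
assert np.isclose(epsilon(eta(1.0)), d)
```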

Interpretation: types

At the type level, the interpretation function $\lceil\cdot\rceil$ assigns a vector space to the atomic types of NL♦; for complex types we set $\lceil \Diamond A \rceil = \lceil \Box A \rceil = \lceil A \rceil$, i.e. the syntactic control operators are transparent for the interpretation. With the adjoints collapsed, the binary type-forming operators are interpreted as

$$\lceil A \otimes B \rceil = \lceil A \rceil \otimes \lceil B \rceil \qquad \lceil A \backslash B \rceil = \lceil A \rceil \otimes \lceil B \rceil \qquad \lceil B / A \rceil = \lceil B \rceil \otimes \lceil A \rceil$$

Interpretation: proofs

From the linear maps interpreting the premises of the inference rules, we want to compute the linear map interpreting the conclusion. Identity and composition are immediate: $\lceil 1_A \rceil = 1_{\lceil A \rceil}$ and $\lceil g \circ f \rceil = \lceil g \rceil \circ \lceil f \rceil$. For the residuation inferences, from the map $\lceil f \rceil : \lceil A \rceil \otimes \lceil B \rceil \to \lceil C \rceil$ interpreting the premise $f : A \otimes B \to C$, we obtain maps interpreting the conclusions $B \to A \backslash C$ and $A \to C / B$:

$$\lceil B \rceil \xrightarrow{\;\eta \otimes 1\;} \lceil A \rceil \otimes \lceil A \rceil \otimes \lceil B \rceil \xrightarrow{\;1 \otimes \lceil f \rceil\;} \lceil A \rceil \otimes \lceil C \rceil \qquad\qquad \lceil A \rceil \xrightarrow{\;1 \otimes \eta\;} \lceil A \rceil \otimes \lceil B \rceil \otimes \lceil B \rceil \xrightarrow{\;\lceil f \rceil \otimes 1\;} \lceil C \rceil \otimes \lceil B \rceil$$

For the inverses, from maps $\lceil g \rceil : \lceil B \rceil \to \lceil A \rceil \otimes \lceil C \rceil$ and $\lceil h \rceil : \lceil A \rceil \to \lceil C \rceil \otimes \lceil B \rceil$ for the premises $g : B \to A \backslash C$ and $h : A \to C / B$, we obtain

$$\lceil A \rceil \otimes \lceil B \rceil \xrightarrow{\;1 \otimes \lceil g \rceil\;} \lceil A \rceil \otimes \lceil A \rceil \otimes \lceil C \rceil \xrightarrow{\;\epsilon \otimes 1\;} \lceil C \rceil \qquad\qquad \lceil A \rceil \otimes \lceil B \rceil \xrightarrow{\;\lceil h \rceil \otimes 1\;} \lceil C \rceil \otimes \lceil B \rceil \otimes \lceil B \rceil \xrightarrow{\;1 \otimes \epsilon\;} \lceil C \rceil$$

Monotonicity. The case of parallel composition is immediate: $\lceil f \otimes g \rceil = \lceil f \rceil \otimes \lceil g \rceil$. For the slash cases, from $\lceil f \rceil : \lceil A \rceil \to \lceil B \rceil$ and $\lceil g \rceil : \lceil C \rceil \to \lceil D \rceil$, we obtain a map $\lceil B \rceil \otimes \lceil C \rceil \to \lceil A \rceil \otimes \lceil D \rceil$ interpreting $B \backslash C \to A \backslash D$ by first inserting $\lceil A \rceil \otimes \lceil A \rceil$ with $\eta$, then applying $\lceil f \rceil$ and $\lceil g \rceil$, and finally contracting the resulting $\lceil B \rceil \otimes \lceil B \rceil$ pair with $\epsilon$; the case of $C / B \to D / A$ is symmetric.

Interpretation of the extraction structural rules is obtained via the standard associativity and symmetry maps of FVect, $\alpha_{U,V,W} : (U \otimes V) \otimes W \to U \otimes (V \otimes W)$ and $\sigma_{U,V} : U \otimes V \to V \otimes U$; similarly for the rightward extraction rules.

Simplifying the interpretation

Whereas the syntactic derivations of NL♦ proceed in cut-free fashion, the interpretation of the inference rules given above introduces detours (sequential compositions of maps) that can be removed. We use a generalised notion of Kronecker delta, together with Einstein summation notation, to concisely express the fact that the interpretation of a derivation is fully determined by the identity maps that interpret its axiom leaves, realised as identity matrices or identity tensors depending on their (co)domain signature.

Recall that vectors and linear maps over the real numbers can be equivalently expressed as (multi-dimensional) arrays of numbers. The essential information one needs to keep track of are the coefficients of the tensor: for a vector $v$ we write $v_i$ (with $i$ ranging from $1$ to the dimension of the space), an $n \times m$ matrix $A$ is expressed as $A_{ij}$, an $n \times m \times k$ cube $B$ as $B_{ijk}$, with the indices each time ranging over the dimensions. The Einstein summation convention on indices then states that in an expression involving multiple tensors, indices occurring once give rise to a tensor product, whereas indices occurring twice are contracted. Without explicitly writing a tensor product $\otimes$, the tensor product of a vector and a matrix thus can be written as $v_i A_{jk}$; the inner product between vectors is $v_i w_i$. Matrix application is rendered as $A_{ij} v_j$, i.e. the contraction happens over the second dimension of $A$ and the single dimension of $v$. For tensors of arbitrary rank we use uppercase to refer to lists of indices: we write a tensor as $T_I$. Tensor application then becomes $T_{IJ}\, u_J = t_I$, for some tensor $t$ of lower rank.
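These conventions map directly onto NumPy’s einsum; a small illustrative sketch (the arrays are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
v, w = rng.random(3), rng.random(3)
A, B = rng.random((3, 4)), rng.random((4, 5))

# Index occurring once -> tensor product: (v_i A_jk) has indices i, j, k.
vA = np.einsum('i,jk->ijk', v, A)

# Index occurring twice -> contraction: the inner product v_i w_i.
inner = np.einsum('i,i->', v, w)

# Matrix application A_ij u_j: contract over the second dimension of A.
u = rng.random(4)
Au = np.einsum('ij,j->i', A, u)

# Matrix composition (A B)_ik = A_ij B_jk.
AB = np.einsum('ij,jk->ik', A, B)
assert np.allclose(AB, A @ B)
```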

The identity matrix is given by the Kronecker delta (left), the identity tensor by its generalisation to index lists (right):

$$\delta_i^j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad\qquad \delta_I^J = \begin{cases} 1 & \text{if } I = J \\ 0 & \text{otherwise} \end{cases}$$

The attractive property of the (generalised) Kronecker delta is that it expresses unification of indices: $\delta_i^j\, v_i = v_j$, which is simply a renaming of the index; the inner product can be computed by $\delta_i^j\, v_i\, w_j = v_i\, w_i$. Left on its own, it is simply an identity matrix/tensor.

With the Kronecker delta, the composition of matrices is expressible as $\delta_j^k\, A_{ij} B_{kl}$, which is the same as $A_{ij} B_{jl}$ (or $A_{ik} B_{kl}$). We can show that the order of composition is irrelevant: $(\delta_j^k\, A_{ij})\, B_{kl} = A_{ik} B_{kl} = A_{ij} B_{jl} = A_{ij}\, (\delta_j^k\, B_{kl})$.

The special case of the tensor product of generalised Kronecker deltas is given by concatenating the index lists:

$$\delta_I^J \otimes \delta_K^L = \delta_{IK}^{JL}$$

expressing the fact that $1_V \otimes 1_W = 1_{V \otimes W}$.

Since the generalised Kronecker delta is able to do renaming, take inner products, and insert an identity tensor, depending on the number of arguments placed behind it, it represents precisely the maps discussed above. The interpretation can therefore be simplified: we label the proof system (with formulas already interpreted) with these generalised Kronecker deltas. The effect of the residuation rules and the structural rules is only to change the (co)domain signature of a Kronecker delta, whereas the rules for axioms and monotonicity also act on the Kronecker delta itself; the annotated rules are given in Figure 4 of the appendix.

In Appendix .6 we show that this labelling is correct for the general interpretation of proofs in §0.3.1.

0.3.2 Lexical semantics

For the general interpretation of types and proofs given above, a proof $f : A \to B$ is interpreted as a linear map $\lceil f \rceil$ sending an element belonging to $\lceil A \rceil$, the semantic space interpreting $A$, to an element of $\lceil B \rceil$. The map is expressed at the general level of types, and completely abstracts from lexical semantics. For the computation of concrete interpretations, we have to bring in the meaning of the lexical items. This means applying the map $\lceil f \rceil$ to the tensor product of the word meanings making up the phrase under consideration, an element of $\lceil A \rceil$, to obtain a meaning in $\lceil B \rceil$, the semantic space interpreting the goal formula.

With the index notation introduced above, $\lceil f \rceil$ is expressed in the form of a generalised Kronecker delta, which is applied to the tensor product of the word meanings in index notation to produce the final meaning in $\lceil B \rceil$. In (4) we illustrate with the interpretation of some proofs derived from the same axiom leaves, $np \to np$ and $s \to s$. Assuming $\lceil np \rceil = N$ and $\lceil s \rceil = S$, these correspond to identity maps on N and S. We use the convention that the formula components of the endsequent are labelled in alphabetical order; the correct indexing for the Kronecker delta is obtained by working back to the axiom leaves.

(4)

(4)(a) expresses a linear map that, given its (co)domain signature, is in fact the identity map. (4)(b) computes a vector, and in (4)(c) we arrive at a tensor interpretation. Note that the tensor product symbol is written explicitly.

In the case of our relative clause example (1), the derivational ambiguity of (3) gives rise to two ways of obtaining a vector in N. They differ in whether the index of the $\Diamond\Box np$ hypothesis in the relative pronoun type contracts with the index for the subject argument of the verb (5) or with the direct object index (6).

(5)
(6)

The picture in Figure 3 expresses this graphically.

Figure 3: Matching diagrams for the Dutch derivational ambiguity of mannen die vrouwen haten: object relative (top) versus subject relative (bottom).

Open class items vs function words

For open class lexical items, concrete meanings are obtained distributionally. For function words, here the relative pronoun, it makes more sense to assign an interpretation independent of distributions. To capture the intersective interpretation of restrictive relative clauses, Sadrzadeh et al. [9] propose to interpret the relative pronoun by a map that extracts a vector in the noun space from the relative clause body, and then combines this by elementwise multiplication with the vector for the head noun. Their account depends on the identification $\lceil np \rceil = \lceil n \rceil = N$: noun phrases and simple common nouns are interpreted in the same space. It expresses the desired meaning recipe for the relative pronoun with the aid of (some of) the Frobenius operations that are available in a compact closed category:

(7)

In the case of FVect, $\Delta$ takes a vector and places its values on the diagonal of a square matrix, whereas $\mu$ extracts the diagonal from a square matrix. The $\iota$ and $\zeta$ maps respectively sum the coefficients of a vector, or introduce a vector with the value 1 for all of its coefficients.

$$\Delta : N \to N \otimes N \quad \text{given by} \quad \vec{v}_i \mapsto \vec{v}_i \otimes \vec{v}_i$$
$$\mu : N \otimes N \to N \quad \text{given by} \quad \vec{v}_i \otimes \vec{v}_j \mapsto \delta_i^j\, \vec{v}_i$$
$$\iota : N \to \mathbb{R} \quad \text{given by} \quad \vec{v}_i \mapsto 1$$
$$\zeta : \mathbb{R} \to N \quad \text{given by} \quad 1 \mapsto \sum_i \vec{v}_i$$
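A minimal NumPy sketch of the four operations over a fixed basis (our illustration; the function names are ours):

```python
import numpy as np

def Delta(v):
    """Delta : N -> N (x) N, place the coefficients of v on the diagonal."""
    return np.diag(v)

def mu(M):
    """mu : N (x) N -> N, extract the diagonal of a square matrix."""
    return np.diagonal(M).copy()

def iota(v):
    """iota : N -> R, sum the coefficients of v."""
    return v.sum()

def zeta(dim):
    """zeta : R -> N, the vector with 1 at every coordinate."""
    return np.ones(dim)

v = np.array([1.0, 2.0, 3.0])
assert np.allclose(mu(Delta(v)), v)   # extracting the diagonal undoes Delta
assert iota(zeta(3)) == 3.0           # summing the all-ones vector gives dim N
```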

The analysis of [9] uses a pregroup syntax and addresses relative clauses in English. It relies on distinct pronoun types for subject and object relativisation. In the subject relativisation case, the pronoun lives in the space $N \otimes N \otimes S \otimes N$, corresponding to $n^r\, n\, s^l\, np$, the pregroup translation of the Lambek type $(n\backslash n)/(np\backslash s)$; for object relativisation, the pronoun lives in $N \otimes N \otimes N \otimes S$, corresponding to $n^r\, n\, np^{ll}\, s^l$, the pregroup translation of $(n\backslash n)/(s/np)$.

For the case of Dutch, the homomorphism of §0.3.1 sends the relative pronoun type $(n\backslash n)/(\Diamond\Box np\backslash s)$ to the space $N \otimes N \otimes N \otimes S$. This means we can import the pronoun interpretation for that space from [9], which now produces both the subject and the object relativisation interpretation through its interaction with the derivational semantics.

(8)

Intuitively, the recipe (8) says that the pronoun consists of a cube (in $N \otimes N \otimes N$) which has 1 on its diagonal and 0 elsewhere, together with a vector in the sentence space S with all its entries 1. Substituting this lexical recipe in the tensor contraction equations of (5) and (6) yields the desired final semantic values (9) and (10) for subject and object relativisation respectively. We write $\odot$ for elementwise multiplication; the summation over the S dimension reduces the rank-3 interpretation of the verb to a rank-2 matrix in $N \otimes N$, with rows for the verb’s object, columns for the subject. This matrix is applied to the vector for ‘vrouwen’ either forward in (10), where ‘vrouwen’ plays the subject role, or backward in (9), before being elementwise multiplied with the vector for ‘mannen’.

(9)
(10)
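To make the two readings concrete, here is an illustrative NumPy sketch of (9) and (10); the vectors, the toy dimensions, and the index ordering of the verb tensor (object, subject, S) are our assumptions, chosen to match the rows-for-object, columns-for-subject convention above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, s_dim = 4, 2                          # toy dimensions for N and S

mannen  = rng.random(n_dim)                  # head noun vector (made up)
vrouwen = rng.random(n_dim)
haten   = rng.random((n_dim, n_dim, s_dim))  # rank-3 verb tensor; assumed
                                             # index order (object, subject, S)

# The all-ones sentence vector of recipe (8) sums out the S dimension,
# reducing the verb to a matrix V in N (x) N (rows: object, cols: subject).
V = haten.sum(axis=2)

# Subject relativisation (9): 'vrouwen' is the object, so the matrix is
# applied backward before elementwise multiplication with 'mannen'.
subj_reading = mannen * (vrouwen @ V)

# Object relativisation (10): 'vrouwen' is the subject, so the matrix is
# applied forward.
obj_reading = mannen * (V @ vrouwen)
```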

Returning to English, notice that the pregroup type assignment for object relativisation in [9] is restricted to cases where the ‘gap’ in the relative clause body occupies the final position. To cover non-subject relativisation patterns in general, also with respect to positions internal to the relative clause body, we would use an $(n\backslash n)/(s/\Diamond\Box np)$ type for the pronoun, together with the rightward extraction postulates of Figure 1. For English subject relativisation, the simple pronoun type $(n\backslash n)/(np\backslash s)$ will do, as this pattern doesn’t require any structural reasoning.

0.4 Discussion

We briefly compare the distributional and the formal semantics accounts, highlighting their similarities. In the formal semantics account, the interpretation homomorphism sends syntactic types to their semantic counterparts. Syntactic types are built from atoms, for example $s$, $np$, $n$ for sentences, noun phrases and common nouns; assuming semantic atoms $e$ and $t$ and function types built from them, one can set $\lceil s \rceil = t$, $\lceil np \rceil = e$, $\lceil n \rceil = e \to t$, and $\lceil A \backslash B \rceil = \lceil B / A \rceil = \lceil A \rceil \to \lceil B \rceil$. Each semantic type $\tau$ is assigned an interpretation domain $D_\tau$, with $D_e = E$ for some non-empty set $E$ (the discussion domain), $D_t = \{0, 1\}$ (truth values), and $D_{\tau \to \tau'}$ the functions from $D_\tau$ to $D_{\tau'}$.

In this setup, a syntactic derivation $A_1, \ldots, A_n \Rightarrow B$ is interpreted by means of a linear lambda term of type $\lceil B \rceil$, with parameters of type $\lceil A_1 \rceil, \ldots, \lceil A_n \rceil$; linearity results from the fact that the syntactic source doesn’t provide the copying/deletion operations associated with the structural rules of Contraction and Weakening.

As in the distributional model discussed here, the proof term is an instruction for meaning assembly that abstracts from lexical semantics. In (11) below, one finds the proof terms for English subject (a) and object (b) relativisation. One parameter stands for the head noun, one for the verb, and two for its object and subject arguments; the parameter for the relative pronoun has type $(e \to t) \to (e \to t) \to e \to t$.

(11)

To obtain the interpretation of ‘men who hate women’ vs ‘men who(m) women hate’, one substitutes lexical meanings for the parameters of the proof terms. In the case of the open class items ‘men’, ‘hate’, ‘women’, these will be non-logical constants with an interpretation depending on the model. For the relative pronoun, we substitute an interpretation independent of the model, expressed in terms of the logical constant $\wedge$, leading to the final interpretations in (13), after normalisation.

(12)
(13)
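For concreteness, a sketch of the standard intersective pronoun entry in the lambda calculus; this is our reconstruction of the shape of (12) and (13), not a verbatim quote:

```latex
% Lexical recipe for the relative pronoun (standard intersective entry):
\[
  \mathit{who} \;=\; \lambda P.\,\lambda Q.\,\lambda x.\, (Q\,x) \wedge (P\,x)
\]
% Substituting and beta-reducing for `men who hate women' (with the verb
% taking its object argument first) gives, after normalisation:
\[
  \lambda x.\, (\mathit{men}\,x) \wedge (\mathit{hate}\,\mathit{women}\,x)
\]
```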

Notice that the lexical meaning recipe for the relative pronoun goes beyond linearity: to express the set intersection interpretation, the bound variable is copied over the conjuncts of $\wedge$. By encapsulating this copying operation in the lexical semantics, one avoids compromising the derivational semantics. In this respect, the formal semantics account makes the same design choice regarding the division of labour between derivational and lexical semantics as the distributional account, where the extra expressivity of the Frobenius operations is called upon to specify the lexical meaning recipe for the relative pronoun.

0.5 Acknowledgments

We thank Giuseppe Greco for comments on an earlier version. The second author would also like to thank Mehrnoosh Sadrzadeh for the many discussions on compositional distributional semantics and Frobenius operations, and Rob Klabbers for his interesting remarks on index notation. The second author gratefully acknowledges support by a Queen Mary Principal’s Research Studentship, the first author the support of the Netherlands Organisation for Scientific Research (NWO, Project 360-89-070, A composition calculus for vector-based semantic modelling with a localization for Dutch).

References

  • [1] Marco Baroni, Raffaela Bernardi, and Roberto Zamparelli. Frege in Space: a Program for Compositional Distributional Semantics. Linguistic Issues in Language Technology, 9:241–346, 2014.
  • [2] Bob Coecke, Edward Grefenstette, and Mehrnoosh Sadrzadeh. Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams for Lambek Calculus. Annals of Pure and Applied Logic, 164(11):1079–1100, 2013.
  • [3] Joachim Lambek. On the Calculus of Syntactic Types. In Roman Jakobson, editor, Structure of Language and its Mathematical Aspects, volume XII of Proceedings of Symposia in Applied Mathematics, pages 166–178. American Mathematical Society, 1961.
  • [4] Joachim Lambek. Categorial and Categorical Grammars. In Richard T. Oehrle, Emmon Bach, and Deirdre Wheeler, editors, Categorial Grammars and Natural Language Structures, volume 32 of Studies in Linguistics and Philosophy, pages 297–317. Reidel, 1988.
  • [5] Jean Maillard, Stephen Clark, and Edward Grefenstette. A Type-Driven Tensor-Based Semantics for CCG. In Proceedings of the Type Theory and Natural Language Semantics Workshop, pages 46–54. EACL, 2014.
  • [6] Michael Moortgat. Multimodal Linguistic Inference. Journal of Logic, Language and Information, 5(3-4):349–385, 1996.
  • [7] Michael Moortgat. Constants of Grammatical Reasoning. In G. Bouma, E. Hinrichs, G.-J. Kruijff, and R.T. Oehrle, editors, Constraints and Resources in Natural Language Syntax and Semantics, pages 195–219. CSLI, 1999.
  • [8] Michael Moortgat and Richard Moot. Proof Nets for the Lambek-Grishin Calculus. In Chris Heunen, Mehrnoosh Sadrzadeh, and Edward Grefenstette, editors, Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse, pages 283–320. Oxford University Press, 2013.
  • [9] Mehrnoosh Sadrzadeh, Stephen Clark, and Bob Coecke. The Frobenius Anatomy of Word Meanings I: Subject and Object Relative Pronouns. Journal of Logic and Computation, 23(6):1293–1317, 2013.
  • [10] Gijs Wijnholds. Categorical Foundations for Extended Compositional Distributional Models of Meaning. MSc thesis, Universiteit van Amsterdam, 2014.

.6 Simplifying the Interpretation

The simplification of section 0.3.1 uses generalised Kronecker deltas to interpret the proof terms of the proof system, leading to a relabelling of the proof system with its formulas interpreted. The rules that change the generalised Kronecker delta are shown in section 0.3.1; the full system is shown in Figure 4. In this appendix we show that the simplification holds.

Figure 4: NL♦. Rules annotated with their generalised Kronecker deltas.

Analogous to $\lceil f \rceil$, we write $\delta(f)$ for the generalised Kronecker delta associated with proof term $f$. We define the expressions of a compact closed category for generalised Kronecker deltas. Then we show that for any proof $f$ we have that $\lceil f \rceil = \delta(f)$. Proving this is done by induction over the size of proofs. The crucial point is that the composition of two generalised Kronecker deltas is determined by their domain and codomain.

The CCC structure of generalised Kronecker deltas

To give a generalised Kronecker delta an interpretation as a map, we need to give its domain and codomain. We will write $\delta_I^J : V \to W$ for a generalised Kronecker delta, to indicate that $V$ is the domain, $W$ the codomain, and moreover that concrete tensors in $V$ will be labelled with the list $I$, and output tensors will be labelled with $J$. Writing $I \cdot K$ for list concatenation and $\pi(I)$ for any permutation of list $I$, we then assume that $\delta_I^J = \delta_{\pi(I)}^{\pi(J)}$. We can now go on and define the maps of a compact closed category in generalised Kronecker delta form.

Note the way generalised Kronecker deltas are rewritten: a generalised Kronecker delta has pairs of indices on top and bottom that are linked. Whenever an expression has an index occurring twice, a rewrite is done: let $(i, j)$ and $(k, l)$ be two pairs of indices, on top and on the bottom, that have an index in common, say $j = k$. Then we remove $j$ from the pair $(i, j)$, and replace the pair $(k, l)$ by $(i, l)$. This lowers the rank of the generalised Kronecker delta by 2, which is in line with the idea of the tensor contraction that is to be performed over the common index. This generalises to lists of indices: writing $(I, J)$ and $(K, L)$ for lists of index pairs, the first from the top and the second from the bottom, if we have $J = K$ we can immediately remove $J$ and replace $(K, L)$ by $(I, L)$. The rewriting continues until there are only unique indices left. Write $\delta^*$ for the generalised Kronecker delta obtained by rewrites from the original $\delta$. Then, for two generalised Kronecker deltas, we get the relation $\delta_I^J\, \delta_K^L \leadsto (\delta_I^J\, \delta_K^L)^*$.

In particular this means for $\delta_I^J$ and $\delta_K^L$ with no indices in common that $(\delta_I^J\, \delta_K^L)^* = \delta_I^J\, \delta_K^L$.

Another special case is when we have $\delta_I^J\, \delta_J^L$, where $J$ occurs in both deltas but nowhere else. Then we remove both occurrences of $J$ and, for each index in $I$ and $L$, substitute the corresponding index:

$$(\delta_I^J\, \delta_J^L)^* = \delta_I^L$$

When $I$ and $L$ have no elements in common, the right-hand side is already fully rewritten, allowing us to drop the asterisk. We use these properties in the definition below for the tensor product and composition of generalised Kronecker deltas, and in the proof in the next paragraph.
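The rewrite behaviour can be checked mechanically; a small NumPy sketch (ours) treating the linking delta as an explicit identity matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.random((3, 4)), rng.random((4, 5))
delta = np.eye(4)  # Kronecker delta linking A's column index to B's row index

# Composing through an explicit delta and rewriting (contracting) ...
via_delta = np.einsum('ij,jk,kl->il', A, delta, B)
# ... gives the same result as contracting the shared index directly.
assert np.allclose(via_delta, np.einsum('ij,jl->il', A, B))

# A delta on its own only renames: it acts as the identity map.
v = rng.random(4)
assert np.allclose(np.einsum('ij,j->i', np.eye(4), v), v)
```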

Definition 1.

The maps in FVect are defined in terms of generalised Kronecker deltas according to the list below:

  1. For any vector space $V$ of rank $r$, the identity map $1_V : V \to V$ is given by the generalised Kronecker delta $\delta_I^J$, with index lists $I$, $J$ of length $r$.

    On an element $v \in V$, represented in index notation by $v_I$, we get simply a renaming because $\delta_I^J\, v_I = v_J$.

  2. For any vector space $V$ of rank $r$, the map $\epsilon : V \otimes V \to \mathbb{R}$ is given by the generalised Kronecker delta $\delta_I^J$, with $I$ and $J$ of length $r$, both now indexing the domain.

    For two elements $v$ and $w$ represented by $v_I$ and $w_J$, we get the inner product between $v$ and $w$: $\delta_I^J\, v_I\, w_J = v_I\, w_I$.

  3. For any vector space $V$ of rank $r$, the map $\eta : \mathbb{R} \to V \otimes V$ is given by $\delta_I^J$, with $I$ and $J$ of length $r$, both now indexing the codomain.

    It is given no elements to juxtapose with, and thus simply gives the identity matrix on $V$.

  4. Composition. Given two maps (left) and their generalised Kronecker delta representation (right)

    $$f : U \to V, \quad g : V \to W \qquad\qquad \delta_I^J : U \to V, \quad \delta_K^L : V \to W$$

    their composition is represented by

    $$\delta(g \circ f) = (\delta_I^J\, \delta_J^K\, \delta_K^L)^*$$

    We insert the expression $\delta_J^K$ for the composition, exactly to identify the indices in the codomain of $f$ ($J$) with the indices in the domain of $g$ ($K$). Since $\delta_I^J$ and $\delta_K^L$ have no indices in common (this may be assumed without loss of generality), we have $(\delta_I^J\, \delta_K^L)^* = \delta_I^J\, \delta_K^L$, but since $J$ occurs in $\delta_I^J$ and $K$ occurs in $\delta_K^L$, we will have a sequence of rewrites to do and so we get $(\delta_I^J\, \delta_J^K\, \delta_K^L)^* = \delta_I^L$.

  5. Tensor product. Given two maps (left) and their generalised Kronecker delta representation (right)

    $$f : U \to V, \quad g : W \to X \qquad\qquad \delta_I^J : U \to V, \quad \delta_K^L : W \to X$$

    their tensor product is represented by

    $$\delta(f \otimes g) = \delta_{IK}^{JL}$$

    Without loss of generality we may assume that $\delta_I^J$ and $\delta_K^L$ have no indices in common (if they had, we could rename them). Since $\delta_I^J : U \to V$ and $\delta_K^L : W \to X$, we also have that $\delta_{IK}^{JL} : U \otimes W \to V \otimes X$. And since the two deltas have no indices in common, juxtaposing them gives $\delta_I^J\, \delta_K^L = \delta_{IK}^{JL}$.

  6. Associativity. Since the tensor product is associative on vectors, the associativity maps disappear in index notation. For vector spaces $U, V, W$ of rank $r_1, r_2, r_3$ respectively, the associativity map $\alpha : (U \otimes V) \otimes W \to U \otimes (V \otimes W)$ is represented as

    $$\delta_{IJK}^{LMN}$$

    where $I, L$ have length $r_1$, $J, M$ have length $r_2$ and $K, N$ have length $r_3$. This acts simply as an identity map: on elements $u_I\, v_J\, w_K$, we get

    $$\delta_{IJK}^{LMN}\, u_I\, v_J\, w_K = u_L\, v_M\, w_N$$

    which is simply a renaming of the input. The inverse associativity map is represented exactly the same and again works as an identity map.

  7. Symmetry. The tensor product is not commutative on vectors, but the generalised Kronecker delta for the symmetry map $\sigma : U \otimes V \to V \otimes U$, on vector spaces of rank $r_1, r_2$ respectively, performs an identity. The order of evaluation is given by the switch in indices between input and output. So $\sigma$ is represented by

    $$\delta_{IJ}^{ML}$$

    with $I, L$ of length $r_1$ and $J, M$ of length $r_2$. On an input $u_I\, v_J$, we get

    $$\delta_{IJ}^{ML}\, u_I\, v_J = v_M\, u_L$$

    but here the order of the indices in the codomain dictates that the elements of $u$ are placed after the elements of $v$.

Reducing the interpretation

With the translation of maps of FVect in terms of generalised Kronecker deltas, we are ready to state our claim.

Theorem 1.

For any proof $f$, we have that $\lceil f \rceil = \delta(f)$.

Proof.

By induction over the size of proofs. The base case is that of the axiom, for which we have $\lceil 1_A \rceil = 1_{\lceil A \rceil}$. The identity map is represented by the generalised Kronecker delta $\delta_I^J$ of Definition 1, so that $\lceil 1_A \rceil = \delta(1_A)$.