Density Matrices for Derivational Ambiguity

08/20/2019 ∙ by A. D. Correia, et al. ∙ Utrecht University

Recent work on vector-based compositional natural language semantics has proposed the use of density matrices to model lexical ambiguity and (graded) entailment (e.g. Piedeleu et al 2015, Bankova et al 2016, Sadrzadeh et al 2018). Ambiguous word meanings, in this work, are represented as mixed states, and the compositional interpretation of phrases out of their constituent parts takes the form of a strongly monoidal functor sending the derivational morphisms of a pregroup syntax to linear maps in FdHilb. Our aims in this paper are twofold. First, we replace the pregroup front end by a Lambek categorial grammar with directional implications expressing a word's selectional requirements. By the Curry-Howard correspondence, the derivations of the grammar's type logic are associated with terms of the (ordered) linear lambda calculus; these terms can be read as programs for compositional meaning assembly with density matrices as the target semantic spaces. Second, we use the density matrix spaces to model the ubiquitous derivational ambiguity of natural language syntax, opening up the possibility of an integrated treatment of lexical and derivational forms of ambiguity.

1 Lambek Calculus: proofs as programs

With his papers [2, 3], Jim Lambek initiated the ‘parsing as deduction’ method in computational linguistics: words are assigned formulas of a type logic designed to reason about grammatical composition; the judgement whether a phrase is well-formed is the outcome of a process of deduction in that type logic. Lambek’s original work was on a calculus of syntactic types. Van Benthem [9] added semantics to the equation with his work on LP, a commutative version of the Lambek calculus, which in retrospect turns out to be a precursor of (multiplicative intuitionistic) linear logic. LP is a calculus of semantic types. Under the Curry-Howard ‘formulas-as-types’ approach, proofs in LP are in correspondence with terms of the (linear) lambda calculus; these terms can be seen as programs for compositional meaning assembly. To establish the connection between syntax and semantics, the Lambek-Van Benthem framework relies on a homomorphism sending types and proofs of the syntactic calculus to their semantic counterparts.

Figure 1: Proofs as programs for (N)L (typing rules).

In this paper, we want to keep the distinction between the left and right implications of the syntactic calculus in the vector-based interpretation we aim for. To do so, we turn to the directional lambda calculus of [10]; its Curry-Howard correspondence with proofs of (N)L is given in Fig. 1. With L we refer to the simply typed (implicational) fragment of Lambek’s [2] associative syntactic calculus, which assigns types to strings; NL is the non-associative version of [3], where types are assigned to phrases (bracketed strings).²

² Neither of these calculi is satisfactory for modelling natural language syntax. What we report on, then, is a first step towards extended calculi that can handle the well-documented problems of over/undergeneration of (N)L in a principled way.

The presentation of Fig. 1 is in the sequent-style natural deduction format. The formula language has atomic types (say s, np, n for sentences, noun phrases, common nouns) for complete expressions and implicational types A\B, B/A for incomplete expressions, selecting an argument A to the left resp. right to form a B. Ignoring the term labeling, judgments are of the form Γ ⊢ B, where the antecedent Γ is a non-empty list (for L) or bracketed list (NL) of formulas, and the succedent a single formula B. For each of the type-forming operations, there is an Introduction rule and an Elimination rule.

Turning to the Curry-Howard encoding of NL proofs, we introduce a language of directional lambda terms, with variables as atomic expressions, left and right abstraction, and left and right application. The inference rules now become typing rules for these terms, with judgments of the form x₁ : A₁, …, xₙ : Aₙ ⊢ M : B. The antecedent is a typing environment providing type declarations for the variables xᵢ; a proof constructs a program M of type B out of these variables. In the absence of the structural rules of Contraction and Weakening, the program M contains each of x₁, …, xₙ as a free variable exactly once, and in that order. Intuitively, one can see a term-labelled proof as an algorithm to compute a meaning of type B with parameters of types A₁, …, Aₙ. In parsing a particular phrase, one substitutes the meanings of the constants (i.e. words) that make it up for the parameters of this algorithm.
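The ordering discipline just described can be made concrete. The following is a minimal Python sketch, not the authors' implementation, of a type checker for directional lambda terms: it infers the type of a term from an ordered environment and fails unless every declared variable occurs free exactly once, in order. The class names (`Right`, `Left`, `Var`, `RApp`, `LApp`, `RAbs`, `LAbs`) and the type-annotated binders are our own hypothetical encoding; the environment is a flat list, so the sketch checks the associative calculus L and ignores NL bracketing.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Syntactic types: atoms are strings; Right(A, B) encodes A/B, Left(B, A) encodes B\A.
@dataclass(frozen=True)
class Right:
    res: "Ty"
    arg: "Ty"

@dataclass(frozen=True)
class Left:
    arg: "Ty"
    res: "Ty"

Ty = Union[str, Right, Left]

# Directional lambda terms (hypothetical encoding).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class RApp:    # M(N): right application, /E
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class LApp:    # (N)M: left application, \E
    arg: "Term"
    fun: "Term"

@dataclass(frozen=True)
class RAbs:    # right abstraction, /I: binds the rightmost free variable
    var: str
    ty: Ty
    body: "Term"

@dataclass(frozen=True)
class LAbs:    # left abstraction, \I: binds the leftmost free variable
    var: str
    ty: Ty
    body: "Term"

Term = Union[Var, RApp, LApp, RAbs, LAbs]
Env = List[Tuple[str, Ty]]

def free_vars(t: Term) -> List[str]:
    """Free variables of t, in left-to-right order."""
    if isinstance(t, Var):
        return [t.name]
    if isinstance(t, RApp):
        return free_vars(t.fun) + free_vars(t.arg)
    if isinstance(t, LApp):
        return free_vars(t.arg) + free_vars(t.fun)
    vs = free_vars(t.body)
    if isinstance(t, RAbs):
        if not vs or vs[-1] != t.var:
            raise TypeError(f"{t.var} is not the rightmost free variable")
        return vs[:-1]
    if not vs or vs[0] != t.var:
        raise TypeError(f"{t.var} is not the leftmost free variable")
    return vs[1:]

def infer(t: Term, env: Env) -> Ty:
    """Type of t, using every variable of env exactly once and in order."""
    if free_vars(t) != [name for name, _ in env]:
        raise TypeError("free variables do not match the environment in order")
    if isinstance(t, Var):
        return env[0][1]
    if isinstance(t, RApp):
        k = len(free_vars(t.fun))
        f, a = infer(t.fun, env[:k]), infer(t.arg, env[k:])
        if isinstance(f, Right) and f.arg == a:
            return f.res
        raise TypeError("ill-typed right application")
    if isinstance(t, LApp):
        k = len(free_vars(t.arg))
        a, f = infer(t.arg, env[:k]), infer(t.fun, env[k:])
        if isinstance(f, Left) and f.arg == a:
            return f.res
        raise TypeError("ill-typed left application")
    if isinstance(t, RAbs):
        return Right(infer(t.body, env + [(t.var, t.ty)]), t.ty)
    return Left(t.ty, infer(t.body, [(t.var, t.ty)] + env))

n = "n"
env = [("x", Right(n, n)), ("y", n), ("w", Right(Left(n, n), n)), ("z", n)]
t1 = LApp(RApp(Var("x"), Var("y")), RApp(Var("w"), Var("z")))  # (x(y))w(z)
print(infer(t1, env))  # n
```

Here `t1` encodes the term (x(y))w(z) of the first derivation of "new book about logic" in §4. Reordering the environment so that the variables no longer appear in the order of the term makes `infer` raise a `TypeError`, reflecting the absence of a commutativity rule.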

In what follows, we introduce density matrices as the meaning spaces of our target interpretation. Section 2 gives some background on density matrices, and on ways of capturing the directionality of our syntactic type logic in these semantic spaces. Section 3 then turns to the compositional interpretation of the programs associated with (N)L derivations. Section 4 shows how the density matrix framework can be used to capture simple forms of derivational ambiguity.

2 Density Matrices: Capturing Directionality

The semantic spaces we envisage for the interpretation of the syntactic calculus are density matrices. A density matrix, or density operator, is used in quantum mechanics to describe systems whose state is not completely known. In lexical semantics, it can describe the meaning of a word by encoding distributional information in its components. As standardly presented,³ density matrices defined on a tensor product space indicate no preference with respect to contraction from the left or from the right. Because we want to keep the distinction between left and right implications in the semantics, we set up the interpretation of composite spaces in such a way that they indicate which parts will and will not contract with other density matrices.

³ Appendix A provides some background for the non-physics reader, following [6].

The basic building blocks for the interpretation are density matrix spaces for a vector space over and and the dual space. From these basic building blocks, composite spaces are formed via the binary operation (tensor product) and an operation that sends a density matrix basis to its dual basis. In the notation, we use for density matrix spaces (basic or compound), and , or subscripted for elements of such spaces. The operation is involutive; it interacts with the tensor product as and acts as identity on matrix multiplication.

Below, (†) gives the general form of a density matrix defined on a single space in the standard basis, and (‡) gives it in the dual basis:

A density matrix of a composite space can be an element of the tensor product space between the standard space and the dual space either from the left or from the right:

Recursively, density matrices that live in higher-rank tensor product spaces are constructed, taking into account the left and right product for the dual basis. The inner product can only be taken between a standard space and its dual.⁴

⁴ The elements in the dual density matrix space and the elements of the standard one can be related by means of a metric tensor, which makes it possible to retrieve all the calculations with respect to the standard density matrix basis. Presently, the implicit metric is the identity, and the up and down distinction on the indices specifies which contractions take place. In the future, a different metric can be used to introduce more structure regarding the words and their interaction.

The inner product is given by the following relations:

The multiplication between two density matrices is defined only if it agrees with the above definition of inner product:

which respects the directionality of composition.

These alterations introduce slight changes to the trace relative to the standard definitions. Given that contraction only occurs between up and down indices, its implementation varies depending on the argument, but in an unambiguous way:

showing that cyclicity still holds.

Since §4 deals with derivational ambiguity, we introduce the concept of a permutation operation here. It extends naturally from the one in standard quantum mechanics: a permutation acts between two words, swapping their space assignment. A permutation operator acts on either the bras or the kets:

The traces can be calculated only after the permutations have acted on the vectors, taking into account the new subspace assignment. For consistency, if only one word is available, the permutation affects the respective tracing space:

If no word has that assignment, the permutation has no effect. For the permutations to make sense, the two spaces involved, or their duals, must share the same basis.
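As background for the permutations above, the sketch below shows the standard swap operator of quantum mechanics, without the up/down index bookkeeping of this section: on two spaces of the same dimension d (hence with the same basis), P(|i⟩ ⊗ |j⟩) = |j⟩ ⊗ |i⟩, P is its own inverse, and conjugating a product density matrix by P exchanges the two factors. It is plain Python over real matrices; the helper names and example matrices are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Tensor (Kronecker) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def swap(d: int) -> Matrix:
    """Swap operator on C^d (x) C^d: maps basis ket |i,j> (index i*d+j) to |j,i>."""
    p = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            p[j * d + i][i * d + j] = 1
    return p

def identity(n: int) -> Matrix:
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

P = swap(2)
ket_01 = [[0], [1], [0], [0]]          # |0>|1> as a column vector
print(matmul(P, ket_01))              # [[0], [0], [1], [0]], i.e. |1>|0>
print(matmul(P, P) == identity(4))    # True: the swap is an involution

rho_a = [[0.75, 0.0], [0.0, 0.25]]
rho_b = [[0.5, 0.5], [0.5, 0.5]]
# P (rho_a (x) rho_b) P^dagger = rho_b (x) rho_a; P is real and symmetric, so P^dagger = P.
print(matmul(matmul(P, kron(rho_a, rho_b)), P) == kron(rho_b, rho_a))  # True
```

The requirement that both factors have the same dimension is what makes `swap(d)` well defined, which mirrors the condition that the permuted spaces share a basis.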

3 Interpreting Lambek Calculus derivations

For the syntax-semantics interface, we assume a map sending syntactic types to the interpreting semantic spaces. For primitive types it acts as

with the vector space for sentence meanings, the space for nominal expressions (common nouns, full noun phrases). For compound types we have

Given the semantic spaces for the syntactic types, we can turn to the interpretation of the syntactic derivations, as coded by their proof terms. For this, we need a function $\llbracket\cdot\rrbracket$ that associates each term $M$ of type $A$ with a semantic value, i.e. an element of the semantic space where meanings of type $A$ live. For proof terms, $\llbracket\cdot\rrbracket$ is defined relative to an assignment function $g$ that provides a semantic value for the basic building blocks, viz. the variables that label the axiom leaves of a proof. (As we saw above, a proof term is a generic meaning recipe that abstracts from particular lexical meanings. When we consider specific lexical items in §4, we rely on an interpretation function to map them to their distributional meaning.)

Axiom: $\llbracket x \rrbracket_g = g(x)$

Elimination

Recall the inference rules of Fig 1.

/E: Premises Γ ⊢ M : A/B, Δ ⊢ N : B; conclusion (Γ, Δ) ⊢ M(N) : A.

\E: Premises Γ ⊢ N : B, Δ ⊢ M : B\A; conclusion (Γ, Δ) ⊢ (N)M : A.

Introduction

/I: Premise (Γ, x : B) ⊢ M : A, parametric in x; conclusion Γ ⊢ λʳx.M : A/B.

\I: Premise (x : B, Γ) ⊢ M : A, parametric in x; conclusion Γ ⊢ λˡx.M : B\A.

The interpretation of the introduction rules lives in a compound density matrix space representing a linear map from the space of the argument type B to that of the result type A. The semantic value of that map, applied to any object $\rho$ of the argument space, is given by $\llbracket M \rrbracket_{g'}$, where $g'$ is the assignment exactly like $g$ except possibly for the bound variable $x$, which is assigned the value $\rho$.

To check the correctness of these definitions, Appendix B provides the calculations for the reduction proof transformation, showing that for all assignments $g$, redex and contractum have the same semantic value.

4 Derivational Ambiguity

The density matrix construction can be used to address cases of derivational ambiguity. As has been done for lexical ambiguity, the meanings that arise from different derivations can be stored in the diagonal elements of a density matrix. The advantage here is that, since the set-up already lives in a multipartite density matrix space, the permutation operations automatically keep the two meanings independent. This is useful because it can be integrated with whatever lexical interpretation the density matrices carry, from lexical ambiguity to entailment information. It is also appropriate for treating these ambiguities incrementally, since it keeps the meanings separate in their interaction with subsequent fragments.
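The idea of storing the readings of an ambiguous phrase in the diagonal of a density matrix can be illustrated with a toy computation. The sketch assumes (our simplification, not the paper's construction) that the two readings are represented by orthonormal real vectors r1, r2 with weights p1, p2, so that ρ = p1|r1⟩⟨r1| + p2|r2⟩⟨r2| and the weight of reading k is recovered as ⟨r_k|ρ|r_k⟩. The vectors and weights are invented for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def mixture(readings: List[Vector], probs: List[float]) -> Matrix:
    """rho = sum_k p_k |r_k><r_k| for real reading vectors r_k."""
    dim = len(readings[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for r, p in zip(readings, probs):
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * r[i] * r[j]
    return rho

def sandwich(u: Vector, rho: Matrix, v: Vector) -> float:
    """<u|rho|v> for real vectors."""
    return sum(u[i] * rho[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

# Hypothetical reading vectors: orthonormal, but not aligned with the standard basis.
r1 = [0.6, 0.8]    # stands in for "(new book) about logic"
r2 = [-0.8, 0.6]   # stands in for "new (book about logic)"
rho = mixture([r1, r2], [0.75, 0.25])

print(round(rho[0][0] + rho[1][1], 10))          # 1.0: unit trace
print(round(sandwich(r1, rho, r1), 10))          # 0.75: weight of reading 1
print(round(sandwich(r2, rho, r2), 10))          # 0.25: weight of reading 2
print(abs(sandwich(r1, rho, r2)) < 1e-12)        # True: no coherence between readings
```

In the basis of readings, ρ is diagonal with entries p1 and p2, which is the sense in which the readings are stored in the diagonal; in the standard basis the same information is spread over off-diagonal entries as well.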

We give a simple example of how the trace machinery provides the passage from one reading to the other at the interpretation level, and of how the descriptions are kept separate. For this application, the coefficients in the interpretation of the words can represent the distribution of a word with respect to the context words, or can contain information obtained in another way. The final coefficients of the two outcomes, if properly normalized, indicate the probability of obtaining each outcome as the actual reading.

We illustrate the construction with the phrase "new book about logic". The lexicon assigns the syntactic types below; the corresponding semantic spaces are obtained from them by the type map of §3:

  • new : n/n
  • book : n
  • about : (n\n)/n
  • logic : n

Given this lexicon, "new book about logic" has two derivations, corresponding to the bracketings "(new book) about logic":

    x : n/n ⊢ x : n/n                                              (Axiom)
    y : n ⊢ y : n                                                  (Axiom)
    (x : n/n, y : n) ⊢ x(y) : n                                    (/E)
    w : (n\n)/n ⊢ w : (n\n)/n                                      (Axiom)
    z : n ⊢ z : n                                                  (Axiom)
    (w : (n\n)/n, z : n) ⊢ w(z) : n\n                              (/E)
    [(x : n/n, y : n), (w : (n\n)/n, z : n)] ⊢ (x(y))w(z) : n      (\E)

versus "new (book about logic)":

    x : n/n ⊢ x : n/n                                              (Axiom)
    y : n ⊢ y : n                                                  (Axiom)
    w : (n\n)/n ⊢ w : (n\n)/n                                      (Axiom)
    z : n ⊢ z : n                                                  (Axiom)
    (w : (n\n)/n, z : n) ⊢ w(z) : n\n                              (/E)
    [y : n, (w : (n\n)/n, z : n)] ⊢ (y)w(z) : n                    (\E)
    (x : n/n, [y : n, (w : (n\n)/n, z : n)]) ⊢ x((y)w(z)) : n      (/E)
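Both derivations can be found mechanically. The following is a minimal CKY-style chart parser, written as an illustration and not taken from the paper, for the application-only fragment (the two Elimination rules of NL, i.e. an AB grammar), which suffices for this phrase. It returns every (type, term) pair for the whole phrase, with terms written in the notation of the derivations above. The tuple encoding of types and the function names `combine` and `parse` are our own assumptions.

```python
from itertools import product

# Types: atoms are strings; ("/", A, B) encodes A/B and ("\\", B, A) encodes B\A.
N = "n"
NN_LEFT = ("\\", N, N)                      # n\n

def combine(left, right):
    """All ways two adjacent (type, term) constituents combine by /E or \\E."""
    (lt, lm), (rt, rm) = left, right
    results = []
    if isinstance(lt, tuple) and lt[0] == "/" and lt[2] == rt:
        results.append((lt[1], f"{lm}({rm})"))     # /E: M(N)
    if isinstance(rt, tuple) and rt[0] == "\\" and rt[1] == lt:
        results.append((rt[2], f"({lm}){rm}"))     # \E: (N)M
    return results

def parse(words):
    """words: list of (type, variable); returns all (type, term) analyses of the span."""
    n = len(words)
    chart = {(i, i + 1): [w] for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            chart[(i, j)] = [
                c
                for k in range(i + 1, j)
                for left, right in product(chart[(i, k)], chart[(k, j)])
                for c in combine(left, right)
            ]
    return chart[(0, n)]

lexicon = [
    (("/", N, N), "x"),            # new   : n/n
    (N, "y"),                      # book  : n
    (("/", NN_LEFT, N), "w"),      # about : (n\n)/n
    (N, "z"),                      # logic : n
]
for ty, term in parse(lexicon):
    print(ty, term)
# n x((y)w(z))
# n (x(y))w(z)
```

Each split point k of the chart corresponds to one choice of top-level bracketing, which is why exactly two analyses of type n survive.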

Taking "about logic" as a unit for simplicity, let us start from the following primitive interpretations:

  • ,

  • ,

  • .

Interpreting each step of the two derivations as described in the previous section then gives two different outcomes. For the first, the result is

while for the second it is

While the coefficients might differ between the two derivations, it is not clear how both interpretations can be carried along separately when they are part of a larger fragment, since both are described on the same space. Moreover, this recipe fixes the ordering and range of each trace. To describe each final meaning separately, we introduce the concept of a subspace, or subsystem. Subspaces can be thought of as copies of a space with the same basis that do not interact with one another; they separately represent the state of each of several identical quantum systems. For example, to describe the spin states of two electrons, one must distinguish which electron is in which state, even though both spin states are defined in the same basis, so each electron is attributed its own subspace. In the linguistic application, because different subspaces act formally as different syntactic types, and the words that interact differ from one derivation to the other, each word should be assigned to a different subspace:
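The two-electron example can be made concrete in standard (non-directional) quantum mechanics. In the sketch below both spins live in a two-dimensional space with the same basis, yet ρ₁ ⊗ ρ₂ and ρ₂ ⊗ ρ₁ are different matrices, and a partial trace over one subsystem returns the state of the other. It is plain Python; the particular spin states and the helper names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Tensor (Kronecker) product: subsystem 1 is the left factor."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace_out_second(rho: Matrix, d1: int, d2: int) -> Matrix:
    """Partial trace over subsystem 2: (Tr_2 rho)[i][k] = sum_j rho[i*d2+j][k*d2+j]."""
    return [[sum(rho[i * d2 + j][k * d2 + j] for j in range(d2)) for k in range(d1)]
            for i in range(d1)]

def trace_out_first(rho: Matrix, d1: int, d2: int) -> Matrix:
    """Partial trace over subsystem 1: (Tr_1 rho)[j][m] = sum_i rho[i*d2+j][i*d2+m]."""
    return [[sum(rho[i * d2 + j][i * d2 + m] for i in range(d1)) for m in range(d2)]
            for j in range(d2)]

spin_up = [[1.0, 0.0], [0.0, 0.0]]      # electron 1: |up><up|
spin_x = [[0.5, 0.5], [0.5, 0.5]]       # electron 2: |+><+|, same basis as electron 1
pair = kron(spin_up, spin_x)            # electron 1 in subspace 1, electron 2 in subspace 2

print(pair == kron(spin_x, spin_up))    # False: which subspace holds which state matters
print(trace_out_second(pair, 2, 2))     # [[1.0, 0.0], [0.0, 0.0]]: electron 1 recovered
print(trace_out_first(pair, 2, 2))      # [[0.5, 0.5], [0.5, 0.5]]: electron 2 recovered
```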


  • ,

  • ,
    .

Notice that the values of the coefficients that describe the words, given by the interpretation functions, do not change from the primitive interpretations above; only the subspace assignment possibly does. Once the interpretations are rewritten in terms of subspaces, the ordering of the traces no longer matters, since each contraction is restricted to its own subspace:

and for the second it is

The interpretation of each derivation now belongs to a different subspace, which keeps track of the original word to which the free "noun" space is attached. However, it is not very convenient to assign each word a subspace according to the interpretation it will be part of, since that information comes from the derivation itself and not from the words. To solve this problem, we use permutation operations over the subspaces. Since these take precedence over the trace, the contractions change accordingly when the traces are taken. This changes the subspace assignment at specific points, so that it is possible to go from one interpretation to the other without initially giving different interpretations to each word. Thus, there is a way to go directly from the first interpretation to the second:

The reasoning behind this is as follows: first, a permutation swaps the space assignment between "book" and the free space in "about_logic"; then a permutation changes the argument space of "about_logic"; finally, the same permutation is applied again to change the space of tracing. In this way, all the coefficients undergo the correct contractions, and the result lives in a different subspace from that of the first reading.
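The effect of letting a permutation act before the trace can be checked in the standard, non-directional setting: for any operator ρ on a composite space of two copies of the same d-dimensional space, and the swap P, Tr₂(P ρ P) = Tr₁(ρ). Permuting first and then tracing over the second subspace is thus the same as tracing over the first subspace. The sketch below (plain Python, hypothetical helper names) verifies this on an arbitrary integer matrix, which need not be a density matrix since the identity is purely linear-algebraic; it illustrates the mechanism, not the specific permutation sequence used above.

```python
from typing import List

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def swap(d: int) -> Matrix:
    """Swap operator on C^d (x) C^d: |i,j> (index i*d+j) goes to |j,i>."""
    p = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            p[j * d + i][i * d + j] = 1
    return p

def trace_out_first(rho: Matrix, d: int) -> Matrix:
    """Partial trace over subspace 1 of an operator on C^d (x) C^d."""
    return [[sum(rho[i * d + j][i * d + m] for i in range(d)) for m in range(d)]
            for j in range(d)]

def trace_out_second(rho: Matrix, d: int) -> Matrix:
    """Partial trace over subspace 2 of an operator on C^d (x) C^d."""
    return [[sum(rho[i * d + j][k * d + j] for j in range(d)) for k in range(d)]
            for i in range(d)]

d = 2
M = [[4 * r + c for c in range(4)] for r in range(4)]   # arbitrary operator, entries 0..15
P = swap(d)
swapped = matmul(matmul(P, M), P)       # the permutation acts first ...
print(trace_out_second(swapped, d))     # [[10, 12], [18, 20]]: ... then subspace 2 is traced
print(trace_out_first(M, d))            # [[10, 12], [18, 20]]: same as tracing subspace 1
print(trace_out_second(M, d))           # [[5, 9], [21, 25]]: differs without the swap
```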

5 Conclusion and Future Work

In this extended abstract we provided a density matrix model for a simple fragment of the Lambek Calculus. The syntax-semantics interface takes the form of a compositional map assigning semantic values to the terms coding syntactic derivations. The density matrix model enables the integration of lexical and derivational forms of ambiguity. Additionally, it allows for the transfer of methods and techniques from quantum mechanics to computational semantics. An example of such transfer is the permutation operator. In quantum mechanics, this operator permits a description of indistinguishable particles. In the linguistic application, it allows one to go from the interpretation that comes from one derivation to that of another, without the need to go through the latter derivation, while keeping this second meaning in a different subspace.

In future work, we want to extend our simple fragment with modalities for structural control (cf [4]), in order to deal with cases of derivational ambiguity that are licensed by these control modalities. Also, we want to consider derivational ambiguity in the light of an incremental left-to-right interpretation process, so as to account for the evolution of interpretations over time.

References

  • [1] Desislava Bankova, Bob Coecke, Martha Lewis, and Daniel Marsden. Graded entailment for compositional distributional semantics. CoRR, abs/1601.04908, 2016.
  • [2] Joachim Lambek. The mathematics of sentence structure. The American Mathematical Monthly, 65(3):154–170, 1958.
  • [3] Joachim Lambek. On the calculus of syntactic types. In Roman Jakobson, editor, Structure of Language and its Mathematical Aspects, volume XII of Proceedings of Symposia in Applied Mathematics, pages 166–178. American Mathematical Society, 1961.
  • [4] Michael Moortgat. Categorial type logics. In Handbook of logic and language, pages 93–177. Elsevier, 1997.
  • [5] Richard Moot and Christian Retoré. The logic of categorial grammars: a deductive account of natural language syntax and semantics, volume 6850. Springer, 2012.
  • [6] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2002.
  • [7] Robin Piedeleu, Dimitri Kartsaklis, Bob Coecke, and Mehrnoosh Sadrzadeh. Open system categorical quantum semantics in natural language processing. CoRR, abs/1502.00831, 2015.
  • [8] Mehrnoosh Sadrzadeh, Dimitri Kartsaklis, and Esma Balkir. Sentence entailment in compositional distributional semantics. Ann. Math. Artif. Intell., 82(4):189–218, 2018.
  • [9] Johan van Benthem. The semantics of variety in categorial grammar. Technical Report 83-29, Simon Fraser University, Burnaby (B.C.), 1983. Revised version in W. Buszkowski, W. Marciszewski and J. van Benthem (eds) Categorial grammar, Benjamin, Amsterdam.
  • [10] Heinrich Wansing. Formulas-as-types for a hierarchy of sublogics of intuitionistic propositional logic. In David Pearce and Heinrich Wansing, editors, Nonclassical Logics and Information Processing, pages 125–145, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.

Appendix A Density Matrices in Quantum Mechanics

Let $V, W$ be vector spaces, $i, j$ components of a basis of a space, and $a, b$ labels for different density matrices.

A quantum system, represented by $\rho$, is allowed to be in one of $n$ states, which are represented as vectors in a vector space. The $k$-th state is represented by $|\psi_k\rangle_V$, called a ket, a vector in the vector space $V$. Dual to the ket is the bra, represented by $\langle\psi_k|_V$ and given by the conjugate transpose of the ket, which lives in the dual vector space $V^*$:⁵

$$\langle\psi_k|_V = \big(|\psi_k\rangle_V\big)^\dagger.$$

⁵ It should be clear that the bra belongs to $V^*$. However, to make the notation less heavy, we simply write $V$ in the subscript.

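The system is in the $k$-th state with probability $p_k$, where $\sum_k p_k = 1$. The basis vectors of the vector space $V$ and of its dual $V^*$ are written as the sets $\{|i\rangle_V\}$ and $\{\langle i|_V\}$, respectively. Using that $|\psi_k\rangle_V = \sum_i \langle i|\psi_k\rangle\, |i\rangle_V$, the general state of the quantum system can be rewritten with respect to this basis.⁶

⁶ Given two elements $A$ and $B$ of a vector space, we can define their tensor product $A \otimes B$. Once a basis is defined, for $A$ an $m \times n$ matrix and $B$ a $p \times q$ matrix, the tensor product can be represented by the following elements:

$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix},$$

which is an $mp \times nq$ matrix, where $A$ and $B$ can themselves be tensor products, represented by a matrix. The basis of $V \otimes W$ is given by the tensor product between the basis vectors of $V$ and $W$.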
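The block-matrix form of the tensor product in the footnote above can be checked directly. The sketch below (plain Python, helper names assumed for illustration) builds A ⊗ B for a 2×3 and a 2×2 matrix, confirms the mp × nq shape and the block structure, and checks that tensor products of basis kets give the basis of the composite space, with |i⟩|j⟩ landing at index 3i + j when the second space has dimension 3.

```python
from typing import List

Matrix = List[List[int]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Block matrix [[a_11 B, ..., a_1n B], ..., [a_m1 B, ..., a_mn B]]."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def basis_ket(dim: int, i: int) -> Matrix:
    """Column vector |i> in a dim-dimensional space, as a dim x 1 matrix."""
    return [[1 if r == i else 0] for r in range(dim)]

A = [[1, 2, 3],
     [4, 5, 6]]          # m x n = 2 x 3
B = [[0, 1],
     [1, 0]]             # p x q = 2 x 2
AB = kron(A, B)
print(len(AB), len(AB[0]))              # 4 6, i.e. mp x nq
# The bottom-right block is a_23 * B = 6 * B.
print([row[4:6] for row in AB[2:4]])    # [[0, 6], [6, 0]]

# Basis of V (x) W from the bases of V (dim 2) and W (dim 3): |i>|j> = |3i + j>.
for i in range(2):
    for j in range(3):
        assert kron(basis_ket(2, i), basis_ket(3, j)) == basis_ket(6, 3 * i + j)
```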

The density matrix is an object in a tensor product space. It is defined by the tensor product between the possible states of the system and the respective duals, weighted by the corresponding probability of being in that state:

$$\rho_V = \sum_k p_k\, |\psi_k\rangle_V \langle\psi_k|_V.$$

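With respect to the basis vectors, the density matrix can be expressed as

$$\rho_V = \sum_{i,j} \rho_{ij}\, |i\rangle_V \langle j|_V, \qquad \rho_{ij} = \sum_k p_k\, \langle i|\psi_k\rangle \langle\psi_k|j\rangle.$$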

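The state is pure if the density matrix can be written as a single projector $|\psi\rangle_V\langle\psi|_V$, as happens when one of the $p_k$ equals 1; otherwise it is mixed. Equivalently, $\rho_V$ is pure exactly when $\mathrm{Tr}(\rho_V^2) = 1$.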
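A small numerical check of these definitions, in plain Python with real amplitudes only (complex phases are omitted for brevity, an assumption of this sketch): build ρ = Σ_k p_k |ψ_k⟩⟨ψ_k| and compute the purity Tr(ρ²). The two examples show that being diagonal and being pure are different properties: the state |+⟩⟨+| has non-zero off-diagonal entries but is pure, while an equal mixture of the two basis states is diagonal but mixed. Helper names are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def density_matrix(states: List[Vector], probs: List[float]) -> Matrix:
    """rho = sum_k p_k |psi_k><psi_k| (real amplitudes only in this sketch)."""
    dim = len(states[0])
    return [[sum(p * s[i] * s[j] for s, p in zip(states, probs)) for j in range(dim)]
            for i in range(dim)]

def trace(m: Matrix) -> float:
    return sum(m[i][i] for i in range(len(m)))

def purity(rho: Matrix) -> float:
    """Tr(rho^2); for a real symmetric rho this is the sum of squared entries."""
    return sum(x * x for row in rho for x in row)

h = 1 / math.sqrt(2)
pure_plus = density_matrix([[h, h]], [1.0])                  # |+><+|: not diagonal, but pure
mixed_diag = density_matrix([[1, 0], [0, 1]], [0.5, 0.5])    # diagonal, but mixed

for name, rho in [("pure_plus", pure_plus), ("mixed_diag", mixed_diag)]:
    print(name, round(trace(rho), 10), round(purity(rho), 10))
# pure_plus 1.0 1.0
# mixed_diag 1.0 0.5
```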

While the application of the tensor product creates higher-rank tensors, the inner product reduces the rank of the operators it acts on. The inner product is a map $V^* \times V \to \mathbb{C}$. It works on the basis vectors as

$$\langle i|j\rangle = \delta_{ij}.$$

When two or more quantum systems are considered jointly, the tensor product is used to create or represent the states and density matrices belonging to the multipartite quantum system. The basis elements of the composite vector space, which are kets, are given by

$$|i\rangle_V \otimes |j\rangle_W \equiv |ij\rangle_{VW},$$

while the basis elements of the composite dual vector space, bras, are given by

$$\langle i|_V \otimes \langle j|_W \equiv \langle ij|_{VW}.$$

The density matrix of a multipartite system can then be defined:

$$\rho_{VW} = \sum_k p_k\, |\psi_k\rangle_{VW} \langle\psi_k|_{VW},$$

with $\sum_k p_k = 1$.
While a density matrix can directly describe a composite system, the tensor product of two single-system density matrices also creates a density matrix in the basis of the composite system:

$$\rho_a \otimes \rho_b = \sum_{i,j,k,l} (\rho_a)_{ij}\,(\rho_b)_{kl}\, |ik\rangle_{VW} \langle jl|_{VW},$$

where $\rho_a$ acts on $V$ and $\rho_b$ on $W$. However, not all density matrices in this composite space can be separated into the tensor product of matrices of single spaces. If the density matrices are described in the same space, matrix multiplication affects them in the following way:

$$\rho_a \rho_b = \sum_{i,j,k} (\rho_a)_{ij}\,(\rho_b)_{jk}\, |i\rangle_V \langle k|_V.$$

This product is not commutative in general. If the product is between composite density matrices that share a part in the same subspace, the components that belong to overlapping spaces are affected in the way described above.

A key operation in what follows is the trace of a density matrix. The space over which the trace is taken is always specified. When the trace is taken over only some of the spaces of a composite density matrix, it is called a partial trace; otherwise it is called the total trace. It acts by summing the diagonal elements of the matrix that belongs to that space:

$$\mathrm{Tr}_V(\rho_a) = \sum_i \langle i|_V\, \rho_a\, |i\rangle_V = \sum_i (\rho_a)_{ii}.$$

The trace is also a map $V \otimes V^* \to \mathbb{C}$, so it can be seen as the generalization to matrices of the inner product for vectors.

The density matrix is itself an operator, and in quantum mechanics the trace of the product between a measurement operator and a density matrix gives the probability that the outcome of that measurement will occur, given that the initial state before the measurement corresponds to that density matrix. If one of the density matrices is considered as the measurement operator, the same interpretation can be applied. A particular instance of this occurs when $\rho_b$ has the value 1 for only one component, i.e. it is a basis element $|j\rangle\langle i|$: then $\rho_a$ is multiplied by this basis element and the result of the trace is the projection of $\rho_a$ onto it, in which case the matrix element is recovered:

$$\mathrm{Tr}\big(|j\rangle\langle i|\,\rho_a\big) = \langle i|\rho_a|j\rangle = (\rho_a)_{ij}.$$

Note that the trace is cyclic:

$$\mathrm{Tr}(\rho_a \rho_b) = \mathrm{Tr}(\rho_b \rho_a).$$
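Both properties can be checked numerically. In the sketch below, multiplying a matrix by the basis element |j⟩⟨i| and tracing recovers its (i, j) entry, and Tr(AB) = Tr(BA) even though AB ≠ BA. The matrices A and B are arbitrary examples, deliberately not density matrices, chosen so that the entries are easy to tell apart; the helper names are illustrative.

```python
from typing import List

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(m: Matrix) -> int:
    return sum(m[i][i] for i in range(len(m)))

def unit(dim: int, r: int, c: int) -> Matrix:
    """Basis element |r><c|: a single 1 in row r, column c."""
    return [[1 if (i, j) == (r, c) else 0 for j in range(dim)] for i in range(dim)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

# Tr(|j><i| A) = <i|A|j> = A[i][j]
for i in range(2):
    for j in range(2):
        assert trace(matmul(unit(2, j, i), A)) == A[i][j]

print(matmul(A, B) == matmul(B, A))               # False: the product does not commute
print(trace(matmul(A, B)), trace(matmul(B, A)))   # 5 5: but the trace is cyclic
```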

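The partial trace for a matrix in a composite space is a trace taken over only some of the spaces it is composed of:

$$\mathrm{Tr}_W(\rho_{VW}) = \sum_j \big(\mathbb{1}_V \otimes \langle j|_W\big)\, \rho_{VW}\, \big(\mathbb{1}_V \otimes |j\rangle_W\big),$$

so that, for a product of single-space density matrices, $\mathrm{Tr}_W(\rho_a \otimes \rho_b) = \rho_a\, \mathrm{Tr}(\rho_b)$.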