Translating and Evolving: Towards a Model of Language Change in DisCoCat

11/08/2018 ∙ by Tai-Danae Bradley, et al.

The categorical compositional distributional (DisCoCat) model of meaning developed by Coecke et al. (2010) has been successful in modeling various aspects of meaning. However, it fails to model the fact that language can change. We give an approach to DisCoCat that allows us to represent language models and translations between them, enabling us to describe translations from one language to another, or changes within the same language. We unify the product space representation given in (Coecke et al., 2010) and the functorial description in (Kartsaklis et al., 2013), in a way that allows us to view a language as a catalogue of meanings. We formalize the notion of a lexicon in DisCoCat, and define a dictionary of meanings between two lexicons. All this is done within the framework of monoidal categories. We give examples of how to apply our methods, and give a concrete suggestion for compositional translation in corpora.


1 Introduction

Language allows us to communicate, and to compose words in a huge variety of ways to obtain different meanings. It is also constantly changing. The compositional distributional model of [8] describes how to use compositional methods within a vector space model of meaning. However, this model, and others similar to it [5, 21], have no built-in notion of language change, or of translation between languages.

In contrast, many statistical machine translation systems currently use neural models, in which a large network is trained to translate words and phrases [22, 9]. This approach does not make use of the grammatical structure which allows one to build translations of phrases from the translations of individual words. In this paper we define a notion of translation between two compositional distributional models of meaning which constitutes a first step towards unifying these two approaches.

Modeling translation between two languages also has intrinsic value, and doing so within the DisCoCat framework means that we can use its compositional power. In section 3.1, we provide a categorical description of translation between two languages that encompasses both updating or amending a language model and translating between two distinct natural languages.

In order to provide this categorical description, we must first introduce some preliminary concepts. In section 3.2 we propose a unification of the product space representation of a language model of [8] and the functorial representation of [17]. This allows us to formalize, in section 3.3, the notion of a lexicon, which had previously been only loosely defined in the DisCoCat framework. We then show how to build a dictionary between two lexicons and give an example showing how translations can be used to model an update or evolution of a compositional distributional model of meaning. In section 3.4 we give a concrete suggestion for automated translation from corpora in English to corpora in Spanish.

2 Background

2.1 Categorical Compositional Distributional Semantics

Categorical compositional distributional models [8] successfully exploit the compositional structure of natural language in a principled manner, and have outperformed other approaches in Natural Language Processing (NLP) [10, 15]. The approach works as follows. A mathematical formalization of grammar is chosen, for example Lambek’s pregroup grammars [19], although the approach is equally effective with other categorial grammars [7]. Such a categorial grammar allows one to verify whether a phrase or a sentence is grammatically well-formed by means of a computation that establishes the overall grammatical type, referred to as a type reduction. The meanings of individual words are established using a distributional model of language, where they are described as vectors of co-occurrence statistics derived automatically from corpus data [20]. The categorical compositional distributional programme unifies these two aspects of language in a compositional model where grammar mediates composition of meanings. This allows us to derive the meaning of sentences from their grammatical structure, and the meanings of their constituent words. The key insight that allows this approach to succeed is that both pregroup grammars and the category of vector spaces carry the same abstract structure [8]; the same holds for other categorial grammars, since they typically have a weaker categorical structure.

The categorical compositional approach to meaning uses the notion of a monoidal category, and more specifically a compact closed category, to understand the structure of grammar and of vector spaces. For reasons of space, we do not describe the details of the compositional distributional approach to meaning; details can be found in [8, 17], amongst others. We note only that instead of using a pregroup as our grammar category, we use the free compact closed category generated over a set of basic types, as described in [25, 24].

3 Translating and Evolving

The categorical model has proved successful in a number of natural language processing tasks [10, 15], and is flexible enough that it can be extended to include ambiguity [23] and changes of the semantic category [6, 2]. These formalisms have allowed for connections between semantic meanings. By representing words as density matrices, a variant of the Löwner ordering has been used to measure the degree of entailment between two words [26, 4]. A simpler notion of similarity has been implemented in the distributional model using the dot product [8]. However, these notions of similarity are not built into the formalism of the model. This section defines the notion of a categorical language model which keeps track of internal relationships between semantic meanings.

So far the implementation of these models has been static. In this section, we define a notion of translation which constitutes a first step towards bringing dynamics into these models of meaning. We show how a language model can be lexicalized, i.e. how vocabulary can be attached to types and vectors, and we introduce a category of lexicons and translations between them. This allows us to build a dictionary between phrases in one language model and phrases in another.

3.1 Categorical Language Models and Translations

Definition 3.1.

Let $G$ be a category which is freely monoidal on some set of grammatical types. A distributional categorical language model, or language model for short, is a strong monoidal functor
$$F \colon G \to \mathbf{FVect}.$$

If $G$ is compact closed then the essential image of $F$ inherits a compact closed structure. All of the examples we consider will use the canonical compact closed structure on FVect. However, this is not a requirement of the general approach, and grammars that are not compact closed may be used, such as Chomsky grammars [12] or Lambek monoids [7].

Distributional categorical language models do not encapsulate everything about a particular language. In fact, there are many possible categorical language models for the same language, and there is a notion of translation between them.

Definition 3.2.

A translation from a language model $F \colon G \to \mathbf{FVect}$ to a language model $F' \colon G' \to \mathbf{FVect}$ is a pair $(T, \alpha)$ consisting of a monoidal functor $T \colon G \to G'$ and a monoidal natural transformation $\alpha \colon F \Rightarrow F' \circ T$. Pictorially, $(T, \alpha)$ is the 2-cell given by the triangle with sides $F$, $T$, and $F'$, filled by $\alpha \colon F \Rightarrow F' \circ T$.

Given another language model $F'' \colon G'' \to \mathbf{FVect}$ and a translation $(T', \alpha')$ from $F'$ to $F''$, the composite translation is computed pointwise. That is, $(T', \alpha') \circ (T, \alpha)$ is the translation $(T' \circ T,\ \alpha'T \cdot \alpha)$, where $\alpha'T \cdot \alpha$ is the vertical composite of the natural transformations $\alpha'T$ (the whiskering of $\alpha'$ along $T$) and $\alpha$.

Definition 3.3.

Let DisCoCat be the category with distributional categorical language models as objects, translations as morphisms, and the composition rule described above.

This category allows us to define ways of moving between languages. The most obvious application is translation between two natural languages, such as English and Spanish. However, a translation could also go from a simpler language to a more complex one, which we think of as learning, or it could stay within a single language, where we see the language evolving.

3.2 The Product Space Representation

In [8] the product space representation of language models was introduced as a way of linking grammatical types with their instantiation in FVect. The idea is that the meaning computations take place in the product category $G \times \mathbf{FVect}$, where $G$ is a pregroup or a free compact closed category. Let $w_1 w_2 \cdots w_n$ be a sentence, that is, a sequence of words whose grammatical types reduce to the sentence type $s$. To compute the meaning of $w_1 w_2 \cdots w_n$ you:

  • Determine for each word $w_i$ both its grammatical type $x_i$ and its distributional meaning $v_i \in V_{x_i}$, where $V_{x_i}$ is a meaning space for the grammatical type $x_i$.

  • Using the monoidal product in $G$ and the tensor product in $\mathbf{FVect}$, obtain the element
    $$(x_1 \otimes x_2 \otimes \cdots \otimes x_n,\ v_1 \otimes v_2 \otimes \cdots \otimes v_n)$$
    of $G \times \mathbf{FVect}$.

  • Let $f \colon x_1 \otimes \cdots \otimes x_n \to s$ be a type reduction in $G$. There is a linear transformation
    $$\bar{f} \colon V_{x_1} \otimes \cdots \otimes V_{x_n} \to V_s$$
    given by matching up the compact closed structure in $G$ with the canonical compact closed structure in FVect. Apply $\bar{f}$ to the vector $v_1 \otimes \cdots \otimes v_n$ to get the distributional meaning of your sentence. (A toy implementation of these three steps is sketched below.)
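To make the three steps concrete, here is a minimal sketch in Python, with hypothetical random vectors standing in for corpus-derived meanings: a transitive verb of type $n^r \otimes s \otimes n^l$ lives in $N \otimes S \otimes N$, and the type reduction becomes a tensor contraction.

```python
import numpy as np

# Toy meaning spaces: N for nouns, S for sentences (dimensions are arbitrary).
dim_n, dim_s = 4, 2
rng = np.random.default_rng(0)

# Step 1: word meanings. Nouns are vectors in N; a transitive verb, of
# pregroup type n^r s n^l, is a tensor in N (x) S (x) N.
subject = rng.random(dim_n)
verb = rng.random((dim_n, dim_s, dim_n))
obj = rng.random(dim_n)

# Steps 2 and 3: the type reduction n (n^r s n^l) n -> s corresponds to the
# linear map (epsilon_N (x) 1_S (x) epsilon_N); applied to the tensor
# subject (x) verb (x) object, it contracts the verb's noun indices with the
# subject and object vectors.
sentence_meaning = np.einsum("i,isj,j->s", subject, verb, obj)
print(sentence_meaning)  # a vector in the sentence space S
```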

The product space $G \times \mathbf{FVect}$ provides a setting in which the meaning computations take place, but it does not contain all of the information required to compute compositional meanings of sentences. Doing so requires an assignment of every grammatical type to a vector space and of every type reduction to a linear transformation, in a way which preserves the compact closed structure of both categories. This suggests that there is a compact closed functor lurking beneath this approach. With this in mind we introduce a new notion of the product space representation using the Grothendieck construction [13]. In order to use the Grothendieck construction we first need to interpret vector spaces as categories.

In this paper, we will do this in two mostly trivial ways which do not take advantage of the vector space structure in FVect. The first way we will turn vector spaces into categories is via the discrete functor
$$D \colon \mathbf{FVect} \to \mathbf{Cat},$$
which assigns to each vector space the discrete category on its underlying set. For a linear transformation $f \colon V \to W$, $D(f)$ is the unique functor from $D(V)$ to $D(W)$ which agrees with $f$ on the elements of $V$.

There is another way to generate free categories from sets.

Definition 3.4.

Let $V$ be a finite dimensional real vector space. Then, the free chaotic category on $V$, denoted $K(V)$, is the category where

  • objects are elements of $V$ and,

  • for all $v, w$ in $V$ we include a unique arrow $v \to w$, labeled by the Euclidean distance between $v$ and $w$.

This construction extends to a functor $K \colon \mathbf{FVect} \to \mathbf{Cat}$. For a linear transformation $f \colon V \to W$, define $K(f)$ to be the functor which agrees with $f$ on objects and sends the unique arrow $v \to w$ to the unique arrow $f(v) \to f(w)$.

The morphisms in $K(V)$, for a vector space $V$, allow us to keep track of the relationships between the different words in $V$.
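As a small illustration, the arrows of $K(V)$ over a finite set of stored words can be tabulated directly as pairwise Euclidean distances. The word vectors below are hypothetical:

```python
import numpy as np

def chaotic_arrows(vectors):
    """One arrow per ordered pair of objects (v, w), labeled by the
    Euclidean distance between them, as in Definition 3.4."""
    return {(a, b): np.linalg.norm(vectors[a] - vectors[b])
            for a in vectors for b in vectors}

# Hypothetical word vectors in a 3-dimensional noun space.
words = {"boot": np.array([1.0, 0.0, 0.2]),
         "boots": np.array([0.9, 0.1, 0.3]),
         "Rosie": np.array([0.0, 1.0, 0.0])}

arrows = chaotic_arrows(words)
print(arrows[("boot", "boots")])  # small distance: closely related words
```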

We now give a definition of the product space representation in terms of the Grothendieck construction; the definition depends on a choice of functor $U \colon \mathbf{FVect} \to \mathbf{Cat}$.

Definition 3.5.

Let $F \colon G \to \mathbf{FVect}$ be a language model and let $U \colon \mathbf{FVect} \to \mathbf{Cat}$ be a faithful functor. The product space representation of $F$ with respect to $U$, denoted $\int UF$, is the Grothendieck construction of $U \circ F$. Explicitly, $\int UF$ is the category where

  • an object is a pair $(x, v)$ where $x$ is an object of $G$ and $v$ is an object of $UF(x)$,

  • a morphism from $(x, v)$ to $(y, w)$ is a pair $(f, g)$ where $f \colon x \to y$ is a morphism in $G$ and $g \colon UF(f)(v) \to w$ is a morphism in $UF(y)$,

  • the composite of $(f, g) \colon (x, v) \to (y, w)$ and $(f', g') \colon (y, w) \to (z, u)$ is defined by
    $$(f', g') \circ (f, g) = (f' \circ f,\ g' \circ UF(f')(g)).$$

Remark 3.6.

Because $U$ is fully faithful, it is an equivalence of categories onto its essential image in Cat. Because monoidal structures pass through equivalences, $U \colon \mathbf{FVect} \to \operatorname{im} U$ is a monoidal functor, where $\operatorname{im} U$ denotes the essential image of $U$.

When $U$ is equal to the discrete category functor $D$, the product space representation $\int DF$ is the category of elements of $F$. This is the category where

  • objects are pairs $(x, v)$ where $x$ is a grammatical type and $v$ is a vector in $F(x)$,

  • a morphism $(x, v) \to (y, w)$ is a type reduction $f \colon x \to y$ such that $F(f)(v) = w$.

In this context we can compare the product space representation $\int DF$ of Definition 3.5 with the representation $G \times \mathbf{FVect}$ introduced in [8] and see that they are not the same. One difference is that $\int DF$ only includes the linear transformations that correspond to type reductions, not arbitrary linear transformations. This narrows the product space representation down to a category that characterizes the meaning computations which can occur. Also, the meaning reductions correspond to morphisms in the product space representation, whereas before they occurred within specific objects of the product space. Using this definition of the product space representation, we are able to formally introduce a lexicon into the model and understand how these lexicons are affected by translations.

When $U = K$ as in Definition 3.4, the product space representation $\int KF$ is as follows:

  • objects are pairs $(x, v)$ where $x$ is a grammatical type and $v$ is a vector in $F(x)$,

  • a morphism $(x, v) \to (y, w)$ is:

    • a type reduction $f \colon x \to y$ in $G$, together with

    • a real number, namely the distance between $F(f)(v)$ and $w$.

Now, objects in $\int KF$ are pairs of grammatical types and vectors, rather than vector spaces. We can therefore see $\int KF$ as a catalogue of all possible meanings associated with grammatical types. The linear transformations available in $\int KF$ are only those that are derived from the grammar category.

Proposition 3.7 ($\int UF$ is monoidal).

For $U \in \{D, K\}$ and a language model $F$, the category $\int UF$ is a monoidal category with monoidal product given on objects by
$$(x, v) \otimes (y, w) = (x \otimes y,\ \mu_{x,y}(v \otimes w))$$
and on morphisms by
$$(f, g) \otimes (f', g') = (f \otimes f',\ \mu(g \otimes g')),$$
where $\mu$ is the natural isomorphism included in the data of the strong monoidal functor $F$.

Proof.

Adapted from Theorem 38 of [3]. ∎

The fact that $\int UF$ is monoidal enables us to use the powerful graphical calculus available for monoidal categories. Previously, the monoidal graphical calculus has only been used to reason pictorially about grammatical meanings. Because the objects of the product space representation carry both the syntactic and the semantic meaning, this proposition tells us that we can reason graphically about the entire meaning of a phrase.

The product space construction also applies to translations:

Proposition 3.8 (Translations are monoidal).

Let $U$ be a fully faithful functor. Then there is a functor $\int U(-) \colon \mathbf{DisCoCat} \to \mathbf{MonCat}$, where MonCat is the category whose objects are monoidal categories and whose morphisms are strong monoidal functors, that sends

  • a language model $F$ to the monoidal category $\int UF$,

  • a translation $(T, \alpha) \colon F \to F'$ to the strong monoidal functor $\int U(T, \alpha) \colon \int UF \to \int UF'$, where the functor acts as follows:

    • On objects, $\int U(T, \alpha)$ sends $(x, v)$ to $(T(x),\ U(\alpha_x)(v))$.

    • Suppose $(f, g)$ is a morphism in $\int UF$, so that $f \colon x \to y$ is a reduction in $G$ and $g \colon UF(f)(v) \to w$ is a morphism in $UF(y)$. Then $\int U(T, \alpha)$ sends $(f, g)$ to the pair $(T(f),\ U(\alpha_y)(g))$.

Proof.

Adapted from Theorem 39 in [3]. ∎

3.3 Lexicons

Using our definition of the product space representation, we are able to formally introduce a lexicon into the model and describe how these lexicons are affected by translations. In what follows we fix $U = K$ in all product space constructions and abbreviate $\int KF$ to $\int F$. We also write $\Sigma^*$ for the free monoid on a set $\Sigma$.

Definition 3.9.

Let $F \colon G \to \mathbf{FVect}$ be a categorical language model and let $\Sigma$ be a finite set of words, viewed as a discrete category. Then a lexicon for $F$ is a functor $\ell \colon \Sigma \to \int F$. This corresponds to a function from $\Sigma$ into the objects of $\int F$.

Lexicons can be extended to arbitrary phrases in the set of words $\Sigma$. Phrases are finite sequences of words $w_1 w_2 \cdots w_n \in \Sigma^*$, where $\Sigma^*$ is the free monoid on $\Sigma$. The function $\ell$ assigns to each word $w_i$ the pair $(x_i, v_i)$ corresponding to its grammatical type $x_i$ and its semantic meaning $v_i$. Because $\Sigma^*$ is free, this defines a unique object in $\int F$:
$$\ell^*(w_1 w_2 \cdots w_n) = (x_1 \otimes x_2 \otimes \cdots \otimes x_n,\ v_1 \otimes v_2 \otimes \cdots \otimes v_n),$$
where $x_i$ is the grammatical type of $w_i$ and $v_i$ is the semantic meaning of $w_i$ for $1 \le i \le n$. The extension of $\ell$ to $\Sigma^*$ will be denoted by $\ell^* \colon \Sigma^* \to \int F$.
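A lexicon and its extension $\ell^*$ can be sketched directly from this formula: store each word as a (type, vector) pair, concatenate the types, and tensor the vectors. The entries and dimensions below are hypothetical:

```python
import numpy as np
from functools import reduce

# A lexicon: each word maps to (grammatical type, meaning vector).
# Types are tuples of basic types; "nr"/"nl" stand for the duals of n.
rng = np.random.default_rng(1)
lexicon = {
    "Rosie": (("n",), np.array([1.0, 0.0])),
    "boots": (("n",), np.array([0.0, 1.0])),
    "wears": (("nr", "s", "nl"), rng.random(2 * 2 * 2)),  # in N (x) S (x) N
}

def extend(phrase):
    """The extension l* of the lexicon to a phrase in Sigma*:
    concatenate the grammatical types and tensor the meaning vectors."""
    entries = [lexicon[w] for w in phrase.split()]
    phrase_type = sum((t for t, _ in entries), ())            # product in G
    phrase_vector = reduce(np.kron, (v for _, v in entries))  # tensor in FVect
    return phrase_type, phrase_vector

t, v = extend("Rosie wears boots")
print(t)        # ('n', 'nr', 's', 'nl', 'n')
print(v.shape)  # (32,) = 2 * 8 * 2
```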

Example 3.10.

Let $G$ be the free compact closed category on the grammatical types $n$ of nouns and $s$ of sentences. Then, for the phrase Rosie wears boots, a lexicon $\ell$ for a language model $F \colon G \to \mathbf{FVect}$ gives the unique element
$$\ell^*(\text{Rosie wears boots}) = (n \otimes (n^r \otimes s \otimes n^l) \otimes n,\ \vec{\text{Rosie}} \otimes \vec{\text{wears}} \otimes \vec{\text{boots}}).$$
In $G$, the grammar type reduces to $s$ via the morphism $\epsilon_n \otimes 1_s \otimes \epsilon_n$, and so we get a reduction in $\int F$:
$$(n \otimes n^r \otimes s \otimes n^l \otimes n,\ \vec{\text{Rosie}} \otimes \vec{\text{wears}} \otimes \vec{\text{boots}}) \longrightarrow \big(s,\ \overline{\epsilon_n \otimes 1_s \otimes \epsilon_n}\,(\vec{\text{Rosie}} \otimes \vec{\text{wears}} \otimes \vec{\text{boots}})\big).$$

To fully specify a translation between two lexicons it is not necessary to manually match the words in each corpus. This is because a relation between the phrases in the two corpora can be derived from a translation between the language models.

Definition 3.11.

Let $\ell \colon \Sigma \to \int F$ and $\ell' \colon \Sigma' \to \int F'$ be lexicons and let $(T, \alpha)$ be a translation from $F$ to $F'$. Then, the $(\ell, \ell')$-dictionary with respect to $(T, \alpha)$ is the comma category
$$\big( \textstyle\int(T, \alpha) \circ \ell^* \;\downarrow\; \ell'^* \big),$$
denoted by $D(\ell, \ell')$. Since $\Sigma^*$ and $\Sigma'^*$ are discrete categories, $D(\ell, \ell')$ is a set of triples $(a, b, h)$ where $a \in \Sigma^*$, $b \in \Sigma'^*$, and $h \colon \int(T, \alpha)(\ell^*(a)) \to \ell'^*(b)$ is a morphism in $\int F'$. Explicitly, let
$$\ell^*(a) = (x, v) \qquad \text{and} \qquad \ell'^*(b) = (y, w);$$
then $h$ is

  • a type reduction $f \colon T(x) \to y$ in the grammar category $G'$, together with

  • a morphism in $KF'(y)$. Recall from Definition 3.4 that this corresponds to a real number denoting the distance between $w$ and $w'$ in $F'(y)$. Here, $w'$ is the vector that results from applying the translation and any grammatical reductions. Namely, $w' = F'(f)(\alpha_x(v))$, i.e., we firstly translate the vector $v$ into $F'(T(x))$ via $\alpha_x$, then apply the linear map $F'(f)$ corresponding to the reduction $f$, and finally send the resulting vector to its corresponding object in the chaotic category.

The dictionary $D(\ell, \ell')$ allows us to keep track of the distances between phrases in $\Sigma^*$ and phrases in $\Sigma'^*$ in a compositional way; similarities between phrases are derived from similarities between the constituent words.

Let $\epsilon$ be a positive real number. Then define $D_\epsilon(\ell, \ell')$ to be the relation which pairs two words whenever the distance between their semantic meanings is less than or equal to $\epsilon$. The purpose of this is to say that we are interested in pairs of words and phrases which do not have to be identical, but whose meanings are sufficiently close.
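A minimal sketch of the relation $D_\epsilon(\ell, \ell')$, assuming the translation $\alpha$ and any reductions have already been applied to the source vectors; all words and vectors below are hypothetical:

```python
import numpy as np

def dictionary(translated_source, target, eps):
    """The relation D_eps: pair a source phrase a with a target phrase b
    whenever the translated meaning of a is within eps of the meaning of b."""
    return {(a, b)
            for a, va in translated_source.items()
            for b, vb in target.items()
            if np.linalg.norm(va - vb) <= eps}

# alpha (and any reductions) are assumed to have been applied already.
translated_english = {"boots": np.array([0.9, 0.1])}
spanish = {"botas": np.array([1.0, 0.0]), "gatos": np.array([0.0, 1.0])}
print(dictionary(translated_english, spanish, eps=0.2))  # {('boots', 'botas')}
```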

Example 3.12 (Syntactic simplification).

We give an example of a translation from a language with several noun types, accounting for singular and plural nouns, to a language with one noun type. To start, suppose $\Sigma^*$ is the free monoid on the set $\Sigma = \{\text{Rosie}, \text{wears}, \text{a}, \text{boot}, \text{boots}\}$ and set $\Sigma' = \Sigma$. Suppose $G$ is freely generated by the types $\{n_s, n_p, s\}$ and $G'$ by $\{n, s\}$, so that $G$ distinguishes singular nouns ($n_s$) from plural nouns ($n_p$) while $G'$ has a single noun type, and let $(T, \alpha)$ be a translation from $F$ to $F'$, where the language model $F$ has $F(n_s) = F(n_p) = N$, and where $N$ is generated by three semantic dimensions together with a fourth, extra dimension which records the quantity conveyed by the noun. Let $S = F(s)$ be a one-dimensional space spanned by $\vec{\text{surprising}}$, which denotes surprise. For the purposes of this example we will normalize all non-zero values in $S$ to the vector $\vec{\text{surprising}}$. This gives only two attainable values in $S$: $1$, meaning that the sentence is surprising, and $0$, meaning that the sentence is not surprising. The language model $F'$ agrees with $F$ on everything but the quantity dimension, that is $F'(s) = S$ and $F'(n)$ is spanned by the three semantic dimensions of $N$. The functor $T$ is given by $T(n_s) = T(n_p) = n$ and $T(s) = s$, and finally, the components of $\alpha$ are the identity on every space except for $N$, where $\alpha$ is defined as the canonical projection onto the first three coordinates.

The lexicon $\ell \colon \Sigma \to \int F$ is defined as follows:
$$\text{Rosie} \mapsto (n_s, \vec{\text{Rosie}}), \qquad \text{boot} \mapsto (n_s, \vec{\text{boot}}), \qquad \text{boots} \mapsto (n_p, \vec{\text{boots}}),$$
$$\text{a} \mapsto (n_s \otimes n_s^l, \vec{\text{a}}), \qquad \text{wears} \mapsto (n_x^r \otimes s \otimes n_x^l, \vec{\text{wears}}),$$
where we use $n_x$ to denote either of $n_s$ or $n_p$. The type of wears is polymorphic: it can take singular, plural, or mass nouns as subject and object. Here, we consider types with singular or plural arguments.

The functor $\int(T, \alpha)$ uses $T$ and $\alpha$ to assign a tuple on the right to each tuple on the left, i.e.
$$(x, v) \mapsto (T(x),\ \alpha_x(v)).$$
In particular, each item on the right hand side is of the form $(T(x), \alpha_x(v))$, where $\alpha_x(v)$ is an element of $F'(T(x))$. Now, in $\int F$, we have:
$$\ell^*(\text{Rosie wears boots}) \longrightarrow (s,\ 0),$$
that is, the sentence Rosie wears boots is not surprising. On the other hand,
$$\ell^*(\text{Rosie wears a boot}) \longrightarrow (s,\ \vec{\text{surprising}}),$$
since wearing a single boot is unusual. We translate Rosie wears boots by computing
$$\int(T, \alpha)(\ell^*(\text{Rosie wears boots}))$$
and applying the relevant reduction morphisms, we obtain:
$$(s,\ 0).$$
To translate the sentence Rosie wears a boot we perform the same matrix multiplication to obtain a value of $0$ as well. In the language model $F$, these two sentences had distinct meanings. However, because $F'$ cannot detect the quantity of a noun, their translations are both unsurprising.
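The only nontrivial component of $\alpha$ in this example is a projection matrix that forgets the quantity coordinate. A minimal numerical sketch of that component, with hypothetical coordinates chosen for $N$:

```python
import numpy as np

# Hypothetical noun space N = R^4: three semantic coordinates plus a final
# quantity coordinate. The component alpha_N projects onto the first three.
boot = np.array([0.0, 1.0, 0.0, 1.0])   # singular: quantity 1
boots = np.array([0.0, 1.0, 0.0, 2.0])  # plural: quantity 2
alpha_N = np.hstack([np.eye(3), np.zeros((3, 1))])  # 3 x 4 projection

print(alpha_N @ boot)   # [0. 1. 0.]
print(alpha_N @ boots)  # [0. 1. 0.] -- identical: quantity is forgotten
```

Once the images of the two nouns coincide, any verb applied to them in $F'$ necessarily yields the same sentence value, which is why both translated sentences come out unsurprising.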

This example shows how we can map a language with one grammatical structure onto another with a different grammatical structure. In this case we have simplified the grammar, but we could also provide a translation that maps the simpler grammar into the more complex one by identities and inclusions. The phenomenon of grammatical simplification has been observed in various languages [28, 18]. This provides us with the beginnings of a way to describe these kinds of language evolution.

3.4 Translating Between English and Spanish

In this section we construct a partial translation from English to Spanish. The relationship between English grammar and Spanish grammar is not functional: there are multiple types in Spanish that a single type in English should correspond to, and vice versa.

Let $G_E$ be the grammar category for English and let $G_S$ be the grammar category for Spanish, and write $a$ for the adjective type and $n$ for the noun type in each. In English, adjectives multiply on the left, giving a reduction
$$a \otimes n \to n,$$
and in Spanish adjectives multiply on the right, giving a reduction
$$n \otimes a \to n.$$
Suppose there is a strict monoidal functor $T \colon G_E \to G_S$ which makes the assignment $n \mapsto n$. We also wish to map the reduction $a \otimes n \to n$ to the reduction $n \otimes a \to n$. This requires that $T(a \otimes n) = n \otimes a$. By monoidality this means that $T(a) = n$ and $T(n) = a$, contradicting the assignment $n \mapsto n$. A monoidal functor cannot capture this relationship because it must be single-valued. However, if we choose to only translate either adjectives or nouns, we can construct a translation.

Example 3.13 (Translation at the phrase level).

In this example we choose to translate the fragment of English and Spanish grammar which includes nouns but not adjectives. We can also translate intransitive verbs from English to Spanish while keeping the functor between grammar categories single-valued.

Let $G_E$ be the free compact closed category on the noun and sentence types $n_E$ and $s_E$ in English, and let $G_S$ be the isomorphic category generated by the corresponding types $n_S$ and $s_S$ in Spanish. Consider distributional categorical language models
$$E \colon G_E \to \mathbf{FVect} \qquad \text{and} \qquad S \colon G_S \to \mathbf{FVect}$$
for English and Spanish respectively, and consider a fragment of these languages consisting of only nouns and intransitive verbs. Lexicons for the two languages can be populated by learning representations from text.

To specify the translation we set $T \colon G_E \to G_S$ to be the evident functor which sends English types to their corresponding types in Spanish. To define a natural transformation $\alpha \colon E \Rightarrow S \circ T$ it suffices to define $\alpha$ on the basic grammatical types, i.e. those which are not the nontrivial product of any two other types. Because $\alpha$ is a monoidal natural transformation, we have $\alpha_{x \otimes y} = \alpha_x \otimes \alpha_y$ for every product type $x \otimes y$.

If there were only one grammatical type $n$, then the language models would have no grammatical content, and the translation would consist of a single linear transformation $\alpha_n$ taking words in English to words in Spanish. Learning such a transformation is in fact a known method for implementing word-level machine translation, as outlined in [22, 14].
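As an illustration of that word-level method, the sketch below fits a linear map from English vectors to Spanish vectors by least squares over a few aligned word pairs. The embeddings are hypothetical toy values, not corpus data:

```python
import numpy as np

# Aligned English/Spanish word vectors (hypothetical toy embeddings; in
# practice these would be learned from corpora as in [22, 14]).
english = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
spanish = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]])

# Solve english @ W ~= spanish in the least-squares sense; W plays the
# role of the single component alpha_n of the translation.
W, *_ = np.linalg.lstsq(english, spanish, rcond=None)
print(english @ W)  # approximate Spanish vectors for the training words
```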

However, in general we need the natural transformation $\alpha$ to commute with the type reductions in $G_E$. Indeed, consider types $x$, $y$ in $G_E$, a reduction $f \colon x \to y$, and a meaning vector $v \in E(x)$. We require that the square
$$\begin{array}{ccc}
E(x) & \xrightarrow{\ E(f)\ } & E(y) \\
{\scriptstyle \alpha_x}\big\downarrow & & \big\downarrow{\scriptstyle \alpha_y} \\
S(Tx) & \xrightarrow{\ S(Tf)\ } & S(Ty)
\end{array}$$
commutes, i.e. that if we first reduce to obtain $E(f)(v)$ and then translate to $\alpha_y(E(f)(v))$, we get the same as if we translate each word first, sending $v$ to $\alpha_x(v)$, and then reduce to $S(Tf)(\alpha_x(v))$. Because these meaning reductions are built using dot products, this requirement is equivalent to the components of $\alpha$ being unitary linear transformations. In general, a linear transformation learned from a corpus will not be unitary. In this case we can replace $\alpha_x$ with the unitary matrix which is closest to it. This is a reasonable approximation because translations should preserve relative similarities between words in the same language.
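Finding the closest unitary (here, real orthogonal) matrix in Frobenius norm is the orthogonal Procrustes problem, solved by taking the SVD and discarding the singular values. A minimal sketch:

```python
import numpy as np

def nearest_orthogonal(M):
    """Closest orthogonal matrix to M in Frobenius norm: if M = U S V^T,
    the minimizer is U V^T (the orthogonal Procrustes solution)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

M = np.array([[2.0, 0.1], [0.0, 0.5]])  # a learned, non-unitary component
Q = nearest_orthogonal(M)
print(np.round(Q @ Q.T, 10))  # identity: Q is orthogonal
```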

4 Future Work

We have defined a category DisCoCat which has the categorical compositional distributional models of meaning of [8] as objects and the ways in which they can change as morphisms. We then outlined how this category can be used to translate, update, or evolve different distributional compositional models of meaning.

There is a wide range of future work on this topic that we would like to explore. Some of the possible directions are the following:

  • We were unable to construct a complete translation from English to Spanish using the definitions in this paper. The difficulty arose from the lack of a functional relationship between the two languages. To accommodate this, translations between language models can be upgraded by replacing functors with profunctors. This would include replacing the grammar transformation $T$ with a monoidal profunctor between the grammar categories. Because relationships between semantic meanings are also multi-valued, we plan on replacing the components of the natural transformation $\alpha$ with profunctors as well.

  • This model can be improved to take advantage of the metric space structure of vector spaces in order to form dictionaries in a less trivial way. This would give a more intelligent way of forming translation dictionaries between two languages.

  • Whilst we have taken care to ensure that the category of language models we use is monoidal, we have not yet taken advantage of the diagrammatic calculus that is available to us. This is something we would like to do in future work.

  • We can better understand how language users negotiate a shared meaning space as in [27], by modeling this as translations back and forth between agents. This will enhance the field of evolutionary linguistics by giving a model of language change that incorporates categorial grammar.

  • We would like to use the methods developed here to implement computational experiments by creating compositional translation matrices for corpora in two different languages. These models of translation may also be used to make previous computational experiments such as [11, 16] more flexible.

5 Acknowledgments

Thanks go to Joey Hirsh for enlightening and fruitful discussions of the issues in this paper. We would also like to thank John Baez for catching a mistake in an early draft of this paper. We gratefully acknowledge support from the Applied Category Theory 2018 School at the Lorentz Center at which the research was conceived and developed, and funding from KNAW. Martha Lewis gratefully acknowledges support from NWO Veni grant ‘Metaphorical Meanings for Artificial Agents’.

References