Natural language semantics studies the meaning of natural language utterances. A fundamental conceptual tool for this is the notion of truth conditions: the set of conditions under which an NL utterance is true. For example, “John loves Mary” is true if and only if John indeed loves Mary. Somewhat less tautologously: two assertions have the same meaning if they have the same truth conditions. We can therefore identify the meaning of an assertion with its truth conditions [Davidson:tam67]. This notion of “meaning” is very general, but also not very constructive. Therefore truth conditions are generally thought of as a minimal axiomatization of the domain of discourse that entails the assertion.
To understand this setup, assume that we use a formal language ℒ to express truth conditions. The meaning of “John loves Mary” could then be love(john, mary). If ℒ is the formal language of a logical system, we also have an interpretation function ⟦·⟧ of ℒ-expressions into a model and a calculus 𝒞 with a derivation relation ⊢. If 𝒞 is sound and complete, the upper rectangle in Figure 1 commutes. If the calculus 𝒞 is an adequate model for the natural language entailment relation ⊨NL – also called “textual entailment” in the linguistics literature – both rectangles in Figure 1 commute and we have a good model for truth conditions and logical entailment for natural language utterances. In this case, it suffices to specify the translation from natural language to the formal language ℒ along with a calculus 𝒞. And in general, the “NL semantics” literature restricts itself to the lower rectangle in Figure 1, entrusting the upper square to logicians and the equivalence of ⊢-entailment and textual entailment to the logic developers. At the same time, NL semanticists continually need extensions to ℒ and 𝒞 to model new NL phenomena.
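The model-theoretic side of this setup can be illustrated in miniature: a model fixes the facts, and the interpretation function maps a formula to its truth value in that model. The following is a toy Python sketch with purely illustrative names, not part of any particular formalism.

```python
from dataclasses import dataclass

# Formulas of a tiny fragment: an atomic "loves" predicate and conjunction.
@dataclass(frozen=True)
class Loves:
    lover: str
    beloved: str

@dataclass(frozen=True)
class And:
    left: object
    right: object

def interpret(model, formula):
    """Interpretation function: map a formula to its truth value in `model`."""
    if isinstance(formula, Loves):
        return (formula.lover, formula.beloved) in model
    if isinstance(formula, And):
        return interpret(model, formula.left) and interpret(model, formula.right)
    raise ValueError("unknown formula")

# A model in which John loves Mary (and nothing else holds):
model = {("john", "mary")}

# "John loves Mary" is true in this model iff the model says so:
print(interpret(model, Loves("john", "mary")))  # True
```

In this picture, the truth conditions of an utterance are exactly the constraints its formula places on admissible models.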
In particular, it is still infeasible to specify a translation from the entirety of natural language into some formal language. Instead, researchers focus on particular phenomena in natural language by describing a small subset of natural language utterances (a fragment) along with the meaning of these utterances. This method of fragments was established by Richard Montague [Montague:efl70]. It typically results in the description of three components:
a grammar that fixes the language fragment and generates syntax trees
a formal system in which the semantics of utterances can be expressed
a way to transform syntax trees to expressions in the formal system, which is often referred to as semantics construction
The semantics construction is based on the compositionality principle: the idea that the meaning of a complex utterance is determined by the meaning of its constituents. Thus, the semantics construction boils down to mapping grammar rules to corresponding semantic operations. Consider, for example, the grammar rule <sentence> ::= <sentence> "and" <sentence>. It corresponds to the semantic operation (φ, ψ) ↦ φ ∧ ψ, where φ and ψ are the meanings of the constituent sentences. The semantics construction may be followed by a semantic analysis (in Anglo-Saxon literature this is sometimes called pragmatics), which comprises various non-compositional operations such as inference, anaphora resolution, or contextual anchoring.
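The rule-to-operation correspondence can be sketched in a few lines. The following toy Python rendering uses illustrative names and string formulas; it is not GLF's mechanism, only the compositional idea: each grammar rule gets exactly one semantic operation, and the meaning of a tree is computed from the meanings of its subtrees.

```python
def meaning(tree):
    """Map a syntax tree to a logical formula (represented as a string)."""
    if tree[0] == "atomic":        # leaf: an atomic sentence with a given meaning
        return tree[1]
    if tree[0] == "and":           # rule: <sentence> ::= <sentence> "and" <sentence>
        phi = meaning(tree[1])     # meaning of the first constituent
        psi = meaning(tree[2])     # meaning of the second constituent
        return f"({phi} ∧ {psi})"  # semantic operation: logical conjunction
    raise ValueError("unknown rule")

# "Joan runs and Mary loves Joan":
tree = ("and", ("atomic", "run(joan)"), ("atomic", "love(mary, joan)"))
print(meaning(tree))  # (run(joan) ∧ love(mary, joan))
```

The key point is that `meaning` is defined by one clause per grammar rule, so the translation is fixed entirely by the grammar and the chosen semantic operations.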
Symbolic natural-language understanding (NLU) systems describe the entire pipeline from strings to semantic representations (Figure 2). They have been used to describe the semantics of a variety of natural-language phenomena. In the process, many different logics have been developed. However, the experiments were mostly done with pen and paper, and have rarely been implemented in software. This can lead to researchers focusing either on the linguistic side or on the logical side of the problem, while the actual semantics construction remains vague.
When such an NLU system was actually implemented, it was usually done in a programming language like Prolog or Haskell – see e.g. [BlaBos:rainl05, EijUng:csfp10]. In both cases, the authors claim that the programming language is an NLU framework – in the first case because Prolog is a declarative programming language and in the second because Haskell is very high-level. In any case, the NLU system requires a considerable – potentially prohibitive – amount of programming work. As far as we can tell, there is no fully declarative framework that could be used for both the grammar development and the logic development while keeping them in sync.
In this paper we describe our efforts to create the Grammatical Logical Framework (GLF). It combines an existing framework for natural language grammars with an existing framework for logic development. Concretely, we combine the Grammatical Framework (GF) [ranta-2011] with the logic development tool MMT [RabKoh:WSMSML13]. This is possible because the logical frameworks underlying these tools are compatible. GF handles the natural language parsing and generates terms (parse trees) in a logical framework (Martin-Löf type theory [Ranta:GF04]). MMT, which supports LF and various extensions, maps these terms to expressions in the desired target logic – see Figure LABEL:fig:glff.
In GLF, an NLU researcher can specify a fragment of a language in GF and, in parallel, develop a logic in MMT, along with a domain theory and the semantics construction. Our framework supports this in various ways, such as:
it allows the researcher to try out the entire pipeline from an utterance to its logical representation
it checks the totality of the semantics construction
the grammar and the logic are type-checked as usual in GF and MMT
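The totality check can be pictured in miniature: every constructor of the abstract syntax must be covered by the semantics construction. The following Python sketch uses constructor names from the example fragment; the check itself is a simplified illustration, not GLF's actual implementation.

```python
# Constructors declared in the abstract syntax (illustrative names):
abstract_constructors = {"joan", "mary", "run", "love", "act", "and"}

# Constructors for which the semantics construction provides a mapping;
# here "and" has (deliberately) been forgotten:
semantic_mappings = {"joan", "mary", "run", "love", "act"}

# A semantics construction is total iff no constructor is left unmapped:
missing = abstract_constructors - semantic_mappings
if missing:
    print(f"semantics construction is not total; missing: {sorted(missing)}")
```

Catching such gaps automatically is exactly the kind of bookkeeping that is easy to get wrong in a pen-and-paper fragment.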
We admit that symbolic natural language understanding has dropped from the limelight of computational linguistics research in the last two decades in favor of machine-learning-based approaches. But the success of these has only overshadowed the questions of semantic analysis and natural language inference, not answered them. We see a cautious revival of symbolic/logic-based methods in computational linguistics, and we hope that GLF can serve as a tool to facilitate this.
The symbolic approach to NLU needs extensive resources (e.g. grammars and ontologies). Aarne Ranta, the creator of GF, distinguishes two areas of NL applications: consumer tasks and producer tasks [Ranta:GfGoogle2016]. Consumer tasks require large coverage – often achieved through machine learning – and are therefore typically limited in their precision. An example of this would be machine-learning-based text translation à la Google Translate. Producer tasks, on the other hand, require high precision, but are restricted in their coverage to a few thousand concepts. An example is technical manuals for complex machinery in dozens of languages, where the consequences of mistranslation may be catastrophic. Beyond translation, producer tasks – the natural hunting grounds of GLF – include the understanding of mathematical papers, laws, or contracts.
First, we will describe GF and MMT (Sections 2 and 3). After an overview of the GLF system (Section LABEL:sec:glfSystem), we will describe the semantics construction and semantic analysis (Sections LABEL:sec:semConstr and LABEL:sec:semAnal) using a running example. Section LABEL:sec:examples contains more examples of how this framework can be used. Section LABEL:sec:concl concludes the paper and discusses future work.
We are grateful for the discussions with and insights from Aarne Ranta, Florian Rabe, and finally Dennis Müller, who has also prototyped an early version of GLF. The work reported here was supported by the German Research Foundation (DFG) under grant KO 2428/18.
2 GF: The Grammatical Framework
The Grammatical Framework (GF) [ranta-2011, GF:on] can be used to create multilingual grammar applications. GF grammars are divided into two parts: an abstract syntax and concrete syntaxes. The abstract syntax describes the ASTs (abstract syntax trees or abstract syntax terms) covered by the grammar. Each concrete syntax provides rules for linearizing ASTs into a specific natural language.
Let us consider a small example: Listing LABEL:lst:GF-Life shows an abstract syntax for representing some statements about everyday life such as “Joan runs and Mary loves Joan”. First, three basic types are introduced (Stmt, Person, Action) with the keyword cat (in GF they are called categories). Afterwards, several function constants are introduced with the keyword fun. The example utterance “Mary loves Joan” would correspond to the AST act mary (love joan). Formally, GF is based on a version of constructive type theory [Ranta:GF04]. It supports dependent types, but, in our experience, these are not very useful for most natural-language grammar applications.
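For readers without the listing at hand, the fun constants of such an abstract syntax can be mirrored informally as constructor functions. The following Python sketch is a hypothetical rendering of the fragment's constructors (joan, mary, run, love, act, and), not code generated by GF, and GF's actual syntax differs.

```python
# Basic "categories" Person and Action, mirrored as tagged tuples:
joan, mary = ("joan",), ("mary",)     # category Person
run = ("run",)                        # category Action (intransitive)

def love(person):                     # love : Person -> Action
    return ("love", person)

def act(person, action):              # act : Person -> Action -> Stmt
    return ("act", person, action)

def conj(stmt1, stmt2):               # and : Stmt -> Stmt -> Stmt
    return ("and", stmt1, stmt2)

# "Mary loves Joan"  ~>  act mary (love joan)
ast = act(mary, love(joan))
print(ast)  # ('act', ('mary',), ('love', ('joan',)))

# "Joan runs and Mary loves Joan"
ast2 = conj(act(joan, run), act(mary, love(joan)))
```

The nesting of the tuples mirrors the structure of the GF AST act mary (love joan).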