Distilling Text into Circuits

01/25/2023
by Vincent Wang-Mascianica, et al.

This paper concerns the structure of meanings within natural language. Earlier, a framework named DisCoCirc was sketched that (1) is compositional and distributional (a.k.a. vectorial); (2) applies to general text; (3) captures linguistic 'connections' between meanings (cf. grammar); (4) updates word meanings as text progresses; (5) structures sentence types; (6) accommodates ambiguity. Here, we realise DisCoCirc for a substantial fragment of English. When passing to DisCoCirc's text circuits, some 'grammatical bureaucracy' is eliminated; that is, DisCoCirc displays a significant degree of (7) inter- and intra-language independence. This means, for example, independence from word-order conventions that differ across languages, and independence from choices like many short sentences vs. few long sentences. This inter-language independence means our text circuits should carry over to other languages, unlike the language-specific typings of categorial grammars. Hence, text circuits are a lean structure for the 'actual substance of text', that is, the inner workings of meanings within text across several layers of expressiveness (cf. words, sentences, text), and may capture what is truly universal beneath grammar. The elimination of grammatical bureaucracy also explains why DisCoCirc (8) applies beyond language, e.g. to spatial, visual and other cognitive modes. While humans could not verbally communicate in terms of text circuits, machines can. We first define a 'hybrid grammar' for a fragment of English, i.e. a purpose-built, minimal grammatical formalism needed to obtain text circuits. We then detail a translation process such that all text generated by this grammar yields a text circuit. Conversely, for any text circuit obtained by freely composing the generators, there exists a text (with hybrid grammar) that gives rise to it. Hence: (9) text circuits are generative for text.
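To make the idea of a circuit built from composed generators a little more concrete, here is a minimal toy sketch in Python. It is not the paper's formalism or translation procedure: the `Gate` type, the `wire_histories` helper, and the hand-written example circuits are all hypothetical names introduced for illustration. The sketch assumes that each noun is a wire and that each other word is a gate acting on one or more wires, and it uses "the ordered list of gates touching each wire" as a rough proxy for when two gate sequences describe the same circuit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gate:
    """A word acting on one or more noun wires (hypothetical toy type)."""
    label: str
    wires: Tuple[str, ...]


def wire_histories(circuit: List[Gate]) -> Dict[str, List[Gate]]:
    """For each noun wire, list the gates that touch it, in order."""
    histories: Dict[str, List[Gate]] = {}
    for gate in circuit:
        for wire in gate.wires:
            histories.setdefault(wire, []).append(gate)
    return histories


# Hand-written encodings of short texts (an assumption, not parser output).
happy = Gate("happy", ("Alice",))
sober = Gate("sober", ("Bob",))
likes = Gate("likes", ("Alice", "Bob"))

# "Alice is happy. Bob is sober. Alice likes Bob."
circuit_1 = [happy, sober, likes]
# "Bob is sober. Alice is happy. Alice likes Bob."
circuit_2 = [sober, happy, likes]
# "Alice likes Bob. Alice is happy. Bob is sober."
circuit_3 = [likes, happy, sober]

# Gates on disjoint wires can be reordered without changing any wire's
# history, so the first two texts agree under this proxy.
same_12 = wire_histories(circuit_1) == wire_histories(circuit_2)
# Moving "likes" before the adjectives changes what happens on both wires.
same_13 = wire_histories(circuit_1) == wire_histories(circuit_3)
```

In this toy, the two texts that only swap the order of sentences about different nouns compare as equal, while the text that changes the order of gates on a shared wire does not. This is only meant to hint at the kind of independence from sentence arrangement described above; the real text circuits carry more structure than this sketch models.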


