1 Introduction
The study of thought has occupied academia for millennia, spanning philosophy, medicine, and everything in between. We demonstrate a promising approach to taking advantage of the powerful tools developed by other disciplines in our study of cognition, by emphasising the role that category theory – a foundation of mathematics – ought to play.
We first motivate our approach by considering the effects of general relativity on a highly compact cognitive agent, and noting the structural similarities between metric field theories in physics and the dynamics of mental content in conceptual spaces. These similarities practically beg for the study of field theories in conceptual spaces themselves; we consider the topological defects that arise, interpreting them, as is done in physics, as particles of thought.
2 Background
We first recall some required background upon which we build.
2.1 Cognitive architecture and conceptual spaces
To model a dynamical system, we must specify two things: how the data is to be represented, and the system that operates on it. In our approach, we use conceptual spaces and cognitive architectures, respectively.
2.1.1 Conceptual spaces
Conceptual spaces [gardenforsConceptualSpacesGeometry2004, zenkerApplicationsConceptualSpaces2015] provide a convenient middle-ground approach to knowledge representation, situated between the symbolic and associationist approaches in terms of abstraction and sparseness. A conceptual space is a tensor product of conceptual domains, which can be arbitrary topological spaces equipped with certain measures.[boltInteractingConceptualSpaces2017] They provide a low-dimensional alternative to vector space embeddings, which often have no clear interpretation of their topological properties.
Conceptual spaces represent concepts as regions in a topological space. Because (meta)properties of conceptual spaces depend on context, this suggests that a proper description involves the notion of a (pre)sheaf. Conveniently, they also provide a well-motivated, natural construction for prior probabilities based on the specific symmetries of the domain.[decockGeometricPrincipleIndifference2016]
2.1.2 Cognitive architecture
Cognitive architectures are a system-level integration of many different aspects of “intelligence”. It can be useful to think of a cognitive architecture as the cognitive analogue of a circuit diagram: it specifies the hardware/operating system (in the case of robotic agents) or the wetware (in the case of biological agents) in which the processes of deliberate thought, intention and reasoned behaviour occur. They generally take a “mechanism, not policy” approach: just as a mathematician has the same brain structure as a carpenter, a given cognitive architecture can produce a wide range of agents, depending on what has been learned.
Cognitive architecture has been studied in the context of Artificial Intelligence for at least fifty years, with several hundred architectures described in the literature. [lairdSoarCognitiveArchitecture2012] describes the SOAR cognitive architecture, one of the most well-established examples in the field. Several surveys have been presented, such as [goertzelBriefSurveyCognitive2014, kotserubaReview40Years2016]. [huntsbergerCognitiveArchitectureMixed2011] describes one particular architecture designed for human-machine interaction in space exploration.
2.1.3 Generalised Problem Space Conceptual Model
As a concrete example of a cognitive architecture, we show a modified version of the Problem Space Conceptual Model[lairdSoarCognitiveArchitecture2012] in Figure 1. It is reminiscent of a multi-head, multi-tape Turing machine, but with the transition rules stored in memory, too. The core idea of the PSCM is that operators are selected on the basis of the contents of working memory, which is essentially short-term memory. Production rules elaborate the contents of working memory until quiescence; operators are essentially in-band signals about which production rules should be fired. If progress can't be made, an impasse is generated; from here, we can enter a new problem space. This is essentially a formalisation of a “point of view” from which to solve a problem.
2.1.4 Realisation of cognitive architecture
Cognitive architectures are abstract models which need to be concretely realised. We almost always study cognition through a realisation via neuroscience, which is itself realised via biology, then chemistry, then physics. This is depicted in Figure 2. Alternatively, we may realise it in the form of robotics and software simulations; as with neuroscience, these are also built upon a tower of realisations.
2.2 Limits on computation
Computation has its limits; most famously, the halting problem. These limits fall into at least two general classes: limits imposed by computation in and of itself, and limits imposed by physics on the embodiment of a computation.
2.2.1 Busy Beaver machines
The Busy Beaver game concerns the maximum output of a halting Turing machine with a given number of states n: the goal is to construct an n-state Turing machine which outputs the largest number of “1”s possible, then halts. A variation is to count the number of steps it takes for the said Busy Beaver machine to halt. Viewed as a function of n, whether counting the size of the output or the number of steps, this is an extraordinarily fast-growing function, quickly outpacing Ackermann's function. One can use it to decide whether n-state Turing machines halt, simply by waiting BB(n) steps: if a machine takes longer than that, then it will never halt, by definition, because otherwise it would be the new BB(n). Of course, BB(n) itself is uncomputable in general: we have no free lunch.
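As a concrete sketch, here is a minimal simulator running the known 2-state, 2-symbol Busy Beaver champion machine (the transition table is standard; the representation is our own):

```python
# Simulate the 2-state, 2-symbol Busy Beaver champion.
# Rules map (state, symbol read) -> (symbol to write, head move, next state).
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),  # "H" is the halting state
}

def run(rules, start="A", halt="H"):
    tape, head, state, steps = {}, 0, start, 0
    while state != halt:
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps

ones, steps = run(RULES)
print(ones, steps)  # prints "4 6": BB(2) = 4 ones, halting after 6 steps
```

Even for n = 5 the champion takes over 47 million steps, which is why this brute-force waiting strategy is hopeless in practice.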
2.2.2 The Bekenstein bound
The Bekenstein bound is a limit on information density:

\[ I \le \frac{2\pi R E}{\hbar c \ln 2} \quad (1) \]

where I is the information content in bits, R is the radius of the system, E is the energy of the system, ħ is the reduced Planck constant, and c the speed of light.[2005RPPh...68..897T] If one assumes the laws of thermodynamics, then all of general relativity is derivable from Equation 1.[1995PhRvL..75.1260J] If one fixes the radius R, then as one tries to fit more and more information into the given volume, one must eventually start to increase the energy in that volume. Eventually, the Schwarzschild radius of that energy equals R, and you have a black hole. This occurs when Equation 1 is an equality.
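As a numerical sketch of the bit form of the bound (the radius and energy below are purely illustrative, not claims about any real agent):

```python
import math

HBAR = 1.054_571_817e-34  # reduced Planck constant, J*s
C = 2.997_924_58e8        # speed of light, m/s

def bekenstein_bits(radius_m, energy_j):
    """Maximum information (bits) in a sphere of given radius and energy."""
    return 2 * math.pi * radius_m * energy_j / (HBAR * C * math.log(2))

# Illustrative: a 0.1 m sphere whose total mass-energy is that of 1.5 kg.
bits = bekenstein_bits(0.1, 1.5 * C**2)
print(f"{bits:.2e} bits")  # roughly 3.9e42 bits
```

The enormous headroom shows why the bound only bites in the high-density regime considered in Section 3.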
2.2.3 The MargolusLevitin bound
The Margolus–Levitin bound is a bound on the processing rate of a system: a (classical) computational step must take at least \( \pi\hbar/(2E) \) seconds, where E is the energy of the system. To decrease the minimum time taken by a computational step, you must increase the energy involved. If this energy is contained in a finite volume, then you suffer the same problem arising from the Bekenstein bound; eventually the Schwarzschild radius catches up with you. It's important to note, however, that this assumes no access to quantum memory.
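Equivalently, a system of energy E performs at most 2E/(πħ) operations per second. A sketch with an illustrative energy:

```python
import math

HBAR = 1.054_571_817e-34  # reduced Planck constant, J*s
C = 2.997_924_58e8        # speed of light, m/s

def max_ops_per_second(energy_j):
    """Margolus-Levitin bound: at most 2E/(pi*hbar) operations per second."""
    return 2 * energy_j / (math.pi * HBAR)

# Illustrative: a system whose total mass-energy is that of 1 kg (E = m c^2).
rate = max_ops_per_second(1.0 * C**2)
print(f"{rate:.2e} ops/s")  # roughly 5.4e50 ops/s
```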
2.3 Gauge fields and their symmetries
A gauge field is an approach to formalising what we mean by a physical field. We think of a gauge field as data “sitting above” the points in a space (a principal bundle), as shown in Figure 3, combined with a transformation rule (a connection) to ensure the data transforms sensibly with a change of test point.[baez1994gauge] Gauge fields typically have a very useful property: we can reparameterise them to be more convenient to work with. For example, we can change the electric and magnetic potentials in an electromagnetic field by a consistent transformation derived from arbitrary data, yet the exact same physical effects will occur. This is called gauge fixing.
2.4 Topological defects
Topological defects are an important notion in a number of scientific and engineering fields. Intuitively speaking, they are imperfections in what we call an ordered medium; roughly, something which has a consistent rule for assigning values to a point in space, like a gauge field.[merminTopologicalDefectsOrderedMedia1979] Some familiar examples include:

Grain boundaries in metals, consisting of disruptions in the regular lattice placement of the constituent atoms,

Singularities and vortices in vector fields, such as black holes and tornadoes, respectively, and

Gimbal lock, a topological defect in the Lie algebra bundle of rotations over the Euler angles.
In general, defects arise when there are non-trivial homotopy groups in the principal bundle of the gauge field; that is, when you can make a loop in the bundle space, but you can't shrink it down to a single point.
These are non-perturbative phenomena, arising from the very nature of these systems. They manifest themselves as (pseudo)particles or higher branes, such as membranes. However, due to the risk of confusion between the homonyms “brain” and “brane”, we will call them all particles, regardless of their dimension.
2.4.1 Dynamics
What makes defects particularly interesting is that they can evolve over time. Dynamical behaviour in the base space can lead to dynamical behaviour of defects; for example, the movement of holes in a semiconductor, or the orbit of a small black hole around a larger one.
2.4.2 Temperature and spontaneous symmetry breaking
Another common phenomenon is that there is a temperature associated with the defect, which controls both its dynamics and its very existence. The obvious example of a temperature parameter is thermal temperature; if you melt a bar of iron, you won't have any crystal structure left to be defective. A subtler example is the critical temperature of superconductors and superfluids; raise it too high and Cooper pairing no longer works, so you lose superconductivity or superfluidity, respectively. These drastic changes in behaviour are phase changes.
Consider the opposite situation: freezing a liquid to a solid. If you lower the temperature to a point where defects start appearing, then those defects have to actually appear somewhere. The configuration of the defects will be just one possible configuration out of a potentially astronomically large space of possibilities (such as the precise atoms involved in a grain boundary), chosen at random. This phenomenon is called spontaneous symmetry breaking, and is responsible for the masses of elementary particles, via the Higgs mechanism.
2.5 Category theory and its slogans
Category theory is a foundation of mathematics, as an alternative to set theories like ZFC. Instead of focusing on membership, category theory focuses on transformation and compositionality. Its methodology for studying an object is to look at how that object relates to other objects. There are many excellent introductions to category theory aimed at different audiences (such as [fongInvitationAppliedCategory2019, riehlCategoryTheoryContext2016, lawvereConceptualMathematicsFirst2012, spivakCategoryTheorySciences2014]). These introduce the basic notions and show the versatility of category theory in a wide range of fields.
2.5.1 Categories and functors
A category is a collection of objects, and a collection of morphisms between those objects. For example, the category of sets has sets as objects, and functions as morphisms. Different classes of category allow different constructions; for example, monoidal categories allow pairing of objects. You can translate between categories with functors, which are themselves just morphisms in the category of categories.
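As a minimal programming illustration (an analogy only, not a formal construction): Python's list type, together with element-wise mapping, behaves as a functor, sending each function to a function on lists while preserving identities and composites:

```python
def fmap(f):
    """The morphism part of the list functor: lift f to act element-wise."""
    return lambda xs: [f(x) for x in xs]

f = lambda x: x + 1
g = lambda x: 2 * x
xs = [1, 2, 3]

# Functor laws: identities map to identities, composites to composites.
assert fmap(lambda x: x)(xs) == xs
assert fmap(lambda x: f(g(x)))(xs) == fmap(f)(fmap(g)(xs))  # both [3, 5, 7]
```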
2.5.2 Adjunctions
A common situation in mathematics arises when you want to translate back and forth between two types of structures via a pair of functors, but the structures aren’t isomorphic. The best we can do is approximate an isomorphism; this is called an adjunction.[riehlCategoryTheoryContext2016]
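A standard example (our own sketch, not a construction from the text) is the free-forgetful adjunction between sets and monoids: the free monoid on a set S is the set of lists over S, and any plain set-map f : S → M into a monoid (M, op, unit) extends uniquely to a monoid homomorphism Free(S) → M. That bijection between the two kinds of maps is the adjunction.

```python
from functools import reduce

def lift(f, op, unit):
    """Extend a set-map f : S -> M to a monoid homomorphism on lists over S."""
    return lambda xs: reduce(op, (f(x) for x in xs), unit)

f = lambda s: len(s)                # a plain set-map: strings -> integers
h = lift(f, lambda a, b: a + b, 0)  # its unique extension into (int, +, 0)

# Homomorphism property: h respects concatenation and the empty list.
xs, ys = ["ab", "c"], ["defg"]
assert h(xs + ys) == h(xs) + h(ys)
assert h([]) == 0
print(h(xs + ys))  # 7
```

The two functors here are "take the underlying set of a monoid" and "build the free monoid on a set"; neither direction is an isomorphism, but the round trip is as close as the structures allow.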
2.5.3 Enrichment
Enriched categories are a generalisation of plain categories. Whereas in standard category theory the morphisms between two objects are described by their hom-set, a category enriched in a monoidal category V has hom-objects drawn from V.[fongInvitationAppliedCategory2019, pages 56, 139] Ordinary categories are just categories enriched over the category of sets.[fongInvitationAppliedCategory2019, page 87]
2.5.4 An approach to science and mathematics
Let us take a step back for a moment, and consider how science should be done. Bill Lawvere offers this:

When the main contradictions of a thing have been found, the scientific procedure is to summarize them in slogans which one then constantly uses as an ideological weapon for the further development and transformation of the thing. [lawvereCategoriesSpaceQuantity1992]
Let us take this approach. What are some slogans for category theory?

Better to have a nice category of objects than a category of nice objects [corfieldModalHomotopyType2020]

Dialectical contradictions are adjunctions [lawvereCategoriesSpaceQuantity1992]

Find and characterise the principal adjunction of any problem
We shall now provide some motivation for our work.
3 Motivation
We motivate our approach by considering arbitrary cognitive systems in terms of physics. First, we consider the consequences of a physically dense cognitive agent; second, we consider the role of self-parameterisation in conceptual spaces.
3.1 The highdensity regime of cognition and its consequences
Let us consider the high-density regime of cognition; that is, where the cognitive agent's radius is close to its Schwarzschild radius. We shall assume that the agent can learn from its experiences. As the agent experiences the world, it may learn new information, which requires storage in the agent's memory. (These experiences may allow generalisation of existing information, and thus discarding of the individual cases; but over time, there will come a point where generalisations can't reasonably be made.) Consider this increase in information in the context of Equation 1: more and more energy will be required to store it in a given volume. This, in turn, results in an increase in the Schwarzschild radius. As we are already close to our Schwarzschild radius, and we presumably don't want to collapse into a black hole (it is ill-advised to become a black hole, if only to be able to affect the external environment in a reasonable manner; but see [blackHoleModelOfComputation]), we must increase our containment radius; this results in both the information-storage and processing gadgets spreading out. The average distance between two pieces of information in the containment volume therefore increases, which in turn increases the average propagation delay required to shunt information around. Thus, the average serial processing rate will decrease; in other words, learning can make you slower. This isn't even taking into account gravitational time dilation, which will only further enhance the slowdown relative to an outside observer, or the extra energy required to interact with the external environment.
3.1.1 Setting
We consider an agent composed of an incompressible fluid, spherically symmetric, non-rotating, and electrically neutral, resulting in the interior Schwarzschild metric.[Schwarzschild:1916ae] In Schwarzschild coordinates, the line element is given by

\[ ds^2 = -\left(\frac{3}{2}\sqrt{1-\frac{R_S}{R_g}} - \frac{1}{2}\sqrt{1-\frac{R_S r^2}{R_g^3}}\right)^2 c^2\,dt^2 + \left(1-\frac{R_S r^2}{R_g^3}\right)^{-1} dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) \quad (2) \]

where

(t, r, θ, ϕ)  the spacetime coordinates, in the (−, +, +, +) signature convention
R_S  the Schwarzschild radius of the agent
R_g  the value of r at the boundary of the agent, measured from the interior
3.1.2 Slowdown due to propagation delay
If we consider an instantaneous path through our agent – that is, one with dt = 0 – we can see from Equation 2 that when R_g is close to R_S, the radial term blows up as the path approaches the surface; indeed, the distance required to be travelled to reach the surface becomes infinite when R_g = R_S.
3.1.3 Slowdown due to gravitational time dilation
The time dilation experienced by the agent compared to an observer at infinity increases with density. The scaling compared to the outside observer is given by

\[ \frac{d\tau}{dt} = \frac{3}{2}\sqrt{1-\frac{R_S}{R_g}} - \frac{1}{2}\sqrt{1-\frac{R_S r^2}{R_g^3}} \quad (3) \]

valid for R_g ≥ (9/8) R_S.
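As a numerical sketch, assuming the standard interior-Schwarzschild dilation factor dτ/dt = (3/2)√(1 − R_S/R_g) − (1/2)√(1 − R_S r²/R_g³) (the boundary radius chosen below is illustrative):

```python
import math

def interior_dilation(r, r_s, r_g):
    """Interior Schwarzschild time-dilation factor dtau/dt at radius r."""
    return (1.5 * math.sqrt(1 - r_s / r_g)
            - 0.5 * math.sqrt(1 - r_s * r**2 / r_g**3))

# Illustrative: an agent whose boundary sits at R_g = 1.2 R_S, just above the
# limit R_g = (9/8) R_S. Clocks at its centre run roughly nine times slow.
factor = interior_dilation(r=0.0, r_s=1.0, r_g=1.2)
print(f"{factor:.4f}")  # prints 0.1124
```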
3.1.4 Extra energy required to interact with the external environment
Not only will the extreme gravitational field reduce the rate of processing, it will also increase the energy required to interact with the external environment. Consider the case of our agent trying to communicate with a distant observer via some protocol based on the exchange of radio signals. If the protocol specifies the frequency of the carrier signal, then we must account for the gravitational redshifting of our agent's emissions. This redshifting results in longer-wavelength signals; equivalently, lower-frequency signals. Because the energy of the photons comprising the signal is directly proportional to their frequency, this can equally be stated in terms of lower-energy signals. Thus, in order to ensure outbound transmissions meet the frequency specifications of the protocol, the photons must be emitted with extra energy to overcome the extreme curvature of the gravitational field.
In the simplest case, assume the exterior Schwarzschild metric.[Schwarzschild:1916uq] To signal an outside observer at infinity with a photon of a given energy emitted at a radius r, the photon will have to be emitted with an energy increased by a factor of

\[ \left(1 - \frac{R_S}{r}\right)^{-1/2} \quad (4) \]
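As a quick sketch, assuming the standard exterior-Schwarzschild factor (1 − R_S/r)^{−1/2}: a photon emitted from r = 2 R_S must carry √2 ≈ 1.414 times its nominal energy.

```python
import math

def emission_energy_factor(r, r_s):
    """Extra-energy factor for a photon escaping from radius r to infinity."""
    return 1.0 / math.sqrt(1.0 - r_s / r)

print(round(emission_energy_factor(r=2.0, r_s=1.0), 4))  # 1.4142
```

The factor diverges as r approaches R_S, so a near-critical agent pays an ever-steeper energy price per transmitted photon.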
3.1.5 Some tasks are too complex to be solved in a given volume
We have thus seen that there is a fundamental physical trade-off between learning and the time it takes to solve an arbitrary task. This immediately implies a number of things:

There is a trade-off between the general capability of an agent and its average reactivity,

Some complex tasks with a time requirement are simply too complicated for any agent occupying a given volume of space to solve,

As you increase the average capability of a group of agents, fewer of them will be able to fit in a given volume.
3.1.6 Lattice of cognitive skills
We can use these limits to create various partial orders:

We can rank tasks by the minimum and maximum volume of agents capable of solving them,

We can rank agents by their capabilities with respect to physically-embodied Busy Beaver machines,

We can rank spacetime volumes by combinations of the above.
3.2 Selfparameterisation of behaviour
Let us consider our second motivational example: the self-parameterisation of behaviour. The behaviour of an agent depends on its accumulated knowledge up until that point; in particular, its procedural knowledge. Its knowledge, in turn, depends on the past behaviour that acquired it. This self-referential quality appears in physics in the form of metric field theories. Consider, again, general relativity: the metric tensor controls the dynamics of a system, and the distribution of that system determines the metric tensor. The previous example considered the effects of general relativity on a cognitive agent, but the agent didn't particularly play an active role in our considerations.
But, why consider only general relativity? Can we consider other metric field theories? For that matter, do we even have to consider physical field theories?
4 Goal
Consider the position our motivational examples have put us in: instead of considering cognition as an embodied agent comprised of wetware or hardware, we just re-enacted the old “assume a spherical cow in a vacuum” joke. We didn't bother trying to determine how a dense sphere of incompressible fluid could embody a cognitive agent; it turned out we didn't even need to! We just assumed that the translation from an abstract model of cognition, to the concrete sphere of cosmic horror, and then back again to an abstract model preserved the semantics of cognition, whatever they may be.
4.1 Taking advantage of tools from other disciplines
Let us now state what we wish to do in order to find a solution: we wish to use the mathematical tools of other disciplines to study cognition. How is this typically done?
4.1.1 The physics of cognition, by way of biology
As shown in Figure 2, we translate through a chain of “domains” of science before reaching physics. Each translation introduces an extraordinary number of complexities that exist solely due to choosing a specific concrete realisation at each stage. While neurology and biology are, of course, incredibly important subjects of study in humans – if only for their phenomenological observations, not to mention their pathophysiological importance – they are complicating factors in the study of the phenomenon of cognition in and of itself.
4.1.2 We only need to find nicer realisation morphisms which preserve behaviour
Instead of considering the familiar realisation via neurobiology, we need only find realisation transformations which preserve the structures and behaviours of interest; that is, we only need to find toy models. To do this, we need to find a nice class of “cognitive categories”, and adjunctions out of them, as shown in Figure 4. A slightly more category-theoretic diagram is shown in Figure 5.
4.2 Cognitive categories
If we are going to use category theory to study cognition, then we ought to specify what cognitive categories actually are. This is an open problem, but some plausible requirements include:

Subcategories of cognitive categories ought to include Turing categories, in order to capture computational behaviour

There should be an opportunity for enrichment in a category of conceptual spaces.
5 Example: Topological defects in conceptual spaces
Let us consider the metric field analogy a little more deeply. Since many conceptual spaces of interest have a genuine geometric structure, and we can form fibre bundles over them which allow parallel transport, we can at least consider gauge fields.
5.1 Cognitive gauge fields
There are a lot of potential options for cognitive gauge fields. Some are generic and might apply to any conceptual space; others might apply only to certain classes of conceptual space. We assume some mechanism for smooth interpolation in cases of noisy discrete data.
Some example plausible generic gauge fields include:

The activation value of a specific memory is related to its probability of being recalled by a query: the higher the activation value, the more likely it is to be recalled. These values can spread to adjacent memories. Many “forgetting” mechanisms are based on forgetting memories with low activation values.

We can consider the emotional state of an agent when storing the memory, decomposed into a set of valuation and valence affect dimensions.

We can also consider the subjective importance of a memory, which is separate from its activation; an example in procedural memory is a rule saying not to completely flood a room with water if there are people in it. It's not very likely to be subject to recall, but it's certainly an important rule!
Some plausible non-general gauge fields might be:

Trustworthiness of data gathered socially.

Difficulty of tasks and behaviours. This can evolve based on experience and better understanding of the situation.
5.2 Defects and their dynamics: Particles of thought
Some cognitive gauge fields will have non-trivial topology, resulting in topological defects. As the underlying conceptual spaces evolve, whether in their contents or in the topology itself, we might observe dynamical evolution of these defects. Thus, we might (not-so-metaphorically) call these “particles of thought”.
Leaving aside whether such a thing has a meaningful interpretation (this almost certainly depends on the conceptual space and gauge field involved), we can at least ask more about the nature of such things.
5.3 Production rules as potentials
Can we encode production rules as something akin to a potential? If so, what actually generates those potentials?
5.4 Transmission of influence and cognitive gauge bosons
How is the influence of a gauge field propagated? Is it wave-like, as in classical physics, or are there “force-carrying particles”, like the gauge bosons of particle physics?
5.5 Phase changes
Are there phase changes? That is, is there some order parameter where the particles are only manifest in a given range of parameter values? Does this relate to switching problem spaces?
5.6 Cognitive event horizons
Are there “cognitive event horizons”, boundaries from which the effects of a particle can never escape; not just effects on its surrounding particles or memories, but on behaviour too? If so, how do they form? How do they evolve? Are there mechanisms to affect the underlying topological structure of a conceptual space? Is there, as a result, a “maximum resolution” to some conceptual spaces, analogous to the Schwarzschild radius in general relativity?
6 Discussion
There is an important consideration to be made when engaging in theoretical modelling: we must stress when we are only talking about effective theories; that is, theories which model the effects of cognition, but make no claims as to whether there is any actual causal connection with the reality of cognition. The ontological status of particles of thought certainly rests on the status of conceptual spaces, and of gauge fields over them. Further, even assuming they are ontologically valid, whether they actually play any causal role requires considerable further study.
6.1 Future research
We have a number of interesting questions prompted by our approach:

How might we study the flow of information throughout an agent’s lifetime? Can this be linked to “particles of thought”?

How can an agent perceive the Self? Does an agent who is aware of itself encounter, for example, the Barber’s paradox? How can it reason about things which are not true? What are the connections with paraconsistent logic?

How can we more fruitfully take advantage of topological data analysis?

How does analogy work in our approach? Does it rely on the homology structure of the relevant conceptual spaces having particular forms?

What is a good model for various learning mechanisms; both of mental content itself, and of new conceptual spaces?

What group structures over different conceptual spaces can we find to yield different field theories?

Behaviour and cognition depend heavily on emotional state.[rosalesGeneralTheoreticalFramework2019] What is the most appropriate conceptual space to represent this, and does it have any pseudo-particles?