Cognitive scientists routinely invoke subcapacities in decompositional efforts to reverse-engineer fully-fledged capacities of minds, brains and machines (Cummins, 2000; Miłkowski, 2013). For instance, speech processing is presumed to decompose into, among other things, segmentation and decoding, and action understanding into parsing, predicting and goal inference. These subcomputations are thought to tackle certain problems that the cognitive system faces in order to behave appropriately in the world.
Problems that originally show up in one domain (e.g., speech processing) are subsequently encountered in other domains (e.g., action understanding), and so the conceptual apparatus naturally carries over. As a result, cognitive scientists may come to view, e.g., the problem of segmenting speech as analogous to the problem of parsing actions. Researchers can then transfer ideas across the domains, adopting and adapting similar subcomputations in their explanations of the different capacities. What is passed along, however, will include latent (and possibly mistaken) notions about the computational properties of these problems as well. For instance, if a cognitive scientist believes that the search space of speech segmentation is large (combinatorially complex) and that this makes the problem hard, then by analogy, the same can be inferred about the parsing problem in action understanding.
Once such an initial framing of a cognitive (sub)capacity is adopted, it completely shapes the kinds of empirical questions that appear relevant and, in so doing, determines the course of research programs across disciplines and cognitive domains. Crucially, the assumptions that gave rise to the initial framing are seldom examined formally, and since they are taken for granted as background commitments, empirical tests are not designed to bear on them. These foundational oversights can sidetrack researchers into directions that remain largely immune to empirical corrective feedback later on.
To illustrate how crucial it is to formally assess the validity of intuitive assumptions about problem properties, and what can go astray if one doesn’t, we undertake here a formal examination of an example subcapacity. Our case study is Segmentation, a subcapacity that relates closely to computations whose names vary with time period, cognitive domain, and theoretical framework: chunking, sampling, discretization, integration, grouping, packaging, quantization, sequencing, segregation, parsing, temporal pooling, temporal gestalt, boundary placement, and temporal attention. This subcapacity figures ubiquitously in explanations of real-world cognitive capacities such as speech recognition, music perception, active sensing, event memory, temporal attention, action processing, and sequence learning. We focus on two classes of assumptions about its computational properties: i) the search space is excessively complex, and ii) this makes the segmentation problem intrinsically hard.
To formally assess the theoretical viability of these assumed properties, we develop a formalization of the (intuitive) segmentation problem at Marr’s (1982) computational level. Next, we submit this formalization to a mathematical analysis to assess the size of its search space, its computational hardness, and its possible sources of complexity, using tools from computational complexity theory (Garey & Johnson, 1979; Arora & Barak, 2009; van Rooij et al., 2019). As our results may run counter to intuition, we end with a word of caution regarding the general non-intuitiveness of the computational properties of hypothesized cognitive problems.
2 Conceptualization of segmentation
In order to rigorously examine computational assumptions, we need a mathematical formalization of the problem that can be submitted to further analyses. This computational-level model, in turn, should capture key aspects of the theorized cognitive capacity. To that end, in this section we synthesize conceptualizations of the segmentation problem as it appears in various cognitive domains.
2.1 Informal definitions: segmentation as a fundamental subcomputation
“How the brain processes sequences is a central question in cognitive science and neuroscience” (Jin et al., 2020). A substantial amount of information available to the cognitive system is “continuous, dynamic and unsegmented” (Zacks et al., 2001). The purpose of the segmentation process is, then, “to generate elementary units of the appropriate temporal granularity for subsequent processing” (Giraud & Poeppel, 2012). Succinctly, “[t]he central nervous system appears to ‘chunk’ time” (Poeppel, 2003).
Several subfields of the cognitive and brain sciences have proposed segmentation as a key subcomputation. Active listening (cf. active sensing) casts it as “the selection of internal actions, corresponding to the placement of […] boundaries” (Friston et al., 2021), “to sample the environment” (Poeppel & Assaneo, 2020). Event cognition similarly defines it as “the process of identifying event boundaries […] a concomitant component of normal event perception” (Zacks et al., 2001). In episodic memory, it is “the process by which people parse the continuous stream of experience into events and sub-events [for] the formation of experience units” (Jeunehomme & D’Argembeau, 2018). Central to music perception, it features as determining the “perceptual boundaries of temporal gestalts” (Tenney & Polansky, 1980) and “entails the parsing into chunks” (Farbood et al., 2015; Tillmann, 2012). The speech recognition literature describes it as the core process of “segmenting the continuous speech stream into units for further perceptual and linguistic analyses” (Teng et al., 2019), where it “allows the listener to transform [the] signal into segmented, discrete units, which form the input for subsequent decoding steps” (Poeppel & Assaneo, 2020). In action processing, “[a] fundamental problem observers must solve […] is segmentation […] Identifying distinct acts within the dynamic flow of motion is a basic requirement for engaging in further appropriate processing” (Baldwin et al., 2008).
Such ubiquity suggests that the capacity “appeals to general principles the brain may use to solve a variety of problems” (Friston et al., 2021; Himberger et al., 2018). “[M]any sequence-chunking tasks share common computational principles. [E.g.,] to find and encode the chunk boundaries” (Jin et al., 2020). Segmentation as a subcomputation appears across processing hierarchies as well, even when the world is relatively static: “[it] exists at multiple layers within a given problem” (Wyble et al., 2019). The downstream operations on segments that partially determine optimal segmentation play similar roles but vary with cognitive domain and modeling framework.
Segmentation, concisely, is then a fundamental subcomputation whose requisite role across cognitive domains and processing hierarchies is to determine, given a sequence representation, the optimal boundary placement with respect to a downstream computation over segments.
3 Formalization of segmentation
A succinct, yet informal, definition of segmentation can be stated by verbally specifying the inputs and outputs of the conjectured subcomputation.
Input: A sequence and a downstream process that, for any given segment of the sequence, can evaluate its quality relative to domain-specific criteria.
Output: The best segmentation of the sequence with respect to criteria relevant for the downstream process. (Without loss of generality, ‘best’ could here be replaced by ‘good enough’, and our formal results would still apply.)
With this sketch in mind (see Fig. 1 for a schematic), we develop the formal definition of the computational-level model.
We envision an input sequence that captures the idea of a time-ordered representation the cognitive system must work with. Its origin could equally be sensory encoding at the periphery or deeper, more elaborate processes (e.g., an encoding of the acoustic envelope of speech or music, or a compressed representation of a visual scene). As instances of the segmentation problem appear throughout processing hierarchies, their inputs vary in origin and nature. We therefore model the sequence with corresponding generality. Next, we pin down the notion of a downstream cognitive process that computes over segments (e.g., a decoder that maps speech segments to phonemes, or a module that maps scene segments to action meanings). Our formalization is agnostic as to what these domain-specific processes, and the theoretical frameworks used to model them, might be. We aim for generality and simply model, with a function (over a possibly infinite domain) available at the input, the idea that the process is capable of guiding the placement of boundaries. This is achieved by reporting back some (discretized) aspect of its performance
(e.g., a label probability or a likelihood w.r.t. a generative model, depending on framework). The desired output — a useful segmentation scheme — is modeled as a collection of disjoint segments jointly making up the input sequence, whose overall appropriateness with respect to the downstream process is optimal. These modeling choices yield the following formalization. (For succinctness, we omit the set notation for sequences and subsequences, and we slightly abuse notation when using set operations directly on tuples.)
Input: a finite sequence $S = (s_1, \dots, s_n)$ of length $n$, with $s_i \in \Sigma$, and a scoring function $g$ that maps contiguous subsequences $(s_i, \dots, s_j)$ to a positive value $g((s_i, \dots, s_j)) \in \mathbb{Q}^{+}$.
Output: a segmentation of $S$ into contiguous subsequences $\mathcal{P} = (P_1, \dots, P_k)$, where segments are pairwise disjoint, $P_a \cap P_b = \emptyset$ for $a \neq b$, and their concatenation yields the original sequence, $P_1 P_2 \cdots P_k = S$, such that its overall value $\sum_{i=1}^{k} g(P_i)$ is maximum. (We model segmentation as an optimization problem to keep with conceptual constraints, but without loss of generality; this modeling choice makes our results an upper bound on the complexity of the problem.)
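As an illustrative sketch (not part of the formal analysis), the input-output specification above can be realized as a brute-force reference procedure; the function names below are ours, and Python is used purely for concreteness:

```python
from itertools import combinations

def segmentations(seq):
    """Yield every segmentation of seq: each corresponds to a choice of
    boundaries among the n-1 gaps between adjacent elements."""
    n = len(seq)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield tuple(tuple(seq[bounds[i]:bounds[i + 1]])
                        for i in range(len(bounds) - 1))

def best_segmentation(seq, score):
    """Exhaustively maximize the summed segment scores (the role of g)."""
    return max(segmentations(seq),
               key=lambda segs: sum(score(s) for s in segs))
```

For a length-4 sequence this enumerates all $2^{3} = 8$ candidate segmentations; the growth of this candidate set is exactly what the search-space analysis below quantifies.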
4 Assumptions about segmentation
To determine the course of our analyses, we survey views on the computational properties of the segmentation problem. We illustrate with examples and synthesize core intuitions.
4.1 Problem properties: segmentation as a computational challenge
4.1.1 Hardness and complexity.
Segmentation problems have been widely assumed to be computationally challenging. This is evidenced in explicit statements and in the ‘solutions’ researchers propose after taking onboard certain beliefs about hardness. To illustrate: “Speech recognition is not a simple problem. The auditory system must parse a continuous signal into discrete words” (Friston et al., 2021). “It is hard for a brain, and very hard for a computer” (Poeppel, 2003). “[S]egmentation requires inference over the intractably large discrete combinatorial space of partitions” (Franklin et al., 2020).
4.1.2 Sources of complexity.
As is evident in researchers’ descriptions, the hardness is attributed to the (presumed) combinatorial explosion in the number of possible segmentation schemes — the size of the problem search space is informally taken as the source of computational complexity. Again, to illustrate: “Where should these candidate boundaries be placed? In an extreme case, we could place boundaries at every combination of time points […] but that would be computationally inefficient given that we can reduce the scope of possibilities” (Friston et al., 2021). “The problem would be enormously complicated by the presence of so many candidates […]” (Brent, 1999).
4.1.3 Solutions for complexity.
Arguably as a consequence of coupling these intuitions with additional assumptions, the effectiveness of certain solutions has been taken for granted. “From the computational perspective, the aim of research in segmentation […] is to identify mechanisms [that] reduce these computational burdens by reducing the number of candidate[s]” (Brent, 1999). This position has motivated the search for bottom-up segmentation cues or top-down biases (e.g., priors) that would achieve, among other things, such a narrowing down (e.g., Teng et al., 2019; Friston et al., 2021). “We suggest a different role [of cues] in which they are part of the [segmentation] (rather than decoding) process” (Ghitza, 2012). For instance, researchers may observe environmental (Ding et al., 2017) and neural (Teng et al., 2017a, 2017b) regularities suggestive of segment-size constrained segmentation processes (Poeppel & Assaneo, 2020; Poeppel, 2003).
4.2 Core assumptions
This survey reveals a core set of intuition-based assumptions about the computational properties of segmentation:
Real-world sequences (e.g., speech, music, scenes, actions) and internal representations alike (e.g., memories of experiences) are “complex, continuous, dynamic flows”.
The cognitive system needs to make use of discrete representations of segments that are appropriate (size- and content-wise) for downstream tasks.
The problem is “hard” — the obstacle being that there are “too many” possible segmentations of a given sequence.
Cognitive systems must reduce the possibilities somehow, e.g., via bottom-up cues and/or top-down biases.
5 Computational complexity of segmentation
It is generally non-obvious which problems are genuinely (as opposed to merely apparently) hard, which refinements will render a model tractable, or which restrictions will effectively reduce a search space. Intuitions about the computational properties of problems are frequently mistaken, and hence need to be validated by formal analyses (van Rooij et al., 2008). This section presents a complexity analysis in two parts, each examining one of the assumed properties: search space size and problem hardness.
5.1 Search space of segmentation
We analyze the search space size as a possible source of hardness by envisioning a simple brute-force algorithm. If the number of candidate solutions grows polynomially (i.e., is upper-bounded by $n^{c}$, where $n$ is the sequence length and $c$ is some constant), then such an algorithm would be tractable. We describe this growth through combinatorial analysis, first for the unconstrained problem and then under various theoretically motivated constraints.
5.1.1 Unbounded parts.
When the size of the segments is not constrained other than by the length of the sequence, i.e., $1 \le |P_i| \le n$, all boundary placements are possible. Notice there is a bijection between binary strings of length $n-1$ and boundary placements in sequences of length $n$ (Fig. 2).
Since the number of possible binary strings of length $n-1$ is given by $2^{n-1}$, the number of possible segmentations that use unbounded parts grows as $2^{n-1}$ (i.e., exponentially).
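The bijection can be made concrete in a short sketch (ours, for illustration): each binary string over the $n-1$ gaps maps to a distinct segmentation, and the counts match.

```python
from itertools import product

def bits_to_segmentation(seq, bits):
    """bit i == 1 places a boundary after position i (over the n-1 gaps)."""
    segs, start = [], 0
    for i, b in enumerate(bits, start=1):
        if b:
            segs.append(tuple(seq[start:i]))
            start = i
    segs.append(tuple(seq[start:]))
    return tuple(segs)

n = 5
seq = tuple(range(n))
images = {bits_to_segmentation(seq, bits) for bits in product((0, 1), repeat=n - 1)}
assert len(images) == 2 ** (n - 1)  # one distinct segmentation per string: a bijection
```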
5.1.2 Segmentation as integer composition.
In order to incorporate various constraints into the combinatorial analyses, we draw an analogy between segmentation and integer compositions (Fig. 3). This enables us to take an analytic combinatorics approach (Flajolet & Sedgewick, 2009) to the latter and leverage the results to infer properties of the former.
Definition 1 (Integer composition).
A composition of an integer $n$ is an ordered list of positive integer parts $(p_1, p_2, \dots, p_k)$, such that $\sum_{i=1}^{k} p_i = n$.
To obtain the growth rate for various restricted cases, we derive generating functions for each, whose coefficients count the number of compositions, and submit them to analysis based on the following lemma.
Lemma 1 (Growth rate of the coefficients of a rational function).
Let $f(x) = P(x)/Q(x)$ be a rational function with $Q(0) \neq 0$, and assume $P(x)$ and $Q(x)$ do not have any roots in common. The general form of the coefficients is $a_n = \theta(n)\, r^{n}$, where $r$ is the exponential growth factor and $\theta(n)$ is the subexponential growth factor. Then the exponential growth rate of the sequence of coefficients is equal to $r = 1/x_0$, where $x_0$ is the root of $Q(x)$ of smallest modulus (for proof, see Bóna, 2016, Theorem 7.10).
5.1.3 Lower-bounded parts.
We consider integer compositions involving parts $p_i \ge a$, with $a \ge 1$.
The number of $a$-restricted integer compositions of $n$ grows exponentially with $n$.
For each part the choice is among the positive integers $p \ge a$, so in terms of the generating function we have $A(x) = \sum_{i \ge a} x^{i}$, where $a \ge 1$ is an integer constant. By factoring out $x^{a}$ and using the closed form of the geometric series we write $A(x) = x^{a}/(1-x)$. For a $k$-part composition we thus have $A(x)^{k}$, so for compositions of any number of parts we sum over $k$ and write $F_a(x) = \sum_{k \ge 0} A(x)^{k} = 1/(1 - A(x))$. (Note that summing from $k = 0$ or from $k = 1$ to $\infty$ does not affect the denominator of the resulting generating function; therefore the results are robust to these variations.) This can similarly be simplified until we arrive at the closed form that completes the construction: $F_a(x) = \frac{1-x}{1 - x - x^{a}}$.
Having derived the generating function, we are interested in the growth rate of its coefficients. Let $Q(x) = 1 - x - x^{a}$, and note that $Q(0) = 1$ and $Q(1) = -1$. Since $Q(0) > 0$ and $Q(1 - \epsilon) < 0$ for some small positive $\epsilon$, by the intermediate value theorem it follows that there always exists a root $x_0$ that satisfies $0 < x_0 < 1$. By Lemma 1 the exponential growth factor is $r = 1/x_0 = 1 + \delta$, for some $\delta > 0$. ∎
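The closed form admits a numeric sanity check (our sketch; the function names are illustrative): a direct count of $a$-restricted compositions, and a bisection search for the root $x_0$ of $Q(x) = 1 - x - x^{a}$ in $(0,1)$, whose reciprocal the ratio of consecutive counts approaches.

```python
def compositions_min(n, a):
    """Count compositions of n into parts >= a (C(0) = 1: the empty composition)."""
    C = [0] * (n + 1)
    C[0] = 1
    for m in range(1, n + 1):
        C[m] = sum(C[m - p] for p in range(a, m + 1))
    return C[n]

def smallest_root(a, lo=0.0, hi=1.0, iters=60):
    """Bisection for the root of Q(x) = 1 - x - x**a in (0, 1):
    Q(0) = 1 > 0 and Q(1) = -1 < 0 guarantee a sign change."""
    Q = lambda x: 1 - x - x ** a
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Q(mid) > 0 else (lo, mid)
    return (lo + hi) / 2
```

For example, with $a = 2$ the counts are shifted Fibonacci numbers, and the growth factor $1/x_0$ is the golden ratio $\approx 1.618$.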
5.1.4 Upper-bounded parts.
We consider integer compositions involving parts $p_i \le b$, with $b \ge 2$.
The number of $b$-restricted integer compositions of $n$ grows exponentially with $n$.
For each part the choice is among the positive integers $1 \le p \le b$, so in terms of the generating function we have $B(x) = \sum_{i=1}^{b} x^{i}$, where $b \ge 2$ is an integer constant. Factoring out $x$ and using the identity $\sum_{i=0}^{b-1} x^{i} = (1 - x^{b})/(1 - x)$, we write $B(x) = x(1 - x^{b})/(1 - x)$. For $k$-part compositions we raise $B(x)$ to the $k$-th power, and for compositions of any number of parts we sum over all $k$, which yields $F_b(x) = \sum_{k \ge 0} B(x)^{k} = 1/(1 - B(x))$, and finally by using the geometric series and simplifying we complete the construction: $F_b(x) = \frac{1 - x}{1 - 2x + x^{b+1}}$.
Having derived the generating function, we are interested in the growth rate of its coefficients. Following the intermediate value theorem, let $Q(x) = 1 - 2x + x^{b+1}$ and note that $Q(1/2) = (1/2)^{b+1} > 0$, while $Q(1 - \epsilon) < 0$ for some small positive $\epsilon$ and any integer $b \ge 2$ as above; hence there always exists a root $x_0$ located in the interval $(1/2, 1)$. By Lemma 1 the exponential growth factor is $r = 1/x_0 = 1 + \delta$, for some $\delta > 0$. ∎
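A parallel sketch for the upper-bounded case (illustrative names, ours): counting compositions with parts at most $b$ recovers $2^{n-1}$ when the bound is vacuous, and still grows exponentially otherwise.

```python
def compositions_max(n, b):
    """Count compositions of n into parts p with 1 <= p <= b."""
    C = [0] * (n + 1)
    C[0] = 1
    for m in range(1, n + 1):
        C[m] = sum(C[m - p] for p in range(1, min(b, m) + 1))
    return C[n]
```

For $b = 2$ the counts are Fibonacci numbers (growth factor $\approx 1.618 < 2$), while $b \ge n$ recovers the unbounded count $2^{n-1}$.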
5.1.5 Doubly-bounded parts.
We consider compositions involving parts $a \le p_i \le b$, with $1 \le a < b$.
The number of $(a,b)$-restricted integer compositions of $n$ grows exponentially with $n$.
For each part the choice is among the positive integers $a \le p \le b$, so in terms of the generating function we have $C(x) = \sum_{i=a}^{b} x^{i}$. Factoring out $x^{a}$ and using $\sum_{i=0}^{b-a} x^{i} = (1 - x^{b-a+1})/(1 - x)$, we write $C(x) = x^{a}(1 - x^{b-a+1})/(1 - x)$. For compositions of any number of parts, we have $F_{a,b}(x) = \sum_{k \ge 0} C(x)^{k} = 1/(1 - C(x))$. Using the geometric series and simplifying, we arrive at the final form: $F_{a,b}(x) = \frac{1 - x}{1 - x - x^{a} + x^{b+1}}$.
Having derived the generating function, we are interested in the growth rate of its coefficients. We write the polynomial in the denominator, $Q(x) = 1 - x - x^{a} + x^{b+1}$, which we then rewrite as $Q(x) = (1 - x)R(x)$ with $R(x) = 1 - \sum_{i=a}^{b} x^{i}$. Note that $Q(x)$ has a root in $(0, 1)$ if and only if $R(x)$ has such a root. We have that $R(0) = 1 > 0$ and furthermore $R(1) = 1 - (b - a + 1) = a - b < 0$, and recall $a < b$. By the intermediate value theorem, since $R(0) > 0$ and $R(1 - \epsilon) < 0$ for some small positive $\epsilon$, there always exists a root $x_0$ of $Q(x)$ in the open interval $(0, 1)$ for any integers $a < b$ constrained as above. By Lemma 1 the exponential growth factor is $r = 1/x_0 = 1 + \delta$, for some $\delta > 0$. ∎
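The doubly-bounded case admits the same numeric check (our illustrative sketch): the counts still grow exponentially, with a growth factor strictly between 1 and 2.

```python
def compositions_between(n, a, b):
    """Count compositions of n into parts p with a <= p <= b (a < b)."""
    C = [0] * (n + 1)
    C[0] = 1
    for m in range(1, n + 1):
        C[m] = sum(C[m - p] for p in range(a, min(b, m) + 1))
    return C[n]
```

For example, with $a = 2$, $b = 3$ there are 3 compositions of 7 (the three orderings of $2+2+3$), and the counts multiply by roughly $1.32$ per unit of $n$, the reciprocal of the root of $1 - x^{2} - x^{3}$ in $(0,1)$.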
5.2 Hardness of segmentation
We showed that intuitive constraints do not render brute-force segmentation tractable. One may be tempted to conclude that this demonstrates the conjectured hardness of the segmentation problem. However, in this section we present a theorem that contradicts this conclusion. The proof builds on the technique of (polynomial-time) reduction (Arora & Barak, 2009; Garey & Johnson, 1979; van Rooij et al., 2019).
Definition 2 (Polynomial-time reducibility). Let $A$ and $B$ be computational problems. We say $A$ is polynomial-time reducible to $B$ if it is possible to tractably transform instances of $A$ into instances of $B$ such that solutions for $B$ can be easily transformed into solutions for $A$. Note that this implies that if a tractable algorithm for $B$ exists, it could be used to solve $A$ tractably (namely, via the tractable transformation, called the reduction).
We present such a reduction from the problem segmentation to a problem in graph theory. Along the way, we introduce an alternative way of thinking about segmentation at the computational and algorithmic levels.
Theorem 1. segmentation is tractable (polynomial-time computable) in the absence of constraints.
We will show that, given an arbitrary instance of the segmentation problem, we can tractably construct an instance (with the correct associated output) of a target problem which is itself tractably computable. To begin, we introduce a class of graphs which we use as a stepping stone.
Definition 3 (Interval Graph).
An interval graph is an undirected graph $G = (V, E)$ built from a collection of intervals $I_1, \dots, I_m$ (here, intervals of sequence indices) by creating one vertex $v_k$ for each interval $I_k$ and an edge $(v_k, v_l)$ whenever the corresponding intervals have a non-empty intersection: $I_k \cap I_l \neq \emptyset$.
Algorithm 1 involves systematically generating all legal segments, computing and negating their weights, checking their pairwise overlap, and using this to construct a graph. We call this object a segment graph.
Consider the time complexity of Algorithm 1. The elementary instructions are the weight computation (line 8), appending (lines 9-10, 16), and set intersection (line 15), all of which are polynomial-time computable (as $g$ is assumed to be). We focus now on the number of implied iterations. The loops defined in lines 5-6 yield $n(n+1)/2$ iterations (the number of possible segments), given by a polynomial.
The loops defined in lines 13-14 yield a number of iterations equal to the number of segment pairs, given by the binomial coefficient $\binom{m}{2} = m(m-1)/2$ with $m = n(n+1)/2$, which grows as a quadratic in $m$ (i.e., a 4th-degree polynomial in $n$).
This algorithmic analysis demonstrates that BuildSegmentGraph (Algorithm 1) is polynomial-time computable.
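Algorithm 1 itself is not reproduced here; the following sketch (ours, with hypothetical names) is one way to realize the description: enumerate all contiguous segments, negate their scores as vertex weights, and connect overlapping pairs.

```python
from itertools import combinations

def build_segment_graph(seq, score):
    """One vertex per contiguous segment [i, j); its weight is the negated
    score (so maximizing summed scores becomes minimizing summed weights);
    one edge per overlapping pair of segments."""
    n = len(seq)
    vertices, weights = [], {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            vertices.append((i, j))
            weights[(i, j)] = -score(seq[i:j])
    edges = set()
    for u, v in combinations(vertices, 2):
        if set(range(*u)) & set(range(*v)):  # non-empty intersection of index intervals
            edges.add((u, v))
    return vertices, edges, weights
```

A length-$n$ input yields $n(n+1)/2$ vertices and $\binom{m}{2}$ overlap checks with $m = n(n+1)/2$, matching the polynomial iteration counts above.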
Consider next the correctness of Algorithm 1. We will show that a segment graph encodes the properties of candidate solutions to an instance of segmentation. For this, we need the following definitions.
Definition 4 (Independent sets and maximality).
Let $G = (V, E)$ denote a graph. We call a vertex set $S \subseteq V$ an independent set if there exist no two vertices $u, v \in S$ such that $(u, v) \in E$. Such a set is said to be maximal if there exists no vertex $v \in V \setminus S$ that can be added to $S$ without breaking the independence.
Definition 5 (Dominating sets and minimality).
Let $G = (V, E)$ denote a graph. We call a vertex set $D \subseteq V$ a dominating set if for all $v \in V$, either $v \in D$ or there is an edge $(u, v) \in E$ for some $u \in D$. Such a set is said to be minimal if there exists no vertex $v \in D$ that can be removed without breaking the dominance.
By construction, a legal segmentation (i.e., a collection of disjoint segments whose concatenation yields the original sequence) is guaranteed to be represented within the segment graph as a subset of vertices with two properties:
maximal independence: vertices are pairwise non-adjacent because segments in a segmentation should be disjoint; since the segments should span the sequence, adding any vertex breaks independence.
minimal dominance: vertices in the graph are either in the subset or adjacent to one of its elements because once a segment subset spans the sequence, any other segment is guaranteed to overlap; since the segments should be disjoint, removing any vertex breaks dominance.
How segment graphs make the structure of the original sequence problem transparent is illustrated in Fig. 4.
A general feature of dominance and independence on arbitrary graphs is useful:
Lemma 2. An independent vertex set in a graph is a dominating set if and only if it is a maximal independent set. Any such set is necessarily also a minimal dominating set (cf. Berge, 1962; Goddard & Henning, 2013).
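This property is easy to check exhaustively on a small example (our sketch; the graph is a 4-vertex path given as an adjacency dict):

```python
def is_independent(G, S):
    """No two chosen vertices are adjacent."""
    return not any(u in G[v] for v in S for u in S)

def is_dominating(G, S):
    """Every vertex is chosen or adjacent to a chosen one."""
    return all(v in S or any(u in G[v] for u in S) for v in G)

def is_maximal_independent(G, S):
    """Independent, and no outside vertex can be added while staying independent."""
    return is_independent(G, S) and all(
        not is_independent(G, S | {v}) for v in G if v not in S)

# Path graph 1 - 2 - 3 - 4: {1, 3} is maximal independent, hence dominating.
P4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
assert is_maximal_independent(P4, {1, 3}) and is_dominating(P4, {1, 3})
assert is_independent(P4, {1}) and not is_dominating(P4, {1})  # independent but not maximal
```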
It follows from the above and Lemma 2 that if a vertex subset in a segment graph is independent and dominating, then it is a candidate solution (i.e., a valid segmentation). A feasible solution has, additionally, minimum weight among candidates: it is a minimum-weight independent dominating set. With this, we introduce the formal graph problem we reduce to.
minimum-weight independent dominating set
Input: A vertex-weighted graph $G = (V, E)$; for each $v \in V$ we have a weight $w(v) \in \mathbb{Q}$.
Output: An independent dominating set $D \subseteq V$ such that $\sum_{v \in D} w(v)$ is minimum.
So far, we have established that, given an instance of segmentation, we can construct in polynomial time, by Algorithm 1, a corresponding instance of minimum-weight independent dominating set. This demonstrates the validity of the reduction, and we now finally consider the tractability of the problems.
Though the problem of finding minimum-weight independent dominating sets is NP-hard in general and remains so in several special cases (Garey & Johnson, 1979; Liu et al., 2015), the following input restriction is relevant.
Lemma 3. minimum-weight independent dominating set is polynomial-time computable provided the input graph is an interval graph (for proof, see Chang, 1998, Theorem 2.4).
Recall that the restriction required by Lemma 3 is guaranteed by our reduction. Hence, we conclude segmentation is tractably computable, which completes the proof. ∎
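Tractability can also be illustrated without the graph machinery: a textbook $O(n^2)$ dynamic program over prefixes solves the formal problem directly (our sketch, not the algorithm analyzed above; it assumes unit-cost calls to the scoring function).

```python
def optimal_segmentation(seq, score):
    """best[j]: maximal total score over segmentations of the prefix seq[:j].
    Each prefix optimum extends an earlier one by one final segment seq[i:j]."""
    n = len(seq)
    best = [0.0] * (n + 1)
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        best[j], cut[j] = max(
            (best[i] + score(seq[i:j]), i) for i in range(j))
    segs, j = [], n            # walk the cut links backwards
    while j > 0:
        segs.append(tuple(seq[cut[j]:j]))
        j = cut[j]
    return segs[::-1], best[n]
```

With $n(n+1)/2$ scoring calls in total, this remains polynomial irrespective of segment-size constraints, consistent with the theorem.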
Computational feasibility is a widespread concern that motivates choices in the framing and modeling of biological and artificial intelligence. While implicit or informal assumptions abound, when they are examined formally the reality may turn out to be counterintuitive. Here, we undertook a formal examination of existing computational assumptions about Segmentation. Using complexity-theoretic tools, we mathematically proved two sets of results that run counter to commonly held assumptions: 1) the search space is either not large to begin with, or it is large but placing intuitive constraints does not alleviate the issue; and 2) a computational model of segmentation that formalizes its conceptualization across domains is tractably computable in the absence of the widely adopted constraints proposed to address the assumed hardness.
Beyond our proofs, we set the groundwork for further refinements of segmentation theory and its computational analyses: a) we contributed a formalization of the computation that satisfies a domain-agnostic specification; b) we illustrated the relationship between segmentation and integer compositions, which makes the search space amenable to asymptotic analyses; and c) we built a bridge from the problem as originally defined on sequences to the mathematics of graphs, which opens up alternative formalisms to model it and to think about it algorithmically. A desirable consequence of translating problems between formal domains is that structure which was originally hidden from view may become visible.
Our results challenge existing intuitions about the hardness of the segmentation problem and its sources of complexity, and by extension question the motivation of proposed solutions and their associated empirical research foci. For instance, concerns about search space size and what mitigates it may be misplaced. The space of possible segments is not exponential to begin with; the space of segmentations is. However, the bounds on segment size are, neither individually nor combined, a source of exponentiality. Left unexamined, this may still appear to support the conjectured hardness of the problem. But our tractability proof challenges this intuitive conclusion. It demonstrates that no assumptions about bottom-up segmentation cues or top-down biases on segment properties are necessary to make the formal problem tractable. These proofs run counter to the computational efficiency concerns that partially motivate segmentation theories; for instance, proposals that argue from minimal units of representation (cf. Pöppel, 1997), temporal integration limits of neuronal populations (cf. Overath et al., 2015), intrinsic oscillatory timescales (cf. Ghitza, 2012; Wolff et al., 2022), bottom-up segmentation cues (e.g., Giraud & Poeppel, 2012), and top-down biases on candidate search (e.g., Friston et al., 2021), which to some extent build on the supposition of problem hardness, search space size, and various sources of complexity. This suggests that intractability concerns, if any, might be better placed, for instance, on the processes guiding segmentation rather than the boundary placement itself.
Together, the results proven here caution against letting intuitive notions about the properties of computational problems drive empirical programs, and demonstrate the need for, and benefits of, critically assessing their soundness. Whenever intuitions are challenged, researchers can partially or entirely redirect efforts as ideas shift regarding what evidence is relevant to collect. For instance, if researchers believe that a certain problem is computationally hard and that some set of neural and environmental regularities might speak to constraints that make it tractable, then they would be inclined to look for regularities that satisfy such a requirement. If, however, the original belief is removed, the target regularities, or the kinds of experiments adequate to test their putative role, might be different.
We close with a similar word of caution about interpreting our results. These are to some degree tied to the particular formalization we put forth. While our modeling choices were motivated and bear some generality, alternative theoretical commitments are conceivable. For instance, an extended model could allow for multiple unsegregated high-dimensional input streams; it is an open question whether it would have different complexity properties. We view our analyses not as the last word on the computational complexity of segmentation but rather as initial words in a conversation with a sound formal basis.
We thank the Computational Cognitive Science group at the Donders Institute for Brain, Cognition, and Behaviour for discussions, Nils Donselaar for invaluable feedback on a previous version that helped improve the manuscript, and Ronald de Haan for comments on future directions of this work. We thank four anonymous reviewers and one anonymous meta-reviewer for thoughtful comments, and Reviewer 1 in particular for a comprehensive, constructive and educational review. FA thanks David Poeppel for support and discussions on auditory segmentation. TW was supported by NSERC Discovery Grant 228104-2015.