A biomimetic basis for the perception of natural sounds

05/26/2020 ∙ by Habib Ammari, et al. ∙ ETH Zurich 0

Arrays of subwavelength resonators can mimic the biomechanical properties of the cochlea, at the same scale. We derive, from first principles, a modal time-domain expansion for the scattered pressure field due to such a structure and propose that these modes should form the basis of a signal processing architecture. We investigate the properties of such an approach and show that higher-order gammatone filters appear by cascading. Further, we propose an approach for extracting meaningful global properties from the coefficients, tailored to the statistical properties of so-called natural sounds.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 Biomimetic signal processing

Humans are exceptionally good at recognising different sound sources in their environment and there have been many attempts at designing artificial approaches that can replicate this feat. The human auditory system first amplifies and filters sounds biomechanically in the peripheral auditory system before processing the transduced neural signals in the central auditory signal. With a view to trying to mimic this world-beating system, we consider using an artificial routine with a similar two-step architecture: physical filtering followed by additional processing stages.

There has been much attention paid to designing biomimetic structures that replicate the biomechanical properties of the cochlea [1, 2, 3, 4, 5, 6, 7]. At the heart of any such structure are graded material parameters, so as to replicate the spatial frequency separation of the cochlea. In particular, a size-graded array of subwavelength resonators can be designed to have similar dimensions to the cochlea and respond to an appropriate range of audible frequencies [1]. An acoustic subwavelength resonator is a cavity with material parameters that are greatly different from the background medium [3]. Bubbly structures of this kind can be constructed, for example, by injecting air bubbles into silicone-based polymers [8, 9].

A graded array of resonators effectively behaves as a distributed system of band-pass filters [10]. The choice of kernel filter for auditory processing has been widely explored. Popular options include windowed Fourier modes [11, 12], wavelets [13, 14, 15, 16, 17] and learned basis functions [18]

. In particular, gammatone filters (Fourier modes windowed by gamma distributions) have been shown to approximate auditory filters well and, thanks also to their relative simplicity, are used widely in modelling auditory function as a result

[10, 19, 20, 21]. We will prove that, at leading order, an array of subwavelength resonators behaves as an array of gammatone filters.

The human auditory system is known to be adapted to the structure of the most important inputs and exhibits greatly enhanced neural responses to natural and behaviourally significant sounds such as animal and human vocalisations and environmental sounds [22]. It has been observed that such sounds, often known collectively as natural sounds, display certain statistical properties [22, 23, 24, 25]. By design, most music also falls into this class; music satisfying these properties sounds “much more pleasing” [23]. Thus, it is clear that the human auditory system is able to account for global properties of a sound and that a biomimetic processing architecture needs to replicate this. Many attempts have been made to extract these. We propose using the parameters of the observed statistical distributions as meaningful and tractable examples of global properties to be used in artificial representations of auditory signals.

1.2 Main contributions

In Section 3 we derive results that describe the resonant properties of a system of resonators in three dimensions and prove a modal decomposition in the time domain. This formula takes the form of spatial eigenmodes with first-order gammatone coefficients. Further to this, we show in Section 4 that a cascade of these filters equates to filtering with higher-order gammatones and that extracting information from temporal averages is stable to deformations. Finally, in Section 5.1 we focus our attention on the class of natural sounds, which we define as sounds satisfying certain (widely observed) statistical and spectral properties. Using these properties, we propose a parametric coding approach that extracts the global properties of a sound.

2 Boundary integral operators

2.1 Problem setting

We are interested in studying wave propagation in a homogeneous background medium with disjoint bounded inclusions, which we will label as . We will assume that the boundaries are all Lipschitz continuous and will write . In order to replicate the spatial frequency separation of the cochlea, we are interested in the case where the array has a size gradient, meaning each resonator is slightly larger than the previous, as depicted in Figure 1.

We will denote the density and bulk modulus of the material within the bounded regions by and , respectively. The corresponding parameters for the background medium are and . The wave speeds in and are given by

We also define the dimensionless contrast parameter

We will assume that , meanwhile , and .

2.2 Boundary integral operators

The Helmholtz single layer potential associated to the domain and wavenumber is defined, for some density function , as

where is the Helmholtz Green’s function, given by

The Neumann-Poincaré operator associated to and is defined as

where denotes the outward normal derivative on . These two integral operators are related by the conditions


where the subscripts and denote evaluation from outside and inside the boundary , respectively, and is the identity operator on .

2.3 Asymptotic properties

The single layer potential and the Neumann–Poincaré operator both have helpful asymptotic expansions as (see e.g. [26]). In particular, we have that


where (i.e. the Laplace single layer potential) and

One crucial property to note is that is invertible. Similarly,


where , ,

Several of the operators in the expansion (2.3) can be simplified when integrated over all or part of the boundary . As proved in e.g. [27, Lemma 2.1], it holds that for any and ,


3 Subwavelength scattering decompositions

3.1 Scattering of pure tones

Suppose, first, that the incoming signal is a plane wave parallel to the -axis with angular frequency , given by where . Then, the scattered pressure field is given by where satisfies the Helmholtz equation


where and , along with the transmission conditions


and the Sommerfeld radiation condition in the far field, which ensures that energy radiates outwards [26], given by


where, in this case, .

Definition 3.1 (Resonant frequency).

We define a resonant frequency to be such that there exists a non-trivial solution to (3.1) which satisfies (3.2) and (3.3) when .

Definition 3.2 (Subwavelength resonant frequency).

We define a subwavelength resonant frequency to be a resonant frequency that depends continuously on and is such that as .

Lemma 3.3.

A system of subwavelength resonators exhibits subwavelength resonant frequencies with positive real parts, up to multiplicity.


This was proved in [3] and follows from the theory of Gohberg and Sigal [28, 26]. ∎

3.1.1 Capacitance matrix analysis

Our approach to solving the resonance problem is to study the (weighted) capacitance matrix, which offers a rigorous discrete approximation to the differential problem. We will see that the eigenstates of this -matrix characterise, at leading order in , the resonant modes of the system.

In order to introduce the notion of capacitance, we define the functions , for , as


is used to denote the characteristic function of a set

. The capacitance coefficients , for , are then defined as

We will need two objects involving the capacitance coefficients. Firstly, the weighted capacitance matrix , given by

which has been weighted to account for the different sized resonators (see e.g. [27, 29, 30] for other variants in slightly different settings). Secondly, we will want the capacitance sums contained in the matrix , given by

where is the matrix of capacitance coefficients and is the matrix of ones (i.e. for all ).

We define the functions , for , as

Lemma 3.4.

The solution to the scattering problem (3.1) can be written, for , as

for constants which satisfy, up to an error of order , the problem


The solutions can be represented as


for some surface potentials , which must be chosen so that satisfies the transmission conditions across . Using (2.1), we see that in order to satisfy the transmission conditions on , the densities and must satisfy

Using the asymptotic expansions (2.2) and (2.3), we can see that

and, further, that


Then, integrating (3.5) over , for , and using the properties (2.4) gives us that

At leading order, (3.5) says that so, in light of the fact that forms a basis for , the solution can be written as


for constants . Making this substitution we reach, up to an error of order , the problem


The result now follows by combining the above. ∎

Theorem 3.5.

As , the subwavelength resonant frequencies satisfy the asymptotic formula

for , where

are the eigenvalues of the weighted capacitance matrix

and are real numbers that depend on , and .


If , we find from Lemma 3.4 that there is a non-zero solution to the eigenvalue problem (3.4) when is an eigenvalue of , at leading order.

To find the imaginary part, we adopt the ansatz


Using the expansions (2.2) and (2.3) with the representation (3.4) we have that

and, hence, that

We then substitute the decomposition (3.6) and integrate over , for , to find that

Then, using the ansatz (3.8) for and setting

(the eigenvector corresponding to

) we reach that


Remark 3.6.

The resonant frequencies will have negative imaginary parts, due to the loss of energy (e.g. to the far field), thus for all .

Remark 3.7.

Note that in some cases for some , meaning the imaginary parts exhibit higher-order behaviour in . For example, the second (dipole) frequency for a pair of identical resonators is known to be [27].

Figure 2: The resonant frequencies in the complex plane, for an array of 22 subwavelength resonators.
Remark 3.8.

The numerical simulations presented in this work were all carried out on an array of 22 cylindrical resonators. We approximate the problem by studying the two-dimensional cross section using the multipole expansion method, as in [1].

Remark 3.9.

The array of 22 resonators that simulated in this work measures 35 , has material parameters corresponding to air-filled resonators surrounded by water and has subwavelength resonant frequencies within the range 500  – 10 . Thus, this structure has similar dimensions to the human cochlea, is made from realistic materials and experiences subwavelength resonance in response to frequencies that are audible to humans.

It is more illustrative to rephrase Lemma 3.4 in terms of basis functions that are associated with the resonant frequencies. Denote by the eigenvector of with eigenvalue . Then, we have a modal decomposition with coefficients that depend on the matrix , provided the system is such that is invertible.

Remark 3.10.

The invertibility of is a subtle issue and depends only on the geometry of the inclusions . In the case that the resonators are all identical, is invertible since is symmetric. If the size gradient is not too drastic, we expect to also be invertible (this is supported by our numerical analysis, which typically simulates an array of resonators where each is approximately 1.05 times the size of the previous).

Lemma 3.11.

Suppose the resonator’s geometry is such that the matrix of eigenvectors is invertible. Then if the solution to the scattering problem (3.1) can be written, for , as

for constants given by

where , i.e. the sum of the th row of .


In light of (3.6), we define the functions


for . Then, by diagonalising with the change of basis matrix , we see that the solution to the scattering problem (3.1) can be written as

for constants given, at leading order, by

Now, so we have that up to an error of order

In order to simplify this further, we use the fact that to see that

3.2 Modal decompositions of signals

Consider, now, the scattering of a more general signal,

, whose frequency support is wider than a single frequency and whose Fourier transform exists. Again, we assume that the radiation is incident parallel to the

-axis. Consider the Fourier transform of the incoming pressure wave, given for , by

where . The resulting pressure field satisfies the Helmholtz equation (3.1) along with the conditions (3.2) and (3.3).

Working in the frequency domain, the scattered acoustic pressure field

in response to the Fourier transformed signal can be decomposed in the spirit of Lemma 3.11. We write that, for , the solution to the scattering problem is given by


for some remainder . We are interested in signals whose energy is mostly concentrated within the subwavelength regime. In particular, we want that

Remark 3.12.

Note that the condition (3.12) is satisfied e.g. by a pure tone within the subwavelength regime, since if then Lemma 3.11 gives us that .

Now, we wish to apply the inverse Fourier transform to (3.11) to obtain a time-domain decomposition of the scattered field. From now on we will simplify the notation for the resonant frequencies by assuming we can write that and (which, by Theorem 3.5, is known to hold at least at leading order in ).

Theorem 3.13 (Time-domain modal expansion).

For and a signal which is subwavelength in the sense of the condition (3.12), it holds that the scattered pressure field satisfies, for , ,

where the coefficients are given by for kernels defined as


for .


Applying the inverse Fourier transform to the modal expansion (3.11) yields

where, for , the coefficients are given by

where denotes convolution and the kernels are defined for by


We can use complex integration to evaluate the integral in (3.14). For , let be the semicircular arc of radius in the upper and lower half-plane and let be the closed contour . Then, we have that

The integral around is easy to evaluate using the residue theorem, since it has simple poles at . We will make the choice of or so that the integral along converges to zero as . For large we have a bound of the form


for a positive constant .

Suppose first that . Then we choose to integrate over in the upper complex plane so that (3.15) converges to zero as . Thus, we have that

since the integrand is holomorphic in the upper half plane. Conversely, if then we should choose to integrate over in order for (3.15) to disappear. Then, we see that

Using the notation , we can simplify the expressions for the residues at the two simple poles to reach the result. ∎

Remark 3.14.

The fact that for ensures the causality of the modal expansion in Theorem 3.13.

Remark 3.15.

The asymmetry of the eigenmodes means that the decomposition from Theorem 3.13 replicates the cochlea’s famous travelling wave behaviour. That is, in response to an incoming wave the position of maximum amplitude moves from left to right in the array, see [3] for details.

4 Subwavelength scattering transforms

Figure 3: The frequency support of the band-pass filters . Shown here for the case of 22 resonators.

In Section 3 we showed that when a subwavelength (i.e. audible) sound is scattered by a cochlea-mimetic array of resonators the resulting pressure field is described by a model decomposition. This decomposition takes the form of convolutions with the basis functions from Theorem 3.13. Since , each is a windowed oscillatory mode that acts as a band pass filter centred at . We wish to explore the extent to which these decompositions reveal useful properties of the sound and can be used as a basis for signal processing applications.

In order to reveal richer properties of the sound, a common approach is to use the filters

in a convolutional neural network. That is, a repeating cascade of alternating convolutions with

and some activation function



where, in each case, the indices are such that . We will use the notation

from now on, and refer to the vector

as the path of .

Figure 4: The emergence of gammatones at successively deeper layers in the cascade, shown for the first subwavelength resonant frequency in the case of 22 resonators.

4.1 Example: identity activation

As an expository example, we consider the case where is the identity . In this case, for any depth we have that for some function which is the convolution of functions of the form (3.13), indexed by the path . This simplification means that a more detailed mathematical analysis is possible.

The basis functions take specific forms. In particular, the diagonal terms contain gammatones. A gammatone is a sinusoidal mode windowed by a gamma distribution:

for some order and constants , . Gammatones have been widely used to model auditory filters [10]. We notice that and that higher order gammatones emerge at deeper levels in the cascade (4.1).

Lemma 4.1 (The emergence of higher-order gammatones).

For and , there exist non-negative constants , , such that

In particular, .


Let us write , for the sake of brevity. Firstly, it holds that . Furthermore, we have that

as well as, for , the recursion relation

The result follows by repeatedly applying this formula. In particular, we find that

Remark 4.2.

While the gammatones appeared here through the cascade of filters, gammatones also arise directly from resonator scattering if higher-order resonators are used: resonators that exhibit higher-order singularities in the frequency domain [10, 31]. It was recently shown that if sources of energy gain and loss are introduced to an array of coupled subwavelength resonators then such higher-order resonant modes can exist [30].

Remark 4.3.

Since the imaginary part of the lowest frequency is much larger than the others (see Figure 2), acts somewhat as a low-pass filter (see Figure 3).

For any depth and path it holds that meaning that if then . If, moreover, is compactly supported then the decay properties of mean that for any . Further, we have the following lemmas which characterise the continuity and stability of .

Lemma 4.4 (Continuity of representation).

Consider the network coefficients given by (4.1) with being the identity. Given , there exists a positive constant such that for any depth , any path and any signal it holds that