Many, if not most, popular vision models consist in a cascade of linear and non-linear (L+NL) operations [Martinez2018]. This happens for models describing both visual perception – e.g., the Oriented Difference Of Gaussians (ODOG) [Blakeslee1999] or the Brightness Induction Wavelet Model (BIWaM) [Otazu2008] – and neural activity [Carandini2005]. However, L+NL models, while suitable in many cases for retinal and thalamic activity, are not adequate to predict neural activity in the primary visual cortex area (V1). In fact, according to [Carandini2005]
, such models have low predictive power (indeed they can explain less than 40% of the variance of the data). On the other hand, several vision models are not in the form of a cascade of L+NL operations, such as those describing neural dynamics via Wilson-Cowan (WC) equations[WilsonCowan1973, Bressloff2002]. These equations describe the state of a population of neurons with V1 coordinates and orientation preference at time as
Here, are fixed parameters, is an interaction weight, is a sigmoid saturation function and represents the external stimulus. In [Bertalmio2007, BertalmioCowan2009, BertalmioFrontiers2014] the authors show how orientation independent WC-type equations admit a variational formulation through an associated energy functional which can be linked to histogram equalisation, visual adaptation and efficient coding theories [Olshausen2000].
In this paper, we consider a generalisation of this modelling and introduce explicit orientation dependence via a lifting procedure inspired by neuro-physiological models of V1 [Citti2006, Duits2010, Prandi2017]. Interestingly, the Euler-Lagrange equations associated with the proposed functional yield orientation-dependent WC-type equations analogous to (1). We then report some numerical evidence showing how the proposed model is able to better reproduce some visual perception phenomena in comparison to both previous orientation-independent WC-type models and state-of-the-art ODOG [Blakeslee1999] and BIWaM [Otazu2008] L+NL models.
In particular, we firstly test our model on orientation-dependent Grating Induction (GI) phenomena (generalising the ones presented in [Blakeslee1999, Figure 3], see also [McCourt1982]) and show a direct dependence of the processed image on the orientation, which cannot be reproduced via orientation-independent models, but which is in accordance with the reference result provided by the ODOG model. We then test the proposed model on the Poggendorff illusion, a geometrical optical effect where a misalignment of two collinear segment is induced by the presence of a surface [Weintraub1971, Westheimer2008], see Figure 6. For this example our model is able to reconstruct the perceptual bias better than all the reference models considered.
2 Orientation-independent variational models
Let be a rectangular image domain with and let be a normalised image on . Let further be a given positive symmetric weighting function such that for any . For a fixed parameter we consider the piece-wise affine sigmoid defined by
and we consider to be any function such that . Observe that is convex and even.
In [Bertalmio2007, BertalmioFrontiers2014, KimBatardBertalmio2016] the authors consider the following energy as a variational model for modelling contrast and assimilation phenomena, defined in terms of a given initial image :
Here, is the local mean of the initial image . Such term can encode a global reference to the “Grey-World” principle [Bertalmio2007] (in this case for any ) or a filtering around computed either via a single-Gaussian convolution [BertalmioFrontiers2014] or a sum of Gaussian filters [KimBatardBertalmio2016], consistently with the modelling of the multiple inhibition effects happening at a retinal-level [MarceloPLOS2016]. Finally, the parameter stands for a normalisation constant, while represents a weighting parameter enforcing the attachment of the solution to the given .
The gradient descent associated with is the following equation corresponds to a Wilson-Cowan-type equation similar to (1), where the visual activation is assumed to be independent of the orientation, as discussed in [Bressloff2001]. The parameters and are set and and the time-invariant external stimulus is equal to :
Note that in [KimBatardBertalmio2016] the authors consider in (3) a convolution kernel
which is a convex combination of two bi-dimensional Gaussians with different standard deviations. While this variation of the model is effective in describingassimilation effects, the lack of dependence on the local perceived orientation in the Wilson-Cowan-type equation (4) makes such modelling intrinsically not adapted to explain orientation-induced contrast and colour perception effects similar to the ones observed in [Otazu2008, Self2014, Blakeslee1999]. Up to our knowledge, the only perceptual models capable to explain these effects are the ones based on oriented Difference of Gaussian filtering coupled with some non-linear processing, such as the ODOG and the BIWaM models described in [Blakeslee1999, Blakeslee2016] and [Otazu2008], respectively.
3 A cortical-inspired model
Let us denote by the size of the visual plane and let be the disk . Fix such that . In order to exploit the properties of the roto-translation group on images, we now consider them to be elements in the set:
We remark that fixing is necessary, since contrast perception is strongly dependent on the scale of the features under consideration w.r.t. the visual plane.
In the following, after introducing the functional lifting under consideration, which follows well-established ideas contained, e.g., in [Citti2006, Duits2010, Prandi2017], we present the proposed cortical-inspired extension of the energy functional (3), and discuss the numerical implementation of the associated gradient descent. We remark that the resulting procedure thus consists of a linear lifting step combined with WC-type evolution, which is non-linear due to the presence of the sigmoid , see (2).
3.1 Functional lifting
Each neuron in V1 is assumed to be associated with a receptive field (RF) such that its response under a visual stimulus is given by
Motivated by neuro-phyisiological evidence, we will assume that each neuron is sensible to a preferred position and orientation in the visual plane, i.e., that . Here, is the projective line that we represent as , with . Moreover, in order to respect the shift-twist symmetry [Bressloff2002, Section 4]
, we will assume that the RF of different neurons are “deducible” one from the other via a linear transformation. Let us explain this in detail.
The double covering of is given by the Euclidean motion group , that we consider endowed with its natural semi-direct product structure
In particular, the above operation induces an action of on , which is thus an homogeneous space. Observe that is unimodular and that its Haar measure (which is the only left and right-invariant measure up to scalar multiples) is simply .
We now denote by the space of linear unitary operators on and let be the quasi-regular representation of which associates to any the unitary operator , i.e., the action of the roto-translation on square-integrable functions on . Namely, the operator acts on by
Moreover, we let be the left-regular representation, which acts on functions as
Letting now be the operator that transforms visual stimuli into cortical activations, one can formalise the shift-twist symmetry by requiring that
Under mild continuity assumption on , it has been shown in [Prandi2017] that is then a continuous wavelet transform. That is, there exists a mother wavelet satisfying for all , and such that
Observe that the operation above is well defined for thanks to the assumption on . By (5), the above representation of is equivalent to the fact that the RF associated with the neuron is the roto-translation of the mother wavelet, i.e., .
Letting , the above formula can be rewritten as
where denotes the standard convolution on .
Notice that, although images are functions of with values in , it is in general not true that . However, from (7) we deduce the following result, which guarantees that is a bounded operator. (See [Prandi2017, Section 2.2.3].)
The operator is continuous and, therefore, bounded. In particular, for any the function is a continuous bounded function on , with . Moreover, if takes values in and , we have .
Neuro-physiological evidence shows that a good fit for the RFs is given by Gabor filters, whose Fourier transform is simply the product of a Gaussian with an oriented plane wave[Daugman1985a]. However, these filters are quite challenging to invert, and are parametrised on a bigger space than , which takes into account also the frequency of the plane wave and not only its orientation. For this reason, in this work we chose to consider as wavelets the cake wavelets introduced in [Bekkers2014]. These are obtained via a mother wavelet whose support in the Fourier domain is concentrated on a fixed slice, which depends on the number of orientations one aims to consider in the numerical implementation. To recover integrability properties, the Fourier transform of this mother wavelet is then smoothly cut off via a low-pass filtering, see [Bekkers2014, Section 2.3] for details. Observe, however, that in order to lift to and not to , we consider a non-oriented version of the mother wavelet, given by , in the notations of [Bekkers2014].
An important feature of cake wavelets is that, in order to recover the original image, it suffices to consider the projection operator defined by
Indeed, by construction of cake wavelets, Fubini’s Theorem shows that for all .
3.2 WC-type mean field modelling
The natural extension of (3) to the cortical setting introduced in the previous section is the following energy on functions :
Here, is the lift of the local mean appearing in (3), and is the lift of the given initial image . The constant will be specified later. Furthermore, is a positive symmetric weight function and, as in [Bertalmio2007], we denote by its evaluation at . A sensible choice for would be an approximation of the heat kernel of the anisotropic diffusion associated with the structure of V1, as studied in [Citti2006, Duits2010]. However, in this work we chose to restrict to the case where is a (normalised) three-dimensional Gaussian in , since this is sufficient to describe many perceptual phenomena not explained by the previous 2D model (3). A possible improvement could be obtained by choosing as the sum of two such Gaussians at different scales in order to better describe, e.g., lightness assimilation phenomena.
Thanks to the symmetry of
and the oddness of, computations similar to those of [Bertalmio2007] yield that the gradient descent associated with (10) is
This is indeed a Wilson-Cowan-like equation, where the external cortical stimulus is .
3.3 Discretisation via gradient descent
In order to numerically implement the gradient descent (11), we discretise the initial (square) image as an matrix. (Here, we assume periodic boundary conditions.) We additionally consider orientations, parametrised by .
The discretised lift operator, still denoted by , then transforms matrices into arrays. Its action on an matrix is defined by
where is the Hadamard (i.e., element-wise) product of matrices, denotes the discrete Fourier transform, is the rotation of angle , and is the cake mother wavelet.
After denoting by , and by where is a Gaussian filtering of , we find that the explicit time-discretisation of the the gradient descent (11) is
Here, for a given 3D Gaussian matrix encoding the weight , and an matrix , we let, for any and ,
We refer to [Bertalmio2007, Section IV.A] for the description of an efficient numerical approach used to compute the above quantity in the 2D case and that can be translated verbatim to the 3D case under consideration.
After a suitable number of iterations of the above algorithm (measured by the criterion , for a fixed tolerance ), the output image is then found via (9) as
4 Numerical results
In this section we present results obtained by applying the cortical-inspired model presented in the previous section to a class of well-established phenomena where contrast perception is affected by local orientations.
We compare the results obtained by our model with the corresponding Wilson-Cowan-type 2D model (3)-(4) for contrast enhancement considered in [KimBatardBertalmio2016, BertalmioFrontiers2014]. For further reference, we also report comparisons with two standard reference models based on oriented Gaussian filtering. The former is the ODOG model [Blakeslee1999] where the output is computed via a convolution of the input image with oriented difference of Gaussian filters in six orientations and seven spatial frequencies. The filtering outputs within the same orientation are then summed in a non-linear fashion privileging higher frequencies. The latter model used for comparison is the BIWaM model, introduced in [Otazu2008]. This is a variation of the ODOG model, the difference being the dependence on the local surround orientation of the contrast sensitivity function111For our comparisons we used the ODOG and BIWaM codes freely available at https://github.com/TUBvision/betz2015_noise. .
All the considered images are pixel. We always consider lifts with orientations. The relevant cake wavelets are then computed following [Bekkers2014] for which the frequency band bw is set to for all experiments. In (11), we compute the local mean average and the integral term by Gaussian filtering with standard deviation and , respectively. The gradient descent algorithm stops when the relative stopping criterion defined in Section 3.3 with a tolerance is verified.
4.1 Grating induction with oriented relative background
Grating induction (GI) is a contrast effect which has been first described in [McCourt1982] and later studied, e.g., in [Blakeslee1999]. In this section we describe our results about GI with relative background orientation, see Figure 1. Here, when the background has different orientation from the central grey bar, a grating effect, i.e. an alternation of dark-grey/light-grey patterns within the central bar is produced and perceived by the observer. This phenomenon is contrast dependent, as the intensity of the induced grey patterns (dark-grey/light-grey) is in opposition with the background grating. Moreover, it is also orientation-dependent as the magnitude of the phenomenon increase or decrease based on the background orientation, and is maximal when the background bars are orthogonal to the central grey bar.
Discussion on computational results.
We observe that model (11) predicts in accordance with visual perception the appearance of a counter-phase grating in the central grey bar, see Figures 3(b) and 3(d). The same result is obtained by the ODOG model, see Figures 1(a) and 3(a). In particular, Figures 3 and 5 show higher intensity profile when the background gratings are orthogonal to the central line, while the effect diminishes if the angle of the background decrease from to , see orange and green dashed line. On the other hand, the BIWaM model and the WC-type 2D model for contrast enhancement do not appear suitable to describe this phenomenon. See for comparison the red and blue dashed lines in Figures 3 and 5.
4.2 Poggendorff illusion
The Poggendorff illusion (see Figure 5(b)) consists in the super-position of a surface on top of a continuous line, which then induces a misalignment effect. This phenomenon has been deeply investigated [Weintraub1971, Westheimer2008] and studied via neuro-physical experiments, see, e.g., [Weintraub1971]. Here, we consider a variation of the Poggendorff illusion, where the background is constituted by a grating pattern, see Figure 5(a). Figure 5(b) contains the classical Poggendorff illusion, extracted from figure 5(a).
Discussion on computational results.
The result obtained by applying the model (11) to Figure 5(a) is presented in Figure 5(c). Similarly as in the previous example, we observe an alternation of oblique bands of different grey intensities within the central grey surface. However, the question here is whether it is possible to reconstruct numerically the perceived misalignment between a fixed black stripe in the bottom part of Figure 5(a) and its collinear prosecution lying in the upper part. Note that the perceived alignment differs from the actual geometrical one: for a fixed black stripe in the bottom part, one would in fact perceive the alignment of the corresponding collinear top stripe slightly flushed left, as it is clear from Figure 5(b) where single stripes have been isolated for better visualisation. To answer this question, let us look at the reconstruction provided in Figure 5(c) and mark by a continuous green line a fixed black stripe in the bottom part of the image. In order to find the stripe in the upper part which is perceived to be collinear with the marked one, one would need to follow how the model propagates the marked stripe across the central surface. By drawing a dashed line in correspondence with such propagation, we can thus find the stripe in the upper part of the image corresponding to the marked one and observe that, as expected, this does not correspond to its actual collinear prosecution. This can be clearly seen in Figure 5(d) where the two stripes have been isolated for better visualisation.
This example shows that the proposed algorithm computes an output in agreement with our perception. Comparisons with reference models are presented in Figures 8 and 8. We observe that the results obtained via the proposed 3D-WC model cannot be reproduced by the BIWaM nor the WC-type 2D model, which moreover induce non-counter-phase grating in the central grey bar. On the other hand, the result obtained by the ODOG model is consistent with ours, but presents a much less evident alternating grating within the central grey bar. In particular, the induced oblique bands are not visibly connected throughout the whole grey bar, i.e. their induced contrast is very poor and, consequently, the induced edges are not as sharp as the ones reconstructed via our model, see Figure 8 for one example on the middle-line profile.
We presented a cortical-inspired setting extending the approach used in [Bertalmio2007, BertalmioFrontiers2014] to describe contrast phenomena in images. By mimicking the structure of V1, the model explicitly takes into account information on local image orientation and it relates naturally to Wilson-Cowan-type equations introduced in [WilsonCowan1973] to study the evolution of neurons in V1. The model can be efficiently implemented via convolution with appropriate kernels and discretised via standard explicit schemes. The additional information on the local orientation allows to describe contrast/assimilation phenomena as well as notable orientation-dependent illusions outperforming the models introduced in [Bertalmio2007, BertalmioFrontiers2014]. Furthermore, the performance of the introduced mean field model is competitive with the results obtained by applying some popular orientation-dependent filtering such as the ODOG and the BIWaM models [Blakeslee1999, Otazu2008].
Further investigations should address a more accurate modelling reflecting the actual structure of V1. In particular, this concerns the lift operation where the cake wavelet filters should be replaced by Gabor filtering, as well as the interaction weight which could be taken to be the anisotropic heat kernel of [Citti2006] instead of the isotropic Gaussian currently employed. Finally, extensive numerical experiments should be performed to assess the compatibility of the model with psycho-physical tests measuring the perceptual bias induced by these and other phenomena such as the tilt illusion [Self2014]. This would provide insights about the robustness of the model in reproducing the visual pathway behaviour.