ImageNet-trained deep neural network exhibits illusion-like response to the Scintillating Grid

07/21/2019 ∙ by Eric D. Sun, et al. ∙ Weizmann Institute of Science, Harvard University

Deep neural network (DNN) models for computer vision are now capable of human-level object recognition. Consequently, similarities in the performance and vulnerabilities of DNN and human vision are of great interest. Here we characterize the response of the VGG-19 DNN to images of the Scintillating Grid visual illusion, in which white dots are perceived to be partially black. We observed a significant deviation from the expected monotonic relation between VGG-19 representational dissimilarity and dot whiteness in the Scintillating Grid. That is, a linear increase in dot whiteness leads to a non-linear increase and then, remarkably, a decrease (non-monotonicity) in representational dissimilarity. In control images, mostly monotonic relations between representational dissimilarity and dot whiteness were observed. Furthermore, the dot whiteness level corresponding to the maximal representational dissimilarity (i.e. onset of non-monotonic dissimilarity) matched closely with that corresponding to the onset of illusion perception in human observers. As such, the non-monotonic response in the DNN is a potential model correlate for human illusion perception.




1 Introduction

Deep neural network (DNN) models are capable of besting human champions in chess [13] and Go [15] and reaching superhuman levels of accuracy in image classification and object recognition tasks [2, 8, 4, 13]. The comparable performances of DNNs and humans are reflected by several similarities in their computational architecture, such as having isolated computational units or neurons organized hierarchically into layers. Possibly as a result of these similarities, and by virtue of the similarity in object recognition accuracy as compared to humans [21, 9], DNNs have been proposed as models for several aspects of human vision including shape recognition [10] and visual perceptual learning [25] among others [27, 23]. These results motivate the search for other visual intersections, and possibly vulnerabilities, that are shared between human and machine models.

Human perception can exhibit large deviations from what is considered to be physical reality; these deviations are often referred to as visual illusions [7]. The study of such illusions may provide insight into the constraints and mechanisms of human visual processing [4]. In a similar manner, DNNs are also prone to “illusions” where the neural network reports mistaken classifications for noisy and meaningless images, which were generated intentionally via adversarial methods to mislead the network [22, 11]. Further, some images generated to mislead a DNN have also led to mistaken classification by time-limited human observers [5]. However, to our knowledge, there has yet to be an examination of how the representation of images within the DNN model differs for images that clearly exhibit illusion perception in humans as compared to non-illusion images. Here we explore the Scintillating Grid, a human visual illusion in that regard.

The Scintillating Grid illusion (Fig. 1a) induces an illusory perception of scintillating black dots within white grid dots [14]. The Scintillating Grid is a stronger variant of the famous Hermann Grid, which exhibits a similar effect at the intersections of grid lines [18]. The Hermann Grid is structurally identical to the Scintillating Grid with the exception of dots at the intersections.

In this study, we characterize a potential model correlate for human illusion perception in the VGG-19 DNN. Using a setup where images with increasing whiteness of a masked region are compared to a standard image, we analyze the VGG-19 representation of Scintillating Grid illusion and control images and discover an illusion-specific deviation from the monotonic relationship that is expected from linear increases in pixel difference (i.e. whitening). We introduce additional control setups, compare the deviation to the illusion effect in human perception, and examine the propagation of the deviation across the VGG-19 network architecture. Our findings suggest several similarities between VGG-19 and human responses to the Scintillating Grid and potentially offer a fresh perspective regarding the origin of the Scintillating Grid illusion effect in visual systems.

2 Methods

2.1 VGG-19 DNN

VGG-19 is a DNN with 19 layers consisting of a stack of convolutional layers followed by three fully connected layers and a final soft-max layer [17]. The VGG models produced top performances in both localization and classification tracks at the 2014 ImageNet Challenge. We utilized the standard VGG-19 model accessed through Matlab (MatConvNet [24]) that was pre-trained on the 1.3 million images of 1,000 image classes of ImageNet [3]. Most analyses were performed on the output representation of the final fully-connected layer (fc8), since it was the closest layer to the network output, and hence presumably the most similar to visual perception. All image stimuli used were compressed to dimensions of pixels.

2.2 Dot whiteness experimental setup

The grid illusion images were generated at a size of pixels and then re-sized to pixels to conform to VGG-19 input requirements. Here we denote pixel whiteness with the symbol . The Scintillating Grid stimulus was set to default parameters:

  • Dots: 25 dots organized in a 5x5 grid, each of diameter 30 pixels (prior to downsizing) and whiteness (white); each with a concentric border of width 1 pixel (prior to downsizing) and whiteness to prevent shape loss when dot whiteness matches line whiteness.

  • Lines: 10 lines organized to intersect at dot positions, each of width 15 pixels and whiteness (gray).

  • Background: whiteness (black).

The dot elements were masked and their whiteness was varied along 21 uniform values between black () and white () with . These images were compared to the reference image with black dots () and the distance in the VGG-19 representations (referred to as the representational dissimilarity, ) was measured (Fig. 1b).
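The whiteness sweep described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the stimulus generator used in the study: it assumes whiteness is expressed on a 0 (black) to 1 (white) scale, and the canvas size of 300 pixels and line whiteness of 0.5 are hypothetical placeholders. The one-pixel dot border and the downsizing step are omitted, and `whiteness_levels` and `render_grid` are hypothetical helper names.

```python
def whiteness_levels(n_levels=21, lo=0.0, hi=1.0):
    """Return n_levels evenly spaced whiteness values from lo to hi inclusive.

    The 0-to-1 whiteness scale is an assumption made for illustration.
    """
    step = (hi - lo) / (n_levels - 1)
    return [round(lo + i * step, 10) for i in range(n_levels)]


def render_grid(dot_w, size=300, n_dots=5, dot_diam=30, line_width=15,
                line_w=0.5, bg_w=0.0):
    """Render a toy grid as a list of rows of whiteness values.

    size, line_w and bg_w are hypothetical defaults; the dot border and
    the downsizing step from the text are deliberately left out.
    """
    spacing = size / (n_dots + 1)
    centers = [spacing * (k + 1) for k in range(n_dots)]
    # Distance from each pixel coordinate to the nearest line/dot centre.
    nearest = [min(abs(p - c) for c in centers) for p in range(size)]
    radius_sq = (dot_diam / 2) ** 2
    half_line = line_width / 2
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = nearest[x], nearest[y]
            if dx * dx + dy * dy <= radius_sq:
                row.append(dot_w)      # inside a dot (the masked region)
            elif dx < half_line or dy < half_line:
                row.append(line_w)     # on a horizontal or vertical line
            else:
                row.append(bg_w)       # background
        image.append(row)
    return image


levels = whiteness_levels()
reference = render_grid(dot_w=levels[0])          # black-dot reference image
sweep = [render_grid(dot_w=w) for w in levels]    # one image per whiteness level
```

Each image in `sweep` would then be compared against `reference` to obtain one dissimilarity value per whiteness level.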

To maintain contrast boundaries between dots and lines even when whiteness of dots and lines was the same, we introduced a one-pixel border of whiteness around all dots prior to image downsizing. This manipulation preserved illusion perception in humans as evident by inspection (see Fig. 1a). We conclude that the observed deviation in the representational dissimilarity is not significantly influenced by shape loss, since the peak of occurred at and not (i.e. border whiteness).

For the natural and synthetic images, we selected a masked region consisting of 5-20 percent of pixels that were approximately white () using a heuristic threshold on the pixel value. The whiteness of the masked region was varied along 21 intervals to reflect the grid illusion setup. Similarly, each whiteness variant of the original image was compared to the same image with a black () masked region to obtain a representational dissimilarity measure (see Section 2.4).

2.3 Image stimuli sets

We used a set of 30 illusion images and a set of 30 control images. The illusion stimuli set consisted of diverse Scintillating Grid variants including grids with translation, increased dot size, altered background color, different scales, and different dot array dimensions. The control stimuli set included 19 natural and synthetic images and 11 illusion controls (i.e. grid images where no illusory perception was present for human observers). Natural and synthetic images included animals, humans, plants, and also randomly generated square grids or checkerboard patterns. The natural images were selected by an independent party from non-ImageNet sources. The illusion control images consisted primarily of Scintillating Grid variants with the grid lines removed. Representative images from each of the stimuli sets are available in the Supplementary and the full stimuli set is included in the public repository:

2.4 Representational Dissimilarity

To quantify the dissimilarity between VGG-19 fc8 representations of two images, we used the distance. This distance was referred to as the representational dissimilarity . Specifically, the metric was calculated as the mean absolute difference of the neuron outputs and of the two images and , given the neurons in the layer (rows, columns, convolution kernels).
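As a minimal sketch of the metric described above, the mean absolute difference between two activation vectors can be computed as follows. The function name and the flat-list representation of a layer's outputs are assumptions made for illustration; in practice the activations of a convolutional layer (rows, columns, kernels) would first be flattened into one vector.

```python
def representational_dissimilarity(act_a, act_b):
    """Mean absolute difference between two equally sized activation vectors."""
    if len(act_a) != len(act_b):
        raise ValueError("activation vectors must have the same length")
    if not act_a:
        raise ValueError("activation vectors must be non-empty")
    total = sum(abs(a - b) for a, b in zip(act_a, act_b))
    return total / len(act_a)


# Toy activations (placeholder values, not taken from VGG-19).
example = representational_dissimilarity([0.0, 2.0, 4.0], [1.0, 1.0, 3.0])
```

Identical inputs give a dissimilarity of zero, and the value grows with the average per-neuron difference, independent of the number of neurons in the layer.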


2.5 Deviation magnitude and area

To quantify deviations from the expected relation between dot whiteness and representational dissimilarity with respect to a blackened image, we made two assumptions: 1) In the absence of illusion-like deviation, increases approximately linearly with increased whiteness, and 2) an illusion-like deviation contributes to a depressed since the perception of black dots enforces greater similarity to the black () dot grid (i.e. illusory white dots are more similar than regular white dots to black dots). An approximation of the non-illusory representational dissimilarity as a function of dot whiteness was obtained using a linear regression of representational dissimilarity values from the initial dot whiteness () through the dot whiteness at maximal ( such that ). We assumed linearity in this range of values since there was effectively no illusory perception for human observers at low dot whiteness values [20]. The deviation from this expected linear relationship was measured by subtracting the observed representational dissimilarity in VGG-19 from the expected linear regression (Fig. 1c):


We refer to this difference between the linear and observed values as the deviation magnitude . Similarly, we refer to the positive area of the curve as the deviation area . The deviation area represents the magnitude and depth (range of whiteness intervals) for the illusion-like deviation. was calculated as the integral of the deviation magnitude from the point of maximum representational dissimilarity to the final dot whiteness (Fig. 1c):


We approximated this integral using numerical integration with trapezoidal quadrature.
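The procedure in this section can be sketched as follows, under stated assumptions. The sketch fits an ordinary least-squares line to the points up to and including the maximum dissimilarity, takes the deviation magnitude as the fitted value minus the observed value, and integrates the positive part of that deviation from the peak to the final whiteness with the trapezoidal rule. Clipping negative deviations to zero before integrating is one possible reading of "positive area", and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_profile(whiteness, dissimilarity):
    """Return (peak index, deviation magnitudes, deviation area)."""
    # First index at which the observed dissimilarity is maximal.
    peak = max(range(len(dissimilarity)), key=dissimilarity.__getitem__)
    if peak < 1:
        raise ValueError("need at least two points up to the peak for the fit")
    slope, intercept = linear_fit(whiteness[:peak + 1], dissimilarity[:peak + 1])
    # Deviation magnitude: expected (linear) value minus observed value.
    magnitude = [slope * w + intercept - d
                 for w, d in zip(whiteness, dissimilarity)]
    # Positive part only (assumption), integrated from the peak onward.
    positive = [max(m, 0.0) for m in magnitude]
    area = sum(0.5 * (positive[i] + positive[i + 1])
               * (whiteness[i + 1] - whiteness[i])
               for i in range(peak, len(whiteness) - 1))
    return peak, magnitude, area


# Toy curve that rises and then drops (placeholder values, not measured data).
toy_w = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_d = [0.0, 1.0, 2.0, 1.0, 1.0]
toy_peak, toy_magnitude, toy_area = deviation_profile(toy_w, toy_d)
```

For a curve that increases linearly all the way to the final whiteness, the peak is the last point, the integration range is empty, and the area is zero, which matches the expectation for a stimulus with no illusion-like deviation.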

2.6 VGG-19 stage and layer analysis

To examine the extent of the deviation in representational dissimilarity across the architecture of VGG-19, we compared the deviations at different layers, and at different neurons within a layer. Direct layer comparisons were achieved by scoring the deviation of each layer with the deviation area . The was then normalized with respect to the representational dissimilarity of the final white dot grid () for each layer respectively. This normalization was done to adjust for differences in the magnitudes of activation between layers. Similarly, comparisons at the neuronal level were achieved by using the deviation area normalized by the final () value for each neuron. In each layer, we measured the fraction of significant neurons whose deviation area was above a heuristic threshold of .
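A minimal sketch of the normalization and the significant-neuron count is given below. The threshold is passed in explicitly because its value is not reproduced here, the handling of neurons whose final dissimilarity is zero (scored as zero) is an assumption, and both function names are hypothetical.

```python
def normalized_area(area, final_dissimilarity):
    """Deviation area divided by the dissimilarity of the final white-dot grid."""
    if final_dissimilarity == 0:
        return 0.0  # assumption: a silent neuron or layer counts as no deviation
    return area / final_dissimilarity


def fraction_significant(neuron_areas, neuron_finals, threshold):
    """Fraction of neurons whose normalized deviation area exceeds threshold."""
    scores = [normalized_area(a, f)
              for a, f in zip(neuron_areas, neuron_finals)]
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


# Toy per-neuron values and a placeholder threshold, for illustration only.
toy_fraction = fraction_significant(
    neuron_areas=[0.5, 0.1, 0.0, 0.3],
    neuron_finals=[1.0, 1.0, 0.0, 0.5],
    threshold=0.2,
)
```

Dividing by the final dissimilarity makes layers with very different activation magnitudes comparable on a common scale.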

Figure 1: Scintillating Grid and experimental protocols. (a) The Scintillating Grid visual illusion exhibits illusory scintillation of black dots within the white grid dots. (b) Schematic representation of the experimental setup. Representational dissimilarity, denoted , was calculated as the distance of the VGG-19 representation (layer fc8) between two images. One image had a masked region of varying whiteness (from through by ) and the other image was constant with a black masked region ( throughout). For the Scintillating Grid (as in illustration), the masked regions were the grid dots. (c) Schematic representation of deviation magnitude and deviation area measurements. was measured as the distance between the value of a linear regression on the values up to and the VGG-19 for the given . was measured as the area between the linear regression and the VGG-19 dissimilarity curve and represented the accumulated magnitude of the deviation for all less than the at maximum .

3 Results

3.1 Dot whiteness experiment

With increased dot whiteness, and subsequently increased pixel distance from the black dot image, we expected the representational dissimilarity to monotonically increase from at to at (Fig. 1c). The expected monotonic relation between dot whiteness and representational dissimilarity was evident in the “No Lines” control image (Fig. 2b), where the lines of the Scintillating Grid were removed to effectively eliminate illusion perception. The monotonic relation was also evident in most natural and synthetic images (Fig. 2c). Interestingly, we observed a significant deviation from the monotonic relation when increasing dot whiteness on images of the Scintillating Grid illusion (Fig. 2a). The representational dissimilarity increased in a monotonic fashion to an at before decreasing and leveling off for higher whiteness levels (). This trend was noticeably different from the completely monotonic behavior observed in natural and synthetic images and grid illusion controls (Fig. 2d). The Scintillating Grid-specific deviation was observed for several different grid sizes, in images with different numbers of dots, and in translated grids (data not shown). This deviation from the expected monotonic behavior implicated a significant effect that was sensitive to dot whiteness, but independent of absolute differences in pixel values.

To quantify the magnitude of this illusion-like effect in VGG-19, we computed the deviation magnitude at each experimental interval. In the Scintillating Grid illusions, low deviation magnitudes were observed in early intervals () but increased after (Fig. 2e). In comparison, the No Lines control produced minimal deviation magnitude throughout all dot whiteness intervals (Fig. 2e). As a result, the Scintillating Grid exhibited a much higher deviation area () as compared to the No Lines control ().

3.2 Natural and Synthetic Images and Grid Controls

We examined the robustness of the illusion-like effect in a diverse set of illusion variants () and control images (). The majority of the control images exhibited strictly increasing representational dissimilarity when subjected to increasing masked region whiteness (Fig. 2e). Of the images that showcased some deviation from the expected monotonic relation, none were as significant as those observed for the Scintillating Grid (independent t-test, for mean values) (Fig. 2f).
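For readers who want to reproduce a comparison of this kind, the statistic of an independent two-sample t-test on two sets of deviation areas can be sketched as below. The text does not state whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption, and the input values are made-up placeholders rather than results from the study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic and degrees of freedom (equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Placeholder deviation areas for two stimulus sets (not data from the paper).
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A p-value would then be read from the t distribution with `dof` degrees of freedom, which is not computed in this sketch.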

Figure 2: Significant deviations from the expected monotonic increase of representational dissimilarity with increasing whiteness were selectively present in the Scintillating Grid. (a) The representational dissimilarity , calculated as described in Fig. 1, for increasing dot whiteness in the Scintillating Grid image. The measured deviated from the expected linear increase. (b) Removing lines from the Scintillating Grid eliminated human perception of the grid illusion, and recovered a clearly monotonic relation between and which was approximately linear with increasing . (c) Representative example of a natural image control (cheetah). Although natural and synthetic images deviated from the expected linear relation, they exhibited significantly less deviation in VGG-19 than the Scintillating Grid and showed a monotonic relation between and . (d) Representational dissimilarity averaged across samples in three stimuli sets (30 illusions, 19 natural and synthetic images, 11 illusion controls). Shaded region represents standard error of the mean. (e) Deviation magnitude as a function of element whiteness for illusions, controls, and natural/synthetic images. Individual trajectories are shown in solid lines. (f) There was a significant difference between the mean deviation area of illusion and control stimuli sets. Error bars represent standard error of the mean. Note that the magnitude of (y-axis scaling) is irrelevant in panel d because of differences in the number of masked pixels and presence of grid elements (see Supplementary for details).

The mean deviation area for illusions (, Mean ± SEM) was significantly higher than that for illusion controls () and natural and synthetic images (), which indicated a significant deviation in the VGG-19 model that was specific to grid illusions (Fig. 2f). Interestingly, the mean deviation area of the natural and synthetic images was higher than that of the illusion controls (Fig. 2f). Unlike illusion controls, which had contoured dots that prevented loss of color-derived boundaries under increasing whiteness (see Section 2.2), natural and synthetic images may be sensitive to these changes in color and contrast contours. This difference is a possible explanation for the inflated measurement observed in natural and synthetic images as compared to illusion controls (see Supplementary for more discussion).

Interestingly, a much less pronounced response was observed in ResNet [9], another deep convolutional neural network (see Supplementary). Unlike in VGG-19, the ResNet deviation area for Scintillating Grid variants was only significantly higher than the of natural and synthetic images.

3.3 Number of white dots experiment

In the previous experiment where the unit of change was dot whiteness, only the pixel difference between the two compared images was proportional to the unit of change. In that setup, we observed a significant competing effect outside of pixel differences, which resulted in a deviation from the expected monotonic increase in with increased dot whiteness (Fig. 2a). We postulate that this deviation represents VGG-19 “perception” of a human-like illusion effect in the Scintillating Grid. To test this theory, we attempted to recover the expected behavior by using the number of white dots in the grid image as the unit of change in lieu of dot whiteness. In this setup, black-dotted Scintillating Grids with progressively increased numbers of white dots were compared to an all black dot grid (Fig. 3a). Like in the previous dot whiteness setup, the number of white dots is linearly proportional to the pixel difference between the two compared grids. However, unlike the previous setup, the number of white dots is additionally proportional to the magnitude of the perceived illusions for human observers since each white dot contributes equally to the illusion effect. Therefore, if the observed deviation is a model correlate of human illusion perception, then varying the number of white dots linearly would eliminate any deviation from a monotonic relation between and the number of white dots.

As the number of white dots increased, the representational dissimilarity increased linearly as expected of a stimulus with competing pixel difference and illusion-like deviation effects (Fig. 3a). By constraining illusion-like effects to be linearly proportional to the units of change, we effectively eliminated the previously observed deviation. These results support human-like illusory perception as a contributing factor to the significant deviation observed in the VGG-19 representational dissimilarity for the Scintillating Grid in the previous dot whiteness setup where only pixel differences were held to be proportional to the units of change.

3.4 Comparison to human vision

After characterizing the illusion-like response of VGG-19 to the Scintillating Grid, we compared it to previously reported measurements of human perception for the same illusion [20]. Results (Fig. 2a) showed that the dot whiteness interval with maximal VGG-19 representational dissimilarity roughly corresponded to the onset of significant illusion perception in the DNN model. That is, the standard Scintillating Grid stimulus induced a significant illusion-like response in VGG-19 for dot whiteness . Correspondingly, an earlier experiment with human observers () using the same visual stimuli, found that the dot whiteness critical point for illusion perception (defined as the where half or less of the participants perceived the illusion) was around [20]. This similarity in perceptual ranges can be readily verified by visual inspection of Fig. 3b.

Figure 3: (a) Increasing number of white dots controlled for illusion-like deviations since the number of white dots is linearly proportional to the magnitude of the perceived illusions (for human observers) and to the pixel differences between the two compared images. Grids with incrementally greater numbers of white dots were compared to an all black dot grid. This setup recovered the expected linearly increasing representational dissimilarity with increased number of white dots. (b) Comparison to human perception. Visually, the transition from illusion-absent to illusion-present grid images is around . This corresponded to the dot whiteness value with maximal VGG-19 representational dissimilarity .

3.5 Origin and propagation of illusion-like perturbation

We analyzed individual layer outputs to determine if any subset of layers was responsible for the origin or propagation of the illusion response. Results for the Scintillating Grid showed a sharp induction of an illusion-like effect in the deeper layers starting with relu5_1, disappearing at conv5_3, and then reappearing and persisting after relu5_4 up until the final layer fc8 (Fig. 4a). To determine the number of significant neurons per layer, we applied the same method with a heuristic threshold of (Fig. 4b).

Figure 4: (a) VGG-19 deviation area for each layer / computational stage of the DNN with respect to the Scintillating Grid and the No Lines control grid; (b) Fraction of significant neurons for each layer / computational stage determined using a heuristic threshold of .

These results suggested that later processing stages were responsible for the illusion-like response in VGG-19. However, most theories of grid illusion perception in human vision focus on earlier visual processing such as lateral inhibition in retinal ganglion cells or S1 simple cells in the primary visual cortex. Since a universal theory of grid illusion perception has not been established [12], it may be worthwhile to consider the role of deeper visual processing in human perception of the Scintillating Grid.

We also used back-propagation to visualize regions of highest activation in VGG-19 for different image stimuli. These activation patterns were qualitatively different for the Scintillating Grid as compared to the No Lines control (see Supplementary for more details). Additionally, we applied principal component analysis to the layer values and observed that the first principal component was sufficient to resolve all whiteness levels in illusion controls and natural and synthetic images, while the first two principal components were necessary to fully resolve the different whiteness levels of the Scintillating Grid (see Supplementary).

4 Discussion

Here, we report that a human visual illusion, the Scintillating Grid, evoked a potential correlate of human illusory perception in VGG-19, a deep convolutional neural network. By measuring the representational dissimilarity between grid illusions of varying dot whiteness and a black dot grid illusion, we showed that the observed trends in deviated significantly from the expected monotonic behavior observed in natural and synthetic images and illusion control images. Varying the number of white dots in the grid illusion (and hence associating the magnitude of human-perceived illusion effect to pixel difference) recovered a linear relation between and the number of white grid dots. Overall, these results suggest that a strong nonlinear relation between and dot pixel whiteness was present in Scintillating Grid variants for VGG-19 and that this effect was not simply the result of pixel differences. We propose the non-monotonic deviation in observed in VGG-19 as a model correlate of human illusory perception of the Scintillating Grid because:

  1. The Scintillating Grid produced the largest VGG-19 deviation from monotonic behavior among all natural and synthetic images and control grid images investigated in this study.

  2. When the number of white dots was increased rather than dot whiteness, the deviation was lost and the increased linearly with the number of white dots. This is consistent with a deviation that correlates to human perception of the Scintillating Grid illusion.

  3. The illusion-like effect was present for a dot whiteness range characterized by a critical threshold () that was similar to that for human perception of the Scintillating Grid () [20].

To our knowledge, these results are the first indication that a deep neural network may exhibit human-like representations of select visual illusions like the Scintillating Grid.

Several theories have been proposed to explain the neural mechanisms responsible for the perception of these grid illusions. The traditional theory posits that shallow retinal ganglion cell processes produce the observed center-surround effect [18, 26]. Retinal ganglion cells typically exhibit a center-surround receptive field, whereby the activity of some cells is increased by light falling on the excitatory center and decreased by light falling on the inhibitory surround (ON-center cells), or vice versa (OFF-center cells) [6]. The illusory perception of dark dots at the intersections of the Hermann Grid was therefore argued to manifest from more light falling on the inhibitory surround of ON-center cells at the intersections than at other areas flanked by single grid lines [18, 4, 1]. Subsequent studies have indicated that the retinal ganglion cell theory is not sufficient to explain additional properties of grid illusions, which include the perception of the illusion under rotation and perturbation of the grid lines [12]. As such, it was proposed that additional downstream processing, such as in V1, is involved in mediating illusion perception for human observers. In VGG-19, we observed large deviation magnitudes () and greater proportions of deviation-significant neurons in the deepest layers of VGG-19. Therefore, the illusion correlate is more likely to have originated from these deep layers rather than being propagated from the earlier processing. Although a direct correlation between the architectures of human and computer vision is difficult, early DNN layers may be comparable to the human opponent-color and frequency-selective representations in retinal ganglion cells and in V1 neurons, while deeper DNN layers might compare with higher areas in the human visual system, such as V4 or IT. 
Therefore, consideration of higher level processing in human perception of the Scintillating Grid and its variants may potentially yield new insight into the underlying mechanisms of visual illusion perception.

In conclusion, the work suggests a novel area of overlap between human and computer vision, which we hope will motivate further investigation of visual illusions using computer vision, and vice versa.


We would like to thank the Dr. Bessie F. Lawrence International Summer Science Institute (ISSI) and Prof. Dov Sagi for the opportunity to initiate this collaboration.


  • [1] Amari, S. Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics 27, 2 (Aug. 1977), 77–87.
  • [2] Ciregan, D., Meier, U., and Schmidhuber, J. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (June 2012), pp. 3642–3649.
  • [3] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (June 2009), pp. 248–255.
  • [4] Eagleman, D. M. Visual illusions and neurobiology. Nature Reviews Neuroscience 2, 12 (Dec. 2001), 920–926.
  • [5] Elsayed, G. F., Shankar, S., Cheung, B., Papernot, N., Kurakin, A., Goodfellow, I., and Sohl-Dickstein, J. Adversarial Examples that Fool both Computer Vision and Time-Limited Humans. 32nd Conference on Neural Information Processing Systems (Feb. 2018).
  • [6] Enroth-Cugell, C., and Robson, J. G. The contrast sensitivity of retinal ganglion cells of the cat. The Journal of Physiology 187, 3 (Dec. 1966), 517–552.
  • [7] Gregory, R. L. Putting Illusions in their Place. Perception 20, 1 (Feb. 1991), 1–4.
  • [8] He, K., Zhang, X., Ren, S., and Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034.
  • [9] He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.
  • [10] Kubilius, J., Bracci, S., and Beeck, H. P. O. d. Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLOS Computational Biology 12, 4 (Apr. 2016), e1004896.
  • [11] Nguyen, A., Yosinski, J., and Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (June 2015), IEEE, pp. 427–436.
  • [12] Schiller, P. H., and Carvey, C. E. The Hermann Grid Illusion Revisited. Perception 34, 11 (Nov. 2005), 1375–1397.
  • [13] Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks 61 (Jan. 2015), 85–117.
  • [14] Schrauf, M., Lingelbach, B., and Wist, E. R. The Scintillating Grid Illusion. Vision Research 37, 8 (Apr. 1997), 1033–1038.
  • [15] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. v. d., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (Jan. 2016), 484–489.
  • [16] Simonyan, K., Vedaldi, A., and Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv:1312.6034 [cs] (Dec. 2013).
  • [17] Simonyan, K., and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs] (Sept. 2014).
  • [18] Spillmann, L. The Hermann Grid Illusion: A Tool for Studying Human Perceptive Field Organization. Perception 23, 6 (June 1994), 691–708.
  • [19] Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs] (Dec. 2014).
  • [20] Sun, E. Characterizing the whiteness dependence of the Hermann and Scintillating Grid visual illusions. preprint, PsyArXiv, June 2019.
  • [21] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going Deeper With Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), IEEE, pp. 1–9.
  • [22] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv:1312.6199 [cs] (Dec. 2013).
  • [23] Turner, M. H., Giraldo, L. G. S., Schwartz, O., and Rieke, F. Stimulus- and goal-oriented frameworks for understanding natural vision. Nature Neuroscience 22, 1 (Jan. 2019), 15.
  • [24] Vedaldi, A., and Lenc, K. MatConvNet: Convolutional Neural Networks for MATLAB. In Proceedings of the 23rd ACM International Conference on Multimedia (New York, NY, USA, 2015), MM ’15, ACM, pp. 689–692.
  • [25] Wenliang, L. K., and Seitz, A. R. Deep Neural Networks for Modeling Visual Perceptual Learning. Journal of Neuroscience 38, 27 (July 2018), 6028–6044.
  • [26] Wolfe, J. M. Global Factors in the Hermann Grid Illusion. Perception 13, 1 (Feb. 1984), 33–40.
  • [27] Yamins, D. L. K., and DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19, 3 (Mar. 2016), 356–365.

5 Supplementary Material

5.1 Stimuli sets

We used three image stimuli sets in the masked element and dot whiteness experiments: 30 illusion variants, 19 natural and synthetic images, and 11 illusion control images. Figure 5 shows five representative images from each of the three sets for reference. All image sets are available on the public GitHub repository:

Figure 5: Representative images from each of the three stimuli sets: illusion variants, natural and synthetic images, and illusion control images.

5.2 Whole-network representational dissimilarity pattern

Our analyses used the outputs of fc8, which was the final fully connected layer of VGG-19 and arguably the closest analog to perception in human vision since it is the stage just prior to classification. To understand the dissimilarity patterns across all layers, principal component analysis (PCA) was applied to the representational dissimilarity vectors of each layer/stage in VGG-19 (as compared to and values in Fig. 4). In illusion controls and natural and synthetic images, the first principal component was sufficient to discriminate the different dot whiteness variants (i.e. the values corresponding to different dot whiteness levels were separated when projected onto the first principal component, as shown in Fig. 6b,c). On the other hand, the values corresponding to dot whiteness in the Scintillating Grid were significantly crowded along the first principal component, and the second principal component was necessary to fully resolve individual whiteness intervals (see Fig. 6a). The median whiteness level in the crowded region was , which was close to the previously observed in fc8. This suggests that the sources of variance captured by the first principal component are primarily responsible for illusion perception, since that component fails to fully discriminate the dot whiteness intervals corresponding to illusion perception. It will be of interest to characterize the correlates of the first principal component in human vision and its possible contribution to illusion perception.
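The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the analysis code used for Fig. 6. It assumes that each row of the input is the dissimilarity vector for one dot whiteness level, with one entry per layer/stage; the function name `pca_project` and the toy data are hypothetical.

```python
import numpy as np

def pca_project(vectors, n_components=2):
    """Project row vectors onto their leading principal components.

    Returns the projected scores and the fraction of variance
    explained by each retained component.
    """
    x = np.asarray(vectors, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy data (assumption): 11 whiteness levels x 8 layers, where every
# layer responds monotonically to whiteness with its own gain.
rng = np.random.default_rng(0)
whiteness = np.linspace(0.0, 1.0, 11)
layer_gain = rng.uniform(0.5, 1.5, size=8)
dissimilarity = np.outer(whiteness, layer_gain)

scores, explained = pca_project(dissimilarity)
pc1 = scores[:, 0]
```

In this toy case every layer responds monotonically, so the whiteness levels stay ordered along the first component (up to an arbitrary sign). Crowding of the kind reported for the Scintillating Grid would appear as neighbouring whiteness levels with nearly identical `pc1` values.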

Figure 6: Principal component analysis (PCA) of VGG-19 representational dissimilarity vectors of all layers. Shown are the first two principal components for the Scintillating Grid, No Lines control, and an example natural image (cheetah). Colors correspond to the dot whiteness or masked element whiteness level . The first principal component is insufficient for discriminating all dot whiteness intervals in the Scintillating Grid (see red shaded region) but is sufficient to discriminate intervals in the other control stimuli.

5.3 VGG-19 Visualization

To develop a visual understanding of VGG-19 “perception” of the Scintillating Grid, we utilized methods for visualizing DNN activation. One approach for visualizing the regions of greatest activation for a given neuron is to run a backward pass from the neuron activation after the forward pass through the network. The gradient of the activation is then associated with the activation levels and visualized as such. This can also be achieved with the gradient of the class score with respect to the input image [16], which tends to offer a better visualization of the entire network than vanilla back-propagation. We adopted a variant of this approach and used gradient visualization with back-propagation guided by “deconvnet” visualization [19]. The implementation was adapted from the “pytorch-cnn-visualizations” project for a pre-trained VGG-19 model.

We visualized the activation of the VGG-19 with respect to the Scintillating Grid and three control variants that exhibited no illusion effect: the Scintillating Grid with black dots, the Scintillating Grid with no lines, and the Scintillating Grid with black dots and no lines. This was achieved using guided back-propagation. Guided back-propagation [19] was performed on the class differentials in the final output layer of VGG-19. Applying guided back-propagation on the Scintillating Grid illusion revealed a disjointed, quad-like arrangement of high activation patches around the border of each dot. This pattern was not observed in the corresponding patterns of activation of the three non-illusion images, which mostly consisted of continuous, circular boundaries of activation around the dots (Fig. 7).
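For reference, the three visualization methods differ only in how the gradient is passed backward through each ReLU unit, as described in [19]. The sketch below illustrates the three masking rules on a toy vector. It is not the adapted “pytorch-cnn-visualizations” code, and the function names and values are hypothetical.

```python
import numpy as np

def relu_backward_vanilla(x, grad_out):
    """Vanilla back-propagation: pass gradient where the forward input was positive."""
    return grad_out * (x > 0)

def relu_backward_deconvnet(grad_out):
    """Deconvnet: pass only positive gradients, ignoring the forward input."""
    return grad_out * (grad_out > 0)

def relu_backward_guided(x, grad_out):
    """Guided back-propagation: apply both masks at once."""
    return grad_out * (x > 0) * (grad_out > 0)

# Toy forward inputs to a ReLU and the gradients arriving from the layer above.
x = np.array([-1.0, 2.0, 3.0, -4.0])
grad_out = np.array([5.0, -6.0, 7.0, -8.0])

vanilla = relu_backward_vanilla(x, grad_out)
deconvnet = relu_backward_deconvnet(grad_out)
guided = relu_backward_guided(x, grad_out)
```

Only the third element survives guided back-propagation here, because it is the only position where both the forward input and the incoming gradient are positive. Suppressing negative gradients in this way is what tends to give sharper maps than vanilla back-propagation.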

Figure 7: Gradient visualization of the ImageNet VGG-19 activation with respect to the Scintillating Grid and its non-illusory variants using vanilla back-propagation and guided back-propagation.

5.4 ResNet shows no illusion-like response

We investigated the deep convolutional ResNet-152 network [9] for illusion-like responses. For the same set of 30 illusion stimuli and the same set of 30 control stimuli (19 natural, 11 illusion control), there was a less pronounced illusion-like response as characterized by the deviation areas between illusions () and controls ( for natural, for illusion controls) (Fig. 8a). The deviation magnitude trajectories corroborated this trend (Fig. 8b).

Figure 8: ResNet exhibits no significant illusion-like deviation. (a) Deviation area for illusion variants, natural and synthetic images, and illusion controls with standard error of mean. (b) Deviation magnitudes for different element whiteness intervals for illusion variants, natural and synthetic images, and illusion controls. (c) Deviation area across different ResNet stages/layers for the Scintillating Grid and the No Lines control.

This difference from the responses of VGG-19 suggests that the illusion-like effect is sensitive to network architecture: VGG-19 consists of 19 layers, while the ResNet implementation is much deeper, with 152 layers [9]. Comprehensive analyses of other deep neural network models may further clarify the properties that influence the susceptibility of a DNN to perception of the Scintillating Grid.

5.5 Comment on the lack of deviation magnitude comparability between stimuli sets

The magnitude of representational dissimilarity , and therefore and , were not comparable between the different stimuli sets due to intrinsic differences in the size and placement of the masked regions and differences in the background (non-masked) regions. These differences were not trivially amenable to normalization because of the complexity of the factors involved (e.g. different baseline levels of white pixels, different average pixel values, etc.). Furthermore, since the competing illusion-like effect is additive, normalization may artificially inflate or deflate the illusion scoring statistics. Therefore, we only considered the monotonicity of the relation of with respect to dot whiteness.
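Because monotonicity depends only on the ordering of successive values, it can be checked without any normalization across stimuli sets. The following sketch shows one scale-free way to locate the onset of non-monotonicity in a dissimilarity curve. It is an illustration only and is not the deviation statistic used in the main text; the function name, tolerance parameter, and example values are hypothetical.

```python
def first_non_monotonic_index(values, tol=0.0):
    """Return the index of the first value that drops below its predecessor.

    `values` are dissimilarities ordered by increasing dot whiteness.
    Drops no larger than `tol` are ignored. Returns None if the
    sequence never decreases.
    """
    for i in range(1, len(values)):
        if values[i] < values[i - 1] - tol:
            return i
    return None

# Hypothetical curves: a monotonic control and an illusion-like response
# that rises and then falls.
control_curve = [0.0, 0.2, 0.5, 0.9]
illusion_curve = [0.0, 0.4, 0.7, 0.6, 0.3]

control_onset = first_non_monotonic_index(control_curve)
illusion_onset = first_non_monotonic_index(illusion_curve)
```

Multiplying a curve by any positive constant leaves the returned index unchanged when `tol` is zero, so results can be compared between stimuli sets whose raw dissimilarity magnitudes differ.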