The visual capabilities of humans have motivated a large number of scientific studies. The ability of different species, including humans, to perceive and interpret visual stimuli is outstanding: the wide tolerance to different illumination and noise levels is just one of the characteristics we are aware of but still barely understand.
An important aspect of visual processing is the perception of motion. Motion is a key step in several computer vision tasks such as 3D reconstruction, feature tracking, time-to-collision estimation and novelty detection, among others key:barron . Motion is also one of the features that many species can perceive in the flow of visual information, and its detection has been observed in a large number of animals key:visualneuro , from invertebrates to highly evolved mammals. From optical engineering and experimental psychology we already know the main features of human motion discrimination key:nakayama . In this work, we are particularly interested in taking inspiration from biology in order to design a parallel algorithm with discrimination capabilities similar to those obtained by classical serial architectures.
Our work begins with an overview of techniques to detect motion in machine vision, followed by the available experimental results and procedures in human psychophysics. Section 3 presents the speed detection algorithm with which we perform our simulations and compare against experimental data. The results are analyzed in Section 4, and in the last two sections we present the discussion and conclusions of our work.
In this work we are interested in the detection of motion, specifically in the coding and retrieval of speed, and in the link between selecting a range of speeds to work with and providing the wide ranges of discrimination observed in human psychophysics experiments key:orban . We focus on two features: the multi-scale architecture of the speed detection, and the relation between the number of multi-scale levels and the range of speeds the system is sensitive to.
2.1 Motion detection in computer vision
The detection of motion is a widely used operation in computer vision. Commonly called “optical flow extraction”, its main objective is to assign a velocity vector to each pixel of a given sequence of frames (at least two). In this section, we explain the basic technique used to widen the range of motion that an optical flow extraction method is sensitive to. We ground our explanation on the well-known Lucas & Kanade method key:lucas ; key:barron (the basic multi-scale technique applies similarly to other optical flow extraction methods).
2.1.1 Optical flow
Many optical flow extraction methods are based on the initial assumption of brightness conservation, that is,

$$\nabla I(\vec{x},t) \cdot \vec{v} + \frac{\partial I(\vec{x},t)}{\partial t} = 0,$$
where $\vec{v} = (v_x, v_y)^\top$ is the velocity vector and $I(\vec{x},t)$ is the image brightness. A well-known technique following this approach is the Lucas & Kanade algorithm key:lucas , which minimizes the following cost function in a small fixed region $\Omega$, i.e.

$$E(\vec{v}) = \sum_{\vec{x} \in \Omega} W(\vec{x}) \left( \nabla I(\vec{x},t) \cdot \vec{v} + \frac{\partial I(\vec{x},t)}{\partial t} \right)^2,$$
where $W$ is a two-dimensional Gaussian weighting function used to give more importance to the central points and $\Omega$ is a square region of a few pixels. This minimization estimates $\vec{v}$ with sub-pixel precision after a few iterations. The method achieves good optical flow extraction in regions where the local gradient structure is well conditioned, such as corners key:simon .
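As an illustration of the weighted least-squares step described above, the following sketch estimates the velocity of a single window in Python/NumPy. The function name, window handling and default parameter values are our own assumptions, not part of the original formulation:

```python
import numpy as np

def lucas_kanade_window(I0, I1, center, radius=2, sigma=1.5):
    """Estimate the velocity of one small window between frames I0 and I1.

    Minimal sketch of the Lucas & Kanade weighted least-squares step;
    all names and defaults here are illustrative.
    """
    y, x = center
    ys = slice(y - radius, y + radius + 1)
    xs = slice(x - radius, x + radius + 1)

    # Spatial gradients (central differences) and temporal derivative.
    Iy, Ix = np.gradient(I0)
    It = I1 - I0

    # Gaussian weights W give more importance to the central points.
    g = np.arange(-radius, radius + 1)
    w1d = np.exp(-g ** 2 / (2.0 * sigma ** 2))
    W = np.outer(w1d, w1d).ravel()

    # One row per pixel of the window: [Ix, Iy] v = -It.
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()

    # Weighted least squares: (A^T W A) v = A^T W b.
    AtW = A.T * W
    v, *_ = np.linalg.lstsq(AtW @ A, AtW @ b, rcond=None)
    return v  # (vx, vy) in pixels per frame
```

For a dense flow field, this estimation is simply repeated at every pixel (or at selected corner-like points) of the frame.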
2.1.2 Serial multi-scale optical flow
The Lucas & Kanade method for optical flow extraction considers a small region $\Omega$. The use of such a region is not particular to this method: it is used in most algorithms key:barron . As the computation is performed in small windows, the detection of motion is constrained to speeds of up to $d$ pixels per frame, where $d$ stands for the diameter of $\Omega$. To overcome this limitation, a multi-scale representation of the images can be used, usually by considering Gaussian pyramids key:black . A Gaussian pyramid representation of an image is computed by recursively smoothing (using a Gaussian kernel) and sub-sampling the original image. In this way, the original image is represented by a set of smaller images. The representation at scale level $k = 0$ is the original image itself. The image at level $k$ is obtained by sub-sampling a filtered version of the image at level $k-1$ with a downsampling factor equal to 2. Thus, the size of the image at level $k$ is $(N/2^k) \times (M/2^k)$ for an $N \times M$ original image, with $k = 0, \ldots, L-1$, where $L$ is the number of levels of the representation.
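The recursive smooth-and-subsample construction can be sketched as follows. This is a minimal illustration; the 5-tap binomial kernel is a common stand-in for the Gaussian filter and is our assumption:

```python
import numpy as np

def gaussian_pyramid(image, levels):
    """Build a Gaussian pyramid: recursively smooth and sub-sample by 2.

    Minimal sketch; the 5-tap binomial kernel approximates a Gaussian.
    """
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    pyramid = [image]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Separable smoothing: convolve each row, then each column.
        smoothed = np.apply_along_axis(
            lambda r: np.convolve(r, kernel, mode="same"), 1, img)
        smoothed = np.apply_along_axis(
            lambda c: np.convolve(c, kernel, mode="same"), 0, smoothed)
        pyramid.append(smoothed[::2, ::2])  # downsampling factor of 2
    return pyramid
```

Each level halves both image dimensions, so a window of fixed size in pixels covers twice the original-image extent at every step up the pyramid.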
In the serial multi-scale optical flow estimation, speed is computed by sequentially projecting the estimation obtained at level $k$ to level $k-1$, until level $0$ is reached. There are complex strategies for computing the optical flow with a multi-scale approach key:simon . A simple solution is implemented in the widely used computer vision library OpenCV key:opencv ; key:black . In this case, the multi-scale estimation starts from the highest level ($k = L-1$) and propagates to the next one:

$$\vec{v}_k = 2\,\vec{v}_{k+1} + \delta\vec{v}_k,$$
where $\delta\vec{v}_k$ is the estimation of velocity at level $k$ after projecting the previous estimation, i.e. after warping the image at level $k$ by $2\,\vec{v}_{k+1}$. Computing the optical flow at the highest level and then projecting the solution to the lower levels key:simon ; key:black increases the range of detectable speeds. This range is wider when more scales are used. On the other hand, the sequential projection between levels also propagates the error introduced at each level. Thus, in terms of precision, increasing the number of scales in the representation increases the error introduced in the estimation.
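The serial coarse-to-fine propagation can be sketched as follows. The `estimate_residual` callback stands for any per-level optical flow step (e.g. a Lucas & Kanade solve on the warped images); all names here are illustrative:

```python
import numpy as np

def serial_multiscale_flow(pyr0, pyr1, estimate_residual):
    """Coarse-to-fine (serial) multi-scale estimation, as a sketch.

    pyr0 and pyr1 are Gaussian pyramids of two consecutive frames
    (level 0 is the original image); estimate_residual(I0, I1, v)
    returns the residual flow at one level given the projected guess v.
    """
    levels = len(pyr0)
    v = np.zeros(2)  # the guess at the highest level is zero
    for k in range(levels - 1, -1, -1):
        v = v + estimate_residual(pyr0[k], pyr1[k], v)
        if k > 0:
            v = 2.0 * v  # project the estimate to the next finer level
    return v  # estimated velocity at level 0, in pixels per frame
```

Note how an error committed at a high level is doubled at every projection, which is exactly the error propagation discussed above.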
2.2 Biological elements
This section sketches the current experimental knowledge in biology, focusing on studies of speed coding in the human brain key:logmt and on higher-level descriptions of speed discrimination from experimental psychophysics key:nakayama ; key:metha ; key:koenderink .
2.2.1 Parallel architecture
In the human brain, the main area responsible for coding different speeds is area MT key:visualneuro . It is located in the occipital region (back of the head). Neurons in this area are selective to stimuli moving at a given speed key:mt . Their spatial organization is retinotopic JingLiu01012003 : each neuron has a reduced visual field, and neurons that share the same local visual field are grouped together in a macro-column containing cortical columns selective for different orientations. This configuration allows a complete mapping of the visual field by a group of cortical columns that codes for all possible directions of local motion. The spatial organization is less well known with respect to speed selectivity. Nevertheless, it has been found that (1) the average detected speed increases with eccentricity (with respect to the retinotopic organization of MT), (2) neurons detecting similar speeds lie closer together than those detecting distinct speeds, and (3) for each eccentricity, there are neurons tuned to different speeds JingLiu01012003 . The interactions between different units are not completely understood, but there is evidence that units sensitive to different speeds could be coding a range of speeds in parallel key:logmt . It has been observed that the range of detectable speeds is not uniformly covered key:mt , but in this work we are interested in the simultaneous existence of speed-selective units in MT that could account for a parallel architecture dealing with different speeds.
2.2.2 Speed discrimination
In the work of McKee et al. key:nakayama , two subjects were exposed to several stimuli, one of them being a single horizontal bar, scaled with eccentricity, moving vertically at different eccentricities. The goal of this experiment was to determine, for each subject, the minimal detectable relative variation in speed with the gaze fixed at a certain location, for each stimulus eccentricity.
It is important to mention how this was actually measured, because a subject cannot assign a precise velocity to each location. Instead, given a reference velocity, the subject was asked to indicate whether the next presented stimulus moved faster or slower. The minimal detectable variation was then statistically inferred. Related experiments were performed by others key:orban ; key:metha ; key:koenderink , showing that the measurements are not affected by different contrast conditions, and that they do not depend on binocular or monocular sight.
The described experiments study speed discrimination at several eccentricities (the eccentricity is the distance to the center of the visual field in foveated vision, as in humans, primates and other species), see Fig. 1. In this work we are interested in each one of these eccentricities and their related discrimination properties, and not in the relations between different eccentricities. In order to model these discrimination functions, we need to generate a given discrimination percentage over a given range of speeds. We also point out that the left side of the experimental curves (see Fig. 1) is related to the eccentricity, but the same idea holds: for each eccentricity there is a wide range of speed discrimination where the relative error (rather than the absolute error) remains stable (5%-15%).
3 Proposed parallel multi-scale speed detection
Multi-scale speed detection is based on the fact that a particular speed detection algorithm can be used to estimate slower speeds at lower levels and faster speeds at higher levels. The serial multi-scale optical flow algorithm described above uses this information to detect speeds in a wide range of velocities by projecting the information at level $k$ to estimate the speed at level $k-1$, i.e. in a serial manner. As described in key:mt ; key:logmt , human motion perception seems to be based on a parallel multi-scale scheme. Based on this idea, the speed detection algorithm proposed in this paper estimates the speed by combining the information computed at each level independently, i.e. using the multi-scale information in a parallel manner. In this case, there is no error propagation in the computation of speeds, because the estimate at each level does not depend on the estimations performed at other levels. At each level $k$, we compute speeds using the optical flow estimation algorithm described in subsection 2.1. As explained before, this choice does not bias our results, since our goal is to provide a bio-inspired parallel speed detection scheme, instead of the standard serial approach, for any optical flow extraction method.
As expected, the speed detection algorithm estimates speed with a certain error at each multi-scale level $k$. The confidence in the estimation of speed at level $k$, denoted as $c_k$, can be defined as

$$c_k = 1 - \frac{\left|\, \|\vec{v}\| - \|\hat{\vec{v}}_k\| \,\right|}{\|\vec{v}\|},$$
where $\|\vec{v}\|$ is the magnitude of the object's real speed and $\|\hat{\vec{v}}_k\|$ is the magnitude of the average estimated speed over the object's pixel locations. Note that this computation only takes the magnitude of the speed into account, ignoring its direction. Figure 2 shows the confidence for three different multi-scale levels. These distributions were computed using an input image sequence containing an object moving at speeds ranging from 0.5 to 20 pixels per frame. To statistically determine the confidence at each level and speed, the experiments were carried out using the input image sequence with several realizations of Gaussian white noise; the resulting confidence $c_k$ is computed as the mean value over the experiments. Figure 3 shows two frames of an input image sequence used in the experiments. In this sequence the object is moving at 10 pixels per frame in the bottom-right direction.
As may be seen in Figure 2, a particular speed can be detected at several multi-scale levels, but with different confidence values. Thus, the current speed can be estimated by taking into account the speeds computed at each level and their associated confidence values $c_k$. For that reason, the experimental distributions depicted in Fig. 2 have to be approximated by a closed-form equation. In this work, these distributions are approximated (modeled) as Gaussian distributions in a semi-log space, defined by the following equation

$$c_k(v) \approx \exp\left( -\frac{\left( \log_s v - (\mu + k) \right)^2}{2\sigma^2} \right), \qquad (5)$$
where $\mu + k$ and $\sigma^2$ are the mean and the variance of the distribution at level $k$ (in the semi-log space) and $s$ is the scaling factor used in the sub-sampling of the images. The approximated distributions for the three levels are depicted in Fig. 2. A better approximation of the distributions could be obtained using a particular set of parameters for each level, but this would increase the model complexity. The approximation of the distributions for each level in Eq. (5) only depends on $\mu$, $\sigma$ and $s$. Note that this approximation allows the estimation of speeds using different values of the scaling factor $s$, which is usually set to $s = 2$, i.e., the case of using Gaussian pyramids for the sub-sampling.
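The semi-log Gaussian confidence model can be sketched as a one-line function. The values of `mu` and `sigma` below are illustrative placeholders, not the fitted values of the paper:

```python
import numpy as np

def confidence(v, k, mu=0.0, sigma=0.6, s=2.0):
    """Gaussian confidence model in semi-log space, as a sketch.

    The level-k distribution is a Gaussian in log_s(speed) whose mean
    shifts by one unit per level; mu and sigma are assumed values.
    """
    x = np.log(v) / np.log(s)  # log_s of the speed (pixels per frame)
    return np.exp(-((x - (mu + k)) ** 2) / (2.0 * sigma ** 2))
```

With this parameterization, level $k$ peaks at speed $s^{\mu+k}$, so consecutive levels are evenly spaced on a logarithmic speed axis.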
Finally, denoting the detected speed at each level by $\hat{v}_k$, the proposed algorithm computes the current speed, using the speed detected at each multi-scale level with its associated confidence value $c_k$, as

$$\hat{v} = \frac{\sum_{k=0}^{L-1} c_k\, \hat{v}_k}{\sum_{k=0}^{L-1} c_k},$$
where $L$ is the number of levels used to compute the estimated speed $\hat{v}$. Figure 3 shows the optical flow obtained with the proposed parallel multi-scale algorithm. The comparison between the experimental confidence distribution of the proposed algorithm and the confidence distributions for three levels is shown in Fig. 4. As expected, the confidence distribution of the parallel multi-scale algorithm is approximately the envelope of the confidence distributions of the three individual levels.
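The confidence-weighted combination of the per-level estimates can be sketched as follows (illustrative code; the inputs are arrays of length $L$ holding the per-level speeds and confidences):

```python
import numpy as np

def parallel_speed(v_hats, confidences):
    """Combine per-level speed estimates using their confidences.

    Confidence-weighted average over the L levels, as in the proposed
    parallel scheme; input names are illustrative.
    """
    v_hats = np.asarray(v_hats, dtype=float)
    c = np.asarray(confidences, dtype=float)
    return float((c * v_hats).sum() / c.sum())
```

Levels whose confidence is low at the current speed contribute little to the estimate, so no explicit level selection is needed.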
As described in subsection 2.2.2, speed discrimination is measured as the minimal detectable variation in the speed of a particular visual stimulus. In this work, a variation in speed, from a given reference speed $v$, of the moving object is considered noticeable if the following inequality holds

$$\hat{v}^{-} < \hat{v} < \hat{v}^{+}, \qquad (9)$$
where $\hat{v}$ is the speed estimated at the reference speed $v$, and $\hat{v}^{+}$ and $\hat{v}^{-}$ are the speeds estimated by the algorithm when the object is moving at velocities $(1+\alpha)v$ and $(1-\alpha)v$, respectively, with $\alpha$ the percentage of variation from the object speed required to consider the variation detectable. Note in Eq. (9) that a variation in speed is considered noticeable only if it is detectable both when $v$ is increased and when it is decreased by $\alpha$. To statistically determine the minimum value of $\alpha$, several experiments were carried out using the input image sequence with several realizations of Gaussian white noise. Then, the minimal detectable variation in speed, from a given reference speed $v$, is computed as the minimum $\alpha$ detected in 90% of the experiments.
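The noticeability test of Eq. (9) can be sketched as a search for the smallest detectable variation. Here `estimate` stands for any speed estimator (in the paper, the parallel multi-scale estimator run on noisy sequences), and the candidate `alphas` grid is our assumption:

```python
import numpy as np

def min_discriminable_alpha(estimate, v, alphas=np.linspace(0.01, 0.5, 50)):
    """Smallest relative variation alpha for which Eq. (9) holds (sketch).

    estimate(v) maps a true speed to the algorithm's estimated speed;
    the grid of candidate alphas is illustrative.
    """
    ref = estimate(v)
    for a in alphas:
        # Noticeable only if detectable when v is both increased
        # and decreased by the fraction a.
        if estimate((1.0 - a) * v) < ref < estimate((1.0 + a) * v):
            return a
    return None  # no detectable variation within the tested range
```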
We summarize our results in Fig. 5, which shows the discrimination of the proposed parallel multi-scale algorithm for different values of $L$. The range of discriminated speeds is enlarged when the number of levels used in the multi-scale representation increases. Compared with the serial multi-scale approach, our method has a similar range of speed discrimination when the same number of levels is used, see Fig. 5. Considering both the mean and the variance of the discrimination in the range of speeds from 1 to 15 pixels per frame, the parallel multi-scale method shows lower values: for the same number of levels, the parallel discrimination has both a lower mean and a lower variance than the serial one. This indicates that the proposed parallel algorithm presents a better discrimination in this range.
The work in key:simon proposes a Bayesian scheme to compute the error distributions and then estimate the velocity using a Kalman filter through the space of scales (not time). This approach builds a far more sophisticated error function, but it is still serial. Our work assumes that the error functions are fixed, while key:simon assumes the error changes across scales, which might be important in real-world scenarios. On the other hand, Chey et al. key:chey propose that considering higher threshold levels for higher scales (scale-proportional thresholds) together with inter-scale competition could explain human speed discrimination curves. We have presented a scheme where the response of each scale regulates the relevance of that scale in the final estimate. Since we handle all scales at the same time, this weighting plays a role analogous to their thresholds and competition. To our knowledge, no other work models the error functions as Gaussians in log space (which strengthens the idea that detection is not symmetrical), a model that seems to fit recent recordings of the motion sensitivity of neurons key:logmt .
Finally, we consider the time complexity of our algorithm. Let $N$ be the size of the image, $O(g(N))$ the order of the optical flow algorithm (clearly $g(N) \geq N$, since every pixel must be visited) and $L$ the number of scales. The complexity order of the serial multi-scale algorithm is $O(L\, g(N))$, and it is $O(L\, g(N) + N)$ for the parallel algorithm; the only difference lies in the operations involved in the merge of scales. Considering the possible speed-up using $P$ processors for the case $P = L$, the parallel running time becomes $O(g(N) + N)$, which can also be written as $O(g(N))$. This shows that the degree of parallelism (taking one level per processor) achieved by our proposed algorithm is linear.
In this work we have presented a parallel multi-scale algorithm to estimate motion using two consecutive images. The method takes its bio-inspiration from human physiology and psychophysics, in the sense that it achieves wide and uniform relative discrimination by using evenly spaced logarithmic scales, and it gives results in constant time as a function of the number of scales. With respect to the classical serial multi-scale optical flow algorithm, error propagation among scales is less important for our proposed algorithm in terms of relative discrimination. As future work, we will explore the use of more biologically plausible methods of optical flow extraction and the integration with a foveated topology.
-  J. L. Barron, D. J. Fleet, S. S. Beauchemin, and T. A. Burkitt. Performance of optical flow techniques. CVPR, 92:236–242, 1994.
-  M. Srinivasan and S. Zhang. Motion Cues in Insect Vision and Navigation, pages 1193–1202. MIT Press, Cambridge, MA, 2003.
-  S. P. McKee and K. Nakayama. The detection of motion in the peripheral visual field. Vision Research, 24:25–32, 1984.
-  G. A. Orban, F. V. Calenbergh, B. De Bruyn, and H. Maes. Velocity discrimination in central and peripheral visual field. J. Opt. Soc. Am. A, 2(11):1836, 1985.
-  B. D. Lucas. Generalized image matching by the method of differences. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 1985.
-  E. P. Simoncelli. Bayesian multi-scale differential optical flow. In B. Jähne, H. Haussecker, and P. Geissler, editors, Handbook of Computer Vision and Applications, volume 2, chapter 14, pages 397–422. Academic Press, April 1999.
-  M. J. Black. Robust incremental optical flow. PhD thesis, Yale University, New Haven, CT, USA, 1992.
-  G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, Inc., 1st edition, October 2008.
-  H. Nover, C. H. Anderson, and G. C. DeAngelis. A logarithmic, scale-invariant representation of speed in macaque middle temporal area accounts for speed discrimination performance. The J. of Neurosci., 25(43):10049–10060, October 2005.
-  A. B. Metha, A. J. Vingrys, and D. R. Badcock. Detection and discrimination of moving stimuli: the effects of color, luminance, and eccentricity. J. Opt. Soc. Am. A, 11(6):1697, 1994.
-  J. J. Koenderink, A. J. van Doorn, and W. A. van de Grind. Spatial and temporal parameters of motion detection in the peripheral visual field. Journal of the Optical Society of America A, 2:252–259, February 1985.
-  J. H. R. Maunsell and D. C. Van Essen. Functional properties of neurons in the middle temporal visual area (MT) of the macaque monkey: I. Selectivity for stimulus direction, speed and orientation. J. Neurophysiol., 49:1127–1147, 1983.
-  J. Liu and W. T. Newsome. Functional Organization of Speed Tuned Neurons in Visual Area MT. J. Neurophysiol., 89(1):246–256, 2003.
-  J. Chey, S. Grossberg, and E. Mingolla. Neural dynamics of motion processing and speed discrimination. Vision Research, 38(18):2769–2786, September 1998.