Compared with the primary visual cortex, neurons in the extrastriate visual cortex have large receptive fields, which leads to computational problems when multiple stimuli fall within one receptive field. Studies have shown that when two stimuli fall within the same receptive field, macaque monkeys are able to direct attention to one of the stimulus locations. By directing attention to a particular spatial location, the stimulus at that location can be processed selectively, while stimuli at unattended locations can be ignored. This is known as selective attention.
Selective attention is the neuronal process that allocates resources to a specific spatial location. When the attentional focus is directed into a receptive field, this allocation shifts the receptive field center toward the focus of attention and shrinks the receptive field. There are two main aspects of selective attention: endogenous attention and exogenous attention. Endogenous attention is the allocation of attention by a cue that indicates the likely location of an upcoming visual target. In this case, the subject’s attention is directed to that location. Exogenous attention occurs when the cue is not informative about the location of the upcoming visual target. In this case, the cue may be presented at the same location as the upcoming stimulus or at a different one.
The model presented in this project aims to simulate selective attention between V1 and the middle temporal (MT) area of the visual cortex, and uses endogenous attention to verify the model. Extended and modified from a unifying mechanistic model of selective attention in spiking neurons, the model simulates a study by Womelsdorf et al. on selective attention in the macaque MT area. The study reported that when attention is directed into the receptive fields of neurons in the MT area, the magnitude of the shift of the spatial-tuning functions is positively correlated with a narrowing of spatial tuning around the attentional focus. The study also showed that the response is bell-shaped around the center of attention. The model in this project aims to reproduce a similar response for the attend-in and attend-out stimulus cases.
II-A System Description
The neural system of interest is the connection between the primary visual cortex (V1) and the middle temporal (MT) area. Visual information is passed along the magno pathway, also called the dorsal or parietal pathway, which runs through layers VI, V, IV, III and II of V1 to MT. There are also global control signals from the pulvinar that project to the posterior inferior temporal (PIT) cortex and are then fed into the control neurons in layers V and VI, as shown in Figure 1.
The lowest layers in V1 are layers V and VI. These layers compute two local control signals, the width and the center of the local receptive field, from the global control signals sent from PIT. These local control signals guide the routing of a local portion of the attended object. After layers V and VI, layer IV is involved in selective gating of inputs. Layer IV contains nonlinear dendrites, which gate feedforward visual signals based on the local control signals. Further along the V1 hierarchy, layers II and III process the gated visual signals. Visually responsive neurons in these layers encode the visual signal and send it to MT neurons.
When extracting information that is encoded in a given layer, different nonlinear transformations are used to obtain transformed versions of the signals. The transformation that determines whether or not to gate is applied between layers VI and IV and is shown in Function 1.
Whether a visual signal is gated depends on whether the position of the visual stimulus lies inside the receptive field defined by the two local control signals. Gating of the visual signals happens in layer IV and is shown in Function 2. This transformation function determines whether to encode the visual stimulus or to ignore it.
The neurons in the MT area receive input from each of the visual stimuli encoded in layers II and III of the V1 columns. However, the response to a visual stimulus differs depending on the position of the MT neurons. To account for this difference, a Gaussian function, shown in Equation 1, is used; it depends on the position of the MT neurons, the position of the visual stimulus in V1, and the radius of the receptive field. The transformation is shown in Function 3.
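The Gaussian weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the exact form of Equation 1 is not reproduced in this text, so the standard unnormalized Gaussian is assumed, and the names `gaussian_weight`, `mt_position`, `stimulus_position` and `radius` are hypothetical.

```python
import math


def gaussian_weight(mt_position, stimulus_position, radius):
    """Weight of a V1 stimulus as seen by one MT column.

    Assumed form: exp(-(distance^2) / (2 * radius^2)), which equals 1 when
    the MT column sits exactly at the stimulus position.
    """
    distance = mt_position - stimulus_position
    return math.exp(-(distance ** 2) / (2.0 * radius ** 2))


# Three hypothetical MT column positions responding to a stimulus at -0.5
# with a receptive field radius of 0.75.
mt_positions = [-0.5, 0.0, 0.5]
weights = [gaussian_weight(p, -0.5, 0.75) for p in mt_positions]
print([round(w, 3) for w in weights])
```

Under this assumed form, the weight falls off monotonically with the distance between the MT column and the stimulus, and a smaller radius makes the falloff sharper.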
II-B Design Specification
In the literature, neurons in the cat visual cortex have a maximum firing rate of 120 spikes per second for the most sensitive orientation. Additionally, the inactivation time constant, known as the refractory period, is around 1 to 2 milliseconds in the visual cortex, with a typical total interspike interval between 20 and 100 ms. The membrane time constant was found to range from 20 to 50 milliseconds for major types of central neurons. Based on these results, all neurons in the model are leaky integrate-and-fire (LIF) neurons with 2 millisecond refractory periods, membrane time constants of 20 milliseconds and maximum firing rates drawn from a uniform distribution in the range of 90 to 120 spikes per second. Since the membrane time constants reported in the literature vary, the effect of changing the membrane time constant is explored when evaluating the behaviour of the model.
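To make the role of these neuron parameters concrete, the sketch below evaluates the standard steady-state LIF rate equation, rate = 1 / (tau_ref - tau_rc * ln(1 - 1/J)), for a normalized input current J above a threshold of 1. The threshold normalization and the helper names `lif_rate` and `current_for_rate` are assumptions made for illustration and are not taken from the model code.

```python
import math

TAU_REF = 0.002  # refractory period in seconds (2 ms, from the text)
TAU_RC = 0.020   # membrane time constant in seconds (20 ms, from the text)


def lif_rate(current, tau_rc=TAU_RC, tau_ref=TAU_REF):
    """Steady-state LIF firing rate in Hz for a normalized input current.

    Assumes a firing threshold of 1; currents at or below it give no spikes.
    """
    if current <= 1.0:
        return 0.0
    return 1.0 / (tau_ref - tau_rc * math.log(1.0 - 1.0 / current))


def current_for_rate(rate, tau_rc=TAU_RC, tau_ref=TAU_REF):
    """Invert lif_rate: the normalized current that produces `rate` Hz.

    Only valid for 0 < rate < 1 / tau_ref.
    """
    return 1.0 / (1.0 - math.exp((tau_ref - 1.0 / rate) / tau_rc))


# Currents needed to reach the two ends of the 90 to 120 Hz range.
for max_rate in (90.0, 120.0):
    j = current_for_rate(max_rate)
    print(max_rate, round(j, 3), round(lif_rate(j), 3))
```

With a 2 millisecond refractory period the rate can never exceed 1 / 0.002 = 500 Hz, so the 90 to 120 Hz range sits well inside what this neuron model can produce.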
Given the functions of each layer and the neuron specifications, the model was implemented using the Neural Engineering Framework (NEF), as shown in Figure 2. The model is split into six parts: the manual inputs, the V1 column position encoding, layers V and VI, layer IV, layers II and III, and the MT area.
The manual inputs control the local control signals, as well as the position and strength of each visual stimulus. The local control signals are set manually because computing them from the global control signals is out of scope for this project. The global control signals determine the center and radius of the local receptive field depending on where the stimulus is located in the current receptive field. For this project, the center and radius are manually controlled depending on the positions of the visual stimuli. To control the positions and strengths of the visual stimuli, the positions and strengths nodes are adjusted. Each column in the positions node corresponds to the same column in the visual stimulus node. The position and strength of a visual stimulus are both 1 dimensional.
The V1 columns are 1 dimensional and each encodes the position of a visual stimulus. The connections are shown in Function 4. The encoded positions are then sent to their respective groups of routing neurons in layers V and VI. Layers V and VI contain 2 dimensional control neurons that encode the local control signals and 3 dimensional routing neurons that encode the local control signals and stimulus positions. In addition to the input from the V1 columns, the routing neurons also receive information from the control neurons, as shown in Function 5.
Layer IV contains 2 dimensional feedforward neurons and 1 dimensional gating neurons. The feedforward neurons receive input from the visual stimulus and the encoded information from the routing neurons. These neurons encode the visual stimulus and whether or not the stimulus is to be gated. The connection is shown in Function 6. The feedforward neurons then send the encoded information to the gating neurons, as shown in Function 7, which encode the visual stimulus if it is not gated and 0 if it is gated.
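The routing and gating steps can be illustrated with a short sketch. Functions 1, 2, 6 and 7 are not reproduced in this text, so the rule used here is an assumption: a stimulus is treated as inside the attended receptive field when its position lies within one width of the center. The names `is_gated`, `gate` and `gated_outputs` are hypothetical.

```python
def is_gated(center, width, position):
    """Routing decision: True when the stimulus falls outside the field.

    Assumption: the attended field spans center - width to center + width.
    """
    return abs(position - center) > width


def gate(stimulus, gated):
    """Gating step: pass the stimulus value through, or output 0 if gated."""
    return 0.0 if gated else stimulus


# (position, strength) pairs for the two attend-in stimuli S1 and S2.
stimuli = [(-0.5, 0.5), (0.5, 0.5)]


def gated_outputs(center, width):
    """Apply the routing decision and the gate to every stimulus."""
    return [gate(s, is_gated(center, width, p)) for p, s in stimuli]


attend_in = gated_outputs(-0.5, 0.75)   # center on S1, shrunken width
attend_out = gated_outputs(0.0, 1.0)    # center at 0, full width
print(attend_in, attend_out)
```

Under this assumed rule, attending to S1 passes S1 and suppresses S2, while the attend-out setting passes both stimuli.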
Layers II and III contain 4 dimensional combine neurons, which are an intermediary step before each visual stimulus is sent to the different MT columns. These neurons take input from the encoded visual stimulus in the gating neurons and the position information in the routing neurons, as shown in Function 8. The neurons in each 1 dimensional MT column use the encoded information from the neurons in the layer below to calculate the response to each stimulus and add the responses to produce a total response. Function 9 shows how the combine neurons use transformation functions to calculate the strengths of each input.
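The summation performed by each MT column can be sketched as below. This is an idealized, non-spiking illustration: Functions 8 and 9 are not reproduced in this text, so the response is assumed to be the sum of each gated stimulus strength multiplied by an unnormalized Gaussian weight, and `mt_response` is a hypothetical helper name.

```python
import math


def mt_response(mt_position, stimuli, radius):
    """Total response of one MT column.

    `stimuli` is a list of (position, gated_strength) pairs. Each strength
    is scaled by an assumed Gaussian of the distance to the MT column, and
    the scaled values are added together.
    """
    total = 0.0
    for position, strength in stimuli:
        distance = mt_position - position
        total += strength * math.exp(-(distance ** 2) / (2.0 * radius ** 2))
    return total


# Hypothetical attend-in case: S1 passes with strength 0.5, S2 is gated to 0.
stimuli = [(-0.5, 0.5), (0.5, 0.0)]
responses = [mt_response(p, stimuli, 0.75) for p in (-0.5, 0.0, 0.5)]
print([round(r, 3) for r in responses])
```

In this idealized form the response peaks at the MT column aligned with the attended stimulus and decreases with distance from it.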
The model simulates an experiment by Womelsdorf et al. The experiment measures receptive field profiles in the macaque MT area. First, the macaque foveates on a small square, which acts as a cue, presented on a computer screen for 440 milliseconds. The cue consists of a stationary random dot pattern. After a brief blank delay, three moving random dot patterns are shown. Of the three random dot patterns, two are within the receptive field of the isolated neuron with equal eccentricity. The third random dot pattern is presented outside the receptive field in the opposite hemifield. The procedure of this experiment is shown in Figure 3.
The cue is placed at the location of one of the three stimuli, and when the three patterns are shown, the macaque attends to the stimulus that is closest to the cue location. In the experiment, the cue was tested at all three stimulus locations and 78 probes were placed on neurons in the receptive field, shown by the black dots in Figure 3. The authors found that the responses to the probes can be fit with a Gaussian to reconstruct the responses in the receptive field. They also found that attending to a target stimulus inside the receptive field, S1 or S2, shifted the Gaussian peak towards the target and reduced the Gaussian width compared to attending outside the receptive field, with no significant change in peak response. When attending to a target stimulus outside the receptive field, S3, the Gaussian peak did not shift and the width of the Gaussian stayed the same; however, the peak response was lower than when attending inside the receptive field. These results are shown in Figure 4.
For the simulation, the cue is encoded in the control neurons as two values: the center of the Gaussian and the width of the Gaussian. When the cue is inside the receptive field, the center is set to the position of the target stimulus and the width is set to 0.75. When the cue is outside the receptive field, the center is set to 0, which means the peak of the Gaussian stays at the center of the receptive field, and the width is set to 1.00 since the width of the Gaussian does not change when attending outside the receptive field. Each of the three V1 columns encodes one of the three positions, with the center being position 0. The two attend-in stimuli, S1 and S2, are placed with equal eccentricity at -0.5 and 0.5. The attend-out stimulus, S3, is not included in this simulation since the three position columns are all within the receptive field and anything outside the receptive field would be gated. Each of the visual stimulus columns corresponds to a position column. The value of each visual stimulus represents how fast the random dot patterns are moving on average. For the simulation, the values of both attend-in stimuli are set to 0.5 to keep a small radius for the neurons to encode.
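The manual setting of the two control values can be summarized in a small lookup. The values come from the paragraph above; the names `STIMULUS_POSITIONS` and `control_signals` are hypothetical and only serve this illustration.

```python
# Positions of the two attend-in stimuli, as stated in the text.
STIMULUS_POSITIONS = {"S1": -0.5, "S2": 0.5}


def control_signals(target):
    """Return (center, width) of the attended receptive field.

    Attend-in targets (S1, S2) move the center to the target position and
    shrink the width to 0.75. Any other target, such as S3, is treated as
    attend-out: the center stays at 0 and the width stays at 1.0.
    """
    if target in STIMULUS_POSITIONS:
        return STIMULUS_POSITIONS[target], 0.75
    return 0.0, 1.0


for target in ("S1", "S2", "S3"):
    print(target, control_signals(target))
```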
The measurement for this simulation is based on the accuracy of the representation. The experiment uses probes at 78 neurons to obtain the overall response in the receptive field. Unlike the experiment, the simulation obtains the response using three MT columns inside the receptive field. The placement of each MT column represents a position in the receptive field and the output of each MT column represents the response at that position. The responses of the MT columns are compared to the results in Figure 4 to measure the similarity between the Gaussian responses. Additionally, since the MT column encodes the target visual stimulus, the root mean squared error between the encoded value and the target visual stimulus is computed to measure the accuracy of the representation.
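The root mean squared error used as the accuracy measure can be computed as in the sketch below. The sample values are made up for illustration and are not simulation output; `rmse` is a hypothetical helper name.

```python
import math


def rmse(decoded, target):
    """Root mean squared error between decoded samples and a constant target."""
    if not decoded:
        raise ValueError("decoded must contain at least one sample")
    return math.sqrt(sum((x - target) ** 2 for x in decoded) / len(decoded))


# Made-up decoded samples from one MT column; the target strength is 0.5.
decoded = [0.4, 0.5, 0.6, 0.5]
error = rmse(decoded, 0.5)
print(round(error, 4))
```

A decoded signal that matches the target exactly gives an RMSE of 0, and larger deviations from the target give a larger RMSE.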
III-B Using starting parameters
Initially, all neurons in the simulation are LIF neurons with a refractory period of 2 milliseconds and a membrane time constant of 20 milliseconds. For each ensemble, the maximum firing rates are drawn from a uniform distribution between 90 and 150 Hz, the number of neurons is set to 200 to 300 neurons per dimension, and the radius is set to 1 for the first and last layers and 2 for the intermediate layers. The intermediate layers require a bigger radius because they encode the width of the receptive field, which can be greater than 1. The first and last layers encode the position of and response to the visual stimulus, which are all less than 1. Because the creation of a neuron model involves randomness, the simulation is run for 1 second 10 times for each of S1, S2 and S3.
Using these initial parameters, the results are similar to the Gaussian when attending inside the receptive field; however, they differ slightly from the Gaussian when attending outside the receptive field. These similarities and differences are shown in Figures 5, 6 and 7. For the attend-in cases of S1 and S2, shown in Figures 5 and 6, the shapes of the responses are similar to the Gaussians in Figure 4 for S1 and S2. In both the experiment and the simulation, the peaks of the Gaussians are shifted towards the position of the target stimulus. When attending outside of the receptive field for S3, the experiment showed a lower Gaussian peak and a wider width than the Gaussians for S1 and S2. In the simulation, the Gaussian peak was slightly higher than the Gaussian peaks for S1 and S2. However, the width is wider than the widths for S1 and S2 since there is a smaller difference between the responses of adjacent MT columns. The wider width is consistent with the result from the experiment.
The shape and width of the Gaussians can be further analyzed using the standard deviations of the errors of the responses in each MT column. Near the peak of a Gaussian, the values are all close together, which suggests a smaller standard deviation. This is reflected in Tables I and II. For the attend-in case of S1, the standard deviations in Table I increase from MT column 1 to column 3; similarly, for the attend-in case of S2, the standard deviations in Table II increase from MT column 3 to column 1. This result is consistent with Figures 5 and 6, as the peaks are at MT column 1 for S1 and MT column 3 for S2. From Table III, the standard deviations for S3 are all greater than the standard deviations for S1 and S2. This suggests that the Gaussian for S3 is wider than the Gaussians for S1 and S2.
In addition, Tables I and II can be used to measure, from the root mean squared error (RMSE), how accurately the MT columns encode the target visual stimulus. Similar to the pattern of standard deviations, the average RMSE values increase from MT column 1 to 3 for S1 and increase from MT column 3 to 1 for S2. The smaller the RMSE, the more accurate the representation. The smallest RMSE value for S1 and S2 corresponds to the MT column that is at the same position as the visual stimulus, which shows that the receptive field shifts towards the position of the target stimulus to get a more accurate encoding. This behaviour of the simulation is consistent with the results from the experiment.
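The per-column average RMSE and standard deviation over 10 repeated runs could be aggregated as sketched below. The runs here are synthetic stand-ins generated from seeded Gaussian noise, not output of the model, and `simulate_run` with its `noise` and `samples` parameters is entirely hypothetical.

```python
import math
import random
import statistics


def simulate_run(seed, target=0.5, noise=0.1, samples=1000):
    """Stand-in for one model run: a noisy decoded signal around `target`."""
    rng = random.Random(seed)
    return [target + rng.gauss(0.0, noise) for _ in range(samples)]


def rmse(decoded, target):
    """Root mean squared error between decoded samples and a constant target."""
    return math.sqrt(sum((x - target) ** 2 for x in decoded) / len(decoded))


# One RMSE value per run, then the summary statistics across the 10 runs.
run_errors = [rmse(simulate_run(seed), 0.5) for seed in range(10)]
mean_error = statistics.mean(run_errors)
std_error = statistics.stdev(run_errors)
print(round(mean_error, 4), round(std_error, 4))
```

Using a different seed for each run mimics the randomness in the creation of a neuron model, which is why a mean and a standard deviation are reported rather than a single value.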
III-C Adjusting parameters
For further evaluation, the following parameters are varied to analyze the behaviour of the simulation model: membrane time constant, maximum firing rate, number of neurons per dimension and radius. All parameters other than the varied parameter are kept at their initial values so that all changes can be attributed to the varied parameter. For each varied parameter, the model was run for 1 second 10 times in each of two conditions, because the creation of a neuron model involves randomness.
III-C1 Membrane time constant
Based on the literature, membrane time constants vary from 20 to 50 milliseconds. Since 20 milliseconds is used as the initial parameter, the membrane time constant is changed to 50 milliseconds to further evaluate the behaviour of the simulation. The MT column responses are shown in Figures 8 and 9 and the RMSE results for 10 simulations are in Tables IV and V.
From Figures 8 and 9, the responses of each MT column are very similar to the results with a membrane time constant of 20 milliseconds. The similarity can be verified using Tables IV and V. The average RMSE for each MT column is only slightly lower, by approximately 0.016, than the average RMSE when the membrane time constant is 20 milliseconds. This suggests that the membrane time constant has little effect on the behaviour of the simulation. Moreover, variation in membrane time constants from 20 to 50 milliseconds between neurons is biologically plausible and does not affect the performance of selective attention. The model is consistent with this and shows that changing the membrane time constant to 50 milliseconds does not affect the response of the MT columns.
III-C2 Maximum firing rate
Maximum firing rates from 90 to 150 Hz are biologically plausible for these neurons; however, the resulting responses are very noisy, as indicated by the high RMSE values. One way to reduce the noise and lower the RMSE values is to increase the maximum firing rates. The maximum firing rates were adjusted to a uniform distribution between 200 and 400 Hz, which are the default maximum firing rates in NEF. The MT column responses are shown in Figures 10 and 11 and the RMSE results for 10 simulations are in Tables VI and VII.
Both Figures 10 and 11 show a less noisy signal compared to the results using maximum firing rates between 90 and 150 Hz. The average RMSE for the MT column at the same position as the target stimulus is lower than the average RMSE with the initial maximum firing rates by approximately 0.06. There is also a greater difference in RMSE between the two MT columns closer to the target stimulus and a smaller difference in RMSE between the middle MT column and the furthest MT column, which is closer to the features of a Gaussian. Gaussians have a steeper slope near the peak and a more tapered slope further from the peak. This shape is more prominent with higher maximum firing rates; however, such firing rates are not biologically plausible according to the literature. Based on this result, the model still needs improvement to produce more accurate Gaussian-shaped responses when attending within the receptive field.
III-C3 Number of neurons per dimension
The average number of neurons in V1 for a galago is 35 million and the average number of neurons in MT is 1.6 million. In the model, the simulation uses a total of 8200 neurons for the V1 layers and 900 neurons for the MT area. Since there are 35 million and 1.6 million neurons in V1 and MT respectively, it is biologically plausible for the V1 and MT layers in the model to have more neurons to encode information. Thus, the number of neurons per dimension was increased to 500. This resulted in a total of 31 000 neurons for the model, with 16 000 neurons in the V1 layers and 1500 neurons in the MT area. The RMSE results for 10 simulations are in Tables VIII and IX. Based on the RMSE results, increasing the number of neurons makes no difference.
Except for the first and last layers of the model, the neurons have radii of 2 to encode values greater than 1. For instance, the width of the receptive field can be greater than 1. The values in the first and last layers do not exceed 1 because the visual stimulus and the visual stimulus positions are all within 1. However, as the radius increases, the RMSE also increases linearly. This is because as the radius increases, there is a larger range of values to which the neurons are not tuned. Since the simulations above do not use a receptive field width greater than 1, the radius can be reduced to 1 to avoid tuning to values greater than 1. The RMSE results for 10 simulations are in Tables X and XI.
Similar to increasing the maximum firing rates, decreasing the radius also improves the RMSE for the MT column that is closest to the target stimulus. However, unlike the result from increasing the maximum firing rate, the RMSE for the second closest MT column decreases until it is only slightly larger than that of the closest MT column, minimizing the difference between the two columns. This is not desired, as it differs from the shape of a Gaussian, which only has a steep slope near the peak. Decreasing the radius also decreases the standard deviations for all the columns, which suggests that more neurons are now tuned to values within a radius of 1.
The model simulates a selective attention experiment by Womelsdorf et al. The responses from the three MT columns are compared with the Gaussian receptive field profiles from the experiment and are used to measure how accurately each MT column encodes the target visual stimulus. Based on the results, three improvements can be made to the model to better compare with the experiment, produce more accurate responses and incorporate more biologically plausible features. Further improvements are possible; however, these three do not require adding subsystems to the model.
The first improvement is to add more MT columns to better analyze the pattern of responses. In the experiment, 78 probes are used to measure the response of the Gaussian receptive field in the MT area. More MT columns can be implemented in either 1 dimension or, like the experiment, 2 dimensions. Adding more MT columns allows the results to be compared more easily and makes the slope of the responses visible.
The second improvement is to reduce the number of large oscillations in the output of the MT columns. As shown in Figures 5, 6 and 7, there are many large oscillations. When the maximum firing rates are increased to a uniform distribution between 200 and 400 Hz, there are fewer large oscillations, as shown in Figures 10 and 11. However, the maximum firing rates cannot be increased because higher rates are not biologically plausible for V1 neurons. The open question for this improvement is how to adjust the model to produce similar results without changing the firing rates. Further research is needed on this improvement to obtain more accurate responses.
The final improvement is to represent the width of the attended receptive field more accurately, by calculating the shrinkage from the size of the shift for attend-in cases (Equation 2) or by calculating the spread for attend-out cases (Equation 3). Currently, the width is set to 0.75 if the target is within the receptive field and 1.0 if the target is outside the receptive field. This is not an accurate measure of the width; however, using a smaller width when attending within the receptive field than when attending outside gives an approximate response, since this is what happens in biology.
Using the initial parameters, the simulation is able to produce Gaussian-shaped responses similar to the experiment. Varying the membrane time constant and the number of neurons per dimension produced little to no change in the results. Changing the maximum firing rates to a uniform distribution between 200 and 400 Hz produced the most significant change and matched the results from the experiment most accurately. However, this change is not biologically plausible for V1 neurons. Lastly, the radius was decreased to improve the RMSE. This decreased the RMSE; however, the result was not desirable since the two columns closest to the position of the target visual stimulus showed little difference, which is unlike the shape of a Gaussian. Overall, this model is a start towards representing selective attention responses in MT columns. More work is needed to improve the response accuracy of the model and to make it more biologically plausible.
-  Ö. B. Artun, H. Z. Shouval, and L. N. Cooper. The effect of dynamic synapses on spatiotemporal receptive fields in visual cortex. Proceedings of the National Academy of Sciences, 95(20):11999–12003, 1998.
-  B. Bobier, T. C. Stewart, and C. Eliasmith. A unifying mechanistic model of selective attention in spiking neurons. PLOS Computational Biology, 10(6):1–16, 06 2014.
-  M. Carandini and D. Ferster. Membrane potential and firing rate in cat primary visual cortex. Journal of Neuroscience, 20(1):470–484, 2000.
-  C. E. Collins, D. C. Airey, N. A. Young, D. B. Leitch, and J. H. Kaas. Neuron densities vary across and within cortical areas in primates. Proceedings of the National Academy of Sciences, 107(36):15927–15932, 2010.
-  C. D. Gilbert. Laminar differences in receptive field properties of cells in cat primary visual cortex. The Journal of Physiology, 268(2):391–421, 1977.
-  E. R. Kandel. Principles of neural science. McGraw-Hill, 2013.
-  C. Koch, M. Rapp, and I. Segev. A brief history of time (constants). Cerebral Cortex, 6(2):93–101, 1996.
-  S. J. Luck, L. Chelazzi, S. A. Hillyard, and R. Desimone. Neural mechanisms of spatial selective attention in areas v1, v2, and v4 of macaque visual cortex. Journal of Neurophysiology, 77(1):24–42, 1997. PMID: 9120566.
-  E. Macaluso and I. Indovina. Dissociation of Stimulus Relevance and Saliency Factors during Shifts of Visuospatial Attention. Cerebral Cortex, 17(7):1701–1711, 09 2006.
-  T. Womelsdorf, K. Anton-Erxleben, and S. Treue. Receptive field shift and shrinkage in macaque middle temporal area through attentional gain modulation. Journal of Neuroscience, 28(36):8934–8944, 2008.