Images contain great amounts of information at multiple frequencies, scales and spatial locations. Perceiving this information in its entirety is a hard task. The human brain has evolved neurological processes that selectively focus attention 
at one salient frequency, scale or image location at a time. In order to enable processing of large amounts of data, it is natural to automate these neurological processes. This has led to the development of automated visual saliency detection algorithms that estimate salient features in images[14, 15]. Frintrop et al. in  showed the higher repeatability and discriminative power of salient regions as compared to ensemble-based detections, leading to more accurate matching across different scenes for registration or 3D scene estimation. Object detection and recognition accuracies have been improved by applying saliency filter as the front end followed by specific descriptors like SIFT  trained on the object class.
Visual saliency, can be thought of as a combination of bottom-up and top-down attention   Top-down uses the prior knowledge, models and abstractions . Judd et al  used a combination of low level local features and semantic information from high level detectors for faces and people. Such top down approaches try to predict the way humans perceive the visual world. It is difficult to translate such an approach to domains like medical image analysis where human attention is task dependent. Hence, we primarily concentrate on bottom-up saliency, which predominantly depends on the conspicuity emerging from contrasting/distinguishing local image features. For instance, Mahadevan and Vasconselos 
proposed an architecture where saliency is proportional to the discriminability of a set of features that classify center from surround, at that location. They set the size of the center window to a value comparable to the size of the display items and the surround window six times the size. This ratio is motivated from the neurological evidences on natural images. However, such an assumption does not hold in medical images where the lesions and organs can take diverse range of sizes. Itti and Koch
deal with this problem by estimating the size of salient region by applying difference of Gaussians on the image pyramid using different combinations of standard deviations that capture different center-surround ratios. This adds an extra dimension of complexity leading to high compute expense. estimated the conditional entropy of the center given its surrounds using kd-tree to reduce computation. This method assumes fixed size windows for defining center and surround region making it intractable for medical imaging where anatomies could be of various sizes. Also their method is not scalable for finding salient regions in 3 dimensional volumes. Considering the above mentioned shortcomings we adopt the saliency definition proposed by Kadir and Brady 
. Specifically, their framework considers different sized neighborhoods of every point in an image. The maximum of the product of the local entropy and the rate of the pdf as a function of scale of the neighborhood is then computed. If this maxima exceeds a pre-decided threshold the point is designated as a salient point. Subsequently Expectation-Maximization (EM) based clustering is employed to coalesce nearby detections. These steps of computing entropy and differential pdf at every point in the image at multiple scales are computationally intensive, especially when considering 3D volumetric imagery common in medical imaging.
In this paper we propose an approach for fast salient region detection in 3D imagery based on the Kadir-Brady approach. We show that in the vicinity of a salient region, entropy is a monotonically increasing function of the degree of overlap of a candidate window with the salient region. This allows us to initialize a sparse seed-point grid as the set of tentative salient region centers and iteratively converge to the local entropy maxima. This reduces the computation considerably compared to the Kadir Brady approach of performing this computation at every point in the image. We propose two different approaches for achieving this. The first approach involves evaluating entropy in the four quadrants around the seed point and iteratively moving in the direction that increases entropy. In particular, the effective displacement is calculated as the summation of four quadrant displacements weighted by corresponding normalized entropies. The second approach we propose makes use of pixel level information in a mean shift tracking framework, to effect entropy maximizing moves. Specifically, we propose the use of uniform pmf as the target distribution to seek high entropy regions. We also extend this Saliency shift algorithm to capture 3D salient regions in medical volumes by estimating orientation using 3D extension of ABMSOD algorithm . We develop an optimized GPU implementation of the saliency seek algorithm to enable accelerate the detection of salient regions. We demonstrate results for Left ventricle detection in PET and for the tumor map in brain MR sequence.
2 Technical details
As discussed before, the Kadir Brady approach considers different sized neighborhoods of every point in an image. The maximum of the product of the local entropy and the rate of the pdf as a function of scale of the neighborhood is then computed. If this maxima exceeds a pre-decided threshold the point is designated as a salient point. Subsequently Expectation-Maximization (EM) based clustering is employed to coalesce nearby detections. The key problem with this approach is the need to perform this computation at every point in the image which as we show can be quite unnecessary. To motivate this, consider a toy example consisting of a single salient region as shown in the figure 1.
We assume that the intensity distribution
in the salient region follows a uniform distribution and the background distributionfollows a delta distribution. Specifically and . The choice of these distributions capture the fact that salient regions have higher entropy as compared to their background in an exaggerated sense. Now consider a candidate window with an overlap fraction with the salient region. It is easy to see that the intensity distribution of this window will be a mixture distribution . The entropy of this mixture distribution is given by:
Substituting the distributions for and from above, the entropy evaluates to
The rate of change of this entropy as a function of is given by 3
Note that for and for , . Also, it can be seen that the above expression in 3 for the derivative of the entropy is positive, for ranging between 0 and 1. This shows that for the simple toy example that we have considered, that entropy increases monotonically as a function of overlap percentage in the vicinity of the salient object. In turn, this relationship suggests that, in the neighborhood of a salient region, an iterative algorithm that increases entropy of the enclosed region, will increase the overlap percentage. This would result in moving the candidate window closer to the salient object, obviating the need to perform an exhaustive entropy computation at every point.
In the following sections we describe two approaches which use the above idea to detect salient regions starting from a sparse set of initial seed points. The first approach involves evaluating entropy in the four quadrants centered around the seed point and iteratively moving in the direction that increases entropy. In particular, the effective displacement is calculated as the summation of four quadrant displacements weighted by corresponding normalized entropies. The second approach we propose makes use of mean shift tracking framework to effect entropy maximizing moves. Specifically, we propose the use of uniform pmf as the target distribution to seek high entropy regions. We would like to note here that while the above assumptions made regarding the distributions of the salient region and its background do not strictly hold for real world problems, the proposed entropy maximizing algorithms, nevertheless succeed in converging to the salient regions.
2.2 Quadrant method
In this algorithm, we initialize a uniformly sampled grid of points on the image. In order to capture all the entropy maxima, we assume that each salient region has at least one grid points within its vicinity. Let us denote the locations of these initial grid points with where the subscript represents the index of the point and the superscript denotes the iteration. Each of the points in the grid represents the tentative center of a salient region for the corresponding iteration. At every step, direction of increasing entropy has to be calculated. This is achieved by dividing the neighborhood of the point considered into four quadrants as shown in the figure below. In each of the quadrants we take windows of all scales in a range varying over
and compute respective entropies. For each quadrant, the window at the scale that gives maximum entropy is selected. The effective shift vectoris calculated as the summation of the displacement vectors corresponding to the optimal scales weighted by their normalized entropies. For more details see Algorithm 1 and Figure 2.
Once entropy maxima are found we calculate the change in pdf about the optimal scale. Salient regions are obtained by filtering out the high entropy noise by applying a lower bound on the pdf-difference. A sample result of this approach is shown in Figure 2. The quadrant approach considers variable sized neighborhoods of a test point and makes an entropy weighted move towards the salient region. The entropy weights are coarse, however, in the sense that they are computed for each quadrant rather than pixel level. In the next section we propose a more fine-grained approach to make more precise shifts to higher entropy regions. While the idea of using quadrants is feasible for 2D problems, its extension to 3D volumes is computationally intensive. Furthermore it cannot deal with anisotropic salient regions as described in the next section.
2.3 Saliency shift
Mean shift tracking  is a popular approach for non-rigid object tracking based on maximizing the Bhattacharyya coefficient between histogram of the target and the candidate window in successive frames. This maximization is achieved by an iterative procedure which involves computing a weight map over the candidate window that reflects the likelihood of a pixel belonging to the target histogram. A shift vector pointing to the centroid of the weight map is then computed and the procedure is repeated till convergence. We adapt this concept for maximizing entropy. One problem with this is that unlike the visual tracking problem where the target histogram refers to a fixed template, we do not have a specific target histogram to work with for the entropy problem. This can be addressed as follows: We note that among all discrete distributions, the uniform distribution has highest entropy. In order to adapt the meanshift tracking procedure to seek entropy maxima, a simple solution is to use the uniform distribution as the target histogram. We now describe the procedure in detail. A sparse set of seed points is distributed throughout the image. A search window is centered around each candidate point and candidate feature distribution of the the window is calculated by aggregating the weighted kernel responses over all the pixels in it.
where is the histogram bin index, is the bin number in which pixel lies, is the normalization constant, denotes the bandwidth matrix, is the discrete Kronecker delta function and is the euclidean distance. As mentioned above, to move towards higher entropy regions, we define the target distribution to be uniform . The algorithm estimates the shift by maximizing the similarity between the candidate distribution and uniform pdf, measured in terms of the Bhattacharyya coefficient given by . Each pixel in the candidate window is assigned weights given by equation 5
which in effect assigns higher importance to the candidate pixels belonging to the target distribution. At each iteration, the next position of the seed point is calculated by a kernel weighted mean of the candidate window pixels
The algorithm increasing the Bhattacharyya coefficient with the uniform distribution effectively moves the candidate in a higher entropy direction till the maximum is achieved. All maxima with entropy values above a certain pre-defined threshold are selected and the change in pdf is calculated at the specified scale. Regions with high rate of change of pdf are considered to be salient at that scale. We use above mentioned framework for identifying 3D salient regions in medical volumes. Specifically, we use cuboid search windows that are initially distributed randomly throughout the medical volume. Each of these windows has a scale parameter generated uniformly in a range constrained by the size of the volume. The size of the window does not change across iterations i.e the bandwidth matrix remains the same throughout the meanshift procedure. We use the identity function as our kernel . This approach worked well for isotropic salient regions as well as for anisotropic regions aligned with the coordinate axes. However for salient regions which are anisotropic and oblique, the cuboids lack the flexibility to precisely encapsulate the salient region resulting in a higher fraction of the background pixels contributing to the candidate histogram thus reducing its entropy and resulting in a missed detection as can be seen in Figure 3.
To address this problem, we use Adaptive Bandwidth Meanshift for Object Detection (ABMSOD) algorithm . ABMSOD is a meanshift based iterative algorithm used for 2D object detection in computer vision. It simultaneously estimates the position, scale and orientation of the target object. The initialization step involves randomly scattering elliptical candidate windows with varying sizes throughout the volume. The iterative step of the algorithm consists of two parts. In the first part, new estimate of the candidate location is calculated using the conventional meanshift framework. In the second part, we estimate the scale and orientation parameters that are encoded within the bandwidth matrix of the candidate window. For details of the implementation see 2. The optimal bandwidth corresponds to the scale and orientation that maximizes the Bhattacharyya coefficient at the current position.  derives the expression for estimating the optimum bandwidth matrix .
 validates the expression 7 for higher dimensions and uses the ABMSOD framework for localizing 3D structures in medical volumes. For this, the feature histogram of the anatomy to be localized is used as the target distribution. We adopt the same framework for maximizing the entropy using uniform pdf as the target distribution. However the algorithm described in 2 is computationally expensive and is amenable to parallelization using GPUs. The scheme used for parallelization of our algorithm is described in the next subsection.
2.4 Parallelization using GPUs
In this subsection, we describe the scheme used to parallelize the ABMSOD algorithm on a GPU. A similar scheme is used to accelerate the conventional Meanshift algorithm. The GPU is a data-parallel computing device consisting of a set of multiprocessing units (SM), each of which is a set of SIMD (single instruction multiple data) processing cores. Each SM has a fixed number of registers and a fast on-chip memory that is shared among its SIMD cores. The different SMs share a slower off-chip device memory. Constant memory and texture memory are read-only regions of the device memory and accesses to these regions are cached. Local and global memory refers to read-write regions of the device memory and its accesses are not cached.In the CUDA context, the GPU is called device, whereas the CPU is called host. Kernel refers to an application that is executed on the GPU. A CUDA kernel is launched on the GPU as a grid of thread blocks. A thread block contains a fixed number of threads. A thread block is executed on one of the multiprocessors and multiple thread blocks can be run on the same multiprocessor (For details on GPU architecture see ). The independent exploration of different search paths originating from each initial random point is distributed amongst thread blocks. In addition, using the finer level of parallelism offered by threads within a thread block, we further parallelize operations within each search iteration. We make the threads in a thread block handle computations for a subset of the voxels from the window. Computations such as the application of the kernel function, construction of the candidate histogram, weight assignment to the voxels, H matrix computation etc. are all done in parallel where each thread is responsible for a set of voxels. Summation of values across threads is performed through parallel reduction. To reduce the synchronization operations among threads during histogram computation, we allow each thread to construct a local histogram of the voxels handled by that thread. These histograms are stored in shared memory for fast access. After all local histograms are constructed, the histograms are bin wise aggregated by the threads in parallel to form the global candidate histogram. The volume is stored in a 3D texture and the access to the volume is ensured to be in a way that maximizes spatial locality and efficiently utilizes the texture cache. Constant variables the target histogram are stored in constant memory to utilize the constant cache. Data shared by threads within a block like the local histograms etc. are stored in shared memory for fast retrieval through the broadcast mechanism supported by CUDA. Enough number of threads are launched to keep all the cores of each streaming multiprocessor (SM) busy. We try and maximize the occupancy for each SM. The occupancy is however limited by the amount of shared memory and the number of registers available per SM. We also ensure that there is no register spilling and uncoalesced global memory accesses.
We consider the usage of fast salient region detection in order to speed up medical workflows. Specifically, we demonstrate the efficacy of our algorithm in estimating the location and size of left ventricle in myocardial perfusion imaging and quantifying brain tumor in MR images acquired by Fluid Attenuated Inversion Recovery Pulse Sequence (FLAIR).
3.1 LV detection in Myocardial perfusion imaging
Myocardial perfusion imaging is a nuclear medicine procedure that illustrates the function of the heart muscle (myocardium). The patient is typically administered FDG - fluorodeoxyglucose - which has radioactive isotope fluorine that emits imagable positrons. This technique captures the functional information of the body as against structural information from CT because the glucose part in the radiopharmaceutical rushes to regions in the body such as the myocardium which have high metabolic activity . Physicians require the myocardium dataset in a standard orientation and scale. Current techniques that estimate cardiac orientation  need a good initial mask around the myocardium otherwise, liver and other nearby organs having high uptake contribute to a biased estimate of the orientation and size.
Detection of Left ventricle in a medical volume of size using object appearance based classifier at all possible locations and scales  is extremely compute intensive. Myocardium in PET has increased uptake value because of high metabolic activity. One way to solve the problem of high computation is using this high uptake principle. Setting a high SUV threshold gives a rough initial estimate of the heart location. But we also observed large number of false positives due to noise, liver and other organs 1. Also, such a simple threshold does not give any estimate of the size in each dimension. Exploring alternatives, we observed that left ventricular region can be considered salient because of the high entropy and uniqueness of its pixel intensity distribution compared to its immediate surroundings. In order to leverage that property, we propose to use 3D saliency seek as a pre-processing step to identify tentative candidates having left ventricle. The candidates would then be further processed for location and cardiac orientation estimation. This 2 step approach eliminates false positives, effectively improving the accuracy and reducing compute time. We apply the saliency seek algorithm on an initial sparse seed point grid distributed uniformly across the volume at multiple initial scales. Once these points converge to local saliency maxima, we pick the top 20 having high pdf difference w.r.t surrounding. We observed that one of the top 20 detections is a true positive in 29 of the 32 volumes dataset giving us a recall rate of .
|Method||Recall Rate||Final precision|
|(for 20 detections)||
(using Hu moments)
|Threshold||15/32 = 46 %||N.A|
|Saliency Seek||29/32 = 91 %||25/32|
In order to increase the precision, i.e. to identify the true positives among the tentative candidates, we used Hu moments . These central moments have been carefully designed to be invariant to translation, rotation, scale and so they serve as a good choice for describing local appearance. We evaluated the set of 7 invariant Hu moments on the center slice of a heart template forming a 7 dimensional training feature vector. For each test volume, we then compute the 7 dimensional Hu-moment test vectors on five slices about the central slice for each detection. The detection which minimizes the average euclidean distance with the template across the five slices is chosen as the most accurate estimate for that volume. We used a test dataset of 32 PET volumes having a combination of ’rest’ and stress’ acquisitions each of size . The number of initial seed points was chosen to be 400. Also the average number of iterations for convergence was
. We are able to detect the left ventricle by minimizing the distance in 25 volumes accurately. The average Jaccard index between groundtruths and successful detections is as high as 41.36%. See figure4. Since we have to search for the ventricle structure throughout the PET volume, the algorithm is computationally intensive. To detect the left ventricle in a reasonable amount of time, we use the Nvidia Tesla C2050 GPU as an accelarator to speed up the saliency seek algorithm. This GPU has 14 multi-processors each having 32 cuda cores, resulting in a total of 448 cuda cores. The cores are clocked at 1.15 GHz. Each multi-processor has 48 KB of shared memory and 32 K registers. The GPU device has 3 GB of device memory. With the parallelization scheme as decribed in the previous section, the saliency seek algorithm is able to detect the left ventricle in 4.1 seconds which is 10 times faster as compared to a sequential implementation.
3.2 Brain tumor quantification in MR
In this section we introduce the concept of user defined saliency and demonstrate its accuracy in localizing brain tumor in MR volumes. In brain MRI, the accurate location of tumor and edema is essential for minimizing the damage to healthy tissue. In Fluid Attenuated Inversion Recovery (FLAIR) sequences, even subtle lesions stand out against attenuated csf fluids making this modality conducive for tumor detection i.e. lesions can be distinguished based on intensity values alone and can be considered salient at the right window level setting. Medical images like CT, MR and PET have high dynamic ranges and different tissue types in the body lie in mutually exclusive intensity ranges  . For example in CT, lung tissue ranges between -400 and -600 hounsfield units, fat tissue between -60 and -100, soft tissue lies between 40 and 80HU and bone from 400 to 1000 . In such cases, the range of intensities defined by the window-level setting to be used is determined by anatomy of interest and the application at hand. We consulted a few clinical experts and found out the specific window-level setting used in FLAIR sequences to illustrate malignant lesions. Entropy evaluated over complete dynamic range of MR gave us unwanted regions straddling bone-tissue and air-tissue boundaries. We noticed much more relevant entropy values when the entropy is computed on the constrained intensity range coming from tumor specific window-level. This new entropy quantifies the variation in the distribution of pixels falling in the relevant window-level.
We use the proposed saliency seek algorithm to locate tumors present in such MR volumes. However, we incorporated the new entropy evaluation as discussed above, using only a constrained intensity range. We found empirical evidence showing this idea of application specific saliency improving the detection accuracy significantly. We present the preliminary results of our evaluation of this concept on MR datasets. Brain tumor image data used in this work were obtained from . The challenge database  contains fully anonymized images with manually labeled tumor groundtruth. This dataset consists volumes of size . We used the saliency seek algorithm in 15 such volumes and were able to successfully localize the tumor in all the 15 volumes with an average Jaccard index of .
In order to compare our results with the state of the art we chose to apply a modified version of the Itti Koch approach on axial 2D projections of the brain MR images. The modified algorithm considers only pixels in the constrained intensity range in constructing the saliency maps. In figure 6 the first row shows 2D slices from 3 MR volumes with a tumour. The second row isolates the tumour in each of the volume. The third row consists of the saliency maps obtained using the modified Itti Koch algorithm. The fourth row consists of the set of detections obtained from the saliency seek algorithm. As can be seen from the figure, the Itti Koch algorithm is not able to identify the tumour as a salient region in all the cases, whereas saliency seek employing the metrics of entropy and pdf difference is successfully able localize the tumour. Since saliency seek in this case consists of searching for the tumor from numerous seed points scattered in a large sized MR volume, the acceleration due to GPUs becomes important for detection in a reasonable amount of time. The number of initial seed points in this case was 700 with the average number of iterations was . We use the Tesla C2050 GPU as described in 3.1 and are able to successfully localize the tumor in 7.8 seconds. We obtain a speedup of around 80x over the sequential CPU implementation.
In this paper, we showed that in the vicinity of a salient region, entropy is a monotonically increasing function of the degree of overlap of a candidate window with the salient region and proposed two iterative approaches to locate salient regions from a sparse grid of seed-points. The first used a four quadrant approach to find entropy maximizing moves. The second used meanshift tracking framework using a uniform target distribution in meanshift iterations for seeking high entropy regions. We developed an efficient GPU implementation of the proposed algorithm for quickly detecting salient regions in 3D and showed promising results for Myocardium detection in PET volumes and tumor quantification in brain MR sequences. The framework can be easily extended for visual tracking by using the converged salient regions from previous frames. Also incorporating target specific feature information within the saliency shift iterations would be another interesting extension.
-  Nvidia cuda c programming guide. http://developer.download.nvidia.com/compute/cuda/32/toolkit/docs/CUDA.
-  C. Chan, J. Kim, D. Feng, and W. Cai. Interactive fusion and contrast enhancement for whole body pet/ct data using multi-image pixel composting. In Nuclear Science Symposium Conference Record, 2005 IEEE, volume 5, pages 2618–2621. IEEE, 2005.
-  X. Chen, H. Huang, H. Zheng, and C. Li. Adaptive bandwidth mean shift object detection. In RAM, pages 210–215, 2008.
-  X. Chen, Y. Zhou, X. Huang, and C. Li. Adaptive bandwidth mean shift object tracking. In Robotics, Automation and Mechatronics, 2008 IEEE Conference on, pages 1011–1017, 2008.
D. Comaniciu, V. Ramesh, and P. Meer.
Real-time tracking of non-rigid objects using mean shift.
Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 2, pages 142–149. IEEE, 2000.
-  M. Corbetta and G. L. Shulman. Control of goal-directed and stimulus-driven attention in the brain. Nature reviews. Neuroscience, 3(3):201–215, Mar. 2002.
-  D. N. M. P. Dept. Image quality in different modalities of medical imaging with focus on mammography, 14-12-2004.
-  R. Desimone and J. Duncan. Neural Mechanisms of Selective Visual Attention. Annual Review of Neuroscience, 18(1):193–222, 1995.
-  S. Frintrop. The High Repeatability of Salient Regions.
-  S. Frintrop, E. Rome, and H. I. Christensen. Computational visual attention systems and their cognitive foundations: A survey. ACM Transactions on Applied Perception (TAP), 7(1):6, 2010.
-  D. Gao, V. Mahadevan, and N. Vasconcelos. The discriminant center-surround hypothesis for bottom-up saliency. In NIPS, pages 497–504. 2008.
-  W. Hong, B. Georgescu, X. S. Zhou, S. Krishnan, Y. Ma, and D. Comaniciu. Database-guided simultaneous multi-slice 3d segmentation for volumetric data. In Computer Vision–ECCV 2006, pages 397–409. Springer, 2006.
-  M.-K. Hu. Visual pattern recognition by moment invariants. Information Theory, IRE Transactions on, 8(2):179–187, 1962.
-  L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, Mar 2001.
-  L. Itti and C. Koch. Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging, 10(1):161–169, Jan 2001.
-  S. Jackson and R. Thomas. Introduction to CT Physics.
-  T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In Computer Vision, 2009 IEEE 12th international conference on, pages 2106–2113. IEEE, 2009.
-  T. Kadir and M. Brady. Saliency, scale and image description. Int. J. Comput. Vision, 45(2):83–105, Nov. 2001.
-  A. C. Le Ngo, G. Qiu, G. Underwood, L.-M. Ang, and K. P. Seng. Visual saliency based on fast nonparametric multidimensional entropy estimation. In ICASSP, pages 1305–1308. IEEE, 2012.
-  D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, volume 2, pages 1150–1157. Ieee, 1999.
-  B. Menze, A. Jakab, S.Bauer, M. Reyes, M. Prastawa, and K. V. Leemput. Multimodal brain tumor segmentation. MICCAI Challenge.
-  A. Oliva, A. Torralba, M. S. Castelhano, and J. M. Henderson. Top-down control of visual attention in object detection. In Proc. ICIP 2003.
-  S. Vaswani, R. Thota, N. Vydyanathan, and A. Kale. Fast 3d structure localization in medical volumes using cuda-enabled gpus. In Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on, pages 614–620, 2012.