I Introduction
The steady launch of micro and nano remotesensing satellites is opening new possibilities for monitoring natural and humandriven processes on the surface of the planet. These instruments obtain imagery with resolutions of a few meters; when assembled as constellations, they are expected to offer reimaging intervals on the order of days for any given point on Earth. Change detection at these spatial and temporal scales becomes a freshly challenging problem.
One key feature of imagery from a constellation of lightweight satellites is that it derives from many different cameras, with individually distinct geometrical issues (e.g. lens distortions). This can lead to inconsistent registration and photometric calibration. In these circumstances, traditional pixelbased change detection techniques can no longer be relied upon. Further, pixelbased methods have always required of the analyst a certain artful, ad hoc selection of features and a decision threshold on the histogram of features. (For recent reviews and references to the extensive literature on change detection in remote sensing imagery, see [1, 2].) One would like to be able, quite generally, to select any pair of timeseparated images of a scene and identify locations that exhibit significant change.
Within a remotesensing image, humanordered structures are often set off by sharp edges, and in particular sharp corners, where pixel gradients are large in multiple directions. Local features such as those derived from David Lowe’s Scale Invariant Feature Transform (SIFT), [3, 4], are both local and specific to points of sharp contrast within an image. They are therefore sensitive to salient features of buildings, roads, pipelines, fields and furrows, reservoirs, clearcuts, slag heaps, and fluid discharge, to name a few examples, as well as to edges of forests, glaciers, or deserts, wherever sharp boundaries are marked on the landscape. SIFT features also offer some invariance to the position, scale, and projected geometry of features within an image. A set of relatively novel approaches to the change detection problem [5, 6, 7] seek to exploit these properties by matching local features across timesequenced pairs of images.
The idea behind local feature matching is as follows. The machine first extracts keypoints and descriptors defining local features for each of a pair of images. For every keypoint in one image, a keypoint is sought in the other image that is close to the first in both image geometry and descriptor space. Assuming stringent standards in this matching process, matched keypoints indicate a correspondence between local patches in the two images. Unmatched keypoints can result from variations in the way the images were captured, failures of the matching algorithm, or actual changes in structures on the ground. The trick, of course, is to tease apart those possibilities. This can be done by considering the keypoints in the immediate neighborhood of a given unmatched keypoint. Statistical anomalies in the matches for that set of keypoints, as compared to the matches of keypoints globally on the image, are indicative of actual, ontheground change.
Dellinger et al. [7]
implemented this idea by modeling the appearance of SIFT keypoints as a binomial point process on the image. Given the number of matched keypoints on a neighborhood, they computed the probability that the total number of detected keypoints on the neighborhood would be greater than or equal to the number found. A low probability indicates an excess of detected as compared to matched keypoints, which can then be thresholded to give a binary change/nochange proposal for the neighborhood. In our experience, the binomial statistical model significantly outperforms a naive comparison of keypoint match rates between a neighborhood and the image as a whole.
Despite the invariances built into the SIFT features, their limitations make the change detection protocol vulnerable to variations in atmospheric conditions, in satellite and solar viewing angles, and in geometric issues with image capture. These are familiar challenges for any change detection algorithm. In this case the problem manifests as a very low global match rate between SIFT keypoints. We often observe match rates as low as a few percent, even for a pair of images from a single provider with little obvious humaninterpretable change.
When almost all keypoints remain unmatched, distinguishing those few that don’t match due to ontheground change becomes a needleinthehaystack problem. The statistics don’t support a reliable comparison between a given neighborhood and the rest of the image. Very often the problem lies not with matching, per se. Rather, the SIFT algorithm identifies different structures in the two images on which to extract features. In this case, relaxing the match criterion simply trades the problem of unmatched keypoints for spurious matches.
Pursuing these observations, we offer steps to operationalize change detection in the generalized sense imagined above. The critical requirement is a high keypoint match rate in regions of the image where there is no significant ontheground change. We approach this challenge in three parts: (1) We introduce KAZE features [8] as an alternate local feature detector. Constructed through nonlinear diffusion filtering, KAZE features improve on SIFT invariances and add a measure of invariance to blurring, Gaussian noise, and nonlinear deformations of the images. (2) We relax the initial match criteria for keypoints, admitting as possible matches the knearestneighbors in descriptor space, but then vet these proposals with a spatial proximity test and a bidirectional consistency check: a match of keypoint in image one to keypoint in image two is accepted only if the top available match for keypoint is likewise keypoint . Together these tactics push our average keypoint match rate above 30%. Finally, (3) we propose a statistical test on the distribution of matched keypoints that inverts the roles of modeled and tested data from Dellinger et al. and can be expressed in terms of a marked, inhomogeneous Poisson point process on the image. The critical advantage of this new formulation is control: By varying the probability threshold, the generation of change proposals can be made to be as permissive or selective as desired.
Although in the long term we envision applications on hightimeresolution image sequences from micro and nanosatellite constellations, we work here with National Agriculture Imagery Program (NAIP) aerial imagery of California, predominantly from years 2010 and 2012 [9]. The design choice was to test the algorithm on a readily available image product, as delivered. We accessed the NAIP imagery from the public data catalog of Google Earth Engine [10]. Via the Earth Engine interface, the images were downsampled twice by averaging on 2x2pixel squares, from native 1m resolution, and served in Earth Engine standard EPSG:4326 projection. NAIP imagery comes radiometrically corrected, but calibrations are not consistent from year 2010 to year 2012. The algorithm nonetheless performs in this environment.
As part of our contribution, we present a small benchmark dataset for change detection on this imagery, 100 pairs of images handselected and handlabeled for change or lack thereof. It is a matter of interpretation as to what constitutes “significant” or “interesting” change – the change we want to capture. As a working definition, we resolved to seek changes in structures of the sort captured by local features, concentrated in an area covering at least one percent of the image frame. The positive instances in the dataset consist of largescale works of residential, industrial, and civic construction.
The paper will proceed as follows. We begin in Section II by reviewing the algorithm of Dellinger et al. based on SIFTfeature matching and the binomial statistical model. In sections IIA–IIC we explain and explore the consequences of the algorithmic developments enumerated above, and in IID we incorporate a routine for aggregating change proposals across an image. In III we describe the genesis of our benchmark dataset. In IV we present our results: On handcurated image pairs, we observe the challenges associated with low overall match rates and, by contrast, the consolidation of change centers and elimination of false detections which we achieve with better feature matching. On the benchmark dataset, we explore the response characteristics of the algorithm with varying change probability threshold, showing a peak overall accuracy of 68% and a proportion of true positives among total detections that spikes to 100%.
Ii Methodology and Method
We want to identify distinguishing features of a remotesensing image and map them to corresponding elements of a later or earlier image of the same scene. Where such a mapping does not exist, we need to be able to determine if that is or is not a result of a material difference between the two scenes.
To execute on this program, we need three components. The first is a protocol for extracting features from the two scenes. These features should capture midlevel semantics of the scene while being robust to finegrained variations caused by differences in the imagecapture process. The standard approach for imagematching applications is to use Lowe’s SIFT features. However, that algorithm was developed to target invariances to the affine transformations that are common in digital photographic applications. We would instead prefer features that are robust to the particular hazards of remote sensing, including differences in illumination, solar and sensor viewing angle, lens distortion, and the sensors themselves.
The second component is a mechanism for matching features between the two scenes. Typically this would be based on some metric of closeness on feature descriptor space, followed by a validation check such as Lowe’s ratio test or a test of geometric consistency between query and target keypoints.
The final component is a statistical model that captures the expected natural variability in matching, since perfect matching is impossible even where there is no meaningful change between two scenes. This model would be used to test for atypical distributions of matches that correspond to meaningful change. We explore each of these components in turn, taking the work of Dellinger et al. [7] as baseline.
In their process, Dellinger et al. begin by extracting SIFT keypoints and descriptors for two timesequenced images of a scene. They tentatively match each keypoint in one image to the keypoint in the other image with the closest descriptor. They then filter these proposals by checking against a RANSACestimated affine transformation between the two images. In developing a mechanism to identify change points from among the set of unmatched keypoints, the authors make two key assumptions:

Detected keypoints are distributed on the image according to a binomial spatial point process.

In the absence of ontheground change, matched keypoints are distributed with the same binomial probability of success as detected keypoints.
We will return to examine the validity of these assumptions, but in general one would not expect a binomial or Poisson process to model the spatial distribution of keypoints. The keypoints cluster along the distinctive underlying spatial structures captured in the image, and especially outside of densely built urban areas, this leads to distributions that might better be described in terms of nodes, filaments, and voids.
As a practical mechanism to account for inhomogeneity or interaction between keypoints, Dellinger et al. recompute the binomial probability of success locally on each neighborhood subject to the statistical test. They elect to model the matches, determining the local density of the point process by counting the number of matches on a fixedsized neighborhood . Given total matches on the image, the local probability of success is . This defines a binomial model for the distribution of keypoints, of which total are detected on the image. They then test the appearance of keypoints against this model. If an anomalously large number of keypoints are detected on a neighborhood, they flag the neighborhood for change.
Given detected keypoints on the image, detected keypoints on a neighborhood, and a hypothesized local probability of success , the neighborhood is flagged as a candidate for change when the probability of finding or more keypoints is smaller than some threshold :
(1) 
(In the referenced paper the probability is translated into a fraction of detected keypoints and the threshold correspondingly rescaled by a factor of .) Each unmatched keypoint defines a neighborhood , taken to be the ball of radius around that keypoint and truncated if necessary at image edges. The change proposal, while centered on a specific unmatched keypoint, implicates the distribution of matched and unmatched keypoints on the entire neighborhood.
The parameters and are tuned according to the requirements of a specific application. The threshold determines the stringency of the change criterion, while the radius defines the spatial precision of the test. For precision, one would like to be as small as possible, but of course a smaller neighborhood offers fewer keypoints on which to derive a reliable statistical model. This becomes a limiting factor especially when imagewide match rates are low. In the extreme case in which there are no matched keypoints on a neighborhood, the above test will necessarily flag the neighborhood for change  yet the lack of matches could just as well occur on a small neighborhood of an image whose keypoints are only sparsely matched due to differential effects of image capture in its image pair.
Broadly speaking, the statistical test checks for a local deficit of matched keypoints (or corresponding excess of detected keypoints) as compared to the imagewide mean. Good global matching is essential if the local deficit is to stand out against the global match background. Because of the importance of high match rates, we devote significant attention to this issue.
Iia Local Features
Inspired by Alcantarilla et al. [8], we examine the impact of using KAZE features instead of SIFT features for change detection. KAZE features are based on nonlinear diffusion filtering, which tends to respect edges and spatial structure more than the Gaussian scale space of SIFT. Because the imagery we are analyzing (i.e. buildings, roads, agricultural fields) is highly geometric, this seems intuitively justifiable.
For tests of keypoint distribution and matching, we collected a corpus of image pairs from across the state of California, paired by NAIP data years 2010 and 2012. We began by specifying eight seed locations in the state, with half in heavily urban areas and half in suburban or rural areas. For each seed we obtained the 2010 and 2012 NAIP images from Google Earth Engine at 64 locations, in an 8 x 8 grid spaced at increments of 0.02 degrees latitude and 0.03 degrees longitude to the north and east, respectively, of the seed location. This produces 512 image pairs semirandomly distributed across a variety of land cover types. Each image is 512 pixels wide by approximately 400 pixels high, varying in the northsouth direction due to the EPSG:4326 projection. (Across the state of California, vertical image size ranges from 435 pixels at Tijuana to 383 pixels at the Oregon border.) Regardless, an image represents an area x on the ground, with an approximate resolution of . We will refer to this set of images as our matching image corpus.
We extracted KAZE keypoints and descriptors using the OpenCV 3.2 implementation. The KAZE algorithm includes one dimensionless sensitivity parameter, called , although this is entirely distinct from the binomial probability threshold we refer to often in the course of this paper. When using the default value for the KAZE threshold, 0.001, we found significantly fewer keypoints than with SIFT (on average only ). Tuning this parameter to 0.0003, we found roughly the same number, although this varies across our matching image corpus between and (median ). We use this parameter value for the remainder of the paper.
To explore the extent to which keypoints are distributed according to a binomial (or the closely related Poisson) point process, we analyzed the distribution on our matching image corpus. Using a rectangular grid of 16 x 20 neighborhoods (”patches”), we calculated the cumulative distribution function of the number of keypoints per patch and computed the KolmogorovSmirnov (KS) statistic relative to a binomial distribution with the same total number of keypoints and mean number of keypoints per patch. (In some cases we discarded a small number of pixels from the edge of the image to ensure an equal number of pixels in each patch.) The resulting KS statistic shows a large variation across our matching image corpus, with values as high as 0.5. This decreases to approximately 0.05 for images with the highest number of total keypoints. See Fig.
1.The lowest KS statistics tend to occur in images with consistent land cover throughout, such as the urban scene of Fig. 1(b), while higher values occur in scenes that include distinct landcover boundaries, as for the reservoir spreading its tendrils among alternately densely and sparsely treecovered mountain slopes of Fig. 1(c). Of particular note is the excess of squares exhibiting zero keypoints in the latter scene. The statistical model described above posits an inhomogeneous point process to account for some of this variation. Given the readily apparent filaments and voids in high KS statistic scenes, improved modeling might focus instead on interaction among keypoints.
Finally, in passing, we observe that this analysis of keypoint distribution might be useful for classifying images into those with homogeneous versus varied land cover, although that lies beyond the scope of this paper.
IiB Matching
As suggested by previous work, KAZE features result in significantly higher match rates than SIFT features for the same images[8]. For matching we employed a routine from OpenCV 3.2 called BFMatcher.kNNmatch(). Given a keypoint descriptor in one image, it seeks the closest elements in descriptor space across all keypoint descriptors generated for the other image. In our initial approach, we set . We then imposed two consistency checks:
First, we filtered these match proposals with a geometric proximity test, discarding those that are separated in image coordinates by a distance of more than 4 pixels. We chose this value based on an estimate of the average registration offset between our images, which is approximately half this value. A hard proximity cutoff is appropriate in a context with known, bounded offsets in image registration. Establishing a fixed cutoff requires zero time and succeeds always. By contrast, generating a homography transform requires processing time and is subject to failure when based on potentially spurious nearestdescriptorspace proposals. The drawback to the proximity test is that it requires a priori knowledge of the average image registration error. We experimented with a dynamic approach, determining a cutoff on each image from the histogram of pixel offsets for the proposals returned by BFMatcher(). Faced with a new source of imagery, we could deploy this routine in preprocessing to determine an average registration offset, but here we used our experience with NAIP imagery in fixing a 4pixel cutoff.
Second, we required that matches be symmetric, such that if the proposed match for keypoint in image one is keypoint in image two, then the proposed match for keypoint is likewise keypoint . This crosscheck expresses a logically necessary condition for two keypoints to match, but it is not met by all proposals which pass the proximity test; we found that approximately of proposed matches fail this test for our matching image corpus.
We define the match rate for a pair of images as twice the number of successful matches divided by the sum of the total number of keypoints detected in each image. Using SIFT features on our matching image corpus, the average match rate was , with a small positive correlation with total keypoint number. When we relaxed the proximity test to 8 pixels, the average match rate increased to ; when we tightened it to 2 pixels, the average match rate decreased to . With KAZE features and the 4pixel proximity cutoff, the average match rate was , 2.3 times higher than with SIFT.
To improve our match rates, we continued to use KAZE features and expanded the parameters of the initial search performed by BFMatcher.kNNmatch() to return the top nearest neighbors in descriptor space, typically with . In of cases the nearest neighbor passed the (4pixel) proximity test; in the remaining cases we searched for the nextclosest match that passed. We repeated this search procedure, reversing the role of the two images, and performed the crosscheck described above. This approach increased the average match rate to . Results of these match tests are summarized in Fig. 2 and in Table I.
SIFT, 4pixel proximity text, crosscheck  

KAZE, 4pixel proximity test, crosscheck  
KAZE, kNN matching (), 4pixel proximity test, crosscheck 
IiC Statistical tests
The idea behind change detection based on local feature matching is that ontheground change ought to be reflected in an anomalously low match rate among features in a corresponding region of an image. An immediate way to operationalize this approach would be to threshold the match rate on a neighborhood as compared to the average match rate on the image, which amounts to postulating and thresholding a triangular probability distribution for matched keypoints. However, this test does not give consistent results: A threshold tuned to yield reasonable detections for one scene results in significant false positives or negatives in others.
Alternately, one could consider the matched keypoints to be uniformly distributed among total keypoints on the neighborhood and threshold the binomial probability that the given number of matches appear. We assign the same meaning to variables as above. If there are
matches from total detected keypoints on the image, the probability that a given keypoint is matched is . This gives rise to the binomial distribution of matches among the detected keypoints on the neighborhood . If matches out of keypoints are observed on the neighborhood, the change criterion is(2) 
This approach yields results qualitatively similar to our preferred statistical test, equation (3) below. We will return to discuss the close mathematical relationship between them.
Dellinger et al. postulate a binomial point process on the image, electing to model the matched keypoints and subsequently test the distribution of all keypoints against this model. Intuitively, we prefer the inverse, to model the keypoints and test the distribution of matches. We thus derive a (third, distinct) binomial distribution , with probability of success dictated by the local distribution of detected keypoints, . A change proposal is generated when the probability of finding or fewer matches on the neighborhood, out of total matches on the image, is less than a threshold :
(3) 
In testing for the appearance of anomalously few matches, as opposed to anomalously many keypoints, the survival function of (1) is replaced here by the cumulative distribution function. The critical practical advantage offered by (3) as compared to (1) is its sensitivity to the threshold . For instance, to eliminate false positives at the expense of missed detections, we would want to be able to tune down the number of proposed change points by tightening the threshold. In the case that there are, for example, zero, one, two, or three matched keypoints on , as often happens, the model for (1) is derived from those very few data points. The probabilities generated are zero or near zero, at the limits of machine precision. Such neighborhoods are always flagged for change, for any viable threshold . On the other hand, the test (3) responds readily to a reduction in threshold. The probability distribution is derived from the more numerous keypoints, and the cumulative distribution function outputs probabilities that are nonzero and finite even when there are zero matches on a neighborhood. See Fig. 3.
Although a comprehensive analysis of the possibilities of statistical modeling lies beyond the scope of this paper, we offer the following observations. The models (2), (3) can be viewed as different conditionings of a marked, inhomogeneous Poisson point process on the image, in which the marks indicate whether a keypoint is or is not matched, assumed uniformly distributed among the keypoints, and the inhomogeneity accounts for the local variations in intensity of the Poisson process. Conditioning on the match criterion () and a total number of matches on the image projects to the distribution of a binomial point process (3). By contrast, conditioning on the observed number of keypoints in a given neighborhood projects to a binomial distribution of marks (i.e. matched keypoints) among keypoints on the neighborhood. This is (2). The latter operation involves conditioning the distribution of the marked, inhomogeneous point process for each neighborhood subject to the statistical test. (For an introduction to spatial point processes, see, e.g. [11].)
Generically, as noted above and as seen in Fig. 1
, the keypoints are not Poisson distributed on the image. It would be interesting, as a direction for future exploration, to see what improvements in change detection might follow from modeling statistical interactions among the keypoints.
IiD Putting it together
Proposed change points often cluster in regions exhibiting the most significant change. At the same time, we frequently observe a few scattered change points in regions which, subjectively, we would not associate with meaningful change. We wish to leverage these observed patterns to make summary binary change / no change assertions for our image pairs. This being a task in pattern recognition, it would seem ripe for a supervised approach. Here we opted simply to count change points in local neighborhoods and assert change where this count exceeds some fraction of the average number of keypoints on the two images.
There is an asymmetry in the ordering of the images in a pair. In cases where new structures appear on a previously blank landscape, for instance in building construction on an empty lot, few keypoints will be extracted in the blank regions of the initial image, whereas many may be extracted on the new structure in the second image. The keypoints in the second image will be unmatched and flagged for change. In the first image there are few or no keypoints to consider, and the blank region won’t be flagged for change. The opposite occurs when structures disappear. Clearly, to capture change in both scenarios we need to check in both directions, forward, considering matches for keypoints from the earlier image onto keypoints from the later image, and backward, considering matches for keypoints from the later image onto keypoints from the earlier one.
To make final change / no change proposals, we sum both forward and backward change points on the image plane by convolution with a uniform square kernel and then threshold the output, thus aggregating change points into smooth change regions, or windows. With limited testing we were able to determine workable settings for the neighborhood size and threshold. The key steps of the algorithm, along with our preferred parameters for the Earth Enginederived NAIP imagery, are given in Table II.
Function  Parameters 

1. KAZE keypoint extraction  Sensitivity threshold: .0003 
2. Matching on nearest descriptors subject to proximity test + crosscheck  ; Radius: 4 pixels 
3. Binomial statistical test (3)  Threshold ; Radius(): 30 pixels 
4. Change point aggregation  Window: 120 x 120 pixels; Threshold fraction: 0.1 
Iii Benchmark Dataset
We set out to detect instances of concerted human activity on the landscape. The algorithm as presented is sensitive to construction and demolition, clearing and regrading of the land, shifts in agricultural use, and natural processes that impact sharp divisions in landcover classes. At the same time, we want to avoid detecting seasonal vegetative change and smallscale, transitory elements such as street traffic. The division between change of interest to human operators and all other differences between images is necessarily subjective. We still have not settled on a viable distinction in agricultural contexts. Construction, however, is relatively straightforward to observe and define. For this reason, and for its inherent interest, we made construction the basis of our benchmark dataset.
We established a minimal scale to filter small, scattered elements from consideration: We require that our target change regions cover at least one percent of the image frame. At our current working resolution, that means a change region, if square, extends minimally two hundred meters on a side. Clearly we are excluding some objects of potential interest, such as singlehome construction. In actuality, the smallest region we included in the benchmark dataset covers 1.6% of the area of the image, and even this may have been too small: Our algorithm struggles to detect these smallest instances, given the parameter settings required to limit falsepositive detections.
The bulk of our examples we found by handsearch of imagery in Google Earth, supplemented by news reports of construction from 2011. From Earth Engine we downloaded the scene for years 2010 and 2012 containing the construction of interest, often offcentered so as to avoid any secondary confounding instances within the frame. We then scrolled until we found a disjoint, nearby (no more than a tenth of a degree distant) scene in which any observable change was smaller in extent than our minimal allowed scale. In this way we built a corpus of fifty positive and fifty negative sample image pairs, from predominantly urban and surburban contexts, but including also some examples with surrounding farmed fields, desert, and green space.
For the fifty pairs exhibiting change, the change region covered on average of the image frame. The maximum coverage was 31% and the minimum, aforementioned, 1.6%. The construction examples fall into the following rough categories: twelve housing subdivisions, seven box stores or malls, five parks or athletic facilities, four hospitals, seven schools, two prisons, two cases of highways or roads, five varied industrial / warehouse / office park facilities, and six cases of demolition or site clearing.
We drew polygonal bounding boxes around areas of construction with an image editing tool. On testing, we considered any intersection of a handdrawn bounding box with an output aggregated change region to constitute a true positive detection for the scene. The benchmark dataset is available on GitHub [12].
Iv Results
We begin with some qualitative observations. The matching design factors most impacting the accuracy of change detection are the feature set (SIFT vs. KAZE) and the nearestneighbor matching protocol. These variants are shown in Figs. 4 and 5 below. The original images from 2010 and 2012 appear first, and the sequence below shows keypoints flagged for change under the following four scenarios: SIFT features; SIFT features with kNN matching (); KAZE features; KAZE features with kNN matching (). The scenes were handselected and are separate from the benchmark dataset. For purposes of this demonstration, the probability thresholds were determined to yield roughly comparable numbers of change points on the first of the Fig. 4 construction scenes.
For all variants in the first sequence, the construction of museum and plaza at center of the frame is correctly identified. However, the SIFTbased routines propose more scattered change points in areas with no obvious change. In the second sequence, the new buildings on the UCSF Mission Bay campus, upper right, are peppered with KAZE change points but are almost entirely missed when using SIFT. In Fig. 5, maintaining the same probability thresholds as for Fig. 4, KAZE with kNN matching zeros in on the three clearcuts and the Bay Bridge construction, while SIFTbased routines largely fail to capture the notable change. The mottled, shifting faces of forest and bay make for challenging matching tasks, and the overall SIFT match rates on these image pairs are in the range of 47%, as compared to 1023% for KAZE. By relaxing the probability threshold, the SIFT cases can capture the change, but this comes at the cost of significant falsepositive detections.
We reserved the benchmark dataset for the following quantitative tests, and did not experiment with it prior to this point. The graphs of Fig. 6 show data for several versions of the change algorithm as applied to this dataset. In the plots topleft, a progression can again be seen as improved matching techniques are introduced, from a best accuracy of 56% with SIFT alone, up to an overall optimal accuracy 68% with KAZE features and kNN () matching. Blanket change or nochange assertions across images would yield 50% accuracies, but as our program yields compact, targeted proposals, it could have fared much worse: For the optimal settings, at threshold , the average fractional area of a proposed change window is ; at this drops to . Meanwhile, recall, the average fractional area of a labeled change region is . An erroneous proposed detection and a labeled change region need not intersect.
With the statistical test (3), reducing the probability threshold reduces change proposals such that true positive and true negative rates both exhibit sigmoidal dependence on the logarithm of the threshold. As shown topcenter, near 100% true positive rates become 100% true negative rates as the threshold tightens by six orders of magnitude. This responsiveness enables the user to tune the algorithm for different applications. To monitor a given location, one might set the threshold so as to capture any possible event, even at the cost of false positive detections. On the other hand, to filter large bodies of satellitederived imagery to identify scenes for further, possibly human, interpretation, limiting false positives would be a priority. Here the qualities of our program particularly stand out. At restrictive thresholds it returns a high proportion of true positives out of total detections. For , the program hits 14 of 23, 12 of 16, 9 of 13, 7 of 8, and 6 of 6, true positives out of total change proposals, respectively. Full data for the program variants’ proportions of true positive detections are plotted in the rightmost panel of the triptych.
Data output by the program of Dellinger et al., using SIFT features, homography for match verification, and the statistical test (1), with the addition of our aggregation routines, are plotted in the bottom triptych. The overall best accuracy attained is 60%, somewhat lower than the 68% we achieve, although perhaps more notable is the way the accuracy remains nearly constant as the threshold tends to an infinitessimal value. As discussed in section IIC, and readily visible in the graph bottomcenter of Fig. 6, under test (1) many keypoints are flagged for change independent of the threshold . The baseline accuracy of 5960% seen in the lower triptych depends on the assertion of change on large swathes of many of the scenes, capturing by default many instances labeled for change. At , where 60% accuracy is attained, the average area of a proposed change window is . No matter how small the threshold, false positive rates never decline below 44%. As a related fact, seen bottomright, the fraction of true positives of total detections never exceeds 55%.
Examples of successful detections and notable fail cases on the benchmark dataset are presented in Figs. 7 and 8. They were generated with the settings in Table II and . As far as possible, we have tried to offer insights as to likely causes of misdetections, e.g., low match rates or clear differences in illumination. Despite the broad challenge of the task, the program makes credible, targeted assertions of change across a wide variety of scenes.
V Conclusions
We have shown an ability to identify regions of largescale construction against a background of diurnal and seasonal change. Our strategy was to seek improved feature extraction and matching, which allows a neighborhood with many unmatched keypoints to register as a statistical anomaly against a a robust statistical model of matched keypoints on the image. KAZE features are the critical design element. They are sensitive to sharp edges and offer some invariance to nonrigid deformations and differences in illumination between images.
Current prospects for application of the platform seem best in the filtering context mentioned above, where false positives can be tolerated as long as highquality targets are also proposed. We can tune the change probability threshold to the point that many quality positives are generated for every false detection, at least where positive and negative instances are balanced in the dataset. In a realistic filtering scenario, quality positives may, however, be relatively rare in the set of images. Along with direct applications, the filter functionality could aid in labeling data for future deep learning approaches to change detection.
In closing, we note that the spatial resolution of the imagery seems to impact the scale of objects most readily detected. We experimented with native onemeterresolution NAIP imagery and found that while average match rates decrease slightly, the algorithm performs in a manner qualitatively similar to that presented here. At onemeter resolution the algorithm captures construction of individual buildings.
Acknowledgments
We are grateful for the support from The Earth Genome. We thank Robin Kraft and Dominika Blackappl for inspiration and technical support in the early phase of this work.
References
 [1] M. İlsever and C. Ünsalan, TwoDimensional Change Detection Methods: Remote Sensing Applications, London: SpringerVerlag, 2012.
 [2] M. Hussain, D. Chen, A. Cheng, H. Wei, and D. Stanley, “Change detection from remotely sensed images: From pixelbased to objectbased approaches,” ISPRS J. Photogramm. Remote Sens., vol. 80, pp. 91106, Apr. 2013.
 [3] D. G. Lowe, “Object recognition from local scaleinvariant features,” in Proc. ICCV, Kyoto, 1999, pp. 11501157.

[4]
D. G. Lowe, “Distinctive image features from
scaleinvariant keypoints,”
Int. J. Comput. Vision
, vol. 60, no. 2, pp. 91110, Nov. 2004.  [5] P. Gamba, F. Dell’Acqua, and G. Lisini. “Change detection of multitemporal SAR data in urban areas combining featurebased and pixelbased techniques,” IEEE T. Geosci. Remote, vol. 44, no. 10, pp. 28202827, Oct. 2006.
 [6] M. İlsever and C. Ünsalan, “Graph matching based change detection in satellite images,” in Proc. IGARSS, Munich, 2012, pp. 62136215.
 [7] F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F. Tupin, “Change detection for high resolution satellite images, based on SIFT descriptors and an a contrario approach,” in Proc. IGARSS, Québec, 2014. Available: https://hal.archivesouvertes.fr/hal01059366.
 [8] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, “KAZE features,” in Proc. ECCV, Florence, 2012, pp. 214227.
 [9] National Agriculture Imagery Program, Farm Service Agency, US Department of Agriculture. Available: http://www.fsa.usda.gov/programsandservices/aerialphotography/imageryprograms/naipimagery/
 [10] Datasets  Google Earth Engine. Available: https://earthengine.google.com/datasets/
 [11] A. Baddeley, ”Spatial point processes and their applications,” in Stochastic Geometry: Lectures given at the C.I.M.E. Summer School held in Martina Franca, Italy, September 1318, 2004, W. Weil, Ed. Berlin: SpringerVerlag, 2007, pp. 175.
 [12] Available: https://github.com/earthgenome/articleimagery
Comments
There are no comments yet.