Effective Data Fusion with Generalized Vegetation Index: Evidence from Land Cover Segmentation in Agriculture

by Hao Sheng, et al.

How can we effectively leverage the domain knowledge from remote sensing to better segment agriculture land cover from satellite images? In this paper, we propose a novel, model-agnostic, data-fusion approach for vegetation-related computer vision tasks. Motivated by the various Vegetation Indices (VIs), which are introduced by domain experts, we systematically reviewed the VIs that are widely used in remote sensing and their feasibility to be incorporated in deep neural networks. To fully leverage the Near-Infrared channel, the traditional Red-Green-Blue channels, and Vegetation Index or its variants, we propose a Generalized Vegetation Index (GVI), a lightweight module that can be easily plugged into many neural network architectures to serve as an additional information input. To smoothly train models with our GVI, we developed an Additive Group Normalization (AGN) module that does not require extra parameters of the prescribed neural networks. Our approach has improved the IoUs of vegetation-related classes by 0.9-1.3 percent and consistently improves the overall mIoU by 2 percent on our baseline.



1 Introduction

Deep learning has been widely adopted in computer vision across various applications, such as diagnosing medical images[30], classifying objects in photos[45], and annotating video frames[33]. However, recognizing visual patterns in the context of agriculture, especially segmenting multi-labeled masks, has not been explored in detail. One primary obstacle is the difficulty of handling the complex multi-modal information inside the images[11]: sensing imagery in agriculture contains the Near-Infrared band and other thermal bands, which distinguishes it from traditional images that span only the red, green, and blue (RGB) visual bands. Such multi-band information is crucial for understanding the land cover context and field conditions, e.g., the vegetation of the land.

Figure 1: An example of an NRGB image and its Vegetation Index (VI) and ground-truth labels. Top-left: Input RGB channels; Top-right: Input near-infrared (NIR) channel; Bottom-left: Vegetation Condition Index (VCI)[37] calculated based on the RGB and NIR channels; Bottom-right: Ground-truth labels, where yellow denotes Double Plant and blue denotes Weed Cluster. VCI is able to pick up both Weed Cluster (a cluster of very high VI values) and Double Plant (lanes of different VI values compared to the background crops).

To leverage the information of multiple distinct bands in the images, researchers in the last several decades have focused on developing different algorithms and metrics to perform the land segmentation[19, 20]. As discussed in the literature review in Section 2, the design of Vegetation Index (VI) has been essential for studying land cover segmentation[32, 67, 53, 69]. The key idea of VI is to assess the vegetation of a region based on the reflectances from multiple bands, including the Near-Infrared band and other thermal bands, and hence ultimately approximate the region’s land cover segments. Nevertheless, in the context of deep learning, we have yet to investigate how to leverage the domain knowledge of VI while making use of models learned or transferred from non-agriculture data to segment the land accurately.

To tackle this question, we describe a general form of VI that serves as an additional input channel for image segmentation. Such a general form of VI covers many specific VI variants in existing studies[53, 69, 26, 65], which motivate us to develop a generalized learnable VI block that fuses the VIs and images in a convolution fashion. Based on the fused input, we also propose a new additive group normalization, a natural generalization of the instance normalization and layer normalization, because the VI channel and RGB channels can be considered as different groups.

Our work contributes to the research on agricultural land cover segmentation in three ways. First, we systematically compare the vegetation indices that primarily depend on the Near-Infrared, red, green, and blue channels. We highlight the key idea of calculating VIs and disclose the connections among them. Second, we propose a model-agnostic module named Generalized Vegetation Index (GVI) that captures many existing VI features. This module fits convolutional neural networks particularly well, including pretrained models, because it requires few changes to the model structure. Third, we introduce the additive group normalization (AGN), which helps to fine-tune models smoothly when GVI is introduced to a pretrained model. With these components in place, we modified a model based on DeepLabV3[10] and ran experiments on land segmentation in agriculture. With careful evaluations, we achieved an mIoU of 46.87%, which exceeds the performance of the baseline model by about 2 percentage points.

2 Related Work

Vegetation Index. Vegetation Indices (VIs) are simple and effective metrics that have been widely used to provide quantitative evaluations of vegetation growth[67]. Since the light spectrum changes with plant type, water content within tissues, and so on[8, 68], the electromagnetic waves reflected from canopies can be captured by passive sensors. Such characteristics of the spectrum can provide extremely useful insights for applications in environmental and agricultural monitoring, biodiversity conservation, yield estimation, and other related fields[43]. Because land vegetation correlates highly with land cover reflectance, researchers have built more than 60 VIs in the last four decades, mainly from the following light spectra: (i) the ultraviolet region (UV, 10-380 nm); (ii) the visible spectra, which consist of blue (B, 450-495 nm), green (G, 495-570 nm), and red (R, 620-750 nm); and (iii) the near and mid-infrared band (NIR, 850-1700 nm)[50, 14]. Such VIs are validated through direct or indirect correlations with the vegetation characteristics of interest measured in situ, such as vegetation cover, biomass, growth, and vigor assessment[67, 22].

To our best knowledge, the first VI, i.e., the Ratio Vegetation Index (RVI), was proposed by Jordan[32] in 1969. RVI was developed on the principle that leaves absorb relatively more red than infrared light. Although widely used in regions with high-density vegetation coverage, RVI is sensitive to atmospheric effects and noisy when vegetation cover is sparse (less than 50%)[20]. The Perpendicular Vegetation Index (PVI)[52] and the Normalized Difference Vegetation Index (NDVI)[53] follow the same principle but normalize the output, giving a sensitive response even at low vegetation coverage. To eliminate the effects of atmospheric aerosols and ozone, Kaufman and Tanre[34] proposed the Atmospherically Resistant Vegetation Index (ARVI) in 1992, and Zhang et al.[69] improved the ARVI by eliminating its dependence on a 5S atmospheric transport model[61]. Another direction was to improve the robustness of VIs against different soil backgrounds[52]. The Soil-Adjusted Vegetation Index (SAVI)[27] and modified SAVI (MSAVI)[49, 9] turned out to be much less sensitive than the RVI to changes in the background. Based on ARVI and SAVI, Liu and Huete introduced a feedback mechanism that uses a parameter to simultaneously correct soil and atmospheric effects, which they called the Enhanced Vegetation Index (EVI)[41]. With the recent progress in remote sensing (an increasing number of bands and narrower bandwidths)[25], more VIs are being built to capture not only biomass distribution and classification but also chlorophyll content (Chlorophyll Absorption Ratio Index (CARI))[35], plant water stress (Crop Water Stress Index (CWSI))[28], and light use efficiency (Photochemical Reflectance Index (PRI))[54, 22]. A summary of VIs that are derived from NIR-Red-Green-Blue (NRGB) images can be found in Table 1.
We refer readers to [2] and [67] for a full literature review of VIs. Many VIs share a similar form, which motivates us to find a generalized formula that captures the essence of VIs.

Remote Sensing with Transfer Learning and Data Fusion. As an emerging interdisciplinary field, remote sensing (on both aerial photographs and satellite images) with deep learning has seen quite a few benchmark datasets released in recent years, such as FMOW[12], SAT-4/6[4], EuroSat[24], DeepGlobe 2018[16], Agriculture-Vision[11], and so on. Most of those datasets come with more than the visible bands (i.e., RGB), including the near and mid-infrared band (NIR) and sometimes shortwave red (SW). The different input structure, together with the context switch from a human-eye dataset (such as ImageNet) to a bird’s-eye dataset, makes transfer learning less straightforward. Penatti et al. [47] systematically compared an ImageNet-pretrained CNN with other descriptors (feature extractors, e.g., BIC) and found that it achieves comparable but not the best performance in detecting coffee scenes. Xie et al. [66] showed that simply adopting the ImageNet-pretrained model while discarding the extra information does not achieve the best result in predicting the poverty level. Zhou et al. [70] observed a similar phenomenon in their road extraction task. In addition, a two-stage fine-tuning process is proposed in [71], where an ImageNet-pretrained network is further fine-tuned on a large satellite image dataset with the first several layers frozen. An alternative direction for exploiting large-scale but not well-labeled data is to construct satellite-image-specific geo-embeddings through weakly supervised learning, or unsupervised learning with Triplet Loss[31]. These studies motivate us to use a model pretrained on ImageNet, which has been demonstrated empirically to perform well in transfer learning.

In [57], Sidek and Quadri defined data fusion as “dealing with the synergistic combination of information made available by different measurement sensors, information sources, and decision-makers.” Studies in the deep learning community have also proposed data fusion approaches that are specific to satellite images at different levels in practice. For example, [7] concatenates LiDAR and RGB to better predict roof shape. In DeepSat[4], Basu et al. achieve state-of-the-art performance on the SAT-4 and SAT-6 land cover classification problems by incorporating NDVI[53], EVI[26], and ARVI[69] as additional input channels. A recent study[13] proposed a novel approach to select and combine the most similar channels using images from different timestamps. Apart from multi-channel data fusion, fusions at the multi-source[56, 17] and multi-temporal[5] levels have also shown their empirical value. This idea of fusing multiple input channels also inspired our design of the fusion module of the Generalized Vegetation Index.

Multi-spectral Image Data Fusion. Multi-spectral image data fusion is also widely used in robotics, medical diagnosis, 3D inspection, etc.[39]. Color-related techniques represent color in different spaces. The Intensity-Hue-Saturation (IHS) fusion[21] transforms three channels of the data into the IHS color space, which separates the color aspects into average brightness (intensity), dominant wavelength contribution (hue), and purity (saturation); the intensity corresponds to the surface roughness[18, 6]. Then, one of the components is replaced by a fourth channel that needs to be integrated. Statistical/numerical methods introduce a mathematical combination of image channels. The Brovey algorithm[51] divides each image band by the sum of the chosen bands and then multiplies the result by the high-resolution image.

In addition to concatenating multi-spectral channels, several deep learning architectures have been proposed for multi-spectral images. [48] pretrains a Siamese Convolutional Network to generate a weight map for the infrared and RGB channels at inference time. Li et al. [38] first decompose the source images into base background and detail content and then apply a weighted average to the background while using a deep learning network to extract multi-layer features for the detail content.

These studies gave an initial attempt to tackle the image classification problem using multiple spectral inputs in deep learning models. But we have yet to investigate how the multi-spectral image can be translated into the VI-related input in the context of agriculture segmentation.

3 Proposed Method

3.1 Overview

In general, our approach hinges on fusing Vegetation Index with raw images. We first introduce using well-known VIs as another input channel, and then we generalize the idea of VI to a fully learnable data fusion module. Last but not least, we propose an Additive Group Normalization (AGN) to handle the warm-start with a pretrained model. We describe the technical details in the following subsections.

3.2 Vegetation Index for Neural Nets

According to [67], more than 60 VIs have been developed in remote sensing practice over the last four decades. However, not all VIs are derived from the NIR and RGB channels, and few of them generalize across datasets without manual tuning of their sensitive parameters. For example, the Perpendicular Vegetation Index (PVI) [52] is defined as follows:

$$\mathrm{PVI} = \sqrt{(\rho_{soil} - \rho_{veg})_{R}^{2} + (\rho_{soil} - \rho_{veg})_{NIR}^{2}} \tag{1}$$

where $\rho_{soil}$ is the soil reflectance and $\rho_{veg}$ is the vegetation reflectivity. However, PVI is sensitive to soil brightness and reflectivity, especially in the case of low vegetation coverage, and needs to be re-calibrated for this effect[34]. Such sensitivity introduces semantic difficulty when we try to feed the VI into the neural network as another input channel. There are also VIs that were designed for a specific dataset in the first place. On top of the Landsat Multispectral Scanner (MSS), Landsat Thematic Mapper (TM), and Landsat 7 Enhanced Thematic Mapper (ETM) data, Cruden et al.[14] applied a Tasseled Cap Transformation and came up with empirical coefficients $c_i$ for the Green Vegetation Index (GVI) as:

$$\mathrm{GVI} = \sum_{i} c_i \, \mathrm{MSS}_i \tag{2}$$

where $\mathrm{MSS}_i$ denotes the $i$-th band of Landsat MSS. Such sensor-specific bands are not usually available for satellite and aerial imagery outside this product family. This Green VI is a linear combination of the multi-channel input, a concept shared by many other VIs. To better understand the common form of different indices, we summarize some representative VIs that are derived from NIR-Red-Green-Blue (NRGB) images in Table 1, together with their definitions and value ranges. Based on the definitions, we calculate the pixel-wise correlation matrix for all 12 VIs (Figure 2). The correlation coefficients are calculated at the pixel level using all data released for training. For SAVI, we choose the soil-adjustment parameter $L$ to be 0.5.
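As a rough illustration of the pixel-wise comparison described above, the following sketch computes two indices from NIR and red reflectances and their Pearson correlation. It uses the standard textbook definitions of NDVI and SAVI (with $L = 0.5$); the toy reflectance values and the helper names `ndvi`, `savi`, and `pearson` are assumptions made for this example and are not taken from the paper's code.

```python
import math

def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index for a single pixel."""
    return (nir - red) / (nir + red + eps)

def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index; L is the soil-adjustment factor."""
    return (1.0 + L) * (nir - red) / (nir + red + L)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical (NIR, red) reflectances in [0, 1] for five pixels.
pixels = [(0.60, 0.10), (0.45, 0.20), (0.30, 0.25), (0.80, 0.05), (0.20, 0.18)]

ndvi_values = [ndvi(nir, red) for nir, red in pixels]
savi_values = [savi(nir, red) for nir, red in pixels]

# One entry of a pair-wise correlation matrix like the one in Figure 2.
corr_ndvi_savi = pearson(ndvi_values, savi_values)
print(f"corr(NDVI, SAVI) on toy data: {corr_ndvi_savi:.3f}")
```

On real data the same computation would run over every valid pixel of the training images and over every pair of indices.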

Figure 2: Pair-wise correlation coefficients of all 12 available vegetation indices.

Table 1: Summary of vegetation indices that are derived from NIR-Red-Green-Blue (NRGB) images. Columns: Index, Definition, Meaningful Range. Indices listed: NDVI [53], IAVI [69], MSAVI2 [52], EVI [26], VDVI [65], WDRVI [19], MCARI [15], GDVI [58], SAVI [27], RVI [46], VCI [37], GRVI [58], NDGI [3]. Footnotes mark the indices that share the general format of Equation (3) and the indices whose parameters need to be calibrated and that therefore cannot be fed into the neural network directly.

Except for certain pairs (such as NDVI vs. SAVI), the correlations between different VIs are within the range of . We include all 12 VIs as extra input channels in our experiments when leveraging the information from existing vegetation indices.

3.3 Learnable Vegetation Index

Some high correlations between VIs stem not only from the underlying vegetation status but also from the empirical functions that researchers have introduced. We notice that 9 out of the 13 VIs from Table 1 share the following general form:

$$\mathrm{VI} = \frac{\alpha_0 + \alpha_{NIR}\,\mathrm{NIR} + \alpha_R\,R + \alpha_G\,G + \alpha_B\,B}{\beta_0 + \beta_{NIR}\,\mathrm{NIR} + \beta_R\,R + \beta_G\,G + \beta_B\,B} \tag{3}$$

where $\alpha_i$ and $\beta_i$ are parameters to be determined, and could be learnable in deep learning models. As suggested in [37], we can normalize the response (output) by nearby regions to suppress outliers. By extending the pixel-wise operation to a neighborhood of each image channel, we introduce a learnable layer of Generalized Vegetation Index (GVI):

$$\mathrm{GVI}(x) = \frac{w_{\alpha} * x + b_{\alpha}}{w_{\beta} * x + b_{\beta}} \tag{4}$$

where $*$ denotes the convolution operation, $x$ is our NRGB input, and $w_{\alpha}$, $b_{\alpha}$, $w_{\beta}$, $b_{\beta}$ are the learnable weights. In practice, we clip both the numerator and denominator to avoid numerical issues. Depending on the number of output channels, this layer has the capacity to express a varying number of VIs when learned. An illustrative example can be found in Figure 3.
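To make the idea of a ratio-of-linear-combinations layer concrete, here is a minimal sketch of the pixel-wise special case, which corresponds to a convolution with a 1×1 kernel. The function name `gvi_pixel`, the bias terms, and the clipping bounds are assumptions chosen for illustration; in the actual module the weights would be learned rather than set by hand, and the kernel may cover a spatial neighborhood.

```python
def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))

def gvi_pixel(pixel, w_num, b_num, w_den, b_den, eps=1e-6, bound=1e6):
    """Ratio of two affine combinations of the (NIR, R, G, B) channels.

    This is the 1x1-kernel case of a learnable vegetation-index layer.
    The numerator is clipped to [-bound, bound] and the denominator to
    [eps, bound]; these bounds are hypothetical choices.
    """
    numerator = sum(w * x for w, x in zip(w_num, pixel)) + b_num
    denominator = sum(w * x for w, x in zip(w_den, pixel)) + b_den
    numerator = clip(numerator, -bound, bound)
    denominator = clip(denominator, eps, bound)
    return numerator / denominator

# Hypothetical pixel with channels ordered as (NIR, R, G, B).
pixel = (0.6, 0.1, 0.2, 0.1)

# Hand-picked weights that reproduce NDVI = (NIR - R) / (NIR + R).
ndvi_like = gvi_pixel(pixel, (1, -1, 0, 0), 0.0, (1, 1, 0, 0), 0.0)

# Hand-picked weights that reproduce SAVI with L = 0.5.
savi_like = gvi_pixel(pixel, (1.5, -1.5, 0, 0), 0.0, (1, 1, 0, 0), 0.5)

print(ndvi_like, savi_like)
```

The two weight settings show how a single parametric form can recover distinct hand-designed indices; with several output channels, each channel could learn a different index.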

Figure 3: Our data fusion module is model-agnostic. The VI or GVI input channel is compatible with any segmentation model trained on NRGB images. Additive Group Normalization (AGN) is applied in the Near-Infrared channel with a linear combination of the batch normalization.

3.4 Additive Group Normalization Index

In contrast to explicitly normalizing the constructed indices using a ratio (e.g., in Equation (4)), we could normalize the value using nearby regions and channels, as we saw in VCI[37]. Fortunately, the deep learning community has already developed the counterpart approaches, such as the Batch Normalization (BN)[29], Layer Normalization (LN)[1], Instance Normalization (IN)[62] and Group Normalization (GN)[64]. However, we found that the neural network, even equipped with the most widely used BN, has an internal difficulty in fitting existing VIs, as shown in Figure 4.

Figure 4: Mean L1 error over standard deviation (%). At each pixel, we trained a two-layer, fully-connected neural network with the NRGB channels to fit the Vegetation Indices using a batch size of 16. We plot the relative error for each vegetation index, i.e., the L1 error over the mean standard deviation in percentage. The additive group normalization fits almost all VIs better compared to batch normalization.

The relatively high prediction errors indicate that BN is not able to capture channel-normalized features, whereas VIs usually normalize the inputs across the spectrum. Motivated by this observation, we introduce the Additive Group Normalization, which combines BN and GN in an additive fashion. Unlike BN, which normalizes each channel of features using the mean and variance computed over the mini-batch, GN splits channels into groups and uses the within-group mean and variance to normalize the particular group:

$$x_{GN}^{(c)} = \frac{x^{(c)} - \mu_{g(c)}}{\sqrt{\sigma_{g(c)}^{2} + \epsilon}}, \qquad g(c) = \left\lfloor \frac{c}{C/G} \right\rfloor \tag{5}$$

where $G$ is the number of groups, $g(c)$ is the group assignment of channel $c$ among the $C$ channels, $\mu_{g(c)}$ and $\sigma_{g(c)}^{2}$ are the mean and variance within that group, and $x_{GN}$ is the GN response. Depending on the number of groups, such a normalization reduces to either Instance Normalization ($G = C$) or Layer Normalization ($G = 1$).
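The grouping behaviour can be sketched for a single sample as follows. This is an illustrative, dependency-free version of Group Normalization without the learnable scale and shift; the function name `group_norm`, the nested-list layout (channels, each a flat list of pixel values), and the toy data are assumptions made for this example.

```python
from statistics import fmean, pvariance

def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample with group-wise statistics.

    sample: list of C channels, each a flat list of pixel values.
    Channels are split into num_groups contiguous groups, and each
    group is normalized by its own mean and (population) variance.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("num_groups must divide the number of channels")
    group_size = num_channels // num_groups
    normalized = []
    for g in range(num_groups):
        group = sample[g * group_size:(g + 1) * group_size]
        values = [v for channel in group for v in channel]
        mean = fmean(values)
        var = pvariance(values, mean)
        scale = (var + eps) ** 0.5
        normalized.extend([[(v - mean) / scale for v in channel] for channel in group])
    return normalized

# Hypothetical 4-channel sample with 4 pixels per channel.
sample = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 12.0, 14.0, 16.0],
    [0.0, 0.5, 1.0, 1.5],
    [5.0, 5.5, 6.5, 7.0],
]

instance_like = group_norm(sample, num_groups=4)  # one group per channel
layer_like = group_norm(sample, num_groups=1)     # all channels in one group
two_groups = group_norm(sample, num_groups=2)     # e.g. channels {0,1} and {2,3}
```

With one group per channel, every channel ends up with zero mean, as in Instance Normalization; with a single group, only the sample as a whole is centered, as in Layer Normalization.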

Figure 5: Examples of segmentation results. We include four examples (rows) with their RGB input, ground-truth labels, predictions of the baseline model, and predictions of the baseline model with VI inputs (columns). Segmentation labels: Green for background, Blue for Weed Cluster, Red for Double Plant, and Yellow for Waterway. Including VIs helps the model perform better in vegetation-related classes (e.g., Weed Cluster) as well as non-vegetation classes (e.g., Waterway).

Inspired by the adaptive Instance-Batch Normalization [44], we designed our Additive Group Normalization (AGN) as follows:

$$x_{AGN} = x_{BN} + \sigma(\lambda) \cdot x_{GN} \tag{6}$$

where $x_{BN}$ is the BN response, $\sigma$ is the sigmoid function, $\lambda$ is a learnable parameter controlling the contribution of Group Normalization in each layer, and $x_{AGN}$ is the response of AGN. This normalization does not introduce extra parameters (except for the running mean and standard deviation) but leverages the existing capacity of the underlying network.

When $\lambda$ is a large negative number, the term $x_{GN}$ gets a negligible weight and $x_{AGN} \approx x_{BN}$. This property makes fine-tuning much smoother on a pretrained model whose architecture uses Batch Normalization: to control the “ramping up” of Group Normalization, we initialize $\lambda$ with a negative number, and the model weights are updated gradually. We show experimental results for both training from scratch and fine-tuning in Section 4.
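The warm-start behaviour can be illustrated with a small sketch of an additive combination of two normalized responses. The sigmoid gate, the function name `additive_group_norm`, and the numeric values below are assumptions for illustration; the text only requires that the Group Normalization term vanish when the gating parameter is a large negative number.

```python
import math

def sigmoid(z):
    """Logistic function, used here as an assumed gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def additive_group_norm(x_bn, x_gn, lam):
    """Add a gated Group Normalization response to a Batch Normalization response.

    x_bn, x_gn: equally long lists holding the BN and GN responses.
    lam: gating parameter (learnable in a real model); a large negative
    value makes the GN contribution negligible.
    """
    gate = sigmoid(lam)
    return [bn + gate * gn for bn, gn in zip(x_bn, x_gn)]

# Hypothetical normalized responses for three activations.
x_bn = [0.5, -1.2, 0.7]
x_gn = [1.0, -0.3, -0.7]

warm_start = additive_group_norm(x_bn, x_gn, lam=-20.0)  # almost identical to x_bn
balanced = additive_group_norm(x_bn, x_gn, lam=0.0)      # gate equals 0.5
```

Starting from a strongly negative gate leaves a pretrained Batch Normalization network almost unchanged at the first fine-tuning step, and the gate can then open gradually as training proceeds.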

4 Experiments

4.1 Architecture Setup

We use EfficientNet-B0 / EfficientNet-B2[60] as our base encoder in the DeepLabV3[10] framework. They are parameter-efficient networks that achieve the same performance as ResNet-50 / ResNet-101[23], respectively, with far fewer parameters.

| Architecture | Method | mIoU (%) | Background | Cloud Shadow | Double Plant | Planter Skip | Standing Water | Waterway | Weed Cluster |
|---|---|---|---|---|---|---|---|---|---|
| DeepLabV3 | Baseline | 44.92 | 78.84 | 40.59 | 33.14 | 0.74 | 51.03 | 60.64 | 49.48 |
| DeepLabV3 | Baseline + VI | 46.04 | 78.75 | 41.17 | 33.66 | 0.46 | 56.67 | 62.06 | 49.50 |
| DeepLabV3 | Baseline + GVI | 46.05 | 79.81 | 34.58 | 35.24 | 0.83 | 58.08 | 63.47 | 50.32 |
| DeepLabV3 | AGN | 46.87 | 79.28 | 41.22 | 34.56 | 1.05 | 57.14 | 63.53 | 51.28 |

Table 2: mIoUs and class IoUs (%) of the baseline model, the baseline model with Vegetation Indices as additional inputs, and our proposed Generalized Vegetation Index models.

4.2 Training Details

We used backbone models pretrained on ImageNet in all our experiments. During initialization, we copied the pretrained weights of the red-channel filter to the NIR-channel filter in the first layer. We trained each model for 80 epochs with a batch size of 64 on eight GeForce GTX TITAN X GPUs. Unless specified, we used a combination of Focal Loss[40] and Dice Loss[59] with weights 0.75 and 0.25, respectively. We did not weigh classes differently, although the dataset is unbalanced. We also masked all the pixels that are either not valid or not within the region of the farmland. We used the Adam optimizer [36] with a base learning rate of 0.01 and weight decay. During training, we monitored the validation loss and stopped the experiment if the loss did not decrease within ten epochs. Once a model was trained, we fine-tuned it with the VI, GVI, or AGN modules. For fine-tuning, we adopted the cosine annealing strategy [42] for the learning rate, with a cycle length of 10 epochs. For a fair comparison, we also fine-tuned our baseline model in this stage.
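The weighted loss and the pixel masking can be sketched for a single class as follows. This is a simplified binary version using common formulations of the focal and Dice losses; the focusing parameter `gamma=2.0`, the Dice smoothing constant, and all function names are assumptions for illustration, and only the 0.75 / 0.25 weights come from the text.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; gamma down-weights well-classified pixels."""
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = p if y == 1 else 1.0 - p
        p_t = min(max(p_t, eps), 1.0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: one minus the (smoothed) Dice coefficient."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

def combined_loss(probs, targets, valid, w_focal=0.75, w_dice=0.25):
    """Weighted focal + Dice loss over valid pixels only.

    probs: predicted probabilities for one class, one value per pixel.
    targets: 0/1 ground-truth labels for that class.
    valid: 0/1 mask; pixels with 0 (invalid or outside the field) are ignored.
    """
    kept = [(p, y) for p, y, v in zip(probs, targets, valid) if v]
    if not kept:
        return 0.0
    kept_probs = [p for p, _ in kept]
    kept_targets = [y for _, y in kept]
    return (w_focal * focal_loss(kept_probs, kept_targets)
            + w_dice * dice_loss(kept_probs, kept_targets))

# Hypothetical predictions for three pixels of one class.
good = combined_loss([0.9, 0.1, 0.8], [1, 0, 1], [1, 1, 1])
bad = combined_loss([0.1, 0.9, 0.2], [1, 0, 1], [1, 1, 1])
print(good, bad)
```

In a multi-class setting the same combination would typically be averaged over classes, here without class weights, in line with the text.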

4.3 Dataset and Evaluation Metric

We evaluated our approach on Agriculture-Vision[11] with the mean Intersection-over-Union (IoU) across classes. Since annotations may overlap while we modeled the segmentation as a pixel-wise multi-class classification problem, we also describe the mean IoU calculation below.
Agriculture-Vision. Agriculture-Vision is an aerial image dataset that contains 21,061 farmland images captured throughout 2019 across the US. Each image is of size 512 × 512 with four color channels, namely RGB and Near-Infrared (NIR). At the time of our experiments, the labels for the test set had not been released, so we used the validation set to test the trained model.
IoU with overlapped annotations. We followed the protocol from the data challenge organizer to accommodate the evaluation for overlapped annotations. For pixels with multiple labels, a prediction of either label was counted as a correct pixel classification for that label, and a prediction that did not contain any ground truth labels was counted as an incorrect classification for all ground truth labels.
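One way to read this protocol is sketched below with per-class true-positive, false-positive, and false-negative counts. The function names and the toy data are hypothetical, and counting a wrong prediction as a single false positive for the predicted class is an assumption of this sketch; the challenge's official evaluation script may tally errors differently.

```python
import math

def overlapped_iou(predictions, label_sets, num_classes):
    """Per-class IoU when a pixel may carry several ground-truth labels.

    predictions: one predicted class index per pixel.
    label_sets: one set of ground-truth class indices per pixel.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for pred, labels in zip(predictions, label_sets):
        if pred in labels:
            # Matching any ground-truth label counts as correct for that label.
            tp[pred] += 1
        else:
            # A miss is an error for the predicted class and for every true label.
            fp[pred] += 1
            for label in labels:
                fn[label] += 1
    ious = []
    for c in range(num_classes):
        denominator = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denominator if denominator else float("nan"))
    return ious

def mean_iou(ious):
    """Average IoU over the classes that actually occur."""
    present = [v for v in ious if not math.isnan(v)]
    return sum(present) / len(present)

# Hypothetical example: four pixels, three classes, two pixels with overlapping labels.
predictions = [0, 1, 2, 1]
label_sets = [{0}, {1, 2}, {1, 2}, {0}]
ious = overlapped_iou(predictions, label_sets, num_classes=3)
miou = mean_iou(ious)
```

In this toy case the two overlapping pixels are both counted as correct even though they receive different predictions, because each prediction matches one of the pixel's labels.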

4.4 Results

Table 2 presents the validation results of the baseline model, together with several proposed methods to leverage the information from the NIR band. The baseline model yields an mIoU of 44.92%. When plugging the GVI module into our model, we achieve an mIoU of 46.05% and the highest IoU in some categories, such as Background, Double Plant, and Standing Water. Moreover, when we use the Additive Group Normalization (AGN), the model performs best, with an mIoU of 46.87%. Our model consistently outperforms (a) using only the NIR band without extra information from vegetation indices and (b) adding vegetation indices directly as inputs. We saw gains in vegetation-related classes (e.g., Weed Cluster) as well as non-vegetation classes (e.g., Waterway). We include some examples in Figure 5.

5 Conclusion

In this work, we introduced the Generalized Vegetation Index, which enhances the power of neural networks in agriculture, and highlighted the connection between this GVI and other existing VIs. When starting from a pretrained model with minimal modifications, our proposed GVI and additive group normalization can achieve, and in some cases exceed, state-of-the-art performance. Our best mIoU is about 2 percentage points higher than that of the baseline model. In addition, our method does not require a sophisticated network architecture or a large increase in model parameters. Such a result is a promising step forward in incorporating VI-related information with multi-band images for segmentation tasks in agriculture.

While our approach shows promise for segmenting land in agriculture, we believe several directions could be valuable for future work. First, how the model architecture affects the result is still open for exploration: it is not clear whether the segmentation results are sensitive to different models with VI inputs. Second, we would like to incorporate additional training techniques, e.g., virtual adversarial training, which are orthogonal to our data fusion approach, to improve the model performance further. Lastly, the ability of our method to generalize to larger-scale datasets remains open to investigation.


  • [1] J. L. Ba, J. R. Kiros, and G. E. Hinton (2016) Layer normalization. arXiv preprint arXiv:1607.06450. Cited by: §3.4.
  • [2] A. Bannari, D. Morin, F. Bonn, and A. Huete (1995) A review of vegetation indices. Remote sensing reviews 13 (1-2), pp. 95–120. Cited by: §2.
  • [3] F. Baret and G. Guyot (1991) Potentials and limits of vegetation indices for lai and apar assessment. Remote sensing of environment 35 (2-3), pp. 161–173. Cited by: Table 1.
  • [4] S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, and R. Nemani (2015) Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, pp. 1–10. Cited by: §2, §2.
  • [5] P. Benedetti, G. Raffaele, O. Kenji, R. G. Pensa, D. Stephane, D. Ienco, et al. (2018) Mfusion: un modèle d’apprentissage profond pour la fusion de données satellitaires multi-echelles/modalités/temporelles. In Conférence Française de Photogrammétrie et de Télédétection CFPT 2018, pp. 1–8. Cited by: §2.
  • [6] W. CARPER, T. LILLESAND, and R. KIEFER (1990) The use of intensity-hue-saturation transformations for merging spot panchromatic and multispectral image data. Photogrammetric Engineering and remote sensing 56 (4), pp. 459–467. Cited by: §2.
  • [7] J. Castagno and E. Atkins (2018) Roof shape classification from lidar and satellite image data fusion using supervised learning. Sensors 18 (11), pp. 3960. Cited by: §2.
  • [8] L. Chang, S. Peng-Sen, and L. Shi-Rong (2016) A review of plant spectral reflectance response to water physiological changes. Chinese Journal of Plant Ecology 40 (1), pp. 80–91. Cited by: §2.
  • [9] J. M. Chen (1996) Evaluation of vegetation indices and a modified simple ratio for boreal applications. Canadian Journal of Remote Sensing 22 (3), pp. 229–242. Cited by: §2.
  • [10] L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. Cited by: §1, §4.1.
  • [11] M. T. Chiu, X. Xu, Y. Wei, Z. Huang, A. Schwing, R. Brunner, H. Khachatrian, H. Karapetyan, I. Dozier, G. Rose, et al. (2020) Agriculture-vision: a large aerial image database for agricultural pattern analysis. arXiv preprint arXiv:2001.01306. Cited by: §1, §2, §4.3.
  • [12] G. Christie, N. Fendley, J. Wilson, and R. Mukherjee (2018) Functional map of the world. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6172–6180. Cited by: §2.
  • [13] Y. T. S. Correa, F. Bovolo, and L. Bruzzone (2015) VHR time-series generation by prediction and fusion of multi-sensor images. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 3298–3301. Cited by: §2.
  • [14] B. A. Cruden, D. Prabhu, and R. Martinez (2012) Absolute radiation measurement in venus and mars entry conditions. Journal of Spacecraft and Rockets 49 (6), pp. 1069–1079. Cited by: §2, §3.2.
  • [15] C. Daughtry, C. Walthall, M. Kim, E. B. De Colstoun, and J. McMurtrey Iii (2000) Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote sensing of Environment 74 (2), pp. 229–239. Cited by: Table 1.
  • [16] I. Demir, K. Koperski, D. Lindenbaum, G. Pang, J. Huang, S. Basu, F. Hughes, D. Tuia, and R. Raska (2018) Deepglobe 2018: a challenge to parse the earth through satellite images. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 172–17209. Cited by: §2.
  • [17] F. Gao, J. Masek, M. Schwaller, and F. Hall (2006) On the blending of the landsat and modis surface reflectance: predicting daily landsat surface reflectance. IEEE Transactions on Geoscience and Remote sensing 44 (8), pp. 2207–2218. Cited by: §2.
  • [18] A. R. Gillespie, A. B. Kahle, and R. E. Walker (1986) Color enhancement of highly correlated images. i. decorrelation and hsi contrast stretches. Remote Sensing of Environment 20 (3), pp. 209–235. Cited by: §2.
  • [19] A. A. Gitelson (2004) Wide dynamic range vegetation index for remote quantification of biophysical characteristics of vegetation. Journal of plant physiology 161 (2), pp. 165–173. Cited by: §1, Table 1.
  • [20] J. Grace, C. Nichol, M. Disney, P. Lewis, T. Quaife, and P. Bowyer (2007) Can we measure terrestrial photosynthesis from space directly, using spectral reflectance and fluorescence?. Global Change Biology 13 (7), pp. 1484–1497. Cited by: §1, §2.
  • [21] B. A. Harrison and D. L. B. Jupp (1989) Introduction to remotely sensed data: part one of the microbrian resource manual. East Melbourne, Vic: CSIRO Publications. Cited by: §2.
  • [22] A. Haxeltine and I. Prentice (1996) A general model for the light-use efficiency of primary production. Functional Ecology, pp. 551–561. Cited by: §2, §2.
  • [23] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • [24] P. Helber, B. Bischke, A. Dengel, and D. Borth (2019) Eurosat: a novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (7), pp. 2217–2226. Cited by: §2.
  • [25] E. Honkavaara, H. Saari, J. Kaivosoja, I. Pölönen, T. Hakala, P. Litkey, J. Mäkynen, and L. Pesonen (2013) Processing and assessment of spectrometric, stereoscopic imagery collected using a lightweight uav spectral camera for precision agriculture. Remote Sensing 5 (10), pp. 5006–5039. Cited by: §2.
  • [26] A. Huete, K. Didan, T. Miura, E. P. Rodriguez, X. Gao, and L. G. Ferreira (2002) Overview of the radiometric and biophysical performance of the modis vegetation indices. Remote sensing of environment 83 (1-2), pp. 195–213. Cited by: §1, §2, Table 1.
  • [27] A. Huete (1988) A soil-adjusted vegetation index (savi). Remote sensing of environment 25, pp. 295–309. Cited by: §2, Table 1.
  • [28] S. Idso, R. Jackson, P. Pinter Jr, R. Reginato, and J. Hatfield (1981) Normalizing the stress-degree-day parameter for environmental variability. Agricultural meteorology 24, pp. 45–55. Cited by: §2.
  • [29] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §3.4.
  • [30] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, et al. (2019) Chexpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 590–597. Cited by: §1.
  • [31] N. Jean, S. Wang, A. Samar, G. Azzari, D. Lobell, and S. Ermon (2019) Tile2Vec: unsupervised representation learning for spatially distributed data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3967–3974. Cited by: §2.
  • [32] C. F. Jordan (1969) Derivation of leaf-area index from quality of light on the forest floor. Ecology 50 (4), pp. 663–666. Cited by: §1, §2.
  • [33] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei (2014) Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732. Cited by: §1.
  • [34] Y. J. Kaufman and D. Tanre (1992) Atmospherically resistant vegetation index (arvi) for eos-modis. IEEE transactions on Geoscience and Remote Sensing 30 (2), pp. 261–270. Cited by: §2, §3.2.
  • [35] M. S. Kim, C. Daughtry, E. Chappelle, J. McMurtrey, and C. Walthall (1994) The use of high spectral resolution bands for estimating absorbed photosynthetically active radiation (apar). Cited by: §2.
  • [36] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  • [37] F. N. Kogan (1995) Application of vegetation index and brightness temperature for drought detection. Advances in space research 15 (11), pp. 91–100. Cited by: Figure 1, §3.3, §3.4, Table 1.
  • [38] H. Li, X. Wu, and J. Kittler (2018) Infrared and visible image fusion using a deep learning framework. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 2705–2710. Cited by: §2.
  • [39] M. Liggins II, D. Hall, and J. Llinas (2017) Handbook of multisensor data fusion: theory and practice. CRC press. Cited by: §2.
  • [40] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §4.2.
  • [41] H. Q. Liu and A. Huete (1995) A feedback based modification of the ndvi to minimize canopy background and atmospheric noise. IEEE transactions on Geoscience and Remote Sensing 33 (2), pp. 457–465. Cited by: §2.
  • [42] I. Loshchilov and F. Hutter (2016) Sgdr: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983. Cited by: §4.2.
  • [43] D. J. Mulla (2013) Twenty five years of remote sensing in precision agriculture: key advances and remaining knowledge gaps. Biosystems engineering 114 (4), pp. 358–371. Cited by: §2.
  • [44] H. Nam and H. Kim (2018) Batch-instance normalization for adaptively style-invariant neural networks. In Advances in Neural Information Processing Systems, pp. 2558–2567. Cited by: §3.4.
  • [45] B. Oshri, A. Hu, P. Adelson, X. Chen, P. Dupas, J. Weinstein, M. Burke, D. Lobell, and S. Ermon (2018) Infrastructure quality assessment in africa using satellite imagery and deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 616–625. Cited by: §1.
  • [46] R. L. Pearson (1972) Remote mapping of standing crop biomass for estimation of the productivity of the shortgrass prairie. In Eighth International Symposium on Remote Sensing of Environment, pp. 1357–1381. Cited by: Table 1.
  • [47] O. A. Penatti, K. Nogueira, and J. A. Dos Santos (2015) Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 44–51. Cited by: §2.
  • [48] J. Piao, Y. Chen, and H. Shin (2019) A new deep learning based multi-spectral image fusion method. Entropy 21 (6), pp. 570. Cited by: §2.
  • [49] J. Qi, A. Chehbouni, A. R. Huete, Y. H. Kerr, and S. Sorooshian (1994) A modified soil adjusted vegetation index. Cited by: §2.
  • [50] H. R. B. A. Rahim, M. Q. B. Lokman, S. W. Harun, G. L. Hornyak, K. Sterckx, W. S. Mohammed, and J. Dutta (2016) Applied light-side coupling with optimized spiral-patterned zinc oxide nanorod coatings for multiple optical channel alcohol vapor sensing. Journal of Nanophotonics 10 (3), pp. 036009. Cited by: §2.
  • [51] T. Ranchin and L. Wald (2000) Fusion of high spatial and spectral resolution images: the arsis concept and its implementation. Cited by: §2.
  • [52] A. J. Richardson and C. Wiegand (1977) Distinguishing vegetation from soil background information. Photogrammetric engineering and remote sensing 43 (12), pp. 1541–1552. Cited by: §2, §3.2, Table 1.
  • [53] J. Rouse, R. Haas, J. Schell, and D. Deering (1974) Monitoring vegetation systems in the great plains with erts. NASA special publication 351, pp. 309. Cited by: §1, §1, §2, §2, Table 1.
  • [54] A. Ruimy, L. Kergoat, A. Bondeau, and the Participants of the Potsdam NPP Model Intercomparison (1999) Comparing global models of terrestrial net primary productivity (npp): analysis of differences in light absorption and light-use efficiency. Global Change Biology 5 (S1), pp. 56–64. Cited by: §2.
  • [55] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §2.
  • [56] M. Schmitt and X. X. Zhu (2016) Data fusion and remote sensing: an ever-growing relationship. IEEE Geoscience and Remote Sensing Magazine 4 (4), pp. 6–23. Cited by: §2.
  • [57] O. Sidek and S. Quadri (2012) A review of data fusion models and systems. International Journal of Image and Data Fusion 3 (1), pp. 3–21. Cited by: §2.
  • [58] R. P. Sripada, R. W. Heiniger, J. G. White, and R. Weisz (2005) Aerial color infrared photography for determining late-season nitrogen requirements in corn. Agronomy Journal 97 (5), pp. 1443–1451. Cited by: Table 1.
  • [59] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pp. 240–248. Cited by: §4.2.
  • [60] M. Tan and Q. V. Le (2019) Efficientnet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946. Cited by: §4.1.
  • [61] D. Tanré, C. Deroo, P. Duhaut, M. Herman, J. Morcrette, J. Perbos, and P. Deschamps (1990) Technical note description of a computer code to simulate the satellite signal in the solar spectrum: the 5s code. International Journal of Remote Sensing 11 (4), pp. 659–668. Cited by: §2.
  • [62] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §3.4.
  • [63] B. Uzkent, E. Sheehan, C. Meng, Z. Tang, M. Burke, D. Lobell, and S. Ermon (2019) Learning to interpret satellite images in global scale using wikipedia. arXiv preprint arXiv:1905.02506. Cited by: §2.
  • [64] Y. Wu and K. He (2018) Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19. Cited by: §3.4.
  • [65] W. Xiaoqin, W. Miaomiao, W. Shaoqiang, and W. Yundong (2015) Extraction of vegetation information from visible unmanned aerial vehicle images. Transactions of the Chinese Society of Agricultural Engineering 31 (5). Cited by: §1, Table 1.
  • [66] M. Xie, N. Jean, M. Burke, D. Lobell, and S. Ermon (2016) Transfer learning from deep features for remote sensing and poverty mapping. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §2.
  • [67] J. Xue and B. Su (2017) Significant remote sensing vegetation indices: a review of developments and applications. Journal of Sensors 2017. Cited by: §1, §2, §2, §3.2.
  • [68] C. Zhang and J. M. Kovacs (2012) The application of small unmanned aerial systems for precision agriculture: a review. Precision agriculture 13 (6), pp. 693–712. Cited by: §2.
  • [69] R. Zhang, X. Rao, and N. Liao (1996) Approach for a vegetation index resistant to atmospheric effect. Acta Botanica Sinica 38 (1), pp. 53–62. Cited by: §1, §1, §2, §2, Table 1.
  • [70] L. Zhou, C. Zhang, and M. Wu (2018) D-linknet: linknet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. In CVPR Workshops, pp. 182–186. Cited by: §2.
  • [71] Z. Zhou, Y. Zheng, H. Ye, J. Pu, and G. Sun (2018) Satellite image scene classification via convnet with context aggregation. In Pacific Rim Conference on Multimedia, pp. 329–339. Cited by: §2.