The Industry 4.0 trend is transforming the production capabilities of all industries, including agriculture. The Internet of Things (IoT) and artificial intelligence are key enabling technologies for this transformation. Agriculture 4.0 will no longer depend on applying water, fertilizers, and pesticides uniformly across entire fields. Instead, farmers will use the minimum quantities required and target very specific areas. Farms and agricultural operations will have to be run very differently and more efficiently, primarily because of advances in sensors, devices, machines, and information technology. Future agriculture will use sophisticated technologies such as robots, temperature and moisture sensors, aerial images and UAVs, behaviour and action analysis, multi-spectral and hyper-spectral imaging devices, and GPS and other positioning technologies. These advanced devices, together with precision agriculture and robotic systems, will allow farms to be more profitable, efficient, safe, and environmentally friendly, and will pave the way toward Agriculture and Industry 4.0 [1, 2, 3, 4, 5].
In the past few years, the UAV industry has grown from a niche market to mainstream availability, lowering the cost of aerial imagery acquisition and opening the way to many interesting applications. This has had a strongly positive impact on industries such as agriculture. UAVs are now increasingly used as a cost-effective and timely method of capturing remote sensing images. The advantages of UAV technology include low cost, small size, safety, ecological operation and, most of all, fast and on-demand image acquisition. UAV technology can now provide extremely high-resolution remote sensing images that carry abundant spatial and contextual information [6, 7].
In recent years, aerial imagery acquired by UAVs has become an integral part of precision agriculture. The spatial resolution provided by UAVs is revolutionizing precision agriculture workflows for measuring crop condition and yield over the growing season, for identifying and monitoring weeds, and for other applications such as registration [8, 9, 10, 11]. Various studies propose novel applications of UAV image analysis for precision agriculture and for vegetation and crop monitoring. The vast quantity of data acquired by UAVs, together with recent advances in parallel computing and GPU technology, has enabled researchers to adopt and deploy data-driven analysis and decision-making techniques such as deep learning in the agricultural domain. Deep learning models, which are loosely inspired by information processing and communication patterns in biological nervous systems, have revolutionized artificial intelligence and computer vision. Deep learning is closely related to artificial neural networks (ANNs), but it relies on “deeper” networks that represent the data hierarchically, through several levels of abstraction, for example by means of stacked convolutions. This gives the models a larger learning capacity and thus higher performance and precision.
A strong advantage of deep learning is automatic feature extraction from raw data, with features at higher levels of the hierarchy formed by composing lower-level features. Deep learning can solve complex problems particularly well and fast because it benefits from massive parallelization. Convolutional neural networks (CNNs) constitute a class of deep, feed-forward ANNs devised mainly for image analysis. In recent years, CNNs have been used extensively in computer vision and image processing. They can address highly complex classification and segmentation tasks that classic computer vision techniques were unable to solve. The positive effects of deep learning and UAVs on the agricultural domain have paved the way for Agriculture 4.0 and inspired numerous studies in the last decade.
In this regard, this study investigates the use of deep learning and UAV imagery as two major drivers of Agriculture 4.0. Several studies have investigated the use of these technologies. Based on domain and application, we have categorized these studies into five major groups: vegetation identification, classification and segmentation; crop counting and yield prediction; crop mapping; weed detection; and crop disease and nutrient deficiency detection. This study compares the various approaches comprehensively and investigates their strengths and weaknesses. The potential benefits of deep learning and UAVs in the agricultural industry have not yet been fully realized, and several limitations and challenges remain to be addressed.
The following sections present a critical analysis of the existing literature on vegetation and crop monitoring using aerial imagery and deep learning.
II Vegetation Identification, Classification and Segmentation
|Ji et al.||Corn, forest, grass, rice, road, soybean, water, wheat||Gaofen 2 multi-temporal images||VGG CNN, 3D spatio-temporal kernel tensor|
|Chunjing et al.||Pond, rice, algae, waste-land, river, building, wood, road||Panchromatic images captured by Gaofen 1||Conventional CNN with spatio-temporal features|
|Rebetez et al.||22 different types of crops||Swiss confederation’s agroscope dataset||HistNN model using per-window histograms|
|Fan et al.||Tobacco crop||14 Hi-res tobacco plant images||Morphology, segmentation and CNN|
|Gao et al.||Maize crop||In-house Lidar scans of maize field||Unsupervised clustering and faster R-CNN|
Automatic classification and segmentation of crops and vegetation from UAV imagery is becoming a fundamental technology for vegetation identification. Over the last decade, numerous studies have focused on crop classification and segmentation using aerial images, computer vision and machine learning techniques, and the rise of deep learning in recent years has reinvigorated this research area. Deep learning models transform spatial, spectral, and temporal data into discriminative feature vectors and then assign each feature vector to a vegetation type according to the supplied training labels [13, 14]. Many studies have moved from classic machine learning techniques to state-of-the-art artificial neural networks and deep learning approaches to classify and segment aerial images of vegetation.
Ji et al. proposed a three-dimensional convolutional neural network for crop classification with multi-temporal UAV images. They designed a 3D kernel tensor that matches the structure of multi-spectral, multi-temporal remote sensing data. The tensor consists of weights, a number of channels and a temporal indicator, and it generates a 3D feature map by applying 3D convolution to the spatio-temporal images and accumulating the result over the spectral bands. They reused the widely adopted network structure developed by Oxford’s Visual Geometry Group (VGGNet) as a template and trained a deep convolutional neural network in which all 2D convolutions are replaced by 3D convolutions, while the kernel size remained unchanged (3x3). All other network parameters were tuned empirically for training on 3D crop samples and for learning discriminative spatio-temporal representations that preserve the full crop growth cycle. They also introduced an active learning strategy into the CNN model to raise labeling accuracy to a required threshold. A 3D CNN is especially suitable for characterizing the dynamics of crop growth and can outperform classic techniques. However, the off-the-shelf VGGNet is not designed to pick up the fine texture patterns of plants and vegetation, which could be a drawback of this study.
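To make the replacement of 2D by 3D convolution concrete, the following is a minimal pure-Python sketch of a single 3D convolution (valid padding, stride 1) over a time x height x width volume, accumulated over spectral bands. The function name `conv3d_valid`, the toy shapes and the all-ones 3x3x3 kernel are illustrative assumptions and do not reproduce the configuration of Ji et al.

```python
def conv3d_valid(volume, kernel):
    """Valid 3D convolution (cross-correlation, stride 1), summed over bands.

    volume: nested list indexed [band][t][y][x]
    kernel: nested list indexed [band][dt][dy][dx]
    Returns a nested list indexed [t][y][x].
    """
    bands = len(volume)
    T, H, W = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    kt, kh, kw = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Accumulate over spectral bands and the 3D neighbourhood.
                for b in range(bands):
                    for dt in range(kt):
                        for dy in range(kh):
                            for dx in range(kw):
                                acc += (volume[b][t + dt][y + dy][x + dx]
                                        * kernel[b][dt][dy][dx])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input (assumed): 2 spectral bands, 4 acquisition dates, 5x5 pixels.
volume = [[[[1.0] * 5 for _ in range(5)] for _ in range(4)] for _ in range(2)]
# One all-ones 3x3x3 kernel per band (assumed).
kernel = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]

feature_map = conv3d_valid(volume, kernel)
print(len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
print(feature_map[0][0][0])
```

Because the kernel also slides along the temporal axis, each output value mixes information from several acquisition dates, which is what lets such a network respond to growth dynamics rather than to a single snapshot.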
Chunjing et al. investigated the application of convolutional neural networks to the classification of high-resolution agricultural aerial images. They designed an 11-layer convolutional neural network comprising an input layer, three convolution layers, two pooling layers, two local contrast adjustment layers, two fully connected layers and one output layer. Apart from the final classification layer, all convolutional and fully connected layers use the sigmoid activation function. The study takes the strong temporal and regional (Ezhou, China) characteristics of the crops into account. Based on the image features of the various land-cover types, the class labels used in this study were pond, rice, algae, waste-land, river, building, wood, road and other planting. The results show that the convolutional neural network achieves significantly higher precision than classic machine learning techniques such as SVM. However, the sigmoid activation function increases the likelihood of vanishing gradients and also reduces the sparsity of the network, which decreases its resolving power.
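The vanishing-gradient concern can be illustrated with a short calculation. The sketch below compares the largest possible derivative of the sigmoid with the derivative of ReLU on its active side, and multiplies each over a hypothetical stack of five layers. Weights are ignored, so this only bounds the contribution of the activation functions and is not a model of the network described above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 on the active side, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# Best case for the sigmoid: every pre-activation sits at 0, where the
# derivative reaches its maximum of 0.25.
depth = 5  # hypothetical number of stacked activation layers
sigmoid_factor = sigmoid_grad(0.0) ** depth
relu_factor = relu_grad(1.0) ** depth

print(sigmoid_grad(0.0))
print(sigmoid_factor)
print(relu_factor)
```

Even in the best case the sigmoid scales the back-propagated gradient by at most 0.25 per layer, whereas an active ReLU passes it through unchanged.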
In another study, Rebetez et al. proposed a hybrid deep neural network that combines convolutional layers with per-window histograms to improve crop classification performance. The study used a dataset of high-resolution aerial images of experimental farm fields, taken from a series of experiments conducted by the Swiss Confederation’s Agroscope research center. The dataset covers 22 different types of crops photographed with RGB cameras mounted on a UAV. The network is fed with window blocks of 21x21 pixels, with both RGB color and texture features regarded as discriminative for crop classification. The proposed network consists of a convolutional side (CNN), which uses the raw pixel values, and a dense side (HistNN), which uses RGB histograms. The outputs of both sides are merged by a dense layer of 128 neurons and passed to the final layer, which predicts the class probabilities for the 22 target classes using a softmax function. The study shows that the RGB histogram network outperformed the simple convolutional network in terms of classification accuracy and F-score, while the combined CNN-HistNN network produced the best overall results. The major limitation of this study is the absence of temporal information in both the data collection stage and the deep model. Plants tend to have different texture and color characteristics at different times and seasons, so deep learning methods need data acquired at various points in time to cover this color and texture diversity.
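The per-window histogram input of the dense side can be sketched as follows. This is a minimal illustration that assumes 8 bins per channel and a plain nested-list image. The function name `window_histogram` and the bin count are hypothetical and are not taken from Rebetez et al.

```python
def window_histogram(image, top, left, size=21, bins=8):
    """Concatenated per-channel histogram of a size x size window.

    image: nested list indexed [y][x] holding (r, g, b) tuples in 0..255.
    Returns a list of 3 * bins counts (R bins, then G bins, then B bins).
    """
    hist = [0] * (3 * bins)
    bin_width = 256 // bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            for channel, value in enumerate(image[y][x]):
                # Clamp so that 255 always lands in the last bin.
                index = min(value // bin_width, bins - 1)
                hist[channel * bins + index] += 1
    return hist


# Toy 21x21 window of a single color (assumed values).
image = [[(200, 100, 10) for _ in range(21)] for _ in range(21)]
hist = window_histogram(image, top=0, left=0)
print(len(hist), sum(hist))
```

A histogram discards the spatial arrangement inside the window and keeps only the color distribution, which is why it complements the convolutional side that works on raw pixels.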
A study by Fan et al. focused on automatic tobacco plant detection in UAV images using deep neural networks. They proposed a new three-stage algorithm to detect tobacco plants in high-resolution images captured by UAVs. In the first stage, a number of candidate tobacco plant regions are extracted from the UAV images with classic computer vision approaches such as morphological operations and watershed segmentation. Each candidate region contains either a tobacco plant or a non-tobacco plant, which maintains the balance between the two classes. In the second stage, a deep convolutional neural network is built and trained to classify the candidate regions as tobacco plant regions or non-tobacco plant regions. The network is composed of three convolutional layers, one pooling layer and two fully connected layers. It uses 3x3 convolutional kernels with stride 1 and 2x2 pooling kernels with stride 2, and it is optimized with stochastic gradient descent. In the third stage, post-processing such as Manhattan inter-class distance is applied to further remove non-tobacco plant regions. The model was evaluated on a dataset of 14 high-resolution images captured by UAVs. The experimental results show that the proposed algorithm outperformed SVM and random forest classifiers in detecting tobacco plants in UAV images. Deep learning techniques demand fairly large annotated datasets for training, and annotation can be a labour-intensive process. The semi-supervised approach used in this study can simplify this task and improve crop classification with deep learning.
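The effect of the stated kernel sizes and strides on the spatial size of the feature maps follows from the standard output-size formula. The sketch below assumes a 32x32 input patch, no padding, and a pooling layer placed after the third convolution. None of these three choices is stated in the source, so they are illustrative only.

```python
def output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # hypothetical input patch size in pixels
for _ in range(3):
    # Three 3x3 convolutions with stride 1 and no padding.
    size = output_size(size, kernel=3, stride=1)
after_convs = size

# One 2x2 pooling layer with stride 2.
after_pool = output_size(after_convs, kernel=2, stride=2)

print(after_convs, after_pool)
```

Each unpadded 3x3 convolution trims two pixels from every spatial dimension, and the 2x2 pooling with stride 2 halves what remains.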
A study by Gao et al. used terrestrial data and a Faster R-CNN deep network along with a regional growth algorithm for individual maize segmentation. The scanned 3D LiDAR points of the training data were sliced with a 3D window, and the points within each window were compressed into deep images. A Faster R-CNN model was trained on these images to detect maize stems. The stems detected in the images were mapped back to 3D points, which served as seed points for the regional growth algorithm to grow individual maize plants from the bottom up. The authors report promising results in individual maize segmentation. The unsupervised maize stem clustering removes the need for the labour-intensive annotation process. However, the accuracy and precision of this approach are questionable.
Table 1 summarizes the studies that used UAVs and deep learning for classification and segmentation of crops.
III Crop Counting and Yield Predictions
|Tri et al.||Paddy fields||800 UAV images||Google inception deep sparse model|
|Rahnemoonfar et al.||Tomato||Synthetic images||Modified Inception-ResNet layers|
|Dijkstra et al.||Not specified||10 sample UAV images + cell-nuclei dataset||CentroidNet model|
The growth of the UAV industry from a niche to a mainstream market has significantly lowered the cost of aerial imagery acquisition and paved the way to many interesting applications such as crop counting and yield prediction.
Tri et al. proposed a novel approach based on deep learning techniques and UAVs for yield assessment of paddy fields. The proposed method consists of four stages: image acquisition, image pre-processing and sampling, classification of the imagery by deep learning, and yield assessment. The images were acquired with high-resolution cameras mounted on a UAV. Several pre-processing operations, such as sliding windows, brightness/contrast adjustment and image rotation/flipping, were applied to the acquired images to enhance the raw information and adapt it to the DNN model. The Google Inception deep sparse model was used to train on and classify the yield value of each image. The reference statistics for the rice bushes were collected manually in three small-area samples of the paddy fields (about 1 square meter each). The authors counted the number of grains per bush and the number of bushes per sample, from which the yield of the paddy fields per hectare is derived. Google Inception is a fairly large model that usually requires transfer learning to perform reasonably on a custom dataset. However, this study attempted to train the model from scratch solely on its proprietary data, which undermines its performance.
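The extrapolation from sample plots to a per-hectare figure is simple arithmetic. The sketch below assumes three samples of 1 square meter each and a known thousand-grain weight. All numbers, the use of a thousand-grain weight and the name `yield_kg_per_hectare` are hypothetical and are not taken from Tri et al.

```python
def yield_kg_per_hectare(samples, thousand_grain_weight_g, sample_area_m2=1.0):
    """Extrapolate sample-plot counts to a per-hectare yield.

    samples: list of (grains_per_bush, bushes_in_sample) tuples, one per plot.
    thousand_grain_weight_g: mass of 1000 grains in grams (assumed known).
    """
    grains_per_m2 = [
        grains * bushes / sample_area_m2 for grains, bushes in samples
    ]
    mean_grains_per_m2 = sum(grains_per_m2) / len(grains_per_m2)
    grams_per_m2 = mean_grains_per_m2 * thousand_grain_weight_g / 1000.0
    # 10,000 square meters per hectare, then grams to kilograms.
    return grams_per_m2 * 10_000 / 1000.0


# Hypothetical counts from three 1 square meter sample plots.
samples = [(950, 20), (1000, 20), (1050, 20)]
estimate = yield_kg_per_hectare(samples, thousand_grain_weight_g=25.0)
print(estimate)
```

With only three plots, the per-hectare figure is very sensitive to how representative the sampled square meters are. This is one reason why reference yields obtained this way are a noisy training target.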
A study by Rahnemoonfar et al. proposed real-time yield estimation based on deep learning and UAV imagery. The proposed method estimates the number of tomatoes directly from a glance at the entire image, which avoids the overhead of object detection and localization. The convolutional network was trained on synthetic images and tested on real images. Rahnemoonfar et al. claimed that their approach is robust and efficient under illumination variance and that it can also count tomatoes that lie in shadow, are occluded by foliage or overlap with each other, which is a relatively bold claim. Their network begins with a 7x7 convolution layer followed by a 3x3 max pooling layer with a stride of two pixels. The convolutional layer maps the three bands (RGB) of the input image to 64 feature maps using a 7x7 kernel. The feature maps are then fed to two modified Inception-ResNet layers that follow the ordinary convolutional layers. Inception-ResNet combines residual connections with the concatenation of the outputs of convolutional layers with different kernel sizes, and thus captures features at multiple scales. This enables the proposed model to count tomatoes of different sizes. The reported experiments claim over 91 percent accuracy and less than 3 percent residual error. The synthetic images used to train the network differ significantly from real-world aerial images of tomato plants, which makes them a questionable basis for training a deep network. A generative adversarial network (GAN) or an autoencoder could be a better way to generate a larger set of synthetic training samples from a smaller set of real-world images.
In another study, Dijkstra et al. proposed a deep neural network model named CentroidNet for crop localization and counting. CentroidNet relies on the centroids of image objects rather than bounding boxes, and it combines image segmentation and centroid majority voting to regress a vector field with the same resolution as the input image. Each vector in the field points to the relative location of its nearest centroid, which makes the CentroidNet architecture independent of image size. CentroidNet can be attached to a fully convolutional network as a backbone; this study used the U-Net segmentation network as a basis. A dataset of 10 frames with a resolution of 3840x2160 pixels, captured by a low-cost UAV and comprising crops of various sizes with heavy overlap, was created to compare CentroidNet with other networks such as YOLOv2. The experimental results indicate that CentroidNet outperformed the other detectors in detection and localization accuracy. The major drawback of this study is the limited number of sample images in the dataset.
Table 2 summarizes the studies that used UAVs and deep learning for crop counting and yield predictions.
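The voting step can be illustrated independently of the network that predicts the vector field. In the sketch below every pixel casts one vote for the location its vector points to, and locations with enough votes are kept as centroids. The function name `vote_centroids`, the hand-made vector field and the vote threshold are assumptions made for illustration. The segmentation branch and the training of CentroidNet are not shown.

```python
from collections import Counter


def vote_centroids(vector_field, min_votes):
    """Accumulate centroid votes from a per-pixel vector field.

    vector_field: nested list indexed [y][x] holding (dy, dx) offsets that
    point from each pixel to its predicted nearest centroid.
    Returns the sorted (y, x) locations that received at least min_votes.
    """
    height, width = len(vector_field), len(vector_field[0])
    votes = Counter()
    for y in range(height):
        for x in range(width):
            dy, dx = vector_field[y][x]
            ty, tx = y + dy, x + dx
            # Ignore vectors that point outside the image.
            if 0 <= ty < height and 0 <= tx < width:
                votes[(ty, tx)] += 1
    return sorted(loc for loc, count in votes.items() if count >= min_votes)


# Hand-made 4x6 field with two objects: the left half points to (1, 1),
# the right half points to (2, 4).
field = [
    [((1 - y, 1 - x) if x < 3 else (2 - y, 4 - x)) for x in range(6)]
    for y in range(4)
]
centroids = vote_centroids(field, min_votes=5)
print(centroids)
```

Since every pixel stores an offset relative to its own position, the same procedure applies to an image of any size, and counting objects reduces to counting the surviving vote peaks.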
IV Crop Mapping
|Nijhawan et al.||Not specified||6834 multispectral images of vegetation and 8673 images of non-vegetation area||Pre-trained AlexNet architecture|
|Baeta et al.||Coffee||9 Hi-res images of coffee cultivation||Multiple ConvNet models|
Crop and vegetation mapping is an important strategic technique for managing yield and agricultural products at larger scale and over an extended period of time. In recent years, UAVs and deep learning techniques have been adopted in various crop mapping studies.
Nijhawan et al. proposed a hybrid deep learning CNN framework for vegetation and crop mapping. They employed an exhaustive combination of a vast number of input parameters, including spectral bands and topographic and texture parameters. The framework contains four individual CNNs whose outputs are combined. Principal Component Analysis (PCA) is used to find the most uncorrelated spectral, textural and topographical information before it is passed to each CNN. The proposed deep convolutional architecture is based on the pre-trained AlexNet architecture, slightly modified to work with the data of this study. The network consists of 8 layers, of which 5 are convolutional and the rest are fully connected. The ReLU activation function, which introduces non-linearity into the model, is used in every layer. The outputs of the CNNs are combined to form a final feature vector, which is classified using an SVM classifier. The fairly large dataset, consisting of 6834 images of vegetation areas and 8673 images of non-vegetation areas, is one of the strengths of this research. However, the proposed method is computationally expensive. A daisy-chained sequence of exhaustive spatial and frequency domain operations on top of a relatively deep AlexNet makes this model impractical for real-world scenarios.
Baeta et al. combined deep learning with the fusion and selection of features from multiple scales for coffee crop recognition and mapping. Multiple ConvNet models served as the deep model in this research. The proposed approach is a pixel-wise strategy that trains and combines convolutional neural networks designed to receive different context windows around labeled pixels as input. The final maps are created by combining the outputs of those networks for a non-labeled set of pixels. The study claims that using multiple scales produces better coffee crop maps than a single-scale approach. The main contribution of this study is the adaptation of an established scaling technique to coffee crop recognition and mapping.
Table 3 summarizes the studies that used UAVs and deep learning for crop mapping.
V Weed Detection
|Bah et al.||Weeds||45022 labeled and 17044 unlabeled aerial images||Background segmentation along with pretrained ResNet architecture|
|Liujun et al.||Morningglory, cocklebur, palmer amaranth||low-altitude and high-altitude UAV images||Thresholding based on centroid for initial semi-supervised segmentation|
|Ferreira et al.||Weeds in soybean crops||Over 15,000 high-resolution UAV images||CaffeNet|
|Huang et al.||Weeds||91 UAV images||Modified VGG16 FCN and deconvolutional network|
|Sa et al.||Weeds||10,000 aerial UAV images||SegNet encoder-decoder|
Weeds are one of the major causes of agricultural yield loss. To deal with this threat, farmers resort to spraying their fields uniformly with herbicides. This method not only requires huge quantities of herbicide but also impacts the environment and human health. Precision agriculture techniques make it possible to apply the right dose of herbicide at the right place and at the right time, which significantly reduces both the cost and the negative environmental impact of herbicides. In recent years, UAVs have been turning into aerial image acquisition systems for weed localization and management. Despite notable advances in UAV acquisition systems, the automatic segmentation of weeds remains a challenging problem because of their strong similarity to crops. Deep learning techniques can generalize over complex classification and segmentation problems beyond what classic machine learning techniques were able to achieve.
Bah et al. [23, 24] proposed a deep learning based classification system for identifying weeds in vegetable fields such as spinach, beet and bean using high-resolution UAV imagery. They combined deep learning with background segmentation and line detection to separate weeds from the actual crop. Their method comprises three main phases. First, the crop rows are detected automatically and used to identify the inter-row weeds. Second, the inter-row weeds are used to build a training dataset of weed patches. Finally, convolutional neural networks are trained on this dataset to build a model that detects the crop and the weeds in the images. A ResNet architecture is used for the classification. The method was applied to high-resolution unmanned aerial vehicle (UAV) images of vegetables taken about 20 m above the soil. The results showed that the proposed weed detection method was effective in different crop fields. This study assumes that all vegetables are planted in uniform rows and that there is a distinct border between each row and the weeds that grow beside it. This assumption significantly limits the applicability of the study in the real world.
Liujun et al. proposed a real-time UAV weed scout for selective weed control based on adaptive robust control and machine learning techniques. They used a UAV capable of identifying weeds both from far above the field and close to the canopy, and of measuring plant/weed density (weed infestation rate) and weed species. A thresholding technique segments the green regions, essentially the weed canopy, from the background. The crop rows and their centroid lines are calculated and masked by pixel density. The anomalous weed patches between the crop rows are then identified and their population is mapped. A convolutional neural network (CNN) is used for weed species classification and probability assessment. The weed distribution maps and the extracted individual weeds are used to generate the training data labels. The preliminary results of this study show that specific weed types can be classified with this technique. This study relies on a simple technique, thresholding, to segment the green regions from the presumably non-green background. Thresholding is extremely unreliable in dynamic real-world environments.
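The source does not state which quantity is thresholded. As an illustration, the sketch below uses the excess green index, ExG = 2g - r - b computed on chromatic coordinates, which is a common choice for separating vegetation from soil in RGB images. The index, the threshold of 0.1 and the function name `excess_green_mask` are assumptions and should not be read as the method of Liujun et al.

```python
def excess_green_mask(image, threshold=0.1):
    """Binary vegetation mask from the excess green index ExG = 2g - r - b.

    image: nested list indexed [y][x] holding (R, G, B) tuples in 0..255.
    r, g, b are chromatic coordinates (each channel divided by R + G + B).
    """
    mask = []
    for row in image:
        mask_row = []
        for red, green, blue in row:
            total = red + green + blue
            if total == 0:
                mask_row.append(False)  # black pixel: no color information
                continue
            exg = (2 * green - red - blue) / total
            mask_row.append(exg > threshold)
        mask.append(mask_row)
    return mask


# Toy 2x2 image (assumed values): vivid green, brownish soil, black, dull green.
image = [
    [(40, 160, 40), (120, 100, 80)],
    [(0, 0, 0), (90, 110, 70)],
]
mask = excess_green_mask(image)
print(mask)
```

A fixed threshold of this kind depends on illumination, shadows and camera white balance, which is the weakness noted above for dynamic field conditions.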
In another study, Ferreira et al. proposed weed detection in soybean crops using CNNs. They used a DJI Phantom 3 to capture over fifteen thousand high-resolution images (4000x3000) of the crop, comprising images of soil, soybean, broadleaf weeds and grass weeds. The Simple Linear Iterative Clustering (SLIC) superpixel algorithm was then used to segment the images and to assist in the construction of an image dataset. Various feature extraction techniques, including the gray-level co-occurrence matrix, histograms of oriented gradients, local binary patterns and color distributions, were used to generate the feature vectors of this study. They employed the CaffeNet CNN architecture, which is similar to the well-known AlexNet structure, to classify the soybean and weed patches. The proposed algorithm is reported to be more accurate than classic machine learning techniques such as SVM, AdaBoost and random forest. This research assembled a fairly large dataset of soybean plants, which can be extremely beneficial for research in this domain. However, the computational complexity of the proposed model is too high for real-time applications.
Huang et al. proposed a fully convolutional network (FCN) for weed mapping using unmanned aerial vehicle (UAV) imagery. A modified version of the VGG16 FCN was used to distinguish weeds from the other classes, and deconvolutional layers were used to reconstruct the segmented image and generate the localized map. They employed transfer learning to improve generalization capability and a skip architecture to increase prediction accuracy. The performance of the FCN architecture was compared with a patch-based CNN algorithm and a pixel-based CNN method. They claimed that the FCN method outperformed both techniques in terms of accuracy and efficiency. The major drawback of this study is the lack of a mechanism to handle scale variance. In a very similar study, Huang et al. proposed a semantic labeling approach for accurate weed mapping of high-resolution UAV images. They adopted an ImageNet-pretrained residual network in fully convolutional form and fine-tuned it on their proprietary dataset. ResNet and VGG-16 were used as the baseline classification architectures. They applied atrous convolution to extend the field of view of the convolutional filters. They also applied multi-scale processing, which applies several branches of atrous convolution to the feature map simultaneously and is expected to enhance the network's ability to capture objects at different scales. A fully connected conditional random field (CRF) was applied after the CNN to further refine the spatial details. The reported results indicate that the proposed approach outperforms a pixel-based SVM and the classical FCN-8s. The main drawback of this study is its extremely small training dataset, although the authors attempted to address this issue with pretrained models.
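How atrous (dilated) convolution enlarges the field of view without adding weights is easiest to see in one dimension. The following sketch is a generic illustration with an assumed three-tap kernel and toy signal. It is not the network of Huang et al.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D dilated (atrous) convolution with valid padding and stride 1.

    Kernel taps are spaced `rate` samples apart, so a kernel of length k
    covers rate * (k - 1) + 1 input samples without adding weights.
    """
    span = rate * (len(kernel) - 1) + 1
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = list(range(10))  # 0, 1, ..., 9
kernel = [1, 1, 1]        # three taps

dense = atrous_conv1d(signal, kernel, rate=1)    # field of view: 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # field of view: 5 samples

print(dense)
print(dilated)
```

Running several such branches with different rates on the same feature map and combining their outputs gives the multi-scale behaviour described above.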
Sa et al. proposed a semantic weed mapping framework using aerial multispectral imaging and deep neural networks. They addressed several issues, including the limited ground sample distance (GSD) of high-altitude datasets, the resolution sacrificed by downsampling high-fidelity images, and multispectral image alignment. To do so, they adopted a standard sliding window approach that operates only on small portions (tiles) of multispectral orthomosaic maps, which are channel-wise aligned and radiometrically calibrated across the entire map. To counter resolution loss, the tile size was set equal to the input size of the DNN. SegNet, a popular encoder-decoder deep network, was used in this research. The fairly large dataset consists of over 10,000 aerial images acquired with multispectral and RGB cameras. The authors claimed that the proposed method outperformed existing approaches in both accuracy and efficiency. The use of multispectral and RGB cameras in conjunction with a fairly large dataset is the main advantage of this study.
Table 4 summarizes the studies that used UAVs and deep learning for weed detection.
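The tiling idea can be sketched as the computation of tile origins over a large map. Each tile is then passed to the network at native resolution instead of downsampling the whole orthomosaic. The map and tile dimensions below, the border handling and the name `tile_origins` are assumptions made for illustration and are not taken from Sa et al.

```python
def tile_origins(height, width, tile_h, tile_w):
    """Top-left corners of tiles covering a height x width map.

    Tiles do not overlap, except that the last row and column of tiles are
    shifted back so that they stay inside the map.
    """
    def starts(length, tile):
        if length <= tile:
            return [0]  # map smaller than one tile: single origin
        positions = list(range(0, length - tile + 1, tile))
        if positions[-1] + tile < length:
            positions.append(length - tile)  # cover the remaining border
        return positions

    return [(y, x) for y in starts(height, tile_h) for x in starts(width, tile_w)]


# Hypothetical orthomosaic of 1000 x 1500 pixels and a 360 x 480 network input.
origins = tile_origins(1000, 1500, tile_h=360, tile_w=480)
print(len(origins), origins[0], origins[-1])
```

Matching the tile size to the network input means no pixel is resampled, at the cost of processing many tiles per map.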
VI Disease and Nutrient Deficiency Detection
|Gennaro et al.||Grapevine leaf stripe disease||Hi-res multispectral images||NDVI map made from multispectral images|
|Julio et al.||Late blight in potato||Multispectral aerial images||Convolutional neural networks|
|Poblete et al.||Vine water status||Multispectral images at wavelengths 530, 550, 570, 670, 700 and 800 nm||MultiLayer Perceptron (MLP)|
|Ha et al.||Fusarium wilt infected radish||139 RGB images||Local Binary Patterns and VGG-A network|
|Huang et al.||Helminthosporium Leaf Blotch||RGB aerial images||LeNet-5|
Crop disease is a significant challenge that farmers continually grapple with. Early detection and diagnosis of crop diseases is crucial to reduce the damage to yield and to contain the spread of the disease. Traditional methods, in which farms are surveyed manually to identify and treat infected plants, are labour intensive and time consuming. Aerial surveying of large farms with satellite-based technologies enables the identification of infested areas; however, these technologies are expensive in terms of time and cost, and they usually cover large areas, which makes it difficult to [...]. In this context, UAVs are advantageous because they offer aerial surveying of the farm and, with the appropriate sensors mounted, allow diseased crops and crop regions to be identified quickly.
Crop disease and nutrient deficiency detection via UAV employs different types of camera sensors. Multispectral cameras are popular UAV-mounted sensors for disease and nutrient deficiency identification in several studies. For instance, Gennaro et al. used multispectral cameras on a UAV to identify grapevine leaf stripe disease. Their methodology computes a normalized difference vegetation index (NDVI) map from the high-resolution images obtained with the multispectral cameras. The NDVI map allowed analysis at the level of individual plants, and comparing the NDVI with the foliar symptoms of the plants revealed a high correlation between the index and the grapevine disease. Their methodology relies on statistical analysis and does not consider deep learning methods for analysing the multispectral images.
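NDVI is computed per pixel from the near-infrared and red reflectance as NDVI = (NIR - Red) / (NIR + Red). The sketch below applies this definition to two small hypothetical bands. The reflectance values and function names are illustrative assumptions and are not data from Gennaro et al.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel; 0.0 if both are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def ndvi_map(nir_band, red_band):
    """Per-pixel NDVI for two bands given as nested lists of equal shape."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values in 0..1 for a 2x2 patch.
nir_band = [[0.75, 0.25], [0.5, 0.0]]
red_band = [[0.25, 0.25], [0.5, 0.0]]

index_map = ndvi_map(nir_band, red_band)
print(index_map)
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so it yields values closer to 1, while soil and stressed plants give values closer to 0.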
The study presented by Julio et al. applied deep learning methods to predict the severity of late blight in potato crops, which is caused by Phytophthora infestans. They used a UAV with a multispectral sensor to capture images of different potato phenotypes in the field. Along with a CNN, they considered other machine learning algorithms, including random forests, multilayer perceptrons and support vector regression. The ground truth data was generated with the help of experts who rated the severity of the infestation on the potato crop. Further, the authors exploited the spectral band differences in the multispectral images to create additional datasets with different band combinations for training the machine learning models. The results showed that the random forest and CNN models outperformed the other models considered in the study in identifying infested potato crops from the UAV images.
Along with disease identification, spectral images of crops can be used to detect nutrient deficiencies. For instance, Poblete et al. demonstrated that multispectral cameras mounted on a UAV can be used to predict vine water status with the help of neural network models. They carried out five flights over vineyards during two seasons to account for the variability of field and plant conditions. NDVI was computed from the spectral images for soil and plant classification. A multilayer perceptron (MLP) neural network was applied to several different spectral bands of the images to identify the best relationship between the neural network model and the water status. ANN models with different spectral band combinations were evaluated, and accuracies in the range [...] were obtained. The results demonstrated that plant stresses, such as those related to nutrient components, can be evaluated from spectral bands.
The results of the studies discussed demonstrate the capabilities of multispectral and hyperspectral sensors for disease and nutrient deficiency identification. However, spectral sensors are relatively expensive, and their data are more complex to analyse, since data from multiple spectral bands need to be combined and processed to gain insight into disease identification and classification. With the advancements in deep learning methods and sensor technologies, studies have demonstrated that RGB cameras can be successfully employed for crop disease identification. For instance, Ha et al. proposed a CNN for detecting Fusarium wilt-infected radish in images captured using high-resolution RGB cameras mounted on a UAV. Their system captures images of radish fields at low altitudes. The radish farm images are segmented into three regions, i.e. radish, ground and mulching film, using a softmax classifier and k-means clustering. A CNN is then trained on the segmented images and subsequently distinguishes healthy radish from radish with Fusarium wilt with 93% accuracy.
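To make the clustering step concrete, the sketch below runs plain k-means (Lloyd's algorithm) with k = 3 on raw RGB pixel values. It covers only the clustering part, under the assumption that pixel colour alone separates the three regions; the synthetic image, the farthest-point seeding, and all function names are illustrative choices and do not reproduce the cited pipeline.

```python
import numpy as np

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers, dtype=float)

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for every point."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k=3, iters=20):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers

# Hypothetical 6x6 RGB image with three stripes: plant, soil, dark film.
rng = np.random.default_rng(0)
colors = np.array([[0.2, 0.6, 0.2], [0.5, 0.35, 0.2], [0.05, 0.05, 0.05]])
image = np.repeat(colors, 2, axis=0)[:, None, :].repeat(6, axis=1)
image = image + rng.normal(0.0, 0.01, image.shape)

labels, centers = kmeans(image.reshape(-1, 3), k=3)
label_map = labels.reshape(image.shape[:2])
```

The cluster indices in `label_map` are arbitrary, so mapping them to radish, ground and mulching film would still require a labelling step, for example by inspecting the cluster centres.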
Similarly, Huang et al. conducted a study with an RGB camera on a UAV for the identification and classification of Helminthosporium leaf blotch (HLB) disease in wheat crops. The images were acquired with an RGB camera mounted on a DJI Phantom 4 UAV. Ground investigation consisted of generating ground truth for the disease severity in four classes: normal, light, medium, and heavy. A CNN was trained to identify the four classes, and accuracies of up to 94% were obtained. Further, they compared the performance of the CNN with other methods such as SVM, histogram, and vegetation indices and found that the CNN performed better in identifying and classifying the disease infestation.
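Accuracy figures for a four-class severity problem such as this one are typically derived from a confusion matrix. The following sketch shows that computation on made-up labels; the label arrays and the helper `confusion_matrix` are assumptions for illustration and do not reproduce the reported results.

```python
import numpy as np

CLASSES = ("normal", "light", "medium", "heavy")

def confusion_matrix(y_true, y_pred, n_classes=len(CLASSES)):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for eight image patches (indices into CLASSES).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 1])

cm = confusion_matrix(y_true, y_pred)
overall_accuracy = np.trace(cm) / cm.sum()       # fraction of correct predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # correct / total per true class
```

Per-class recall is worth reporting alongside overall accuracy, because neighbouring severity levels such as light and medium are the ones most likely to be confused.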
Future agriculture will use sophisticated IoT technologies such as self-driving agricultural machinery, temperature and moisture sensors, aerial images and UAVs, multispectral and hyperspectral imaging devices, and GPS and other positioning technologies. The vast quantity of data acquired by these new technologies, paired with recent advances in parallel and GPU computing, has enabled researchers to adopt and deploy data-driven analysis and decision-making techniques such as deep learning in the agricultural domain. These advances have paved the way for resolving highly complex classification and segmentation tasks in precision agriculture. This study investigated in particular the use of deep learning and UAV imagery as two major drivers of Agriculture 4.0.
This study conducted a comprehensive comparison of studies that used deep learning techniques with UAV-based image acquisition and investigated their strengths and weaknesses. Based on the application, we categorized these studies into five major groups: vegetation identification, classification and segmentation; crop counting and yield prediction; crop mapping; weed detection; and crop disease and nutrient deficiency detection. We believe the potential benefits of deep learning and UAVs in the agricultural industry have not yet been fully realized, and several limitations and challenges remain to be addressed.
This work is co-funded by the EU-H2020 within the MONICA project under grant agreement number 732350. The Titan X Pascal used for this research was donated by NVIDIA.
-  A. Kamilaris and F. X. Prenafeta-Boldú, “Deep learning in agriculture: A survey,” Computers and Electronics in Agriculture, vol. 147, no. July 2017, pp. 70–90, 2018.
-  C. Weltzien, “Digital agriculture or why agriculture 4.0 still offers only modest returns,” Landtechnik, vol. 71, no. 2, pp. 66–68, 2016.
-  V. Bloom, V. Argyriou, and D. Makris, “Hierarchical transfer learning for online recognition of compound actions,” Comput. Vis. Image Underst., vol. 144, no. C, pp. 62–72, Mar. 2016. [Online]. Available: https://doi.org/10.1016/j.cviu.2015.12.001
-  V. Bloom, D. Makris, and V. Argyriou, “Clustered spatio-temporal manifolds for online action recognition,” in 2014 22nd International Conference on Pattern Recognition, Aug 2014, pp. 3963–3968.
-  B. Ozdogan, A. Gacar, and H. Aktas, “Digital agriculture practices in the context of agriculture 4.0,” Journal of Economics, Finance and Accounting (JEFA), vol. 4, pp. 184–191, 2017.
-  Z. Khan, V. Rahimi-Eichi, S. Haefele, T. Garnett, and S. J. Miklavcic, “Estimation of vegetation indices for high-throughput phenotyping of wheat using aerial imaging,” Plant Methods, vol. 14, no. 1, p. 20, 2018.
-  G. Lindner, K. Schraml, R. Mansberger, and J. Hübl, “UAV monitoring and documentation of a large landslide,” Applied Geomatics, vol. 8, no. 1, pp. 1–11, 2016.
-  H. Huang, J. Deng, Y. Lan, A. Yang, X. Deng, and L. Zhang, “A fully convolutional network for weed mapping of unmanned aerial vehicle (UAV) imagery,” PLoS ONE, vol. 13, no. 4, 2018.
-  V. Argyriou and T. Vlachos, “Quad-tree motion estimation in the frequency domain using gradient correlation,” IEEE Transactions on Multimedia, vol. 9, no. 6, pp. 1147–1154, Oct 2007.
-  V. Argyriou and T. Vlachos, “Performance study of gradient correlation for sub-pixel motion estimation in the frequency domain,” IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 1, pp. 107–114, Feb 2005.
-  V. Argyriou, “Sub-hexagonal phase correlation for motion estimation,” IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 110–120, Jan 2011.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
-  Y. Chunjing, Z. Yueyao, Z. Yaxuan, and H. Liu, “Application of convolutional neural network in classification of high resolution agricultural remote sensing images,” International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, vol. 42, no. 2W7, pp. 989–992, 2017.
-  Y. Shi, S. Ji, C. Zhang, Y. Duan, and A. Xu, “3D Convolutional Neural Networks for Crop Classification with Multi-Temporal Remote Sensing Images,” Remote Sensing, vol. 10, no. 2, p. 75, 2018.
-  J. Rebetez, H. Satizábal, M. Mota, D. Noll, L. Büchi, M. Wendling, B. Cannelle, A. Pérez-Uribe, and S. Burgos, “Augmenting a convolutional neural network with local histograms—a case study in crop classification from high-resolution uav imagery,” in European Symp. on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2016, pp. 515–520.
-  Z. Fan, J. Lu, M. Gong, H. Xie, and E. D. Goodman, “Automatic Tobacco Plant Detection in UAV Images via Deep Neural Networks,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 3, pp. 876–887, 2018.
-  S. Gao, T. Hu, Y. Jiang, S. Jin, F. Wu, W. Li, S. Pang, J. Liu, Y. Su, S. Chen, D. Wang, and Q. Guo, “Deep Learning: Individual Maize Segmentation From Terrestrial Lidar Data Using Faster R-CNN and Regional Growth Algorithms,” Frontiers in Plant Science, vol. 9, no. June, pp. 1–10, 2018.
-  N. C. Tri, T. Van Hoai, H. N. Duong, N. T. Trong, V. Van Vinh, and V. Snasel, “A novel framework based on deep learning and unmanned aerial vehicles to assess the quality of rice fields,” in International Conference on Advances in Information and Communication Technology. Springer, 2016, pp. 84–93.
-  M. Rahnemoonfar and C. Sheppard, “Real-time yield estimation based on deep learning,” Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping II, vol. 10218, no. May 2017, p. 1021809, 2017.
-  K. Dijkstra, J. van de Loosdrecht, L. Schomaker, and M. Wiering, “Centroidnet: A deep neural network for joint object localization and counting,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 585–601.
-  R. Nijhawan, H. Sharma, H. Sahni, and A. Batra, “A deep learning hybrid CNN framework approach for vegetation cover mapping using deep features,” Proceedings - 13th International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2017, vol. 2018-Janua, pp. 192–196, 2018.
-  R. Baeta, K. Nogueira, D. Menotti, and J. A. Dos Santos, “Learning Deep Features on Multiple Scales for Coffee Crop Recognition,” Proceedings - 30th Conference on Graphics, Patterns and Images, SIBGRAPI 2017, pp. 262–268, 2017.
-  M. D. Bah, A. Hafiane, and R. Canals, “Deep Learning with unsupervised data labeling for weeds detection on UAV images,” pp. 1–11, 2018. [Online]. Available: http://arxiv.org/abs/1805.12395
-  M. D. Bah, A. Hafiane, and R. Canals, “Weeds detection in UAV imagery using SLIC and the Hough transform,” in 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, 2017, pp. 1–6.
-  L. Li, Y. Fan, X. Huang, and L. Tian, “Real-time UAV weed scout for selective weed control by adaptive robust control and machine learning algorithm,” in 2016 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers, 2016, p. 1.
-  A. dos Santos Ferreira, D. Matte Freitas, G. Gonçalves da Silva, H. Pistori, and M. Theophilo Folhes, “Weed detection in soybean crops using ConvNets,” Computers and Electronics in Agriculture, vol. 143, no. November, pp. 314–324, 2017. [Online]. Available: https://doi.org/10.1016/j.compag.2017.10.027
-  H. Huang, Y. Lan, J. Deng, A. Yang, X. Deng, L. Zhang, and S. Wen, “A semantic labeling approach for accurate weed mapping of high resolution UAV imagery,” Sensors (Switzerland), vol. 18, no. 7, 2018.
-  I. Sa, M. Popović, R. Khanna, Z. Chen, P. Lottes, F. Liebisch, J. Nieto, C. Stachniss, A. Walter, and R. Siegwart, “WeedMap: A large-scale semantic weed mapping framework using aerial multispectral imaging and deep neural network for precision farming,” Remote Sensing, vol. 10, no. 9, 2018.
-  S. F. Di Gennaro, E. Battiston, S. Di Marco, O. Facini, A. Matese, M. Nocentini, A. Palliotti, and L. Mugnai, “Unmanned aerial vehicle (UAV)-based remote sensing to monitor grapevine leaf stripe disease within a vineyard affected by esca complex,” Phytopathologia Mediterranea, vol. 55, no. 2, pp. 262–275, 2016.
-  J. Duarte-Carvajalino, D. Alzate, A. Ramirez, J. Santa-Sepulveda, A. Fajardo-Rojas, and M. Soto-Suárez, “Evaluating late blight severity in potato crops using unmanned aerial vehicles and machine learning algorithms,” Remote Sensing, vol. 10, no. 10, p. 1513, 2018.
-  T. Poblete, S. Ortega-Farías, M. Moreno, and M. Bardeen, “Artificial neural network to predict vine water status spatial variability using multispectral information obtained from an unmanned aerial vehicle (UAV),” Sensors, vol. 17, no. 11, p. 2488, 2017.
-  J. G. Ha, H. Moon, J. T. Kwak, S. I. Hassan, M. Dang, O. N. Lee, and H. Y. Park, “Deep convolutional neural network for classifying fusarium wilt of radish from unmanned aerial vehicles,” Journal of Applied Remote Sensing, vol. 11, no. 4, p. 042621, 2017.
-  H. Huang, J. Deng, Y. Lan, A. Yang, L. Zhang, S. Wen, H. Zhang, Y. Zhang, and Y. Deng, “Detection of Helminthosporium leaf blotch disease based on UAV imagery,” Applied Sciences, vol. 9, no. 3, p. 558, 2019.