Timely detection of plant diseases and pests is one of the major challenges in the agriculture sector. Many diseases and pests, if not treated in time, can have a devastating effect on the overall production of crops. Rice occupies about 70 percent of the gross cropped area and accounts for 93 percent of total cereal production in Bangladesh coelli2002technical. Rice also ensures the food security of over half the world population fao. In Bangladesh, more than 15 diseases and pests of rice are seen throughout the year miah1985survey. Researchers have observed a 10-15% average yield loss caused by 10 major diseases of rice in Bangladesh BRRI. So, timely detection of rice diseases is very important for ensuring sustainable rice production. Currently, when a rice disease outbreak occurs somewhere, rice disease specialists of different agriculture research centers or agriculture officials appointed by the government visit the place and give advice to the farmers. So, a quick remedy is not possible in many cases. Sometimes farmers themselves have to go to rice disease specialists to seek advice. In many areas, the number of rice disease specialists is inadequate compared to the number of farmers. There is thus a great need for automatic rice disease detection using devices easily available in rural areas.
Deep learning techniques have shown great promise in image classification. In recent years, they have been used to analyze diseases of tea karmokar2015tea, apple wang2017automatic, tomato fuentes2017robust, grapevine, peach, and pear sladojevic2016deep. In most cases, leaves or fruits were used to detect the diseases from the images, often against homogeneous backgrounds. Moreover, in most cases, the datasets were crawled from different Internet sources. There are some fundamental differences in disease patterns between rice plants and the above-mentioned plants. First, rice leaves are narrow, and diseases can occur in any part of the leaves. Second, in addition to leaves, the diseases and pests of rice can affect both stem and grain. Third, the healthy area and the affected area of a rice plant do not have any significant contrast in color. All these factors make it extremely difficult to collect and label images of affected rice plants and, finally, to recognize the correct disease or pest.
Two studies related to rice disease detection can be found in lu2017identification and atole2018multiclass. Lu et al. conducted a study on detecting 10 different rice plant diseases using a small handmade CNN architecture inspired by older deep learning frameworks such as LeNet-5 and AlexNet lu2017identification. They used only 500 images, some collected from the field and the rest from an agricultural pest and insect picture database. Though they reported a high accuracy of 95.48%, we obtained only 82% accuracy on our test set using their model. Atole et al. used AlexNet to distinguish among three classes - normal rice plant, diseased rice plant and snail infected rice plant atole2018multiclass. They used only 227 images taken from rice fields.
Our research is the first comprehensive study on rice disease and pest detection using deep convolutional neural networks. From a field survey at BRRI over seven months, from December 2017 to June 2018, we have collected approximately 1500 images of different diseases and pests occurring in the field. BRRI cultures many diseases and pests throughout the year and collects samples for its research. To collect these data, we have worked jointly with scientists of BRRI to gain agricultural insight regarding the major diseases and pests of rice plants. In our dataset, there are six different diseases and three different pests. The six diseases are False Smut, Sheath Blight, Sheath Rot, Bacterial Leaf Blight (BLB), Neck Blast and Brown Spot. The three pests are Brown Plant Hopper (BPH), Stemborer and Hispa. We could not collect enough field data for the Ufra and Leaf Blast diseases, and we also could not find enough occurrences of the Gall Midge and Leaf Roller pests in the field, so we have excluded them from this research. We have kept Sheath Blight and Sheath Rot in the same class, because their treatment method and place of occurrence are the same. We have also created a separate class for healthy plants. Finally, we have a total of nine classes - five classes for diseases, three classes for pests and one class for healthy plants.
Our focus is to build a deep convolutional neural network which can recognize whether a rice plant is healthy (Figure 0(a)) or not, and if not healthy, which disease it is suffering from. It can also differentiate between a diseased leaf and a dead leaf (Figure 0(b)), and it can detect and recognize diseases on any part of the plant, be it leaf, stem (Figure 0(c)) or grain (Figure 0(d)). A rice disease may show different symptoms depending on weather and soil; in Figure 1(a) and 1(b), we see two different symptoms of False Smut disease. Similarly, a pest attack can show different symptoms depending on the stage of attack; the symptoms of two stages of attack of Brown Plant Hopper (BPH) are shown in Figure 1(c) and 1(d). We have taken both of these issues into consideration. Our model is expected to do well in real-life scenarios, because most of our training images were collected against heterogeneous backgrounds.
We have used eight different convolutional neural network architectures: VGG16, ResNet50, InceptionV3, InceptionResNetV2, Xception, DenseNet121, DenseNet169 and DenseNet201. For each of them, we have used fine tuning, transfer learning and training from scratch to assess their performance. In fine tuning, we initialize the convolution layer weights using pre-trained ImageNet weights and then train all the network layers from that point. In transfer learning, we fix the convolution layer weights to the pre-trained ImageNet weights and train only the dense layers. When training from scratch, we train all the network layers from randomly initialized weights. For all of our architectures, fine tuning the model has given the best result, and for all three training methods, the VGG16 architecture has consistently shown very high accuracy on the test set.
In summary, we have made the following contributions:
We have collected the largest dataset of rice diseases and pests, consisting of 1500 images captured in real-life scenarios. They cover eight classes of rice diseases and pests. We expect this dataset to facilitate further research on rice diseases and pests.
We have used eight state-of-the-art CNN architectures for automatic rice disease and pest classification. We have used three different training methods on each of the architectures.
We have shown in our experiments that for each of our architectures, fine tuning the CNN model on our dataset gives the best accuracy. The fine-tuned VGG16 architecture has given us the best accuracy of 99.53% on our test set.
We describe the previous related works in the field of automatic plant disease detection in Section 2. We put forward the challenges in our approach in Section 3. We describe the data collection process in detail in Section 4. Then we describe the methodology behind our approach in Section 5. The experimental setup of our research is described in Section 6. Next, we discuss the results and findings of our research in Section 7. Finally, we suggest some possible directions for future research on this topic and conclude the paper in Section 8.
2 Related Works
2.1 Automated Rice Disease Detection Approach
A convolutional neural network based classifier was used to detect rice plant anomalies in atole2018multiclass. There were three classes in total: normal, diseased and snail infected rice plants. The authors collected 227 images in rice fields from nearby districts and used transfer learning on AlexNet, which is a relatively small and old CNN architecture. They applied various image augmentation techniques to these collected images and trained their CNN model for ten epochs, achieving a test accuracy of 91.23%. This approach did not consider the specific class of disease that may affect a rice plant; the model can only tell whether the plant is affected by a disease or not. Since the dataset contained only 227 images, it is likely that the model does not generalize to rice plants outside the image collection region. Moreover, many better performing CNN architectures have appeared after AlexNet, and they were not used in this paper.
In lu2017identification, the authors used a CNN to detect ten rice diseases: rice blast, rice false smut, rice brown spot, rice bakanae disease, rice sheath blight, rice sheath rot, rice bacterial leaf blight, rice bacterial sheath rot, rice seedling blight and rice bacterial wilt. They used a CNN architecture inspired by LeNet-5 and AlexNet, both of which are old architectures. Their dataset contained 500 images of healthy and diseased rice leaves and stems. Most of the images were captured in the field using a digital camera, while the rest were collected from an agricultural pest and insect picture database. The images were resized to 512×512 before training. The authors applied normalization, PCA and whitening as preprocessing steps. The trained CNN model achieved an accuracy of 95.48% on the test set. An interesting aspect is that they used stochastic pooling, unlike the max pooling used by most newer architectures; they argued that stochastic pooling enhances the generalization ability of the CNN model and prevents overfitting. The drawback here is that 500 images for 10 classes is a very small number for a convolutional neural network. This particular CNN architecture does not work well on our dataset, where each image has been resized to 224×224. Even for the comparatively large image size of 512×512 used by the authors, this architecture has given poor training and validation accuracy on our dataset.
2.2 Automated Disease Detection Approach for Other Plants
In mohanty2016using, the authors used a deep convolutional neural network to detect disease from leaves. They trained the neural network with 54306 images of 14 crop species, which represented a total of 26 diseases along with healthy leaves. They used the images made openly available through the PlantVillage project, training the model on colored, grayscale and segmented images. Though the accuracy was 99.35% on the held-out test set, it fell to 31.4% when tested on another verified dataset of 121 images captured in real-life scenarios. Moreover, a disease may appear on parts of the plant other than leaves, which the authors did not consider. All the images of the PlantVillage dataset have homogeneous backgrounds, whereas images captured in real time have heterogeneous backgrounds most of the time.
In sladojevic2016deep, the authors used the CaffeNet model to recognize 13 different types of plant diseases. They considered a total of 15 classes: 13 classes for diseases, one class for healthy leaves and one class for background. The plants included apple, peach, grapevine, pear etc. The authors used affine transformation, perspective transformation and rotation to increase the number of images. However, they collected all the training images from the Internet. Images on the Internet are often mislabeled and differ greatly from real-world images taken in the field, which may introduce data mismatch error.
A neural network ensemble (NNE) has also been used to recognize five different diseases of the tea plant from tea leaves karmokar2015tea. The authors achieved an accuracy of 91% on the test set, but the dataset consisted of only 50 images, 10 per class. They used various image processing techniques that will not be effective in the case of heterogeneous backgrounds.
A feed forward back propagation neural network was built from scratch in babu2007leaves to detect the species of a plant from a leaf image. The percentage of the leaf affected by diseases or pests was also detected, but the neural network did not classify different diseases. The interesting part is that no convolutional neural network was used. Image edges were detected using the Prewitt edge detection algorithm, which generates tokens, and a thinning algorithm turns multiple-pixel edges into single-pixel edges. The inputs to the neural network are the individual tokens of a leaf image. As a token normally consists of a cosine and a sine angle, the number of input nodes for this network is the number of tokens multiplied by two. The problem with this approach is that it needs a finely scanned image of a leaf, so different image capture conditions and non-homogeneous backgrounds may cause failure.
Some authors used a visual spectrograph in ambient lighting conditions to detect yellow rust infestation on winter wheat moshou2004automatic. They investigated the difference in spectral reflectance between healthy and diseased wheat plants at an early stage in the development of the disease. A spectrograph mounted at spray boom level took in-field spectral images. The classification accuracy was found to be around 99% when multilayer perceptrons were used. This approach incorporates a complex method requiring sophisticated instruments that are not available to most people, and ambient lighting conditions are not always available in the field. So, although it shows very high accuracy, it is not very user friendly.
In ferentinos2018deep, the authors used an open database of 87,848 images containing 25 different plants. There are 58 distinct classes of [plant, disease] combinations in the dataset, including healthy plants. Five CNN architectures - AlexNet, AlexNetOWTBn, GoogLeNet, Overfeat and VGG - were used for identifying plant diseases from images of their leaves. The task is to return a [plant, disease] combination when a leaf image is provided. The dataset contains images captured both in laboratory conditions and in real-life scenarios. An 80/20 train/test split was followed in this paper. VGG achieved the highest test accuracy of 99.53% among the architectures used. Only 12 of the 58 classes contained both real-life and laboratory images. For these 12 classes, the models were trained by putting all real-life images in the training set and all laboratory images in the test set, and vice versa. This experiment showed that training with real-life images gives much better accuracy than the opposite case. This approach did not include any data collection; both the training set and the test set came from the open dataset. Diseases occurring in parts of plants other than leaves were not considered, and 46 classes did not have any real-life images to train or test with. The simultaneous occurrence of multiple diseases on a plant leaf and the stage of a disease were also not considered.
2.3 Plant Disease Stage Detection and Disease Forecasting
Recently, some work has been done on recognizing the stage of a disease. In wang2017automatic, the authors detected four severity stages (healthy stage, early stage, middle stage, and end stage) of apple black rot disease using the PlantVillage dataset. They used two different types of training methods: training small convolutional neural networks of different depths from scratch, and fine tuning four state-of-the-art deep learning models, namely VGG16, VGG19, Inception-v3 and ResNet50. The best model they found was the fine-tuned VGG16 model, which achieved an accuracy of 90.4% on the test set. The scope and dataset of their work were limited; it is possible to extend the work by detecting the stages of other diseases of apple or even of other plants. The homogeneous background of their dataset makes it somewhat impractical for real-life use.
Bhagawati et al. trained a neural network with weather parameters such as temperature, relative humidity, rainfall and wind speed to forecast rice blast disease bhagawati2015artificial. They used a feed forward multilayer perceptron architecture with two hidden layers to analyze the ambient environmental conditions and predict disease risk, with all input data normalized to the range [-1, 1]. It is actually a regression model for prediction. The prediction accuracy of the model was found to be between 81% and 87% when data of the same site were used, but they did not test the accuracy of the model on other sites.
2.4 Plant Disease Localization
A real-time tomato plant disease detector was built using deep learning in fuentes2017robust. The authors considered the simultaneous occurrence of multiple diseases and pests, as well as different infected areas such as stem, leaves and fruits. They also collected images of different stages of the same disease. The dataset consisted of about 5000 images collected from different farms in Korea, most with heterogeneous backgrounds. Several geometric and intensity transformations were used to increase the number of images. The authors used three main families of detectors: Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN) and Single Shot Multibox Detector (SSD), which they considered as "deep learning meta-architectures". Each of these meta-architectures was combined with "deep feature extractors" such as VGG16 and Residual Network (ResNet). Their models both recognized and localized nine different diseases and pests with a best accuracy of 85.98%. The diseases, pests and other syndromes they considered were (a) Gray mold, (b) Canker, (c) Leaf mold, (d) Plague, (e) Leaf miner, (f) White fly, (g) Low temperature, (h) Nutritional excess or deficiency and (i) Powdery mildew. This paper is very comprehensive in dealing with tomato diseases. However, localizing a disease is not necessary most of the time, because plants are not like human bodies. Besides, tomato leaves are big and wide, which makes the recognition process easier compared to rice leaves.
3 Challenges
There are many challenges in correctly identifying rice diseases and pests. We have overcome the limitations of the previous works in this field. We have also kept in mind the challenges mentioned in barbedo2016review while using CNN architectures for classification.
3.1 Image Collection
People working in the agriculture institutes and agriculture universities of Bangladesh do not generally collect images of rice diseases and pests, except on a very small scale for presentation and demonstration purposes. We had to collect them from the paddy fields. While collecting images, we have tried to capture them in various conditions: in windy, sunny and rainy weather, and in both summer and winter. This helps train the model so that it can do well in any possible weather in real-life scenarios. A heterogeneous background is an indispensable feature of any real-life image. We have considered the presence of humans, colored sheets, rice fields, human body parts and many other possible backgrounds while capturing our desired disease and pest images. Our models have achieved very high classification accuracy on images captured in realistic conditions.
3.2 Increasing Classification Accuracy
When someone tries to capture an image of the diseased area of a rice plant in a rice field, the image is likely to have a background composed of other rice plants, soil, humans and many other objects. Such a heterogeneous background makes it quite difficult to segment the region of interest. Moreover, the symptoms of many rice diseases do not have any well-defined boundary; rather, the color of the diseased area gradually fades into the healthy part of the plant. As a result, image segmentation before using the neural network becomes almost impossible. We have tried to keep the number of classes as small as possible using in-depth agricultural knowledge, and we have used various image augmentation techniques on our training set to increase accuracy.
3.3 Training Our Model
Each of the convolutional neural network architectures we have used has a very large number of trainable parameters; for example, VGG16 has 138 million parameters. Training these CNN architectures from scratch or fine tuning them takes a lot of time in non-GPU environments. We have used a shared remote GPU server, which has reduced training time considerably.
4 Data Collection
Rice diseases and pests occur in different parts of the rice plant. Their occurrence depends on many factors, such as temperature, humidity, rainfall, variety of rice plant, season and nutrition. So, data collection at the field level is a lengthy and challenging task.
4.1 Classes Considered
We have a total of five classes for diseases, three classes for pests and one class for healthy plants. Symptoms of different diseases are seen on different parts of the rice plant, such as leaf, stem and grain. Bacterial Leaf Blight disease, Brown Spot disease, Brown Plant Hopper pest (at its late stage) and Hispa pest occur on the rice leaf. Sheath Blight disease, Sheath Rot disease and Brown Plant Hopper pest (at its early stage) occur on the rice stem. Neck Blast disease and False Smut disease occur on the rice grain. Stemborer pest occurs on both rice stem and rice grain. So, we have considered all these parts while capturing images.
To prevent our model from confusing dead parts with diseased parts of the rice plant, we have collected enough images of dead leaves, dead stems and dead grains. Images of the dead parts of the plant are included in the healthy plant class. We consider a total of nine classes; a sample image of each class is provided in Figure 3. Sheath Blight, Sheath Rot and their simultaneous occurrence have been considered as the same class, because their treatment method and place of occurrence are the same.
4.2 Quantity of Image Data
We have captured 1426 images of rice plants infected with diseases and pests, along with healthy rice plants, from the fields of BRRI over a total of seven months, from December 2017 to June 2018. The number of images collected for each class is shown in Table 1. Some major pests such as Gall Midge and Leaf Roller are not included in this study, as we could not find enough of them in the field. Moreover, the Ufra and Leaf Blast diseases occurred only on a small scale in the rice fields during our data collection period, and image capture conditions and backgrounds in the field are certainly different from those of images captured in the nursery. As such a small amount of data is not sufficient to train a neural network, we have not included these two diseases.
4.3 Variation in Same Class Data
We have created the necessary variations while collecting data from the field. The more variation in the dataset, the better the generalization of the trained model: a model trained on a dataset with a lot of variation will generalize and perform well on the test set. Four different types of camera have been used to capture the images, all taken in rice fields in real-life scenarios. We have captured images against different types of backgrounds: in some images, the background is the surroundings of the field, and in others, it is our hand or papers of different colors. This makes our model robust to changes in background. Weather conditions also differ: some images have been captured in overcast conditions, others in sunny weather. The False Smut, Stemborer, Healthy Plant, and Sheath Blight and/or Sheath Rot classes have multiple types of symptoms, and we have covered all of them. Moreover, the early stage symptoms of Hispa and Brown Plant Hopper differ from their later stage symptoms, and we have covered these aspects as well while collecting data for these classes.
| Class Name                      | No. of Collected Images |
|---------------------------------|-------------------------|
| Brown Plant Hopper (BPH)        | 71                      |
| Bacterial Leaf Blight (BLB)     | 138                     |
| Sheath Blight and/or Sheath Rot | 219                     |
4.4 Agriculture Specific Knowledge
We have used in-depth agricultural knowledge to reduce the number of classes for better accuracy. Our goal is to help farmers take fast and appropriate action against disease infection in the rice field by developing a convolutional neural network based rice disease detection system. Brown Spot disease often occurs together with other diseases on the same plant. In such cases, we consider only the co-occurring disease as a separate class, because when a particular disease occurs with Brown Spot, only the treatment for that particular disease is applied; no separate treatment is applied for Brown Spot. Treatment for Brown Spot is applied only when it occurs as a single disease. We have also kept Sheath Blight, Sheath Rot and their simultaneous occurrence in the same class. Symptoms of Sheath Blight and Sheath Rot look quite similar in many cases, and sometimes they occur together; these two diseases attack the same organ of the rice plant (the stem) and have the same treatment method, so they have been kept in the same class. The different stages of the diseases and pests we have considered do not differ in terms of treatment method, so we have not created different classes for different stages of attack of the same disease or pest.
4.5 Image Processing Techniques
A large number of images is needed for each class in order to train a convolutional neural network. We have used a random mixture of various image augmentation techniques to create eight images from each captured image. We have associated a probability with each augmentation technique; based on this probability, a particular technique is or is not applied while generating a particular augmented image.
We have used random rotation from −15 to 15 degrees, as well as random rotations by multiples of 90 degrees. Convolutional neural network classification is not rotation invariant in general, so these two transformations are of high importance and are assigned a high probability. Other transformations such as random distortion, shear transform, vertical flip, horizontal flip and skewing have also been used. Every augmented image is the result of a particular subset of all these transformations; the results of combinations of these geometric transformations are shown in Figure 5. We have also generated two more images of different intensity from each image using intensity transformations (Figure 4). This kind of variation in the training data helps the model to actually learn the features of each disease rather than just memorizing training set examples for each class; memorization of training data causes overfitting and leads to poor test set accuracy.
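The probabilistic mixing described above can be sketched as follows. This is a minimal illustration, not our exact pipeline: the transforms shown (90-degree rotation and flips) are only the subset expressible in plain NumPy, and the probabilities are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each transform carries its own application probability (values illustrative).
AUGMENTATIONS = [
    (0.8, lambda img: np.rot90(img, k=int(rng.integers(1, 4)))),  # 90/180/270 deg
    (0.5, np.flipud),                                             # vertical flip
    (0.5, np.fliplr),                                             # horizontal flip
]

def augment(image, n_copies=8):
    """Generate n_copies augmented variants of one image array (H, W, C)."""
    out = []
    for _ in range(n_copies):
        aug = image
        for p, transform in AUGMENTATIONS:
            if rng.random() < p:      # apply this transform with probability p
                aug = transform(aug)
        out.append(aug)
    return out

sample = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
variants = augment(sample)
print(len(variants))  # 8 augmented images per captured image
```

Each call draws a fresh subset of transforms, so the eight variants of one image generally differ from each other.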
5 Our Solution
Our work aims to effectively detect eight classes of diseases and pests that affect rice plants along with healthy rice plants using a deep convolutional neural network. The overview of our system is shown in Figure 6.
We have collected a large dataset from the fields of BRRI by capturing images of rice plants infected with diseases and pests in real-life scenarios, as described in Section 4. We have a total of nine classes: five classes for diseases, three classes for pests and one class for healthy plants. We annotate the images for training our convolutional neural network by putting the images of different classes in separate folders. We randomly pick 70% of all the images of each class and put them into the training set. Similarly, another 15% of the images of each class are put into the validation set, and the rest are put into the test set; the three sets are pairwise disjoint. Next, we increase the number of images in the training set tenfold using different image augmentation techniques and intensity transformations. The number of images of each class in the training, validation and test sets is shown in Table 2.
| Class (Disease/Pest)            | Training set | Validation set | Test set |
|---------------------------------|--------------|----------------|----------|
| Brown Plant Hopper (BPH)        | 560          | 7              | 8        |
| Bacterial Leaf Blight (BLB)     | 1130         | 14             | 11       |
| Sheath Blight and/or Sheath Rot | 1660         | 23             | 30       |
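The per-class 70/15/15 split described above can be sketched as follows; the file lists and class names here are hypothetical stand-ins for our folder structure.

```python
import random

def split_per_class(filenames_by_class, seed=42):
    """Randomly split each class 70/15/15 into train/validation/test,
    so the three sets are disjoint and every class appears in each set."""
    random.seed(seed)
    train, val, test = [], [], []
    for label, files in filenames_by_class.items():
        files = files[:]                 # copy before shuffling
        random.shuffle(files)
        n_train = int(0.70 * len(files))
        n_val = int(0.15 * len(files))
        train += [(f, label) for f in files[:n_train]]
        val   += [(f, label) for f in files[n_train:n_train + n_val]]
        test  += [(f, label) for f in files[n_train + n_val:]]
    return train, val, test

# Hypothetical file lists for two of the nine classes.
data = {"false_smut": [f"fs_{i}.jpg" for i in range(100)],
        "healthy":    [f"h_{i}.jpg" for i in range(80)]}
train, val, test = split_per_class(data)
print(len(train), len(val), len(test))  # 126 27 27
```

Splitting per class (rather than over the pooled dataset) keeps the class proportions roughly equal across the three sets.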
We use the Keras framework with TensorFlow backend to train our models on our training set. After each epoch of training, we evaluate the performance of the model on our validation set; if the validation accuracy exceeds the best validation accuracy obtained so far, we save the model. In this way, we always keep the model with the best validation accuracy seen during training. After training for 150 epochs, we stop and evaluate the performance of the saved model on our test set.
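In Keras, this behavior corresponds to a `ModelCheckpoint` callback monitoring validation accuracy with `save_best_only=True`. The underlying logic is simple enough to sketch framework-free; the `ToyModel` below is a hypothetical stand-in for a real Keras model's fit/evaluate/weights interface.

```python
def train_with_checkpoint(model, epochs=150):
    """Keep the weights from the epoch with the best validation accuracy."""
    best_acc = 0.0
    best_weights = model.get_weights()
    for _ in range(epochs):
        model.fit_one_epoch()
        acc = model.evaluate_validation()
        if acc > best_acc:                # checkpoint only on improvement
            best_acc = acc
            best_weights = model.get_weights()
    model.set_weights(best_weights)       # restore best checkpoint before testing
    return best_acc

class ToyModel:
    """Stand-in model whose validation accuracy rises, then dips."""
    def __init__(self):
        self._accs = iter([0.50, 0.72, 0.91, 0.88, 0.90])
        self._epoch = 0
    def fit_one_epoch(self):
        self._epoch += 1
    def evaluate_validation(self):
        return next(self._accs)
    def get_weights(self):
        return self._epoch
    def set_weights(self, w):
        self._epoch = w

m = ToyModel()
best = train_with_checkpoint(m, epochs=5)
print(best)  # 0.91 -- the peak validation accuracy, from epoch 3
```

Note that the later dip (0.88, 0.90) does not overwrite the saved epoch-3 weights, which is exactly why we evaluate the saved model, not the final-epoch model, on the test set.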
We use several modern CNN architectures: VGG16, InceptionV3, ResNet50, Xception, InceptionResNetV2 and DenseNet; of DenseNet, we have used three versions - DenseNet121, DenseNet169 and DenseNet201. Each of the architectures has some unique characteristics. VGG16 VGG16 is a sequential convolutional neural network using 3×3 convolution filters; after each maxpool layer, the number of convolution filters is doubled. InceptionV3 Inception consists of inception blocks; in each block, convolution filters of various dimensions and pooling are applied to the input in parallel, and the results are concatenated along their channels before being output. ResNet50 Resnet is a very deep convolutional neural network with skip connections from earlier layers in addition to the direct connection from the immediately previous layer. InceptionResNetV2 InceptionResnet combines the parallelism of the Inception architecture with the skip connections of the ResNet architecture. Xception Xception uses depthwise separable convolution, in which spatial convolution and cross-channel convolution are completely separated. In DenseNet Densenet, each layer is directly connected to every other layer in a feed-forward fashion (within each dense block): for each layer, the feature maps of all preceding layers are treated as separate inputs, and its own feature maps are passed on as inputs to all subsequent layers.
We use three variations of training methods for each of the architectures.
Baseline training: In this method, we train all the network layers from scratch, starting from randomly initialized weights. This method takes a lot of time to converge but produces fairly good accuracy. We denote this training method as B.
Fine Tuning: In this training method, we keep the pre-trained ImageNet weights of the convolution layers intact and randomly initialize the weights of the densely connected layers only. Then we train all the layers until convergence. Note that the convolution layers are trained from their pre-trained ImageNet weights, while the dense layers are trained from randomly initialized weights. We denote this method as FT.
Transfer Learning: In this method, we do not train the convolution layers of the CNN architectures at all; rather, we keep their pre-trained ImageNet weights fixed. We only train the dense layers from randomly initialized weights. We denote this method as TL.
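The three schemes differ only in how each layer is initialized and whether it is trained. A toy, framework-free sketch of that per-layer configuration (the dicts stand in for real Keras layers, whose `trainable` attribute serves the same purpose):

```python
def configure(layers, method):
    """Set per-layer initialization and trainability for the three schemes.

    Each layer is a dict with a 'kind' of 'conv' or 'dense'.
    method: 'B' (baseline), 'FT' (fine tuning) or 'TL' (transfer learning).
    """
    for layer in layers:
        if method == "B":                   # baseline: everything from scratch
            layer["init"], layer["trainable"] = "random", True
        elif method == "FT":                # fine tuning: conv starts pre-trained
            if layer["kind"] == "conv":
                layer["init"], layer["trainable"] = "imagenet", True
            else:
                layer["init"], layer["trainable"] = "random", True
        elif method == "TL":                # transfer learning: conv is frozen
            if layer["kind"] == "conv":
                layer["init"], layer["trainable"] = "imagenet", False
            else:
                layer["init"], layer["trainable"] = "random", True
    return layers

net = [{"kind": "conv"}, {"kind": "conv"}, {"kind": "dense"}]
print(configure(net, "TL"))
```

Under TL the convolution layers act as a fixed feature extractor, which is why TL converges fastest but FT, which can adapt those features to rice imagery, gives the best accuracy in our experiments.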
We resize all the images of our dataset to the default input size of each architecture before working with that architecture. This makes the training and validation steps in each epoch faster compared to run-time resizing. For example, we resize all the images of our training, validation and test sets to 299×299 pixels before working with architectures such as Xception, InceptionV3 and InceptionResNetV2. Similarly, for the other models, all images are resized to 224×224 pixels.
6 Experimental Setup
In the experimental setup, we choose a performance evaluation metric for the CNN architectures, describe their general structure, identify the hyperparameters, tune those hyperparameters, and set up the environment for conducting the experiments.
Performance metric: We use accuracy as our performance metric. We save the model weights after the epoch with the best validation accuracy. In our dataset, the samples are fairly evenly distributed among the nine classes, and accuracy is a good performance measure when the class distribution is not heavily skewed toward any particular class.
Loss function: We use categorical crossentropy as our loss function. We have nine classes in total, and this loss function is the multi-class generalization of log loss.
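For reference, categorical crossentropy can be written in a few lines of NumPy. The clipping constant `eps` is a common numerical-stability convention, not a detail from the paper:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean multi-class log loss.

    y_true: one-hot labels, shape (n_samples, n_classes).
    y_pred: predicted probabilities, rows summing to 1.
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# A maximally uncertain two-class prediction costs -ln(0.5) ~ 0.693.
print(categorical_crossentropy(np.array([[1.0, 0.0]]),
                               np.array([[0.5, 0.5]])))
```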
Basic Architecture: We use several CNN architectures such as VGG16, InceptionV3 etc. We remove the top three layers of each original convolutional neural network architecture and flatten the output of the last remaining layer. The remaining convolutional layers carry the weights pre-trained on the ImageNet classification task, so these architectures are already well-acquainted with basic image features. On top, we add three densely connected layers with ReLU activation and one dense layer with softmax activation. This topmost softmax layer contains nine nodes for the nine classes.
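A forward-pass sketch of the added classifier head in NumPy. The widths of the three ReLU layers are illustrative assumptions (the paper does not state them), and the weights here are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for stability
    return z / z.sum(axis=-1, keepdims=True)

def classifier_head(features, widths=(256, 128, 64), n_classes=9):
    """Forward pass of the head added on top of the truncated base CNN:
    flatten -> three ReLU dense layers -> 9-way softmax.
    `widths` are illustrative; weights are random, untrained placeholders.
    """
    x = features.reshape(features.shape[0], -1)  # flatten conv feature maps
    for w in widths:
        W = rng.standard_normal((x.shape[1], w)) * 0.01
        x = relu(x @ W)
    W = rng.standard_normal((x.shape[1], n_classes)) * 0.01
    return softmax(x @ W)

feats = rng.standard_normal((2, 4, 4, 8))  # fake base-network output
print(classifier_head(feats).shape)        # (2, 9)
```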
Hyperparameter Tuning: Hyperparameters are not trainable; we fix their values at the start of training and tune them based on validation accuracy. We tune two hyperparameters. The first is the dropout rate. A dropout rate of 0.3 means that the model randomly ignores 30% of the neurons of the previous layer, which helps reduce overfitting. We add a dropout layer after each dense layer except the last. We test our models with dropout rates of 0.3, 0.4 and 0.5; a dropout rate of 0.3 has given the best results in general. The second hyperparameter is the learning rate, which determines how fast the model weights are adjusted toward a local or global minimum of the loss function. We have tested our models with learning rates of 0.01, 0.001, 0.0001 and 0.00001. In terms of convergence speed and accuracy, a learning rate of 0.0001 has given the best results.
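Inverted dropout, the variant used by common deep learning frameworks, can be sketched as follows (the framework's exact implementation may differ):

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero out a `rate` fraction of units at random
    and scale the survivors by 1/(1-rate), so the expected activation
    is unchanged and no rescaling is needed at inference time."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((100, 100))
y = dropout(x, 0.3, rng)
print(float(np.mean(y == 0)))  # close to 0.3
```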
We use Adaptive Moment Estimation (Adam) to train our models. Adam computes an adaptive learning rate for each parameter: in addition to storing an exponentially decaying average of past squared gradients, it also keeps an exponentially decaying average of past gradients. It thus combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp.
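A single Adam update step in NumPy, using the paper's chosen learning rate of 0.0001 as the default; the remaining constants are Adam's standard defaults, not values stated in the paper:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters w with gradient g at step t >= 1.

    m: exponentially decaying average of past gradients.
    v: exponentially decaying average of past squared gradients.
    Both are bias-corrected before being used in the update.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
print(w)  # first step moves w by roughly lr in the gradient direction
```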
Experimental Environment: We use a remote Red Hat Enterprise Linux server of RMIT University for training our convolutional neural network architectures. The processor is an Intel(R) Xeon(R) CPU E5-2690 with a clock speed of 2.60 GHz. It has 56 CPUs with two threads per core. There is 503 GB of RAM available to us, and each user can use up to 1 petabyte of storage. There are also two GPUs available, both 16 GB NVIDIA Tesla P100-PCIE GPUs.
7 Experimental Evaluation
We have trained eight state-of-the-art convolutional neural network architectures on our dataset: VGG16, ResNet50, InceptionV3, InceptionResNetV2, Xception, DenseNet121, DenseNet169 and DenseNet201. The performance of these CNN architectures under fine tuning (FT), transfer learning (TL) and training from scratch (B) is shown in Table 3.
To compare our approach with an existing approach lu2017identification, we trained the model of lu2017identification on our dataset. If we set the image size to , we get 87% accuracy on the test set. The image size mentioned in lu2017identification is ; that image size gives validation and test accuracy below 30%.
Here we see that all of the architectures give their best test-set accuracy when we fine tune them from pre-trained ImageNet weights. Moreover, the test accuracy of the architectures when trained from scratch is comparable to that of the corresponding fine-tuned architectures. For example, the test accuracies of DenseNet169 for fine tuning and training from scratch are 97.6% and 96.1% respectively. In fine tuning, we start from ImageNet pre-trained weights and then train the whole network on our dataset, which makes it easier to reach a good optimum. In transfer learning, on the other hand, we do not train the layers of the original convolutional architecture at all, so the model may not capture all the characteristics of the dataset. That is why transfer learning has been found to give lower accuracy than the other two training methods. For example, InceptionV3 and DenseNet121 give test accuracies of only 82% and 73.45% with transfer learning, while their test accuracies with fine tuning are 89.1% and 95.7% respectively. ResNet50 fails to capture the characteristics of our dataset under transfer learning, achieving only 24.8% test accuracy. In general, architectures with skip connections (ResNet50, Xception, DenseNet) perform worse on the test set under transfer learning than models without skip connections. Both the validation accuracy and the test accuracy are very high for the best performing variation of each architecture, i.e., fine tuning. As we use dropout in the dense layers that we add, there is very little chance of overfitting. For example, fine-tuned ResNet50 and Xception give high test accuracies of 99.5% and 98.1% respectively. Our training, validation and test sets come from the same distribution, which may be one reason for such good performance.
From the results in Table 3, we see that the fine-tuned VGG16 model performed best, achieving an accuracy of 99.53% on the test set.
We have generated graphs of accuracy versus epoch number and loss versus epoch number for each architecture; these are given in Figure 7. Here, FT stands for fine tuning from ImageNet pre-trained weights. We use mini-batch training, which is why the accuracy and loss fluctuate slightly for all the architectures. In all the graphs, accuracy increases and loss decreases gradually with increasing epoch number. Simple sequential architectures such as VGG16 show more stable curves than non-sequential architectures such as ResNet50 and InceptionV3.
We generate a confusion matrix for each variation of each architecture on the test set and on the validation set. A confusion matrix gives a quantitative picture of how often images of each class are misclassified as another class; a sample confusion matrix is provided in Figure 8. If the number of misclassifications between two particular classes is high, it indicates that we need to collect more data on those classes so that the CNN architecture can learn to differentiate between them. For this purpose, we generate a confusion matrix on our validation set for each CNN architecture. There was a lot of confusion between the Healthy Plant class and the Brown Spot class; by collecting more data on these two classes, we have eliminated that confusion.
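Computing such a confusion matrix is straightforward; a minimal NumPy version, equivalent in spirit to, e.g., scikit-learn's `confusion_matrix`:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j.

    The diagonal holds correct predictions; large off-diagonal entries
    flag class pairs the model confuses (e.g. Healthy Plant vs Brown Spot),
    suggesting where more training data is needed.
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

print(confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], 3))
```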
Figure 8 shows the confusion matrix of VGG16 (FT) on the test set. We see from the figure that no Bacterial Leaf Blight image has been misclassified as any other disease or pest. On the other hand, one image of the Brown Plant Hopper class has been misclassified as a Healthy Plant. In Figure 8, BLB stands for Bacterial Leaf Blight, BPH for Brown Plant Hopper, BS for Brown Spot, FS for False Smut, NB for Neck Blast, and SBR for Sheath Blight and/or Sheath Rot.
8 Conclusion
We have proposed a deep convolutional neural network based classifier for real-time rice disease and pest recognition. We have conducted a comprehensive study on rice disease and pest recognition, incorporating nine classes of rice diseases, pests and healthy plant. We have collected a large number of images of rice plants infected with various diseases and pests from rice fields in real-life scenarios, and have applied agricultural knowledge in solving the rice disease classification problem. We have used various types of convolutional neural network architectures and have applied several training methods to each of them. We have successfully been able to distinguish between inter-class and intra-class variations of diseases and pests in rice plants in a complex environment. The validation accuracy and test accuracy of most of the convolutional neural network architectures are found to be very high, because our training, validation and test sets have been collected from the same site. We plan to incorporate location, weather and soil data along with the image of the diseased part of the plant to develop a comprehensive and automated plant disease detection mechanism. Our convolutional neural network architectures are large because of their large number of parameters, so we also plan to work toward achieving high accuracy in plant disease and pest classification with memory-efficient convolutional neural network architectures for deployment purposes.
9 Credit Author Statement
Chowdhury Rafeed Rahman: Data Curation, Writing - Original Draft Preparation, Conceptualization, Methodology. Preetom Saha Arko: Software, Data Curation. Mohammed Eunus Ali: Visualization, Investigation, Supervision. Mohammad Ashik Iqbal Khan: Data Curation, Methodology. Abu Wasif: Validation, Writing - Reviewing and Editing. Md. Rafsan Jani: Software. Md. Shahjahan Kabir: Data Curation.
We thank the authority of BRRI (Bangladesh Rice Research Institute) for supporting this research by providing us with the opportunity to collect a large number of images of rice plant diseases in real-life scenarios. We also thank RMIT University for giving us the opportunity to use their GPU server.