Unsupervised adulterated red-chili pepper content transformation for hyperspectral classification

by Muhammad Hussain Khan, et al.

Preserving the quality of red chili is of utmost importance, and the authorities demand quality-control techniques to detect, classify, and prevent impurities. For example, contamination of ground red chili, a common food ingredient, with salt, wheat flour, wheat bran, or rice bran is a serious threat to people who are allergic to such items. This work demonstrates the feasibility of using visible and near-infrared (VNIR) hyperspectral imaging (HSI) to detect and classify the aforementioned adulterants in red chili. However, annotating adulterated red chili data is a major obstacle for classification, because acquiring labeled data for real-time supervised learning is expensive in both cost and time. Therefore, this study proposes, for the first time, a novel approach that annotates red chili samples by clustering their spectral response at 500 nm, where red chili appears dark. The spectral samples are then classified as pure or adulterated using a one-class SVM. The classification performance reaches 99% for adulterated samples. We further find that a single classification model is sufficient to detect any foreign substance in red chili pepper, rather than cascading multiple PLS regression models.



I Introduction

Hyperspectral imaging (HSI) is a leading-edge technology that characterizes each pixel over a wide range of the electromagnetic spectrum, instead of only the three primary colors of the visible range (red, green, and blue) [4]. The light striking a pixel is divided into many spectral bands, providing more detailed information about what is imaged. HSI therefore provides not only the spatial information of the object (its shape) but also its spectral information (the type of material and the chemical distribution of the elements of which it is composed). HSI data take the form of a 3D cube (hyper-cube) in which the spatial images are stacked along the wavelength axis, with coordinates (x, y, λ), where x and y are the spatial coordinates and λ is the spectral (wavelength) coordinate; each pixel of a hyper-cube can thus be interpreted as an individual spectrum.
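Because each pixel of a hyper-cube is treated as an individual spectrum, a common first step in code is to flatten the cube into a matrix with one row per pixel. The following is a minimal NumPy sketch; the function names and the toy cube dimensions are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def cube_to_pixels(cube: np.ndarray) -> np.ndarray:
    """Flatten an (x, y, lambda) hyper-cube into a (num_pixels, num_bands) matrix.

    Each row of the result is the spectrum of one spatial pixel.
    """
    rows, cols, bands = cube.shape
    return cube.reshape(rows * cols, bands)

def pixels_to_cube(pixels: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Inverse of cube_to_pixels: restore the (x, y, lambda) layout."""
    return pixels.reshape(rows, cols, -1)

# Toy cube with hypothetical dimensions: 4 x 5 pixels, 224 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 224))
pixels = cube_to_pixels(cube)

# Pixel (x=2, y=3) becomes row 2 * 5 + 3 because NumPy reshapes in row-major order.
print(pixels.shape)                                      # (20, 224)
print(np.array_equal(pixels[2 * 5 + 3], cube[2, 3, :]))  # True
```

Keeping the inverse mapping makes it easy to paint per-pixel predictions back onto the image grid, as is done for annotation maps.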

HSI systems are usually classified into two regions: near-infrared (NIR) and mid-infrared (MIR). The NIR region reveals patterns related to the chemical and physical composition of materials, while the MIR region describes the rotational and vibrational motion of molecules, which is highly sensitive to the composition of materials [51]. However, NIR radiation penetrates much deeper (up to several millimeters) than MIR radiation (a few microns), which affects the reliability and precision of multivariate analysis.


HSI has been adopted in a wide range of real-world applications, including food science, biomedical imaging, geosciences, forensics, and surveillance, to mention a few [2]. One of the main challenges in the HSI domain is the nature of the data, which typically comprise hundreds of contiguous, narrow spectral bands with very high spatial resolution across the electromagnetic spectrum [3]. Therefore, HSI classification (HSIC) is more complex than the classification of traditional monochrome or RGB images and can be dominated by a multitude of classes and nested regions [5].

HSI has been used in remote sensing for years, but it has recently become an important and robust tool for collecting information about how light interacts with different materials, and it is therefore widely used in food and agricultural quality control, for example to characterize food quality [36] or to detect adulterants in various food items [6]. Quality is of utmost importance in the food industry, which demands not only traditional techniques, such as preservation by freezing and cooling and the detection of spoiled food items and adulterants, but also modern and convenient methods to ensure food quality.

Adulteration can be defined as the addition of constituents to food items in a way that is forbidden by law, custom, norms, or practice. Such deception is a widespread practice internationally. Food adulteration was first reported by Theophrastus, and efforts to address the problem trace back to Roman civil law. Although the motives behind food adulteration are mostly economic, it can lead to serious public health concerns. For example, powdered paprika adulterated with lead oxide caused the deaths of several people in Hungary [9].

Product      | Adulterant                      | Method                            | Reference
Saffron      | Saffron, Marigold, and Turmeric | FT-NIR, Raman, and LIBS           | [19]
Onion Powder | Cornstarch                      | FT-NIR                            | [37]
Chili Powder | Sudan Dyes                      | Raman and NIR                     | [23]
Turmeric     | Metanil Yellow and Chalk Powder | FT-NIR and Terahertz Spectroscopy | [45]
Black Pepper | Millet and Buckwheat            | NIR                               | [42]
Paprika      | Tomato Skin and Brick Dust      | FT-NIR                            | [20]
TABLE I: Examples of spectroscopic methods for the detection of adulterants

Red pepper is the fruit of Capsicum annuum and is widely used as a spice in many cuisines [34]. It is usually used in powdered or crushed form, made by drying and then grinding or crushing the ripe fruit. However, because red chili is expensive, many vendors add inexpensive materials to increase their profits. Nowadays, food authorities charge vendors on the basis of purported reports, without any mechanism for testing the purity of red chili on the spot. Experts mostly rely on visual observation, or samples are sent to distant laboratories for more accurate testing. Traditional methods widely used to detect adulteration in food items include liquid chromatography [13], isotope ratio mass spectrometry [18], gas chromatography [43], and hyphenated mass spectrometry [43]. All these methods require skilled analysts and careful analysis, and can take days to produce results. This obstacle commonly creates conflict between food authorities and vendors.

Recently, in pursuit of rapid detection of adulterants in food items, fingerprinting of food using vibrational spectroscopy techniques [18], such as Fourier-transform near-infrared (FT-NIR) [20], near-infrared (NIR) [46], Raman [24], and visible near-infrared (VNIR) spectroscopy [34], has attracted the interest of many researchers and industries. NIR spectra consist of weak, broad absorption bands arising from overtones and combinations of molecular vibrations, mostly associated with methine, hydroxyl, and amine functional groups [17]. Because FT-NIR spectrometers acquire spectra using an interferogram and the Fourier transform, they offer better wavenumber precision and reproducibility than dispersive NIR instruments [17]. However, the choice of instrument depends on the application and on vital aspects such as spatial and spectral resolution, accuracy, and wavelength range. To detect adulterants in food items, a number of methods based on spectral imaging and chemometric analysis have been proposed in the last few decades [18]. Table I lists a few state-of-the-art works focusing on the identification of adulteration in food items.

Capsaicinoids are key components of red chili that cause a burning sensation when they come into contact with tissue, and red chili quality is graded on the basis of capsaicinoid content [26]. Jongguk Lim et al. developed a system that determines the capsaicinoid content of Korean red pepper powder: they used a visible and near-infrared (VNIR) spectrometer with first-order derivative pre-treatment and a partial least-squares regression (PLSR) model to predict the capsaicinoid content of red paprika powder [34]. In another work, they also detected the moisture content of ground red chili of various particle sizes using NIR spectroscopy combined with a PLSR model, and concluded that performance can be improved by limiting the range of particle sizes [35].

Aflatoxins are poisonous carcinogens produced by Aspergillus fungi. Aspergillus is common in warm, humid environments; its native habitat is soil, and it permeates organic matter whenever conditions are conducive to its growth [10]. Because red chili must be dried before storage, crushing, and grinding, poor sanitary measures may result in contamination with soil-borne toxins such as aflatoxin. Various instances of aflatoxin in red chili have been reported globally [21, 1]. H. Kalkan et al. used a multispectral imaging system to detect aflatoxin contamination in red pepper and hazelnut. They used a modified three-dimensional version of the linear discriminant analysis (LDA) algorithm to generate and select features, and the developed algorithm classified contaminated and uncontaminated pepper flakes with an accuracy of 79.17% [29]. Smita Tripathi and H.N. Mishra proposed an FT-NIR-based method combined with PLSR to detect aflatoxin over a range of concentrations. They compared their results with high-performance liquid chromatography and thin-layer chromatography, and concluded that the efficiency of the FT-NIR-based system is comparable to that of traditional, arduous chemical methods [53].

As discussed earlier, red chili is preserved by drying. Traditionally, red chili is sun-dried for several days, which increases the risk of contamination with fungal disease, foreign particles, etc. [25]. Another drying method uses hot air, which is more convenient and hygienic [54]. However, In Min Hwang et al. found that sun-dried chili is priced higher than hot-air-dried chili. They therefore characterized red pepper using high-performance liquid chromatography (HPLC) and NIR spectroscopy with LDA, and found that sun-dried chili has slightly higher American Spice Trade Association (ASTA) color values, free sugar, lactic acid, and capsaicin [25]. Xi-Yu Wu et al. used NIR spectroscopy coupled with PLSR to detect commonly used adulterants (wheat bran, rice bran, rosin powder, and corn flour) in Sichuan pepper powder. They mixed different quantities of the adulterants into Sichuan pepper powder and predicted the composition, reporting separate prediction coefficients for Sichuan pepper powder, rice bran, wheat bran, corn flour, and rosin powder [56].

Most past efforts have used HSI with PLSR to detect adulterants in powdered materials [50]. However, PLSR faces considerable criticism in the research community. Antonakis et al. encourage researchers to abandon PLS [7], while Rönkkö et al. recommend avoiding PLS because its results are extremely hard to justify [48]; in another paper, Rönkkö et al. proposed a ban on the use of PLS [49].

Besides this criticism, PLSR works by finding the strongest relationship between the provided information and the PLS factors, which is highly dependent on the type of adulterant added; hence, a separate model is needed for each adulterant. It has also been noted that spectra of different varieties of the same substance exhibit different variance, explained by different numbers of PLS components, so separate models are required for each variety. Previous researchers limited their models to commonly used adulterants and specific types of material, whereas in real-life scenarios the types of adulterants cannot be limited, and it is difficult to distinguish between varieties, especially in powdered form.

Hence, in this study we propose a novel technique based on a VNIR HSI system combined with basic hyper-cube processing techniques and a one-class support vector machine (SVM) to detect unhygienic adulterants in ground red chili. The proposed methodology works in three steps. First, the data are labeled by exploiting the International Commission on Illumination (CIE) standard observer model to assign class labels to chili and adulterant materials: the characteristic red color of chili powder appears darkest at short (blue) wavelengths, so in the spectral response of a mixed sample at 500 nm, the bright pixels belong to the adulterant. Second, to mitigate the curse of dimensionality, PCA is applied before building the model. Finally, a one-class SVM model is trained on pure red chili and tested with adulterated samples. Most domestic users and food authorities are only interested in knowing whether spices are pure or adulterated, with little or no interest in the type of adulterant added; hence, this model can meet their requirements to some extent.

The remainder of the paper is structured as follows. Section II (Experimental Datasets) describes the samples acquired for this study and their preparation. Section III (Methodology) presents the theoretical aspects of data acquisition, spectral pre-processing, data annotation (one of the novel contributions of this study), hyper-cube dimensionality reduction, and the one-class SVM. Section IV (Results and Discussion) presents the experimental evaluation of the proposed technique and discusses the opportunities and obstacles of the model. Finally, Section V (Conclusion) summarizes the major contributions of this study and potential future research directions that can be derived from this work.

II Experimental Datasets

In this study, experiments were carried out on two types of chili grown in different regions (Kunri, Sindh, Pakistan, and Hybrid, Rajasthan, India), both collected from the open market. The red chili samples used in this study are shown in Fig. 1. To be used in cuisine, red chili must be crushed multiple times into a fine powder; to keep the model consistent, the same crushing procedure was replicated in this study. However, chili may overheat during milling and lose its natural color. To avoid this, the samples were cooled to room temperature several times during milling to preserve the original texture of the chili. Three commonly used adulterants, wheat bran, rice bran, and saw dust, were acquired from the local market and cleaned of foreign matter.

(a) Kunri; Sindh, Pakistan
(b) Hybrid; Rajasthan, India
Fig. 1: Whole Chili samples analyzed in the study.

Samples were prepared by adding each adulterant individually to both types of chili over a range of proportions by weight. The red chili and adulterants were weighed separately on an electronic balance and mixed in a National mixer grinder to obtain homogenized samples. The prepared set comprised pure chili samples of each origin, pure samples of each adulterant, and adulterated samples for each adulterant. The samples were sealed, packed, labeled, and kept at room temperature, protected from humidity.
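Preparing samples by weight reduces to simple arithmetic: for a target adulteration level p (w/w) and a total sample mass m, the adulterant mass is p·m and the chili mass is (1 - p)·m. The sketch below assumes a hypothetical 20 g total mass and illustrative levels; the actual masses and increments used in this study are not reproduced here.

```python
def mixture_masses(total_g: float, adulterant_fraction: float) -> tuple:
    """Split a total sample mass into (chili_g, adulterant_g) for a w/w level."""
    if not 0.0 <= adulterant_fraction <= 1.0:
        raise ValueError("adulterant_fraction must lie in [0, 1]")
    adulterant_g = total_g * adulterant_fraction
    return total_g - adulterant_g, adulterant_g

# Hypothetical total mass and adulteration levels, for illustration only.
for level in (0.05, 0.10, 0.20):
    chili_g, adulterant_g = mixture_masses(20.0, level)
    print(f"{level:.0%} w/w: {chili_g:.1f} g chili + {adulterant_g:.1f} g adulterant")
```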

III Methodology

In this section, we first describe the data acquisition setup and the preprocessing steps needed for spectral evaluation: reflectance calculation, smoothing, scatter removal, and dimensionality reduction. We then discuss the two novel approaches of this research, i.e., data annotation and the use of a one-class SVM classifier to identify adulterants in red chili.

III-A Data Acquisition

The HSI system used in this study, shown in Fig. 2, consists of an FX-10 hyperspectral camera (Specim, Spectral Imaging Ltd., Finland) coupled with a lens (Schneider Cinegon). The camera is mounted on a lab scanner system consisting of three halogen lamps and a moving platform driven by a stepper motor. A computer (Dell-P46g) is connected to the camera via GigE Vision and to the scanner via a serial communication port. The complete system is enclosed in a dark box to avoid ambient light. All samples (pure and adulterated) were placed in Petri dishes, leveled with a surface leveler to obtain a uniform surface, and imaged separately. Each sample was scanned at a constant speed with a fixed exposure time. Each acquired hyper-cube consists of spectral images representing the electromagnetic spectrum (radiance) of the scanned materials at different wavelengths.

(a) VNIR Hyperspectral Imaging System
(b) Empirical Line method for reflectance conversion [55]
Fig. 2: Data acquisition apparatus and reflectance calculation method.

The radiance encoded in each acquired hyper-cube was converted to reflectance spectra using the empirical line method (ELM) (Fig. 2(b)). This method requires two additional reference targets with widely different brightness: a white reference of known reflectance, placed alongside the sample, and a dark-current image acquired with the camera shutter closed. Using the known reflectance values and the acquired image radiance, the reflectance at each wavelength was calculated with the linear equation

$$R = \frac{I - D}{W - D}$$

where R is the reflectance of the data cube, I is the radiance captured for the given sample, and D and W are the data captured for the dark and white references, respectively. Because the translational platform and the sample holder differ in size, the region of interest (ROI) had to be separated from the acquired hyper-cube. To automate ROI extraction, a false-color image was created from three bands, and several contextual (pixel connectivity) and non-contextual (thresholding) image segmentation techniques were applied to it.
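The reflectance equation and ROI extraction can be sketched in NumPy/SciPy as follows. This is an illustrative sketch under stated assumptions: the dark and white references are each reduced to a single spectrum, ROI extraction thresholds one band (rather than the three-band false-color image) with a hypothetical threshold, and the largest 4-connected region is assumed to be the sample.

```python
import numpy as np
from scipy import ndimage

def radiance_to_reflectance(raw, dark, white, eps=1e-12):
    """Compute R = (I - D) / (W - D) band by band.

    raw:   (rows, cols, bands) radiance cube I
    dark:  dark-current reference D, e.g. its mean spectrum with shape (bands,)
    white: white reference W, same shape as dark
    """
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    # Guard against division by zero in dead or saturated bands.
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom

def extract_roi(band_image, threshold):
    """Threshold a single band, then keep the largest connected region."""
    mask = band_image > threshold              # non-contextual step (thresholding)
    labels, n_regions = ndimage.label(mask)    # contextual step (4-connectivity)
    if n_regions == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_regions + 1))
    return labels == (int(np.argmax(sizes)) + 1)

# Toy example: sample radiance exactly halfway between dark and white.
bands = 8
dark = np.full(bands, 100.0)
white = np.full(bands, 4000.0)
raw = np.broadcast_to(dark + 0.5 * (white - dark), (6, 6, bands))
reflectance = radiance_to_reflectance(raw, dark, white)   # every value is 0.5

image = np.zeros((6, 6))
image[1:4, 1:4] = 0.8   # hypothetical sample region (petri dish)
image[5, 5] = 0.9       # isolated bright pixel, e.g. a specular glint
roi = extract_roi(image, threshold=0.5)
print(roi.sum())        # 9: the isolated glint is discarded
```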

III-B Spectral Preprocessing

Acquired spectral data are highly sensitive to the physical properties of the samples (temperature, surface characteristics, etc.) and to systematic noise (ambient light, scattering, etc.). Such noise can introduce errors into the acquired data and affect the reliability of the built model. These errors can be avoided by standardizing the pure samples, but this is often time-consuming and expensive, may create artifacts, and is sometimes physically impossible [31]. Alternatively, mathematical techniques available in the literature can remove the effect of these errors from spectral data. However, no method has been established as a standard for removing or avoiding such errors; instead, several methods are tried by trial and error to find which best suits the nature of the hyper-cube. Therefore, in this work, pre-treatment techniques such as Savitzky-Golay filtering, standard normal variate (SNV), and multiplicative scatter correction (MSC) were applied to the data separately before building the model. Savitzky-Golay filtering was used to smooth the spectral data with an eleven-point window and third-order polynomial fitting. MSC and SNV are usually used to remove non-uniform scattering and particle-size effects [40]. MSC corrects each spectrum against the mean spectrum of the data set, whereas SNV operates on each spectrum individually. Hence, SNV was used for adulterated samples, so that the spectra of adulterant and chili would not become entangled, while MSC was used to standardize the spectra of pure samples.
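The three pre-treatments can be sketched as follows. The window length (11) and polynomial order (3) follow the text; the function names and the synthetic spectra are hypothetical. MSC here uses the mean spectrum as its reference, and SNV uses only each spectrum's own mean and standard deviation.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth(spectra):
    """Savitzky-Golay smoothing: 11-point window, third-order polynomial."""
    return savgol_filter(spectra, window_length=11, polyorder=3, axis=-1)

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own stats."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is modelled as x_i ~ a_i + b_i * ref (least squares) and
    corrected as (x_i - a_i) / b_i. The default reference is the mean spectrum.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, spectrum in enumerate(spectra):
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Hypothetical spectra: one base shape with different offsets and scalings (scatter).
wl = np.linspace(0.0, 1.0, 200)
base = np.exp(-((wl - 0.4) / 0.1) ** 2) + 0.3 * wl
offsets = np.array([0.0, 0.2, -0.1])
scales = np.array([1.0, 1.5, 0.7])
spectra = offsets[:, None] + scales[:, None] * base

print(np.allclose(msc(spectra), spectra.mean(axis=0)))  # True: scatter removed
print(np.allclose(snv(spectra).mean(axis=1), 0.0))      # True
```

Because every synthetic spectrum is an exact affine transform of the same base shape, MSC maps all of them onto the mean spectrum; real spectra only approximately follow this model.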

Fig. 3: Spectral response of red chili and adulterants.
(a) False color image of adulterated chili.
(b) Adulterated chili spectral response at 500 nm.
(c) Annotated data.
Fig. 4: HSI data annotation process using K-Means clustering.

III-C Data Annotation

To develop a classification model that differentiates red chili from adulterants, and adulterants from each other, the acquired spectra of pure adulterants must be labeled. For labeling, this study exploits the CIE standard (colorimetric) observer model (shown in Fig. 7), which represents the average human chromatic response. According to this model, red appears dark (its color-matching response is negative) over a range of short wavelengths. Because most digital cameras use a color filter array (CFA) and demosaicing algorithms to recover the image [30], this phenomenon is not apparent in the blue channel of an image captured with a digital camera. In an HSI system, however, hundreds of adjacent wavelength bands are acquired separately, so the spectral response of the substance under consideration can be observed and processed at a specific wavelength. Because red chili absorbs blue wavelengths owing to its color, red chili and adulterants can be differentiated by their spectral response in the blue range. The most noticeable difference between the reflectance of red chili and that of the adulterants is observed at 500 nm; hence, image pixels at this wavelength can be grouped into two clusters, i.e., red chili and adulterants.

K-means clustering is one of the most popular clustering algorithms [28]. Starting from initial centroids, it alternates between assigning each data point to its nearest centroid and recomputing each centroid from the points currently assigned to it, until the assignments stop changing [39]. Instead of choosing all initial centroids uniformly at random, D. Arthur et al. proposed selecting each new initial centroid with probability proportional to its squared distance from the nearest centroid already chosen, which favors points far from the existing centers [8]. For data annotation in this experiment, the spectral image at 500 nm (Fig. 4(b)) is fed to the k-means algorithm with two cluster centers. The algorithm groups the pixels into two clusters based on their intensity values and assigns a label to each member. Fig. 4(c) displays the label assigned to each pixel, where black represents adulterants in the mixture and white represents red chili.
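A minimal sketch of this annotation step with scikit-learn's KMeans (k-means++ seeding) is shown below, assuming the cube has already been converted to reflectance and that a wavelength vector is available for its bands. The function name, the 400 to 1000 nm toy wavelength grid and the synthetic sample are hypothetical; following the text, the darker cluster at 500 nm is taken as red chili and the brighter one as adulterant (encoded here as 0 and 1 rather than white and black).

```python
import numpy as np
from sklearn.cluster import KMeans

def annotate_at_wavelength(cube, wavelengths, target_nm=500.0, seed=0):
    """Two-cluster k-means on the band closest to target_nm.

    Returns a (rows, cols) map with 0 = red chili (dark) and 1 = adulterant (bright).
    """
    band = int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))
    image = cube[:, :, band]
    km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=seed)
    labels = km.fit_predict(image.reshape(-1, 1))   # one intensity feature per pixel
    bright = int(np.argmax(km.cluster_centers_.ravel()))
    return (labels == bright).astype(int).reshape(image.shape)

# Hypothetical mixed sample: dark chili background with a bright adulterant patch.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 1000.0, 224)
truth = np.zeros((20, 20), dtype=int)
truth[5:10, 5:12] = 1
level = np.where(truth == 1, 0.45, 0.08)            # same reflectance at every band
cube = level[:, :, None] + 0.01 * rng.normal(size=(20, 20, 224))
labels = annotate_at_wavelength(cube, wavelengths)
print((labels == truth).mean())                     # fraction of pixels labelled correctly
```

Clustering a single intensity feature keeps the step unsupervised; on real mixtures, pixels containing both chili and adulterant particles will fall on either side of the split.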

III-D Dimensionality Reduction

A hyper-cube is composed of hundreds of narrow spectral bands covering a range of the electromagnetic spectrum at very high spectral resolution. This improves the measurement capability of the spectral cube, but it also brings challenges in storing, processing, and classifying the data. As the number of spectral bands (dimensions) increases, classification precision can decrease [47], because it becomes harder to find and learn the structure of data embedded in a high-dimensional space: the more features a hyper-cube has, the more data points are needed to fill the space. This phenomenon is known as the curse of dimensionality, and different approaches have been used to reduce the dimensions of hyper-cubes to avoid it [38].

(a) Red Chili PCA
(b) Saw Dust PCA
(c) Wheat Bran PCA
(d) Rice Bran PCA
Fig. 5: PCA score plot of the four materials: Red Chili, Saw Dust, Wheat Bran and Rice Bran.

Dimensionality reduction handles the complexity of large data sets such as hyper-cubes by reducing their dimensionality while keeping the important features intact [27]. Supervised methods such as linear discriminant analysis (LDA) [11] and local Fisher discriminant analysis (LFDA) [33], and unsupervised methods such as principal component analysis (PCA) [22] and the maximum noise fraction (MNF) transform [57], are most commonly used to reduce hyperspectral dimensions by projecting the original data into a lower-dimensional space. To reduce the dimension of the hyper-cube data used in this study, PCA was applied before further processing.

In this study, the SVM algorithm is used to identify adulterants in chili, and as the number of classes to be classified increases, the number of parameters increases, which in turn affects classification accuracy [44]. Therefore, PCA is used to avoid the curse of dimensionality and to ensure the quality of adulterant classification. PCA ignores the spatial information of the data and reveals its internal structure in the way that best explains the variance of the data. Through an orthogonal transformation, the bands are projected onto principal components; the share of total variance captured for red chili, saw dust, wheat bran, and rice bran is shown in Fig. 5(a), 5(b), 5(c), and 5(d), respectively. Therefore, PC1 and PC2 were selected for classification in this study.
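Keeping PC1 and PC2 of the pixel spectra can be sketched with scikit-learn's PCA; its explained_variance_ratio_ attribute gives the share of total variance captured by each component, analogous to the "total information" share summarized for each material in Fig. 5. The synthetic two-factor spectra are hypothetical and only make the example self-contained.

```python
import numpy as np
from sklearn.decomposition import PCA

def top_two_pcs(spectra):
    """Project pixel spectra (n_pixels, n_bands) onto PC1 and PC2."""
    pca = PCA(n_components=2)
    scores = pca.fit_transform(spectra)
    return scores, pca.explained_variance_ratio_

# Hypothetical spectra driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 200))
spectra = latent @ loadings + 0.01 * rng.normal(size=(500, 200))

scores, ratio = top_two_pcs(spectra)
print(scores.shape)   # (500, 2)
print(ratio.sum())    # close to 1 for this synthetic data
```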

III-E SVM: One Class

SVM is a supervised machine learning algorithm that maps input data into a feature space and draws a linear decision boundary [15]. In its original form, SVM was developed as a binary classifier that assigns one of two labels, +1 or -1, to each sample. It classifies the data by taking into account only the training samples that lie on the boundary of the class distribution, known as support vectors, and by identifying the optimal hyper-plane between the two classes. These definitions are illustrated in Fig. 6(a).

(a) Illustration of Support Vector Machine.
(b) B. Schölkopf method for one class SVM [41]
Fig. 6: Support Vector Machine Concept.

The SVM algorithm was initially designed for linearly separable data; in real-life scenarios, however, data are not always linearly separable. Boser et al. proposed a method for creating non-linear classifiers using the kernel trick [12]: each dot product is replaced by a kernel function, which implicitly transforms the data into a non-linear, high-dimensional space. The classifier remains a hyper-plane in the transformed space but is non-linear in the input space. The choice of kernel and its parameters (kernel width, step size, regularization parameter, etc.) depends on the application domain and the type of training data. A large regularization parameter combined with a small kernel width may lead to overfitting, while the opposite combination may cause underfitting. Similarly, a higher regularization parameter increases the flexibility of the decision boundary, while a smaller value widens the margin and decreases the flexibility of the decision boundary. There is no single criterion for choosing these parameters, and the usual approach is trial and error [52].
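For reference, the standard textbook forms of the polynomial and Gaussian (RBF) kernels and of the resulting kernelized soft-margin decision function are given below. Here σ denotes the kernel width, C the regularization parameter, and c and d the polynomial offset and degree; this notation is ours and may differ from the authors'. A small σ makes the Gaussian kernel decay quickly, so the boundary can wrap tightly around individual training points, which is why a small σ combined with a large C tends to overfit.

```latex
% Kernel trick: every dot product x_i^T x_j is replaced by K(x_i, x_j) = \phi(x_i)^T \phi(x_j)
K_{\text{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{x}'\rVert^{2}}{2\sigma^{2}}\right),
\qquad
K_{\text{poly}}(\mathbf{x}, \mathbf{x}') = \left(\mathbf{x}^{\top}\mathbf{x}' + c\right)^{d}

% Soft-margin decision function; samples with \alpha_i > 0 are the support vectors
f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right),
\qquad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```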

SVM is by nature a binary algorithm, i.e., both positive and negative examples are required for training. B. Schölkopf et al. proposed a modification of the SVM algorithm for one-class classification problems: the origin is treated as the only member of the second class, and a hyper-plane is drawn to separate the class of interest from the origin with maximal margin (Fig. 6(b)). Therefore, this study considers a one-class linear SVM (without feature transformation) as well as non-linear SVMs with different kernels (polynomial, Gaussian, and Gaussian radial basis function (RBF)). For the non-linear SVM kernels, the kernel width is estimated by grid search, which evaluates the model for all hyper-parameter values in a specified range and proposes the most suitable value. These settings are detailed in the Results and Discussion section.
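The one-class formulation of Schölkopf et al. can be written in its standard ν form as follows (notation ours). The parameter ν ∈ (0, 1] is an upper bound on the fraction of training samples treated as outliers and a lower bound on the fraction of support vectors, which is why it governs how tightly the learned region fits the pure red chili training data.

```latex
% nu-one-class SVM: separate the n training samples from the origin in feature space
\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho
\quad \text{s.t.} \quad
  \langle \mathbf{w}, \phi(\mathbf{x}_i) \rangle \ge \rho - \xi_i,\;\; \xi_i \ge 0

% Decision function: +1 inside the learned region (pure chili), -1 outside (anomaly)
f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i=1}^{n} \alpha_i K(\mathbf{x}_i, \mathbf{x}) - \rho\right),
\qquad 0 \le \alpha_i \le \frac{1}{\nu n}, \quad \sum_{i=1}^{n} \alpha_i = 1
```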

IV Results and Discussion

In this experiment, halogen lamps were used to illuminate the samples. Unlike LEDs and fluorescent tubes, which lack a continuous spectrum, the halogen lamp (Fig. 7) covers the whole spectral range of the HSI system used in this study, but its intensity in the blue region is very low (Fig. 7). This limitation causes a lower signal-to-noise ratio (SNR) in the initial bands, which can affect the performance of the developed model. Therefore, the initial bands were discarded and a reduced spectral range was used in the experiments. Because all pixels in a pure sample exhibit the same characteristic spectrum with negligible variation, each pixel was treated as a separate sample for model development. PCA was applied to the spectral data of all pure samples to describe the characteristic features of the spectra, and the number of important PCs was selected by the broken-stick method [14], which retains an eigenvalue only if it is greater than the value expected under the broken-stick distribution.
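The broken-stick rule can be sketched as follows: for p components, the expected share of the k-th largest eigenvalue under a random split of the total variance is b_k = (1/p) Σ_{i=k}^{p} 1/i, and leading components are kept while their observed share exceeds b_k. Stopping at the first component that fails the test is one common reading of the rule and is an assumption here; the function names are hypothetical.

```python
import numpy as np

def broken_stick(p):
    """Expected variance shares b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p."""
    inv = 1.0 / np.arange(1, p + 1)
    return np.cumsum(inv[::-1])[::-1] / p

def n_components_broken_stick(eigenvalues):
    """Keep leading PCs while their variance share exceeds the broken-stick value."""
    eig = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    shares = eig / eig.sum()
    keep = 0
    for share, expected in zip(shares, broken_stick(len(eig))):
        if share <= expected:
            break
        keep += 1
    return keep

print(np.round(broken_stick(4), 4))                 # [0.5208 0.2708 0.1458 0.0625]
print(n_components_broken_stick([6, 3, 0.5, 0.5]))  # 2
```

In practice the eigenvalues would come from the PCA fitted to the pure-sample spectra (for example, scikit-learn's explained_variance_ attribute).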

Fig. 7: Halogen illumination Spectrum

IV-A Detection of Red Chili

For red chili detection there is only a single class, i.e., pure red chili, which does not need to be labeled. One-class SVM algorithms with linear and non-linear kernels were trained on these data. An important parameter, ν, which bounds the fraction of training errors from above and the fraction of support vectors from below, needs to be estimated: too high a value will not incorporate all of the training data, whereas a very small value will also fit outliers in the training data and decrease testing accuracy. The value of ν was estimated by trial and error, with training performed on pure red chili and testing on pure adulterants and adulterated samples. Fig. 8 shows that the accuracy of the classifier drops sharply when ν is set below a certain value. Accuracy was measured while testing with pure adulterants, and the minimum accuracy among all test samples was considered; the value of ν used in this experiment was chosen accordingly.
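The ν selection procedure (train on pure chili, test on several sets, keep the worst-case accuracy) can be sketched with scikit-learn's OneClassSVM, whose predict method returns +1 for inliers and -1 for outliers. The synthetic two-dimensional scores, the fixed RBF gamma, the candidate ν values, and the inclusion of held-out pure chili as an extra test set are assumptions made for illustration, not the authors' settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def min_accuracy(model, test_sets):
    """Lowest accuracy over named test sets of (X, expected_label) pairs.

    expected_label is +1 for red chili (inlier) and -1 for adulterant (outlier).
    """
    return min(float(np.mean(model.predict(X) == y)) for X, y in test_sets.values())

def sweep_nu(train, test_sets, nus, **svm_kwargs):
    """Fit one one-class SVM per nu value and record its worst-case accuracy."""
    return {nu: min_accuracy(OneClassSVM(nu=nu, **svm_kwargs).fit(train), test_sets)
            for nu in nus}

# Hypothetical PC1/PC2 scores: chili near the origin, adulterants far away.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 0.3, size=(300, 2))
test_sets = {
    "held-out chili": (rng.normal(0.0, 0.3, size=(200, 2)), 1),
    "adulterant": (rng.normal(3.0, 0.3, size=(200, 2)), -1),
}
results = sweep_nu(train, test_sets, [0.01, 0.05, 0.1, 0.3], kernel="rbf", gamma=0.5)
best_nu = max(results, key=results.get)
print(results, best_nu)
```

Including held-out chili prevents the sweep from simply favouring large ν values, which reject more adulterant pixels but also more genuine chili.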

A linear one-class SVM trained on red chili spectral data predicts different red chili samples accurately. However, when pure adulterant samples are fed to the classifier, it is unable to distinguish between red chili and adulterants, and the maximum accuracy achieved with the linear SVM, even at the best ν value, remains low. Therefore, a non-linear one-class SVM was considered, with the optimal kernel width and ν value found by grid search. The non-linear kernel is able to differentiate chili from pure adulterants (rice bran, wheat bran, and saw dust) with high accuracy. Similarly, a polynomial kernel was trained on the data with a fixed ν value, and its best accuracy was achieved at a particular polynomial degree. These results show that only the one-class SVM with the grid-searched non-linear kernel differentiates red chili from adulterants with sufficient accuracy.
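A hedged sketch of comparing linear, polynomial and RBF one-class kernels, with a grid search over the kernel coefficient gamma, is shown below. In scikit-learn's parameterization the RBF kernel is exp(-gamma·‖x - x'‖²), so a grid over gamma is equivalent to a grid over kernel width. The ring-shaped synthetic adulterant data, the gamma grid, nu=0.05, the polynomial settings and the use of balanced accuracy as the score are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def balanced_accuracy(model, chili, adulterant):
    """Mean of chili acceptance rate (+1) and adulterant rejection rate (-1)."""
    return 0.5 * (np.mean(model.predict(chili) == 1) + np.mean(model.predict(adulterant) == -1))

def grid_search_gamma(train, chili_val, adulterant_val, gammas, nu=0.05, **kernel_kwargs):
    """Return (best_gamma, best_score) for one kernel by exhaustive search over gammas."""
    best_gamma, best_score = None, -1.0
    for gamma in gammas:
        model = OneClassSVM(nu=nu, gamma=gamma, **kernel_kwargs).fit(train)
        score = balanced_accuracy(model, chili_val, adulterant_val)
        if score > best_score:
            best_gamma, best_score = gamma, score
    return best_gamma, best_score

# Hypothetical 2-D scores: chili cluster at the origin, adulterants on a surrounding ring.
rng = np.random.default_rng(1)
train = rng.normal(0.0, 0.3, size=(300, 2))
chili_val = rng.normal(0.0, 0.3, size=(200, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
adulterant_val = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])

gammas = [0.01, 0.1, 0.5, 1.0, 2.0]
kernels = {
    "linear": {"kernel": "linear"},                      # gamma is ignored here
    "poly": {"kernel": "poly", "degree": 3, "coef0": 1.0},
    "rbf": {"kernel": "rbf"},
}
scores = {name: grid_search_gamma(train, chili_val, adulterant_val, gammas, **kw)
          for name, kw in kernels.items()}
for name, (gamma, score) in scores.items():
    print(f"{name:6s} best gamma={gamma}  balanced accuracy={score:.3f}")
```

The synthetic data are built so that no linear boundary can enclose the chili cluster; results on real spectra may differ.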

Fig. 8: ν-parameter vs. accuracy
Fig. 9: Red Chili adulterated with Rice Bran
Fig. 10: Red Chili adulterated with Saw Dust
Fig. 11: Red Chili adulterated with Wheat Bran

However, for adulterated samples the efficiency drops sharply, and it decreases further as the level of adulteration increases. This is due to a limitation of the data annotation process: blue light penetrates only a shallow depth into the sample, whereas IR wavelengths penetrate considerably deeper [16]. Grains beneath the surface particles therefore cannot be labeled with confidence, although IR radiation can still detect their properties. Similarly, because of the small grain size, several pixels contain particles of both red chili and adulterants. In one-class classification, such mixed-pixel spectra are treated as anomalies and hence as adulterants, whereas the data annotation process assigns labels based on the proportions of red chili and adulterant in the pixel.

Another limitation faced during this experiment is the difference in density between the adulterants and red chili. A gram of red chili and a gram of each adulterant occupy different bulk volumes, resulting in different numbers of particles during acquisition. Because the HSI system does not incorporate density information explicitly and captures only the particles in the upper layer of the sample, the algorithm has difficulty predicting the exact proportions of chili and adulterant. It is worth noting that all of the above limitations relate to the data annotation process, whereas the classifier predicts pure adulterant samples with high accuracy.

V Conclusion

In this research, one of the main challenges in the spice industry, i.e., adulteration of red chili, has been addressed. The HSI data acquired by the VNIR HSI system were pre-treated using Savitzky-Golay filtering and the standard normal variate (SNV). The data were labeled by applying k-means clustering at spectral response, owing to the color feature of red chili. To reduce the dimensions of the hypercube and to increase classification accuracy, PCA was applied to the spectral data before one-class SVM classification. Overall, accuracy has been achieved for pure red chili, and a decreasing trend has been observed as the adulterant content of the sample increases, owing to the limitations of the data annotation process and the different penetration depths of the wavelengths. To detect adulterant types and estimate the adulteration proportion, spectral unmixing is a viable solution, which will be explored in further studies.
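The processing chain summarized above (Savitzky-Golay smoothing, SNV, k-means labeling on a single band, PCA, then a one-class SVM) can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the synthetic spectra, the filter window and polynomial order, the band index `LABEL_BAND`, k = 2 clusters, five principal components, and the SVM settings are all hypothetical choices.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM


def snv(spectra):
    """Standard normal variate: give each spectrum (row) zero mean, unit std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


rng = np.random.default_rng(1)
n_bands = 50
LABEL_BAND = 10  # hypothetical index standing in for the labeling wavelength

# Synthetic pixel spectra (not real data): chili pixels are darker over a band
# range around LABEL_BAND, mimicking chili's dark appearance there.
base = np.linspace(0.2, 0.8, n_bands)
chili = base + rng.normal(0.0, 0.02, size=(300, n_bands))
chili[:, LABEL_BAND - 5:LABEL_BAND + 6] -= 0.15
adulterant = base + rng.normal(0.0, 0.02, size=(100, n_bands))
spectra = np.vstack([chili, adulterant])

# 1) Savitzky-Golay smoothing along the wavelength axis, then SNV.
smoothed = savgol_filter(spectra, window_length=7, polyorder=2, axis=1)
processed = snv(smoothed)

# 2) k-means (k = 2) on the single labeling band; the cluster with the lower
#    (darker) centre is taken as red chili.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = km.fit_predict(processed[:, [LABEL_BAND]])
chili_cluster = int(np.argmin(km.cluster_centers_.ravel()))
is_chili = clusters == chili_cluster

# 3) PCA on all spectra, then a one-class SVM fitted on chili-labeled pixels.
scores = PCA(n_components=5).fit_transform(processed)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scores[is_chili])
pred = ocsvm.predict(scores)  # +1: consistent with chili, -1: flagged
```

Fitting PCA on all pixels, rather than on chili alone, keeps the chili/adulterant contrast in the leading components; whether the original work did the same is not stated here.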


  • [1] G. Adegoke, A. Allamu, J. Akingbala, and A. Akanni (1996) Influence of sundrying on the chemical composition, aflatoxin content and fungal counts of two pepper varieties—capsicum annum and capsicum frutescens. Plant foods for human nutrition 49 (2), pp. 113–117.
  • [2] M. Ahmad, M. A. Alqarni, A. M. Khan, R. Hussain, M. Mazzara, and S. Distefano (Feb. 2019) Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik-International Journal for Light and Electron Optics 180, pp. 370–378.
  • [3] M. Ahmad, A. K. Bashir, and A. M. Khan (Jul. 2017) Metric similarity regularizer to enhance pixel similarity performance for hyperspectral unmixing. Optik-International Journal for Light and Electron Optics 140C, pp. 86–95.
  • [4] M. Ahmad, A. Khan, A. M. Khan, M. Mazzara, S. Distefano, A. Sohaib, and O. Nibouche (May 2019) Spatial prior fuzziness pool-based interactive classification of hyperspectral images. Remote Sensing 11 (9). ISSN 2072-4292.
  • [5] M. Ahmad, A. M. Khan, M. Mazzara, and S. Distefano (Feb. 2019) Multi-layer extreme learning machine-based autoencoder for hyperspectral image classification. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, pp. 75–82. ISBN 978-989-758-354-4.
  • [6] M. Al-Sarayreh, M. M. Reis, W. Qi Yan, and R. Klette (2018) Detection of red-meat adulteration by deep spectral–spatial features in hyperspectral images. Journal of Imaging 4 (5), pp. 63.
  • [7] J. Antonakis, S. Bendahan, P. Jacquart, and R. Lalive (2010) On making causal claims: a review and recommendations. The leadership quarterly 21 (6), pp. 1086–1120.
  • [8] D. Arthur and S. Vassilvitskii (2007) K-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035.
  • [9] A. S. T. Association et al. (2012) Spice adulteration—white paper.
  • [10] A. Aydin, M. E. Erkan, R. Başkaya, and G. Ciftcioglu (2007) Determination of aflatoxin b1 levels in powdered red pepper. Food control 18 (9), pp. 1015–1018.
  • [11] S. Balakrishnama and A. Ganapathiraju (1998) Linear discriminant analysis-a brief tutorial. Institute for Signal and information Processing 18, pp. 1–8.
  • [12] B. E. Boser, I. M. Guyon, and V. N. Vapnik (2003) A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152.
  • [13] P. Botek, J. Poustka, and J. Hajslova (2007) Determination of banned dyes in spices by liquid chromatography-mass spectrometry. Czech Journal of Food Science 25 (1), pp. 17–24.
  • [14] R. Cangelosi and A. Goriely (2007) Component retention in principal component analysis with application to cdna microarray data. Biology direct 2 (1), pp. 2.
  • [15] C. Cortes and V. Vapnik (1995) Support-vector networks. Machine learning 20 (3), pp. 273–297.
  • [16] A. Douplik, G. Saiko, I. Schelkanova, and V. Tuchin (2013) The response of tissue to laser light. In Lasers for Medical Applications, pp. 47–109.
  • [17] J. K. Drennen, E. G. Kraemer, and R. A. Lodder (1991) Advances and perspectives in near-infrared spectrophotometry. Critical Reviews in Analytical Chemistry 22 (6), pp. 443–475.
  • [18] D. I. Ellis, V. L. Brewster, W. B. Dunn, J. W. Allwood, A. P. Golovanov, and R. Goodacre (2012) Fingerprinting food: current technologies for the detection of food adulteration and contamination. Chemical Society Reviews 41 (17), pp. 5706–5727.
  • [19] S. V. Er, H. Eksi-Kocak, H. Yetim, and I. H. Boyaci (2017) Novel spectroscopic method for determination and quantification of saffron adulteration. Food analytical methods 10 (5), pp. 1547–1555.
  • [20] Fast Detection of Paprika Adulteration Using FT-NIR Spectroscopy. Note: https://www.azom.com/article.aspx?ArticleID=13251 (accessed 2019-09-30).
  • [21] J. Gilbert (1999) Overview of mycotoxin methods, present status and future needs. Natural Toxins 7 (6), pp. 347–352.
  • [22] M. R. Gupta and N. P. Jacobson (2006) Wavelet principal component analysis and its application to hyperspectral images. In 2006 International Conference on Image Processing, pp. 1585–1588.
  • [23] S. A. Haughey, P. Galvin-King, Y. Ho, S. E. Bell, and C. T. Elliott (2015) The feasibility of using near infrared and raman spectroscopic techniques to detect fraudulent adulteration of chili powders with sudan dye. Food Control 48, pp. 75–83.
  • [24] A. M. Herrero (2008) Raman spectroscopy a promising technique for quality assessment of meat and fish: a review. Food chemistry 107 (4), pp. 1642–1651.
  • [25] I. M. Hwang, J. Y. Choi, E. Y. Nho, G. H. Lee, N. Jamila, N. Khan, C. H. Jo, and K. S. Kim (2017) Characterization of red peppers (capsicum annuum) by high-performance liquid chromatography and near-infrared spectroscopy. Analytical Letters 50 (13), pp. 2090–2104.
  • [26] S. W. Hwang and U. Oh (2002) Hot channels in airways: pharmacology of the vanilloid receptor. Current opinion in pharmacology 2 (3), pp. 235–242.
  • [27] T. K and A. Vasuki (Dec. 2018) Dimension reduction methods for hyperspectral image: a survey. International Journal of Engineering and Advanced Technology 8, pp. 160–167.
  • [28] K-means Clustering: Algorithm, Applications, Evaluation Methods, and Drawbacks. Note: https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a (accessed 2019-09-30).
  • [29] H. Kalkan, P. Beriat, Y. Yardimci, and T. Pearson (2011) Detection of contaminated hazelnuts and ground red chili pepper flakes by multispectral imaging. Computers and Electronics in Agriculture 77 (1), pp. 28–34.
  • [30] S. Kaur and R. Sharma (2015) A study on various color filter array based techniques. International Journal of Computer Applications 114 (4).
  • [31] B. R. Kowalski (2013) Chemometrics: mathematics and statistics in chemistry. Vol. 138, Springer Science & Business Media.
  • [32] J. Lammertyn, A. Peirs, J. De Baerdemaeker, and N. Scheerlinck (2000) Light penetration properties of nir radiation in fruit with respect to non-destructive quality assessment.
  • [33] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce (2011) Locality-preserving dimensionality reduction and classification for hyperspectral image analysis. IEEE Transactions on Geoscience and Remote Sensing 50 (4), pp. 1185–1198.
  • [34] J. Lim, G. Kim, C. Mo, and M. Kim (2015) Design and fabrication of a real-time measurement system for the capsaicinoid content of korean red pepper (capsicum annuum l.) powder by visible and near-infrared spectroscopy. Sensors 15 (11), pp. 27420–27435.
  • [35] J. Lim, C. Mo, G. Kim, S. Kang, K. Lee, M. S. Kim, and J. Moon (2014) Non-destructive and rapid prediction of moisture content in red pepper (capsicum annuum l.) powder using near-infrared spectroscopy and a partial least squares regression model. Journal of Biosystems Engineering 39 (3), pp. 184–193.
  • [36] J. Lim, C. Mo, G. Kim, S. Kang, K. Lee, M. S. Kim, and J. Moon (2014) Non-destructive and rapid prediction of moisture content in red pepper (capsicum annuum l.) powder using near-infrared spectroscopy and a partial least squares regression model. Journal of Biosystems Engineering 39 (3), pp. 184–193.
  • [37] S. Lohumi, S. Lee, W. Lee, M. S. Kim, C. Mo, H. Bae, and B. Cho (2014) Detection of starch adulteration in onion powder by ft-nir and ft-ir spectroscopy. Journal of agricultural and food chemistry 62 (38), pp. 9246–9251.
  • [38] W. Ma, C. Gong, Y. Hu, P. Meng, and F. Xu (2013) The hughes phenomenon in hyperspectral classification based on the ground spectrum of grasslands in the region around qinghai lake. In International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Spectrometer Technologies and Applications, Vol. 8910, pp. 89101G.
  • [39] D. J. MacKay (2003) Information theory, inference and learning algorithms. Cambridge university press.
  • [40] M. Maleki, A. Mouazen, H. Ramon, and J. De Baerdemaeker (2007) Multiplicative scatter correction during on-line measurement with near infrared spectroscopy. Biosystems Engineering 96 (3), pp. 427–433.
  • [41] L. M. Manevitz and M. Yousef (2001) One-class svms for document classification. Journal of machine Learning research 2 (Dec), pp. 139–154.
  • [42] C. M. McGoverin, D. J. September, P. Geladi, and M. Manley (2012) Near infrared and mid-infrared spectroscopy for the quantification of adulterants in ground black pepper. Journal of Near Infrared Spectroscopy 20 (5), pp. 521–528.
  • [43] J. C. Moore, J. Spink, and M. Lipp (2012) Development and application of a database of food ingredient fraud and economically motivated adulteration from 1980 to 2010. Journal of Food Science 77 (4), pp. R118–R126.
  • [44] T. Moughal (2013) Hyperspectral image classification using support vector machine. In Journal of Physics: Conference Series, Vol. 439, pp. 012042.
  • [45] K. Nallappan, J. Dash, S. Ray, and B. Pesala (2013) Identification of adulterants in turmeric powder using terahertz spectroscopy. In 2013 38th International Conference on Infrared, Millimeter, and Terahertz Waves (IRMMW-THz), pp. 1–2.
  • [46] B. G. Osborne (2006) Near-infrared spectroscopy in food analysis. Encyclopedia of analytical chemistry: applications, theory and instrumentation.
  • [47] R. Rojas (2015) The curse of dimensionality.
  • [48] M. Rönkkö and J. Evermann (2013) A critical examination of common beliefs about partial least squares path modeling. Organizational Research Methods 16 (3), pp. 425–448.
  • [49] M. Sarstedt, J. F. Hair, C. M. Ringle, K. O. Thiele, and S. P. Gudergan (2016) Estimation issues with pls and cbsem: where the bias lies!. Journal of Business Research 69 (10), pp. 3998–4010.
  • [50] I. Singh, P. Juneja, B. Kaur, and P. Kumar (2013) Pharmaceutical applications of chemometric techniques. ISRN Analytical Chemistry 2013.
  • [51] D. Sun (2009) Infrared spectroscopy for food quality analysis and control. Academic Press.
  • [52] A. Tarigan, R. Dewi Agushinta, A. Suhendra, and F. Budiman (2017) Determination of svm-rbf kernel space parameter to optimize accuracy value of indonesian batik images classification. JCS 13 (11), pp. 590–599.
  • [53] S. Tripathi and H. Mishra (2009) A rapid ft-nir method for estimation of aflatoxin b1 in red chili powder. Food control 20 (9), pp. 840–846.
  • [54] T. Tunde-Akintunde (2011) Mathematical modeling of sun and solar drying of chilli pepper. Renewable energy 36 (8), pp. 2139–2145.
  • [55] F. Van der Meer (1994) Calibration of airborne visible/infrared imaging spectrometer data (aviris) to reflectance and mineral mapping in hydrothermal alteration zones: an example from the “cuprite mining district”. Geocarto international 9 (3), pp. 23–37.
  • [56] X. Wu, S. Zhu, H. Huang, and D. Xu (2017) Quantitative identification of adulterated sichuan pepper powder by near-infrared spectroscopy coupled with chemometrics. Journal of Food Quality 2017.
  • [57] N. Yokoya and A. Iwasaki (2010) A maximum noise fraction transform based on a sensor noise model for hyperspectral data. In Proc. 31th Asian Conf. Remote Sens.