Physico-chemical properties extraction from the fluorescence spectrum with 1D-convolutional neural networks: application to olive oil

03/14/2022
by Francesca Venturini, et al.
ZHAW

The olive oil sector has a substantial impact on the Mediterranean's economy and lifestyle. Many studies try to optimize the different steps of the olive oil production process. One of the main challenges for olive oil producers is the ability to assess and control the quality during the production cycle. For this purpose, several parameters need to be determined, such as the acidity, the UV absorption or the ethyl esters content. To achieve this, samples must be sent to an approved laboratory for chemical analysis. This approach is expensive and cannot be performed very frequently, making quality control of olive oil a real challenge. This work explores a new approach based on fluorescence spectroscopy and artificial intelligence (namely, 1-D convolutional neural networks) to predict the five chemical quality indicators of olive oil (acidity, peroxide value, UV spectroscopic parameters $K_{270}$ and $K_{232}$, and ethyl esters) from simple fluorescence spectra. Fluorescence spectroscopy is a very attractive optical technique since it does not require sample preparation, is non-destructive, and, as shown in this work, can be easily implemented in small and cost-effective sensors. The results indicate that the proposed approach gives exceptional results in the quality determination and would make the continuous quality control of olive oil during and after the production process a reality. Additionally, this novel methodology has potential applications as a support for the quality specifications of olive oil, as defined by the European regulation.

1 Introduction

Determining the quality of olive oil is an expensive and complex procedure that requires a chemical analysis by accredited laboratories and an organoleptic evaluation by accredited testing panels. For producers it is thus impossible to determine olive oil quality effectively and frequently enough during the production process. This is important as the chemical composition of olive oil changes dramatically with time Gómez-Coca et al. (2016) due to, for example, storage and temperature conditions. Chemical analyses are complex, time consuming, and require expensive equipment and scientific training that are not available to producers. The chemical parameters, the procedures for their determination (methods ranging from titration to gas chromatography), as well as the limiting value for each olive oil quality class are specified in the European regulation 5 and amendment 4. These regulations provide a decision tree for verifying whether an olive oil is consistent with its declared quality class. This work describes a new technology based on a low-cost fluorescence sensor Venturini et al. (2021) and specifically designed one-dimensional convolutional neural networks Michelucci (2018) that makes the chemical analysis low-cost and possible on-site, without the need for scientific training or expensive equipment. All the chemical parameters necessary for the determination of the olive oil quality (acidity, peroxide value, $K_{270}$, $K_{232}$, and ethyl esters) can be determined with one single measurement that is done on one unprepared and undiluted oil sample in less than one second. This makes continuous olive oil quality control a reality even for small producers.

The challenge of determining olive oil quality is fundamental, as olive oil plays an important role in the cultural and culinary heritage of the Mediterranean countries, and its demand has spread in recent years to other regions of the world. The growing interest, particularly in its highest quality grade, extra virgin olive oil (EVOO), is due to its high nutritional value, its richness in bioactive molecules Serrano et al. (2021), and its importance to health due to its content of anti-inflammatory and antioxidant substances. For these reasons, EVOO is a fundamental ingredient of the dietary pattern known worldwide as the “Mediterranean diet”, which has been associated with important health benefits, such as the reduction of the prevalence of cardiovascular and metabolic diseases Uylaşer and Yildiz (2014); Fabiani (2016); Gorzynik-Debicka et al. (2018).

Fluorescence spectroscopy has attracted significant research efforts in recent years, as it offers a rapid, cost-efficient and at the same time sensitive technique to investigate the properties of vegetable oils Karoui and Blecker (2011); Kongbonga et al. (2011); Sikorska et al. (2012). Several fluorescent compounds are naturally present in olive oil, like pigments such as chlorophyll and beta-carotene, phenolic compounds, such as tocopherol, and primary and secondary oxidation products Martín-Tornero et al. (2021). These compounds are related to the quality criteria established in the European regulation. It is therefore of great importance to develop methods for extracting this physico-chemical information from fluorescence spectra. The extraction of information from the spectral data can be a difficult task depending on the type of data acquired, which may range from a single spectrum to the more complex excitation emission matrices (EEMs), synchronous scanning data Skoog et al. (2017) or near-infrared spectroscopy Yuan et al. (2020). Typical approaches consist of multivariate analysis techniques and classification methods, for example principal component analysis (PCA), partial least square regression (PLS), and PLS discriminant analysis (PLS-DA), to mention only a few.

Artificial neural networks (ANN) are known to be a useful tool, particularly because they do not require a pre-processing of the data or a dimensionality reduction Michelucci (2018). Several reviews describe the application of statistical and machine learning methods, including ANN, to the analysis and quality determination of olive oil Sikorska et al. (2014); Zaroual et al. (2021); Meenu et al. (2019); Gonzalez-Fernandez et al. (2019). Feed-forward neural networks have so far been successfully employed for classification purposes starting from fluorescence data Venturini et al. (2021), but do not offer sufficient flexibility for more complex tasks that analyse data that have some kind of spatial structure (like two-dimensional images or one-dimensional optical spectra). To address this issue, various architectures, such as vision transformers or convolutional neural networks, have been applied to the classification problem of vegetable oils Zhao et al. (2022) with fluorescence data.

A neural network architecture that is more efficient with one-dimensional input data is the one-dimensional convolutional neural network (1D-CNN), as recent works have shown for spectroscopic classification Acquarelli et al. (2017), real-time electrocardiography classification Kiranyaz et al. (2015), and chemometric analysis from, for example, near-infrared reflectance spectra and near- and mid-infrared absorption spectra Malek et al. (2018).

By drastically reducing the requirements on the measuring hardware and on the quality of the data, this work presents a novel method to extract the physico-chemical properties relevant for the quality characterization of virgin olive oil by applying a 1D-CNN to fluorescence spectra. The spectra need only be acquired with a very simple and compact sensor from undiluted and unprepared samples. To the best of our knowledge, this is the first time that all the key parameters are extracted simultaneously, without pre- and post-processing of the data, from a simple fluorescence spectrum. The limitations and further development possibilities are also discussed in the conclusions.

This paper makes four contributions. Firstly, it describes an approach to extract the five physico-chemical characteristics relevant for the determination of olive oil quality from one single fluorescence measurement that can be done with a low-cost sensor in less than a second. The method is described in detail with guidelines and criteria for the implementation. Secondly, this approach does not require technical training to use once the neural network has been trained, which highlights its high impact and applicability in the olive oil industry. Thirdly, by using a sensor based on low-cost components, this approach points to a highly probable democratisation of olive oil quality control. Finally, the method is demonstrated on a dataset of Spanish oils and shows, for the first time, that it is possible to compete in quantitative analysis with complex chemical methods, for example chromatography, using a simple and fast optical measuring method supported by one-dimensional convolutional neural networks.

2 Materials and Methods

2.1 Olive Oil Samples

In this study 22 virgin olive oils of three qualities were investigated: extra virgin olive oil (EVOO), virgin olive oil (VOO), and lampante olive oil (LOO). The oils were provided by the producer Conde de Benalúa, Granada, southern Spain, from the 2019-2020 harvest. All the samples were analyzed by accredited laboratories for the chemical and organoleptic properties according to the current European regulation 5; 4. The selected properties relevant to this study are listed in Table 1.

Label  Acidity  Peroxide value  $K_{270}$  $K_{232}$  Ethyl esters  Quality
       (%)      (mEq O2/kg)                           (mg/kg)
D03    0.35     8.4             0.123      1.435      26            VOO
D04    0.34     8.6             0.108      1.403      40            VOO
D05    0.36     10.3            0.112      1.44       18            VOO
D06    0.31     9.2             0.151      1.484      18            VOO
D07    0.50     8.9             0.150      1.537      47            VOO
D08    0.40     8.5             0.158      1.546      25            VOO
D19    0.25     4.9             0.13       1.540      10            EVOO
D20    0.26     4.6             0.14       1.540      10            EVOO
D35    0.17     6.4             0.12       1.63       8             EVOO
D38    0.16     6.4             0.12       1.63       9             EVOO
D45    0.17     4.9             0.12       1.63       7             EVOO
D46    0.18     5.0             0.13       1.63       8             EVOO
D47    0.18     5.2             0.13       1.64       16            EVOO
D49    0.9      9.9             -          -          -             LOO
D51    2.16     -               -          -          -             LOO
D52    1.78     22              -          -          -             LOO
D53    0.7      8.7             -          -          -             LOO
D64    0.2      7.1             0.13       1.63       29            VOO
D73    0.2      8.9             0.14       1.66       15            EVOO
D77    0.24     10.4            0.13       1.74       26            VOO
D81    0.16     4.9             0.12       1.63       9             EVOO
D92    0.18     5               0.17       1.91       15            EVOO
Table 1: List of the olive oil samples analyzed in this study, including selected physico-chemical characteristics. EVOO: extra virgin olive oil, VOO: virgin olive oil, LOO: lampante olive oil. A dash marks a value that was not measured.

Following the European regulation 5 and its amendments 4, the parameters used to determine the quality of the olive oil are shown schematically in Fig. 1. The same parameters are investigated in this study. Note that the parameter $\Delta K$ is not considered in this study since the measured values were almost identical in all the samples, within the experimental error of the measurements from the accredited laboratories.

Figure 1: Sequence of parameters to be analysed for the verification of olive oil quality. Adapted from 5; 4.

2.2 Instrumentation

The fluorescence spectra were taken with a sensor that has a very simple and compact design, schematically shown in Figure 2. The excitation light is provided by a UV LED, which can be exchanged. In this study, three wavelengths were investigated: 340 nm, 365 nm, and 395 nm. These excitation wavelengths were chosen because they correspond to absorption maxima in the absorption band of the fluorophores present in olive oil, such as chlorophylls Ferreiro-González et al. (2017); Torreblanca-Zanca et al. (2019); Borello and Domenici (2019). The oil samples were placed into commercial transparent 4 ml glass vials, taking care that no headspace was present to reduce oxidation. The fluorescence is collected by a miniature spectrometer (STS-Vis, Ocean Optics, USA) placed at 90° with respect to the LED to prevent the excitation light transmitted by the sample from reaching the spectrometer. Both the LED driver and the spectrometer are controlled by a Raspberry Pi. The details of the device are reported in Venturini et al. (2021).

Figure 2: Schematics of the fluorescence sensor. Blue: excitation light, red: fluorescence light.

All the measurements in this work were performed on undiluted samples. Although fluorescence in olive oil is subject to the inner filter effect Skoog et al. (2017), the problem is not relevant for the analysis discussed in this work. In fact, the fluorescence is intense enough that the strong absorption does not decrease the signal-to-noise ratio, and possible sample-dependent effects are learned and compensated by the artificial neural network model. For each olive oil sample, 20 spectra were taken, each acquired with 1 second integration time. All the spectra were acquired under identical conditions (illumination intensity, integration time, and geometry) so that the different intensities can be compared quantitatively.

2.3 Dataset Preparation

Since the oils were measured by different laboratories and are of different qualities, the amount of data available per oil varies. For example, for some LOOs such as D49 or D52, only the acidity and peroxide value were measured. If the value of a parameter is missing for an oil, that oil was not considered for the training and testing of the ANN for that parameter. Therefore, the number of oils available for the estimation of the chemical parameters depends on the parameter itself. The number of samples considered for each parameter is listed in Table 2.

Parameter       Number of samples
Acidity         22
Peroxide value  21
$K_{270}$       18
$K_{232}$       18
Ethyl esters    18
Table 2: Number of olive oil samples used for the training and testing of the CNN for each parameter.

All the spectra are normalized after the dark background is subtracted so that each of the spectra has an average of 0 and a standard deviation of 1.
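The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `standardize_spectrum`, the made-up counts, and the use of the population standard deviation (rather than the sample one) are assumptions.

```python
from statistics import fmean, pstdev


def standardize_spectrum(counts, dark):
    """Subtract the dark background, then rescale to mean 0 and std 1."""
    corrected = [c - d for c, d in zip(counts, dark)]
    mu = fmean(corrected)
    sigma = pstdev(corrected)  # assumption: population std (ddof = 0)
    if sigma == 0:
        raise ValueError("a flat spectrum cannot be standardized")
    return [(v - mu) / sigma for v in corrected]


# Made-up detector counts and dark frame, for illustration only.
raw = [12.0, 15.0, 40.0, 90.0, 35.0, 14.0]
dark = [10.0, 10.0, 11.0, 10.0, 10.0, 11.0]
z = standardize_spectrum(raw, dark)
```

Because each spectrum is rescaled on its own, this step removes the absolute intensity scale of every individual spectrum.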

2.4 Convolutional Neural Network Model

The model developed for this work is shown in Figure 3 and consists of a one-dimensional convolutional neural network (1D-CNN) with one convolutional layer, followed by a max-pooling layer and a second convolutional layer, and finally two dense layers and an output layer with one single neuron with the identity activation function. This choice was inspired by previous studies, where 1D-CNNs with two or three convolutional layers were applied to different spectroscopic data, such as reflectance spectra and Raman spectra Malek et al. (2018); Zhang et al. (2019); Liu et al. (2018). The idea behind the sequence of layers is that the first layer extracts rough data patterns, and the subsequent layers learn more high-level abstractions. A convolutional layer is characterized by the number of filters and their size. During the 1-D convolution operation, each filter is convolved across the length of the input array, computing the dot product between the filter entries and the input, producing a 1-dimensional array (called feature map) for each of the filters Michelucci (2019).

In a CNN the learnable parameters are the filters themselves, which are learned by backpropagation Michelucci (2019); LeCun et al. (1989); Gu et al. (2018).
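The sliding dot product that produces a feature map can be illustrated with a short sketch in plain Python. It assumes stride 1 and no padding, and, as is common in deep learning libraries, it does not flip the filter (strictly speaking a cross-correlation). The function name and the toy numbers are hypothetical.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel along the signal (stride 1, no padding).

    Returns the feature map: one dot product per kernel position.
    """
    k = len(kernel)
    if k > len(signal):
        raise ValueError("kernel is longer than the signal")
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k])) + bias
        for i in range(len(signal) - k + 1)
    ]


# Toy "spectrum" with a single peak and a simple slope-detecting filter.
signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
slope_filter = [-1.0, 0.0, 1.0]
feature_map = conv1d_valid(signal, slope_filter)
print(feature_map)  # [3.0, 0.0, -3.0, -1.0]
```

In a trained network the filter entries are not chosen by hand as here; they are the parameters adjusted by backpropagation.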

Figure 3: A schematic representation of the 1D-CNN used in this paper. The blue layers are convolutional ones, the green max pooling layers and the yellow marked ones are dense layers. The output layer has 1 neuron with the identity activation function.

The parameters varied and tested in this work were the number of filters in the first convolutional layer (4 and 6), the number of filters in the second convolutional layer (4 and 6), the pooling size (8 and 16), the number of epochs (5000, 10000) and the mini-batch size (8, 16 and 64).
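For orientation, the size of this search space can be enumerated directly. The sketch below only lists the combinations of the values quoted above; the dictionary keys are hypothetical names, and whether every combination was actually trained is not stated in the text.

```python
from itertools import product

# Values quoted in the text; the key names are illustrative.
grid = {
    "filters_conv1": (4, 6),
    "filters_conv2": (4, 6),
    "pool_size": (8, 16),
    "epochs": (5000, 10000),
    "batch_size": (8, 16, 64),
}

# One dictionary per hyper-parameter combination.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 48
```

With leave-one-out cross-validation, each of these configurations implies one training run per oil, which is why a small architecture is attractive here.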

The size of the filters and their number were chosen based on the characteristics of the spectra and of the system. Previous studies suggest that the number of expected features contained in the fluorescence spectra of olive oils is of the order of 4: possible examples are the height of the main fluorescence peak, its width, the area under the peak, and the area under the second fluorescence peak Torreblanca-Zanca et al. (2019); El Orche et al. (2020). For this reason, the number of filters to test was chosen to be 4 or 6. The CNN architecture selected by the hyper-parameter tuning process is expected to have a number of features used for regression (the number of feature maps, or in other words the output of the second convolutional layer) consistent with the literature. Additionally, since the spectrometer resolution is ca. 30 pixels, the size of the filters was chosen to be 40. This reflects the fact that spectral features with a bandwidth smaller than the resolution of the spectrometer are convolved with the instrument response function. Choosing a size of 40 pixels for the filters prevents the network from considering overly granular information that the spectrometer cannot resolve, with the additional positive effect that overfitting is reduced. The layers are designed to perform feature extraction, and indirectly a dimensionality reduction, so as to extract a very low number of features, by first doing max pooling and then a second convolution operation with filters of half the size of those of the first convolution. At the end, two small dense layers perform the regression that finally extracts the selected chemical parameter.
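The dimensionality reduction performed by this sequence of layers can be made concrete by tracking the length of the arrays through the network. The sketch below assumes stride 1 and no padding in the convolutions, non-overlapping max pooling, a pooling size of 8 (one of the tested values), and an input of 1024 pixels; the input length is a hypothetical value, since the text does not state it.

```python
def conv_len(n, kernel_size):
    """Length after a stride-1 convolution without padding."""
    return n - kernel_size + 1


def pool_len(n, pool_size):
    """Length after non-overlapping max pooling (remainder dropped)."""
    return n // pool_size


def feature_lengths(n_pixels, k1=40, pool=8, k2=20):
    """Array length after conv 1, max pooling, and conv 2."""
    after_conv1 = conv_len(n_pixels, k1)
    after_pool = pool_len(after_conv1, pool)
    after_conv2 = conv_len(after_pool, k2)
    return after_conv1, after_pool, after_conv2


# 1024 input pixels is an assumption made only for this example.
print(feature_lengths(1024))  # (985, 123, 104)
```

Under these assumptions, each feature map entering the dense layers is roughly ten times shorter than the input spectrum.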

2.5 Metrics, performance evaluation and validation

Two metrics are used to evaluate the model performance: the mean squared error (MSE) and the mean absolute error (MAE). The MSE was used as the loss function for the training of the neural networks Michelucci (2018), while the MAE was used to determine the prediction performance of the neural network. Indicating the expected (true) value of the parameter for the $i$-th spectrum and the predicted value from the neural network with $y_i$ and $\hat{y}_i$ respectively, the two metrics can be expressed with the following formulas:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (1)$$

where $n$ is the number of spectra composing the dataset ($n$ is the product of the 20 repetitions and the number of oils measured). Since the dataset is small, a leave-one-out cross-validation approach Michelucci and Venturini (2021) was used to determine the generalisation properties of the network. In such an approach the 20 spectra of one single oil are removed from the dataset and used for validation, while the network is trained on the spectra of all remaining oils. This procedure is repeated for each oil, therefore resulting in $N$ values of the metrics, one for each of the $N$ oils. The results reported in this paper are thus the average and standard deviation of these $N$ values. A risk of the leave-one-out cross-validation is that the neural network may simply learn to predict, for all the oils, the value of the parameter corresponding to the oil left out. Therefore, it is important to always check the training predictions to make sure that the MAE values evaluated on the training and validation datasets are comparable. For each of the $N$ folds in the leave-one-out cross-validation, two models were saved during training: the one with the lowest value of the loss function evaluated on the validation set (the left-out oil), and the one with the lowest value of the loss function on the training set. The one that showed comparable values of the MAE for the training and validation datasets was then chosen.
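The two metrics and the oil-wise splitting can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and only three oil labels from Table 1 are used to keep the example short.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_oil_out(oil_ids):
    """Yield (held_out_oil, train_indices, val_indices), one fold per oil.

    All spectra of the held-out oil go to validation together, so no
    spectrum of that oil is seen during training.
    """
    for held_out in sorted(set(oil_ids)):
        train = [i for i, oil in enumerate(oil_ids) if oil != held_out]
        val = [i for i, oil in enumerate(oil_ids) if oil == held_out]
        yield held_out, train, val


# 20 spectra per oil, as in the text; three oils for brevity.
oil_ids = [oil for oil in ("D03", "D19", "D49") for _ in range(20)]
folds = list(leave_one_oil_out(oil_ids))
print(len(folds))  # 3
```

Splitting by oil rather than by spectrum matters here: the 20 repetitions of one oil are nearly identical, so a random split over spectra would leak information from the validation oil into training.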

To choose the set of hyper-parameters (number and size of filters, pooling size, epochs, etc.), normally one would select the network parameters that give the lowest value of the chosen metric (in this case the MAE on the validation dataset). However, this approach cannot be used directly here, as there is some variability (measured by the variance of the MAE) within the results and many of the calculated averages overlap within one standard deviation. Therefore, it is important to determine whether the different models in the hyper-parameter tuning phase give results that are statistically different. This can be checked with a $t$-test, as described in detail in Appendix A. The results showed that changing the number of the filters and their size gives results that are not significantly different; therefore, following the Occam's razor decision criterion Hiroshi, the simplest network was chosen for the final runs shown in this paper. The chosen network has 6 filters of size 40 in the first convolutional layer, and 4 filters of size 20 in the second convolutional layer. A decreasing number of filters in the first and second layers (6 and then 4, respectively) was chosen to facilitate a progressive and more stable feature extraction process Michelucci (2018). Finally, a pooling size of 8 and a dropout rate of 0.5 were used.

Training for 10000 epochs consistently produced better results than training for 5000, therefore the former value was chosen. The mini-batch size did not influence the results in any discernible fashion for the model that had the lowest value of the loss function evaluated on the validation dataset, therefore the value of 64 was chosen in that case. For the models that had the lowest value of the loss function evaluated on the training dataset, a mini-batch size of 16 was chosen, as it showed the best and most stable results.

3 Results and discussion

3.1 Fluorescence spectra of olive oil

The fluorescence signals at 340 nm were very weak and are hardly detectable with the simple device used in this study. For this reason, they are not reported here. The raw fluorescence spectra of all the oils obtained with excitation at 365 nm and at 395 nm are shown in Figure 4. For clarity, the spectra are shown divided into the three quality classes EVOO, VOO, and LOO. Each curve of Figure 4 shows one single spectrum after background subtraction, without averaging or smoothing.

Figure 4: Fluorescence emission spectra of the measured olive oils divided in the quality classes EVOO, VOO and LOO. On the left: spectra obtained with excitation at 365 nm; on the right: spectra obtained with excitation at 395 nm. Each curve shows a single spectrum without averaging or smoothing.

The fluorescence spectrum of all oils is characterized by a strong intensity in the region between 650 nm and 750 nm, with an intense peak at ca. 678 nm and a weaker, broader one at ca. 722 nm, typical of chlorophyll and pheophytins Hernández-Sánchez et al. (2017); Mishra et al. (2018); Baltazar et al. (2020); Galeano Díaz et al. (2003). The strongest peak, however, shows variations in intensity and in spectral position, with shifts towards higher wavelengths that are particularly significant in LOOs. These variations are consistent with previous results Torreblanca-Zanca et al. (2019). The spectra obtained with excitation at 365 nm and 395 nm are very similar, with slightly higher fluorescence intensities for 395 nm excitation. This is consistent with the stronger absorption expected around 400 nm Torreblanca-Zanca et al. (2019); Borello and Domenici (2019). Notably, fluorescence intensity below 650 nm is present only in spectra obtained with excitation at 365 nm and is characterized by a weaker peak at ca. 525 nm, previously attributed to vitamin E Kyriakidis and Skarkalis (2000).

3.2 Artificial Neural Networks Results

To analyze the performance of the 1D-CNN, the predicted values of the parameters were first plotted against the true values. The results are illustrated in Fig. 5. The grey area in each panel marks the uncertainty on the true values due to the experimental error, calculated as average of the error reported by the accredited laboratory on the measured value. The yellow area marks the range of acceptability for EVOO.

Figure 5: Comparison of the predicted and true values for all the parameters. Panel A) acidity, panel B) peroxide value, panel C) $K_{270}$, panel D) $K_{232}$ and panel E) ethyl esters. The solid line corresponds to predictions equal to the true labels. The grey area illustrates the experimental error on the true values. The yellow area marks the range of acceptability for EVOO.

Fig. 5 panel A) shows that the 1D-CNN can predict the acidity exceptionally well, with the exception of two LOO, D51 and D52, which have values well above the 0.8% limit for EVOO. This can be easily understood due to the lack of samples from which the ANN can learn for acidity values above 1%: since the cross-validation is performed with a leave-one-out method, the ANN has only one single oil to learn from for acidity values above 1%.

Fig. 5 panel B) shows the results for the prediction of the peroxide value. Also in this case the 1D-CNN can predict the value of the parameter exceptionally well. With the exception of the LOO D52 and two other oils, all the predictions are within the average measurement error.

In panels C) and D) of Fig. 5 the predictions for the two UV-spectroscopy parameters $K_{270}$ and $K_{232}$ are shown. For these two parameters, the experimental error is much larger, which means that the labels used in the training phase are themselves affected by a larger error. The predictions nevertheless remain well within the grey area, showing that the 1D-CNN can learn also in these cases to predict both UV-spectroscopy parameters within experimental error.

Finally, panel E) shows the performance for the prediction of the ethyl esters. Here the 1D-CNN correctly predicts several oils but has more difficulties in the prediction of others. The authors attribute part of the problem to the limited number of oils, but also to the uncertainty of the labels. Differently from the other parameters, the ethyl esters measured by the accredited laboratories were reported with errors ranging from 2 to 8 mg/kg, and in some cases without error. Also, for the ANN to learn from the spectra, the parameter must possess a direct or indirect physico-chemical signature in the fluorescence. Due to the simplicity of the sensor of this study, the fluorescence signature may be insufficiently strong or clear. Nevertheless, the method described here can give a fast and inexpensive qualitative indication of the ethyl esters without the use of gas chromatography.

The analysis at 365 nm is very similar to the one performed at 395 nm, suggesting that similar information is contained in the spectra. The use of both 365 nm and 395 nm spectra was found to be more prone to overfitting without improving the prediction performance.

The results can be quantified by calculating the metric MAE and its standard deviation $\sigma$, evaluated with leave-one-out cross-validation on both the training and the validation dataset. The results for all the parameters can be found in Table 3. The table also reports the average error, calculated as the MAE divided by the true label for every single oil and then averaged over all the oils, and the label error, calculated as the experimental error divided by the true label for every single oil and then averaged over all the oils.
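The two relative quantities can be written as one small helper. This is a sketch of the calculation as described in the text; the function name and the two-oil numbers are made up for illustration and are not values from the study.

```python
def mean_relative_percent(per_oil_values, true_labels):
    """Divide each per-oil value by that oil's true label, then average.

    Passing per-oil MAE values gives the "average error"; passing the
    per-oil experimental errors gives the "label error". Both in percent.
    """
    ratios = [v / abs(y) for v, y in zip(per_oil_values, true_labels)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up example with two oils (acidity-like numbers).
true_acidity = [0.20, 0.50]
per_oil_mae = [0.02, 0.05]         # hypothetical prediction errors
per_oil_lab_error = [0.03, 0.05]   # hypothetical laboratory errors

average_error = mean_relative_percent(per_oil_mae, true_acidity)
label_error = mean_relative_percent(per_oil_lab_error, true_acidity)
```

In this toy case the average error is about 10% and the label error about 12.5%, so the prediction would sit within the laboratory uncertainty.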

Parameters                  $\mathrm{MAE}_T$  $\sigma_T$  $\mathrm{MAE}_V$  $\sigma_V$  Average error (%)  Label error (%)
Acidity (%)                 0.10              0.05        0.12              0.35        10                 8
Peroxide Value (mEq O2/kg)  1.01              0.65        1.31              3.19        12                 17
$K_{270}$                   0.008             0.003       0.010             0.013       7                  15
$K_{232}$                   0.03              0.02        0.04              0.04        2.5                13
Ethyl Esters (mg/kg)        3.1               1.6         3.6               4.3         23                 28
Table 3: Performance comparison of the different neural network architectures. indicates the accuracy, its mean, and its standard deviation evaluated over 10 different splits. T: Training, V: Validation.

Comparing the MAE for the training and for the validation shows that the chosen models were robust and did not incur the risks associated with the leave-one-out cross-validation described above. Additionally, Table 3 shows that the average error is lower (for the parameters peroxide value and ethyl esters) or much lower (for the parameters $K_{270}$ and $K_{232}$) than the experimental error from the measurements of the accredited laboratories. Only for the acidity is the average error in the prediction slightly higher than the label error.

The results illustrated in Table 3 demonstrate that from a simple fluorescence spectrum, acquired with a simple and compact sensor, it is possible to predict within the typical experimental error all the chemical parameters relevant for the quality assessment of olive oil. Note that the experimental error used in Table 3 is the average of the errors provided by the accredited laboratory, thus larger errors occur quite frequently. The values in the label error column of Table 3 should therefore be considered an optimistic evaluation of the typical error. The limitations to the performance observed in this study are due to the limited number of oils available for the training and to the distribution of the values of the parameters (most clearly seen for the acidity, where only two oils have values in the upper range). On the other hand, it must be noted that the single origin of the olive oil samples, and thus their similar chemical characteristics, makes the task of extracting the chemical parameters somewhat easier. For a more heterogeneous dataset of olive oils it is expected that a more complex architecture will be necessary, as well as a larger dataset.

4 Conclusions

The results in this paper show clearly how the proposed method could substitute a more complex chemical analysis for regular quality assessment and help olive oil producers keep the quality of their oils under continuous control. The 1D-CNN used in this work was designed to account for the sensor characteristics (e.g., resolution) and the knowledge of the problem (e.g., expected number of features in the spectrum). As a result, the method has shown a very promising performance: from the simple fluorescence spectra it is possible to predict, within the typical experimental errors, all five physico-chemical characteristics necessary for the quality assessment of olive oil. One should note that the dataset size in this study is small and, therefore, the results should only be considered as an indication of the potential of the method. A larger dataset would allow a more complete analysis of the generalisation properties of such models when applied to optical spectra of olive oils. Nonetheless, this method is extremely cheap and fast, and can be carried out by the producers themselves on-site practically without any scientific training, except knowing how to operate a computer and put oil in a vial.

The future potential of this approach is very exciting. For example, by having multiple samples from multiple years, and using meteorological information about the geographic location of production, one could correlate quality with information such as the amount of precipitation, temperature and so on. This would open the possibility of predicting quality based on external factors, probably one of the greatest challenges in the olive oil economy.

As briefly mentioned, one of the challenges to be solved in the future is the application of this approach to olive oil samples coming from different producers, different geographical locations or harvests of different years. It is to be expected that the chemical signatures in the fluorescence spectra will no longer be similar across those subgroups, making the prediction of the parameters a much greater challenge. In this case, more complex 1D-CNN architectures and larger datasets will be necessary to take into account the heterogeneities in the olive oil samples and solve this task with the same degree of performance. Preliminary results by the authors for 1D-CNNs with multiple input branches (using producer information, for example) indicate great potential in addressing this challenge. Such architectures should be able to adapt to this data complexity and should deliver a similar performance, provided that enough data is available.

Finally, this approach is of course not limited to olive oils but can be extended to other substances, making the results described here a very promising indication of what could be achieved through one-dimensional convolutional neural networks applied to optical spectra.

5 Funding

This work was supported by the projects: “VIRTUOUS” funded by the European Union’s Horizon 2020 Project H2020-MSCA-RISE-2019 Grant No. 872181; “SUSTAINABLE” funded by the European Union’s Horizon 2020 Project H2020-MSCA-RISE-2020 Grant No. 101007702; “Project of Excellence” from Junta de Andalucia-FEDER-Fondo de Desarrollo Europeo 2018. Ref. P18-H0-4700.

Appendix A Statistical testing of equivalence of averages

Given two sets of hyper-parameters, indicated here with the subscripts 1 and 2, one can test the equality of the two means of the MAE, $\overline{\mathrm{MAE}}_1$ and $\overline{\mathrm{MAE}}_2$ respectively, by using the $t$-statistic Hogg et al. (1977). The formulas used in this paper are based on the ones for confidence intervals for the difference of the means when the variances are unknown and the sample size is relatively small. Note that the $t$-statistic strictly applies only when one deals with normal distributions. In general, the MAE values from the leave-one-out cross-validation approach have an unknown distribution. However, since one is considering the average, thanks to the central limit theorem one can assume that the distribution of $\overline{\mathrm{MAE}}$ is approximated by a normal distribution (at least one that is not too skewed), and therefore the choice of this approach is justified Hogg et al. (1977). In this work the sample size $n$ is of the order of 20 (see Table 1), a number typically considered not large enough for the central limit theorem. Nevertheless, being close to the suggested value of 30, it should give a useful estimate of the statistical significance of the average difference between different sets of hyperparameters. The null hypothesis $H_0$ that the two means are equal is rejected if the observed value of

$$|t| = \frac{\left|\overline{\mathrm{MAE}}_1 - \overline{\mathrm{MAE}}_2\right|}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad (2)$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}} \qquad (3)$$

(with $n_1$, $n_2$ the sample sizes and $s_1^2$, $s_2^2$ the sample variances of the two sets of MAE values) is larger than $t_{\alpha/2}(n_1 + n_2 - 2)$ Hogg et al. (1977), that is, the value with right-tail probability of size $\alpha/2$ for the $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom, or in other words the value that satisfies $P\left(T \geq t_{\alpha/2}(n_1 + n_2 - 2)\right) = \alpha/2$, for some chosen value of $\alpha$. For this work, $\alpha = \dots$ was chosen.
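
The test described in this appendix can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes the standard pooled-variance two-sample $t$-statistic (equal but unknown variances), and the function names `pooled_t_statistic` and `means_differ`, the sample values, and the critical value taken from a $t$-table are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(mae_1, mae_2):
    """Return (t, degrees_of_freedom) for two samples of MAE values.

    Assumes the standard pooled-variance form (equal but unknown variances).
    """
    n_1, n_2 = len(mae_1), len(mae_2)
    dof = n_1 + n_2 - 2
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    pooled_var = ((n_1 - 1) * variance(mae_1) + (n_2 - 1) * variance(mae_2)) / dof
    t = (mean(mae_1) - mean(mae_2)) / (sqrt(pooled_var) * sqrt(1 / n_1 + 1 / n_2))
    return t, dof


def means_differ(t, critical_value):
    """Two-sided decision: reject equal means when |t| exceeds the critical value."""
    return abs(t) > critical_value


# Illustrative numbers only, not results from the paper.
mae_a = [1.0, 2.0, 3.0, 4.0, 5.0]
mae_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = pooled_t_statistic(mae_a, mae_b)
# 2.306 is the tabulated t value with right-tail probability 0.025
# for 8 degrees of freedom (an assumed significance level for this example).
reject = means_differ(t_value, critical_value=2.306)
print(t_value, dof, reject)
```

The critical value is passed in explicitly so that the sketch depends only on the standard library; in practice it would be read from a $t$-table or computed with a statistics package for the chosen significance level and degrees of freedom.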

References

  • J. Acquarelli, T. van Laarhoven, J. Gerretzen, T. N. Tran, L. M. Buydens, and E. Marchiori (2017) Convolutional neural networks for vibrational spectroscopic data analysis. 954, pp. 22–31. Cited by: §1.
  • P. Baltazar, N. Hernández-Sánchez, B. Diezma, and L. Lleó (2020) Development of rapid extra virgin olive oil quality assessment procedures based on spectroscopic techniques. 10 (1), pp. 41. Cited by: §3.1.
  • E. Borello and V. Domenici (2019) Determination of pigments in virgin and extra-virgin olive oils: a comparison between two near uv-vis spectroscopic techniques. 8 (1), pp. 18. Cited by: §2.2, §3.1.
  • [4] (2013) Commission implementing regulation no 1348/2013 of december 17 2013. 338, pp. 31–67. Cited by: §1, Figure 1, §2.1, §2.1.
  • [5] (1991) Commission regulation (eec) no. 2568/91 of 11 july 1991 on the characteristics of olive oil and olive-residue oil and on the relevant methods of analysis official journal l 248, 5 september 1991. 248, pp. 1–83. Cited by: §1, Figure 1, §2.1, §2.1.
  • A. El Orche, M. Bouatia, and M. Mbarki (2020) Rapid analytical method to characterize the freshness of olive oils using fluorescence spectroscopy and chemometric algorithms. 2020. Cited by: §2.4.
  • R. Fabiani (2016) Anti-cancer properties of olive oil secoiridoid phenols: a systematic review of in vivo studies. 7 (10), pp. 4145–4159. Cited by: §1.
  • M. Ferreiro-González, G. F. Barbero, J. A. Álvarez, A. Ruiz, M. Palma, and J. Ayuso (2017) Authentication of virgin olive oil by a novel curve resolution approach combined with visible spectroscopy. 220, pp. 331–336. Cited by: §2.2.
  • T. Galeano Díaz, I. Durán Merás, C. A. Correa, B. Roldán, and M. I. Rodríguez Cáceres (2003) Simultaneous fluorometric determination of chlorophylls a and b and pheophytins a and b in olive oil by partial least-squares calibration. 51 (24), pp. 6934–6940. Cited by: §3.1.
  • R. B. Gómez-Coca, G. D. Fernandes, M. del Carmen Pérez-Camino, and W. Moreda (2016) Fatty acid ethyl esters (faee) in extra virgin olive oil: a case study of a quality parameter. 66, pp. 378–383. External Links: ISSN 0023-6438, Document, Link Cited by: §1.
  • I. Gonzalez-Fernandez, M. Iglesias-Otero, M. Esteki, O. Moldes, J. Mejuto, and J. Simal-Gandara (2019) A critical review on the use of artificial neural networks in olive oil production, characterization and authentication. 59 (12), pp. 1913–1926. Cited by: §1.
  • M. Gorzynik-Debicka, P. Przychodzen, F. Cappello, A. Kuban-Jankowska, A. Marino Gammazza, N. Knap, M. Wozniak, and M. Gorska-Ponikowska (2018) Potential health benefits of olive oil and plant polyphenols. 19 (3), pp. 686. Cited by: §1.
  • J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al. (2018) Recent advances in convolutional neural networks. 77, pp. 354–377. Cited by: §2.4.
  • N. Hernández-Sánchez, L. Lleó, F. Ammari, T. R. Cuadrado, and J. M. Roger (2017) Fast fluorescence spectroscopy methodology to monitor the evolution of extra virgin olive oils under illumination. 10 (5), pp. 949–961. Cited by: §3.1.
  • [15] S. Hiroshi External Links: Link Cited by: §2.5.
  • R. V. Hogg, E. A. Tanis, and D. L. Zimmerman (1977) Probability and statistical inference. Vol. 993, Macmillan New York. Cited by: Appendix A.
  • R. Karoui and C. Blecker (2011) Fluorescence spectroscopy measurement for quality assessment of food systems—a review. 4 (3), pp. 364–386. Cited by: §1.
  • S. Kiranyaz, T. Ince, and M. Gabbouj (2015) Real-time patient-specific ecg classification by 1-d convolutional neural networks. 63 (3), pp. 664–675. Cited by: §1.
  • Y. G. M. Kongbonga, H. Ghalila, M. B. Onana, Y. Majdi, Z. B. Lakhdar, H. Mezlini, and S. Sevestre-Ghalila (2011) Characterization of vegetable oils by fluorescence spectroscopy. 2 (7), pp. 692–699. Cited by: §1.
  • N. B. Kyriakidis and P. Skarkalis (2000) Fluorescence spectra measurement of olive oil and other vegetable oils. 83 (6), pp. 1435–1439. Cited by: §3.1.
  • Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel (1989) Handwritten digit recognition with a back-propagation network. 2. Cited by: §2.4.
  • L. Liu, M. Ji, and M. Buchroithner (2018) Transfer learning for soil spectroscopy based on convolutional neural networks and its application in soil clay content mapping using hyperspectral imagery. 18 (9), pp. 3169. Cited by: §2.4.
  • S. Malek, F. Melgani, and Y. Bazi (2018) One-dimensional convolutional neural networks for spectroscopic signal regression. 32 (5), pp. e2977. Cited by: §1, §2.4.
  • E. Martín-Tornero, A. Fernández, J. M. Pérez-Rodriguez, I. Durán-Merás, M. H. Prieto, and D. Martín-Vertedor (2021) Non-destructive fluorescence spectroscopy as a tool for discriminating between olive oils according to agronomic practices and for assessing quality parameters. pp. 1–13. Cited by: §1.
  • M. Meenu, Q. Cai, and B. Xu (2019) A critical review on analytical techniques to detect adulteration of extra virgin olive oil. 91, pp. 391–408. Cited by: §1.
  • U. Michelucci and F. Venturini (2021) Estimating neural network’s performance with bootstrap: a tutorial. 3 (2), pp. 357–373. External Links: Link, ISSN 2504-4990, Document Cited by: §2.5.
  • U. Michelucci (2018) Applied deep learning - a case-based approach to understanding deep neural networks. APRESS Media, LLC. External Links: ISBN 9780198520115 Cited by: §1, §1, §2.5, §2.5.
  • U. Michelucci (2019) Advanced applied deep learning: convolutional neural networks and object detection. Springer. Cited by: §2.4, §2.4.
  • P. Mishra, L. Lleó, T. Cuadrado, M. Ruiz-Altisent, and N. Hernández-Sánchez (2018) Monitoring oxidation changes in commercial extra virgin olive oils with fluorescence spectroscopy-based prototype. 244 (3), pp. 565–575. Cited by: §3.1.
  • A. Serrano, R. De la Rosa, A. Sánchez-Ortiz, J. Cano, A. G. Pérez, C. Sanz, R. Arias-Calderón, L. Velasco, and L. León (2021) Chemical components influencing oxidative stability and sensorial properties of extra virgin olive oil and effect of genotype and location on their expression. 136, pp. 110257. External Links: ISSN 0023-6438, Document, Link Cited by: §1.
  • E. Sikorska, I. Khmelinskii, and M. Sikorski (2012) Analysis of olive oils by fluorescence spectroscopy: methods and applications. pp. 63–88. Cited by: §1.
  • E. Sikorska, I. Khmelinskii, and M. Sikorski (2014) Vibrational and electronic spectroscopy and chemometrics in analysis of edible oils. pp. 201–234. Cited by: §1.
  • D. A. Skoog, F. J. Holler, and S. R. Crouch (2017) Principles of instrumental analysis. Cengage learning. Cited by: §1, §2.2.
  • A. Torreblanca-Zanca, R. Aroca-Santos, M. Lastra-Mejias, M. Izquierdo, J. C. Cancilla, and J. S. Torrecilla (2019) Laser diode induced excitation of pdo extra virgin olive oils for cognitive authentication and fraud detection. 280, pp. 1–9. Cited by: §2.2, §2.4, §3.1.
  • V. Uylaşer and G. Yildiz (2014) The historical development and nutritional importance of olive and olive oil constituted an important part of the mediterranean diet. 54 (8), pp. 1092–1101. Cited by: §1.
  • F. Venturini, M. Sperti, U. Michelucci, I. Herzig, M. Baumgartner, J. P. Caballero, A. Jimenez, and M. A. Deriu (2021) Exploration of spanish olive oil quality with a miniaturized low-cost fluorescence sensor and machine learning techniques. 10 (5), pp. 1010. Cited by: §1, §1, §2.2.
  • Z. Yuan, L. Zhang, D. Wang, J. Jiang, P. de B. Harrington, J. Mao, Q. Zhang, and P. Li (2020) Detection of flaxseed oil multiple adulteration by near-infrared spectroscopy and nonlinear one class partial least squares discriminant analysis. 125, pp. 109247. External Links: ISSN 0023-6438, Document, Link Cited by: §1.
  • H. Zaroual, C. Chénè, E. M. El Hadrami, and R. Karoui (2021) Application of new emerging techniques in combination with classical methods for the determination of the quality and authenticity of olive oil: a review. pp. 1–24. Cited by: §1.
  • X. Zhang, T. Lin, J. Xu, X. Luo, and Y. Ying (2019) DeepSpectra: an end-to-end deep learning approach for quantitative spectral analysis. 1058, pp. 48–57. Cited by: §2.4.
  • Z. Zhao, X. Wu, and H. Liu (2022) Vision transformer for quality identification of sesame oil with stereoscopic fluorescence spectrum image. 158, pp. 113173. External Links: ISSN 0023-6438, Document, Link Cited by: §1.