Raman spectroscopy in open world learning settings using the Objectosphere approach
Raman spectroscopy in combination with machine learning has significant promise for applications in clinical settings as a rapid, sensitive, and label-free identification method. These approaches perform well when classifying data from classes that were present during the training phase. In practice, however, there are always substances whose spectra have not yet been taken or are not yet known. When the input data are far from the training set and include new classes that were not seen at the training stage, a significant number of false positives are recorded, which limits the clinical relevance of these algorithms. Here we show that these obstacles can be overcome by implementing the recently introduced Entropic Open Set and Objectosphere loss functions. To demonstrate the efficiency of this approach, we compiled a database of Raman spectra of 40 chemical classes, separating them into 20 biologically relevant classes comprising amino acids, 10 irrelevant classes comprising bio-related chemicals, and 10 classes that the neural network has not seen before, comprising a variety of other chemicals. We show that this approach enables the network to effectively identify the unknown classes while preserving high accuracy on the known ones, dramatically reducing the number of false positives, which will allow this technique to bridge the gap between laboratory experiments and clinical applications.
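The two loss functions named above can be illustrated with a minimal per-sample sketch. It follows the commonly cited formulation: the Entropic Open Set loss is ordinary cross-entropy for known samples and pushes the softmax output toward a uniform distribution for unknown samples, while the Objectosphere loss additionally penalizes the magnitude of the deep features (large for knowns, near zero for unknowns). The function names, the use of `None` as the label for an unknown sample, and the values of `xi` and `lam` are illustrative assumptions, not the authors' implementation; the sketch also assumes that the "irrelevant" classes play the role of unknown samples during training.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def entropic_open_set_loss(logits, label):
    """Per-sample Entropic Open Set loss.

    label: index of the known class, or None for an unknown sample
    (a hypothetical convention chosen for this sketch).
    """
    log_probs = log_softmax(logits)
    if label is None:
        # Unknown sample: minimized when the softmax output is uniform.
        return -sum(log_probs) / len(log_probs)
    # Known sample: standard cross-entropy.
    return -log_probs[label]

def objectosphere_loss(logits, features, label, xi=10.0, lam=0.01):
    """Entropic Open Set loss plus a feature-magnitude penalty.

    features: deep-feature vector feeding the logit layer.
    xi:  minimum feature norm requested for known samples (assumed value).
    lam: weight of the magnitude penalty (assumed value).
    """
    norm = math.sqrt(sum(f * f for f in features))
    if label is None:
        penalty = norm ** 2                  # drive unknown features toward zero
    else:
        penalty = max(xi - norm, 0.0) ** 2   # push known features outside radius xi
    return entropic_open_set_loss(logits, label) + lam * penalty
```

At inference time, a sample would typically be rejected as unknown when its maximum softmax score (optionally scaled by the feature norm) falls below a chosen threshold; that threshold is a design choice not specified in the text above.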