Deep Neural Networks to Recover Unknown Physical Parameters from Oscillating Time Series

January 11, 2021, by Antoine Garcon et al.

Deep neural networks (DNNs) are widely used in pattern-recognition tasks for which a human-comprehensible, quantitative description of the data-generating process, e.g., in the form of equations, cannot be achieved. While doing so, DNNs often produce an abstract (entangled and non-interpretable) representation of the data-generating process. This is one of the reasons why DNNs are not extensively used in physics-signal processing: physicists generally require their analyses to yield quantitative information about the studied systems. In this article we use DNNs to disentangle components of oscillating time series and recover meaningful information. We show that, because DNNs can find useful abstract feature representations, they can be used when prior knowledge about the signal-generating process exists but is not complete, as is particularly the case in "new-physics" searches. To this end, we train our DNN on synthetic oscillating time series to perform two tasks: a regression of the signal latent parameters and signal denoising by an Autoencoder-like architecture. We show that the regression and denoising performance is similar to that of least-squares curve fits (LS-fits) with true initial guesses for the latent parameters, in spite of the DNN needing no initial guesses at all. We then explore applications in which we believe our architecture could prove useful for time-series processing in physics when prior knowledge is incomplete. As an example, we employ DNNs as a tool to inform LS-fits when initial guesses are unknown. We show that the regression can be performed on some latent parameters while ignoring the existence of others. Because the Autoencoder needs no prior information about the physical model, the remaining unknown latent parameters can still be captured, thus making use of partial prior knowledge while leaving space for data exploration and discoveries.

I Introduction

Deep neural networks (DNNs) have been successfully used in a wide variety of tasks, such as regression, classification (e.g., in image or speech recognition Maas2017 ; Ciregan2012 ), and time-series analysis. They are known for their ability to construct useful higher-level features from lower-level features in many applications; however, these feature representations frequently remain incomprehensible to humans. This property is one of the reasons why DNNs are not more widely used in physics, where the approach to data exploration is usually drastically different.

Most systems studied in physics are well described by physical models, generally referred to as equations of motion. The experimental data are analysed with respect to a particular model. In doing so, the equations of motion are analytically or numerically solved, yielding a theoretical description of the data-generating process. The resulting model generally includes a set of mathematical variables that can be adjusted to span the data. The true values of these variables are generally unknown and must be recovered; for that reason, we refer to them as latent parameters. The true latent parameters are approximated by comparing the data to the model, typically by fitting the model to the data. With this in mind, the ability of DNNs to find abstract representations of the data features, rather than a quantitative generating process, is generally seen by physicists as a limitation rather than an advantage. For that reason, DNNs are still often viewed as black boxes in physics and started to be used in the field only in recent years Guest2018 .

We find this to be a missed opportunity for the physics community. With physical models at hand, one can generate arbitrarily large volumes of synthetic data to train the DNNs, and later process real-world signals tremblay2018training . This circumvents many challenges of supervised learning, during which DNNs are trained with data for which the true latent parameters (labeled data) need to be known. Making full use of this possibility, DNNs were recently trained on synthetic nuclear magnetic resonance (NMR) spectroscopic data, simulated by accurate physical models Worswick2018 . The large amount of labeled data generated this way enables convergence of the DNN, which is then used to process real NMR data with great accuracy. A similar approach has become popular in robotics and autonomous driving.

Moreover, extensive work has been done to disentangle and make sense of DNN representations. A notable example is the β-variational autoencoder architecture Higgins2017 . Correlation loss penalties can also be used during DNN training, without prior knowledge of the data-generating process Steeg2017 ; Gao2020 . These methods consist of penalizing the DNN if its feature representation becomes entangled during training. In doing so, the DNN is encouraged to produce an efficient, disentangled feature representation. While disentangled, the representations achieved through these methods are not readily interpretable and usually require further analysis.

Nonetheless, DNNs are being increasingly used in physics data processing, in particular for signal classification, during which unusual datasets are flagged for further analysis. It was shown that autoencoders can effectively be trained on Large Hadron Collider particle-jet data to detect events or anomalies Farina2020 . In this instance, the DNN successfully increases the events' signal-to-noise ratio. Other searches in high-energy physics, including Kuusela2012 ; DAgnolo2019 , have recently been performed, also with the aim of detecting data displacement from a null hypothesis (no anomalies). All these searches seek to perform data analyses in a model-independent setting, that is, with minimal prior information or bias. More recently, DNNs have been applied to time-series processing in nano-NMR Aharon2019 . In nano-NMR settings the noise model is complex and the noise overpowers the weak signals, rendering standard data analyses inefficient. The DNN was tasked to classify signals (i.e., discriminating between two frequencies) and outperformed full-Bayesian methods.

While often achieving great success, to our knowledge most applications of DNNs in physics are geared toward classification problems. In addition, DNNs are still rarely employed for time-series analyses, although time series are the most common form of data acquired in physics experiments. In this article, we propose to use a DNN to disentangle components of monochromatic, amplitude- and frequency-modulated sine waves (AM/FM-sine waves, respectively), arguably the most prevalent forms of time-domain signals in physics. The method yields performance similar to that of more standard analyses such as least-squares curve fits (LS-fits), in which the data-generating process is assumed to be known and a least-squares regression is performed to predict the signal's latent parameters.

LS-fits, however, require the user to input latent-parameter initial guesses prior to regression. These initial guesses are the prior estimate of the true latent parameters and provide a starting point for the LS-fit gradient descent. The trained DNN, however, needs no initial guesses, thus requiring less prior information about the data-generating process. Indeed, we show that, precisely because DNNs find abstract data representations, they can be used in settings where prior knowledge exists but is not complete, as is particularly the case in "new-physics" searches Safronova2018 , thus leaving space for data exploration and discoveries.

The first part of this article describes the synthetic data that we generate and use throughout this work, i.e., monochromatic, AM- and FM-sine-wave time series, and their relevance to real-world physics experiments. We then describe our DNN architecture, which incorporates two tasks. A Regressor DNN performs a regression of the signal's latent parameters that are known to be present in the data-generating process (throughout the paper, the term regression refers to the DNN predicting the signal's latent parameters, as opposed to performing a classification). In addition, an Autoencoder Hinton2006 denoises the signals by learning an approximation of the unknown latent parameters. As a benchmarking method, we evaluate the DNN by comparing its performance to an LS-fit with true initial guesses.

We later employ the DNN in more realistic settings, when prior knowledge about the data-generating process is incomplete: LS-fit fidelity is typically highly sensitive to the initial guesses, requiring the user to perform preprocessing work or to possess prior information in order for the fit to perform optimally. As a first application, we show that the DNN can be used to predict initial guesses for the LS-fit. While consistently converging to optimal solutions, the technique circumvents the usual difficulties arising from fitting signals, such as the need for initial-guess exploration.

Next, we show that the DNN can be used when the user does not know whether the time series are monochromatic, AM- or FM-sine waves, but still wishes to recover their main frequency component. In such settings, the user is generally required to repeat the analysis while exploring the space of data-generating processes and initial guesses. Using our architecture enables the user to input only the known information when performing the analysis. That is, the Regressor is tasked to recover the user-expected latent parameters while ignoring the existence of others. Because the Autoencoder needs no prior information, it is still able to capture unknown information.

II Experimental methods

II.1 Data description and generation procedure

Figure 1: Examples of frequency-modulated (FM) sine-wave synthetic time series. Top: pure and noisy FM-sine-wave decomposition. Gaussian noise and frequency modulation are linearly added to a decaying sine-wave carrier. Bottom: random selection of noisy-input (blue) and pure-target (black) samples, illustrating the effect of the random latent-parameter selection.

The time series studied throughout the article are exponentially decaying monochromatic, FM- and AM-sine waves. Gaussian noise is linearly added to the pure signals. An example of an FM signal is shown in Fig. 1 (top) alongside its sub-components (decaying carrier, frequency-modulation signal, and noise).

Decaying monochromatic sine waves are prevalent in all fields of physics. They arise from solving the equations of motion of the two-level quantum system or of the classical harmonic oscillator, to which a multitude of other physical systems can be mathematically reduced. Notable examples include the spin-1/2 particle in a DC magnetic field, the orbital motion of planets, or RLC circuits. In information theory, the two-level quantum system also provides a complete description of the qubit. Frequency and amplitude modulation generally arise from external factors such as oscillating magnetic or electric fields applied by the experimenters. Amplitude and frequency modulation of a carrier frequency are also the most common schemes used in communication links. Some form of Gaussian noise, while not necessarily always dominant, is in general present in every real-world signal. The Gaussian-noise formalism provides an accurate description of electronic thermal noise, quantum shot noise, black-body radiation, and of white noise in general.

All time series used throughout the article are 512 s long, sampled once per second. The latent parameters used to generate the monochromatic sine waves are the carrier frequency and phase, in addition to the coherence time. The AM- and FM-sine waves are generated by adding a modulation function to the carrier; the modulation function's latent parameters are the modulation frequency and the modulation amplitude. Noise is linearly added to the pure signals by sampling a Gaussian distribution with zero mean and a given standard deviation. The carrier amplitude is normalized to unity, such that the signal-to-noise ratio is solely given by the noise standard deviation. The mathematical descriptions of the monochromatic, AM- and FM-sine waves are given in the Supplementary Materials.

Before each sample generation, the latent parameters are randomly and uniformly sampled within their respective allowed ranges, also given in the Supplementary Materials. The carrier-frequency range ensures the carrier frequency remains well within the Fourier and Nyquist limits. The modulation-amplitude range ensures the majority of the signal's power remains in its first sidebands and carrier.

Despite requiring only a handful of latent parameters to generate the samples, these ranges enable a wide scope of functions to be realized. AM/FM signals with the minimum modulation amplitude reduce to decaying monochromatic sine waves, and become strongly modulated at the maximum amplitude. The coherence-time range is wide enough to span underdamped signals up to virtually non-decaying signals. These latent-parameter ranges are wide enough to encompass many foreseeable real-world signals. A random selection of FM signals with and without noise is shown in Fig. 1 (bottom), illustrating the richness of the data in a more qualitative manner.
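
As an illustration, a minimal sketch of such a synthetic FM-sample generator is given below. The parameter names and ranges are illustrative assumptions, not the exact values used in this work (those are listed in the Supplementary Materials).

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512)                     # 512 samples, 1 s sampling interval

def make_fm_sample():
    # Illustrative latent-parameter ranges (assumptions)
    f_c   = rng.uniform(0.05, 0.4)     # carrier frequency, well within the Nyquist limit
    phi   = rng.uniform(0, 2 * np.pi)  # carrier phase
    tau   = rng.uniform(100, 5000)     # coherence time
    f_m   = rng.uniform(0.001, 0.02)   # modulation frequency
    a_m   = rng.uniform(0.0, 1.0)      # modulation amplitude
    sigma = rng.uniform(0.0, 1.0)      # Gaussian-noise standard deviation
    pure = np.exp(-t / tau) * np.sin(2 * np.pi * f_c * t
                                     + a_m * np.sin(2 * np.pi * f_m * t) + phi)
    noisy = pure + rng.normal(0.0, sigma, size=t.shape)
    latents = np.array([f_c, phi, tau, f_m, a_m, sigma])
    return noisy, pure, latents

noisy, pure, latents = make_fm_sample()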

The choice of studying monochromatic, AM- and FM-sine waves is not only motivated by their richness and prevalence in real-world physics experiments. Despite originating from different physical models and having different mathematical descriptions, these time series share similar visual features. As a result, within some range of parameters, even expert users could mistake one generating process for another. This is especially the case for weak modulations in the presence of noise, for which visual discrimination in the time or frequency domain (inspecting the spectrum) may be impossible. For all the reasons cited above, monochromatic, AM- and FM-sine waves appear to be good representative signals on which to perform our study. Nevertheless, the methods presented in this article can be applied to other types of signals as well.

Most DNN implementations require input and target data to be normalized so as to avoid exploding and vanishing gradients during training Hochreiter1998 ; NEURIPS2018_13f9896d . All signals and latent parameters are normalized to lie within the 0-to-1 range prior to the application of the DNN. The phase is mapped to two separate parameters so as to account for phase periodicity during loss computation, while keeping both targets properly normalized. All other latent parameters are normalized using their respective ranges.
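
The sketch below illustrates one possible implementation of this normalization step. It assumes min-max scaling to [0, 1] and a (sin, cos) encoding of the phase; this is a plausible choice rather than a prescription of the exact mapping used here.

import numpy as np

def normalize_signal(x):
    # Min-max scaling of a single time series to the 0-to-1 range
    return (x - x.min()) / (x.max() - x.min())

def normalize_latents(f_c, phi, tau, sigma, ranges):
    # ranges: dict of (low, high) tuples for each latent parameter
    scale = lambda v, lo, hi: (v - lo) / (hi - lo)
    return np.array([
        scale(f_c, *ranges["f_c"]),
        0.5 * (np.sin(phi) + 1.0),    # first phase component, in [0, 1]
        0.5 * (np.cos(phi) + 1.0),    # second phase component, in [0, 1]
        scale(tau, *ranges["tau"]),
        scale(sigma, *ranges["sigma"]),
    ])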

II.2 Deep neural network architecture

The latent-parameter regression and signal denoising are performed by two separate architectures described (in Python code) in the Supplementary Materials.

Denoising is performed by an Autoencoder architecture Hinton2006 composed of an Encoder followed by a Decoder. Noisy signals are first passed through the Encoder. The Encoder output layer has 64 neurons and thus produces a compressed representation of the input signal. Following this step, the Encoder output is passed through the Decoder, which decompresses the signal to its original size. This type of Encoder-Decoder architecture, called an Autoencoder, is widely used, inter alia, for data denoising Gondara2016 . As the Encoder output dimension is smaller than the dimension of the input data, the Encoder's output layer acts as an information bottleneck, or more specifically a dimensionality reduction, thus encouraging the network to capture relevant latent features while discarding noise or redundant information Hinton2006 .

The latent-parameter regression is also performed by passing the data through the Encoder. The Encoder output is then passed through a third DNN referred to as the Regressor. The output dimension of the Regressor is adjusted to the number of latent parameters that the DNN is tasked to detect.

The Autoencoder and the Regressor are trained on identical sets of samples. The Regressor's target data consist of the latent parameters, and the Autoencoder's target data are the noiseless signals. For both, the loss function is the mean squared error (MSE). The optimized architectures, shown in the Supplementary Materials, achieve sufficient performance while keeping the number of trainable parameters small enough that a typical training session, consisting of several successively generated training sets passed over a few epochs each, can be performed on a modern laptop GPU within hours. Due to the number and characteristics of the training instances, asymptotic loss is reached within a small number of epochs. In general, increasing the number of instances in the training set was more beneficial than increasing the number of epochs.
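
A minimal sketch of such a training session is shown below, assuming the unified DNN and a make_training_set helper (e.g., built from the data-generation sketch above); the set sizes, number of sets, and epoch counts are illustrative assumptions.

# Train on successive, freshly generated synthetic training sets.
n_sets, set_size, epochs_per_set = 30, 10_000, 2      # illustrative values

x_val, y_val = make_training_set(1_000)               # validation set, unseen during training
for _ in range(n_sets):
    x_train, y_train = make_training_set(set_size)    # noisy inputs, [pure signal | latents] targets
    DNN.fit(x_train, y_train,
            validation_data=(x_val, y_val),
            epochs=epochs_per_set, batch_size=64, verbose=0)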

Figure 2: Unified DNN architecture and loss description. The Encoder produces a reduced representation of the noisy input signals. The Encoder output is passed to the Regressor, which outputs the latent-parameter prediction. The Encoder and Regressor outputs are passed to the Decoder, which produces a noiseless prediction of the inputs. The Regressor and Decoder outputs are used to compute the regression and denoising losses, L_R and L_D, respectively. The loss used during backpropagation is a weighted sum of L_R and L_D using a bias parameter β. A detailed Python-code description of the architecture and loss function is given in the Supplementary Materials.

After refining the base Encoder, Regressor, and Decoder, we unify the three architectures into a single DNN such that the Autoencoder and the Regressor share the same Encoder. We find that unification is best achieved by merging them into a single DNN as depicted in Fig. 2. The Encoder output is passed through the Regressor, which predicts the signal's latent parameters. The Decoder input consists of a concatenation of the Encoder and Regressor outputs. The latent-parameter regression and signal-denoising losses (L_R and L_D, respectively) are computed simultaneously. The loss used during backpropagation is computed as a weighted sum of L_R and L_D as follows:

L = (1 − β) L_D + β L_R,     (1)

where the hyperparameter β ∈ [0, 1] adjusts the bias between the two tasks.

This architecture presents the advantage of enabling bias control via a single hyperparameter. Moreover, both networks are naturally trained at the same time rather than alternately, thus accelerating training approximately two-fold and enabling high-momentum gradient optimizers.

II.3 Training procedure


Figure 3: FM-sine-wave test-data prediction errors for varying values of β. Top: denoising loss L_D. Bottom: regression loss L_R. Setting β to 0 or 1 fully biases training toward one of the two tasks, preventing the negatively-biased task from reaching sufficient performance. Middle-range values enable both tasks to be learned simultaneously.

To illustrate the effect of the bias parameter, we train the unified DNN on identical FM-sine-wave datasets with varying values of β. For this experiment, training is performed using successive training sets of randomly generated samples over several epochs. Because the number of synthetic samples is large and the latent parameters are continuous random variables, overfitting (monitored with a validation set, unseen during training) was never an issue.

The performance of the trained DNN is evaluated using a test set of randomly generated FM samples, which were unseen during training. Figure 3 shows the test-sample losses for the denoising (top) and regression (bottom) tasks after training. Setting β = 0 fully biases training towards the denoising task, which then achieves its best performance while the parameter regression yields its worst results; vice versa for β = 1. This behaviour is also observed in Fig. S2 in the Supplementary Materials, which shows the validation losses during training. The training curves show that extremal values of β prevent validation-loss improvement of the negatively-biased task. Middle-range values enable both tasks to be learned simultaneously.

We find that the best values of β are those for which the initial β-weighted regression and denoising losses are within the same order of magnitude. As a result, determining a good value for β is a trivial task: a single forward pass is performed to obtain the initial values of L_R and L_D, and β is then computed such that the two weighted terms of Eq. (1) are equal. Regardless of the type of data (monochromatic, AM- and FM-samples), DNNs trained with this choice of β achieve good overall performance (lowest weighted total loss) and little bias towards either task. This value of β is employed throughout the entire article. For all that follows, training is always performed using successive training sets of randomly generated samples over several epochs; this is always enough to reach asymptotic loss while exhibiting no noticeable overfitting. Training can be performed on decaying monochromatic-, AM-, FM-sine waves, or a combination of all three processes.
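
A minimal sketch of this heuristic is given below, assuming the weighting of Eq. (1) and the output layout [denoised signal | latent prediction] used by the unified DNN; the helper itself is hypothetical.

import numpy as np

def choose_beta(dnn, x_init, pure_init, latents_init, signal_length=512):
    # Single forward pass on a batch of (normalized) noisy signals
    pred = dnn.predict(x_init)
    pred_signal, pred_latent = pred[:, :signal_length], pred[:, signal_length:]
    L_D = np.mean((pred_signal - pure_init) ** 2)      # initial denoising loss
    L_R = np.mean((pred_latent - latents_init) ** 2)   # initial regression loss
    # Equal weighted terms: beta * L_R = (1 - beta) * L_D
    return L_D / (L_D + L_R)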

To illustrate the architecture’s output, we train the DNN on AM-sine waves and show an example of a prediction in Fig. 4 alongside the noisy input signal. The DNN outputs a denoised prediction of the noisy AM-sine wave and a prediction of the latent parameters used to generate the signal.

Figure 4: Example of a DNN prediction for a noisy AM-sine-wave input. The DNN was trained only on AM-sine-wave samples. Top: noisy input (blue) and denoised prediction (orange). Bottom: true latent-parameter targets (blue) and prediction (orange). The phase is mapped to two separate latent variables to accommodate phase periodicity during loss computation.

III Experimental results

III.1 Performance evaluation

Figure 5: Comparison of DNN post-training performance to LS-fits with true latent-parameter initial guesses, for randomly selected decaying monochromatic sine waves from the test set (unseen during training). The denoising loss (L_D, top) and latent-parameter regression loss (L_R, bottom) are sorted by increasing noise level. The DNN was trained on monochromatic sine-wave samples. L_R is the MSE from the true latent parameters to the predicted latent parameters. For the DNN, L_D is the MSE from the true noiseless signal to the noiseless-signal prediction. For the LS-fits, L_D is computed similarly, but the noiseless-signal prediction is generated by inputting the predicted latent parameters into the noiseless data-generating process.

As a first evaluation method, we train the DNN on a random selection of decaying monochromatic sine waves (no modulation). The training, validation, and test samples are generated using random frequency, phase, coherence time, and noise levels.

After training, we evaluate the DNN performance by comparing its prediction error to an LS-fit using the Python SciPy library. When performing the LS-fit, the input data are the noisy signals and the objective function is the MSE with respect to the noiseless data-generating process. The LS-fit then produces predictions of the true latent parameters. To this end, the LS-fit requires latent-parameter initial guesses to start the gradient descent; the initial guesses used here are the true latent parameters (i.e., true frequency, phase, and coherence time). After gradient descent, the LS-fit outputs an estimation of the latent parameters, from which we generate a prediction of the noiseless signals by inputting the LS-fit latent-parameter predictions into the data-generating process. The LS-fit and DNN performance are then compared in two ways: (i) the latent-parameter regression loss L_R is the MSE from the true latent parameters for both the LS-fit and the DNN, and (ii) the denoising error L_D is the MSE from the true noiseless signals for both the LS-fit and the DNN. Note that this comparison drastically favors the LS-fit, which therefore constitutes a good benchmark method. Indeed, in any practical application the true values of the latent parameters are hidden from the user, and LS-fits are employed precisely to approximate them.
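
A minimal, self-contained sketch of this benchmark is shown below for a decaying monochromatic sine wave, assuming scipy.optimize.curve_fit and illustrative latent-parameter values.

import numpy as np
from scipy.optimize import curve_fit

t = np.arange(512)
f_c, phi, tau, sigma = 0.1, 1.0, 800.0, 0.5            # illustrative true latent parameters

def model(ts, f, p, tc):
    # Noiseless data-generating process: decaying monochromatic sine wave
    return np.exp(-ts / tc) * np.sin(2 * np.pi * f * ts + p)

pure = model(t, f_c, phi, tau)
noisy = pure + np.random.default_rng(1).normal(0.0, sigma, t.shape)

# LS-fit with the true latent parameters as initial guesses
popt, _ = curve_fit(model, t, noisy, p0=[f_c, phi, tau])

L_R = np.mean((popt - np.array([f_c, phi, tau])) ** 2)  # latent-parameter regression loss
L_D = np.mean((model(t, *popt) - pure) ** 2)            # denoising loss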

A random selection of noisy signals from the test set is processed using this method. Figure 5 shows L_D and L_R for both the DNN and the LS-fit, sorted by noise level (examples of signals with extremal noise levels, alongside LS-fit and DNN predictions, are shown in Fig. S1 in the Supplementary Materials).

Figure 6: Comparison of DNN post-training performance to LS-fits with true latent-parameter initial guesses, for identical AM-sine waves with increasing noise level, unseen during training. The denoising loss (L_D, top) and latent-parameter regression loss (L_R, bottom) are sorted by increasing noise level. The DNN was trained only on AM-sine-wave samples. See Fig. 5 for the L_D and L_R computation methods.

A similar evaluation is performed using AM samples. In this experiment, the DNN is specifically trained on AM samples. The test samples are all generated using identical latent parameters, while the noise level is increased. Examples of such samples are given in Fig. S1 in the Supplementary Materials. Figure 6 shows the prediction errors of all samples for both the DNN and the LS-fit, sorted by noise level.

For both monochromatic and AM signals, the DNN generally performs worse than the LS-fit for low-noise signals. However, the DNN reaches LS-fit performance once the noise reaches the top half of the allowed range, while requiring no initial guesses. The latent-parameter regression follows a similar trend. We note that, in general, the DNN outputs are less sensitive to noise, and the performance is more consistent throughout both datasets.

These results show that our architecture is a good alternative to LS-fits for time-series analysis, as it reaches acceptable performance when benchmarked against standard LS-fits with true initial guesses, while needing no initial guesses at all.

III.2 DNN-assisted LS-fit

Figure 7: DNN latent-parameter predictions used as initial guesses for DNN-assisted LS-fits, compared to LS-fits with true initial guesses. Top: the denoising (L_D) and latent-parameter regression (L_R) losses, sorted by increasing noise level. See Fig. 5 for the L_D and L_R computation methods. Bottom: phase and carrier-frequency predictions for the DNN-assisted fits and the LS-fits. Both methods converge to the same losses and predictions for the vast majority of the samples. The DNN and the data employed here are identical to those in Fig. 5.

We now wish to apply our DNN in more realistic settings. Fitting oscillating time series using LS-fits is notoriously difficult because the MSE is in general a non-convex function of the latent parameters and possesses numerous local minima. Consequently, the quality of the LS-fit is highly dependent on the initial guesses, in addition to the noise. In the previous experiments, LS-fits were only performed as a benchmark method, and the initial guesses were the true latent parameters. In any real-world setting, the user must perform some preprocessing work or use prior information to find initial guesses leading to the global minimum. In this section, we propose to employ the DNN as a preprocessing tool to assist LS-fits in situations where the user possesses no prior information about the initial guesses and wishes to recover the signal's latent parameters. The sine-wave samples from the previous experiment are fitted using the DNN latent predictions as initial guesses. Results of this experiment are shown in Fig. 7 alongside the results of LS-fits with true initial guesses.

Because the DNN predictions are always within the vicinity of the true parameters, almost all DNN-assisted LS-fits converge to optimal solutions. In settings where the initial guesses are unknown or the samples are numerous, the user can first train the DNN on synthetic data and then use it for DNN-assisted fits. As the latter perform optimally regardless of the noise level, this enables fast and accurate analysis of large datasets by removing the need for initial-guess exploration.
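
The sketch below illustrates one way to wire the two steps together, assuming the trained unified DNN, the model function from the benchmark sketch, and a hypothetical denormalize helper that maps the normalized DNN outputs back to physical units.

import numpy as np
from scipy.optimize import curve_fit

def dnn_assisted_fit(dnn, t, noisy, model, denormalize, signal_length=512):
    # DNN forward pass on a single noisy signal (assumed scaled as during training)
    pred = dnn.predict(noisy[np.newaxis, :, np.newaxis])[0]
    latent_pred = denormalize(pred[signal_length:])     # e.g. [f_c, phi, tau, ...] (assumed ordering)
    p0 = latent_pred[:3]                                # initial guesses for the fitted model
    popt, pcov = curve_fit(model, t, noisy, p0=p0)      # LS-fit seeded by the DNN prediction
    return popt, pcov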

III.3 Partial-information regression and denoising

Figure 8: Performance comparison of the specialized DNN (trained on AM-sine waves, tasked to denoise signals and recover all latent parameters of AM-sine waves) and of the partial DNN (trained on monochromatic, AM- and FM-sine waves, tasked to denoise signals and recover the carrier frequency, phase, coherence time, and noise level only). Top: example of a noisy AM input signal (blue), alongside the specialized- and partial-DNN denoised and latent predictions. Bottom: individual latent-parameter and signal-denoising root mean squared errors (RMSE), averaged over the whole AM-sine-wave test set, for both DNNs.

In the experiments presented above, the data-generating process was assumed to be fully known by the user, and the DNN or DNN-assisted LS-fits were employed to recover the signal latent parameters and denoise the signal. We now wish to explore the possibility of employing the DNN in a situation where the data-generating processes to be explored are multi-fold and guesses must be made. This is typically the case in "new-physics searches" experiments Safronova2018 , during which hypothetical and undiscovered particles may cause signals deviating from the null hypothesis (i.e., no new particles). As the hypothetical particles are numerous, they may have many potential effects on the signals. We consider the situation in which a potential external source could modulate a carrier signal produced by the experiment, as is sometimes the case for bosonic dark matter Garcon2019 .

Specifically, we study the case in which the end-user is aware of the existence of an oscillation in the signal provided by the experimental setup. The user does not know whether the signal is monochromatic, amplitude modulated, or frequency modulated. Nonetheless, the user wishes to recover the frequency, phase, and coherence time of the expected oscillation.

In this situation, the typical approach is to test all allowed processes by varying the LS-fit objective function and exploring the space of initial guesses for each process. This approach presents a new set of challenges, as the exploration is time consuming and sometimes unrealistic if the dataset is too large or if too many processes are to be tested. Moreover, in some situations, all guesses can be wrong.

We show that it is possible to perform the regression and denoising with partial prior information about the physical process producing the data. That is, the DNN is tasked to perform the regression only on the narrow set of latent parameters that exist across all models: frequency, phase, coherence time, and noise level; the DNN ignores any form of modulation. This is done by decreasing the number of neurons in the Regressor's output layer. The DNN is then trained on signals from every explored model (monochromatic, AM and FM), as sketched below. We refer to this DNN as the partial DNN (it ignores the existence of any particular modulation type).
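
A minimal sketch of this setup is given below; the build_dnn and make_training_set helpers and the set sizes are illustrative assumptions.

import numpy as np

latent_dim = 5   # e.g. frequency, two phase components, coherence time, noise level only

# Mix samples from all explored models, with targets restricted to the shared latents
x_mono, y_mono = make_training_set(10_000, kind="mono", latent_dim=latent_dim)
x_am,   y_am   = make_training_set(10_000, kind="am",   latent_dim=latent_dim)
x_fm,   y_fm   = make_training_set(10_000, kind="fm",   latent_dim=latent_dim)
x_train = np.concatenate([x_mono, x_am, x_fm])
y_train = np.concatenate([y_mono, y_am, y_fm])   # targets: [pure signal | shared latents]

# Same unified architecture, but with a smaller Regressor output layer
partial_dnn = build_dnn(latent_dim=latent_dim)
partial_dnn.fit(x_train, y_train, epochs=2, batch_size=64)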

After training, we compare the performance of the partial DNN to that of a specialized DNN, trained specifically on AM signals, which performs a regression of all latent parameters. Figure 8 shows the denoising and regression errors averaged over the AM-sine-wave test set for both the AM-specialized DNN and the partial DNN. The denoising task reaches the same level of precision as that of the specialized DNN. Moreover, the estimation of the carrier frequency, phase, and coherence time reaches similar performance to that of the specialized DNN.

Using this method, the user's prior information is encoded into the Regressor architecture and the training data. The Regressor then captures the expected latent parameters, thus removing the need to iteratively explore models. The Encoder and Decoder remain unchanged and are still able to capture unknown latent parameters by reproducing noiseless signals. This method enables partial prior information to be employed, while leaving space for signal exploration and unexpected discoveries.

IV Conclusion & Outlook

We have presented an efficient DNN that combines the denoising of time series and the regression of their latent parameters. The DNN was trained and evaluated on synthetic monochromatic, frequency- and amplitude-modulated decaying sine waves with Gaussian noise, some of the most prevalent forms of signals acquired in physics.

For high-noise signals, the DNN reaches the same level of precision as an LS-fit with true initial guesses, in spite of the DNN needing no guesses at all. In addition, the architecture requires no hyperparameter fine-tuning to perform consistently. Moreover, because large volumes of synthetic training data can be generated, the DNN is quickly adaptable to a broad range of physical signals. This makes our architecture a good alternative to LS-fits for analysing large volumes of data, when fitting individual signals requires too much computation or user time.

The DNN architecture is flexible and can accommodate various levels of user prior information. First, the DNN was used to assist LS-fits by predicting initial guesses unknown to the user. In this situation, DNN-assisted LS-fits consistently converge to the optimal solutions. Moreover, the regression task can be adapted to accommodate partial prior information about the data-generating process. The known latent parameters are encoded in the Regressor and the training data, while the Autoencoder helps the DNN to still capture unknown signal features, thus leaving space for data exploration and discoveries.

Because training is done on arbitrarily large volumes of synthetic data, raw performance could be improved by increasing the number of trainable parameters, for example by adding more layers or neurons, without too much concern for overfitting. The architecture itself could be augmented by adding an upstream classifier DNN module, which could identify the type of signal being analyzed. Classified signals could then be processed by specialized versions of our architecture, trained on the corresponding type of signals.

Time-domain oscillations generally appear as peaks or peak multiplets in frequency-domain spectra. Frequency, amplitude, and phase information is then localized to narrow regions of the spectral data. For that reason, we believe further improvements could be attained by making use of frequency-domain information. We suggest using Fourier transforms or power spectra as DNN inputs, in addition to the raw time series.
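
As a simple illustration of this suggestion, the sketch below computes a normalized power spectrum that could be fed to the DNN as an additional input; how it is combined with the time-domain input is left open.

import numpy as np

def power_spectrum(noisy):
    # Power spectrum of a real-valued time series, normalized like the time-domain inputs
    spectrum = np.abs(np.fft.rfft(noisy)) ** 2
    return spectrum / spectrum.max()

# A two-input model would then receive, e.g., [noisy, power_spectrum(noisy)].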

The proposed DNN architecture can be used to detect and approximate hidden features in time-series data. The Regressor outputs a prediction of the known parameters, but real signals could still contain unknown latent variables. These hidden latent variables can be detected and approximated by our DNN, as it also incorporates an Autoencoder-like structure. As such, the bottleneck layer contains a feature representation of the time series, used by the Decoder to recreate the original signal. This bottleneck layer will be further investigated in order to detect and specify hidden latent parameters.

We remain aware that in physics data analysis, a sole estimation of latent parameters often provides insufficient information. Standard analysis usually requires a quantitative estimation of the prediction uncertainty, often represented as error bars or confidence intervals. In LS-fits, this uncertainty is naturally obtained by maximizing the fit likelihood under the assumption of Gaussian-distributed latent variables Bishop2011chap1p29 . Despite extensive efforts, DNNs still lack the capacity for reliable uncertainty evaluation Kasiviswanathan2017 ; Kabir2018 ; Ding2020 , and more work needs to be done in this area to further generalize DNN usage in physics signal processing.

Nonetheless, we believe this architecture is readily applicable to existing physics experiments, in particular bosonic dark-matter searches, in which large quantities of data are to be analyzed with partial prior information.

References

  • (1) A. L. Maas, P. Qi, Z. Xie, A. Y. Hannun, C. T. Lengerich, D. Jurafsky, and A. Y. Ng, “Building DNN acoustic models for large vocabulary speech recognition,” Computer Speech and Language, vol. 41, pp. 195–213, 2017.
  • (2) D. Ciregan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3642–3649, 2012.
  • (3) D. Guest, K. Cranmer, and D. Whiteson, “Deep learning and its application to LHC physics,” Annual Review of Nuclear and Particle Science, vol. 68, pp. 161–181, 2018.
  • (4) J. Tremblay, A. Prakash, D. Acuna, M. Brophy, V. Jampani, C. Anil, T. To, E. Cameracci, S. Boochoon, and S. Birchfield, “Training deep networks with synthetic data: Bridging the reality gap by domain randomization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 969–977, 2018.
  • (5) S. G. Worswick, J. A. Spencer, G. Jeschke, and I. Kuprov, “Deep neural network processing of DEER data,” Science Advances, vol. 4, no. 8, pp. 1–18, 2018.
  • (6) I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “β-VAE: Learning basic visual concepts with a constrained variational framework,” 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, pp. 1–22, 2017.
  • (7) G. V. Steeg, “Unsupervised Learning via Total Correlation Explanation,” arXiv, pp. 5151–5155, 2017.
  • (8) S. Gao, R. Brekelmans, G. Ver Steeg, and A. Galstyan, “Auto-encoding total correlation explanation,” AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics, 2020.
  • (9) M. Farina, Y. Nakai, and D. Shih, “Searching for new physics with deep autoencoders,” Physical Review D, vol. 101, no. 7, p. 75021, 2020.
  • (10) M. Kuusela, T. Vatanen, E. Malmi, T. Raiko, T. Aaltonen, and Y. Nagai, “Semi-supervised anomaly detection - Towards model-independent searches of new physics,” Journal of Physics: Conference Series, vol. 368, no. 1, 2012.
  • (11) R. T. D’Agnolo and A. Wulzer, “Learning new physics from a machine,” Physical Review D, vol. 99, no. 1, pp. 1–34, 2019.
  • (12) N. Aharon, A. Rotem, L. P. McGuinness, F. Jelezko, A. Retzker, and Z. Ringel, “NV center based nano-NMR enhanced by deep learning,” Scientific Reports, vol. 9, no. 1, pp. 1–11, 2019.
  • (13) M. S. Safronova, D. Budker, D. Demille, D. F. Kimball, A. Derevianko, and C. W. Clark, “Search for new physics with atoms and molecules,” Reviews of Modern Physics, vol. 90, no. 2, 2018.
  • (14) G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
  • (15) S. Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107–116, 1998.
  • (16) B. Hanin, “Which neural net architectures give rise to exploding and vanishing gradients?,” in Advances in Neural Information Processing Systems (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), vol. 31, pp. 582–591, Curran Associates, Inc., 2018.
  • (17) L. Gondara, “Medical image denoising using convolutional denoising autoencoders,” IEEE International Conference on Data Mining Workshops, ICDMW, pp. 241–246, 2016.
  • (18) A. Garcon, J. W. Blanchard, G. P. Centers, N. L. Figueroa, P. W. Graham, D. F. Jackson Kimball, S. Rajendran, A. O. Sushkov, Y. V. Stadnik, A. Wickenbrock, T. Wu, and D. Budker, “Constraints on bosonic dark matter from ultralow-field nuclear magnetic resonance,” Science Advances, vol. 5, no. 10, 2019.
  • (19) C. M. Bishop, Pattern Recognition and Machine Learning, Chapter 1. Springer, 2011.
  • (20) K. S. Kasiviswanathan and K. P. Sudheer, “Methods used for quantifying the prediction uncertainty of artificial neural network based hydrologic models,” Stochastic Environmental Research and Risk Assessment, vol. 31, no. 7, pp. 1659–1670, 2017.
  • (21) H. M. Kabir, A. Khosravi, M. A. Hosen, and S. Nahavandi, “Neural Network-Based Uncertainty Quantification: A Survey of Methodologies and Applications,” IEEE Access, vol. 6, no. c, pp. 36218–36234, 2018.
  • (22) Y. Ding, J. Liu, J. Xiong, and Y. Shi, “Revisiting the evaluation of uncertainty estimation and its application to explore model complexity-uncertainty trade-off,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2020-June, pp. 22–31, 2020.

Acknowledgement

Acknowledgments: The authors wish to thank Lizon Tijus for figure rendering. Funding: This work was supported in part by the Cluster of Excellence PRISMA+ funded by the German Research Foundation (DFG) within the German Excellence Strategy (Project ID 39083149), by the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program (project Dark-OST, grant agreement No 695405), and by the DFG Reinhart Koselleck project. A.G. acknowledges funding from the Emergent AI Center funded by the Carl-Zeiss-Stiftung. Competing interests: All authors have read and contributed to the final form of the manuscript and declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions of this article are present in the article and/or the Supplementary Materials. Additional data and source code related to this article may be requested from the authors.

Appendix A Supplementary information

A.1 Data generation procedure

The time series used throughout the article are generated by propagating the time t from 0 to 511 s (512 samples) in 1 s increments, and using the following formulae:

(1)  s_mono(t) = e^(−t/τ) sin(2π f_c t + φ) + N(t)
(2)  s_AM(t)  = e^(−t/τ) [1 + a_m sin(2π f_m t)] sin(2π f_c t + φ) + N(t)
(3)  s_FM(t)  = e^(−t/τ) sin(2π f_c t + a_m sin(2π f_m t) + φ) + N(t)

where f_c and φ are the sine-wave carrier frequency and phase, respectively, τ is the coherence time, and f_m and a_m are the modulation frequency and amplitude. The noise N(t) is sampled from the Gaussian distribution with zero mean and standard deviation σ. The first and second Bessel functions of the first kind, J_1 and J_2, are used to bound the allowed modulation-amplitude range such that the majority of the signal power remains in the carrier and first sidebands. Before each sample generation, the latent parameters are randomly and uniformly sampled within their respective allowed ranges.

A.2 DNN architecture implementation

The Encoder is composed of Conv1D and MaxPooling1D layers, followed by Dense layers. The Encoder output layer has 64 neurons. The Regressor is composed of Conv1D and MaxPooling1D layers, followed by a Flatten and Dense layers. The output dimension of the Regressor is adjusted to the number of latent parameters that the DNN is tasked to detect. The Decoder is composed of Dense and [Conv1D-MaxPooling1D-UpSampling1D] layers, followed by a single Conv1D layer. The Decoder input consists of a concatenation of the Encoder and Regressor outputs.

All activation functions are rectified linear units, with the exception of the Regressor and Decoder outputs, which are linear and sigmoid functions, respectively. Pseudo-code architectures of the Encoder, Regressor, Decoder, and final DNN, as implemented in Python (Keras/TensorFlow), are given below, in addition to the custom weighted-loss function.

import tensorflow.keras.backend as K
from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, UpSampling1D,
                                     Flatten, Dense, Reshape, Concatenate)
from tensorflow.keras.models import Model

bottleneck_dim = 64  # Output dimension of the Encoder
latent_dim     = 7   # Output dimension of the Regressor (number of latent parameters)
signal_length  = 512

## ENCODER subDNN
i = Input(shape=(signal_length, 1))
x = Conv1D(64, kernel_size=64, activation='relu', padding='same')(i)
x = MaxPooling1D(4, padding='same')(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = MaxPooling1D(4, padding='same')(x)
x = Flatten()(x)
x = Dense(128, activation='relu', kernel_initializer='he_uniform')(x)
bottleneck = Dense(bottleneck_dim, activation='relu', kernel_initializer='he_uniform')(x)
encoder = Model(i, bottleneck)

## REGRESSOR subDNN
r_i = Input(shape=(bottleneck_dim,))
x = Reshape((bottleneck_dim, 1))(r_i)
x = Conv1D(64, kernel_size=64, activation='relu', padding='same')(x)
x = MaxPooling1D(4, padding='same')(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = MaxPooling1D(4, padding='same')(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = Flatten()(x)
x = Dense(256, activation='relu', kernel_initializer='he_uniform')(x)
x = Dense(128, activation='relu', kernel_initializer='he_uniform')(x)
x = Dense(64,  activation='relu', kernel_initializer='he_uniform')(x)
latent = Dense(latent_dim)(x)          # linear output for the latent parameters
regressor = Model(r_i, latent)

## DECODER subDNN
d_i = Input(shape=(bottleneck_dim + latent_dim,))
x = Dense(128, activation='relu', kernel_initializer='he_uniform')(d_i)
x = Reshape((128, 1))(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = MaxPooling1D(2, padding='same')(x)
x = UpSampling1D(4)(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = MaxPooling1D(2, padding='same')(x)
x = UpSampling1D(4)(x)
x = Conv1D(64, kernel_size=32, activation='relu', padding='same')(x)
x = MaxPooling1D(2, padding='same')(x)
x = UpSampling1D(2)(x)
decoded = Conv1D(1, kernel_size=32, activation='sigmoid', padding='same')(x)
decoder = Model(d_i, decoded)

## Unified DNN model
concat = Concatenate()([encoder(i), regressor(encoder(i))])   # Decoder input: Encoder + Regressor outputs
ae_outputs = decoder(concat)
flat_ae_outputs = Reshape((signal_length,))(ae_outputs)
DNN_outputs = Concatenate()([flat_ae_outputs, regressor(encoder(i))])  # [denoised signal | latent prediction]
DNN = Model(i, DNN_outputs)

## Custom weighted-loss function
def customLoss(yTrue, yPred):
    latentSize = 7
    SignalSize = 512
    beta = 0.001                                            # bias parameter between the two tasks
    mseSignal = K.square(yTrue[:, 0:SignalSize] - yPred[:, 0:SignalSize])
    mseSignal = K.sum(mseSignal, axis=-1) / SignalSize      # denoising MSE
    mseLatent = K.square(yTrue[:, SignalSize:] - yPred[:, SignalSize:])
    mseLatent = K.sum(mseLatent, axis=-1) / latentSize      # regression MSE
    return (1 - beta) * mseSignal + beta * mseLatent        # weighted total loss

DNN.compile(optimizer='adam', loss=customLoss)
DNN.summary()

A.3 Supplementary figures

Figure S1: Examples of minimum-noise (left) and maximum-noise (right) monochromatic sine-wave (top) and AM-sine-wave (bottom) samples. The DNN and LS-fit denoised predictions and respective residues are shown below the respective inputs. The LS-fit with true initial guesses vastly outperforms the DNN for low-noise signals, but both approaches reach similar performance for high noise.
Figure S2: Validation loss during training for various values of β. Left: signal-denoising losses. Right: latent-parameter regression losses. Training is performed on successive training sets of randomly generated FM-sine waves. For clarity, the validation loss is displayed after each individual training-set backpropagation.