
Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
The accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works on generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. We validate our theoretical results on several image data sets. The numerical results verify that the expected error of trained networks, scaled by the square root of the number of classes, depends linearly on the CC. In addition, we observe a clear consistency between test loss and neural network smoothness during the training process.
05/27/2019 ∙ by Pengzhan Jin, et al.
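The abstract quantifies smoothness through the inverse of the modulus of continuity, ω_f(δ) = sup over |x - y| ≤ δ of |f(x) - f(y)|. A minimal sketch of empirical versions evaluated on a finite sample, assuming a scalar-valued f and Euclidean distance; it does not implement the paper's cover complexity or its exact definitions, and the function names are hypothetical. The inverse is taken as sup{δ : ω_f(δ) ≤ ε}, which on a finite sample equals the smallest distance between two points whose outputs differ by more than ε.

```python
import math
from itertools import combinations


def empirical_modulus(points, f, delta):
    """Largest |f(x) - f(y)| over sample pairs with dist(x, y) <= delta."""
    best = 0.0
    for x, y in combinations(points, 2):
        if math.dist(x, y) <= delta:
            best = max(best, abs(f(x) - f(y)))
    return best


def empirical_inverse_modulus(points, f, eps):
    """sup{delta : empirical_modulus(delta) <= eps} on the sample, i.e. the
    smallest distance between two points whose outputs differ by more than eps.
    Returns math.inf if no pair violates the eps bound."""
    violating = [math.dist(x, y) for x, y in combinations(points, 2)
                 if abs(f(x) - f(y)) > eps]
    return min(violating) if violating else math.inf
```

A larger inverse modulus means the function needs inputs farther apart before its outputs can differ by ε, i.e. it is smoother on the sample.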

Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks
One of the open problems in scientific computing is the long-time integration of nonlinear stochastic partial differential equations (SPDEs). We address this problem by taking advantage of recent advances in scientific machine learning and of the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for representing stochastic processes. Specifically, we propose two new Physics-Informed Neural Networks (PINNs) for solving time-dependent SPDEs, namely the NN-DO/BO methods, which incorporate the DO/BO constraints into the loss function in an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the proposed methods overcome some of the drawbacks of the original DO/BO methods: we do not need the assumption, required by the original DO method, that the covariance matrix of the random coefficients is invertible, and we can remove the assumption of no eigenvalue crossing required by the original BO method. Moreover, the NN-DO/BO methods can be used to solve time-dependent stochastic inverse problems with the same formulation and computational complexity as forward problems. We demonstrate the capability of the proposed methods via several numerical examples: (1) a linear stochastic advection equation with a deterministic initial condition, where the original DO/BO methods would fail; (2) long-time integration of the stochastic Burgers' equation with many eigenvalue crossings during the whole time evolution, where the original BO method fails; (3) a nonlinear reaction-diffusion equation, for which we consider both the forward and the inverse problem, including noisy initial data, to investigate the flexibility of the NN-DO/BO methods in handling inverse and mixed-type problems. Taken together, these simulation results demonstrate that the NN-DO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems.
05/03/2019 ∙ by Dongkun Zhang, et al.
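For orientation, the DO representation that the NN-DO method builds on can be written as follows (the standard form from the DO literature, not quoted from the abstract). The last line is only a hypothetical illustration of how the orthonormality and DO conditions might be imposed implicitly, as squared penalty terms added to a PINN loss; the paper's actual loss terms, their weights, and the BO counterpart are not reproduced here.

```latex
\begin{aligned}
u(x,t;\omega) &= \bar{u}(x,t) + \sum_{i=1}^{N} Y_i(t;\omega)\, u_i(x,t),
  \qquad \mathbb{E}[Y_i] = 0, \qquad \langle u_i, u_j \rangle = \delta_{ij},\\
\Big\langle \frac{\partial u_i}{\partial t}, u_j \Big\rangle &= 0,
  \qquad i, j = 1, \dots, N \quad \text{(DO condition)},\\
\mathcal{L}_{\text{DO}} &= \sum_{i,j=1}^{N} \big( \langle u_i, u_j \rangle - \delta_{ij} \big)^2
  + \sum_{i,j=1}^{N} \Big\langle \frac{\partial u_i}{\partial t}, u_j \Big\rangle^2 .
\end{aligned}
```

Because the constraints enter only through the loss, no explicit evolution equations for the modes (and hence no inversion of the covariance matrix of the Y_i) are required.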

Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework for Assimilating Flow Visualization Data
We present hidden fluid mechanics (HFM), a physics-informed deep learning framework capable of encoding an important class of physical laws governing fluid motions, namely the Navier-Stokes equations. In particular, we seek to leverage the underlying conservation laws (i.e., for mass, momentum, and energy) to infer hidden quantities of interest, such as velocity and pressure fields, merely from spatio-temporal visualizations of a passive scalar (e.g., dye or smoke) transported in arbitrarily complex domains (e.g., in human arteries or brain aneurysms). Our approach to solving this data assimilation problem is unique in that we design an algorithm that is agnostic to the geometry and to the initial and boundary conditions. This makes HFM highly flexible in choosing the spatio-temporal domain of interest for data acquisition as well as for subsequent training and predictions. Consequently, HFM can make predictions in cases that neither a pure machine learning strategy nor a mere scientific computing approach can reproduce. The proposed algorithm achieves accurate predictions of the pressure and velocity fields in both two- and three-dimensional flows for several benchmark problems motivated by real-world applications. Our results demonstrate that this relatively simple methodology can be used in physical and biomedical problems to extract valuable quantitative information (e.g., lift and drag forces or wall shear stresses in arteries) for which direct measurements may not be possible.
08/13/2018 ∙ by Maziar Raissi, et al.
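HFM infers hidden velocity from a transported passive scalar by penalizing the residual of the scalar's transport equation. A minimal 1D sketch of that idea under strong simplifying assumptions: a constant unknown velocity U, a known diffusivity nu (the constant NU is hypothetical), synthetic concentration data from an exact solution of c_t + U c_x = nu c_xx, and central finite differences standing in for the neural network and automatic differentiation used in HFM. Because the residual is linear in U, the least-squares estimate has a closed form.

```python
import math

NU = 0.05  # hypothetical diffusivity (plays the role of 1/Pe); assumed known


def concentration(x, t, U=1.0, k=2.0, nu=NU):
    """Exact solution of c_t + U c_x = nu c_xx, used as synthetic 'dye' data."""
    return math.exp(-nu * k * k * t) * math.sin(k * (x - U * t))


def infer_velocity(c, xs, t, h=1e-4, nu=NU):
    """Least-squares U minimizing sum over xs of (c_t + U c_x - nu c_xx)^2.

    Derivatives use central finite differences (a stand-in for autodiff)."""
    num = 0.0
    den = 0.0
    for x in xs:
        c_t = (c(x, t + h) - c(x, t - h)) / (2 * h)
        c_x = (c(x + h, t) - c(x - h, t)) / (2 * h)
        c_xx = (c(x + h, t) - 2 * c(x, t) + c(x - h, t)) / (h * h)
        # Residual r = (c_t - nu c_xx) + U c_x is linear in U.
        num += (c_t - nu * c_xx) * c_x
        den += c_x * c_x
    return -num / den
```

For data generated with U = 1.7, calling infer_velocity(lambda x, t: concentration(x, t, U=1.7), xs, 0.5) recovers U to within finite-difference error.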

Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations
While there is currently a lot of enthusiasm about "big data", useful data is usually "small" and expensive to acquire. In this paper, we present a new paradigm of learning partial differential equations from small data. In particular, we introduce hidden physics models, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time-dependent and nonlinear partial differential equations, to extract patterns from high-dimensional data generated from experiments. The proposed methodology may be applied to the problem of learning, system identification, or data-driven discovery of partial differential equations. Our framework relies on Gaussian processes, a powerful tool for probabilistic inference over functions, that enables us to strike a balance between model complexity and data fitting. The effectiveness of the proposed approach is demonstrated through a variety of canonical problems, spanning a number of scientific domains, including the Navier-Stokes, Schrödinger, Kuramoto-Sivashinsky, and time-dependent linear fractional equations. The methodology provides a promising new direction for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data.
08/02/2017 ∙ by Maziar Raissi, et al.

Machine Learning of Linear Differential Equations using Gaussian Processes
This work leverages recent advances in probabilistic machine learning to discover conservation laws expressed by parametric linear equations. Such equations involve, but are not limited to, ordinary and partial differential, integro-differential, and fractional-order operators. Here, Gaussian process priors are modified according to the particular form of such operators and are employed to infer parameters of the linear equations from scarce and possibly noisy observations. Such observations may come from experiments or "black-box" computer simulations.
01/10/2017 ∙ by Maziar Raissi, et al.
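One way to see how a GP prior is "modified according to the particular form of the operator": if u ~ GP(0, k) and f = L u for a linear operator L, then u and f are jointly Gaussian, and their covariance blocks follow by applying L to k in each argument. A sketch for the hypothetical first-order operator L u = alpha u' + beta u with a 1D squared-exponential kernel; the operator, kernel, and function names are illustrative assumptions. The parameters alpha and beta then enter the kernel like hyperparameters and can be estimated from noisy observations of u and f by maximizing the marginal likelihood.

```python
import math


def k_uu(x, xp, ell=1.0, s2=1.0):
    """Squared-exponential prior covariance of u."""
    return s2 * math.exp(-((x - xp) ** 2) / (2 * ell ** 2))


def k_uf(x, xp, alpha, beta, ell=1.0, s2=1.0):
    """Cov(u(x), f(x')) for f = alpha u' + beta u: L applied in x'."""
    k = k_uu(x, xp, ell, s2)
    return alpha * (x - xp) / ell ** 2 * k + beta * k


def k_ff(x, xp, alpha, beta, ell=1.0, s2=1.0):
    """Cov(f(x), f(x')): L applied in both arguments.

    The alpha*beta cross terms cancel because dk/dx = -dk/dx' for this kernel."""
    k = k_uu(x, xp, ell, s2)
    d2 = (1 / ell ** 2 - (x - xp) ** 2 / ell ** 4) * k  # d^2 k / dx dx'
    return alpha ** 2 * d2 + beta ** 2 * k
```

Stacking k_uu, k_uf, and k_ff over the observation points gives the joint covariance matrix whose log marginal likelihood is optimized over alpha, beta, ell, and s2.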

Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations
We introduce physics-informed neural networks: neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this second part of our two-part treatise, we focus on the problem of data-driven discovery of partial differential equations. Depending on whether the available data is scattered in space-time or arranged in fixed temporal snapshots, we introduce two main classes of algorithms, namely continuous-time and discrete-time models. The effectiveness of our approach is demonstrated using a wide range of benchmark problems in mathematical physics, including conservation laws, incompressible fluid flow, and the propagation of nonlinear shallow-water waves.
11/28/2017 ∙ by Maziar Raissi, et al.

Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems
The process of transforming observed data into predictive mathematical models of the physical world has always been paramount in science and engineering. Although data is currently being collected at an ever-increasing pace, devising meaningful models out of such observations in an automated fashion still remains an open problem. In this work, we put forth a machine learning approach for identifying nonlinear dynamical systems from data. Specifically, we blend classical tools from numerical analysis, namely multistep time-stepping schemes, with powerful nonlinear function approximators, namely deep neural networks, to distill the mechanisms that govern the evolution of a given dataset. We test the effectiveness of our approach on several benchmark problems involving the identification of complex, nonlinear, and chaotic dynamics, and we demonstrate how this allows us to accurately learn the dynamics, forecast future states, and identify basins of attraction. In particular, we study the Lorenz system, the fluid flow behind a cylinder, the Hopf bifurcation, and the glycolytic oscillator model as an example of the complicated nonlinear dynamics typical of biological systems.
01/04/2018 ∙ by Maziar Raissi, et al.
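Multistep neural networks learn the right-hand side f of dy/dt = f(y) so that observed snapshots satisfy a multistep scheme. A toy sketch of that idea: the trapezoidal rule (one member of the multistep family) with a linear model f(y) = theta * y swapped in for the deep network, so least squares on the scheme's residual has a closed form. The function name, the step size h, and the decay rate in the usage note are hypothetical.

```python
def fit_linear_rhs_trapezoidal(ys, h):
    """Fit theta in dy/dt = theta * y by least squares on the trapezoidal residual
    y[n+1] - y[n] - h/2 * (f(y[n+1]) + f(y[n])), with f(y) = theta * y
    standing in for the neural network."""
    num = 0.0
    den = 0.0
    for y0, y1 in zip(ys, ys[1:]):
        a = y1 - y0               # data part of the residual
        b = 0.5 * h * (y1 + y0)   # coefficient multiplying theta
        num += a * b
        den += b * b
    return num / den  # minimizer of sum over n of (a - theta * b)^2
```

For snapshots of y = exp(-t) sampled with h = 0.01, the fitted theta is within about 1e-5 of -1; the small bias is the trapezoidal rule's own truncation error, which is also present (in a more complex form) when f is a neural network.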

Numerical Gaussian Processes for Time-dependent and Nonlinear Partial Differential Equations
We introduce the concept of numerical Gaussian processes, which we define as Gaussian processes with covariance functions resulting from temporal discretization of time-dependent partial differential equations. Numerical Gaussian processes, by construction, are designed to deal with cases where (1) all we observe are noisy data on black-box initial conditions, and (2) we are interested in quantifying the uncertainty associated with such noisy data in our solutions to time-dependent partial differential equations. Our method circumvents the need for spatial discretization of the differential operators by proper placement of Gaussian process priors. This is an attempt to construct structured and data-efficient learning machines, which are explicitly informed by the underlying physics that possibly generated the observed data. The effectiveness of the proposed approach is demonstrated through several benchmark problems involving linear and nonlinear time-dependent operators. In all examples, we are able to recover accurate approximations of the latent solutions and consistently propagate uncertainty, even in cases involving very long time integration.
03/29/2017 ∙ by Maziar Raissi, et al.
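As an illustration of how a temporal discretization turns into a covariance function, consider the heat equation u_t = ν u_xx (an assumed example; the abstract does not name specific equations) discretized with backward Euler. Placing a GP prior with kernel k on u^n and applying the discretized operator gives the joint covariance of u^n and u^{n-1} without any spatial discretization of the second derivative:

```latex
\begin{aligned}
u^{n} &= u^{n-1} + \Delta t\,\nu\,\frac{\partial^2 u^{n}}{\partial x^2}
  \quad\Longrightarrow\quad
  u^{n-1} = u^{n} - \Delta t\,\nu\,\frac{\partial^2 u^{n}}{\partial x^2},
  \qquad u^{n}(x) \sim \mathcal{GP}\big(0, k(x,x')\big),\\
k^{n,n}(x,x') &= k(x,x'),\\
k^{n,n-1}(x,x') &= k(x,x') - \Delta t\,\nu\,\frac{\partial^2 k}{\partial x'^2}(x,x'),\\
k^{n-1,n-1}(x,x') &= k(x,x')
  - \Delta t\,\nu\left(\frac{\partial^2 k}{\partial x^2} + \frac{\partial^2 k}{\partial x'^2}\right)(x,x')
  + \Delta t^2\,\nu^2\,\frac{\partial^4 k}{\partial x^2\,\partial x'^2}(x,x').
\end{aligned}
```

Conditioning this joint prior on noisy observations of u^{n-1} yields a posterior for u^n, whose mean and variance are then reused as data for the next step, which is how uncertainty is propagated over many time steps.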

Neural-net-induced Gaussian process regression for function approximation and PDE solution
Neural-net-induced Gaussian process (NNGP) regression inherits both the high expressivity of deep neural networks (deep NNs) and the uncertainty quantification property of Gaussian processes (GPs). We generalize the current NNGP to first include a larger number of hyperparameters and subsequently train the model by maximum likelihood estimation. Unlike previous works on NNGP that targeted classification, here we apply the generalized NNGP to function approximation and to solving partial differential equations (PDEs). Specifically, we develop an analytical iteration formula to compute the covariance function of the GP induced by a deep NN with an error-function nonlinearity. We compare the performance of the generalized NNGP for function approximation and PDE solution with that of GPs and fully connected NNs. We observe that for smooth functions the generalized NNGP can yield the same order of accuracy as GP, while both NNGP and GP outperform deep NNs. For non-smooth functions, the generalized NNGP is superior to GP and comparable or superior to deep NNs.
06/22/2018 ∙ by Guofei Pang, et al.
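The abstract mentions an analytical iteration formula for the covariance of a GP induced by a deep network with erf nonlinearity. The sketch below uses the standard infinite-width recursion built on Williams' closed form E[erf(z1) erf(z2)] = (2/π) arcsin(2K12 / sqrt((1 + 2K11)(1 + 2K22))) for jointly Gaussian (z1, z2); the variance hyperparameters sigma_w2 and sigma_b2 are illustrative names. The paper's generalized NNGP with additional hyperparameters and maximum-likelihood training is not reproduced.

```python
import math


def nngp_erf_kernel(x, xp, depth, sigma_w2=1.0, sigma_b2=0.1):
    """Covariance K(x, x') of the GP induced by an infinitely wide, fully
    connected erf network with `depth` hidden layers (standard NNGP recursion)."""
    d = len(x)

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # Input-layer pre-activation covariances.
    kxy = sigma_b2 + sigma_w2 * dot(x, xp) / d
    kxx = sigma_b2 + sigma_w2 * dot(x, x) / d
    kyy = sigma_b2 + sigma_w2 * dot(xp, xp) / d
    for _ in range(depth):
        # E[erf(z1) erf(z2)] for (z1, z2) ~ N(0, [[kxx, kxy], [kxy, kyy]]).
        e_xy = 2 / math.pi * math.asin(
            2 * kxy / math.sqrt((1 + 2 * kxx) * (1 + 2 * kyy)))
        e_xx = 2 / math.pi * math.asin(2 * kxx / (1 + 2 * kxx))
        e_yy = 2 / math.pi * math.asin(2 * kyy / (1 + 2 * kyy))
        # Next layer's pre-activation covariances (all updated together).
        kxy = sigma_b2 + sigma_w2 * e_xy
        kxx = sigma_b2 + sigma_w2 * e_xx
        kyy = sigma_b2 + sigma_w2 * e_yy
    return kxy
```

Evaluating nngp_erf_kernel over all pairs of training inputs gives the GP covariance matrix used for regression in place of a hand-picked kernel.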

An Atomistic Fingerprint Algorithm for Learning Ab Initio Molecular Force Fields
Molecular fingerprints, i.e., feature vectors describing atomistic neighborhood configurations, are an important abstraction and a key ingredient for data-driven modeling of potential energy surfaces and interatomic forces. In this paper, we present the Density-Encoded Canonically Aligned Fingerprint (DECAF) algorithm, which is robust and efficient, for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed by superimposing smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior to PCA-based methods, especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the 'distance' between density fields be measured using a volume integral of their pointwise difference. This can be computed efficiently using optimal quadrature rules, which only require discrete sampling at a small number of grid points. We also experiment with the choice of weight functions for constructing the density fields and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.
09/26/2017 ∙ by Yu-Hang Tang, et al.
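A rough sketch of the density-field distance described in the DECAF abstract, under several assumptions: Gaussian smoothing kernels (the paper compares several weight functions), neighbor coordinates already expressed relative to the central atom and already aligned (the canonical-frame alignment step is omitted), a squared pointwise difference, and a simple composite midpoint rule in place of the optimal quadrature rules mentioned in the abstract. Function names and default parameters are hypothetical.

```python
import math
from itertools import product


def density(point, atoms, width=0.5):
    """Density field: sum of Gaussian smoothing kernels centered on the atoms."""
    return sum(math.exp(-math.dist(point, a) ** 2 / (2 * width ** 2))
               for a in atoms)


def density_distance(atoms_a, atoms_b, radius=3.0, n=16, width=0.5):
    """Approximate the volume integral of (rho_a - rho_b)^2 over the cube
    [-radius, radius]^3 with an n^3-point composite midpoint rule."""
    step = 2 * radius / n
    centers = [-radius + (i + 0.5) * step for i in range(n)]
    total = 0.0
    for p in product(centers, repeat=3):
        diff = density(p, atoms_a, width) - density(p, atoms_b, width)
        total += diff * diff
    return total * step ** 3
```

For two neighborhoods that differ only in one atom placed at p versus q, the result can be checked against the infinite-domain closed form 2(πw²)^(3/2)(1 - exp(-|p - q|²/(4w²))), where w is the kernel width.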

Collapse of Deep and Narrow Neural Nets
Recent theoretical work has demonstrated that deep neural networks outperform shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can typically be resolved by the rectified linear unit (ReLU) activation. However, here we show that even with such activation, deep and narrow neural networks will, with high probability, converge to erroneous mean or median states of the target function, depending on the loss. We demonstrate this collapse of deep and narrow neural networks both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing neural networks that avoid collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem.
08/15/2018 ∙ by Lu Lu, et al.
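The collapse described in the last abstract can be probed with a small Monte Carlo experiment. This sketch only illustrates the initialization-side mechanism (a randomly initialized deep, narrow ReLU net that is already constant over the input range), not the paper's probability estimates or training dynamics. Assumptions: one input and one output, zero biases, He-normal weights (standard deviation sqrt(2 / fan_in)), 21 inputs in [-1, 1], and hypothetical function names.

```python
import math
import random


def relu_net_output(x, weights):
    """Forward pass of a 1-input, 1-output ReLU network with zero biases."""
    h = [x]
    for W in weights[:-1]:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in W]
    return sum(w * v for w, v in zip(weights[-1][0], h))  # linear output layer


def random_net(width, depth, rng):
    """He-normal weights for `depth` hidden layers of size `width`."""
    sizes = [1] + [width] * depth + [1]
    return [[[rng.gauss(0.0, math.sqrt(2.0 / fan_in)) for _ in range(fan_in)]
             for _ in range(fan_out)]
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]


def collapse_probability(width, depth, trials=200, seed=0):
    """Fraction of random initializations whose output is constant on [-1, 1]."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(-10, 11)]
    collapsed = 0
    for _ in range(trials):
        net = random_net(width, depth, rng)
        ys = [relu_net_output(x, net) for x in xs]
        if max(ys) - min(ys) < 1e-12:
            collapsed += 1
    return collapsed / trials
```

With width 2 and depth 50, nearly every initialization is constant, because each layer has at least a 1/4 chance of switching off both neurons on a given side of the origin and a dead layer stays dead; with a single hidden layer of width 50, collapse essentially never occurs.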
George Em Karniadakis
Researcher at MIT Sea Grant, Professor at Brown University