Use of Machine Learning for unraveling hidden correlations between Particle Size Distributions and the Mechanical Behavior of Granular Materials

A data-driven framework was used to predict the macroscopic mechanical behavior of dense packings of polydisperse granular materials. The Discrete Element Method, DEM, was used to generate 92,378 sphere packings that covered many different kinds of particle size distributions, PSD, lying within 2 particle sizes. These packings were subjected to triaxial compression and the corresponding stress-strain curves were fitted to Duncan-Chang hyperbolic models. A multivariate statistical analysis was unsuccessful to relate the model parameters with common geotechnical and statistical descriptors derived from the PSD. In contrast, an artificial Neural Network (NN) scheme, trained with a few hundred DEM simulations, was able to anticipate the value of the model parameters for all these PSDs, with considerable accuracy. This was achieved in spite of the presence of noise in the training data. The NN revealed the existence of hidden correlations between PSD of granular materials and their macroscopic mechanical behavior.



page 11


Thermodynamics-based Artificial Neural Networks (TANN) for multiscale modeling of materials with inelastic microstructure

The mechanical behavior of inelastic materials with microstructure is ve...

FEA-Net: A Physics-guided Data-driven Model for Efficient Mechanical Response Prediction

An innovative physics-guided learning algorithm for predicting the mecha...

Learning Constitutive Relations using Symmetric Positive Definite Neural Networks

We present the Cholesky-factored symmetric positive definite neural netw...

Neural Network and Particle Filtering: A Hybrid Framework for Crack Propagation Prediction

Crack detection, length estimation, and Remaining Useful Life (RUL) pred...

A Predictive Discrete-Continuum Multiscale Model of Plasticity With Quantified Uncertainty

Multiscale models of materials, consisting of upscaling discrete simulat...

ImageMech: From image to particle spring network for mechanical characterization

The emerging demand for advanced structural and biological materials cal...

A relativistic extension of Hopfield neural networks via the mechanical analogy

We propose a modification of the cost function of the Hopfield model who...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The specific values of properties such as strength, compressibility and permeability of dry and cohesionless coarse grain materials (including sand, gravel, railway ballast or rockfill) depend on the features of the constituent particles (intrinsic properties) and on the way in which the particles are arranged (state parameters). Among the intrinsic properties of a sand, the surface friction, the compressibility and the strength of individual grains, the particle shape and particle size distributions are known to play a crucial role in its macroscopic properties Sullivan2002 ; Santamarina2004 ; Cavarretta10 ; Guida2018 . Relative density and confining pressure are the most influent state variables for dry granular soils Been1991 and govern the mechanical behavior of the material to a large extent Rowe1962 ; Thornton2000 ; Andersen2013 .

The relationship between the particle size distribution, PSD, and the mechanical behavior is not yet fully understood. On one hand, the effects of variations in the PSD are not independent from those produced by variations of other intrinsic properties or state parameters. For example, the state parameter , proposed within the theoretical framework of the critical state of sands Been1991 , helps to distinguish between the contractive or dilatant behavior exhibited by a sand upon triaxial compression. However the critical state line, and hence the value of associated to given void ratio , changes with the PSD Jiang2018 . As another example, there is a complex interplay between size and shape polydispersity, as shown by numerical modeling Nguyen2015 . On the other hand, linking single quantities (maximum and minimum dry density, critical state void ratio, macroscopic friction angle, stiffness, etc.) to a PSD is not immediate, since the latter is a highly variable curve that is many times long-tailed and/or multi-modal. Descriptors derived from the PSD are not enough to anticipate macroscopic (void ratio, stiffness, friction angle) or microscopic features (average coordination number, fraction of non-contributing particles, etc.) obtained after a given process. Neither geotechnical descriptors, such as the (i.e., the sieve size passed by percent in weight of the sample), the coefficient of curvature or the uniformity coefficient

, nor statistical descriptors (mean, variance, skewness, kurtosis, etc.) enable satisfying estimations. There is not clear procedure to work directly with the whole PSD curve. Even in the case of very idealized systems (

e.g., packings of spheres) variations of the PSD may lead to considerably differences in the fabric resulting after a packing protocol Shaebani12 ; Wiacek2014 , in the relative density Guida2018 or in the shear strength Goncu2013 . In the case of non-idealized systems this can be even worse, as several kinds of physico-chemical phenomena occur on different length and time scales. Relationships between geotechnical descriptors obtained from the PSD and geotechnical properties have been sought (e.g.Wichtmann2013 ; Monkul2016 ; Xiao2016 ), but findings are always empirical and limited to a specific set of soils and stress paths.

The use of large datasets enables promising techniques to understand how the complex behavior of granular systems can be anticipated from the microscopic features. For example, the use machine learning techniques, together with complex network theory, has allowed for the establishment of relationships between the fabric of a packing and some macroscopic geotechnical properties, such as the permeability vanDerLinden2016 ; Kamrava2020 or the effective heat transfer coefficient Fei2020 . The use of artificial neural networks, or just Neural Networks (NN), has been proposed as a potentially useful technique to model materials behavior Ghaboussi1991 ; Pernot1991 . In the case of geotechnical applications, NNs have been used for unsaturated soils (to predict the shear strength Lee2003 , to model their mechanical behavior Johari2011 and to determine the effective stress parameter Ajdari2012 ), for fine-grain soils (to predict the compression index from granulometry and index properties Park2011 ), for rocks (to predict the uniaxial compressive strength and the elastic modulus Dehghan2010 ) and for coarse-grain soils –sands and gravels– to model the mechanical behavior  Ellis1995 ; Penumadu1999 ; Banimahd2005

. The inputs for these NN approaches included both intrinsic properties and state parameters. In some of these cases the target outputs were directly some model parameters (namely, the compression index 

Park2011 , the apparent cohesion Lee2003 , the effective stress parameter Ajdari2012 or the elastic modulus and unidimensional compression strength Dehghan2010 ). For all the other cases above mentioned (i.e.Ellis1995 ; Penumadu1999 ; Banimahd2005 ), the purpose of NNs was to reproduce the stress-strain curve by anticipating new values of stress or strains obtained when some others were changed in a controlled way. The datasets were the result of a limited number of laboratory experiments (around several tens to a few hundred). Only in Park2011 , the database included data from near 1 thousand experiments. In some cases, a single stress-strain curve measured in a laboratory experiment was used to gather the data.

In this research the role played by the PSD in the mechanical behavior of an idealized system of polydisperse spheres has been investigated by means of massive numerical testing with the DEM and NNs. This approach may shed light on the mechanical behavior of dry coarse-grain soils.

There are two considerable differences with respect to the previously referenced works. On one hand, the NN is built on a dataset that was the outcome of a series of more than

thousand virtual experiments, performed with samples of varying PSD. The set of PSDs is the outcome of a systematic exploration of possible cases lying within two particle sizes. The probability and size increments used during a discretization of the sample space determined the number of PSDs to explore. We simulated all the cases to have a sufficiently large data sample, to find out how the accuracy of the estimations depends on the size of the training dataset and to know what the Probability Distribution Functions, PDF, of the target outputs for the NN are. On the other hand, the simplicity of the systems (made of elastic and frictional spheres) made possible to focus directly on the effects of the PSD in the mechanical behavior.

The proposed approach is ab initio as phenomenological laws are not used (except that for the contact mechanics interaction). Neither intrinsic parameters that cannot be defined on the grain scale (such as maximum or minimum dry density, etc.) nor state parameters related to packing features (void ratio, average coordination number, etc.) were introduced. The mechanical features of particles and the packing and compression protocols have always been the same and the only difference from one case to another is the PSD. Albeit the simplicity of the systems, non-linear and stress dependent stress-strain curves were observed (showing the typical behavior of loose sands), with non-obvious variations from one case to another. The data were fitted to the celebrated Duncan-Chang hyperbolic model Duncan1970 , which is defined by two model parameters, namely, the tangent elastic modulus and the ultimate deviatoric stress .

Thus, the proposed NN receives as input a discrete description of the PSD of a granular material at hand, and returns as output and . As it will be illustrated below, the network is able to predict the Duncan-Chang model’s parameters with a high accuracy, extremely fast, and even in the presence of noisy training data. Indeed, it proved itself to be a powerful tool for unraveling the existing correlations between PSD of granular materials and their macroscopic mechanical behavior, hidden to the naked eye.

The rest of this paper is structured as follows: Initially, the discrete element method used for the generation of virtual triaxial experiments, as well as the considered PSDs and the obtained results are described in Section 2; secondly, in Section 3, we present the basic principles of artificial neural networks, together with the design of the networks used in this work and their training process; the results obtained with the NNs are presented and discussed in Section 4, as well as a study of the amount of required data to train them and their robustness with respect to noisy data; finally, conclusions are drawn in Section 5.

2 Massive DEM triaxial testing

The discrete element method Cundall1979 , DEM, has been proven to be a very effective tool for the study of the macroscopic mechanical behavior of granular materials under drained Thornton2000 ; ng2004triaxial ; kozicki2009numerical ; salot2009influence ; kozicki2014discrete ; Zhou2017 ; Xie2017 and undrained Gong12 ; Jiang2018b ; Salimi2020 triaxial or biaxial compression. In this work the DEM is used to perform virtual drained triaxial tests for a large number of sphere packings with different PSDs. In what follows we describe the model used for carrying out such simulations, as well as the obtained results.

2.1 Numerical setup

We performed DEM simulations of triaxial compression tests on samples made of particles following varying PSDs. The different PSDs used in each case were selected according to a systematic exploration described as follows: Particle diameters ranged between m and m; this interval was divided into equal size bins with , and . The central size of each bin is . The expected percentage in mass of the particles within each size bin is denoted as . We consider that is a discrete variable that can take values from to and spaced by . All possible combinations satisfying are considered. This procedure led to the cases of PSDs that were subsequently used in the triaxial tests.

Once all the PSDs were defined, a random sample of particles was generated for each of them. The mass of the particles was uniformly distributed in each bin. The considered set of PSDs includes very different kinds of granular systems: Monodisperse, well graded, gap-graded multimodal distributions, etc. A few of them, which could be more recognizable by readers, have been particularly considered for illustrative purposes. These special PSDs are labeled and shown in Fig. 


(a) Special particle size distributions
(b) Strain-stress virtual curves (markers) and Duncan-Chang model adjustment (continuous line)
Figure 1: special PSDs (out of ) were selected for illustrative purposes. The upper figures show the cumulated percentage passings. The figures below show the stress-strain curves obtained through virtual testing

For each PSD, a sample was generated by randomly locating a loose cloud of around spherical particles within a cubic box (Fig. 2(a)). We imposed periodic boundary conditions and then the cubic box was isotropically shrunk to achieve a dense packing under isotropic compression conditions kPa, where , and are the principal stresses (Fig. 2(b)). Then the stress was kept in 2 perpendicular directions ( and ), while the sample was shortened in the third perpendicular direction until reaching a unit strain (Fig. 2(c)).

(a) Initial state
(b) Isotropic compression state
(c) Final state
Figure 2: 3D Models of YADE-DEM illustrating the steps of numerical experiments: 1) A random loose cloud of around particles is located within a box; 2) the simulation box is reduced to achieve a packing that is in equilibrium under isotropic stress; and 3) the simulation box is reduced in one direction while the stress is maintained in 2 perpendicular directions.

The corresponding average stress was measured at several strain levels. The deviatoric stress-strain curve, vs , was registered and fitted to a Duncan-Chang hyperbolic model Duncan1970 , which is defined by 2 model parameters, namely, the tangent elastic modulus, and the ultimate deviatoric stress :


A few examples of generated curves (corresponding to the special PSDs) can be seen in Fig. 1(b).

2.2 Numerical model

We used the DEM implemented in YADE-DEM yade Particles behave as rigid solids that obey the laws of classical mechanics. The interaction between particles is produced through a soft contact model. In particular, we used a simple linear elastic and frictional contact law. This is a common choice in DEM simulation Cundall1979 ; Herrmann1998 . Normal forces between particles are thus computed as


where is the normal force exerted by particle on particle , is the distance overlap, and are the particles’ radii,

is their relative position vector,

is its associated unit vector, and is the normal contact stiffness. In this model, was related to the Young’s modulus of the material, GPa, as .

If two particles that were previously in contact (i.e., ) are displaced in a direction perpendicular to , an opposite shear force appears. Shear forces are limited by the inter-particle friction:


where is the shear force exerted by particle on particle , is the total tangential displacement of the contact, radians is the inter-particle friction angle and is the shear stiffness.

The density of particles kg/m (as the size of the particles and the stiffness) was scaled to reduce the collision time and therefore the critical timestep used in the explicit integration of the equations of motion. The maximum strain rate imposed during the triaxial compression was fixed according to this critical timestep and updated on the fly to speedup simulations. A numerical damping was used to dissipate the kinetic energy. Details can be found in yade .

Figure 3: Coefficient of variation of the parameters for the Duncan-Chang hyperbolic model as a function of the number of particles in the sample. Samples followed the uniform PSD in Fig.1(a). The experiment was repeated 15 times for each number of particles

2.3 Precision and performance

As the generated samples include a finite number of particles, the computed stress-strain curves for a single PSD may fluctuate around the expected values. Accordingly, the values of the Duncang-Chang model parameters obtained from a single DEM triaxial test, are only punctual estimations, , , which are generally different from the expected values, and . There are several reasons for this variability: The size of the particles used in each simulation is randomly chosen according to the PSD, particles are randomly located within the simulation box and the system is chaotic. In any case, the larger the sample, the smaller the fluctuation. The expected variability of measurements was assessed through a series of virtual triaxial tests. These tests were performed with samples made of varying number of particles but that always followed the same PSD (the uniform PSD in Fig. 1). The experiment was repeated times for each number of particles to gather a statistical sample of and values. A coefficient of variation was defined for each model parameter as (where

is the sample standard deviation,

is the sample mean and stands for measurement). Results are shown in Fig. 3. In order to achieve a good compromise between accuracy and computational cost, the size of samples was limited to around particles in the DEM experiments used to train the NN. With this number of particles the of and are expected to remain around and , respectively (see Fig. 3).

The numerical experiments were computed using the version 2018.02b of YADE-DEM yade , running on Ubuntu 18.04.4 64 bits, on a server machine with four processors Intel Xeon Gold 6148 GHz, with 20 physical cores each, and 1 TB of RAM memory. As a rough estimation, each single DEM simulation took on average 1 hour and 20 minutes on a single core. Therefore, the total computation time for processing the samples was around days. In order to speed up the process, many computer cores were used for running multiple independent simulations in parallel. Thus, the total process time was reduced to 4 and a half months of computation, approximately.

2.4 Virtual triaxial testing results

The samples were virtually subjected to triaxial compression. The corresponding stress-strain curves presented the typical behavior of loose sands. A good matching between each series of data and a Duncan-Chang hyperbolic curve was achieved. The values of obtained from DEM after a flat sampling over the set of PSDs, are distributed as shown in Fig. 4(a). The sample mean is , its standard deviation is and the maximum and minimum values are and , respectively. The coefficient of variation of this problem quantifies how the expected value of a specific PSD may separate from the mean value across all the PSDs. Regarding the tangent elastic modulus, the coefficient of variation is . With respect to the values of obtained from DEM, the distribution is shown in Fig. 4(b), the sample mean is , its standard deviation is () and the maximum and minimum values are and , respectively.

(a) Histogram of
(b) Histogram of
(c) vs. 
Figure 4: Histograms of and values obtained from virtual triaxial testing with the set of PSDs explored, and variation between both values. The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2

These results evidence that variations of the Duncan-Chang model parameters can be found depending on the PSD. Unfortunately, there is no sign of correlation between the two model parameters (see Fig. 4(c)). In addition, they neither correlate to the set of inspected statistical or geotechnical descriptors described in Table 1, as it can be observed in Figs. 5 and 6, respectively.

Descriptor Symbol Definition
Geotechnical descriptors
Uniformity coefficient
Coefficient of curvature
Statistical descriptors
Expected value
Standard deviation
Excess Kurtosis
Table 1: Set of descriptors used to relate the parameters of the Duncan-Chang model to the PSD.

For the 9 special PSDs included in Fig. 1, the values of the considered statistical and geotechnical descriptors are gathered in Table 2 and also shown in Figs. 5 and 6.

(a) vs. particles’ mean diameter
(b) vs. standard deviation
(c) vs. skewness
(d) vs. excess kurtosis
Figure 5: Variation of , obtained from virtual triaxial testing, compared to different statistical descriptors, namely the mean diameter of particles, the standard deviation, skewness and excess kurtosis of the PSD (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2
(a) vs. curvature
(b) vs. uniformity
Figure 6: Variation of , obtained with virtual triaxial testing, with geotechnical descriptors of the PSD, namely the curvature and the uniformity (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2
PSD Skew. Kurt.
Table 2: Values of , , obtained with virtual triaxial simulations, and other indicators for the cases in Fig. 1. The descriptors included in the table, correspond to the mean , standard deviation , skewness and excess kurtosis of the particle diameters; and the curvature and uniformity coefficients (geotechnical indicators). and . These values are presented graphically in Figs. 4, 5 and 6

In the light of these results, to establish relationships between PSD descriptors and Duncan Chang model parameters does not seem feasible.

3 Artificial neural networks

Artificial neural networks, or simply Neural Networks (NN), are biologically inspired computing systems able to learn from data. Data abundance, together with increasing computing power, are probably the two main factors behind the great success of these algorithms and their exponential growth during the last decade, despite the fact that their origin dates back to the early 40s of 20th century haykin1994neural . Artificial neural networks, together with other machine learning techniques, have been proven very successful tools for tackling tasks as image recognition, language processing or financial forecasting, to name just a few. Beyond doubt, machine learning in general, and neural networks in particular, are powerful tools for untangling complex patterns on large datasets.

Motivated by the apparent lack of correlation between PSD descriptors and the Duncan Chang model parameters evidenced in the previous section, in this work we present, as an accurate alternative, the use of NNs for inferring the macroscopic mechanical behavior of polydisperse granular packings. As it will be seen in the results presented in Section 4, this tool will help us to find hidden connections between the particle size distribution of spherical packings and their macroscopic mechanical behavior.

3.1 The multilayer perceptron

One of the most simple and commonly used NN architectures is the Multi-Layer Perceptron (MLP). The MLP can be seen as a non-linear function that maps input data to output data. It consists of several layers: One input layer, one or more (intermediate) hidden layers, and one output layer. The input information is feed-forwarded from the input layer, through all the intermediate layers, up to the output layer. Each layer is composed of one or more nodes (or neurons) that are the basic computational units (see Fig. 


Figure 7: Multilayer perceptron architecture

At each layer, the neurons are fed with the output generated by the neurons of the previous layer, they process the data, filter it through a non-linear activation function, and produce new output values that feed the neurons of the next layer (if any). The presence of non-linear activation functions grants NNs the ability of approximating non-trivial functions. Indeed, as stated by the universal approximation theorem

cybenko1989 , feed-forward NNs with a single (finite) hidden layer and differentiable activation functions, can approximate any continuous function; and in the case of two hidden layers or more, any function cybenko1988 .

Let us describe how a MLP generates output values from given input. Let be the number of layers in a NN, such that and , and let be the number or neurons of the -th layer, with , where the layer corresponds to the input layer and the -th layer is the output one. We denote as the input vector of the -th layer and, accordingly, is the output of that layer and the input of the next one. Thus, are the input values of the network and are the output ones. Starting from input vector , the values of layers to are computed through the recursive expression:


where and

are the weights matrix and bias vector,

is a matrix-vector product that results in a vector of length and

is the non-linear activation function. Among others, the Rectified Linear Unit (ReLU) function

nair2010rectified , defined as , is one of the most commonly used activation functions. In the case in which is a vector, as it is the case of Eq. (4), is applied to each vector component independently.

On the other hand, the coefficients of the weights matrix and the bias vector

are a collection of trainable parameters that describe the NN. Those parameters, initially unknown, are determined by means of a process known as training. The goal of the training is to find a set of values for those parameters that leads to an accurate input-output mapping of the network for the training dataset (a subset of the available input-output samples). Finding the locally optimal parameters is a minimization process of a (loss) function that measures the distance (in a certain norm) between the known output sample values and the ones predicted by the network. The mean squared error norm, used in this work, is one of the most commonly used loss functions. Such optimization is commonly carried out by means of gradient-based iterative algorithms, like the ones of the family of stochastic gradient descendent methods, as it is the case of Adam

kingma2014adam .

For an in-depth discussion of MLPs and other NN architectures we refer the interested reader to haykin1994neural ; Goodfellow-et-al-2016 .

3.2 NNs for predicting Duncan-Chang model’s parameters from PSDs

In this work we used MLP networks for predicting the parameters of the Duncan-Chang’s model from a discrete description of the particle size distribution of a given spherical packing. The definition, training and evaluation of the NNs was implemented using TensorFlow

tensorflow2015-whitepaper . Thus, as it can be seen in Fig. 7, the designed network receives as input the ten PSD related values , defined in Section 2.1, and returns as output and . Therefore, the input and output layers present 10 and 2 neurons, respectively.

The best network architecture (number of hidden layers and neurons) for the problem at hand is unknown a priori. Thus, in order to choose a good architecture, we systematically explored network configurations with different number of hidden layers and neurons per layer. In the results presented in Section 4, networks with 1, 2, 3 and 4 hidden layers and 8, 16, 32 or 64 neurons each (16 different architectures) were considered. For all of them, the ReLU activation function was used for all the layers, including the output one.

The available virtual triaxials dataset ( samples), was divided into three separated groups: the test dataset, composed of of the total samples available (); the cross-validation dataset, of the total samples (); and the test dataset, constituted by the remaining samples (). These three datasets were chosen randomly, nevertheless, they remain constant along the different analyses performed. Whereas the test dataset was used for training the NNs, the cross-validation dataset helped us to compare the networks’ performance and verifying the absence of undesired overfitting effects during the training process. Finally, the test dataset was used for measuring the accuracy of the chosen networks when predicting a series of cases that were not used during the training process.

The training process of all the networks was carried out using Adam kingma2014adam

with 1000 epochs (training iterations through the whole test dataset). And, in order to speedup the training process, the input and output data were previously normalized. Three different learning rates were considered for Adam, namely

. The network’s training is an inherently random process for two main reasons: Adam is by definition a stochastic algorithm in which the training samples are processed in a random order at each iteration; and the network’s weights are randomly initialized. Thus, in order to overcome these sources of randomness, each one of the 16 network architectures was trained 5 times for each learning rate.

After this training process, the network with the lowest loss function value for the cross-validation dataset was chosen. No large differences were observed among the different architectures, nevertheless, a NN with a single hidden layer and 32 neurons on that layer presented a slightly better performance (network NN1 in Table 3).

Name # hidden layers # neurons per h. layer
NN1 1 32 100
NN2 4 8 10
NN3 4 8 5
NN4 1 8 1
NN5 1 8 0.5
NN6 1 8 0.2
NN7 1 8 0.1
Table 3: Different neural networks architectures considered in this work. Each NN consists of a different number of (#) hidden layers and neurons per hidden layer, while the input and output layers have 10 and 2 neurons, respectively. Each network in the table was trained with a different number of samples ( denotes the % of the full test dataset)

4 Prediction of Duncan-Chang model parameters through neural networks

The ability of the NNs described in Section 3.2 to predict the values of and from PSD information, is discussed in this section. In order to assess the prediction ability we consider discrepancies between NN predictions and DEM measurements for the same PSD. The relative discrepancy for the model parameter ( or ) associated to a PSD is evaluated according to:


where is the DEM measurement and is the NN estimation.
In contrast to discrepancies, errors are defined with respect to the expected value of each model parameter, ( or ), associated to a PSD. However the expected values are usually unknown. They would be obtained with an infinitely large sample or by averaging over many random realizations of the triaxial test with packings following the same PSD.

As mentioned above, a randomly chosen test dataset of of the sample cases ( out of ) was used for testing the network’s accuracy. These data are new to the network, in the sense that they were used neither during the training nor the cross-validation processes. As presented below, different networks were trained using varying number of samples, nevertheless, the test dataset used for evaluating the networks’ performance was kept constant along all the cases considered.

4.1 Neural network ability to predict the Duncan-Chang model parameters

Let us consider the network NN1 (see Table 3), trained using the full test dataset (the 80% of the cases, see Section 3.2). For this specific NN the corresponding distributions of relative discrepancies within the test dataset are shown in Fig. 8.

(a) Relative discrepancy
(b) Relative discrepancy
Figure 8: Histogram of the relative discrepancies between NN and DEM in the estimation of Duncan-Chang model parameters when 80% of the experiments were used to train the network. Results obtained with the network NN1 (see Table 3)

The NN1 anticipated values with discrepancies that fluctuated around 0. The standard deviations were and .

For the sake of comparison, if the outcomes of the NN had been the sample means or random values, then the average discrepancies would have been considerably higher. We checked this by using random estimations that followed either the observed probability distribution functions, PDFs, of and (Figs. 4(a) and 4(b)), or that followed uniform distributions lying between and (or between and ). This is summarized in Table 4. These results evidence the ability of the NN to predict the model parameters.

NN1 Expected values Observed PDFs Uniform PDFs
Table 4: Comparison of the standard deviation of discrepancies with DEM for NN1 predictions and some random estimations

To correctly assess the accuracy of the NN, it is worth recalling that the data are pretty noisy (DEM measurements with a coefficient of variation for measurements of and , as seen in Section 2.3). It is also important to mention that the PDFs of and , when all PSDs are considered, are bell-shaped (cf. Figs. 4(a) and 4(b)) with and , respectively. Thus, despite the narrow margin left by the measurement precision and the distribution of values associated to this problem, the NN anticipated the Duncang-Chang model parameters from the PSD with the same accuracy than the precision of the DEM experiments with which it was trained.

This fact unveils the existence of hidden correlations between the PSD and the macroscopic mechanical behavior of granular materials, that are encoded in the NN, and, in the light of the results presented Section 2.4, seemed hidden. This result is even more interesting taking into account the fact that the DEM data used for training the NN are noisy, as discussed below in Section 4.3.

4.2 Neural network accuracy with respect to the size of the DEM training dataset

Once we knew the expected accuracy of the NN predictions, we progressively reduced the size of the training datasets. Along all the presented results, the test cases were always the same 20% subset of the total. Our goal was to estimate the number of DEM tests (out of the possible) that are needed to effectively train the NN without significantly compromising its accuracy. To assess the accuracy of a NN trained with a certain subset of the training dataset, we evaluated the network for the test dataset and computed the mean squared deviations, , where refers to the model parameter (either or ). The subscript denotes the percentage of the potential training cases (out of the possible) that were used in each training set. E.g., means that only samples of the test dataset were used to train the network. is defined as:


where is the number of cases used in the training set, i.e.:

Figure 9: Variation of the mean squared deviations between NN and DEM estimations, respect to the size of training dataset, for the parameters and . Dashed lines represent the mean squared deviation after the repetition of the most unlikely DEM simulations, whereas solid lines regard the first results. The neural networks used for each training dataset are described (see Table 3)

Figure 9 shows how the performance of the NN is barely affected by the size of the training dataset, even when this is drastically reduced. The networks used in Fig. 9 are defined in Table 3. With only of the potential cases (around DEM experiments), the network NN4 was able to predict the Duncang-Chang model parameters for the test dataset ( samples) with almost the same accuracy as NN1. Thus, we conclude that it is possible to train a NN that accurately predicts the Duncan-Chang parameters from PSDs by just using a dataset with less than one thousand DEM simulations. It is also important to remark that, to predict the model parameters for a new PSD would take more than 1 hour of computing time, using a DEM model analogous to the ones used in this work, whereas, using an already trained NN the time is in the order of the microseconds.

As it can be seen in Fig. 9, the networks’ accuracy dropped for networks that were trained with less than 1% of the potential training samples. For these small datasets, the accuracy of the NN prediction tended to the value obtained when the sample means of and are used as estimations.

4.3 Neural network robustness with respect to noisy DEM training data

As it can be observed in Fig. 8, despite the good agreement between NN and DEM predictions for most of the test cases, the discrepancies were relatively high in some of them. Using network NN1 (see Table 3), the maximum absolute discrepancies were and . Nevertheless, a high discrepancy just means that DEM and NN estimations do not agree, but does not necessarily imply that the NN prediction is wrong.

In order to determine whether these discrepancies were due to inability of the NN to predict the DEM estimation or to unlikely estimations of the model parameters from the DEM, we repeated the virtual triaxial testing in the cases with highest discrepancies. We considered the networks that were trained with 1%, 5%, 10% and 100% of potential training cases (networks NN1, NN2, NN3 and NN4 in Table 3). We identified the predictions with the highest deviation in and , for the networks NN2, NN3 and NN4, and the worst deviations for the network NN1. Many of them overlapped, so we finally selected around experiments to repeat. It is worth emphasizing that we did not repeat some of the cases to train the NN again in order to achieve a better matching with different data, the NNs remained unchanged.

After the repetition of these simulations, the relative discrepancies were considerably reduced in most of these cases (see Figs. 9 and 10). The standard deviation of the discrepancies over the whole test dataset were also reduced: went from to and went from to . This proves that for the repeated cases the first DEM measurement was very unlikely, whereas the NN prediction was much more accurate.

(a) Relative discrepancy
(b) Relative discrepancy
Figure 10: Discrepancies between NN predictions and first and second DEM measurements in the cases that had given the highest discrepancies when comparing the first DEM measurement to the NN predictions. Results obtained with the network NN1 (see Table 3)

This result does not come as a surprise: As already highlighted in some recent works (see, e.g., rolnick2017deep

), NNs are robust to a certain extent respect to mislabeled or noisy training data. In the context of this work, this can be understood based on the fact that the NN was trained using datasets that contain many test cases corresponding to PSDs that are very close to those being troublesome. Therefore the NN downplays the contribution of outliers. Thus, as it was done in this work, the trained NN can also be used as a tool for identifying unlikely estimations of the DEM.

In order to further support this claim, one of the cases with the highest discrepancy between the NN estimation and the DEM measurement (PSD case 59861, see Fig. 11(a)), was more thoroughly analyzed. This PSD was used to randomly generate new packings to be subjected to DEM triaxial compression. The stress-strain curves were fitted to Duncan-Chang model, generating statistical samples of and values for this single PSD. With such large samples we could estimate the expected values of the model parameters for PSD 59861 from the samples means. Focusing on the tangent stiffness , the obtained sample mean was , its standard deviation was () and the minimum and maximum values were and , respectively.

Figure 11(b) shows the histogram of for these 1000 triaxial tests. The value estimated in the first DEM test was . Therefore, the first DEM measurement provided very unlikely model parameters () and this is the reason why the discrepancy with NN estimation was so large. When the experiment was repeated for a second time, the new DEM estimation was , which is considerable closer to the expected value (). In contrast, the NN (which was trained from noisy data) predicted a tangent stiffness value of , which is really close to the expected value ().

(a) Particle size distribution
(b) histogram of DEM repetitions
Figure 11: The case 59861 showed one of the highest relative discrepancies between DEM and NN estimations. The packing generation and triaxial test were repeated 1000 times for that specific PSD. Its PSD and the histogram of predicted values are shown together with the NN estimation and first and second DEM measurements

5 Conclusions

We selected Particle Size Distributions, PSD, lying within two particle sizes. We performed virtual triaxial tests with the DEM on samples that followed these PSDs. We fitted the resulting stress-strain curves to Duncang-Chang hyperbolic models, gathering a statistical sample of the two model parameters, namely, and . We found variations of these parameters across the statistical sample that are not easily associated to the PSD. The parameters followed bell-shaped distributions. In the case of , and . In the case of , and .
Because of the finite number of particles used in each experiment (), the parameters obtained from a single DEM simulation may fluctuate to some extent from the expected values (with coefficients of variation for measurements of and ).
We compared DEM measurements to common statistical and geotechnical descriptors derived from the PSD but did not find any correlation. More precisely, we used the coefficient of uniformity, the coefficient of curvature, the mean size, the standard deviation, the skewness and the excess kurtosis. In contrast, by using a Neural Network, NN, trained with a dataset generated through DEM simulations, we were able to predict the expected model parameters for each experiment with high accuracy. The input for this NN was directly the PSD and the output was the model parameters. We tried several NN architectures. % of the dataset was used to test the ability of networks to anticipate the model parameters. The size of the training dataset varied between % and % of the remaining DEM experiments.

We observed that the maximum accuracy is similar to the precision of measurement. This precision is achieved with a training dataset of % of the potential training cases (about DEM simulations). This means that using NNs trained with less than one thousand triaxial experiments it is possible to predict accurately the macroscopic mechanical behavior of granular materials by just using their PSD.

We also observed that the largest discrepancies between NN predictions and DEM measurements occurred precisely when the DEM experiments led to unlikely values in the first simulation. Therefore the NN was also useful to identify unlikely DEM results. The key to achieve more accurate estimations seems to be the reduction of the data noise.

The PSD often affects the mechanical behavior of granular materials. There must exist relationships linking the mechanical behavior to the PSD that are hidden to the naked eye. Nor even using statistical or geotechnical descriptors that may quantify the PSD to some extent, relationships could be established. In contrast, neural networks were capable of finding those relationships. This research opens a way to address other problems (e.g., different stress-strain paths or sample preparation procedures), with the objective of better understanding the relationship between PSDs and the macroscopic behavior. The great advantage of the combination of DEM with NN is that we can know much more by simulating much less.


P. Antolin was partially supported by the European Research Council through the H2020 ERC Advanced Grant 2015 n.694515 CHANGE, and by the Swiss National Science Foundation through the project “Design-through-Analysis (of PDEs): the litmus test” n.40B2-0_187094 (BRIDGE Discovery 2019).


  • (1) C. Sullivan, J. D. Bray, M. F. Riemer, Influence of particle shape and surface friction variability on response of rod shaped particulate media, Journal of Engineering Mechanics 128 (11) (2002) 1182–1192. doi:10.1061/(ASCE)0733-9399(2002)128:11(1182).
  • (2) J. Santamarina, G. Cho, Soil behaviour: The role of particle shape, pp. 604–617. doi:10.1680/aigev1.32644.0035.
  • (3) I. Cavarretta, M. Coop, C. O’Sullivan, The influence of particle characteristics on the behaviour of coarse grained soils, Géotechnique 60 (6) (2010) 413–423. doi:10.1680/geot.2010.60.6.413.
  • (4) G. Guida, I. Einav, B. Marks, F. Casini, Linking micro grainsize polydispersity to macro porosity, International Journal of Solids and Structures 187 (2018) 75–84. doi:10.1016/j.ijsolstr.2018.11.032.
  • (5) K. Been, M. G. Jefferies, J. Hachey, The critical state of sands, Geotechnique 41 (3) (1991) 365–381. doi:10.1680/geot.1991.41.3.365.
  • (6) P. W. Rowe, G. I. Taylor, The stress-dilatancy relation for static equilibrium of an assembly of particles in contact, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 269 (1339) (1962) 500–527. doi:10.1098/rspa.1962.0193.
  • (7) C. Thornton, Numerical simulations of deviatoric shear deformation of granular media, Géotechnique 50 (1) (2000) 43–53. doi:10.1680/geot.2000.50.1.43.
  • (8) K. H. Andersen, K. Schjetne, Database of friction angles of sand and consolidation characteristics of sand, silt, and clay, Journal of Geotechnical and Geoenvironmental Engineering 139 (7) (2013) 1140–1155. doi:10.1061/(ASCE)GT.1943-5606.0000839.
  • (9) M. D. Jiang, Z. X. Yang, D. Barreto, Y. H. Xie, The influence of particle-size distribution on critical state behavior of spherical and non-spherical particle assemblies, Granular Matter 20 (4) (2018) 80. doi:10.1007/s10035-018-0850-x.
  • (10) D.-H. Nguyen, E. Azéma, P. Sornay, F. Radjai, Effects of shape and size polydispersity on strength properties of granular materials, Phys. Rev. E 91 (2015) 032203. doi:10.1103/PhysRevE.91.032203.
  • (11) M. R. Shaebani, M. Madadi, S. Luding, D. E. Wolf, Influence of polydispersity on micromechanics of granular materials, Phys. Rev. E 85 (2012) 011301. doi:10.1103/PhysRevE.85.011301.
  • (12) J. Wia̧cek, M. Molenda, Microstructure and micromechanics of polydisperse granular materials: Effect of the shape of particle size distribution, Powder Technology 268 (2014) 237 – 243. doi:10.1016/j.powtec.2014.08.020.
  • (13) F. Göncü, S. Luding, Effect of particle friction and polydispersity on the macroscopic stress–strain relations of granular materials, Acta Geotechnica 8 (2013) 629–643. doi:10.1007/s11440-013-0258-z.
  • (14) T. Wichtmann, T. Triantafyllidis, Effect of uniformity coefficient on g/gmax and damping ratio of uniform to well-graded quartz sands, Journal of Geotechnical and Geoenvironmental Engineering 139 (1) (2013) 59–72. doi:10.1061/(ASCE)GT.1943-5606.0000735.
  • (15) M. M. Monkul, E. Etminan, A. Şenol, Influence of coefficient of uniformity and base sand gradation on static liquefaction of loose sands with silt, Soil Dynamics and Earthquake Engineering 89 (2016) 185 – 197. doi:
  • (16) Testing and modeling of rockfill materials: A review, Journal of Rock Mechanics and Geotechnical Engineering 8 (3) (2016) 415 – 422. doi:
  • (17) J. H. van der Linden, G. A. Narsilio, A. Tordesillas, Machine learning framework for analysis of transport through complex networks in porous, granular media: A focus on permeability, Phys. Rev. E 94 (2016) 022904. doi:10.1103/PhysRevE.94.022904.
  • (18)

    S. Kamrava, P. Tahmasebi, M. Sahimi, Linking morphology of porous media to their macroscopic permeability by deep learning, Transport in Porous Media 131.

  • (19) W. Fei, G. A. Narsilio, J. H. van der Linden, M. M. Disfani, Network analysis of heat transfer in sphere packings, Powder Technology 362 (2020) 790 – 804. doi:
  • (20) J. Ghaboussi, J. H. Garrett, X. Wu, Knowledge-based modeling of material behavior with neural networks, Journal of Engineering Mechanics 117 (1) (1991) 132–153. doi:10.1061/(ASCE)0733-9399(1991)117:1(132).
  • (21) S. Pernot, C.-H. Lamarque, Application of neural networks to the modelling of some constitutive laws, Neural Networks 12 (2) (1999) 371 – 392. doi:10.1016/S0893-6080(98)00115-4.
  • (22) S. Lee, S. Lee, Y. Kim, An approach to estimate unsaturated shear strength using artificial neural network and hyperbolic formulation, Computers and Geotechnics 30 (6) (2003) 489 – 503. doi:10.1016/S0266-352X(03)00058-2.
  • (23)

    A. Johari, A. Javadi, G. Habibagahi, Modelling the mechanical behaviour of unsaturated soils using a genetic algorithm-based neural network, Computers and Geotechnics 38 (1) (2011) 2 – 13.

  • (24) M. Ajdari, G. Habibagahi, A. Ghahramani, Predicting effective stress parameter of unsaturated soils using neural networks, Computers and Geotechnics 40 (2012) 89 – 96. doi:10.1016/j.compgeo.2011.09.004.
  • (25) H. I. Park, S. R. Lee, Evaluation of the compression index of soils using an artificial neural network, Computers and Geotechnics 38 (4) (2011) 472 – 481. doi:10.1016/j.compgeo.2011.02.011.
  • (26) S. Dehghan, G. Sattari, S. C. Chelgani, M. Aliabadi, Prediction of uniaxial compressive strength and modulus of elasticity for travertine samples using regression and artificial neural networks, Mining Science and Technology (China) 20 (1) (2010) 41 – 46. doi:10.1016/S1674-5264(09)60158-7.
  • (27) G. W. Ellis, C. Yao, R. Zhao, D. Penumadu, Stress-strain modeling of sands using artificial neural networks, Journal of Geotechnical Engineering 121 (5) (1995) 429–435. doi:10.1061/(ASCE)0733-9410(1995)121:5(429).
  • (28) D. Penumadu, R. Zhao, Triaxial compression behavior of sand and gravel using artificial neural networks (ann), Computers and Geotechnics 24 (3) (1999) 207 – 230. doi:10.1016/S0266-352X(99)00002-6.
  • (29) M. Banimahd, S. Yasrobi, P.K.Woodward, Artificial neural network for stress–strain behavior of sandy soils: Knowledge based verification, Computers and Geotechnics 32 (5) (2005) 377 – 386. doi:10.1016/j.compgeo.2005.06.002.
  • (30) J. M. Duncan, C.-Y. Chang, Nonlinear analysis of stress and strain in soils, Journal of Soil Mechanics & Foundations Div 96 (5) (1970) 1629–1653.
  • (31) P. A. Cundall, O. D. L. Strack, A discrete numerical model for granular assemblies, Geotechnique 29 (1) (1979) 47–65. doi:10.1680/geot.1979.29.1.47.
  • (32) T.-T. Ng, Triaxial test simulations with discrete element method and hydrostatic boundaries, Journal of engineering mechanics 130 (10) (2004) 1188–1194. doi:10.1061/(ASCE)0733-9399(2004)130:10(1188).
  • (33) J. Kozicki, J. Tejchman, et al., Numerical simulations of triaxial test with sand using DEM, Archives of Hydro-Engineering and Environmental Mechanics 56 (3-4) (2009) 149–172.
  • (34) C. Salot, P. Gotteland, P. Villard, Influence of relative density on granular materials behavior: DEM simulations of triaxial tests, Granular matter 11 (4) (2009) 221–236. doi:10.1007/s10035-009-0138-2.
  • (35) J. Kozicki, J. Tejchman, H.-B. Mühlhaus, Discrete simulations of a triaxial compression test for sand by DEM, International Journal for Numerical and Analytical Methods in Geomechanics 38 (18) (2014) 1923–1952. doi:10.1002/nag.2285.
  • (36) W. Zhou, J. Liu, G. Ma, X. Chang, Three-dimensional DEM investigation of critical state and dilatancy behaviors of granular materials, Acta Geotechnica 12 (2017) 1107–1117 and 1125. doi:10.1007/s11440-017-0530-8.
  • (37) Y. Xie, Z. Yang, D. Barreto, M. Jiang, The influence of particle geometry and the intermediate stress ratio on the shear behavior of granular materials, Granular Matter 19 (2) (2017) 35. doi:10.1007/s10035-017-0723-8.
  • (38) G. Gong, C. Thornton, A. H. C. Chan, DEM simulations of undrained triaxial behavior of granular material, Journal of Engineering Mechanics 138 (6) (2012) 560–566. doi:10.1061/(ASCE)EM.1943-7889.0000366.
  • (39) M. Jiang, Z. Shen, W. Zhou, M. Arroyo, W. Zhang, Coupled CFD–DEM method for undrained biaxial shear test of methane hydrate bearing sediments, Granular Matter 20 (4) (2018) 63. doi:10.1007/s10035-018-0826-x.
  • (40) M. Salimi, A. Lashkari, Undrained true triaxial response of initially anisotropic particulate assemblies using CFM-DEM, Computers and Geotechnics 124 (2020) 103509. doi:
  • (41) V. Šmilauer, et al., Yade Documentation 2nd ed, The Yade Project, 2015, doi:10.5281/zenodo.34073.
  • (42) H. Herrmann, S. Luding, Modeling granular media on the computer, Continuum Mechanics and Thermodynamics 10 (4) (1998) 189–231. doi:10.1007/s001610050089.
  • (43) S. Haykin, Neural networks: a comprehensive foundation, Prentice Hall PTR, 1994.
  • (44)

    G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of control, signals and systems 2 (4) (1989) 303–314.

  • (45) U. of Illinois at Urbana-Champaign. Center for Supercomputing Research, Development, G. Cybenko, Continuous valued neural networks with two hidden layers are sufficient, 1988.
  • (46)

    V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.

  • (47) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization (2014). arXiv:1412.6980.
  • (48) I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016,
  • (49) M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, software available from (2015).
  • (50) D. Rolnick, A. Veit, S. Belongie, N. Shavit, Deep learning is robust to massive label noise (2017). arXiv:1705.10694.