1 Introduction
Many approaches are available to reliably model univariate distributions of real-valued variables. One can use various parametric distributions or employ flexible, nonparametric methods, e.g., empirical distribution functions, kernel density estimation (KDE), or quantile regression
[koenker1978regression]. Modeling distributions in higher dimensions is a much harder task. Parametric approaches using, e.g., Gaussian distributions, are common but inflexible. Many nonparametric approaches become infeasible in higher dimensions due to the curse of dimensionality
[hwang1994nonparametric]. Recent successes in modeling multivariate distributions have been achieved with deep neural network-based, likelihood-free implicit generative models [Mohamed2016], such as Generative Adversarial Networks (GANs) [Goodfellow2014] and Generative Moment Matching Networks (GMMNs) [Dziugaite2015; Li2015].

Copulas are a tool to decouple the modeling of the univariate marginal distributions from the modeling of the high-dimensional joint dependency structure [Joe2014]. This allows one to obtain high-quality marginal models with the univariate techniques mentioned above. Moreover, the (conditional) marginals can be tailored to the problem at hand, and many fields have developed sophisticated domain-specific univariate models for this task, e.g., for probabilistic forecasts in finance [Bollerslev1986], weather [Raftery2005], or energy [Hong2016]. These models can then be combined with a suitable copula structure to enable the simulation of multivariate quantities, see, e.g., [Patton2012; Moller2013; Tastu2015]
. In the machine learning literature, copulas have been used to increase the flexibility of Bayesian networks [Elidan2010], to model multi-agent coordination in reinforcement learning [Wang2021], and for image generation via the Vine Copula Autoencoder [Tagasovska2019]. Another popular application of copulas is synthetic tabular data generation [Patki2016; Meyer2021]. Note that in many of these applications, the main goal is to sample from a multivariate distribution, and the copula is used as a building block for a generative model.

High-dimensional copula distributions are commonly modeled via parametric structures such as the Gaussian or the Student-t copula [demarta2005t]. [Ling2020] propose to enhance the subclass of Archimedean copulas by learning the generator functions via deep neural networks. However, the family of Archimedean copulas is of limited use in higher dimensions as it makes restrictive symmetry assumptions. A more flexible, semiparametric state-of-the-art approach is to use vine copulas, which are built from trees of bivariate pair-copulas [Aas2009]. These pair-copulas can again be parametric, such as the Gumbel or Clayton copula, or nonparametric, e.g., based on bivariate kernel density estimation [Nagler2017].
Directly applying implicit generative modeling to the task of estimating copula distributions is not straightforward since copula distributions are required to have uniform marginals. Such an approach has been proposed in [Letizia2020; Hofert2021], but without ensuring the uniformity property, at least not for finite sample sizes, where the model cannot be expected to exactly fit the true copula distribution. Ensuring the marginal uniformity of the learned copula distribution is important since deviations from uniformity result in unwanted alterations of the marginal distributions in the data space.
In this paper we show how to design and train implicit generative copula (IGC) models to match the dependency structure of given data. A key challenge is to ensure the marginal uniformity of the estimated distribution. We achieve this by first learning a latent distribution with unspecified marginals but the same dependency structure as the training data. We then obtain the desired copula model by applying the probability integral transform componentwise to the latent distribution. The IGC model and the data distribution are matched in copula space using the energy distance [Szekely2004]. During training, the probability integral transform is approximated through a differentiable soft-rank layer, which allows the use of gradient-based methods.
Our contributions

We propose the first universal, nonparametric model for estimating high-dimensional copula distributions with guaranteed uniformity of the marginal distributions.

We show how flexible likelihood-free implicit generative models based on deep neural networks can be trained for this task through the use of a differentiable soft-rank layer.

Compared with the state-of-the-art semiparametric vine copula approach, we demonstrate similar or improved performance on a range of tasks.
2 Proposed IGC Model
We introduce our IGC model following the schematic shown in Figure 1. Figure 2 provides an exemplary application to a bivariate two-component Gaussian mixture data distribution.
Copula Basics
The vector-valued continuous random variable $X = (X_1, \dots, X_d)$ represents the data source of our task. We denote its distribution by $P_X$ and the cumulative distribution function (cdf) of $X$ by $F_X$. Moreover, let $F_{X_i}$ be the cdf of the marginal distribution of $X_i$, the $i$-th component of $X$, for $i = 1, \dots, d$.

Each sample vector $x$ can be mapped to a vector $u \in [0,1]^d$ by defining $u_i$ as the value of $x_i$ with respect to the corresponding marginal cdf of $X_i$, i.e., $u_i = F_{X_i}(x_i)$ for all $i$. This operation is called the probability integral transform (PIT). In the copula literature, values obtained by the PIT are called pseudo-observations [Joe2014]. The random variable $U = (F_{X_1}(X_1), \dots, F_{X_d}(X_d))$ follows a so-called copula distribution by construction, i.e., the distribution is defined on the unit space $[0,1]^d$ and its marginals are uniform. The joint cdf of $U$ is typically called the copula, which we denote by $C_X$ here. Sklar's theorem states that such a copula function exists for all random variables $X$ and that $F_X(x) = C_X(F_{X_1}(x_1), \dots, F_{X_d}(x_d))$, i.e., any multivariate distribution can be expressed in terms of its marginals and its copula [sklar1959fonctions].

IGC Model
In this work, we aim at defining a flexible, nonparametric model for copula distributions in high dimensions. Such a model can either be used to (approximately) represent the copula distribution of the unit space vectors $u$ or the distribution of the original data vectors $x$, in which case the unit space vectors have to be transformed componentwise with the inverse cdfs of the components, i.e., $x_i = F_{X_i}^{-1}(u_i)$.

A flexible model class for learning high-dimensional probability distributions is the class of implicit generative models [Mohamed2016]. Here, latent random variables with a simple and known distribution are transformed via a parameterizable mapping, typically a deep neural network, to a complex distribution that approximates the distribution of the training data. A straightforward application of this framework to model copula distributions, as done in [Hofert2021] and [Letizia2020], is difficult. The output of the generative model can be guaranteed to lie in $[0,1]^d$ by applying an appropriate output layer, e.g., using a sigmoid function. However, guaranteeing the desired uniformity of the marginal distributions is not straightforward. We first experimented with additional training losses that penalize deviations from marginal uniformity. However, a simpler approach, without any hyperparameters for weighting additional cost terms and with a guarantee of marginal uniformity, is the following two-step procedure.
In the first step, we model the distribution of the vector-valued latent random variable $Z = (Z_1, \dots, Z_d)$. To this end, we start with random variables $E = (E_1, \dots, E_m)$ from a simple base distribution that we choose as a zero mean, unit variance Gaussian here. The tuneable map $g_\theta: \mathbb{R}^m \to \mathbb{R}^d$ with parameters $\theta$ is then used to transform the noise samples to the latent samples as

$$Z = g_\theta(E). \qquad (1)$$

We denote the resulting probability distribution of $Z$ by $P_Z$ and the marginal cdfs by $F_{Z_i}$.
In a second step, we transform the latent variables into unit space vectors $V = (V_1, \dots, V_d)$. To this end, we define componentwise for $i = 1, \dots, d$

$$V_i = F_{Z_i}(Z_i). \qquad (2)$$

This transformation step is analogous to going from $X$ to $U$. It guarantees both that the sample values from the model lie in the unit interval and that they are uniformly distributed, regardless of the distribution $P_Z$. We refer to the distribution of $V$ as the model copula distribution and denote it by $P_V$.

Interpretation
Note that the distributions $P_Z$ and $P_X$ might be very different. More precisely, the mapping (1) can result in arbitrary marginal distributions, but an optimal model of the copula distribution would have

$$P_V = P_U. \qquad (3)$$

Given a sufficiently rich function class for $g_\theta$, any distribution of $Z$ can be modeled with small error [lu2020universal]. This statement naturally extends to universal approximation properties for the dependency structure of $Z$, i.e., the unit space random vectors $V$. The approximation of the data copula distribution $P_U$ with our model copula distribution $P_V$ will thus become arbitrarily close if the generator function $g_\theta$ is flexible enough and the optimal $\theta$ is selected.
3 Training Procedure
Problem statement
We aim at empirically estimating an IGC model from data, i.e., we want to determine the optimal parameter vector $\theta$ given a set of training samples $\{x^{(j)}\}_{j=1}^{n}$ from $P_X$. The underlying target is to optimally match the copula model $P_V$ to the (typically unknown) copula distribution $P_U$ from which the samples were generated.
Parameter estimation
We first transform the given data samples into unit space vectors $u^{(j)} \in [0,1]^d$ before starting the actual training procedure. Since we typically do not know the exact marginal cdfs $F_{X_i}$ of the components, we estimate them from the given data. To this end, we use the empirical cdfs separately for each component and define for $i = 1, \dots, d$ and $j = 1, \dots, n$,

$$u_i^{(j)} = \hat{F}_{X_i}\big(x_i^{(j)}\big) = \frac{1}{n} \sum_{l=1}^{n} \mathbb{1}\big\{x_i^{(l)} \le x_i^{(j)}\big\}. \qquad (4)$$

Here, $\mathbb{1}\{A\}$ is the indicator function of event $A$, i.e., it is $1$ if $A$ is true and $0$ otherwise. Note that this is an unbiased and consistent estimator for the true cdf [vanderVaart2000].
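As a concrete illustration, the componentwise mapping to pseudo-observations via empirical cdfs can be sketched in a few lines of numpy (the function name is ours; we scale ranks by $n+1$, a common convention that keeps pseudo-observations strictly inside the unit interval, whereas (4) divides by $n$):

```python
import numpy as np

def pseudo_observations(x):
    """Map samples x of shape (n, d) to unit space with the componentwise
    empirical probability integral transform (PIT).

    Ranks are scaled by n + 1 so the values lie strictly in (0, 1).
    """
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # 1..n per column
    return ranks / (n + 1)
```

By construction, every column of the output is a permutation of $\{1/(n+1), \dots, n/(n+1)\}$, i.e., the marginals are (discretely) uniform regardless of the input distribution.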
We then aim at matching the copula distribution $P_V$ to the unit space vector samples $u^{(j)}$. We base the inference of our likelihood-free model on the energy distance [Szekely2004] between the distributions $P_U$ and $P_V$,

$$d_E(P_U, P_V) = 2\,\mathbb{E}\,\|U - V\| - \mathbb{E}\,\|U - U'\| - \mathbb{E}\,\|V - V'\|. \qquad (5)$$

Here, $U'$ and $V'$ are independent copies of random vectors with distributions $P_U$ and $P_V$, respectively. It holds that $d_E(P_U, P_V) = 0$ if and only if $P_U = P_V$ [Szekely2004].
In practice, we use a sample-based approximation of the energy distance. We draw $m$ samples $v^{(k)}$ from $P_V$ and minimize the loss function

$$\mathcal{L}(\theta) = \frac{2}{nm} \sum_{j=1}^{n} \sum_{k=1}^{m} \big\|u^{(j)} - v^{(k)}\big\| - \frac{1}{m^2} \sum_{k=1}^{m} \sum_{l=1}^{m} \big\|v^{(k)} - v^{(l)}\big\|. \qquad (6)$$

Since $\mathbb{E}\,\|U - U'\|$ is independent of $\theta$, the second term of (5) does not need to be included. Note that the energy distance is an instance of the more general maximum mean discrepancy (MMD) measure [Gretton2012; Sejdinovic2013]. We also experimented with MMD losses based on the Gaussian kernel but found the energy distance to be more robust as it does not require tuning a kernel bandwidth.
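For illustration, the loss (6) can be written directly in numpy (a stand-alone toy version; in the actual model the same computation would be expressed in the deep learning framework so that gradients can flow into the generator):

```python
import numpy as np

def energy_loss(u, v):
    """Sample-based energy-distance loss between data pseudo-observations
    u (n, d) and model samples v (m, d).

    The term E||U - U'|| is constant in the model parameters and is
    therefore dropped, as in the text.
    """
    cross = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)   # (n, m)
    within = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)  # (m, m)
    return 2.0 * cross.mean() - within.mean()
```

Because the dropped term does not depend on the model, the minimizer of this loss coincides with the minimizer of the full energy distance.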
To generate samples from our copula model $P_V$, we first draw samples $z^{(k)}$ from the implicit generative model (1). To obtain vectors in unit space, we then again require the cdfs $F_{Z_i}$ of the marginal distributions of the components of $Z$. However, these functions are not known during model training and will likely change after each gradient step. Hence, we again use the empirical distribution functions to define for $i = 1, \dots, d$ and $k = 1, \dots, m$,

$$v_i^{(k)} = \hat{F}_{Z_i}\big(z_i^{(k)}\big) = \frac{1}{m} \sum_{l=1}^{m} \mathbb{1}\big\{z_i^{(l)} \le z_i^{(k)}\big\}. \qquad (7)$$

This provides us with an unbiased estimate of the current cdf during training. However, the indicator operation in (7) is not differentiable and hence does not allow gradients to flow through this operation. Therefore, during training, we replace (7) with a soft-rank layer based on a scaled sigmoid function $\sigma(\cdot)$,

$$\tilde{v}_i^{(k)} = \frac{1}{m} \sum_{l=1}^{m} \sigma\!\left(\alpha\,\big(z_i^{(k)} - z_i^{(l)}\big)\right), \qquad (8)$$

where $\alpha$ is a scaling constant [Qin2010]. For sufficiently large $\alpha$, (8) provides a close approximation to the empirical marginal cdfs at the current training step. A larger $m$ will result in a finer approximation of the true marginal cdfs but comes with the increased cost for computing the ranks, which requires $O(m^2)$ operations for all samples.
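A minimal numpy sketch of this soft-rank operation (illustrative only; note that the self-comparison contributes $\sigma(0) = 0.5$, so each soft rank is offset by $0.5/m$ relative to the hard empirical cdf (7)):

```python
import numpy as np

def soft_ecdf(z, alpha=100.0):
    """Differentiable approximation of the componentwise empirical cdf of
    z (m, d): the indicator in (7) is replaced by a scaled sigmoid as in
    (8); alpha controls the sharpness of the approximation.
    """
    m = z.shape[0]
    diff = z[:, None, :] - z[None, :, :]      # pairwise z_k - z_l, (m, m, d)
    sig = 1.0 / (1.0 + np.exp(-alpha * diff))
    return sig.sum(axis=1) / m
```

For large alpha the sigmoid approaches the step function and the output approaches the hard ranks scaled by m.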
Sampling from the trained model
Once we have completed training and thus have fixed $\theta$, we have also fixed the componentwise cdfs $F_{Z_i}$. Now, the operation to estimate $F_{Z_i}$ no longer needs to be differentiable and we can choose any available method to estimate the univariate marginal distributions based on samples from $P_Z$. We choose to simply draw a very large set of samples from $P_Z$ and then use the empirical marginal cdfs, i.e., for each sampled value we store the corresponding empirical cdf value according to (4). Given a new realization of $Z_i$, we obtain its approximate cdf value by interpolation of the stored values. Other more compact approximations like univariate kernel density estimation would also be feasible.

In sum, we obtain a sample from $P_V$ at test time by first sampling a realization $\epsilon$ from the noise distribution, transforming it via $z = g_\theta(\epsilon)$, and then applying the componentwise PIT approximation to obtain $v$. Given estimates $\hat{F}_{X_i}$ for the componentwise marginal cdfs of the data, we can derive a sample in data space via $x_i = \hat{F}_{X_i}^{-1}(v_i)$.
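The test-time sampling pipeline can be sketched end to end as follows; the linear map standing in for the trained generator is a hypothetical toy example:

```python
import numpy as np

rng = np.random.default_rng(0)

def g_theta(eps):
    """Toy stand-in for the trained generator network (hypothetical)."""
    return np.stack([eps[:, 0], 0.8 * eps[:, 0] + 0.6 * eps[:, 1]], axis=1)

def ecdf_transform(z_ref, z_new):
    """Approximate componentwise PIT of z_new using the empirical cdfs of
    a large reference sample z_ref, as described in the text."""
    out = np.empty_like(z_new)
    for i in range(z_new.shape[1]):
        sorted_ref = np.sort(z_ref[:, i])
        out[:, i] = np.searchsorted(sorted_ref, z_new[:, i]) / len(sorted_ref)
    return out

# 1) a large sample fixes the marginal cdfs of Z after training
z_ref = g_theta(rng.standard_normal((100_000, 2)))
# 2) new latent samples, mapped to unit space, yield copula samples v
v = ecdf_transform(z_ref, g_theta(rng.standard_normal((1_000, 2))))
```

The marginals of v are approximately uniform by construction while the dependency induced by the generator is preserved; a data-space sample would follow by applying the inverse marginal cdf estimates of the data.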
4 Experiments
In the following we empirically demonstrate the capabilities of IGC models in a series of experiments on synthetic and real data with increasing complexity. Additional experiments can be found in Appendix B.
Implementation
For all experiments except the image generation task, we use a fully connected neural network with two layers, 100 units per layer, and ReLU activation functions, and train for 500 epochs. For the image generation experiment we use a three-layer, fully connected neural network with 200 neurons in each layer, and train for 100 epochs. In all cases we train with a batch size of and generate samples from the model per batch. The number of noise distributions is set as and . We use the Adam optimizer [Kingma2014] with default parameters.

All experiments besides the training of the autoencoder models were carried out on a desktop PC with an Intel Core i7-7700 3.60GHz CPU and 8GB RAM. For the training of the autoencoders we used Google Colab [Colab]. Training times for all experiments are in the range of a few minutes, except for the image generation task. Details are provided in Appendix B. We use TensorFlow with the Keras API for the neural networks [Tensorflow2015; Keras]. For copula modeling, we use pyvinecopulib [pyvinecopulib] in Python and the R package kdecopula [kdecopula]. Our code is available from https://github.com/TimCJanke/igc.

Evaluation
We evaluate the models by comparing the distance between the learned and the true copula. More specifically, we use the integrated squared error (ISE) in unit space, i.e.,

$$\mathrm{ISE} = \int_{[0,1]^d} \big( C(u) - \hat{C}(u) \big)^2 \, du, \qquad (9)$$

where $C$ is the joint cdf of $U$, i.e., the true copula, and $\hat{C}$ is the joint cdf of the learned model. Since analytical integration is not possible, we approximate the integral with an empirical sum and use the data vectors $u^{(j)}$ for this purpose.

The copula function $C$ is not known for real data. Instead, we use the empirical cdf of the unit space data vectors in (9), i.e., for $u \in [0,1]^d$ we set

$$C(u) \approx \frac{1}{n} \sum_{j=1}^{n} \mathbb{1}\big\{u^{(j)} \le u\big\}, \qquad (10)$$

where $\le$ is understood componentwise. Due to the curse of dimensionality, this approximation of the joint cdf will have a coarse structure in high dimensions. This is why we only use the values at the data vectors in (9).

For parametric copula models, the distribution function $\hat{C}$ is analytic. For the nonparametric models, including IGC, we again resort to approximation. We draw samples from the copula distribution model and use their empirical cdf in unit space in (9). We found that this procedure provided reliable and stable estimates for the underlying copula in the considered dimensions.
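The evaluation procedure can be sketched as follows (numpy; function names are ours), with both copula cdfs approximated empirically and evaluated at the data vectors:

```python
import numpy as np

def empirical_cdf_at(points, samples):
    """Empirical joint cdf of `samples`, evaluated at each row of `points`
    with a componentwise <= comparison, as in (10)."""
    le = (samples[None, :, :] <= points[:, None, :]).all(axis=-1)
    return le.mean(axis=1)

def ise(u_data, v_model):
    """Monte-Carlo approximation of the ISE (9): mean squared difference
    of the two empirical copula cdfs over the data vectors."""
    c_true = empirical_cdf_at(u_data, u_data)
    c_model = empirical_cdf_at(u_data, v_model)
    return np.mean((c_true - c_model) ** 2)
```

Evaluating only at the data vectors avoids integrating the coarse empirical cdf over the whole unit cube.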
4.1 Synthetic data
Learning bivariate parametric copulas
We first show that IGC models are able to emulate bivariate parametric copulas. We compare our IGC approach to two nonparametric copula density estimation techniques: kernel density estimation with beta kernels (BETA) [Charpentier2007] and the transformation local likelihood estimator (TLL) [Geenens2017]. These are state-of-the-art nonparametric estimators for pair-copulas [kdecopula]. The compact support of the beta distribution on $[0,1]$ is convenient for copula density estimation. The idea of the TLL approach is to first transform the data using the inverse normal cdf such that the data is supported on $\mathbb{R}^2$. The density is then estimated via local regression using linear (TLL1) or cubic polynomials (TLL2) on a fixed grid of points. Finally, the estimated density is transformed back to unit space. Additionally, we report the performance of a parametric model. In this approach, a copula family is selected from a set of copulas based on the Bayesian information criterion (BIC). This set comprises the Independence, Gaussian, Student-t, Clayton, Gumbel, Frank, Joe, BB1, and BB7 copulas. Parameters are then estimated via maximum likelihood.

We run experiments for data sets generated from a Student-t, a Gumbel, and a Clayton copula as well as the copula resulting from a two-component Gaussian mixture distribution as shown in Figure 2. For each of these, we sampled a random parameter set and then generated 1000 training samples from the resulting model. We repeated each experiment 25 times. Appendix B contains the details on the considered parameter ranges and the exact sampling procedure.
Figure 2 shows the data and the model with intermediate steps for one instance of the Gaussian mixture case. The fitted copula matches the data distribution in unit space very well. Figure 3 presents aggregated ISE results for the different setups. The IGC model shows performance comparable to TLL1 and TLL2 for the Student-t, Clayton, and Gumbel copulas, and better performance than the BETA approach. For the more complex Gaussian mixture test case, IGC shows superior performance and a much lower variance than all baseline methods. Notably, the parametric approach with BIC-based model selection results in a large variance in accuracy for the Clayton and Gumbel data, most likely because the wrong copula family is selected in some cases.
Learning multivariate vine copulas
We now turn to the problem of estimating copulas in higher dimensions. To this end, we conduct a similar experiment as before using 5-dimensional vine copulas as the data-generating distribution. For each of 25 repetitions, we first sample a random tree structure using pyvinecopulib's RVineStructure.Simulate() method. Then, we randomly assign parametric pair-copulas with random parameters to each edge in the tree. The considered families of bivariate copulas are Independence, Gaussian, Student-t, Clayton, Gumbel, Frank, Joe, BB1, and BB7. Further details are available in Appendix B. We generate 5000 samples from the resulting vine copula model and use these to train our IGC model as well as the benchmark models. As benchmarks, we use a vine copula model with TLL2 pair-copulas only and a vine copula model which can select all parametric copula families named above as well as the TLL2 pair-copula. Additionally, we estimate a parametric Gaussian copula. The parameters are estimated via maximum likelihood and the copula families are selected using the BIC. The selection of the vine structure is based on the Dissmann algorithm [Dissmann2013]. We evaluate all models with the ISE at 10000 data points sampled from the true model.
The results of the simulation study are presented in Figure 4. The IGC model has the lowest mean ISE (0.0697), followed by the TLL2-Vine model (0.2093), while the latter has a lower median ISE (0.0339 vs. 0.045). This is the result of some large ISE outliers for the vine models that do not occur for the IGC model. These are most likely caused by a poorly selected vine structure. Interestingly, the results imply that the data is on average better modeled by the IGC model than by the vine approach, although the latter model class contains the true data-generating model. However, recovering the true model does not seem to be an easy task.
4.2 Real data
Exchange rates
Copulas are a popular tool in financial risk management as they can be used to estimate the distribution of multivariate returns over different assets [Patton2012] under the assumption of a stationary copula. We consider a data set that contains 15 years of daily exchange rates between the US Dollar and the Canadian Dollar, the Euro, the British Pound, the Swiss Franc, and the Japanese Yen. The data was obtained from the R package qrm_data. We preprocess the data to filter out the effects of temporal dependencies. To this end, we fit an AR(1)-GARCH(1,1) [Bollerslev1986] process with Student-t innovations to the time series of the daily returns. We then obtain the standardized residuals from the AR-GARCH models and transform these observations to the unit space using the empirical cdfs. Figure 4(a) shows a kernel density estimate for two selected dimensions of the resulting data set in unit space, namely the US Dollar/Euro and US Dollar/Pound exchange rates. While being non-trivial and multimodal, the distribution is strongly concentrated along the diagonal.
We then estimate a Gaussian copula, a vine copula, a vine copula with only TLL2 pair-copulas, a GMMN as proposed by [Hofert2021], and our IGC model for this data. The GMMN model has the same loss function, architecture, and hyperparameters as the IGC model but uses a sigmoid activation in the final layer. To ensure a test data set of appropriate size, we use a 5-fold cross-validation scheme where 20% of the data is used for training and 80% for testing. For the IGC and the GMMN models we additionally use five different random initializations of the neural network weights per fold, i.e., we report means and standard deviations from 25 values for the IGC and the GMMN models and from 5 values for the other methods. The results are presented in Table 1. The IGC model clearly outperforms the Gaussian baseline and also the GMMN model. However, the vine copula models show lower average ISEs. This might be due to the symmetric nature of the data, which can be approximated well by Student-t and TLL pair-copulas.

exchange rates  magic  

Gauss  
Vine  
VineTLL2  
GMMN  
IGC (ours) 
MAGIC Gamma Telescopes data
In order to test our approach on a more complex dependency structure, we consider the MAGIC (Major Atmospheric Gamma-ray Imaging Cherenkov) Telescopes data set available from the UCI repository (https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope). This data was also used for benchmarking nonparametric copula estimation techniques in [Nagler2017]. We only consider the observations classified as gamma and the 5 variables fLength, fWidth, fConc, fM3Long, and fM3Trans. The size of the data set is . We use the empirical cdfs to transform the observations to the unit space. Figure 4(b) showcases the dependency structure of the data for two selected data dimensions. The observed structures are highly asymmetric. We again fit the data with a Gaussian copula, a vine copula with all available pair-copulas as above, a vine copula with TLL2 pair-copulas, and a GMMN model with a sigmoid output layer [Hofert2021] as benchmarks, and use the same 5-fold CV strategy as before, i.e., in each fold we use 20% as training data and 80% as test data, and report the average ISE. The results are presented in Table 1. The IGC model achieves the lowest average ISE. Again, the GMMN model shows a substantially larger mean ISE with a much larger standard deviation. As only TLL2 pair-copulas were selected by the BIC criterion, the scores for both vine models are the same. The Gaussian copula is clearly not suited for such complex data as its average ISE is 5 times larger than that of the other approaches.

Copula Autoencoders
[Tagasovska2019] introduced the Vine Copula Autoencoder, a generative model which uses vine copulas for ex-post density estimation in the latent space of a trained autoencoder. In a first step, an autoencoder is trained on a data set to learn a low-dimensional representation of the data. After training, the encoder part of the network is used to map the training data to the autoencoder's latent space representation. Next, the univariate marginal distributions are estimated, e.g., by using the empirical cdfs or kernel density estimation, and the compressed data is mapped to unit space. After fitting a copula model to the observations in unit space, one can sample from the fitted copula model, apply the inverse marginal cdfs, and map the simulated data back to the image space using the decoder network of the autoencoder in order to generate new data samples.
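As an illustration of this pipeline, the following toy sketch uses a linear encoder/decoder pair as a hypothetical stand-in for the trained autoencoder and a simple Gaussian copula as the latent copula model (the paper compares several copula models here; all names and the linear maps are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical linear "autoencoder": data space (10-d) <-> latent space (3-d).
W = rng.standard_normal((10, 3))
encode = lambda x: x @ W
decode = lambda z: z @ np.linalg.pinv(W)

x_train = rng.standard_normal((2000, 10))
z = encode(x_train)                      # latent representations
n, d = z.shape

# 1) map the latent data to unit space via the empirical marginal cdfs
u = (np.argsort(np.argsort(z, axis=0), axis=0) + 1) / (n + 1)

# 2) fit a Gaussian copula: correlation matrix of the normal scores
rho = np.corrcoef(norm.ppf(u), rowvar=False)

# 3) sample from the copula, invert the marginal cdfs, decode to data space
g = rng.multivariate_normal(np.zeros(d), rho, size=500)
u_new = norm.cdf(g)
z_new = np.column_stack([np.quantile(z[:, i], u_new[:, i]) for i in range(d)])
x_new = decode(z_new)
```

Replacing step 2 with a vine copula, a GMMN, or an IGC model yields the corresponding variants compared below.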
image  latent  copula  

Indep  0.01912  0.00722  0.00373 
Gauss  0.00619  0.00209  0.00087 
Vine  0.00674  0.00131  0.00079 
GMMN  0.00392  0.00341  0.00073 
IGC  0.00426  0.00114  0.00069 
VAE  0.01316 
In the following, we present results for the Fashion-MNIST [FashionMNIST] data set. We train a convolutional autoencoder on the entire training data of 60000 samples with a latent space of dimension 25. Details on the architecture are found in Appendix B. After training the autoencoder, we estimate the empirical cdfs of the compressed training data. See Figures 4(c) and 4(d) for exemplary visualizations of the resulting bivariate data densities in unit space. Notably, these distributions are more complex than the ones from the previous experiments. We fit this data with a Gaussian copula, a vine copula with TLL2 pair-copulas, a GMMN with sigmoid output layer [Hofert2021], and an IGC model. We also test an independence copula, i.e., we assume independence over the latent space. Additionally, we report results for a standard variational autoencoder (VAE) [Kingma2013] with the same architecture.
Exemplary samples from all models and from the test set are presented in Figure 6. Sampling with the independence copula, i.e., assuming no dependency structure, leads to images with many artifacts. The Gaussian copula produces better images, but some artifacts remain. Samples from the vine copula and the IGC model are comparable in quality; they show finer details and few implausible artifacts. The images generated by the VAE are blurry and show much less variation in pixel intensity than the test data.
In 25 dimensions it is no longer feasible to compute and compare empirical cdfs as required for ISE scoring. We thus resort to the MMD [Gretton2012] for the numerical evaluation, since it is commonly used as a simple and robust measure for evaluating image generation models [Xu2018]. We generate 10000 images from each model and compare those to the 10000 test set images by computing the MMD with a Gaussian kernel. We use the bandwidths , , and for the latent unit space, the latent data space, and the image space, respectively. The bandwidths were selected based on the median heuristic proposed in [Gretton2012].

The results are found in Table 2. We provide p-values for these results using the test proposed by [Bounliphone2016] in Appendix B. The IGC and GMMN models perform better than all other models for the latent unit space distribution as well as the image space. Interestingly, the GMMN model shows a relatively large MMD value for the latent data space. This could be caused by deviations from marginal uniformity of the learned copula distribution, which alter the marginal distributions in the data space.
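For reference, the biased squared-MMD estimate with a Gaussian kernel and the median bandwidth heuristic can be sketched as follows (function names are ours):

```python
import numpy as np

def gaussian_mmd2(x, y, bw):
    """Biased estimate of the squared MMD between samples x and y under a
    Gaussian kernel with bandwidth bw."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bw ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def median_heuristic(x):
    """Bandwidth as the median of the nonzero pairwise distances."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return np.median(d[d > 0])
```

A lower MMD indicates that the generated samples are closer in distribution to the held-out test samples.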
5 Conclusion
We have proposed the first fully nonparametric model framework for copulas in higher dimensions that guarantees uniformity of the marginal distributions. The proposed IGC approach, which is based on an implicit generative step and a differentiable ranking transformation, is structurally simple. Yet, if the generator class is sufficiently complex, any copula dependency structure can be modeled. The model can be implemented easily with standard deep learning frameworks. For various data sets we have shown a modeling performance on par with or above other state-of-the-art approaches, in particular the vine copula approach.
IGC models should be further investigated in various ways. First, it would be straightforward to condition the model on external factors, either by including such values in the input of the generator network or by reparameterization of the noise distributions. This would allow describing context-dependent changes of the dependency structure. Second, many more applications for copulas should be examined, e.g., synthetic tabular data generation [NEURIPS2019_254ed7d2]. Third, the ranking operation could probably be sped up from $O(m^2)$ to $O(m \log m)$ using ideas from [Blondel2020]. Finally, the use of adversarial training schemes could also be investigated.
This work has been performed in the context of the LOEWE center emergenCITY.
References
Appendix A Algorithm
The algorithm for training an IGC model is described in Algorithm 1.
Appendix B Experiments
b.1 Additional experiments on toy data sets
We provide additional experiments on three toy data sets commonly used in the literature on deep generative models: the "Swiss Roll", the "Grid of Gaussians", and the "Ring of Gaussians". We test four copula-based models: a Gaussian copula, a TLL2 copula, a GMMN with sigmoid output layer [Hofert2021], and our IGC model. For these models we use a linear interpolation of the empirical cdf of the training set as the model for the marginals of the data distribution. We additionally report results for two implicit generative models that directly model the data distribution: a GMMN [Li2015; Dziugaite2015] and a GAN [Goodfellow2014]. Like the IGC model, the GMMN-based models are trained by minimizing the energy distance. All neural network models use two layers, 100 neurons per layer, and are trained for 500 epochs. IGC and GMMN models are trained using the Adam optimizer with standard values. For the GAN we use a lower learning rate and a lower momentum, as the model did not converge with the standard settings. We evaluate the models using the average negative log-likelihood of a test set based on kernel density estimates. We repeat each experiment 10 times with different training and test sets of size 5000 and random initializations for the neural networks.
Table 3 presents the results. The GAN achieves the best result for the Swiss Roll data. However, the scores for the GAN show a much higher standard deviation than all other methods. The IGC model has the second-lowest score overall and the lowest score of all copula-based methods. For the Ring of Gaussians, the TLL2 copula shows the lowest NLL, closely followed by the IGC and GMMN copulas. Here the GAN shows the worst performance of all models. The Grid of Gaussians is a trivial test case for the copula-based models as it is sufficient to sample from the marginal distributions with an independence copula. Both the GMMN and the GAN show substantially worse NLL values and fail at properly approximating the true data distribution, as can be seen from the bottom row of Figure 7.
swiss roll  ring  grid  

Gaussian copula  6.10 (0.01)  5.72 (0.02)  4.47 (0.01) 
TLL2 copula  5.61 (0.01)  5.11 (0.01)  4.47 (0.01) 
GMMN copula  5.45 (0.09)  5.24 (0.01)  4.48 (0.01) 
IGC  5.26 (0.08)  5.19 (0.03)  4.47 (0.01) 
GMMN  5.76 (0.09)  5.40 (0.05)  6.27 (0.05) 
GAN  4.82 (0.40)  7.14 (0.52)  5.95 (0.26) 
b.2 Learning bivariate copulas
For the Student-t, Gumbel, and Clayton copulas we sample the parameters and rotations uniformly from the ranges given in Table 4. Samples from the Gaussian mixture copula are generated by sampling from one of the two components with equal probability and then transforming all samples to the unit space via the componentwise PIT. Samples for each component are drawn from a two-dimensional Gaussian distribution whose mean, variance, and correlation parameters are sampled uniformly at random from fixed ranges.
b.3 Learning multivariate vine copulas
The following table presents the parameter ranges of the pair-copulas for the vine copula experiments. We used uniform sampling for selecting the pair-copula family as well as the parameters and rotations.
rotation  

Independence       
Gaussian    
Studentt  
Clayton    
Gumbel    
Frank    
Joe    
BB1  
BB7 
b.4 Training times
Table 5 shows the average training times for the IGC model and a vine copula model with TLL2 pair-copulas for the different data sets. Note that the IGC model for the Fashion-MNIST experiment is a three-layer neural network with 200 neurons per layer trained for 100 epochs, while in all other cases a two-layer network with 100 neurons per layer was trained for 500 epochs. Timings are for a desktop PC with an Intel Core i7-7700 3.60GHz CPU and 8GB RAM.
IGC  vine copula (TLL2)  $d$  $n$  

Learning bivariate copulas  49s  <1s  2  1000 
Learning vine copulas  131s  5s  5  5000 
Exchange rates  33s  4s  5  1169 
Magic  61s  4s  5  2466 
FashionMNIST AE  1472s  1743s  25  60000 
b.5 Autoencoder architecture
We used the following architecture for the autoencoder and the VAE:

Encoder:

Decoder:
Convolutional layers are specified by their number of filters, kernel height and width, and stride height and width; deconvolutional layers and fully connected layers (with their number of neurons) are specified analogously. Sigmoid and ReLU denote the respective activation functions, and batch normalization layers are used as indicated. We use a padding of 2 to achieve the stated image resolution in pixels and normalize the inputs to the unit range. The models are trained with the binary cross-entropy as reconstruction loss for 100 epochs using the Adam optimizer with default parameters.

b.6 Significance of Fashion-MNIST results
In the following tables we present the p-values from the test proposed in [Bounliphone2016] for the experiments on the Fashion-MNIST data. Values close to one (zero) in a row indicate significantly better (worse) performance compared to the model in the column.
IGC  GMMN  Vine  Gauss  Indep  

IGC  0.5561  0.9871  0.9990  1.0000  
GMMN  0.4439  0.8645  0.9689  1.0000  
Vine  0.0129  0.1355  0.9510  1.0000  
Gauss  0.0010  0.0311  0.0490  1.0000  
Indep  0.0000  0.0000  0.0000  0.0000 
IGC  GMMN  Vine  Gauss  Indep  

IGC  1.0000  0.9812  1.0000  1.0000  
GMMN  0.0000  0.0000  0.0000  1.0000  
Vine  0.0188  1.0000  1.0000  1.0000  
Gauss  0.0000  1.0000  0.0000  1.0000  
Indep  0.0000  0.0000  0.0000  0.0000 
IGC  GMMN  Vine  Gauss  Indep  VAE  

IGC  0.1131  1.0000  1.0000  1.0000  1.0000  
GMMN  0.8869  1.0000  1.0000  1.0000  1.0000  
Vine  0.0000  0.0000  0.1152  1.0000  1.0000  
Gauss  0.0000  0.0000  0.8848  1.0000  1.0000  
Indep  0.0000  0.0000  0.0000  0.0000  0.0000  
VAE  0.0000  0.0000  0.0000  0.0000  1.0000 