Deep learning is a subset of machine learning that constructs Deep Neural Networks (DNNs) to solve complex problems [1]. Although it has achieved great success in various fields, such as speech recognition [2] and image classification [3], the internal logic of deep learning is still not convincingly explained, and DNNs have been regarded as "black boxes" [4].
Based on the underlying premise that DNNs establish a complex probabilistic model [5, 6, 7, 8], numerous theories, such as representation learning [9, 10, 11] and the Information Bottleneck (IB) theory [12, 13, 14, 15], have been proposed to explore the working mechanism of deep learning. Though these theories reveal some important properties of deep learning, such as hierarchy [9, 10] and sufficiency [12, 15], a fundamental problem is that they cannot be directly validated by empirical experiments, because the distributions of the benchmark datasets, e.g., MNIST, are unknown. For example, hierarchy is an important property of DNNs, but we still cannot explicitly formulate the hierarchy property and directly validate it by empirical experiments.
To solve this problem, we propose a novel algorithm for generating a synthetic dataset obeying a Gaussian distribution, based on the NIST Special Database 19 of handwritten digits by class (https://www.nist.gov/srd/nist-special-database-19). In particular, the synthetic dataset has the same characteristics as the benchmark dataset MNIST [16]. Specifically, the synthetic dataset consists of 70,000 grayscale images in 10 classes (digits from 0 to 9), and each class has 6,000 training images and 1,000 testing images. Fig. 1 shows three synthetic images. Since the synthetic dataset has the same format as MNIST, we can easily apply various DNNs to it; and since all the grayscale images are sampled from a known distribution, the synthetic dataset obeys the Gaussian distribution.
This paper is organized as follows. Section 2 describes the specific method for generating the synthetic dataset obeying a known Gaussian distribution, and Section 3 shows that the synthetic dataset can be easily applied to most commonly used DNNs. Section 4 demonstrates that, given the synthetic dataset, we can verify some important properties of deep learning, e.g., hierarchy, based on the recently proposed probabilistic explanation of the hidden layers of DNNs [17, 18].
2 The method for generating the synthetic dataset
An underlying assumption of deep learning is that the given training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ is composed of i.i.d. samples from a joint distribution $P_{X,Y}(x, y; \theta) = P_{Y|X}(y|x; \theta) P_X(x)$, where $P_X(x)$ describes the prior knowledge of $x$, $P_{Y|X}(y|x; \theta)$ describes the connection between $x$ and $y$, and $\theta$ indicates the parameters of the DNN. Since we can easily formulate $P_{Y|X}(y|x; \theta)$ given $P_X(x)$, $P_X(x)$ is the key to explicitly formulating $P_{X,Y}(x, y; \theta)$.
Unlike previous works using a complex probabilistic model to formulate $P_X(x)$ [19, 20] for a given dataset, we first generate a random dataset obeying a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ and then use the generated random dataset to construct a synthetic image based on the mask derived from a benchmark dataset. Since each element of the random dataset obeys $\mathcal{N}(\mu, \sigma^2)$, we can conclude that the synthetic image also obeys $\mathcal{N}(\mu, \sigma^2)$ based on the spatial stationarity property, i.e., the distribution of a pixel does not depend on its position in the image.
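This invariance can be checked numerically. The sketch below (plain NumPy; the mean, variance, and image size are illustrative choices, not values prescribed by the method) samples an i.i.d. Gaussian vector and verifies that rearranging its entries, which is all the mask-based placement does, leaves the empirical distribution unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0          # illustrative parameters of N(mu, sigma^2)
n = 28 * 28                   # number of pixels in one synthetic image

# i.i.d. Gaussian samples for one image
x = rng.normal(mu, sigma, size=n)

# Any rearrangement of the pixels (e.g., placement into mask regions)
# is just a permutation of the same i.i.d. samples ...
y = x[rng.permutation(n)]

# ... so the empirical distribution of the image is unchanged.
assert np.allclose(np.sort(x), np.sort(y))
```

The permuted vector therefore has exactly the same histogram, mean, and variance as the original samples, which is the sense in which the synthetic image "obeys" the known Gaussian distribution.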
More specifically, the method includes seven steps: (i) generating a random vector $x \in \mathbb{R}^{784}$ by sampling the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ for constructing a synthetic image with dimension $28 \times 28$; (ii) converting an image of the NIST dataset into a binary image; (iii) extracting the central part of the binary image; (iv) downsampling the derived image in the previous step to obtain a binary image with dimension $28 \times 28$; (v) generating the mask of the binary digit image based on the Canny edge detection algorithm [21], where the mask indicates four parts of the binary image: outside, outside boundary, inside boundary, and inside; (vi) deriving an ordered vector $\bar{x}$ by sorting $x$ in descending order and decomposing $\bar{x}$ into four sub-vectors, i.e., $\bar{x} = [\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4]$, where $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4$ correspond to the outside, the inside boundary, the outside boundary, and the inside, respectively; (vii) generating a synthetic image by randomly placing each pixel of the four sub-vectors into a random position within the corresponding mask.
The method for generating a synthetic image is summarized in Algorithm 1, and Fig. 3 visualizes the relationship between the four sub-vectors and their corresponding masks.
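The seven steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the paper's Algorithm 1: it uses a hand-drawn rectangle as a stand-in for a binarized, centered, downsampled NIST digit (steps ii–iv), approximates the Canny-based mask of step (v) with simple 4-neighbor morphology, and picks arbitrary Gaussian parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28

# Stand-in for a binarized, centered, downsampled NIST digit (steps ii-iv);
# real inputs would come from NIST Special Database 19.
digit = np.zeros((H, W), dtype=bool)
digit[6:22, 10:18] = True

def touches(mask):
    """True where a pixel has at least one 4-neighbor inside `mask`."""
    out = np.zeros_like(mask)
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

# Step (v): four-part mask (a crude substitute for the Canny-based mask).
inside_boundary = digit & touches(~digit)    # foreground edge pixels
inside = digit & ~inside_boundary            # foreground interior
outside_boundary = ~digit & touches(digit)   # background pixels next to the digit
outside = ~digit & ~outside_boundary         # remaining background

# Steps (i) and (vi): sample a Gaussian vector, sort it in descending
# order, and split it by region size in the order given in step (vi).
z = np.sort(rng.normal(0.0, 1.0, size=H * W))[::-1]
regions = [outside, inside_boundary, outside_boundary, inside]

# Step (vii): place each sub-vector at random positions within its mask.
img = np.empty((H, W))
start = 0
for region in regions:
    idx = np.flatnonzero(region)
    img.ravel()[rng.permutation(idx)] = z[start:start + idx.size]
    start += idx.size
assert start == H * W

# The synthetic image is a permutation of the Gaussian samples,
# so its pixels follow the known Gaussian distribution.
assert np.allclose(np.sort(img.ravel()), np.sort(z))
```

Because the four masks partition the image, every Gaussian sample is placed exactly once, and the final assertion confirms that the synthetic image contains precisely the sampled values.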
3 Applying the synthetic dataset to DNNs
In this section, we demonstrate that the synthetic dataset can be easily applied to DNNs. First, we design a simple but comprehensive Convolutional Neural Network (abbr. CNN1) for classifying the synthetic dataset. CNN1 has five hidden layers: two convolutional layers, two ReLU operators, and two max pooling layers. Table 1 summarizes the architecture of CNN1.
We train CNN1 for 30 epochs to classify the synthetic dataset, with a learning rate of 0.008. Fig. 2 shows the performance of CNN1 on the synthetic dataset. We can see that CNN1 achieves zero training error after 20 training epochs, and the testing error is also very small. Overall, we can conclude that the synthetic dataset can be applied to DNNs.
[Table 1: the architecture of CNN1 — two convolutional layers, each followed by a max pooling layer and a ReLU operator. R.V. is the random variable of the hidden layer(s).]
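As a concrete illustration, a network matching this description can be implemented in PyTorch as below. The paper specifies only the layer types, the 30 training epochs, and the 0.008 learning rate; the channel counts, kernel sizes, optimizer (plain SGD), and final linear classifier here are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

# CNN1 sketch: two convolutional layers, each followed by max pooling and
# ReLU (channel counts and kernel sizes are illustrative assumptions).
cnn1 = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),  # 28x28 input -> 7x7 after two 2x2 poolings
)

def train(model, loader, epochs=30, lr=0.008):
    """Training loop using the reported 30 epochs and 0.008 learning rate."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # batches of (N, 1, 28, 28) images
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
```

Calling `train(cnn1, loader)` with a `DataLoader` over the synthetic dataset would reproduce the setup behind Fig. 2 under these assumptions.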
5 Conclusion
In this work, we propose a novel method for generating a synthetic dataset. In contrast to the commonly used benchmark datasets with unknown distributions, the synthetic dataset has an explicit distribution, i.e., a Gaussian distribution. In particular, it has the same characteristics as the benchmark dataset MNIST. As a result, we can easily apply Deep Neural Networks (DNNs) to the synthetic dataset.
-  Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
-  G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, vol. 29, pp. 82–97, 2012.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NeurIPS, 2012, vol. 25, pp. 1090–1098.
-  Guillaume Alain and Yoshua Bengio, “Understanding intermediate layers using linear classifier probes,” arXiv preprint arXiv:1610.01644, 2016.
-  Herbert Gish, “A probabilistic approach to the understanding and training of neural network classifiers,” in IEEE ICASSP, 1990, pp. 1361–1364.
-  Judea Pearl, “Theoretical impediments to machine learning with seven sparks from the causal revolution,” arXiv preprint arXiv:1801.04016, 2018.
-  M. D. Richard and R. P. Lippmann, “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, vol. 3, no. 4, pp. 461–483, 1991.
-  G. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 30, pp. 451–462, 2000.
-  Yoshua Bengio, Aaron Courville, and Pascal Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
-  Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
-  Ankit Patel, Minh Nguyen, and Richard Baraniuk, “A probabilistic framework for deep learning,” in NeurIPS, 2016.
-  Alessandro Achille and Stefano Soatto, “Emergence of invariance and disentanglement in deep representations,” arXiv preprint arXiv:1706.01350, 2017.
-  Naftali Tishby and Noga Zaslavsky, “Deep learning and the information bottleneck principle,” arXiv preprint arXiv:1503.02406, 2015.
-  Andrew Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Tracey, and David Cox, “On the information bottleneck theory of deep learning,” in ICLR, 2018.
-  Ravid Shwartz-Ziv and Naftali Tishby, “Opening the black box of deep neural networks via information,” arXiv preprint arXiv:1703.00810, 2017.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, November 1998.
-  Xinjie Lan and Kenneth E. Barner, “From mrfs to cnns: A novel image restoration method,” in 52nd Annual Conference on Information Sciences and Systems (CISS), 2018, pp. 1–5.
-  Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip Torr, “Conditional random fields as recurrent neural networks,” in International Conference on Computer Vision (ICCV), 2015, pp. 1529–1537.
-  E. P. Simoncelli, “Statistical models for images: Compression, restoration and synthesis,” in Proc 31st Asilomar Conf on Signals, Systems and Computers, November 1997, pp. 673–678.
-  Martin J. Wainwright and Eero P. Simoncelli, “Scale mixtures of gaussians and the statistics of natural images,” in NeurIPS, 2000, pp. 855–861.
-  Lijun Ding and Ardeshir Goshtasby, “On the canny edge detector,” Pattern Recognition, vol. 34, pp. 721–725, 2001.