Artificial intelligence (AI) is inspired by human cognitive abilities, and computer vision tries to replicate, at least partially, the functionality of human visual perception. On a practical level, robots and other artificial systems need to be aware of the actual 3D physical structure of their surroundings to be able to move around without bumping into obstacles, to grasp, touch and recognize objects, and to interact with the physical world.
Psychologists who study human visual perception agree that at some point the apparent structure of the physical world must be reflected in the perceptions that are formed in the human mind. The perceptual psychologist Biederman, for example, proposed a theory called recognition-by-components, which states that a modest set of volumetric components called geons can support most human recognition tasks. This idea of constructing more complex structures from a small set of basic elements is very powerful and is a governing principle in a variety of scientific fields.
It was also acknowledged early in the development of computer vision that, to mimic human visual perception, visual information must at some point be represented in terms of spatial or volumetric models, since these can be directly linked to the actual 3D physical space. The search for appropriate models that could fill this role in computer vision was influenced heavily by Biederman’s theory, and several options were put forward to act as geons. A particularly popular solution was introduced by Pentland in the form of superquadrics. Superquadrics are volumetric primitives defined by a closed surface, which can be modeled using a small number of parameters while still covering a wide variety of 3D shapes, such as ellipsoids, cylinders, parallelepipeds, and all shapes in-between. They can be further extended to represent highly complex 3D structures, as illustrated in Figure 1.
Several methods for the recovery of superquadrics from range images were proposed as early as the 1980s and 1990s [5, 6]; these also included simultaneous segmentation of complex 3D shapes into simpler superquadric-like structures as well as techniques for geon recognition from the superquadric representations [8, 9]. However, a wider application of these methods in computer vision was hindered by their computational complexity, caused by the iterative nature of the superquadric-recovery procedures, and partially by the difficulty of obtaining high-quality spatial 3D information (suitable for 3D model recovery) at that time.
With recent advancements in 3D sensing technology, computer vision and, most importantly, deep learning, it is today possible to devise straightforward solutions for tasks involving the optimization of highly non-linear objective functions that were considered extremely challenging only a few years ago. In line with these trends, we revisit the problem of superquadric recovery in this paper and introduce a deep learning solution for fitting superquadric models to range images. Specifically, we design a simple regressor based on a convolutional neural network (CNN) that is able to estimate the parameters of superquadrics more accurately than existing state-of-the-art models and in a fraction of the time. We perform a series of experiments with both synthetic and real-world images and show that using CNNs for superquadric recovery is a viable option that mitigates many of the shortcomings of earlier techniques. We note, however, that in this preliminary study we only approach a constrained superquadric-recovery problem, where we assume that only a single 3D shape is present in the input data and that the shape can be approximated with an unrotated superquadric model.
We make the following contributions in this paper:
We present a preliminary study on the use of deep learning models for the recovery of superquadric models from range images under the assumption that a single 3D shape is present in the input images and that the shape can be represented by an unrotated superquadric.
We introduce a simple CNN-based regressor capable of estimating the parameters of a superquadric model in a fast and efficient manner.
We benchmark the developed CNN regressor against a state-of-the-art method from the literature and report competitive performance in terms of prediction error as well as execution speed.
II Related Work
In this section we review relevant prior work. We first review existing approaches to the recovery of superquadrics and then discuss closely related literature on CNN models for the processing of 3D data. The reader is referred to the cited surveys for more extensive coverage of these topics.
Superquadric recovery. Pentland, who introduced superquadrics to computer vision, proposed to recover them from shading information derived from 2D intensity images. This approach, however, proved to be overly complicated and was not successful in practice. Solina and Bajcsy instead used explicit 3D information in the form of range images [5, 6], which are a uniform 2D grid of 3D points as seen from a particular viewpoint. They designed a fitting function that needs to be minimized to fit a superquadric model to the 3D points. Since this fitting function is highly non-linear, an iterative procedure could not be avoided for its minimization.
Subsequent work refined this approach, but still essentially relied on iterative minimization of the fitting function. Instead of gradient least-squares minimization, other optimization methods have also been tried, such as genetic algorithms. Several extensions of superquadrics were proposed in the literature [15, 16]; however, the basic superquadric shape model and the recovery method of Solina and Bajcsy prevailed in most applications of superquadrics, in particular for path and grasp planning in robotics and for the modeling and interpretation of medical images. Later, Leonardis and Solina expanded Solina and Bajcsy’s method to simultaneously deconstruct the input range image into several superquadrics, resulting in a perceptually relevant segmentation. Nevertheless, the procedure still relied on an iterative fitting step.
Different from the techniques outlined above, we explore in this study whether recovery techniques relying on contemporary machine learning models, i.e., CNNs, can be used to estimate the parameters of superquadric models without costly iterative optimization. As we show later, the CNN-based solution described in the next section is competitive with state-of-the-art iterative recovery techniques in terms of parameter-prediction error, but has a significant edge when it comes to processing speed.
CNN-based models for 3D visual data. CNNs have already been employed to process 3D visual data. A CNN regression approach was, for example, used for real-time 2D/3D registration, which, similarly to superquadric recovery, was traditionally solved by iterative computation. CNNs have also been used to estimate face normals from 2D intensity images instead of standard shape-from-shading methods, and for fitting 3D morphable models to faces captured in unconstrained conditions [19, 20, 21, 22].
There has also been work on recovering volumetric models using deep neural networks [23, 24, 25, 26]. Wu et al., for example, built voxel representations of objects, called 3D shapenets, from range images and used CNNs to complete the shape, determine the next best view, and recognize objects. Sharma et al. extended shapenets into fully convolutional volumetric autoencoders by estimating voxel occupancy grids. Grant et al. used CNNs to predict volumes on previously unseen image data. Slabanja et al. addressed superquadric recovery and segmentation of 3D point clouds.
Current research thus provides ample evidence that the combination of 3D data and models relying on the CNN computational paradigm is promising, but this line of work is only beginning. In this preliminary study we add to this body of work by developing a CNN-based solution which, for a selected scene with a single 3D object, returns a volumetric description in the form of the parameters defining a superquadric model.
In this section we present our approach to the recovery of individual superquadrics with convolutional neural networks (CNNs). We first introduce the superquadric-recovery problem and then elaborate on the CNN model proposed in this work to estimate the parameters of superquadrics from range images.
III-A Problem formulation
Superquadrics represent volumetric shapes defined by the following implicit closed-surface equation:

F(x, y, z) = \left( \left| \frac{x - t_x}{a_1} \right|^{\frac{2}{\varepsilon_2}} + \left| \frac{y - t_y}{a_2} \right|^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left| \frac{z - t_z}{a_3} \right|^{\frac{2}{\varepsilon_1}} = 1,   (1)

where t_x, t_y and t_z determine the geometric center of the superquadric in 3D space, a_1, a_2 and a_3 represent the dimensions of the superquadric along each of the coordinate axes, and the parameters \varepsilon_1 and \varepsilon_2 determine the shape of the superquadric. By varying the values of \varepsilon_1 and \varepsilon_2, different 3D shapes can be generated, as illustrated in Figure 2. The position of the superquadric in a reference coordinate system is defined by its geometric center, and the size of the superquadric is determined by the scaling parameters a_1, a_2 and a_3. For a given point (x, y, z) in space, it is possible to determine where it lies in relation to the shape defined by Eq. (1). If F = 1, the point lies on the surface of the superquadric; if F < 1, it lies inside the superquadric; and if F > 1, it lies outside the superquadric.
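The inside-outside test described above can be sketched in a few lines of numpy. The symbol names (center t, dimensions a, exponents eps1/eps2) follow the standard superquadric formulation; the function name is ours:

```python
import numpy as np

def inside_outside(point, center, dims, eps1, eps2):
    """Evaluate the superquadric inside-outside function F at a 3D point.

    center = (tx, ty, tz), dims = (a1, a2, a3), eps1/eps2 = shape exponents.
    F == 1 on the surface, F < 1 inside, F > 1 outside.
    """
    # Translate to the superquadric's frame, normalize by the axis lengths,
    # and take absolute values so fractional exponents stay well-defined.
    x, y, z = np.abs(np.asarray(point, float) - np.asarray(center, float)) \
              / np.asarray(dims, float)
    return (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1) \
           + z ** (2.0 / eps1)

# A unit sphere (eps1 = eps2 = 1, unit axes) centred at the origin:
F = inside_outside((1.0, 0.0, 0.0), (0, 0, 0), (1, 1, 1), 1.0, 1.0)
# F == 1.0, so the point lies exactly on the surface
```

For eps1 = eps2 = 1 the formula reduces to the familiar ellipsoid equation, which makes the sphere above a convenient sanity check.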
To account for potential rotations of the superquadric with respect to the reference coordinate system, a rotation matrix with additional parameters is typically defined over the coordinates of the generated 3D shape. However, in this preliminary study we only consider non-rotated superquadrics and further assume that only a single (isolated) 3D shape is present in the input data. These assumptions make the recovery problem easier, as ambiguities due to rotation in 3D are avoided and there is no need for prior segmentation of complex shapes into simpler superquadric-like building blocks. The recovery problem that we aim to address in this paper can thus be defined as a prediction task, where the goal is to estimate the parameter set \Lambda = \{t_x, t_y, t_z, a_1, a_2, a_3, \varepsilon_1, \varepsilon_2\} of an (individual/isolated) superquadric model given suitable input data (e.g., a range image I):

\hat{\Lambda} = f(I),   (2)

where f is a predictor that we want to learn from annotated training data.
III-B Model description
Existing solutions to the recovery of superquadric models typically involve iterative model-fitting procedures, which are computationally expensive and often time-consuming. In this paper, we introduce a novel non-iterative approach that is able to recover the parameters of individual superquadric models from range images without costly iterative optimization techniques. Specifically, we formulate the superquadric-recovery problem as a regression task, where a convolutional neural network (CNN) is used to predict the parameters of the superquadric model from an input range image I containing a single 3D shape, i.e.,

\hat{\Lambda} = f(I; \theta),   (3)

where \theta defines the set of CNN parameters that need to be learned during training.
To build our regression model we design a CNN in line with the architectural guidelines of the established VGG network. We use ideas from the VGG model because it has proven successful in a number of vision-oriented tasks (e.g., [28, 29, 30, 31]) and is straightforward to implement. As illustrated in Figure 3, our model consists of a series of convolutional layers followed by a fully connected output layer with eight real-valued outputs and a linear activation function. The output of the model thus represents the location, dimension and shape parameters of the superquadric, i.e., the elements of the superquadric parameter set from Eq. (3). Given the known value range of every parameter, we scale all parameters to a common range before passing them to the model as training targets, so that all outputs follow a similar distribution. To get correct results, we re-scale the model's outputs accordingly at test time.
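The target scaling and test-time re-scaling can be sketched as a simple min-max transform. The per-parameter bounds below ([0, 256] for position/size, [0, 1] for shape) follow the ranges reported in the experiments; mapping to [0, 1] is our assumption about the common target range:

```python
import numpy as np

# Per-parameter bounds: tx, ty, tz, a1, a2, a3 in [0, 256]; eps1, eps2 in [0, 1].
PARAM_MIN = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
PARAM_MAX = np.array([256.0, 256.0, 256.0, 256.0, 256.0, 256.0, 1.0, 1.0])

def scale_targets(params):
    """Map raw superquadric parameters to a common [0, 1] training range."""
    return (np.asarray(params, float) - PARAM_MIN) / (PARAM_MAX - PARAM_MIN)

def unscale_outputs(outputs):
    """Invert the scaling at test time to recover actual parameter values."""
    return np.asarray(outputs, float) * (PARAM_MAX - PARAM_MIN) + PARAM_MIN
```

Because the transform is affine and invertible, `unscale_outputs(scale_targets(p))` recovers the original parameter vector exactly.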
Similarly to the VGG model, we gradually decrease the spatial size of the data along the model layers and simultaneously increase the number of channels in the intermediate representations. Due to the particularities of our regression problem, we make a number of additional design choices, i.e.:
The initial convolutional layer in our model uses a larger stride and larger filters than the subsequent layers to ensure sufficient receptive-field coverage. This is because the input range images contain relatively few high-frequency details compared to more general natural images. We therefore do not need to process the range data at the full input resolution, and accordingly design the architecture of our model to downsample the intermediate representations more rapidly.
Batch normalization layers are included after every convolutional layer. Based on our preliminary experiments, this significantly reduces overfitting and allows the model to better generalize to unseen images.
Strided convolutions are used instead of the common max-pooling operations for downsampling. As we observed during development, this (slightly) improves the computational efficiency of the model, but does not degrade performance on the regression task.
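The design choices above can be summarized in a compact sketch. The channel counts, kernel sizes and strides below are illustrative assumptions (the extracted text omits the exact configuration); the helper only tracks how strided convolutions downsample the spatial resolution:

```python
# Hypothetical layer configuration echoing the design choices above:
# VGG-style channel growth, strided convolutions instead of max-pooling,
# and (not modeled here) batch normalization after every convolution.
LAYERS = [
    # (out_channels, kernel_size, stride)
    (32, 7, 2),    # larger initial filter and stride for receptive-field coverage
    (64, 3, 2),    # strided convolutions replace max-pooling for downsampling
    (128, 3, 2),   # channels grow as the spatial size shrinks, as in VGG
    (256, 3, 2),
]

def spatial_size(input_size, layers):
    """Spatial resolution after the strided convolutions, assuming 'same'
    padding so each layer divides the size by its stride (rounding up)."""
    size = input_size
    for _out_channels, _kernel, stride in layers:
        size = -(-size // stride)  # ceil(size / stride)
    return size

# A 256x256 range image is reduced to 16x16 by the four strided layers above.
```

Replacing each pooling layer with a stride-2 convolution halves the resolution in the same way while saving one operation per downsampling step, which is where the (slight) efficiency gain comes from.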
III-C Training objective
We use an L2-norm-based error between the predicted and ground-truth superquadric parameters as the training objective for our CNN regressor:

L(\Lambda, \hat{\Lambda}) = \lVert \Lambda - \hat{\Lambda} \rVert_2^2,   (4)

where \Lambda represents the ground-truth parameters, \hat{\Lambda} represents the predictions of the CNN model, i.e., \hat{\Lambda} = f(I; \theta), f is the CNN model with parameters \theta, and I is the input range image.
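The training objective can be sketched in plain numpy; in practice a deep-learning framework's built-in MSE/L2 loss would play this role, and the function name is ours:

```python
import numpy as np

def l2_loss(pred_params, true_params):
    """Squared L2 error between predicted and ground-truth parameter vectors."""
    diff = np.asarray(pred_params, float) - np.asarray(true_params, float)
    return float(np.sum(diff ** 2))
```

Because the targets are pre-scaled to a common range, no parameter dominates this sum purely by virtue of having a larger numeric range.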
To learn the CNN regressor we use the ADAM minibatch stochastic gradient descent optimization algorithm and minimize the loss over the available training data to find the parameters of the CNN regressor,.
In this section we present experiments aimed at analyzing the performance of the proposed CNN regressor. We start the section with a description of the experimental dataset and model training procedure and then proceed to the results and corresponding discussion.
IV-A Dataset, experimental setup and performance metrics
Experimental dataset and setup. To train and evaluate the CNN regressor presented in the previous section, we generate a synthetic dataset of 3D shapes and render them in the form of range images. Generating synthetic superquadrics allows us to create considerable amounts of data quickly and ensures that each computer-generated image has a corresponding ground truth (i.e., superquadric parameters) needed for both training and testing.
We generate the data in a controlled manner using a custom rendering tool, in which superquadric models with arbitrary parameters can be created. Each range image in the dataset contains a single superquadric inside a 256×256×256 grid, where the first two dimensions encode the image width and height and the last dimension encodes the depth. Higher pixel values correspond to closer ranges, while pixels with zero values correspond to the background. The renderer accepts four groups of arguments: positional parameters (the x, y and z coordinates of the geometric centre), shape parameters (ε1 and ε2), dimension parameters (a1, a2 and a3) and rotational parameters (the elements of a rotation matrix). The rotational parameters are not used in this work, as mentioned earlier.
The parameters supplied to the renderer are constrained to specific ranges, as the generated superquadrics need to reside inside the grid. The parameters defining the position and size of the superquadrics can therefore only take values from the interval [0, 256], and the parameters determining the shape of the superquadrics can only take values from the interval [0, 1]. However, in practice we also want the entire 3D shape to be visible, and therefore use somewhat narrower parameter ranges when generating our dataset. Specifically, we define the geometric centre of the 3D shape in each image by drawing coordinates independently from a fixed distribution, the dimensions (size) of the superquadric along each of the axes by sampling from a second distribution, and the shape parameters ε1 and ε2 by drawing values independently from a third distribution. As shown in Figure 2, this distribution of parameters results in a diverse set of shapes that includes cuboids, ellipsoids and cylindrical shapes. These shapes and the corresponding ground-truth parameters can then be used for training and testing of the proposed CNN regressor.
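A ground-truth parameter sampler in this spirit can be sketched as follows. The sub-ranges used here are illustrative assumptions, chosen only so the whole shape stays visible inside the grid; the exact sampling distributions in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_superquadric_params(grid=256):
    """Draw one ground-truth parameter set (tx, ty, tz, a1, a2, a3, eps1, eps2).

    Ranges are hypothetical: the centre is kept away from the grid borders and
    the axis lengths are capped so the superquadric fits inside the grid.
    """
    center = rng.uniform(0.3 * grid, 0.7 * grid, size=3)   # tx, ty, tz
    dims = rng.uniform(0.05 * grid, 0.25 * grid, size=3)   # a1, a2, a3
    shape = rng.uniform(0.1, 1.0, size=2)                  # eps1, eps2
    return np.concatenate([center, dims, shape])
```

Each drawn vector can be handed directly to the renderer as ground truth and to the regressor as a training target.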
We generate separate sets of range images for training and for the performance assessment. Note that all generated images are rendered in an isometric projection, so that multiple sides of the superquadrics are always visible in every image.
Performance metrics. To evaluate the performance of our CNN model, we report the Mean Absolute Error (MAE) for each of the estimated superquadric parameters in our experiments:

MAE = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert,   (5)

where y_i and \hat{y}_i are the ground-truth and predicted parameter values for the i-th image in the test set, respectively, and N denotes the total number of test images used in the experiments.
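The metric is straightforward to compute per parameter; a minimal numpy version (function name ours):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over the N test images for a single parameter."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

Reporting MAE per parameter (rather than one pooled score) is what makes the per-coordinate comparisons in the results section possible.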
IV-B Model training
As indicated earlier, we use a training set of synthetically rendered superquadric range images to learn the parameters of the proposed CNN regression model. We split the training dataset into a set of images used for the actual learning procedure and a set of validation images used to monitor the generalization abilities of the model during training. We initialize the parameters of the CNN regression model using the uniform variant of the initialization method proposed by He et al., and train the model using the Adam stochastic gradient optimization algorithm with a fixed batch size and an initial learning rate that is successively reduced by a constant factor at the end of selected epochs. We use the MSE loss over the validation dataset as our stopping criterion, with a fixed patience in epochs. Since we have sufficient training data available, we do not perform any data augmentation. The training was done on an NVIDIA GTX Titan XP GPU.
Once trained, the model takes a range image as input and returns the 8-dimensional vector of superquadric parameters at the output.
IV-C Results and discussion
We now present results of the experimental evaluation of the proposed CNN model.
Model evaluation. In the first series of experiments, we assess the performance of the proposed CNN regressor using the generated test images from our synthetic dataset. For comparison purposes we also include results for the state-of-the-art iterative approach to superquadric recovery from Solina and Bajcsy. The results of the assessment are presented in Table I in the form of MAE scores for each of the superquadric parameters considered, together with the average processing time required to process a single input image, computed over the test images.
|Approach|Dimensions [0–256]|Position [0–256]|Shape [0–1]|Processing time [ms]|
|CNN regressor (ours)| | | | |
|Solina and Bajcsy| | | | |
As can be seen, the average MAE scores for the proposed CNN-based approach are small for all parameters compared to the possible range of parameter values. The estimates of the dimension (or scale) parameters a1, a2 and a3, for example, all exhibit mean absolute errors that account for only a small fraction of the available parameter range. The predicted positional parameters display a more inconsistent behavior, with MAE scores varying across the coordinates. Interestingly, while the x and y coordinates of the geometric center of the superquadric exhibit a similar MAE value, the estimate of the z coordinate exhibits an error twice as large. Nonetheless, even considering the largest of the errors on the positional parameters, the center of the superquadric is on average still off by only a few voxels in the 256×256×256 grid. Similar results are also observed for the shape parameters ε1 and ε2, with average absolute errors again accounting for only a small fraction of the available parameter range.
When comparing the results of the proposed CNN regressor to the state-of-the-art method from Solina and Bajcsy, we see a significant improvement in performance for most of the parameters. Relative to the corresponding parameter ranges, our method considerably improves the prediction of both the dimensional and the shape parameters on average. The positional parameters of the iterative method are comparable to those of our CNN regressor and share the same irregular behavior, with prediction errors for the z coordinate being twice as large as the errors for x and y. In absolute terms, the CNN regressor reduces the prediction error by around one order of magnitude for the dimensional (a1, a2 and a3) and shape (ε1, ε2) parameters and produces comparable positional parameters (tx, ty and tz).
Another major contribution is the significantly faster processing time. Our approach requires only a fraction of the time needed by the iterative method from Solina and Bajcsy to process a single input image. Thus, our model achieves state-of-the-art prediction performance while ensuring a considerable speed-up. For a fair interpretation of the results, it needs to be noted that our CNN regressor is able to make use of GPUs during processing, while the competing method needs to run on a CPU (an Intel Core i7-8550U in our case) due to its iterative nature.
Model analysis. The results reported in the previous section show only a partial picture of the performance of the proposed CNN regressor. Since the synthetic images in our dataset have different characteristics, it makes sense to evaluate how the prediction errors are distributed across the test images. To this end, we show in Figure 5 the error probability density functions for each of the predicted parameters. Note that the graphs show the distribution of the prediction errors and not of the absolute errors, so the errors may also be negative.
We observe that all distributions have a close-to-Gaussian shape with most of the mass located close to the mean. This observation suggests that even for difficult images, the parameter estimates produced by our CNN regressor are reasonable approximations of the true values.
Another important observation we can make from the presented densities is related to model bias. Since our CNN regressor is a statistical model, examining the model bias is useful for determining the most likely parameter values the regressor will predict. If we look at the mean error values marked by the vertical dotted line in each graph, we observe that for the shape parameters ε1 and ε2 the model exhibits a small bias toward positive errors on average, which means that an object rendered from the predicted parameters will be slightly more rounded than the original object. Among the coordinates of the geometric center of the superquadric predicted by our CNN regressor, the z coordinate exhibits a somewhat larger bias than the x and y coordinates. The error for the z coordinate is biased toward positive prediction errors on average, which suggests that an object generated from the predicted parameters would appear closer in the scene than the original. This observation could be a case against the usage of classic convolutional filters, which appear to be spatially aware in the x and y directions, but lack depth perception along the z coordinate axis. This observation is also supported by the significantly larger standard deviation of the prediction error for the z coordinate in comparison to the standard deviations for the x and y coordinates. The dimensional/scale parameters a1 and a2 show little bias, with an average prediction error close to zero; the scaling factor a3 along the z axis, on the other hand, exhibits a small bias toward negative values, suggesting that the rendered superquadrics would on average be slightly smaller than the original 3D shapes along the depth dimension.
Performance with real data. To evaluate our CNN regressor on real-world data, we collect a small dataset of 3D shapes using an Artec MHT 3D scanner. The scanner is designed to capture clouds of 3D points and a wide spectrum of colors (up to 24 bpp). Capturing both color information of the object’s appearance and its geometry results in textured high-quality 3D models, which can then be processed by a number of 3D software packages.
We use Artec Studio to manipulate the point-cloud data and create a mesh that defines a 3D shape closely approximating the original object. The point clouds are cleaned using an outlier filter to remove the noise generated by the 3D scanner. We also manually remove all background points during preprocessing, so that only the object remains in the processed data. Once the preprocessing is completed, we transform the meshes into range images using the mesh-manipulation program Meshlab. The objects are again rendered in an isometric projection with fixed rotations, and the range images are scaled to the input size expected by the model. We scan a number of distinct objects and show RGB scans of the objects on the left side of Figure 6.
The generated range images are fed to the CNN regressor, which then produces estimates of the superquadric parameters for each of the input images. We evaluate the results of this experiment in a qualitative manner and show examples of the recovered 3D shapes in Figure 6. Here, the first image in each row shows the scanned object in the form of an RGB image, the second image shows the corresponding range image that represents the input to the CNN regressor, the third image shows the recovered superquadric shape, and the last image shows the absolute difference between the input range image and its superquadric representation. We observe that the recovery of superquadric parameters is relatively successful. All of the objects are mostly covered by the generated superquadric, and in most cases the estimated superquadric completely encloses the scanned object. Even though the CNN regressor was trained on clean data without any additional augmentation, it performs well on images where some noise is introduced around the object's contour during the scanning process. The model also seems to have no issues with objects for which the range data is sparse due to reflections. As noted, the general shape is fitted successfully, but we can also observe some more complex interactions. For example, the objects in the second and fifth rows could be approximated trivially using a cylindrical shape. Our regressor, however, detects the slight difference in edge roundness and adjusts the shape parameters accordingly.
In this preliminary study we introduced a novel CNN-based approach to superquadric recovery from range images. We addressed a constrained recovery problem, where a single 3D shape to be modeled by a superquadric was assumed to be present in the data and only unrotated superquadric models were considered. We showed that the proposed CNN model is able to outperform existing state-of-the-art superquadric-recovery methods in terms of parameter-prediction accuracy, without iterative fitting procedures and in a fraction of the time. To the best of our knowledge, this work is the first to introduce a CNN-based superquadric-recovery model and to show that this line of research has considerable potential. As part of our future work, we plan to extend our model to more general problems involving rotated superquadrics, where additional ambiguities are introduced into the recovery problem (superquadrics with the same appearance may have different parameters), and complex 3D shapes, where a segmentation step is required before superquadric models can be recovered.
This research was supported in parts by the ARRS (Slovenian Research Agency) Project J2-9228 “A neural network solution to segmentation and recovery of superquadric models from 3D image data”, ARRS Research Program P2-0250 (B) “Metrology and Biometric Systems” and the ARRS Research Program P2-0214 (A) “Computer Vision”.
-  I. Biederman, “Recognition-by-components: a theory of human image understanding.” Psychological review, vol. 94, no. 2, p. 115, 1987.
-  A. Jaklič, M. Erič, I. Mihajlović, Ž. Stopinšek, and F. Solina, “Volumetric models from 3D point clouds: The case study of sarcophagi cargo from a 2nd/3rd century AD Roman shipwreck near Sutivan on island Brač, Croatia,” Journal of Archaeological Science, vol. 62, no. 10, pp. 143–152, 2015.
-  Ž. Stopinšek and F. Solina, “3D modeliranje podvodnih posnetkov,” in SI robotika, M. Munih, Ed. Slovenska matica, 2017, pp. 103–114.
-  A. P. Pentland, “Perceptual organization and the representation of natural form,” Artificial Intelligence, vol. 28, no. 3, pp. 293–331, 1986.
-  R. Bajcsy and F. Solina, “Three dimensional object representation revisited,” in International Conference on Computer Vision (ICCV), 1987, pp. 231–240.
-  F. Solina and R. Bajcsy, “Recovery of parametric models from range images: The case for superquadrics with global deformations,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 12, no. 2, pp. 131–147, 1990.
-  A. Leonardis, A. Jaklič, and F. Solina, “Superquadrics for segmenting and modeling range data,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 19, no. 11, pp. 1289–1295, 1997.
-  N. S. Raja and A. K. Jain, “Recognizing geons from superquadrics fitted to range data,” Image and vision computing (IVC), vol. 10, no. 3, pp. 179–190, 1992.
-  J. Krivic and F. Solina, “Part-level object recognition using superquadrics,” Computer Vision and Image Understanding (CVIU), vol. 95, no. 1, pp. 105–126, 2004.
-  A. Jaklič, A. Leonardis, and F. Solina, Segmentation and recovery of superquadrics. Kluwer, 2000.
-  A. Ioannidou, E. Chatzilari, S. Nikolopoulos, and I. Kompatsiaris, “Deep learning advances in computer vision with 3d data: A survey,” ACM Computing Surveys (CSUR), vol. 50, no. 2, p. 20, 2017.
-  T. E. Boult and A. D. Gross, “Recovery of superquadrics from depth information,” in Workshop on Spatial Reasoning and Multi-Sensor Fusion, 1987, pp. 128–137.
-  K. Duncan, S. Sarkar, R. Alqasemi, and R. Dubey, “Multi-scale superquadric fitting for efficient shape and pose recovery of unknown objects,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2013, pp. 4238–4243.
-  S. Voisin, M. A. Abidi, S. Foufou, and F. Truchetet, “Genetic algorithms for 3d reconstruction with supershapes,” in International Conference on Image Processing (ICIP). IEEE, 2009, pp. 529–532.
-  D. Terzopoulos and D. Metaxas, “Dynamic 3d models with local and global deformations: Deformable superquadrics,” in International Conference on Computer Vision (ICCV). IEEE, 1990, pp. 606–615.
-  A. J. Hanson, “Hyperquadrics: smoothly deformable shapes with convex polyhedral bounds,” Computer vision, graphics, and image processing, vol. 44, no. 2, pp. 191–210, 1988.
-  S. Miao, Z. J. Wang, and R. Liao, “A CNN Regression Approach for Real-Time 2D/3D Registration,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1352–1363, May 2016.
-  G. Trigeorgis, P. Snape, I. Kokkinos, and S. Zafeiriou, “Face normals “in-the-wild” using fully convolutional networks,” in Computer Vision and Pattern Recognition (CVPR). IEEE, 2017, pp. 38–47.
-  A. Garcia-Garcia, F. Gomez-Donoso, J. Garcia-Rodriguez, S. Orts-Escolano, M. Cazorla, and J. Azorin-Lopez, “PointNet: A 3D Convolutional Neural Network for real-time object class recognition,” in International Joint Conference on Neural Networks (IJCNN), July 2016, pp. 1578–1584.
-  R. A. Güler, G. Trigeorgis, E. Antonakos, P. Snape, S. Zafeiriou, and I. Kokkinos, “Densereg: Fully convolutional dense shape regression in-the-wild,” in Computer Vision and Pattern Recognition (CVPR), vol. 2, no. 3, 2017.
-  P. Dou, S. K. Shah, and I. A. Kakadiaris, “End-to-end 3D face reconstruction with deep neural networks,” in Computer Vision and Pattern Recognition (CVPR), vol. 5, 2017.
-  A. S. Jackson, A. Bulat, V. Argyriou, and G. Tzimiropoulos, “Large pose 3D face reconstruction from a single image via direct volumetric cnn regression,” in International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 1031–1039.
-  A. Sharma, O. Grau, and M. Fritz, “VConv-DAE: Deep Volumetric Shape Learning Without Object Labels,” CoRR, vol. abs/1604.03755, 2016. [Online]. Available: http://arxiv.org/abs/1604.03755
-  Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D shapenets: A deep representation for volumetric shapes,” in Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1912–1920.
-  E. Grant, P. Kohli, and M. van Gerven, “Deep disentangled representations for volumetric reconstruction,” in ECCV Workshops, 2016.
-  J. Slabanja, B. Meden, P. Peer, A. Jaklič, and F. Solina, “Segmentation and reconstruction of 3d models from a point cloud with deep neural networks,” in International Conference on Information and Communication Technology Convergence (ICTC), October 2018, pp. 118–123.
-  O. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” in British Machine Vision Conference (BMVC), vol. 1, no. 3, 2015, p. 6.
-  Ž. Emeršič, L. L. Gabriel, V. Štruc, and P. Peer, “Convolutional encoder–decoder networks for pixel-wise ear detection and segmentation,” IET Biometrics, vol. 7, no. 3, pp. 175–184, 2018.
-  Ž. Emeršič, D. Štepec, V. Štruc, and P. Peer, “Training convolutional neural networks with limited training data for ear recognition in the wild,” in International Conference on Automatic Face & Gesture Recognition (FG). IEEE, 2017, pp. 987–994.
-  K. Grm, V. Štruc, A. Artiges, M. Caron, and H. K. Ekenel, “Strengths and weaknesses of deep learning models for face recognition against image degradations,” IET Biometrics, vol. 7, no. 1, pp. 81–89, 2017.
-  K. Grm, S. Dobrišek, W. J. Scheirer, and V. Štruc, “Face hallucination using cascaded super-resolution and identity priors,” arXiv preprint arXiv:1805.10938, 2018.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in International Conference on Computer Vision (ICCV), December 2015.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.