1 Introduction
Depth information acquired by low-cost depth cameras is typically prone to severe errors and degradations. This low image quality limits the performance of depth-based computer vision algorithms, and challenges most image enhancement methods. In this work, we aim to enhance these depth images and bring them closer to the output of high-quality depth cameras. We focus on enhancing real-world depth images, as produced for example by the Intel RealSense R200 (see Figure (1), left). Due to its small size and low operating power, this camera suffers from substantial noise and artifacts, exhibiting complex and non-random patterns. The absence of any analytic model for these degradations prohibits the use of many classical methods, such as probabilistic and model-based reconstruction methods [inverseproblems]. Furthermore, it makes simulating realistic degraded depth maps impractical, eliminating the possibility of generating pairs of high- and low-quality images for supervised machine learning algorithms.
As an alternative, we propose a novel approach which eases the requirement for aligned ground-truth image pairs, formulating the task as an unsupervised domain-translation problem between a low-quality sensor domain and a high-quality sensor domain. Several works [deepface, cyclegan, stargan] have recently shown great success in handling such unsupervised domain translation problems. Following their success, we employ a similar approach to the challenging depth enhancement task. We base our approach on the CycleGAN framework, and develop a fully unsupervised method for training the enhancement network. To the best of our knowledge, this is the first work to formulate this depth enhancement task as an unsupervised translation task.
We focus on the low-power RealSense R200 stereo camera as our low-quality depth sensor. As the high-quality sensor, we select the time-of-flight Microsoft Kinect 2, which is a significantly higher-powered and more accurate camera with substantially less noise. Our aim therefore becomes to bring the quality of the RealSense images to that of the Kinect 2 images via unsupervised domain translation.
Unfortunately, we find the original CycleGAN to perform poorly on this task, as depicted in Figure (1) (center). The main sources of this deficiency are the increased complexity of the task, as well as the asymmetry between the domains, manifested by the lack of information equivalence between them. To address these issues, we introduce several modifications to the framework. First, we replace the relatively small generative architecture with a much larger one, with sufficient representational capacity to handle the translation task. Next, we employ depth-specific losses which take into account missing pixels. Finally, we propose the TriCycle loss as an alternative information-retention metric for asymmetric domains. Combining these components, our modified CycleGAN framework significantly improves over “vanilla” CycleGAN in this task, producing much more detailed and less noisy images, as demonstrated in Figure (1) (right). Our main contributions are therefore:

Developing a training method for depth enhancement networks, capable of handling real-world depth with severe degradation, and without requiring labeled data.

Presenting architectural design principles for CNNs aimed at processing highly degraded depth data with strong non-Gaussian noise, missing pixels, and structured artifacts.

Proposing the TriCycle loss that extends the applicability of CycleGANs to asymmetric tasks which may not satisfy the information-preservation assumption.
This work is organized as follows. We begin in Section (3.1) with a discussion of the specific challenges of real-world depth and its complex noise sources. We formulate the enhancement problem as an unsupervised translation task in Section (3.2) and discuss the limitations of the original CycleGAN which prevent it from producing reasonable recovery results in this case. We next describe our modifications to the CycleGAN framework, including the network architecture and the considerations in designing it (Section (3.2.1)), the depth-specific losses (Section (3.2.2)), and the TriCycle loss (Section (3.2.3)). As discussed later, the TriCycle loss can be interpreted as a nonlinear generalization of the Moore-Penrose inverse for asymmetric translation problems, and in our view is the main innovation in this work. We continue by providing experimental results on several datasets in Section (4), demonstrating the effectiveness of the improved CycleGAN both visually and quantitatively. We discuss and conclude in Section (5).
2 Related Work
Depth map completion and enhancement have received considerable attention over the past years. Depth completion methods can generally be divided into two categories: color-guided and non-guided methods. Color-guided approaches [deepdepth, sparse2dense, blurrydepth, deeplidar] assume the existence of a color image aligned with the corrupted depth image, and rely on the fact that both share much of the structural information — such as object edges — to deduce a dense depth map from the low-quality input. For example, [deepdepth] uses a CNN to estimate surface normals and edges from the color image, and subsequently combines them with the low-quality depth image in a post-process. Other works, such as [sparse2dense], directly infer the underlying relation between depth and color, and output an enhanced depth map in a single end-to-end process.

When aligned color is not available, either due to the lack of a color sensor, the absence of an alignment between the depth and color streams, low-light conditions, or (as in the RealSense case) the existence of a projected pattern in the visible image, a non-guided completion method must be used [sparsedepthsensing, sparseconv, sparsedense, sparseconvgan]. For example, Sparse Depth Sensing [sparsedepthsensing] reconstructs dense depth maps from very sparse measurements by modeling the scene as a piecewise-planar map, and formulating the recovery task as a compressed sensing problem regularized by sparse second derivatives.
Sparsity-Invariant CNNs [sparseconv] take a different approach, and learn an image-to-image enhancement network based on sparse convolutions, which consider only valid depth values when computing convolution outputs. However, in a follow-up work [sparsedense], the authors note that sparse convolutions rapidly lose their effectiveness after only a few convolutional layers, and thus elect to fill in missing pixels using a deep architecture based on traditional convolutions instead. That work introduces a unique sparse training strategy which synthetically varies the density of valid pixels in the input during training (though in relatively simple patterns), and is found to outperform [sparseconv] even at the density for which the latter was trained. In parallel, a GAN-based approach was proposed in [sparseconvgan], introducing an adversarial loss to the supervised depth completion task. The added adversarial loss is shown to notably improve both the realism and accuracy of the recovered images compared to previous methods.
Despite the convincing results of all these works on their respective tasks, we note that they all adopt a supervised approach to the training process, relying on the availability of ground-truth images alongside the degraded ones. This is often achieved by limiting the method to simple degradations which can be synthetically reproduced, such as i.i.d. depth noise and randomly distributed missing pixels. In the case of real-world depth images, however, such assumptions often do not hold. Thus, in this work we take a different approach, and formulate the enhancement task as an unsupervised problem which does not require ground-truth images. In this way, we address the task of enhancing depth maps captured by real-world low-quality depth cameras, and develop a framework for handling this challenging task.
3 Improving Depth Images Using CycleGAN
3.1 The Challenge of Real-World Depth
Low-power, small form-factor cameras such as the Intel RealSense R200 suffer from significant noise and artifacts in the captured depth maps. As an active stereo camera, its main sources of error include inaccuracies in the pattern matching — due to the algorithm itself or to insufficient information in the scene — as well as shadowing due to the different viewpoints of the two sensors. These are all amplified by the small camera baseline and the low power of the projector. An example image captured with this camera is shown in Figure (2) (left).

The dependence of the depth noise on multiple factors, including scene-specific details such as texture, material, geometry and lighting, as well as camera-specific parameters such as optics, projector, and algorithm performance, makes it virtually impossible to reliably model the depth degradation. Thus, in contrast to many low-level image processing tasks such as denoising or super-resolution, simulating a realistic noisy image given a known ground-truth image is impractical.
In the absence of a viable option to simulate training data, one must resort to manual capturing. One approach could be to capture pairs of images of a scene using two synchronized and calibrated depth cameras, with one being the low-quality camera and the second being a high-quality camera providing the ground truth. Unfortunately, employing such a technique at large scale is extremely complex — it requires highly accurate alignment of the cameras, suffers from occlusions due to the different viewpoints, and furthermore, since most depth cameras involve some form of active projection, it is impossible to have the two capture the scene at the same time. Consequently, the process becomes lengthy and inefficient, producing too few images to form an effective training set. Interestingly, such an aligned dataset was recently presented in [danielkinectrs], though to achieve accurate results the process was limited to a specific, highly controlled environment, and resulted in just 112 images.
3.2 Unsupervised Depth Image Improvement
Considering the huge challenge in producing pairs of input-output images for real-world depth enhancement, we believe that the most viable path for training such a process is unsupervised learning. In this approach, the problem is restated as a translation problem between two domains — a low-quality domain $L$ and a high-quality domain $H$, represented by two unaligned, freely captured training sets. Such translation tasks have recently received significant attention, and have shown remarkable results in many translation problems [cyclegan, discogan, dualgan, stargan, sflowgan].
Following previous work, we adopt the highly successful CycleGAN [cyclegan, discogan, dualgan] as the basis for our domain translation framework. The CycleGAN simultaneously learns two generative networks for translating in both directions, and uses cycle-consistency to encourage information preservation by the translation in the absence of ground-truth targets. Specifically, given the two domains $L$ and $H$, the loss function of the CycleGAN is given by:

$$
\mathcal{L}_{\mathrm{CycleGAN}} = \mathcal{L}^{H}_{adv}(G_{LH}) + \mathcal{L}^{L}_{adv}(G_{HL})
+ \lambda_{cyc}\,\mathbb{E}_{l \sim L}\big\| G_{HL}(G_{LH}(l)) - l \big\|_1
+ \lambda_{cyc}\,\mathbb{E}_{h \sim H}\big\| G_{LH}(G_{HL}(h)) - h \big\|_1
+ \lambda_{id}\,\mathbb{E}_{h \sim H}\big\| G_{LH}(h) - h \big\|_1
+ \lambda_{id}\,\mathbb{E}_{l \sim L}\big\| G_{HL}(l) - l \big\|_1 \qquad (1)
$$

Here, $G_{LH}$ and $G_{HL}$ are the two learned translators, $l \in L$ and $h \in H$ are images from the two domains, and $\mathcal{L}^{H}_{adv}$ and $\mathcal{L}^{L}_{adv}$ are adversarial losses [gan] for their respective domains, each incorporating a learned discriminator working against the generator (we omit the full definition of this loss for conciseness). The first two losses in this formulation guide the translators to output images in their correct domains (represented via the exemplar images from each domain), the next two losses are the cycle-consistency losses, and the final two losses are the identity losses which regularize the training process, and were introduced in [cyclegan].
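To make the structure of equation (1) concrete, the following is a minimal sketch of the objective in plain Python, with images as flat lists and the translators and adversarial terms as arbitrary callables. The names `G_LH`, `G_HL`, and the weight values are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the CycleGAN objective in equation (1). Images are plain lists,
# translators G_LH (L -> H) and G_HL (H -> L) are arbitrary callables, and
# the adversarial terms are abstracted as callables since they involve
# learned discriminators. Weights lam_cyc / lam_id are illustrative.

def l1(x, y):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def cyclegan_loss(G_LH, G_HL, adv_H, adv_L, l, h, lam_cyc=10.0, lam_id=5.0):
    # Adversarial terms: push translated images into the target domains.
    loss = adv_H(G_LH(l)) + adv_L(G_HL(h))
    # Cycle-consistency terms: a full cycle should reproduce the input.
    loss += lam_cyc * (l1(G_HL(G_LH(l)), l) + l1(G_LH(G_HL(h)), h))
    # Identity terms: translating an image already in the target domain
    # should leave it (approximately) unchanged.
    loss += lam_id * (l1(G_LH(h), h) + l1(G_HL(l), l))
    return loss

# Toy check: with identity translators and zero adversarial terms,
# every component of the loss vanishes.
ident = lambda x: x
zero = lambda x: 0.0
print(cyclegan_loss(ident, ident, zero, zero, [1.0, 2.0], [3.0, 4.0]))  # 0.0
```

Note that both cycle terms assume the two translators are mutual inverses, which is exactly the assumption challenged in Section (3.2.3).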
The nature of the depth data, however, poses significant challenges to the CycleGAN framework. First, the low-quality depth exhibits significantly stronger and more complex noise patterns than traditionally handled by the CycleGAN, and large missing regions create severe discontinuities in the data. Furthermore, we observe that the information-preservation assumption made by the CycleGAN design does not actually hold in our case — specifically, the two depth domains are in fact not equivalent, with the high-quality domain containing distinctly more information than the low-quality one. Thus, the cycle-consistency constraint which forms the basis of the CycleGAN becomes problematic in this case. To address these issues, we modify several key aspects of the original CycleGAN formulation, enabling it to successfully handle this challenging task. In the next sections we detail these modifications.
3.2.1 Network Architecture
Since a large part of the difficulty in low-quality depth comes from the high number of missing pixels, one may be tempted to consider architectures based on sparse convolutions [sparseconv, partialconv], which are a type of layer specifically designed for inpainting problems. In a masked convolution, only known pixels contribute to the result of the convolution, with each output feature normalized by the number of contributing values. However, for our task of depth enhancement, we have found such networks to perform poorly. Figure (2) (left) reveals a possible explanation: as opposed to inpainting tasks where the hole locations are typically arbitrary, in depth images the holes are in fact strongly correlated with the properties and geometry of the scene. In other words, the holes themselves convey information about the objects being recovered, such as their shape or distance. Thus, masking this information using convolutions invariant to the hole configuration is actually counterproductive in our case, and does not contribute to the desired result.
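For reference, the masked-convolution idea discussed above can be sketched as follows, here in a minimal 1-D form: only valid samples contribute, and the output is renormalized by the number of valid samples under the kernel. This is an illustration of the layer type, not the exact formulation of [sparseconv].

```python
# A minimal 1-D sketch of a normalized masked ("sparsity-invariant")
# convolution: invalid samples are zeroed out of the sum, and the result
# is renormalized by the count of valid samples under the kernel.

def masked_conv1d(x, mask, w, eps=1e-8):
    k = len(w)
    out, out_mask = [], []
    for i in range(len(x) - k + 1):
        num = sum(w[j] * x[i + j] * mask[i + j] for j in range(k))
        den = sum(mask[i + j] for j in range(k))
        out.append(num / (den + eps) if den > 0 else 0.0)
        # The output is considered valid if any input under the kernel was.
        out_mask.append(1 if den > 0 else 0)
    return out, out_mask

# An averaging kernel over a signal with a hole: the hole is ignored
# rather than dragging the average toward zero.
x    = [2.0, 0.0, 2.0, 2.0]
mask = [1,   0,   1,   1  ]
out, om = masked_conv1d(x, mask, [1.0, 1.0, 1.0])
print(out)  # values stay near 2.0 despite the missing sample
```

The invariance demonstrated here is precisely what the text argues against for depth: the hole pattern itself carries scene information that such a layer discards.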
With this understanding, we base our translation network on standard convolutions, and consider the entire depth image — including its structured zero values — as a single visual representation of the scene. We employ a standard U-Net with skip connections [unet, hourglass] as the generator architecture, similar to the original CycleGAN. The basic U-Net architecture is illustrated in Figure (3).
However, plugging in a simple U-Net to the CycleGAN produces strikingly bad results in our case. To handle the complexity of low-quality depth, it is crucial to use a much wider and deeper translation network. Specifically, we significantly increase the number of channels in the lower layers of the network — those that respond to high frequencies in the image — to enable the network to more effectively handle the large variety of local patterns that emerge in the presence of holes. At the same time, we use a much deeper architecture than typically used in CycleGANs to allow the network to better resolve large object-scale phenomena, which is required to reliably fill in large holes and compensate for complex artifacts. Our full generator architecture is detailed in Table (1).
Layer Name  Input Layers
input  –
conv1  input
conv2.1  conv1
conv2.2  conv2.1
conv3.1  conv2.2
conv3.2  conv3.1
conv4.1  conv3.2
conv4.2  conv4.1
conv5.1  conv4.2
conv5.2  conv5.1
conv6.1  conv5.2
conv6.2  conv6.1
conv7.1  conv6.2
conv7.2  conv7.1
conv6.3  up(conv7.2) ⊕ conv(conv6.2)
conv6.4  conv6.3
conv5.3  up(conv6.4) ⊕ conv(conv5.2)
conv5.4  conv5.3
conv4.3  up(conv5.4) ⊕ conv(conv4.2)
conv4.4  conv4.3
conv3.3  up(conv4.4) ⊕ conv(conv3.2)
conv3.4  conv3.3
conv2.3  up(conv3.4) ⊕ conv(conv2.2)
conv2.4  conv2.3
conv1.3  up(conv2.4) ⊕ conv(conv1.2)
conv1.4  conv1.3

Each layer comprises a convolution, Leaky ReLU, and instance normalization [instancenorm], with a stride of 1 or 2 depending on the output size. The ⊕ operator represents channel-wise concatenation, up denotes nearest-neighbor upsampling, and conv denotes a size-maintaining convolution with Leaky ReLU and instance normalization.

3.2.2 Depth-Specific Losses
The CycleGAN uses image similarity as a central component in the training process, in both the cycle-consistency and identity losses. However, when missing pixels are involved, computing similarity over the entire image may be suboptimal, particularly for pixels which are scattered and random. We note that while so far we have focused mainly on the structured noise of the RealSense camera, the Kinect 2 camera suffers from noise as well. Specifically, while the Kinect images exhibit significantly fewer holes than the RealSense images, and though some of these holes follow object boundaries and discontinuities, many of them are random and isolated, as demonstrated in Figure (4). These random patterns are due to the time-of-flight technology, which often forms holes in areas of low reflectivity, or where external light sources overpower the camera's own projector. Clearly, requiring a generator to recreate these precise random patterns, for instance in the Kinect → RealSense → Kinect cycle, would be counterproductive, as it would force the first translator to encode “hints” about the original hole locations in the RealSense image.

To address this, we utilize a masked similarity, which considers only non-zero locations when computing distance. Formally, given a known depth image $x$ with valid pixel mask $m_x$, and given a second depth image $y$, we define the masked similarity loss as
$$
\mathcal{L}_{mask}(x, y) = \left\| m_x \odot (x - y) \right\|_1 \qquad (2)
$$
where $\odot$ denotes element-wise (Hadamard) multiplication.
We note that (2) is not symmetric in $x$ and $y$. We use a non-symmetric loss since the valid pixel mask of the output depth is non-differentiable in the network parameters and is unstable near zero, and thus optimizing with respect to it would be impractical. Furthermore, the symmetric variant would encourage the formation of holes in the output, and in fact has a trivial global minimum at $y \equiv 0$. In contrast, the asymmetric similarity generally prefers filling in holes in the output image, while still allowing isolated holes to form owing to the robust $\ell_1$ norm.
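The masked similarity of equation (2) and its asymmetry can be sketched in a few lines of plain Python, with images as flat lists and zeros marking holes. The function name and normalization are illustrative choices.

```python
# Sketch of the masked similarity loss of equation (2): an L1 distance
# computed only over the valid (non-zero) pixels of the *known* image x,
# so the output y is never penalized where the reference itself has holes.

def masked_l1(x, y):
    mask = [1.0 if v != 0 else 0.0 for v in x]   # valid-pixel mask of x
    n = sum(mask)
    return sum(m * abs(a - b) for m, a, b in zip(mask, x, y)) / max(n, 1.0)

reference = [5.0, 0.0, 7.0]    # middle pixel is a hole in the reference
output    = [5.0, 3.0, 7.0]    # the output may fill that hole freely
print(masked_l1(reference, output))  # 0.0 -- the filled hole is not penalized

# Note the asymmetry: swapping the arguments masks by the output's holes
# instead, which (as discussed above) would reward hole formation.
print(masked_l1(output, reference))  # 1.0 -- now the hole mismatch is counted
```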
Finally, an additional issue we have observed with the original CycleGAN is range preservation. Specifically, for any solution of (1), a solution in which the translator output is shifted by a constant depth offset over the non-zero values is equally valid. To counter this effect, we add a small masked similarity loss to the $L \to H$ translation, requiring that the high-quality image be close to the low-quality one where the latter is non-zero. Formally, this loss is given by:
$$
\mathcal{L}_{depth}(l) = \left\| m_l \odot (G_{LH}(l) - l) \right\|_1 \qquad (3)
$$
where $m_l$ is the valid mask of the low-quality image $l$. We note that since this image typically has significantly more holes than the expected high-quality output, this loss essentially just maintains the overall distance of the known objects in the scene, without affecting the visual properties of the output image.
3.2.3 TriCycle Loss
The CycleGAN measures information preservation by passing images through a full cycle of the domain translation process, and requiring the composition to act as an identity operator. However, particularly in the $H \to L$ direction, the translation is in fact a one-to-many mapping, as the low-quality image may degrade in many different ways. An example of this is shown in Figure (5). Formally, for a 3D scene $s$ and viewpoint $v$, the ideal depth image is a projection $p = P(s, v)$ of the scene onto the camera plane. However, due to noise and errors in the capturing process, we obtain a measured depth image which can very roughly be expressed as $l = m \odot (p + n)$, with $n$ the depth noise and $m$ a mask image. Thus, for a fixed scene and viewpoint, we may measure any one of many possible depth images. In the presence of holes and strong depth errors, this set of possible measurements can become of significant size, in contrast to, e.g., a color camera, where the noise model can be approximated as a Gaussian or Poisson source with typically low variance, leading to a relatively compact set.
To address this, we propose an asymmetric loss function for promoting information preservation, which does not require the two translations to be inverses. Instead, this loss essentially requires that when performing a full cycle of the form $L \to H \to L$, we simply produce a low-quality image which could feasibly reproduce the high-quality one, but not necessarily the same one we began with.
To this end, we regard the Kinect camera as a high-quality camera with relatively low noise, and hence consider the volume of its set of possible measurements to be negligible. However, this does not hold for the low-quality RealSense camera, where the capturing process is substantially less stable, and multiple frames of the same scene may display large variation.
(Adversarial)  $\mathcal{L}^{H}_{adv}(G_{LH}) + \mathcal{L}^{L}_{adv}(G_{HL})$
(H cycle)  $\mathbb{E}_{h \sim H}\,\mathcal{L}_{mask}(h, G_{LH}(G_{HL}(h)))$
(L cycle)  $\mathbb{E}_{l \sim L}\,\mathcal{L}_{mask}(l, G_{HL}(G_{LH}(l)))$
(H identity)  $\mathbb{E}_{h \sim H}\,\mathcal{L}_{mask}(h, G_{LH}(h))$
(L identity)  $\mathbb{E}_{l \sim L}\,\mathcal{L}_{mask}(l, G_{HL}(l))$
(Depth preserve)  $\mathbb{E}_{l \sim L}\,\mathcal{L}_{mask}(l, G_{LH}(l))$
(Tricycle)  $\mathbb{E}_{l \sim L}\,\| G_{LH}(G_{HL}(G_{LH}(l))) - G_{LH}(l) \|_1$
Given a low-quality depth image $l$ captured from an underlying scene $s$ and viewpoint $v$, we denote by $S_l$ the set of all low-quality images which could have been captured under the same conditions:

$$
S_l = \left\{ m' \odot (P(s, v) + n') \;:\; (n', m') \sim \mathcal{D}_{s,v} \right\} \qquad (4)
$$

Here, $\mathcal{D}_{s,v}$ denotes the joint distribution of plausible low-quality depth noise and hole patterns corresponding to the underlying scene. The set $S_l$ forms the equivalency set of $l$, and as previously noted, has a non-negligible volume due to the properties of the low-quality depth camera.

Returning to the CycleGAN formulation (1), it is now evident that the cycle-consistency assumption is broken in the $L \to H \to L$ case, as the second translation is a one-to-many mapping. Clearly, requiring this cycle to be the identity mapping is an unnecessarily difficult constraint. We thus propose to relax the cycle-consistency constraint in this case, such that the output belongs to the equivalency set of the input, rather than equal it. This translates to the constraint:
$$
G_{HL}(G_{LH}(l)) \in S_l \qquad (5)
$$
Unfortunately, enforcing this constraint directly is impractical, as $S_l$ is a complex, non-convex set with no analytical form. However, if we apply $G_{LH}$ to both sides of this expression, and use the fact that $l' \in S_l$ if and only if $G_{LH}(l') = G_{LH}(l)$, we can rewrite the above as:
$$
G_{LH}(G_{HL}(G_{LH}(l))) = G_{LH}(l) \qquad (6)
$$
This requirement readily translates to a loss function, which we term the TriCycle loss due to the application of three consecutive translations in its definition. Incorporating an $\ell_1$ distance norm, and accumulating over the entire low-quality domain, this loss becomes:

$$
\mathcal{L}_{TriCycle} = \mathbb{E}_{l \sim L} \left\| G_{LH}(G_{HL}(G_{LH}(l))) - G_{LH}(l) \right\|_1 \qquad (7)
$$
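The intuition behind equation (7) can be sketched with a toy pair of translators in plain Python. All names are illustrative; the toy `enhance` and `degrade` functions stand in for $G_{LH}$ and $G_{HL}$.

```python
# Sketch of the TriCycle loss of equation (7): three consecutive translations
# G_LH -> G_HL -> G_LH, compared against a single G_LH pass. The inner G_HL
# output need not match the original low-quality image -- it only has to be
# *re-enhanceable* to the same high-quality image.

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def tricycle_loss(G_LH, G_HL, l):
    return l1(G_LH(G_HL(G_LH(l))), G_LH(l))

# Toy example: enhancement fills holes (zeros) with a constant, and
# degradation punches a *different* hole than the input had. The plain
# cycle-consistency loss penalizes this pair, but the TriCycle loss does
# not, since re-enhancement recovers the same clean image.
enhance = lambda x: [v if v != 0 else 4.0 for v in x]
degrade = lambda x: [0.0] + x[1:]          # always zeroes the first pixel

l = [4.0, 0.0, 4.0]                         # hole in the middle
cycle = l1(degrade(enhance(l)), l)          # classic cycle-consistency
tri   = tricycle_loss(enhance, degrade, l)
print(cycle, tri)  # cycle > 0, tri == 0.0
```

This is exactly the relaxation the text describes: the degraded output lies in the equivalency set of the input, so the three-step composition agrees with a single enhancement even though the two-step cycle does not return the input.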
It is interesting to note the similarity between the above TriCycle loss and the linear generalized inverse. In the linear case, we may consider the inversion of a dimension-reducing matrix $A$ (i.e., a many-to-one mapping), with the equivalency set in this case being the affine set of all vectors mapped to the same $y = Ax$. The Penrose conditions for inverting such a matrix [generalizedinverse] essentially require that the inverse transform $A^{+}$ map every $y$ to one of the vectors which would have been mapped to it by $A$, which is formalized by the condition $A A^{+} A = A$. Indeed, our TriCycle constraint follows very similar reasoning. In this sense, we may view the mapping $G_{HL}$ as a generalized inverse of $G_{LH}$, and the TriCycle formulation as seeking one of these mappings as part of the optimization process.

3.3 Full Loss Function and Optimization Method
Our full depth enhancement architecture optimizes a combined penalty consisting of all the losses discussed in the previous sections. The full loss function is given in Table (2). We optimize using ADAM [adam] with batch size 1, with each batch consisting of a pair of random low-quality and high-quality images sampled from the two training sets. We use a constant learning rate of 0.001, and augment the examples with random crops, 90-degree rotations, horizontal and vertical flips, and random shifts in depth.
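The augmentation pipeline just described can be sketched as follows, operating on a depth map stored as a list of rows. The crop size and shift range are illustrative placeholders, not the paper's values; the depth shift is applied only to valid pixels so that holes stay holes, consistent with the masked losses above.

```python
# Sketch of the described augmentations: random crop, random 90-degree
# rotations, random flips, and a random depth shift of valid pixels only.
# Crop size and shift range are illustrative, not values from the paper.
import random

def augment(depth, crop=64, max_shift=200.0):
    h, w = len(depth), len(depth[0])
    # Random crop.
    top, left = random.randrange(h - crop + 1), random.randrange(w - crop + 1)
    d = [row[left:left + crop] for row in depth[top:top + crop]]
    # Random number of 90-degree rotations (reverse rows, then transpose).
    for _ in range(random.randrange(4)):
        d = [list(r) for r in zip(*d[::-1])]
    # Random horizontal / vertical flips.
    if random.random() < 0.5:
        d = [row[::-1] for row in d]
    if random.random() < 0.5:
        d = d[::-1]
    # Random depth shift, leaving holes (zeros) untouched.
    shift = random.uniform(-max_shift, max_shift)
    return [[v + shift if v != 0 else 0.0 for v in row] for row in d]

random.seed(0)
sample = [[float(i + j) for j in range(128)] for i in range(128)]
print(len(augment(sample)), len(augment(sample)[0]))  # 64 64
```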
4 Experimental Evaluation and Results
4.1 Synthetic Experiments
To quantify the performance of our CycleGAN framework, we use a high-quality dataset of rendered depth images, to which we apply noise in a process simulating a depth camera. Our dataset is based on the Physically Based Rendering Dataset [suncgrenderedwebsite], consisting of 568,793 depth images randomly sampled from the SUNCG set of 3D scenes [suncgwebsite]. We further filter the data by removing images with very low standard deviation (< 400mm) or with more than 15% distant pixels (> 5000mm), as we are emulating a depth camera with limited range. The resulting synthetic dataset contains around 120,000 images; see Figure (6) (top).

For the depth noise, we apply several degradations typical of depth cameras. Unfortunately, it is extremely difficult to emulate the highly structured noise of the RealSense camera. However, our process includes several noise sources which are common to depth cameras such as the RealSense and Kinect. These include structural noise, generated by adding Gaussian noise to a downsampled version of the image, followed by nearest-neighbor upsampling; object boundary noise, produced by removing pixels near object edges with a given probability; depth-adaptive noise, generated as random Gaussian noise with a distance-dependent standard deviation; and depth-adaptive holes, generated by randomly eliminating pixels from the image with a distance-dependent probability. Figure (6) (bottom) shows a few noisy images produced by this process.

We train a translation network to convert between the noisy and noiseless depth domains. Our experiments compare the performance of our full CycleGAN framework to the original formulation, as well as to the original CycleGAN but with the larger generator architecture. In addition, we compare our results to those of the recent Sparse Depth Sensing depth enhancement algorithm [sparsedepthsensing].
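The four degradation sources listed above can be sketched as follows, on a depth map stored as a list of rows. All constants (block size, probabilities, noise scales, the edge threshold) are illustrative placeholders, not the paper's values.

```python
# Sketch of the described synthetic degradation pipeline. All constants
# are illustrative placeholders, not the paper's values.
import random

def degrade(depth, block=4, p_edge=0.5, sigma0=10.0, p_hole0=1e-5):
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    # 1) Structural noise: Gaussian noise on a downsampled grid, then
    #    nearest-neighbor upsampling, giving blocky low-frequency errors.
    coarse = [[random.gauss(0.0, sigma0) for _ in range(w // block + 1)]
              for _ in range(h // block + 1)]
    for i in range(h):
        for j in range(w):
            out[i][j] += coarse[i // block][j // block]
    # 2) Object-boundary noise: drop pixels near depth discontinuities.
    for i in range(h):
        for j in range(1, w):
            if abs(depth[i][j] - depth[i][j - 1]) > 100.0 and random.random() < p_edge:
                out[i][j] = 0.0
    # 3) Depth-adaptive noise and 4) depth-adaptive holes: both grow with
    #    distance, mimicking the falloff of active depth sensors.
    for i in range(h):
        for j in range(w):
            if out[i][j] != 0.0:
                out[i][j] += random.gauss(0.0, sigma0 * depth[i][j] / 1000.0)
                if random.random() < p_hole0 * depth[i][j]:
                    out[i][j] = 0.0
    return out

random.seed(1)
clean = [[1000.0] * 32 for _ in range(32)]
noisy = degrade(clean)
print(len(noisy), len(noisy[0]))  # 32 32
```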
For the quantitative comparison, it is well known that traditional measures such as PSNR are unreliable as image quality estimators, particularly when adversarial and perceptual losses are involved [blaurethinking, superres, ct, compression]. Indeed, in most cases methods which directly optimize MSE will outperform perceptual methods in terms of PSNR, while in reality they produce over-smoothed images which lack detail. Thus, to more accurately quantify the recovery of detail, we instead propose a patch-based normalized cross-correlation (PNCC) measure. This measure computes the similarity between two images by computing the normalized cross-correlations between their local patches (with overlap), and averaging the results. Formally, given two images $x$ and $y$, we define $\mathrm{PNCC}(x, y)$ in terms of a block size $b$ and a step size $s$. Denoting by $x_{i,j}$ the patch of image $x$ beginning at pixel $(i, j)$ and extending to $(i + b - 1, j + b - 1)$ (inclusive), the similarity between $x$ and $y$ is computed as:
$$
\mathrm{PNCC}(x, y) = \frac{1}{N} \sum_{i, j} \mathrm{NCC}\left( x_{i,j}, \, y_{i,j} \right) \qquad (8)
$$
Here, $\mathrm{NCC}(\cdot, \cdot)$ is the normalized cross-correlation function, the indices $i, j$ range over the image in steps of $s$, and $N$ is the total number of patches in the sum.
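A direct implementation of equation (8) is sketched below in plain Python, with images as lists of rows. Skipping zero-variance patches is an implementation choice of this sketch (to keep NCC well defined), not something specified in the text.

```python
# Sketch of the PNCC measure of equation (8): average normalized
# cross-correlation over overlapping b x b patches taken with step s.
import math

def ncc(p, q):
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    sp = math.sqrt(sum((v - mp) ** 2 for v in p))
    sq = math.sqrt(sum((v - mq) ** 2 for v in q))
    if sp == 0 or sq == 0:
        return None          # zero-variance patch: NCC undefined, skip it
    return sum((a - mp) * (b - mq) for a, b in zip(p, q)) / (sp * sq)

def pncc(x, y, b=8, s=4):
    h, w = len(x), len(x[0])
    scores = []
    for i in range(0, h - b + 1, s):
        for j in range(0, w - b + 1, s):
            px = [x[i + di][j + dj] for di in range(b) for dj in range(b)]
            py = [y[i + di][j + dj] for di in range(b) for dj in range(b)]
            c = ncc(px, py)
            if c is not None:
                scores.append(c)
    return sum(scores) / len(scores) if scores else 0.0

# An image correlates perfectly with a globally rescaled copy of itself,
# which is exactly why PNCC rewards structure rather than absolute error.
img = [[float(i * j % 17) for j in range(16)] for i in range(16)]
scaled = [[2.0 * v for v in row] for row in img]
print(pncc(img, scaled))  # 1.0 (up to floating-point rounding)
```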
Table (3) lists the quantitative results of the synthetic experiment. We note that the results behave similarly across different choices of block size and step in the PNCC computation. Example results are shown in Figure (7). As can be seen, the quantitative results indicate that our method is indeed recovering more detail than the alternatives. Examining the images, we see that the modified GAN formulation produces much sharper and more detailed images than either the original CycleGAN or the sparse sensing algorithm, in line with the local correlation metric.
Method:  Base  Improved Net  TriCycle  Sparse Sensing
PNCC:  0.668  0.869  0.879  0.736
4.2 Experiments with Real Depth Data
To demonstrate our method under more realistic conditions, we created a dataset of real-world depth images captured in an office setting. The images were captured independently using the RealSense and Kinect 2 cameras, with no synchronization between them. After some basic filtering (e.g., removing similar images or images with very little content) we arrived at a dataset consisting of just over 1,000 images from each camera. We note that due to the unconstrained manner in which this dataset was captured, we have no ground truth for these images, and thus can only perform a qualitative evaluation of the results. On the other hand, the construction of this dataset makes it truly unsupervised, and thus well representative of a real-world scenario. Figure (2) shows an example from this dataset, with additional examples provided in the results figures.
Figure (8) demonstrates the effects of the generator architecture on the results. To isolate the architecture for this experiment, we do not employ any of the new losses in this case, and only vary the network design. We consider the following architectures: (1) the original network; (2) the original network with an increased number of channels; (3) the original network with an increased number of layers; and (4) the final network. As can be seen, the original architecture is essentially unusable for this task, producing strong artifacts and providing no visible enhancement. Increasing the number of channels or the number of layers each has a notable effect in terms of reducing artifacts and filling in holes, though with limited success. Combining both modifications produces the best results, with the fewest visible artifacts and the most accurate hole filling. It is thus clear that both width and depth are crucial for handling the challenges of low-quality depth. We note that, in particular, the increased number of channels in the earlier network layers deviates from the standard practice for CNNs [vgg], though it proves advantageous in this case.
Finally, Figure (9) shows some recovery results of our full TriCycle GAN framework. As before, we compare our results to those of Sparse Depth Sensing [sparsedepthsensing]. We also show results with and without the TriCycle loss, to demonstrate its effects on recovery performance. As can be seen, the method of [sparsedepthsensing] struggles with these images, exhibiting over-smoothing, jagged object edges, and intensification of outlier pixels leading to unnatural holes in objects. Clearly, the degradation model assumed by this method is too simplistic for this task. Continuing with the CycleGAN, increasing the network size has a significant effect on the results, though the output still suffers from visible artifacts and missing regions. Adding the TriCycle loss leads to a notable improvement in the results, producing more realistic and detailed images with fewer artifacts and missing pixels. Indeed, as many of these artifacts are in regions which were strongly corrupted in the input image, we attribute these improvements to the TriCycle loss, which relaxes the requirement to recover the exact degraded input by the inverse translation.
4.3 Experiments with the DROT Dataset
The Depth Restoration Occlusionless Temporal dataset, or DROT [danielkinectrs], is a carefully captured and post-processed set of RealSense and Kinect 2 images, which are nearly pixel-level aligned. (The dataset also includes color, Kinect 1, and 3D DAVID images, though we do not use these in this work.) The dataset consists of 112 image sets which have been recorded in a studio setting, employing a highly accurate calibration process between the cameras. Figure (10) shows an example from this dataset. We use this dataset to quantify the performance of our method on actual RealSense depth maps. It should be noted, though, that due to the controlled environment and specifically chosen scene and materials, this dataset exhibits much lighter degradations than those our method was intended to handle.
Table (4) details our quantitative results on this dataset, and Figure (11) shows some example results. As can be seen, the original CycleGAN remains unusable in this case. However, both our method and Sparse Depth Sensing produce very competitive results, with each exhibiting different visual strengths and artifacts. Specifically, our method produces sharper edges and more accurate geometries and boundaries, whereas [sparsedepthsensing] produces images with no missing pixels and with negligible depth shift. Perhaps ironically, the main limitation of our method may be its own success — specifically, as our network was trained to produce convincing Kinect 2 images, it has also learned to reproduce its typical artifacts and noise patterns, such as missing pixels in this case. Nonetheless, the visual results as well as the higher PNCC scores of our method support its improved reconstruction of geometry and detail in this case.
Method:  Base  TriCycle  Sparse Sensing
PNCC:  0.213  0.633  0.611
5 Conclusions
Enhancing depth images with real-world noise is an immensely challenging task, with few practical solutions at this point. Formulating the problem as an unsupervised translation task dramatically simplifies dataset construction; however, the existing CycleGAN framework is found to be insufficient for this complex task. To overcome this, we proposed several modifications to the framework: a much larger generator architecture designed to handle low-quality depth, the use of depth-specific masked similarity losses, and importantly, the asymmetric TriCycle loss which promotes information preservation between non-equivalent domains. We have tested these modifications on three datasets, and found them to dramatically improve over the base CycleGAN in all cases, producing sharp, detailed, and realistic-looking images. We conclude that the proposed approach enables effective enhancement of real-world depth images with severe noise and degradations, expanding the applicability of the CycleGAN to asymmetric tasks which do not necessarily satisfy the cycle-consistency assumption.