Deep neural networks have been very successful in image estimation applications such as compressive sensing and image restoration, as a means to estimate images from partial, blurry, or otherwise degraded measurements. These networks are trained on a large number of corresponding pairs of measurements and ground-truth images, and thus implicitly learn to exploit domain-specific image statistics. But unlike measurement data, it is often expensive or impractical to collect a large training set of ground-truth images in many application settings. In this paper, we introduce an unsupervised framework for training image estimation networks from a training set that contains only measurements (with two varied measurements per image) but no ground truth for the full images desired as output. We demonstrate that our framework can be applied for both regular and blind image estimation tasks, where in the latter case parameters of the measurement model (e.g., the blur kernel) are unknown during inference, and potentially also during training. We evaluate our method for training networks for compressive sensing and blind deconvolution, considering both non-blind and blind training for the latter. Our unsupervised framework yields models that are nearly as accurate as those from fully supervised training, despite not having access to any ground-truth images.
Reconstructing images from imperfect observations is a classic inference task in many imaging applications. In compressive sensing donoho2006compressed , a sensor makes partial measurements for efficient acquisition. These measurements correspond to a low-dimensional projection of the higher-dimensional image signal, and the system relies on computational inference for recovering the full-dimensional image. In other cases, cameras capture degraded images that are low-resolution, blurry, etc., and require a restoration algorithm freeman2002example ; yuan2007image ; zoran2011learning
to recover a corresponding uncorrupted image. Deep convolutional neural networks (CNNs) have recently emerged as an effective tool for such image estimation tasks
chen2017trainable ; ircnn ; chakrabarti2016neural ; dong2015image ; kulkarni2016reconnet ; facedeblur ; istanet . Specifically, a CNN for a given application is trained on a large dataset that consists of pairs of ground-truth images and observed measurements (in many cases where the measurement or degradation process is well characterized, having a set of ground-truth images is sufficient to generate corresponding measurements). This training set allows the CNN to learn to exploit the expected statistical properties of images in that application domain, to solve what is essentially an ill-posed inverse problem.

But for many domains, it is impractical or prohibitively expensive to capture full-dimensional or uncorrupted images and construct such a large representative training set. Unfortunately, it is often in such domains that a computational imaging solution is most useful. Recently, Lehtinen et al. noise2noise proposed a solution to this issue for denoising, with a method that trains with only pairs of noisy observations. While their method yields remarkably accurate network models without needing any ground-truth images for training, it is applicable only to the specific case of estimation from noisy measurements—when each image intensity is observed as a sample from a (potentially unknown) distribution with mean or mode equal to its corresponding true value.
In this work, we introduce an unsupervised method for training image estimation networks that can be applied to a general class of observation models—where measurements are a linear function of the true image, potentially with additive noise. As training data, it requires only two observations for the same image, but not the underlying image itself (note that at test time, the trained network requires only one observation as input, as usual). The two measurements in each pair are made with different parameters (such as different compressive measurement matrices or different blur kernels), and these parameters vary across different pairs. Collecting such a training set provides a practical alternative to the more laborious one of collecting full image ground truth. Given these measurements, our method trains an image estimation network by requiring that its prediction from one measurement of a pair be consistent with the other measurement, when observed with the corresponding parameter. With sufficient diversity in measurement parameters across training pairs, we show this is sufficient to train an accurate network model despite the lack of direct ground-truth supervision.
While our method requires knowledge of the measurement model (e.g., blur by convolution), it also incorporates a novel mechanism to handle the blind setting during training—when the measurement parameters (e.g., the blur kernels) for training observations are unknown. To be able to enforce consistency as above, we use an estimator for measurement parameters that is trained simultaneously using a “proxy” training set. This set is created on the fly by taking predictions from the image network even as it trains, and pairing them with observations synthetically created using randomly sampled, and thus known, parameters. The proxy set provides supervision for training the parameter estimator, and to augment training of the image estimator as well. This mechanism allows our method to nearly match the accuracy of fully supervised training on image and parameter ground truth.
We validate our method with experiments on image reconstruction from compressive measurements and on blind deblurring of face images, with blind and non-blind training for the latter, and compare to fully-supervised baselines with state-of-the-art performance. The supervised baselines use a training set of ground-truth images and generate observations with random parameters on the fly in each epoch, to create a much larger number of effective image-measurement pairs. In contrast, our method is trained with only two measurements per image from the same training set (but not the image itself), with the pairs kept fixed through all epochs of training. Despite this, our unsupervised training method yields models with test accuracy close to that of the supervised baselines, and thus presents a practical way to train CNNs for image estimation when lacking access to image ground truth.
CNN-based Image Estimation. Many imaging tasks require inverting the measurement process to obtain a clean image from the partial or degraded observations—denoising buades2005non , deblurring yuan2007image ; freeman2002example , compressive sensing donoho2006compressed , etc. While traditionally solved using statistical image priors foe ; zoran2011learning ; figueiredo2007gradient , CNN-based estimators have been successfully employed for many of these tasks. Most methods Nah2017DeepMC ; chen2017trainable ; ircnn ; chakrabarti2016neural ; dong2015image ; kulkarni2016reconnet ; facedeblur ; istanet learn a network to map measurements to corresponding images from a large training set of pairs of measurements and ideal ground-truth images. Some learn CNN-based image priors, as denoisers ircnn ; onenet ; romano2017little or GANs anirudh2018unsupervised , that are agnostic to the inference task (denoising, deblurring, etc.), but still tailored to a chosen class of images. All these methods require access to a large domain-specific dataset of ground-truth images for training. However, capturing image ground truth is burdensome or simply infeasible in many settings (e.g., for MRI scans lustig2008compressed and other biomedical imaging applications). In such settings, our method provides a practical alternative by allowing estimation networks to be trained from measurement data alone.

Unsupervised Learning. Unsupervised learning for CNNs is broadly useful in many applications where large-scale training data is hard to collect. Accordingly, researchers have proposed unsupervised and weakly-supervised methods for such applications, such as depth estimation zhou2017unsupervised ; godard2017unsupervised , intrinsic image decomposition ma2018single ; li2018learning , etc. However, these methods are closely tied to their specific applications. In this work, we seek to enable unsupervised learning for image estimation networks. In the context of image modeling, Bora et al. bora2018ambientgan propose a method to learn a GAN model from only degraded observations.
Their method, like ours, includes a measurement model with its discriminator for training (but requires knowledge of measurement parameters, while we are able to handle the blind setting). Their method proves successful in training a generator for ideal images. We seek a similar unsupervised means for training image reconstruction and restoration networks.
The closest work to ours is the recent Noise2Noise method of Lehtinen et al. noise2noise , who propose an unsupervised framework for training denoising networks on pairs of noisy observations of the same image. In their case, supervision comes from requiring that the denoised output from one observation be close to the other. This works surprisingly well, but is based on the assumption that the expected or median value of the noisy observations is the image itself. We focus on a more general class of observation models, which requires injecting the measurement process into the loss computation. We also introduce a proxy training approach to handle blind image estimation applications.
Given a measurement y ∈ R^m of an ideal image x ∈ R^n that are related as

(1)   y = θ x + ε,

our goal is to train a CNN f to produce an estimate x̂ = f(y) of the image from y. Here, ε is random noise with distribution p_ε that is assumed to be zero-mean and independent of the image x, and the parameter θ is an m × n matrix that models the linear measurement operation. Often, the measurement matrix θ is structured with fewer than mn degrees of freedom based on the measurement model—e.g., it is block-Toeplitz for deblurring, with entries defined by the blur kernel. We consider both non-blind estimation, when the measurement parameter θ is known for a given measurement during inference, and the blind setting, where θ is unavailable but we know its distribution p_θ. For blind estimators, we address both non-blind and blind training—when θ is known for each measurement in the training set but not at test time, and when it is unknown during training as well.
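As a concrete illustration, the observation model in (1) can be simulated for a toy problem. The dimensions, distributions, and variable names below are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16                        # toy image and measurement dimensions
x = rng.standard_normal(n)           # unknown ideal image, flattened
theta = rng.standard_normal((m, n))  # linear measurement parameter (m < n)
eps = 0.01 * rng.standard_normal(m)  # zero-mean noise, independent of x
y = theta @ x + eps                  # the observation of Eq. (1)
```

Because m < n, θ has a non-trivial null space, which is what makes recovering x from y ill-posed.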
Since (1) is typically non-invertible, image estimation requires reasoning with the statistical distribution p_x of images for the application domain, and conventionally, this is provided by a large training set of typical ground-truth images. In particular, CNN-based image estimation methods train a network f on a large training set of pairs of corresponding images x_t and measurements y_t, based on a loss that measures error between the predictions f(y_t) and the true images x_t across the training set. In the non-blind setting, the measurement parameter θ_t is known and provided as input to the network (we omit this in the notation for convenience), while in the blind setting, the network must also reason about the unknown measurement parameter θ_t.
To avoid the need for a large number of ground-truth training images, we propose an unsupervised learning method that is able to train an image estimation network using measurements alone. Specifically, we assume we are given a training set of two measurements for each image x_t, t = 1, …, T:

(2)   y_{t:1} = θ_{t:1} x_t + ε_{t:1},   y_{t:2} = θ_{t:2} x_t + ε_{t:2},
but not the images themselves. We require the corresponding measurement parameters and to be different for each pair, and further, to also vary across different training pairs. These parameters are assumed to be known for the nonblind training setting, but not for blind training.
We begin with the simpler case of non-blind estimation, when the parameter θ for a given measurement is known, both during inference and training. Given pairs of measurements with known parameters, our method trains the network f using a “swap-measurement” loss on each pair, defined as:

(3)   L_swap = (1/2T) Σ_t [ ρ( θ_{t:2} f(y_{t:1}), y_{t:2} ) + ρ( θ_{t:1} f(y_{t:2}), y_{t:1} ) ].
This loss evaluates the accuracy of the full images predicted by the network from each measurement in a pair by comparing them to the other measurement—using an error function ρ(·,·)—after simulating observation with the corresponding measurement parameter. Note that Noise2Noise noise2noise can be seen as a special case of (3) for measurements that are degraded only by noise, with θ_{t:1} = θ_{t:2} = I.
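In code, the swap-measurement loss for a single training pair might look like the following sketch; the function names and the squared-error choice of ρ are ours, for illustration:

```python
import numpy as np

def rho(a, b):
    """Squared L2 error between a simulated and an actual measurement."""
    return np.sum((a - b) ** 2)

def swap_measurement_loss(f, y1, y2, theta1, theta2):
    """Each prediction, re-observed with the *other* measurement's
    parameter, should be consistent with the other measurement."""
    return 0.5 * (rho(theta2 @ f(y1, theta1), y2) +
                  rho(theta1 @ f(y2, theta2), y1))

# Sanity check: an oracle estimator that returns the true image drives the
# loss to the noise floor (zero, in this noiseless toy example).
rng = np.random.default_rng(0)
n, m = 32, 8
x = rng.standard_normal(n)
theta1, theta2 = rng.standard_normal((2, m, n))
y1, y2 = theta1 @ x, theta2 @ x     # a noiseless measurement pair
oracle = lambda y, theta: x         # pretend-perfect estimator
```

Note that no ground-truth image enters the loss: only the two measurements and their parameters.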
When the parameters θ used to acquire the training set are sufficiently diverse and statistically independent for each underlying image x_t, this loss provides sufficient supervision to train the network f. To see this, we consider using the squared L2 distance for the error function ρ, and note that (3) represents an empirical approximation of the expected loss over the image, parameter, and noise distributions. Assuming the training measurement pairs are obtained using (2) with x ~ p_x, θ_1, θ_2 ~ p_θ, and ε_1, ε_2 ~ p_ε drawn i.i.d. from their respective distributions, we have

(4)   E ‖ θ_2 f(y_1) − y_2 ‖² = E [ (f(y_1) − x)ᵀ Q (f(y_1) − x) ] + E ‖ ε_2 ‖²,   where Q = E_θ [ θᵀθ ].
Therefore, because the measurement matrices are independent of each other and of the image, we find that in expectation the swap-measurement loss is equivalent to supervised training against the true image x, with an L2 loss that is weighted by the matrix Q = E_θ[θᵀθ] (up to an additive constant given by the noise variance). With a sufficiently diverse distribution p_θ of measurement parameters, Q will be full-rank (even though the individual θᵀθ are not). Then, the swap-measurement loss will provide supervision along all image dimensions, and will reach its theoretical minimum iff the network makes exact predictions.

In addition to the swap loss, we also find it useful to train with an additional “self-measurement” loss that measures consistency between an image prediction and its own corresponding input measurement:
(5)   L_self = (1/2T) Σ_t [ ρ( θ_{t:1} f(y_{t:1}), y_{t:1} ) + ρ( θ_{t:2} f(y_{t:2}), y_{t:2} ) ].
While not sufficient by itself, we find the additional supervision it provides to be practically useful in yielding more accurate network models. Therefore, our overall unsupervised training objective is a weighted combination of the two losses, L_swap + γ L_self, with the weight γ chosen on a validation set.
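The claim that a diverse parameter distribution makes Q = E[θᵀθ] full-rank, even though each individual θᵀθ is rank-deficient, is easy to check numerically; random Gaussian matrices stand in for p_θ in this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 16, 4, 500                     # image dim, measurement dim, #samples
thetas = rng.standard_normal((T, m, n))  # T draws from a toy p_theta
Q = sum(th.T @ th for th in thetas) / T  # empirical estimate of E[theta^T theta]

# A single theta^T theta has rank m < n: one measurement matrix leaves
# n - m image dimensions entirely unconstrained ...
assert np.linalg.matrix_rank(thetas[0].T @ thetas[0]) == m
# ... but the average over diverse parameters is full-rank, so in
# expectation the swap loss constrains every image dimension.
assert np.linalg.matrix_rank(Q) == n
```

Conversely, if every pair reused the same θ, Q would stay rank-deficient and the null-space components of the image would never be supervised.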
We next consider the more challenging case of blind estimation, when the measurement parameter θ for an observation is unknown—and specifically, the blind training setting, when it is unknown even during training. The blind training setting complicates the use of our unsupervised losses in (3) and (5), since the values of θ_{t:1} and θ_{t:2} used there are unknown. Also, blind estimation tasks often have a more diverse set of possible parameters θ. While supervised training methods with access to ground-truth images can generate a very large database of synthetic image-measurement pairs by pairing the same image with many different θ (assuming the distribution p_θ is known), our unsupervised framework has access to only two measurements per image.
To address this, we propose a “proxy training” approach that treats estimates from our network during training as a source of image ground truth to train an estimator for measurement parameters. We use the image network’s predictions to construct synthetic observations as:

(6)   x̃_t ⇐ f(y_{t:1}),   ỹ_t = θ̃_t x̃_t + ε̃_t,
where θ̃_t ~ p_θ and ε̃_t ~ p_ε are sampled on the fly from the parameter and noise distributions, and ⇐ indicates an assignment with a “stop-gradient” operation (to prevent loss gradients on the proxy images from affecting the image estimator f). We use these synthetic observations ỹ_t, with known sampled parameters θ̃_t, to train a parameter estimation network g based on the loss:
(7)   L_param = (1/T) Σ_t ‖ g(ỹ_t) − θ̃_t ‖².
As the parameter network g trains on this augmented data, we simultaneously use it to compute estimates of the parameters for the original observations, θ̂_{t:i} ⇐ g(y_{t:i}), and compute the swap- and self-measurement losses in (3) and (5) on the original observations using these estimated, instead of true, parameters. Notice that we use a stop-gradient here as well, since we do not wish to train the parameter estimator based on the swap or self-measurement losses—the behavior observed in Sec. 3.1 no longer holds in this case, and we empirically observe that removing the stop-gradient leads to instability and often causes training to fail.
In addition to training the parameter estimator g, the proxy training data in (6) can be used to augment training of the image estimator f, now with full supervision from the proxy images, as:

(8)   L_proxy = (1/T) Σ_t ρ( f(ỹ_t), x̃_t ).
This loss can be used even in the non-blind training setting, and provides a means of generating additional training data with more pairings of images and measurement parameters. Also note that although our proxy images x̃_t are only approximate estimates of the true images, they represent exact ground truth for the synthetically generated observations ỹ_t. Hence, the losses in (7) and (8) are approximate only in the sense that they are based on images that are not sampled from the true image distribution p_x. The effect of this approximation diminishes as training progresses and the image estimation network produces better image predictions (especially on the training set).
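One proxy-training step can be sketched as follows in a toy, autodiff-free setting. The stop-gradient of (6) becomes a plain array copy here (in an autodiff framework it would be `.detach()` or `stop_gradient`), and all function names and distributions are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 8

def sample_theta():                     # stand-in for the known p_theta
    return rng.standard_normal((m, n))

def sample_eps():                       # stand-in for the known p_eps
    return 0.01 * rng.standard_normal(m)

def proxy_step(f, g, y, theta_hat):
    """Build proxy data from the image network's current prediction and
    return the two supervised losses it enables."""
    x_proxy = np.array(f(y, theta_hat))  # Eq. (6): stop-gradient = plain copy
    theta_new, eps_new = sample_theta(), sample_eps()
    y_proxy = theta_new @ x_proxy + eps_new
    # Eq. (7): supervise the parameter estimator with the *known* theta_new.
    loss_param = np.sum((g(y_proxy) - theta_new) ** 2)
    # Eq. (8): supervise the image estimator with the proxy ground truth.
    loss_proxy = np.sum((f(y_proxy, theta_new) - x_proxy) ** 2)
    return loss_param, loss_proxy

# Toy usage: a zero image estimator makes its own output the proxy target,
# so the proxy-image loss is zero while the parameter loss stays positive.
f0 = lambda y, th: np.zeros(n)
g0 = lambda y: np.zeros((m, n))
loss_param, loss_proxy = proxy_step(f0, g0, rng.standard_normal(m), sample_theta())
```

The copy in place of a true stop-gradient is the key detail: gradients of the proxy losses must flow into g (and into f through its prediction on ỹ), never back through the proxy target itself.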
Our overall method randomly initializes the weights of the image and parameter networks f and g, and then trains them with a weighted combination of all the losses, L_swap + γ L_self + α L_proxy + β L_param, where the scalar weights γ, α, β are hyperparameters determined on a validation set. For non-blind training (of blind estimators), only the image estimator f needs to be trained, and β can be set to 0.
We evaluate our framework on two well-established tasks: non-blind image reconstruction from compressive measurements, and blind deblurring of face images. These tasks were chosen since large training sets of ground-truth images are available in both cases, which allows us to demonstrate the effectiveness of our approach through comparisons to fully supervised baselines. The source code of our implementation is available at https://projects.ayanc.org/unsupimg/.
We consider the task of training a CNN to reconstruct images from compressive measurements. We follow the measurement model of kulkarni2016reconnet ; istanet , where all non-overlapping patches in an image are measured individually by the same low-dimensional orthonormal matrix. Like kulkarni2016reconnet ; istanet , we train CNN models that operate on individual patches at a time, and assume ideal observations without noise (the supplementary includes additional results for noisy measurements). We train models for compression ratios of 1%, 4%, and 10% (using the corresponding matrices provided by kulkarni2016reconnet ).
Table 1. Average PSNR (dB) on BSD68 and Set11 at compression ratios of 1%, 4%, and 10%.

Method | Supervised | BSD68 1% | BSD68 4% | BSD68 10% | Set11 1% | Set11 4% | Set11 10%
TVAL3 tval3 | ✗ | – | – | – | 16.43 | 18.75 | 22.99
ReconNet kulkarni2016reconnet | ✓ | – | 21.66 | 24.15 | 17.27 | 20.63 | 24.28
ISTA-Net+ istanet | ✓ | 19.14 | 22.17 | 25.33 | 17.34 | 21.31 | 26.64
Supervised Baseline (Ours) | ✓ | 19.74 | 22.94 | 25.57 | 17.88 | 22.61 | 26.74
Unsupervised Training (Ours) | ✗ | 19.67 | 22.78 | 25.40 | 17.84 | 22.20 | 26.33
Figure 2. Example reconstructions, shown alongside the ground truth, from ReconNet kulkarni2016reconnet , ISTA-Net+ istanet , our supervised baseline, and our unsupervised training (PSNR in that order). Example 1: 21.89 / 23.61 / 24.34 / 24.03 dB. Example 2: 21.29 / 23.66 / 24.37 / 24.17 dB.
We generate a training and validation set, of k and images respectively, by taking crops from images in the ImageNet database imagenet . We use a CNN architecture that stacks two U-Nets unet , with a residual connection between the two (see supplementary). We begin by training our architecture with full supervision, using
all overlapping patches from the training images, and an L2 loss between the network’s predictions and the ground-truth image patches. For unsupervised training with our approach, we create two partitions of the original image, each containing non-overlapping patches. The partitions themselves overlap, with patches in one partition being shifted from those in the other (see supplementary). We measure the patches in both partitions with the same measurement matrix, to yield two sets of measurements. These provide the diversity required by our method, as each pixel is measured within a different patch in the two partitions. Moreover, this measurement scheme can be implemented simply in practice by camera translation. The shifts for each image are randomly selected, but kept fixed throughout training. Since the network operates independently on patches, it can be used on measurements from both partitions. To compute the swap-measurement loss, we take the network’s individual patch predictions from one partition, arrange them to form the image, then extract and apply the measurement matrix to the shifted patches corresponding to the other partition. The weight for the self-measurement loss is set to 0.05 based on the validation set.

In Table 1, we first compare our fully supervised baseline to existing compressive sensing methods that use supervised training kulkarni2016reconnet ; istanet , as well as one that uses a manual regularizer tval3 (numbers are reported from istanet ), and show that it achieves state-of-the-art performance. We then report results for training with our unsupervised framework, and find that this leads to accurate models that lag our supervised baseline by only 0.4 dB or less in terms of average PSNR on both test sets—and in most cases, actually outperform previous methods. This is despite the fact that these models have been trained without any access to ground-truth images.
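The shifted two-partition measurement scheme described above can be sketched as follows; the patch size, shift, and compression ratio are toy values, and `phi` plays the role of the shared orthonormal measurement matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 8                                       # toy patch size
m = int(0.25 * P * P)                       # toy compression ratio of 25%
# Orthonormal rows: a random orthogonal matrix, truncated to m rows.
phi = np.linalg.qr(rng.standard_normal((P * P, P * P)))[0][:m]

def measure_partition(img, shift):
    """Measure every non-overlapping PxP patch of one (shifted) partition
    with the same matrix phi."""
    sy, sx = shift
    H, W = img.shape
    patches = [img[y:y + P, x:x + P].ravel()
               for y in range(sy, H - P + 1, P)
               for x in range(sx, W - P + 1, P)]
    return np.stack([phi @ p for p in patches])

img = rng.standard_normal((32, 32))
y_a = measure_partition(img, (0, 0))        # first partition
y_b = measure_partition(img, (3, 5))        # second, shifted partition
```

Because every pixel falls inside a differently positioned patch in the two partitions, the pair supplies the measurement diversity that the swap loss requires, even though the per-patch matrix is shared.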
Figure 2 provides example reconstructions for some images, and we find that results from our unsupervised method are extremely close in visual quality to those of the baseline model trained with full supervision.
We next consider the problem of blind motion deblurring of face images. Like facedeblur , we consider the problem of restoring aligned and cropped face images that have been affected by motion blur, through convolution with motion blur kernels of size up to 27×27, and Gaussian noise with a standard deviation of two gray levels. We use all 160k images in the CelebA training set celeba and 1.8k images from the Helen training set helen to construct our training set, and 2k images from CelebA val and 200 from the Helen training set for our validation set. We use a set of 18k and 2k random motion kernels for training and validation respectively, generated using the method described in chakrabarti2016neural . We evaluate our method on the official blurred test images provided by facedeblur (derived from the CelebA and Helen test sets). Note that unlike facedeblur , we do not use any semantic labels for training.

In this case, we use a single U-Net architecture to map blurry observations to sharp images. We again train a model for this architecture with full supervision, generating blurry–sharp training pairs on the fly by pairing random blur kernels from the training set with the sharp images. Then, for unsupervised training with our approach, we choose two kernels for each training image to form a training set of measurement pairs, which are kept fixed (including the added Gaussian noise) across all epochs of training. We first consider non-blind training, using the true blur kernels to compute the swap- and self-measurement losses. Here, we consider training with and without the proxy loss for the image network. Then, we consider the blind training case, where we also learn an estimator for blur kernels and use its predictions to compute the measurement losses. Instead of training an entirely separate network, we share the initial layers with the image U-Net, and form a separate decoder path going from the bottleneck to the blur kernel. The loss weights are all set to one in this case.
We report results for all versions of our method in Table 2, and compare them to facedeblur , as well as to a traditional deblurring method that is not trained on face images xu2013unnatural . We find that with full supervision, our architecture achieves state-of-the-art performance. Then, with non-blind training, we find that our method is able to come close to supervised performance when using the proxy loss, but does worse without it—highlighting its utility even in the non-blind setting. Finally, we note that models derived using blind training with our approach are also able to produce results nearly as accurate as those trained with full supervision—despite lacking access both to ground-truth image data and to knowledge of the blur kernels in their training measurements. Figure 3 illustrates this performance qualitatively, with example deblurred results from various models on the official test images. We also visualize the blur kernel estimator learned during blind training with our approach in Fig. 4, on images from our validation set. Additional results, including those on real images, are included in the supplementary.
Table 2. Deblurring performance (PSNR in dB, and SSIM) on the Helen and CelebA test sets.

Method | Supervised | Helen PSNR | Helen SSIM | CelebA PSNR | CelebA SSIM
Xu et al. xu2013unnatural | ✗ | 20.11 | 0.711 | 18.93 | 0.685
Shen et al. facedeblur | ✓ | 25.99 | 0.871 | 25.05 | 0.879
Supervised Baseline (Ours) | ✓ | 26.13 | 0.886 | 25.20 | 0.892
Unsupervised Non-blind (Ours) | ✗ | 25.95 | 0.878 | 25.09 | 0.885
Unsupervised Non-blind (Ours), without proxy loss | ✗ | 25.47 | 0.867 | 24.64 | 0.873
Unsupervised Blind (Ours) | ✗ | 25.93 | 0.876 | 25.06 | 0.883

Figure 3. Example deblurred results on the official test images, showing ground truth, blurred input, Shen et al. facedeblur , our supervised baseline, and our non-blind and blind unsupervised models. PSNR (dB) for the last four, per example: 22.69 / 24.61 / 25.16 / 25.19; 26.83 / 28.18 / 28.27 / 28.16; 26.59 / 28.29 / 27.42 / 26.77; 22.36 / 23.50 / 22.84 / 22.94.
Figure 4. Blur kernels estimated by the learned kernel estimator during blind training, shown for validation images (ground truth, blurred input, and predicted kernels).
We presented an unsupervised method to train image estimation networks from only pairs of measurements, without access to ground-truth images, and, in blind settings, without knowledge of measurement parameters. In this paper, we validated this approach on well-established tasks where sufficient ground-truth data (for natural and face images) was available, since this allowed us to compare to training with full supervision and study the performance gap between the supervised and unsupervised settings. But we believe that our method’s real utility will be in opening up the use of CNNs for image estimation to new domains—such as medical imaging, applications in astronomy, etc.—where such use has so far been infeasible due to the difficulty of collecting large ground-truth datasets.
Acknowledgments. This work was supported by the NSF under award no. IIS-1820693.
We include additional results for the case where the compressive measurements are corrupted by additive white Gaussian noise. When training with full supervision, we generate the noisy measurements on the fly, resulting in many noisy compressed measurements for each image. But for unsupervised training with our approach, we keep the noise values (along with the measurement parameters) for each image fixed across all training epochs. We show results in Table 3 for Gaussian noise with different standard deviations, at the 10% compression ratio. Again, our unsupervised training approach comes close to matching the accuracy of the fully supervised baseline. Figure 5 shows example reconstructions for this case.
Table 3. Average PSNR (dB) at the 10% compression ratio with additive white Gaussian noise of standard deviation σ.

Method | BSD68 σ=0 | σ=0.1 | σ=0.2 | σ=0.3 | Set11 σ=0 | σ=0.1 | σ=0.2 | σ=0.3
Supervised Baseline | 25.57 | 24.60 | 23.49 | 22.57 | 26.74 | 25.24 | 23.67 | 22.30
Unsupervised Training | 25.40 | 24.41 | 23.12 | 21.99 | 26.33 | 24.94 | 23.21 | 21.79

Figure 5. Example reconstructions from noisy compressive measurements at different noise levels, comparing our supervised baseline (top rows) and unsupervised training (bottom rows).
We show additional face deblurring results from facedeblur ’s test set in Fig. 6. Moreover, facedeblur also provides a dataset of real blurred images that are aligned, cropped, and scaled. While there is no ground-truth image data available for this set, we include example results from it in Fig. 7 for qualitative evaluation. We again find that results from models trained using our unsupervised approach are close in visual quality to those from our supervised baseline.
Figure 6. Additional deblurring results on the test set, showing ground truth, blurred input, Shen et al. facedeblur , our supervised baseline, and our non-blind and blind unsupervised models. PSNR (dB) for the last four, per example: 25.28 / 27.01 / 26.35 / 26.21; 22.06 / 23.76 / 23.77 / 23.96; 24.34 / 26.38 / 26.06 / 25.88; 24.84 / 26.20 / 27.05 / 26.77; 25.87 / 27.14 / 26.87 / 26.67; 29.04 / 30.10 / 29.61 / 29.05; 27.59 / 29.82 / 30.62 / 30.66.
Figure 7. Qualitative results on real blurred images: blurred input, Shen et al. facedeblur , our supervised baseline, and our non-blind and blind unsupervised models.
Both our compressive measurement reconstruction and face deblurring networks are based on U-Net unet , featuring encoder–decoder architectures with skip connections. We use convolutional layers with stride larger than 1 for downsampling, and transpose convolutional layers for upsampling. Except for the last layer of each network, all layers are followed by batch normalization and ReLU. We use the L2 distance as ρ for all losses in compressive measurement reconstruction, and the L1 distance (again, for all losses) in blind face deblurring. All networks are trained with the Adam optimizer adam . We drop the learning rate twice when the loss on the validation set flattens out. Training takes about one to two days on a 1080 Ti GPU.

Compressive Reconstruction. Our compressive measurement reconstruction network is a stack of two U-Nets, with the detailed configuration of each U-Net shown in Table 4. Given a compressed vector y for a single patch and the sensing matrix φ, we first compute φᵀy, reshape it to the original size of the patch, and input this to the first U-Net. The second U-Net then takes as input the concatenation of φᵀy and the output from the first U-Net. Finally, we add the outputs of these two U-Nets to derive the final estimate of the image. Our approach to deriving measurement pairs during training is visualized in Fig. 8.
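The input construction described above can be sketched as follows; the helper name is ours, and the 33×33 patch size follows Table 4, with the measurement count chosen here to roughly match a 10% compression ratio:

```python
import numpy as np

def unet_input(phi, y, P):
    """Back-project a compressed patch measurement y to image space via the
    adjoint phi^T y, reshaped to PxP, to serve as the first U-Net's input."""
    return (phi.T @ y).reshape(P, P)

rng = np.random.default_rng(0)
P, m = 33, 109                        # 33x33 patches, ~10% compression ratio
phi = rng.standard_normal((m, P * P)) # stand-in for the sensing matrix
y = phi @ rng.standard_normal(P * P)  # a compressed patch measurement
back_projection = unet_input(phi, y, P)
```

The second U-Net would then receive this back-projection concatenated with the first U-Net's output as a 2-channel input, matching the "1 or 2" input channels in Table 4.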
Table 4. Configuration of each U-Net in the compressive reconstruction network.

Input | Output | Kernel Size | In Ch. | Out Ch. | Stride | Output Size
input | conv1 | 2 | 1 or 2 | 32 | 1 | 32 (VALID)
conv1 | conv2 | 4 | 32 | 64 | 2 | 16
conv2 | conv3 | 4 | 64 | 128 | 2 | 8
conv3 | conv4 | 4 | 128 | 256 | 2 | 4
conv4 | conv5 | 4 | 256 | 256 | 2 | 2
conv5 | conv6 | 4 | 256 | 256 | 2 | 1
conv6 | upconv1 | 4 | 256 | 256 | 1/2 | 2
conv5 upconv1 | upconv2 | 4 | 512 | 256 | 1/2 | 4
conv4 upconv2 | upconv3 | 4 | 512 | 128 | 1/2 | 8
conv3 upconv3 | upconv4 | 4 | 256 | 64 | 1/2 | 16
conv2 upconv4 | upconv5 | 4 | 128 | 32 | 1/2 | 32
conv1 upconv5 | upconv6 | 2 | 64 | 32 | 1 | 33 (VALID)
upconv6 | end1 | 3 | 32 | 32 | 1 | 33
end1 | end2 | 1 | 32 | 1 | 1 | 33

Juxtaposition of two layer names in the Input column implies concatenation, and unless indicated with “VALID”, all layers use “SAME” padding.
Face deblurring. Our face deblurring network is also a U-Net that maps the blurred observation to a sharp image estimate of the same size. For blind training, we add an auxiliary decoder path to produce the kernel estimate (i.e., to act as the parameter estimator). The kernel decoder path has the same number of transpose convolution layers, but only the first few upsample by two and have skip connections, since the kernel is smaller than the image. The remaining transpose convolution layers have stride 1, but increase spatial size (as they represent the transpose of a “VALID” convolution). The final output of the kernel decoder path is passed through a softmax that is normalized across spatial locations. This yields a kernel with elements that sum to 1 (which matches the constraint that the blur kernel does not change the average intensity, or DC value, of the image). The detailed architecture is presented in Table 5.
Table 5. Architecture of the face deblurring network; the kupconv layers form the auxiliary kernel decoder path used for blind training.

Input | Output | Kernel Size | In Ch. | Out Ch. | Stride | Output Size
RGB | conv1 | 4 | 3 | 64 | 2 | 64
conv1 | conv2 | 4 | 64 | 128 | 2 | 32
conv2 | conv3 | 4 | 128 | 256 | 2 | 16
conv3 | conv4 | 4 | 256 | 512 | 2 | 8
conv4 | conv5 | 4 | 512 | 512 | 2 | 4
conv5 | conv6 | 4 | 512 | 512 | 2 | 2
conv6 | conv7 | 4 | 512 | 512 | 2 | 1
conv7 | upconv1 | 4 | 512 | 512 | 1/2 | 2
conv6 upconv1 | upconv2 | 4 | 1024 | 512 | 1/2 | 4
conv5 upconv2 | upconv3 | 4 | 1024 | 512 | 1/2 | 8
conv4 upconv3 | upconv4 | 4 | 1024 | 256 | 1/2 | 16
conv3 upconv4 | upconv5 | 4 | 512 | 128 | 1/2 | 32
conv2 upconv5 | upconv6 | 4 | 256 | 64 | 1/2 | 64
conv1 upconv6 | output | 4 | 128 | 3 | 1/2 | 128
conv7 | kupconv1 | 4 | 512 | 512 | 1/2 | 2
conv6 kupconv1 | kupconv2 | 4 | 1024 | 512 | 1/2 | 4
conv5 kupconv2 | kupconv3 | 4 | 1024 | 512 | 1/2 | 8
conv4 kupconv3 | kupconv4 | 4 | 1024 | 256 | 1/2 | 16
conv3 kupconv4 | kupconv5 | 4 | 512 | 128 | 1 | 19 (VALID)
kupconv5 | kupconv6 | 4 | 128 | 64 | 1 | 22 (VALID)
kupconv6 | kupconv7 | 4 | 128 | 64 | 1 | 25 (VALID)
kupconv7 | koutput | 3 | 128 | 64 | 1 | 27 (VALID)
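The spatial softmax at the end of the kernel decoder path can be sketched as follows; the helper name is ours, and the 27×27 kernel size follows the final output size in Table 5:

```python
import numpy as np

def spatial_softmax(logits):
    """Normalize kernel-decoder logits over spatial locations so the
    predicted blur kernel is non-negative and sums to one, preserving the
    image's mean intensity (DC value) under convolution."""
    z = logits - logits.max()      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

k = spatial_softmax(np.random.default_rng(0).standard_normal((27, 27)))
assert k.min() >= 0.0 and abs(k.sum() - 1.0) < 1e-9
```

Normalizing over spatial locations (rather than channels, as in classification) is what bakes the sum-to-one kernel constraint directly into the network output.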