1 Introduction
Deep convolutional neural networks (CNNs) dominate most visual recognition problems and applications, including semantic segmentation
[16], action recognition [25] and object detection [6]. When full supervision is available, CNNs can achieve outstanding performances; however, this type of supervision may not be available in a wide range of applications. In semantic segmentation, full supervision involves annotating all the pixels in each training image. The problem is further amplified when such annotations require expert knowledge or involve volumetric data, as is the case in medical imaging [15]. Therefore, the supervision of semantic segmentation with partial or weak labels, for example, scribbles [13, 27, 28], image tags [22, 19], bounding boxes [23] or points [1], has received significant research efforts in recent years.

Imposing prior knowledge on the network's prediction via some unsupervised loss is a well-established technique in semi-supervised learning
[30, 7]. Such a prior acts as a regularizer that leverages unlabeled data with domain-specific knowledge. For instance, in semantic segmentation, several recent works showed that adding loss terms such as dense conditional random fields (CRFs) [28], graph clustering [27] or priors on the sizes of the target regions [12] can achieve outstanding performances with only fractions of full-supervision labels. However, imposing hard inequality or equality constraints on the output of deep CNNs is still in a nascent stage, and only a few recent works have focused on the subject [22, 17, 24, 12].

1.1 Problem formulation
We consider a general class of semi- or weakly-supervised semantic segmentation problems, where global inequality constraints are enforced on the network's output. In what follows, we present a formulation for 2D images; however, the same formulation also applies to 3D images. Consider a training image $I$, with $\Omega_L \subseteq \Omega$ a set of labeled pixels, which corresponds to a fraction of the pixels in image domain $\Omega$. For $K$ classes, let $y_p \in \{0,1\}^K$ denote the ground-truth label of pixel $p \in \Omega_L$, and $S = (s_p^k)_{p \in \Omega,\, k = 1,\dots,K} \in [0,1]^{|\Omega| \times K}$ a standard $K$-way softmax probability output, with $\theta$ the parameters of the network. In matrix $S$, each row corresponds to the predictions for a pixel in $\Omega$, which can be either unlabeled or labeled. We focus on problems of the following general form, in which we optimize a partial (semi-supervised) loss subject to a set of inequality constraints on the network output:

$$\min_\theta \ \mathcal{E}(\theta) \quad \text{s.t.} \quad f_i(S) \leq 0, \quad i = 1, \dots, N \tag{1}$$

where $\mathcal{E}(\theta)$ is some standard loss for the set of labeled pixels $\Omega_L$, e.g., the cross-entropy (we give the cross-entropy as an example, but our framework is not restricted to a specific form of loss for the set of labeled points): $\mathcal{E}(\theta) = -\sum_{p \in \Omega_L} \sum_{k} y_p^k \log s_p^k$. Inequality constraints of the general form in (1) can embed very useful prior knowledge on the network predictions for unlabeled pixels. Assume, for instance, that we have prior knowledge about the size of the target region (i.e., class) $k$. Such knowledge can be in the form of lower or upper bounds on size, which is common in medical image segmentation problems [12, 18, 8]. In this case, one can impose constraints of the form $\sum_{p \in \Omega} s_p^k \leq a_k$, with $a_k$ denoting an upper bound on the size of region $k$. The same type of constraints can impose image-tag priors, a form of weak supervision enforcing whether a target region is present or absent in a given training image, as in multiple instance learning (MIL) scenarios [22, 12]. For instance, a constraint of the form $\sum_{p \in \Omega} s_p^k \geq 1$ forces class $k$ to be present in a given training image.
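As a minimal illustration (not the actual implementation; function names are ours), such size and image-tag priors can be written as inequality constraints $f(S) \leq 0$ on a toy softmax output:

```python
# Sketch (illustrative, not the paper's code): size and image-tag priors as
# inequality constraints f(S) <= 0. S[p][k] is the predicted probability of
# class k at pixel p.

def region_size(S, k):
    """Soft size of class k: sum of its softmax probabilities over all pixels."""
    return sum(s_p[k] for s_p in S)

def upper_size_constraint(S, k, a):
    """f(S) = Size_k(S) - a <= 0 imposes an upper bound a on the size of class k."""
    return region_size(S, k) - a

def presence_constraint(S, k):
    """f(S) = 1 - Size_k(S) <= 0 forces class k to be present in the image."""
    return 1.0 - region_size(S, k)

# Toy 4-pixel, 2-class prediction.
S = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
print(region_size(S, 1))               # 1.9 (up to float rounding)
print(upper_size_constraint(S, 1, 3))  # negative: the bound is satisfied
print(presence_constraint(S, 1))       # negative: class 1 is present
```

A negative constraint value means the prior is satisfied; a positive value quantifies the violation that the training loss must reduce.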
1.2 Challenges of constrained CNN optimization
Even when the constraints are convex with respect to the network output, problem (1) is very challenging for deep CNNs, which typically have millions of trainable parameters in the case of semantic segmentation. In optimization, a standard way to handle constraints is to solve the Lagrangian primal and dual problems in an alternating scheme [3]. For (1), this corresponds to alternating the optimization of a CNN for the primal with stochastic optimization, e.g., SGD, and projected gradient-ascent iterates for the dual. However, despite the clear benefits of imposing global constraints on CNNs, such a standard Lagrangian-dual optimization is mostly avoided in modern deep networks. As discussed recently in [22, 17, 24], this might be explained by two main reasons: (1) computational complexity and (2) stability/convergence issues caused by alternating between stochastic optimization and dual updates.
As pointed out in [22, 17, 12], imposing hard constraints on the outputs of deep CNNs is challenging. In standard Lagrangiandual optimization methods, an unconstrained optimization problems needs to be solved after each iterative dual step. This is not feasible for deep CNNs, however, as it would require retraining the network at each step. To avoid this problem, Pathak et al. [22] introduce a latent distribution, and minimize a KL divergence so that the CNN output matches this distribution as closely as possible. Since the network’s output is not directly coupled with constraints, its parameters can be optimized using standard techniques like SGD. While this strategy enabled adding inequality constraints in weakly supervised segmentation, it is limited to linear constraints. Moreover, the work in [17]
imposes hard equality constraints on 3D human pose estimation. To alleviate computational complexity, Kyrlov subspace approach is used to limit the solver to a randomly selected subset of constraints within each iteration. Therefore, constraints that are satisfied at one iteration may not be satisfied at the next, which might explain the negative results in the paper. In general, updating the network parameters and dual variables in an alternating fashion leads to a higher computational complexity than solving a loss function directly.
The second difficulty in Lagrangian optimization is the interplay between stochastic optimization (e.g., SGD) for the primal and the iterates/projections for the dual. Basic gradient methods have well-known issues with deep networks, e.g., they are sensitive to the learning rate and prone to weak local minima. Therefore, the dual part in Lagrangian optimization might obstruct the practical and theoretical benefits of stochastic optimization (e.g., speed and strong generalization performance), which are widely established for unconstrained deep network losses [9]. More importantly, solving the primal and dual separately may lead to instability during training or slow convergence, as shown recently in [12].
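To see this alternating scheme in isolation, here is a toy sketch (ours; a one-dimensional convex problem, not a CNN) that alternates gradient descent on the primal variable with projected gradient ascent on the dual variable; for $\min_x x^2$ s.t. $1 - x \leq 0$, the iterates should approach the saddle point $(x^*, \lambda^*) = (1, 2)$:

```python
# Toy illustration (not a CNN): Lagrangian primal-dual alternation for
#   min_x x^2   s.t.   f(x) = 1 - x <= 0,
# whose saddle point is x* = 1, lambda* = 2.

def primal_dual(steps=5000, lr_x=0.01, lr_lam=0.01):
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient descent on the Lagrangian x^2 + lam * (1 - x).
        x -= lr_x * (2 * x - lam)
        # Dual step: projected gradient ascent, keeping lam non-negative.
        lam = max(0.0, lam + lr_lam * (1 - x))
    return x, lam

x, lam = primal_dual()
print(x, lam)  # converges close to 1.0 and 2.0
```

In a CNN, the primal step alone is an expensive (stochastic) training procedure, which is why this alternation becomes costly and unstable at scale.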
1.3 Penalty approaches
In the context of deep networks, "hard" inequality or equality constraints are typically handled in a "soft" manner by augmenting the loss with a penalty function [10, 11, 12]. The penalty-based approach is a simple alternative to Lagrangian optimization, and is well known in the general context of constrained optimization; see [2], Chapter 4. In general, such penalty-based methods approximate a constrained minimization problem with an unconstrained one by adding a term (penalty) $P(f_i(S))$, which increases when constraint $f_i(S) \leq 0$ is violated. By definition, a penalty $P$ is a non-negative, continuous and differentiable function, which verifies: $P(f_i(S)) = 0$ if and only if constraint $f_i(S) \leq 0$ is satisfied. In semantic segmentation [12] and, more generally, in deep learning [10], it is common to use a quadratic penalty for imposing an inequality constraint: $P(f_i(S)) = [f_i(S)]_+^2$, where $[x]_+ = \max(0, x)$ denotes the rectifier function. Fig. 1 depicts an illustration of different choices of penalty functions. Penalties are convenient for deep networks because they remove the requirement for explicit Lagrangian-dual optimization. The inequality constraints are fully handled within stochastic optimization, as in standard unconstrained losses, avoiding gradient-ascent iterates/projections over the dual variables and reducing the computational load for training [12]. However, this simplicity of penalty methods comes at a price. In fact, it is well known that penalty methods do not guarantee constraint satisfaction and require careful and ad hoc tuning of the relative importance (or weight) of each penalty term in the overall function being minimized. More importantly, in the case of several competing constraints, penalties do not act as barriers at the boundary of the feasible set (i.e., a satisfied constraint yields a null penalty). As a result, a subset of constraints that are satisfied at one iteration may not be satisfied at the next. Lagrangian optimization can deal with these difficulties, and has several well-known theoretical and practical advantages over penalty-based methods [4, 5]: it finds automatically the optimal weights of the constraints and guarantees constraint satisfaction when feasible solutions exist. Unfortunately, as pointed out recently in [17, 12], these advantages of Lagrangian optimization do not materialize in practice in the context of deep CNNs. Apart from the computational-feasibility aspects, which the recent works in [17, 22] address to some extent with approximations, the performances of Lagrangian optimization are, surprisingly, below those obtained with simple, much less computationally intensive penalties [17, 12].
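As a minimal sketch (ours) of this quadratic penalty, the following makes the "no barrier effect" remark concrete: the penalty is identically zero on the whole feasible set, so it exerts no force at the boundary:

```python
# Sketch (illustrative): the quadratic penalty [f]_+^2 commonly used for
# inequality constraints f <= 0.

def quadratic_penalty(f_value):
    """max(0, f)^2: zero whenever the constraint f <= 0 is satisfied,
    growing quadratically with the amount of violation otherwise."""
    return max(0.0, f_value) ** 2

print(quadratic_penalty(-0.5))  # 0.0 -- no force anywhere inside the feasible set
print(quadratic_penalty(2.0))   # 4.0 -- quadratic in the violation
```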
This is, for instance, the case of the recent weakly supervised CNN semantic segmentation results in [12], which showed that a simple quadratic-penalty formulation of inequality constraints substantially outperforms the Lagrangian method in [22]. Also, the authors of [17] reported surprising results in the context of 3D human pose estimation: in their case, replacing the equality constraints with simple quadratic penalties yielded better results than Lagrangian optimization.

1.4 Contributions
We leverage well-established concepts in interior-point methods, which approximate Lagrangian optimization with a sequence of unconstrained problems, while completely avoiding dual steps/projections. Specifically, we propose a sequence of unconstrained log-barrier-extension losses for approximating inequality-constrained CNN problems. The proposed extension has a duality-gap bound, which yields suboptimality certificates for feasible solutions in the case of convex losses. While suboptimality is not guaranteed for non-convex problems, the result shows that log-barrier extensions are a principled way to approximate Lagrangian optimization for constrained CNNs. Our approach addresses the well-known limitations of penalty methods and, at the same time, removes the explicit dual steps of Lagrangian optimization. We report comprehensive experiments showing that our formulation outperforms a recent penalty-based constrained CNN method [12], both in terms of accuracy and training stability.
2 Background on Lagrangian optimization and the log-barrier method
This section reviews both standard Lagrangian optimization and the log-barrier method for constrained problems [3]. We also present basic concepts of duality theory, namely the duality gap and suboptimality, which will be needed when introducing our log-barrier extension and the corresponding duality-gap bound. We also discuss the limitations of standard constrained optimization methods in the context of deep CNNs.
Lagrangian optimization: Let us first examine standard Lagrangian optimization for problem (1):

$$L(\theta, \lambda) = \mathcal{E}(\theta) + \sum_{i=1}^{N} \lambda_i f_i(S) \tag{2}$$

where $\lambda = (\lambda_1, \dots, \lambda_N)$ is the dual variable (or Lagrange-multiplier) vector, with $\lambda_i \geq 0$ the multiplier associated with constraint $f_i(S) \leq 0$. The dual function is the minimum value of Lagrangian (2) over $\theta$: $g(\lambda) = \min_\theta L(\theta, \lambda)$. A dual feasible $\lambda \geq 0$ yields a lower bound on the optimal value of constrained problem (1), which we denote $\mathcal{E}^*$: $g(\lambda) \leq \mathcal{E}^*$. This important inequality can be easily verified, even when problem (1) is not convex; see [3], p. 216. It follows that a dual feasible $\lambda$ gives a suboptimality certificate for a given feasible point $\theta$, without knowing the exact value of $\mathcal{E}^*$: $\mathcal{E}(\theta) - \mathcal{E}^* \leq \mathcal{E}(\theta) - g(\lambda)$. The non-negative quantity $\mathcal{E}(\theta) - g(\lambda)$ is the duality gap for primal-dual pair $(\theta, \lambda)$. If we manage to find a feasible primal-dual pair $(\theta, \lambda)$ such that the duality gap is less than or equal to a certain $\epsilon$, then primal feasible $\theta$ is $\epsilon$-suboptimal.

Definition 1. A primal feasible point $\theta$ is $\epsilon$-suboptimal when it verifies: $\mathcal{E}(\theta) - \mathcal{E}^* \leq \epsilon$.
This provides a non-heuristic stopping criterion for Lagrangian optimization, which alternates two iterative steps, one primal and one dual, each decreasing the duality gap until a given accuracy $\epsilon$ is attained (strong duality should hold if we want to achieve an arbitrarily small tolerance $\epsilon$; of course, strong duality does not hold in the case of CNNs, as the primal problem is not convex). In the context of CNNs [22], the primal step minimizes the Lagrangian w.r.t. $\theta$, which corresponds to training a deep network with stochastic optimization, e.g., SGD: $\theta^{r+1} = \arg\min_\theta L(\theta, \lambda^r)$. The dual step is a constrained maximization of the dual function (notice that the dual function is always concave, as it is the minimum of a family of affine functions, even when the original (or primal) problem is not convex, as is the case for CNNs) via projected gradient ascent: $\lambda^{r+1} = \left[\lambda^r + \eta \nabla g(\lambda^r)\right]_+$, with $\eta$ a step size. As mentioned before, direct use of Lagrangian optimization for deep CNNs increases computational complexity and can lead to instability or poor convergence due to the interplay between stochastic optimization for the primal and the iterates/projections for the dual. Our work approximates Lagrangian optimization with a sequence of unconstrained log-barrier-extension losses, in which the dual variables are implicit, avoiding explicit dual iterates/projections. Let us first review the basic barrier method.

The log-barrier method: The log-barrier method is widely used for inequality-constrained optimization, and belongs to the family of interior-point techniques [3]. To solve our constrained CNN problem (1) with this method, we need to find a strictly feasible set of network parameters as a starting point, which can then be used in an unconstrained problem via the log-barrier function. In the general context of optimization, log-barrier methods proceed in two steps. The first, often called phase I [3], computes a feasible point by Lagrangian minimization of a constrained problem, which in the case of (1) is:
$$\min_{\theta, u} \ u \quad \text{s.t.} \quad f_i(S) \leq u, \quad i = 1, \dots, N \tag{3}$$
For deep CNNs with millions of parameters, Lagrangian optimization of problem (3) has the same difficulties as with the initial constrained problem in (1). To find a feasible set of network parameters, one needs to alternate CNN training and projected gradient ascent for the dual variables. This might explain why such interior-point methods, despite their substantial impact in optimization [3], are mostly overlooked in modern deep networks (interior-point methods were investigated for artificial neural networks before the deep learning era [29]), as is generally the case for other Lagrangian-dual optimization methods.
The second step, often referred to as phase II, approximates (1) as an unconstrained problem:
$$\min_\theta \ \mathcal{E}(\theta) + \sum_{i=1}^{N} \psi_t(f_i(S)) \tag{4}$$

where $\psi_t$ is the log-barrier function: $\psi_t(z) = -\frac{1}{t} \log(-z)$. When $t \to +\infty$, this convex, continuous and twice-differentiable function approaches a hard indicator for the constraints: $H(z) = 0$ if $z \leq 0$ and $+\infty$ otherwise; see Fig. 1 (a) for an illustration. The domain of the function is the set of strictly feasible points. The higher $t$, the better the quality of the approximation. This suggests that a large $t$ yields a good approximation of the initial constrained problem in (1). This is, indeed, confirmed by the following standard duality-gap result for the log-barrier method [3], which shows that optimizing (4) yields a solution that is $N/t$-suboptimal.
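Numerically, the behavior of $\psi_t$ is easy to check with a short sketch (ours; names illustrative): the barrier flattens toward zero deep inside the feasible set as $t$ grows, while still blowing up as $z$ approaches the boundary from below, and it is undefined for infeasible $z \geq 0$:

```python
import math

# Sketch (illustrative): the standard log-barrier psi_t(z) = -(1/t) * log(-z),
# defined only on strictly feasible points z < 0.

def log_barrier(z, t):
    assert z < 0, "the standard log-barrier is undefined for infeasible z >= 0"
    return -math.log(-z) / t

# As t grows, psi_t approaches the hard indicator H: near zero inside the
# feasible set, but still exploding near the boundary z -> 0-.
for t in (1, 10, 100):
    print(t, log_barrier(-0.5, t), log_barrier(-1e-6, t))
```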
Proposition 1. Let $\theta^*$ be the solution of (4), and let $\lambda^* = (\lambda_1^*, \dots, \lambda_N^*)$, with $\lambda_i^* = -\frac{1}{t f_i(S^*)}$ and $S^*$ the softmax output at $\theta^*$. Then $\lambda^*$ is dual feasible, and the duality gap associated with primal-dual pair $(\theta^*, \lambda^*)$ is $\mathcal{E}(\theta^*) - g(\lambda^*) = N/t$.
Proof: The proof can be found in [3], p. 566. ∎
An important implication that follows immediately from Proposition 1 is that a feasible solution of approximation (4) is $N/t$-suboptimal: $\mathcal{E}(\theta^*) - \mathcal{E}^* \leq N/t$. This suggests a simple way of solving the initial constrained problem with a guaranteed suboptimality: simply choose a large $t$ and solve unconstrained problem (4). However, for a large $t$, the log-barrier function is difficult to minimize because its gradient varies rapidly near the boundary of the feasible set. In practice, log-barrier methods solve a sequence of problems of the form (4) with an increasing value of $t$. The solution of each problem is used as a starting point for the next, until a specified $\epsilon$-suboptimality is reached.
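This continuation scheme can be sketched as follows (our illustration; the numerical values are arbitrary, and in an actual run each increase of $t$ would be interleaved with one or more epochs of training): the duality-gap bound $N/t$ shrinks geometrically as $t$ is multiplied by a factor $\mu$:

```python
# Sketch (illustrative; arbitrary values): the log-barrier continuation scheme
# solves a sequence of unconstrained problems with geometrically increasing t,
# each warm-started from the previous solution, until the duality-gap bound
# N/t falls below a target accuracy.

def barrier_schedule(t0=1.0, mu=1.1, num_constraints=2, target_gap=0.01):
    t, outer_steps = t0, 0
    while num_constraints / t > target_gap:
        # In a real run: one or more epochs of SGD on the barrier loss at the
        # current t, warm-started from the previous network parameters.
        t *= mu
        outer_steps += 1
    return t, outer_steps

t, outer_steps = barrier_schedule()
print(t, outer_steps)  # t ends just past N / target_gap = 200
```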
3 Log-barrier extensions
We propose the following unconstrained loss for approximating Lagrangian optimization of constrained problem (1):
$$\min_\theta \ \mathcal{E}(\theta) + \sum_{i=1}^{N} \tilde{\psi}_t(f_i(S)) \tag{5}$$

where $\tilde{\psi}_t$ is our log-barrier extension, which is convex, continuous and differentiable:

$$\tilde{\psi}_t(z) = \begin{cases} -\frac{1}{t} \log(-z) & \text{if } z \leq -\frac{1}{t^2} \\ t z - \frac{1}{t} \log\frac{1}{t^2} + \frac{1}{t} & \text{otherwise} \end{cases} \tag{6}$$
Similarly to the standard log-barrier, when $t \to +\infty$, our extension (6) can be viewed as a smooth approximation of the hard indicator function $H$; see Fig. 1 (b). However, a very important difference is that the domain of our extension is not restricted to strictly feasible points. Therefore, our approximation (5) completely removes the requirement for explicit Lagrangian-dual optimization for finding a feasible set of network parameters. In our case, the inequality constraints are fully handled within stochastic optimization, as in standard unconstrained losses, completely avoiding gradient-ascent iterates and projections over explicit dual variables. As we will see in the experiments, our formulation yields better results in terms of accuracy and stability than the recent penalty-based constrained CNN method in [12].
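A direct transcription of the piecewise definition in (6) (our sketch) lets one verify numerically that the extension coincides smoothly with the standard log-barrier at the junction point $z = -1/t^2$, and, unlike the standard barrier, stays finite on infeasible points:

```python
import math

# Sketch (illustrative): the log-barrier extension of Eq. (6) -- the standard
# log-barrier for z <= -1/t^2, extended linearly (with matching value and
# slope t) beyond that point.

def psi_tilde(z, t):
    if z <= -1.0 / t ** 2:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t ** 2) / t + 1.0 / t

t = 5.0
z0 = -1.0 / t ** 2
# The two pieces agree at the junction point z0 = -1/t^2 ...
print(psi_tilde(z0, t), psi_tilde(z0 + 1e-9, t))
# ... and, unlike the standard barrier, the extension is finite on infeasible
# points z >= 0, where it grows linearly with slope t.
print(psi_tilde(0.5, t))
```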
In our approximation (5), the Lagrangian dual variables for the initial inequality-constrained problem (1) are implicit. We prove the following duality-gap bound, which yields suboptimality certificates for feasible solutions of our approximation (5). Our result (which applies to the general context of convex optimization; in deep CNNs, of course, a feasible solution of our approximation may not be unique and is not guaranteed to be a global optimum, as $\mathcal{E}$ and the constraints are not convex) can be viewed as an extension of the standard result in Proposition 1, which expresses the duality gap as a function of $t$ for the log-barrier function.
Proposition 2. Let $\theta^*$ be a solution of approximation (5), with $S^*$ the corresponding softmax output, and define implicit dual variables $\lambda^* = (\lambda_1^*, \dots, \lambda_N^*)$, with $\lambda_i^* = \tilde{\psi}_t'(f_i(S^*))$. Then $\lambda^*$ is dual feasible, and the duality gap associated with primal-dual pair $(\theta^*, \lambda^*)$ is bounded: $\mathcal{E}(\theta^*) - g(\lambda^*) \leq N/t$.
Proof: We give a detailed proof in the supplemental material. ∎
From Proposition 2, the following important fact follows immediately: if the solution $\theta^*$ that we obtain from unconstrained problem (5) is feasible and a global optimum, then it is $N/t$-suboptimal for constrained problem (1): $\mathcal{E}(\theta^*) - \mathcal{E}^* \leq N/t$.
Finally, we arrive at our constrained CNN learning algorithm, which is fully based on SGD. Similarly to the standard log-barrier algorithm, we use a varying parameter $t$: we optimize a sequence of losses of the form (5), gradually increasing the value of $t$ by a factor $\mu$. The network parameters obtained for the current $t$ and epoch are used as a starting point for the next $t$ and epoch. The steps of the proposed constrained CNN learning algorithm are detailed in Algorithm 1.

4 Experiments
Both the proposed extended log-barrier and the penalty-based baseline [12] are compatible with any differentiable function $f_i$, including non-linear and fractional terms (such as Equations (8) and (9) introduced further in the paper). However, we hypothesize that our log-barrier extension is better suited to handle the interplay between multiple constraints. To validate this hypothesis, we compare both strategies on the joint optimization of two segmentation constraints, related to region size and centroid.
Size.
We define the size of the segmentation for class $k$ as the sum of its predictions:

$$\text{Size}_k(S) = \sum_{p \in \Omega} s_p^k \tag{8}$$

Notice that we use the softmax predictions to compute $\text{Size}_k(S)$, as using values after thresholding would not be differentiable. In practice, we can make the network predictions near-binary by using a large enough temperature parameter in the softmax. We bound the function such that $a_k \leq \text{Size}_k(S) \leq b_k$, where the bounds $(a_k, b_k)$ correspond to "individual bounds" [12], i.e., specific bounds determined for each image from its ground truth. (Since we focus on methods to constrain the training of a deep neural network, we do not study how one could realistically obtain such bounds without complete annotations; this is left as future work.)
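A minimal sketch (ours, in plain Python rather than the PyTorch used in the experiments) of the soft size in Eq. (8) and of the temperature effect on the softmax:

```python
import math

# Sketch (illustrative): soft size of Eq. (8) computed from softmax
# probabilities; a larger softmax temperature factor pushes predictions
# toward binary values, moving the soft size toward the hard pixel count.

def softmax(logits, temperature=1.0):
    exps = [math.exp(temperature * l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_size(S, k):
    """Differentiable size of class k: the sum of its softmax probabilities,
    used instead of a (non-differentiable) thresholded segmentation."""
    return sum(s_p[k] for s_p in S)

logits = [[2.0, -1.0], [0.5, 1.5], [-1.0, 2.0]]  # 3 pixels, 2 classes
for temp in (1.0, 5.0):
    S = [softmax(l, temp) for l in logits]
    print(temp, soft_size(S, 1))  # approaches the hard count (2) as temp grows
```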
Centroid.
The centroid of the predicted object can be computed as a weighted average of the pixel coordinates:
$$C_k(S) = \frac{\sum_{p \in \Omega} c_p \, s_p^k}{\sum_{p \in \Omega} s_p^k} \tag{9}$$

where $c_p \in \mathbb{R}^2$ are the pixel coordinates on a 2D grid. We constrain the position of the centroid to lie in a box around the ground-truth centroid: $a_k \leq C_k(S) \leq b_k$ (element-wise), with $(a_k, b_k)$ the bound values associated with each image.
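The soft centroid of Eq. (9) can be sketched in the same way (our illustrative code):

```python
# Sketch (illustrative): soft centroid of Eq. (9) -- pixel coordinates
# averaged with the softmax probabilities of class k as weights.

def soft_centroid(S, k, coords):
    total = sum(s_p[k] for s_p in S)
    cx = sum(c[0] * s_p[k] for c, s_p in zip(coords, S)) / total
    cy = sum(c[1] * s_p[k] for c, s_p in zip(coords, S)) / total
    return cx, cy

# 2x2 grid; class 1 is predicted mostly at pixel (1, 1).
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
S = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
print(soft_centroid(S, 1, coords))  # pulled toward (1, 1)
```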
4.1 Datasets and evaluation metrics
The proposed loss is evaluated in three different segmentation scenarios using synthetic, medical and color images. Datasets used in each of these problems are detailed below.

Synthetic images: We generated a synthetic dataset composed of 1100 images containing two circles of the same size but different color, with different levels of Gaussian noise added to the whole image. The target object is the darker circle. From these images, 1000 were employed for training and 100 for validation; see Figure 4, first column, for an illustration. No pixel annotation is used during training ($\Omega_L = \emptyset$). The objective of this simple dataset is to compare our log-barrier extension with the penalty-based approach of [12] in three different constraint settings: 1) size only, 2) centroid only, and 3) both constraints. For the first two settings, we expect both methods to fail, since the corresponding segmentation problems are underdetermined (e.g., size alone is not sufficient to determine which circle is the correct one). On the other hand, the third setting provides enough information to segment the right circle, and the main challenge there is the interplay between the two different constraints.

Medical images: We use the PROMISE12 dataset [14], which was made available for the MICCAI 2012 prostate segmentation challenge. T2-weighted Magnetic Resonance (MR) images of 50 patients with various diseases were acquired at different locations with several MRI vendors and scanning protocols. We hold out 10 patients for validation and use the rest for training. As in [12], we use partial cross-entropy for the weakly supervised setting, with weak labels derived from the ground truth by placing random dots inside the object of interest (Fig. 2). As this dataset is already fairly centered around the object, we impose constraints only on the size of the object (see Eq. (8)).

Color images: We also evaluate our method on the Semantic Boundaries Dataset (SBD), which can be seen as a scaled-up version of the original PascalVOC segmentation benchmark. We employed the 20 semantic categories of PascalVOC. This dataset contains 11318 fully annotated images, divided into 8498 training and 2820 test images. We obtained the scribble annotations from the public ScribbleSup repository [13] and took the intersection between both datasets for our experiments. Thus, a total of 8829 images were used for training, and 1449 for validation.
For the synthetic and PROMISE12 datasets, we resort to the common Dice index, DSC $= \frac{2|A \cap B|}{|A| + |B|}$ (with $A$ the predicted mask and $B$ the ground truth), to evaluate the performance of the tested methods. For PascalVOC, we follow most studies on this dataset and use the mean Intersection over Union (mIoU) metric.
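For reference, the Dice index on a pair of binary masks can be computed as follows (our sketch; masks are flattened to 0/1 lists):

```python
# Sketch (illustrative): Dice similarity coefficient DSC = 2|A ∩ B| / (|A| + |B|)
# between two binary masks, flattened to 0/1 lists.

def dice(pred, target):
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0  # empty masks: perfect match

pred   = [1, 1, 0, 0, 1]
target = [1, 1, 1, 0, 0]
print(dice(pred, target))  # 2*2 / (3 + 3) = 0.666...
```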
4.2 Training and implementation details
Since the three datasets have very different characteristics, we considered a specific network architecture and training strategy for each of them.
For the dataset of synthetic images, we used the ENet network [20], as it has shown a good trade-off between accuracy and inference time. The network was trained from scratch using the Adam optimizer and a batch size of 1. The initial learning rate was decreased by half whenever validation performance did not improve for 20 epochs. The softmax temperature value was set to 5. To segment the prostate, we used the same settings as in [12], reporting their results for the penalty-based baselines. For PascalVOC, we used a PyTorch implementation of the FCN8s model [16], built upon a VGG16 [26] pre-trained from the Torchvision model zoo (https://pytorch.org/docs/stable/torchvision/models.html). We trained this network with a batch size of 1 and a constant learning rate of $10^{-5}$. Regarding the weights of the penalty and log-barrier terms, we investigated several values and obtained the best performances with $10^{-4}$ and $10^{-2}$, respectively.

For all tasks, we set the initial value of $t$ in our extended log-barrier (Algorithm 1) to 5, and increased it by a factor $\mu$ after each epoch. This strategy relaxes the constraints in the first epochs, so that the network can focus on learning from the images, and then gradually makes these constraints harder as optimization progresses. Experiments on the toy example and PascalVOC were implemented in Python 3.7 with PyTorch 1.0.1 [21], whereas we followed the same specifications as [12] for the prostate experiments, employing Python 3.6 with PyTorch 1.0. All the experiments were carried out on a server equipped with an NVIDIA Titan V. The code is publicly available at https://github.com/LIVIAETS/extended_logbarrier.
4.3 Results
The following sections report the experimental results of the proposed extended logbarrier method on the three datasets introduced in Sec. 4.1.
4.3.1 Synthetic images
Results on the synthetic example for both the penalty-based and the extended log-barrier approaches are reported in Table 1. As expected, constraining the size only is not sufficient to locate the correct circle (2nd and 5th columns in Fig. 4), which explains the very low DSC values in Figure 2(a). However, we observe that the two optimization methods lead to very different solutions: sparse unconnected dots for the penalty method, and a continuous shape for the log-barrier method. This difference could be due to the high gradients of the penalty method in the first iterations, which strongly bias the network toward bright pixels. On the other hand, constraining only the centroid results in a correctly located region, but without the correct boundary (3rd and 6th columns in Figure 4). The most interesting scenario is when both the size and centroid are constrained. In Figure 2(a), we can see that the penalty-constrained network is unstable during training, and has worse results than the log-barrier method (4th and 7th columns). This demonstrates the barrier's effectiveness in preventing predictions from going out of bounds (Fig. 1), thereby making optimization more stable.
                       Constraints
Method               | Size   | Centroid | Size & Centroid
Penalty [12]         | 0.0601 | 0.3197   | 0.8514
Extended log-barrier | 0.0018 | 0.4347   | 0.9574
4.3.2 PROMISE12 dataset
Quantitative results on the prostate segmentation task are reported in Table 2 (left column). If no prior information is imposed, i.e., with only scribbles, the trained model completely fails to achieve a satisfactory performance, with a mean Dice coefficient of 0.032. It can be observed that integrating the target size during training significantly improves performance. While constraining the predicted segmentation with a penalty-based method [12] achieves a DSC value of nearly 0.83, imposing the constraints with our log-barrier extension increases the performance by an additional 2%. The use of the extended log-barrier to constrain the CNN predictions reduces the gap towards the fully supervised model to only 4%.
In terms of optimization, the extended log-barrier method needs more iterations to converge, eventually surpassing the penalty-based method (Fig. 2(b)). This may be due to the scheduling factor of the extended log-barrier, whose contribution slowly increases over time. This avoids large gradients early in training, which could overshoot the minimum and fail to converge. On the other hand, by the time $t$ reaches a high value, the constrained function has been slowly pushed toward the desired bounds, leading to small gradient updates. At this point, the barrier prevents the network from predicting segmentations out of bounds, ensuring stability and better performance.
                          Dataset
Method                  | PROMISE12 (DSC) | VOC2012 (mIoU)
Partial cross-entropy   | 0.032 (0.015)   | 48.48 (14.88)
w/ penalty [12]         | 0.830 (0.057)   | 52.22 (14.94)
w/ extended log-barrier | 0.852 (0.038)   | 53.40 (14.62)
Full supervision        | 0.891 (0.032)   | 59.87 (16.94)

Table 2: Mean and standard deviation on the validation sets of the PROMISE12 and PascalVOC datasets, when networks are trained with several levels of supervision.
4.3.3 PascalVOC
Table 2 (right column) compares the numeric results of our approach with those of scribble annotations alone and of the naive penalty approach, in the scenario of size constraints. For reference, we also include the results of a fully supervised model, which serves as an upper bound. From the results, we can see that the penalty-based method improves the performance over scribble annotations alone by approximately 4% in terms of mIoU. With the proposed extended log-barrier method, the mIoU increases up to 53.4%, representing a 1.2% improvement with respect to the penalty-based strategy and only a 6.4% gap compared to full supervision. Visual results (Fig. 6) show that the proposed framework for constraining the CNN training helps to reduce over-segmentation, reducing the amount of false positives.
5 Conclusion
We introduced an extended log-barrier to constrain deep CNNs, which avoids the difficult step of classical interior-point methods: finding an initial feasible solution. We demonstrated its effectiveness over a penalty-based method [12] on several segmentation tasks and with different constraint settings, including linear and non-linear functions.
In our experiments, we derived size bounds and centroids from segmentation ground truth. Future work could investigate automated techniques to obtain this information directly from input images, for instance using a regression network. Another interesting extension of this work would be to test a broader set of constraints, for example, related to region connectivity or compactness.
Acknowledgements
We gratefully thank NVIDIA for its GPU donations (TITAN V and GTX 1080 Ti). This work is supported by the National Science and Engineering Research Council of Canada (NSERC), discovery grant program.
References

[1]
A. L. Bearman, O. Russakovsky, V. Ferrari, and F. Li.
What’s the point: Semantic segmentation with point supervision.
In
European Conference on Computer Vision (ECCV)
, pages 549–565, 2016.  [2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
 [3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
 [4] R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, 1987.
 [5] P. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, 1981.

[6]
R. Girshick, J. Donahue, T. Darrell, and J. Malik.
Rich feature hierarchies for accurate object detection and semantic
segmentation.
In
Conference on computer vision and pattern recognition
, pages 580–587, 2014.  [7] I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, 2016.
 [8] L. Gorelick, F. R. Schmidt, and Y. Boykov. Fast trust region for segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1714–1721, 2013.

[9]
M. Hardt, B. Recht, and Y. Singer.
Train faster, generalize better: Stability of stochastic gradient descent.
InInternational Conference on Machine Learning (ICML)
, pages 1225–1234, 2016. 
[10]
F. S. He, Y. Liu, A. G. Schwing, and J. Peng.
Learning to play in a day: Faster deep reinforcement learning by optimality tightening.
In International Conference on Learning Representations (ICLR), pages 1–13, 2017.  [11] Z. Jia, X. Huang, E. I. Chang, and Y. Xu. Constrained deep weak supervision for histopathology image segmentation. IEEE Transactions on Medical Imaging, 36(11):2376–2388, 2017.
 [12] H. Kervadec, J. Dolz, M. Tang, E. Granger, Y. Boykov, and I. B. Ayed. Constrainedcnn losses for weakly supervised segmentation. Medical Image Analysis, 2019.
 [13] D. Lin, J. Dai, J. Jia, K. He, and J. Sun. Scribblesup: Scribblesupervised convolutional networks for semantic segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3159–3167, 2016.
 [14] G. Litjens, R. Toth, W. van de Ven, C. Hoeks, S. Kerkstra, B. van Ginneken, G. Vincent, G. Guillard, N. Birbeck, J. Zhang, et al. Evaluation of prostate segmentation algorithms for mri: the promise12 challenge. Medical Image Analysis, 18(2):359–373, 2014.
 [15] G. J. S. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.
 [16] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431–3440, 2015.
[17] P. Márquez-Neila, M. Salzmann, and P. Fua. Imposing hard constraints on deep networks: Promises and limitations. In CVPR Workshop on Negative Results in Computer Vision, pages 1–9, 2017.
 [18] M. Niethammer and C. Zach. Segmentation with area constraints. Medical Image Analysis, 17(1):101–112, 2013.
[19] G. Papandreou, L. Chen, K. P. Murphy, and A. L. Yuille. Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. In International Conference on Computer Vision (ICCV), pages 1742–1750, 2015.
[20] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147, 2016.
[21] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.
 [22] D. Pathak, P. Krahenbuhl, and T. Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In International Conference on Computer Vision (ICCV), pages 1796–1804, 2015.
[23] M. Rajchl, M. C. Lee, O. Oktay, K. Kamnitsas, J. Passerat-Palmbach, W. Bai, M. Damodaram, M. A. Rutherford, J. V. Hajnal, B. Kainz, et al. DeepCut: Object segmentation from bounding box annotations using convolutional neural networks. IEEE Transactions on Medical Imaging, 36(2):674–683, 2017.
[24] S. N. Ravi, T. Dinh, V. Sai, R. Lokhande, and V. Singh. Constrained deep learning using conditional gradient and applications in computer vision. arXiv preprint arXiv:1803.0645, 2018.
[25] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In Neural Information Processing Systems (NeurIPS), pages 568–576, 2014.
[26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
[27] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers. Normalized cut loss for weakly-supervised CNN segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1818–1827, 2018.
[28] M. Tang, F. Perazzi, A. Djelouah, I. Ben Ayed, C. Schroers, and Y. Boykov. On regularized losses for weakly-supervised CNN segmentation. In European Conference on Computer Vision (ECCV), Part XVI, pages 524–540, 2018.
 [29] T. B. Trafalis, T. A. Tutunji, and N. P. Couellan. Interior point methods for supervised training of artificial neural networks with bounded weights. In Network Optimization, pages 441–470, 1997.
[30] J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.
Proof for Proposition 2
In this section, we provide a detailed proof of the duality-gap bound stated in Proposition 2 of the paper. Recall our unconstrained approximation for the inequality-constrained CNN problem:

$$\min_{\theta} \; E(\theta) + \sum_{i=1}^{N} \tilde{\psi}_t\big(f_i(\theta)\big) \qquad (10)$$

where $\tilde{\psi}_t$ is our log-barrier extension, with $t$ strictly positive. Let $\tilde{\theta}$ be the solution of problem (10) and $\tilde{\lambda} \in \mathbb{R}^N$ the corresponding vector of implicit dual variables given by:

$$\tilde{\lambda}_i = \begin{cases} -\dfrac{1}{t f_i(\tilde{\theta})} & \text{if } f_i(\tilde{\theta}) \le -\dfrac{1}{t^2} \\[4pt] t & \text{otherwise} \end{cases} \qquad (11)$$
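As a concrete illustration, the log-barrier extension and its implicit dual variables can be sketched in a few lines of Python. This is a minimal sketch, assuming the piecewise form implied above: the standard log-barrier $-\frac{1}{t}\log(-z)$ for $z \le -1/t^2$, extended linearly with slope $t$ beyond that point; the function names are ours.

```python
import math

def barrier_ext(z, t):
    """Log-barrier extension psi~_t(z): the standard log-barrier
    -(1/t)*log(-z) for z <= -1/t**2, extended linearly (slope t)
    beyond that point so the function stays finite everywhere
    and is C1-continuous at the junction z = -1/t**2."""
    if z <= -1.0 / t**2:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t

def implicit_dual(z, t):
    """Derivative psi~'_t(z): the implicit dual variable associated
    with a constraint value z = f_i(theta~). Strictly positive on
    both branches for any t > 0."""
    if z <= -1.0 / t**2:
        return -1.0 / (t * z)
    return t
```

Strict positivity of `implicit_dual` on both branches is exactly the element-wise dual feasibility invoked later in the proof.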
We assume that $\tilde{\theta}$ verifies approximately the optimality condition for a minimum of (10):^8

$$\nabla E(\tilde{\theta}) + \sum_{i=1}^{N} \tilde{\psi}'_t\big(f_i(\tilde{\theta})\big) \nabla f_i(\tilde{\theta}) \approx 0 \qquad (12)$$

^8 When optimizing an unconstrained loss via stochastic gradient descent (SGD), there is no guarantee that the obtained solution verifies exactly the optimality conditions.
It is easy to verify that each dual variable $\tilde{\lambda}_i$ corresponds to the derivative of the log-barrier extension at $f_i(\tilde{\theta})$: $\tilde{\lambda}_i = \tilde{\psi}'_t\big(f_i(\tilde{\theta})\big)$. Therefore, (12) means that $\tilde{\theta}$ verifies approximately the optimality condition for the Lagrangian corresponding to the original inequality-constrained problem when $\lambda = \tilde{\lambda}$:

$$\nabla E(\tilde{\theta}) + \sum_{i=1}^{N} \tilde{\lambda}_i \nabla f_i(\tilde{\theta}) \approx 0 \qquad (13)$$
It is also easy to check that the implicit dual variables defined in (11) correspond to a feasible dual, i.e., $\tilde{\lambda} > 0$ element-wise. Therefore, the dual function evaluated at $\tilde{\lambda}$ is:

$$g(\tilde{\lambda}) = E(\tilde{\theta}) + \sum_{i=1}^{N} \tilde{\lambda}_i f_i(\tilde{\theta})$$

which yields the duality gap associated with the primal-dual pair $(\tilde{\theta}, \tilde{\lambda})$:

$$E(\tilde{\theta}) - g(\tilde{\lambda}) = -\sum_{i=1}^{N} \tilde{\lambda}_i f_i(\tilde{\theta}) \qquad (14)$$
Now, to prove that this duality gap is upper-bounded by $N/t$, we consider three cases for each term in the sum in (14) and verify that, in each case, $-\tilde{\lambda}_i f_i(\tilde{\theta}) \le 1/t$.

Case 1: $f_i(\tilde{\theta}) \le -1/t^2$. Then $\tilde{\lambda}_i = -1/\big(t f_i(\tilde{\theta})\big)$, so $-\tilde{\lambda}_i f_i(\tilde{\theta}) = 1/t$.

Case 2: $-1/t^2 < f_i(\tilde{\theta}) \le 0$. Then $\tilde{\lambda}_i = t$, so $-\tilde{\lambda}_i f_i(\tilde{\theta}) = -t f_i(\tilde{\theta}) < t \cdot (1/t^2) = 1/t$.

Case 3: $f_i(\tilde{\theta}) > 0$. Then $\tilde{\lambda}_i = t$, so $-\tilde{\lambda}_i f_i(\tilde{\theta}) = -t f_i(\tilde{\theta}) < 0 \le 1/t$.

In all three cases, we have $-\tilde{\lambda}_i f_i(\tilde{\theta}) \le 1/t$. Summing this inequality over $i$ gives $-\sum_{i=1}^{N} \tilde{\lambda}_i f_i(\tilde{\theta}) \le N/t$. Using this inequality in (14) yields the following upper bound on the duality gap associated with primal $\tilde{\theta}$ and implicit feasible dual $\tilde{\lambda}$ for the original inequality-constrained problem:

$$E(\tilde{\theta}) - g(\tilde{\lambda}) \le \frac{N}{t} \qquad \blacksquare$$
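The three-case argument lends itself to a quick numerical sanity check. The sketch below assumes the piecewise implicit dual $\tilde{\lambda}_i = -1/(t f_i)$ on the log-barrier branch and $\tilde{\lambda}_i = t$ otherwise; the function names and the sampling range for constraint values are ours.

```python
import random

def implicit_dual(f, t):
    # psi~'_t(f): -1/(t*f) on the log-barrier branch, t on the linear one
    return -1.0 / (t * f) if f <= -1.0 / t**2 else t

def duality_gap(f_vals, t):
    # Right-hand side of (14): -sum_i lambda~_i * f_i
    return -sum(implicit_dual(f, t) * f for f in f_vals)

random.seed(0)
t = 10.0
# constraint values of any sign, covering all three cases
f_vals = [random.uniform(-3.0, 1.0) for _ in range(50)]
for f in f_vals:
    # per-term bound from the case analysis (tiny tolerance for float error)
    assert -implicit_dual(f, t) * f <= 1.0 / t + 1e-12
# summing the per-term bounds gives the duality-gap bound N/t
assert duality_gap(f_vals, t) <= len(f_vals) / t + 1e-9
```

Case 1 terms attain the bound $1/t$ exactly, so the $N/t$ bound is tight when every constraint value lies on the log-barrier branch.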
This bound yields sub-optimality certificates for feasible solutions of our approximation in (10). If the solution $\tilde{\theta}$ that we obtain from our unconstrained problem (10) is feasible, i.e., it satisfies the constraints $f_i(\tilde{\theta}) \le 0$, $i = 1, \dots, N$, then $\tilde{\theta}$ is $N/t$-suboptimal for the original inequality-constrained problem: $E(\tilde{\theta}) - E^* \le N/t$. Our upper-bound result can be viewed as an extension of the duality-gap equality for the standard log-barrier function [3]. Our result applies in the general context of convex optimization. In deep CNNs, of course, a feasible solution of our approximation may not be unique and is not guaranteed to be a global optimum, as $E$ and the constraints $f_i$ are not convex.
Training curves for the three datasets
Figure 7 shows the learning curves on the training set.
In Figure 7(b), the behaviour of the network trained with the extended log-barrier is particularly striking: starting from a low level of performance, it gradually reaches a considerably higher training Dice score than its penalty-based counterpart.
Visual results on PROMISE12 validation set
Figure 8 shows more results on the validation set.
Visual results on PascalVOC validation set
Figure 9 shows more results on the PascalVOC validation set.