1 Introduction
In statistical learning theory, regularization techniques are typically leveraged to achieve a tradeoff between empirical error minimization and the control of model complexity [26]. In contrast to classical convex empirical risk minimization, where regularization can rule out trivial solutions, regularization plays a rather different role in deep learning due to its highly non-convex optimization landscape [29]. In this paper, we first review two effective and influential branches of regularization for deep neural networks that elegantly generalize from supervised learning to the semi-supervised setting.
Adversarial Training [5, 16] can provide an additional regularization beyond that provided by other generic regularization strategies, such as dropout, pretraining and model averaging. However, recent works [31, 25] demonstrated that this kind of training method entails a tradeoff between robustness and accuracy, limiting the efficacy of the adversarial regularization. In addition, Virtual Adversarial Training (VAT) [19] can be regarded as a natural extension of adversarial training to the semi-supervised setting that requires no label information, imposing local smoothness on the classifier. This strategy has achieved great success in image classification [19], text classification [17] as well as node classification [22]. Tangent-Normal Adversarial Regularization (TNAR) [28] extended VAT by taking the data manifold into consideration, applying VAT along the tangent space and the orthogonal normal space of the data manifold and outperforming other state-of-the-art semi-supervised approaches. MixUp [30] augmented the training data by incorporating the prior knowledge that linear interpolation of input vectors should lead to linear interpolation of the associated targets, accomplishing consistent improvements of generalization on image, speech and tabular data. MixMatch [1] extended MixUp to semi-supervised tasks by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. In contrast with VAT, MixMatch [1] utilizes one specific form of consistency regularization, i.e., standard data augmentation for images, such as random horizontal flips and crops, rather than computing adversarial perturbations to smooth the posterior distribution of the classifier.

Nevertheless, most methods for the design of regularization, including the aforementioned approaches, assume that the training samples are drawn independently and identically from an unknown data-generating distribution. For instance, Support Vector Machines (SVM), Back-Propagation (BP) for neural networks, and many other common algorithms implicitly make this assumption as part of their derivation. However, this i.i.d. assumption is commonly violated in realistic scenarios where batches or subgroups of training samples are likely to have internal correlations. In particular, Dundar et al.
[4] demonstrated that accounting for the correlations in real-world training data leads to statistically significant improvements in accuracy. Similarly, Peer-Regularized Networks (PeerNet) [23] applied graph convolutions [9, 27] to harness information from a graph of peer samples so as to improve the adversarial robustness of deep neural networks. The resulting non-local propagation in the new model acts as a strong regularization that dramatically reduces the vulnerability against adversarial attacks. Inspired by these ideas, we aim to design a general regularization strategy that fully utilizes the internal relationships between samples by explicitly constructing a graph within a batch, in order to further improve generalization in both supervised and semi-supervised settings.

In this paper, we propose Patch-level Neighborhood Interpolation (Pani) for deep neural networks, serving as a simple yet effective regularization to improve the generalization of classifiers. We first construct a graph in each batch during mini-batch stochastic gradient descent training, according to the correlation between patch-level features in different layers of the network rather than between samples directly. The constructed graph is expected to capture the relationships of patch features in both the input and hidden layers. Then we apply linear interpolation on the neighbors of the current patch element to refine its representation by additionally leveraging the neighborhood information. Furthermore, we customize our neighborhood interpolation method into virtual adversarial regularization and MixUp regularization respectively, resulting in Pani VAT and Pani MixUp.
For Pani VAT, we reformulate the construction of the adversarial perturbation, transforming it from solely depending on the current sample to a linear combination of neighboring patch features. The resulting adversarial perturbation can leverage the information of neighboring features for all samples within a batch, thus providing more informative adversarial smoothness in the semi-supervised setting. Similarly, in Pani MixUp, we extend MixUp from the image level to the patch level by imposing random interpolation between patches in a neighborhood to better leverage more fine-grained supervised signal. We conduct extensive experiments to demonstrate that both of the derived regularization strategies can outperform other state-of-the-art approaches in supervised and semi-supervised tasks.
Our contributions can be summarized as follows:

To the best of our knowledge, we are the first to propose a general regularization method that explicitly constructs a patch-level graph, focusing on leveraging the correlations between samples in order to improve generalization.

The resulting Patch-level Neighborhood Interpolation provides a framework that extends the current main branches of regularization, i.e., adversarial regularization and MixUp, achieving state-of-the-art performance in both supervised and semi-supervised settings.

Patch-level Neighborhood Interpolation paves the way toward better leveraging neighborhood information in the design of machine learning modules.
2 Preliminary
2.1 Virtual Adversarial Training
VAT [19] extends adversarial training by utilizing "virtual" adversarial perturbations to construct adversarial smoothness, obtaining effective improvements in accuracy in semi-supervised learning (SSL). In particular, VAT replaces the true labels y of samples in the formulation of adversarial training by the current estimate p(y \mid x, \hat{\theta}) from the model:

\mathcal{L}_{\mathrm{vat}}(x, \theta) = D\big[p(y \mid x, \hat{\theta}),\; p(y \mid x + r_{\mathrm{adv}}, \theta)\big] \quad (1)

where D[p, q] measures the divergence between two distributions p and q, and r_{\mathrm{adv}} is the adversarial perturbation depending on the current sample that can further provide smoothness in SSL. The VAT regularization can then be derived from the inner maximization:
r_{\mathrm{adv}} = \arg\max_{\|r\|_2 \le \epsilon} D\big[p(y \mid x, \hat{\theta}),\; p(y \mid x + r, \hat{\theta})\big] \quad (2)
One elegant part of VAT is that it utilizes the second-order Taylor expansion of the virtual adversarial loss to compute the perturbation r_{\mathrm{adv}}, which can be evaluated efficiently by power iteration with the finite-difference approximation. Once the desired perturbation r_{\mathrm{adv}} has been computed, we conduct forward and back propagation to optimize the full loss function:
\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \alpha\, \mathcal{L}_{\mathrm{vat}} \quad (3)
where \mathcal{L}_{\mathrm{sup}} is the original supervised loss and \alpha is the hyperparameter controlling the degree of virtual adversarial smoothness.
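The power-iteration step above can be illustrated with a small numerical sketch. The toy softmax-linear classifier, the step sizes `xi` and `h`, and all function names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # D_KL[p || q] for discrete distributions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def vat_perturbation(x, W, eps=2.0, xi=1e-2, h=1e-4, n_iter=1):
    """Approximate r_adv by power iteration on the Hessian of the virtual
    adversarial loss, using central finite differences; the toy classifier
    p(y|x) = softmax(W x) stands in for the network."""
    p = softmax(W @ x)
    rng = np.random.default_rng(0)
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        # central-difference gradient of the divergence at r = xi * d
        g = np.zeros_like(d)
        for j in range(x.size):
            e_j = np.zeros_like(x)
            e_j[j] = h
            g[j] = (kl(p, softmax(W @ (x + xi * d + e_j)))
                    - kl(p, softmax(W @ (x + xi * d - e_j)))) / (2 * h)
        d = g / (np.linalg.norm(g) + 1e-12)   # normalized dominant direction
    return eps * d                            # scale to the epsilon ball
```

With `n_iter = 1` this mimics the single power-iteration step typically used in practice; the returned vector always has norm `eps`.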
2.2 MixUp
MixUp [30] augments the training data with linear interpolation on both input features and targets. The resulting feature-target vectors are as follows:

\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j \quad (4)

where (x_i, y_i) and (x_j, y_j) are two feature-target vectors drawn at random from the training data, \lambda \sim \mathrm{Beta}(\alpha, \alpha) and \lambda \in [0, 1]. MixUp can be understood as a form of data augmentation that encourages decision boundaries to transit linearly between classes. It is a kind of generic regularization that provides a smoother estimate of uncertainty, yielding improvements in generalization.
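As a concrete illustration, a minimal MixUp step on one pair of feature-target vectors might look as follows (the function name and NumPy setup are our own, not from the paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Eq. (4): draw lam ~ Beta(alpha, alpha) and return the convex
    combination of the two feature vectors and their one-hot targets."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam
```

Because lam ∈ [0, 1], the mixed target remains a valid probability vector whenever y1 and y2 are.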
2.3 Peer-Regularized Networks (PeerNet)
The centerpiece of PeerNet [23] is the learnable Peer Regularization (PR) layer, designed to improve the adversarial robustness of deep neural networks. The PR layer can be flexibly applied to the feature maps of deep models.
Let X^1, \dots, X^N \in \mathbb{R}^{n \times d} be the feature maps of N images, where n is the number of pixels and d is the dimension of each pixel. The core of PeerNet is to find the K nearest neighboring pixels for each pixel among all the pixels of the peer images by constructing a nearest-neighbor graph in the d-dimensional space. In particular, for the i-th pixel x_i^n of the n-th image X^n, the k-th nearest pixel neighbor is denoted as z_{ik}^n = x_{j_k}^{q_k}, taken from the j_k-th pixel of the peer image X^{q_k}. The learnable PR layer is then constructed by a variant of Graph Attention Networks (GAT) [27]:

\tilde{x}_i^n = \sum_{k=1}^{K} \alpha\big(x_i^n, x_{j_k}^{q_k}\big)\, x_{j_k}^{q_k} \quad (5)

where \alpha\big(x_i^n, x_{j_k}^{q_k}\big) is the attention score determining the importance of the j_k-th pixel of the q_k-th peer image for the representation of the current i-th pixel of the n-th image. The resulting learnable PR layer therefore involves non-local filtering that leverages the wisdom of pixel neighbors from peer images, yielding robustness against adversarial attacks.
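A stripped-down sketch of the PR layer's attention step for a single pixel is given below. The GAT-style scoring function (a LeakyReLU of a learnable vector applied to concatenated features) follows [27], but the exact parameterization here is an illustrative assumption:

```python
import numpy as np

def leaky_relu(v, slope=0.2):
    return np.where(v > 0, v, slope * v)

def pr_layer(x_i, neighbors, a):
    """Eq. (5) sketch: attention-weighted combination of the K nearest peer
    pixels. x_i: (d,), neighbors: (K, d), a: (2d,) attention vector."""
    scores = leaky_relu(np.array([a @ np.concatenate([x_i, z]) for z in neighbors]))
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()          # softmax over the K neighbors
    return alpha @ neighbors      # convex combination of neighbor features
```

Since the attention weights form a softmax, the output always lies in the convex hull of the neighbor features.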
3 Patch-level Neighborhood Interpolation
Inspired by the pixel-level nearest-neighbor graph in PeerNet, we propose a more general patch-level regularization that can easily scale from a single pixel to the whole image by adjusting the patch size. For instance, when we set the patch size to 1, we in fact construct a graph based on the features of each pixel, which is the same as the graph construction in PeerNet. Another flexible part of our method is that we can choose an arbitrary layer in a deep neural network, including the input layer and hidden layers. In different layers, the patch size can be chosen according to the size of the receptive field in order to capture different semantic information.
Concretely, for our Patch-level Neighborhood Interpolation (Pani) shown in Figure 1, in the first step we apply a filtering operation over the whole images in a batch to determine a candidate set of peer images for each image; after filtering, the candidate set C_i is established for the i-th image. The filtering can be performed by retrieving the semantically nearest peer images or by random matching. In the meantime, we construct the set of all patches in the candidate peer images by applying a special convolution that extracts the patch at each location of an input or feature map.
Following the establishment of the patch set, we construct a nearest-neighbor graph based on the cosine distance of patch features in order to find the K neighbors of each patch of the i-th image with respect to its candidate set C_i. Mathematically, following the definition in PeerNet, let x_i^p be the p-th patch of the input or feature map for the i-th image within one batch. Then denote the k-th nearest patch neighbor of x_i^p as x_{i_k}^{p_k}, taken from the p_k-th patch of the peer image i_k in the candidate set C_i.
Next, in order to leverage the knowledge from neighbors, and in contrast to the graph attention mechanism in PeerNet, we apply a more straightforward linear interpolation on the neighboring patches of the current patch x_i^p. The general formulation of our Patch-level Neighborhood Interpolation can then be presented as follows:

\tilde{x}_i^p = x_i^p + \sum_{k=1}^{K} W_k^{(i,p)}\, x_{i_k}^{p_k} \quad (6)

where W_k^{(i,p)} is the combination coefficient for the p-th patch of the i-th image w.r.t. its k-th patch neighbor. The choice of linear interpolation is natural and simple yet effective, as shown in our experiments. Additionally, Eq. 6 enjoys a great computational advantage compared with the expensive cost of GAT in PeerNet. Finally, after the deconvolution on all the patches with new features, we obtain the refined representation \tilde{x}_i for the i-th image.
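The full pipeline of Eq. 6 — patch extraction, cosine nearest-neighbor search over peer patches, interpolation, and folding back — can be sketched as follows. Non-overlapping patches, single-channel maps, and a uniform coefficient `w` shared by all K neighbors are simplifying assumptions for illustration:

```python
import numpy as np

def extract_patches(img, p):
    """Split an (H, W) map into non-overlapping p x p patches, one per row."""
    H, W = img.shape
    patches = img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)

def fold_patches(patches, H, W, p):
    """Inverse of extract_patches (the 'deconvolution' step)."""
    g = patches.reshape(H // p, W // p, p, p).transpose(0, 2, 1, 3)
    return g.reshape(H, W)

def pani_interpolate(img, peers, p=2, K=2, w=0.1):
    """Eq. (6) sketch: refine each patch of img by adding a weighted sum of
    its K cosine-nearest patches taken from the peer images."""
    P = extract_patches(img, p)
    Q = np.concatenate([extract_patches(pe, p) for pe in peers], axis=0)
    Pn = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-12)
    sim = Pn @ Qn.T                         # cosine similarity to peer patches
    idx = np.argsort(-sim, axis=1)[:, :K]   # indices of K nearest neighbors
    refined = P + w * Q[idx].sum(axis=1)    # x + sum_k W_k * neighbor_k, W_k = w
    return fold_patches(refined, img.shape[0], img.shape[1], p)
```

Setting `w = 0` recovers the original image exactly, which makes the extract/fold pair easy to verify.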
Note that our proposed method can explicitly combine the advantages of manifold regularization and non-local filtering in a flexible way, as elaborated in the following.
Manifold Regularization
There is a flurry of papers introducing regularization from classical manifold learning, based on the assumption that the data can be modeled as a low-dimensional manifold in the data space. More importantly, Hinton et al. [6] and Ioffe et al. [7] demonstrated that regularizers that work well in the input space can also be applied to the hidden layers of a deep network, which can further improve generalization performance. Our Patch-level Neighborhood Interpolation can easily be extended from the input to the hidden layers, enjoying the benefits of manifold regularization.
Non-local Filtering
Non-local filters have achieved great success in the image processing field by additionally encoding the knowledge of neighboring pixels and their relative locations. Like the pixel-level neighboring correlations established in PeerNet [23], our patch-level approach can still capture the knowledge of other neighboring patches within a batch, yielding improved performance for the derived methods in various kinds of settings. Moreover, our Patch-level Neighborhood Interpolation can also serve as a novel non-i.i.d. regularization and can reasonably generalize to broader settings, especially when natural correlations exist within a subgroup.
Now we customize our Patch-level Neighborhood Interpolation into adversarial regularization and MixUp, which significantly boosts their performance.
3.1 Pani VAT
Based on our patch-level framework, we can construct a novel Pani VAT that utilizes the interpolation of patch neighbors for each sample to manipulate the "neighboring" perturbations, thus providing more informative adversarial smoothness in the semi-supervised setting. Combining Eq. 2 and Eq. 6, we reformulate our Pani VAT with perturbations on S layers of a deep neural network as follows:

\max_{\{W^{(l)}\}_{l=1}^{S}} D\Big[f(x, \hat{\theta}),\; f\big(\tilde{x}^{(1)}, \dots, \tilde{x}^{(S)}, \theta\big)\Big], \quad \text{s.t.}\ \Big(\sum_{l=1}^{S} \gamma_l \big\|W^{(l)}\big\|_2^2\Big)^{1/2} \le \epsilon \quad (7)

where f represents the classifier and x^{(l)} denotes the input or a hidden feature of the input x. W^{(l)} indicates the perturbation coefficients in the l-th considered layer of the network. In particular, when S = 1 the perturbations are imposed only on the input features, which is similar to traditional (virtual) adversarial perturbations. \tilde{x}^{(l)} represents the feature map perturbed in the way shown in Eq. 6. \gamma_l adjusts the importance of perturbations in different layers, with the overall perturbation restrained in an \epsilon-ball.
Next, we can still utilize the power iteration and finite-difference approximation proposed in VAT [18] to compute the desired perturbation coefficients. The resulting full loss function is defined as:

\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \alpha\, \mathcal{L}_{\mathrm{vat}} \quad (8)

where \mathcal{L}_{\mathrm{vat}} is attained after solving the optimization problem in Eq. 7.
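The joint ε-ball constraint across layers can be enforced by a simple rescaling after each update; the following sketch assumes a γ-weighted L2 form of the constraint and is not taken from the paper's code:

```python
import numpy as np

def project_perturbations(perturbs, gammas, eps):
    """Rescale the per-layer coefficient arrays W^(l) so that the
    gamma-weighted overall L2 norm equals eps (Eq. 7 constraint, sketched)."""
    total = np.sqrt(sum(g * np.sum(r ** 2) for g, r in zip(gammas, perturbs)))
    scale = eps / (total + 1e-12)
    return [scale * r for r in perturbs]
```

A larger γ_l shrinks the budget available to layer l, which matches its role as a per-layer importance adjustment.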
Procedure
For the specific instantiation of our framework exhibited in Figure 1 for the derived Pani VAT method, we present the procedure in the following:

Firstly, in the filtering process we construct the nearest-neighbor graph on the images based on the cosine distance of their second-to-last features through the classifier. Then construct the patch set through the convolution operation defined in a standard way.

Secondly, for the feature map on each considered layer, we find the K nearest patch neighbors for each patch of each image among all the patches from the peer images, i.e., the candidate set C_i.

Conduct interpolation in the way shown in Eq. 6 as the non-local forward propagation.
Remark. As shown in the adversarial part of Figure 1, the rationale of our Pani VAT method lies in the fact that the constructed perturbations can entail more non-local information coming from the neighbors of the current sample. Through the delicate patch-level interpolation among the neighbors of each patch, the resulting virtual adversarial perturbations are expected to construct more informative directions of smoothness, thus enhancing the performance of the classifier in the semi-supervised setting.
3.2 Pani MixUp
To derive a fine-grained MixUp, we apply the patch-based neighborhood method from our framework. The core formulation of Pani MixUp (PMU) can be formulated as:

\tilde{x}_i^p = \lambda\, x_i^p + \sum_{k=1}^{K} W_k^{(i,p)}\, x_{i_k}^{p_k}, \quad \text{s.t.}\ \lambda + \sum_{k=1}^{K} W_k^{(i,p)} = 1, \quad 0 \le W_k^{(i,p)} \le 1 \quad (9)

where p \in \{1, \dots, n\}, n denotes the number of patches after the filtering operation for each image, and \lambda represents the importance of the current element, such as an image or a patch, while conducting MixUp. It should be noted that, due to the asymmetric property of the interpolation coefficients in our framework, both the neighbor coefficients and \lambda could be tuned in our experiments. For simplicity, we fix the former and only consider \lambda as the hyperparameter, paying more attention to the importance of the current patch, which is inspired by a similar approach in MixMatch [1]. The first constraint in Eq. 9 can be achieved through normalization according to the ratio of \lambda for the current element to the weights of all the neighbors. Considering the physical meaning of MixUp, we impose the extra convex combination constraint, i.e., the second restriction in Eq. 9. Then the mixed patch-target vectors in the Pani MixUp method can be presented as:

\tilde{x}_i^p = \lambda\, x_i^p + \sum_{k=1}^{K} W_k^{(i,p)}\, x_{i_k}^{p_k}, \qquad \tilde{y}_i = \bar{\lambda}\, y_i + \sum_{k=1}^{K} \bar{W}_k^{(i)}\, y_{i_k} \quad (10)

where \bar{\lambda} and \bar{W}_k^{(i)} denote the interpolation coefficients averaged over all patches of the i-th image.
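A minimal sketch of how convex coefficients can be drawn under these constraints — giving the current patch a relative weight λ and normalizing — is shown below (the uniform random neighbor weights are an illustrative assumption):

```python
import numpy as np

def pmu_coefficients(K, lam=2.0, rng=None):
    """Eq. (9) constraints, sketched: the current patch gets relative weight
    lam, each of the K neighbors a random weight in [0, 1), and the vector is
    normalized so that all coefficients are non-negative and sum to one."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.concatenate([[lam], rng.random(K)])
    return w / w.sum()
```

The first entry is the self-coefficient; with lam > 1 it always dominates the neighbor weights, reflecting the emphasis on the current patch.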
Procedure
The Pani MixUp applies the following procedure:

Construct the candidate set C_i for the i-th image by random matching among all the images within one batch, and then construct the patch set through the convolution operation mentioned before.

For the feature map on each target layer, find the K nearest patch neighbors for each patch of each image among all the patches from the candidate set C_i.

Conduct MixUp in the way shown in Eq. 10 among the neighbors of each patch, over all the patches and their targets.

Conduct the deconvolution operation on the patch set to obtain the new representation of the original input with the corresponding mixed target, and optimize the parameters of the classifier on the resulting data representation.
Remark. Different from the role of the coefficients in the aforementioned Pani VAT, where they serve as "combinational" perturbations, in our Pani MixUp approach their physical meaning is that of linear interpolation coefficients for conducting MixUp. Nevertheless, both customizations can be derived from one framework, namely our Patch-level Neighborhood Interpolation.
4 Experiments
To demonstrate the superiority of our Patch-level Neighborhood Interpolation, we conduct extensive experiments for both Pani VAT and Pani MixUp in semi-supervised and supervised settings, respectively.
4.1 Pani VAT
Implementation Details
For fair comparison, especially with VAT and its variants such as VAT + SNTG [15] and TNAR [28], we choose the standard large convolutional network as the classifier, as in [19]. For the dataset, we focus on the standard semi-supervised setting on CIFAR-10 with 4,000 labeled data. Unless otherwise noted, all the experimental settings in our method are identical to those in vanilla VAT [19]. In particular, we conduct Pani VAT on the input layer and one additional hidden layer, yielding two variants: Pani VAT (input) and Pani VAT (+hidden). In Pani VAT (input), we choose the patch size as 2, use peer images as the candidate set for each image to construct the patch nearest-neighbor graph, and set the perturbation size \epsilon and the adjustment coefficient \gamma to 2.0 and 1.0, respectively. For Pani VAT (+hidden), we let S = 2 with patch size 2 and restrict the overall perturbation size \epsilon. On the two considered layers, we set K to 10 and 50 and the adjustment coefficients to 1 and 0.5, respectively.
Method | CIFAR-10 (4,000 labels)
VAT [18] |
VAT + SNTG [15] |
Π Model [11] |
Mean Teacher [24] |
CCLP [8] |
ALI [3] |
Improved GAN [21] |
Triple GAN [14] |
Bad GAN [2] |
LGAN [20] |
Improved GAN + JacobRegu + tangent [10] |
Improved GAN + ManiReg [12] |
TNAR [28] |
Pani VAT (input) | 12.20
Pani VAT (+hidden) |
Our Results
Table 1 presents the state-of-the-art performance achieved by Pani VAT (+hidden) compared with other baselines on CIFAR-10. We focus on baseline methods along the direction of VAT variants and take the results from the TNAR method [28], the previous state-of-the-art variant of VAT, which additionally leverages the data manifold to decompose the directions of virtual adversarial smoothness. It is worth remarking that the performance of the relevant GAN-based approaches in Table 1, such as Localized GAN (LGAN) [20] as well as TNAR, mainly relies on modeling the data manifold with a generative model. By contrast, our approach does not depend on this requirement and can still outperform these baselines. In addition, Pani VAT (+hidden) achieves a slight improvement over Pani VAT (input), verifying the benefit of the manifold regularization discussed in our framework. Although Pani VAT (input), serving as an ablation study, obtains performance comparable with TNAR, it still outperforms the other baselines without additionally modeling the data manifold.
Dataset | Model | Aug | ERM | MixUp | Ours (input)
CIFAR-10 | PreAct ResNet-18 | ✓ | 5.43 ± 0.16 | 4.24 ± 0.16 | 3.93 ± 0.12
CIFAR-10 | PreAct ResNet-18 | ✗ | 12.81 ± 0.46 | 9.88 ± 0.25 | 8.12 ± 0.09
CIFAR-10 | PreAct ResNet-34 | ✓ | 5.15 ± 0.12 | 3.72 ± 0.20 | 3.36 ± 0.15
CIFAR-10 | PreAct ResNet-34 | ✗ | 12.67 ± 0.26 | 10.60 ± 0.57 | 8.13 ± 0.32
CIFAR-10 | WideResNet-28-10 | ✓ | 4.59 ± 0.06 | 3.21 ± 0.13 | 3.02 ± 0.11
CIFAR-10 | WideResNet-28-10 | ✗ | 8.78 ± 0.20 | 8.08 ± 0.39 | 5.79 ± 0.03
CIFAR-100 | PreAct ResNet-18 | ✓ | 24.96 ± 0.51 | 22.15 ± 0.72 | 20.90 ± 0.21
CIFAR-100 | PreAct ResNet-18 | ✗ | 39.64 ± 0.65 | 41.96 ± 0.27 | 32.03 ± 0.34
CIFAR-100 | PreAct ResNet-34 | ✓ | 24.85 ± 0.14 | 21.49 ± 0.68 | 19.46 ± 0.29
CIFAR-100 | PreAct ResNet-34 | ✗ | 39.41 ± 0.80 | 41.96 ± 0.24 | 34.48 ± 0.86
CIFAR-100 | WideResNet-28-10 | ✓ | 21.00 ± 0.09 | 18.58 ± 0.16 | 17.39 ± 0.16
CIFAR-100 | WideResNet-28-10 | ✗ | 31.91 ± 0.77 | 35.16 ± 0.33 | 27.71 ± 0.63
Analysis of Computation Cost
Another noticeable advantage of our approach is the negligible increase in computation cost compared with vanilla VAT. In particular, one crucial operation in our approach is the construction of the patch set, which can be accomplished efficiently by the convolution operation; the restoration of images from the constructed patches can similarly be achieved by the corresponding deconvolution. Additionally, the indices of the nearest-neighbor graph can be efficiently attained through the top-k operation in TensorFlow or PyTorch. We conduct a further sensitivity analysis of the computational cost of our method with respect to the other parameters, i.e., the number of peer images in the filtering process, the number of patch neighbors K, the number of layers S imposed with "neighboring" perturbations, and the patch size. As shown in Figure 2, the variation of all parameters except the number of perturbed layers has negligible impact on the training time per epoch compared with vanilla VAT. The computational cost exhibits an almost linear tendency with the number of perturbed layers, as the amount of floating-point computation is proportional to the number of perturbation elements, if we temporarily neglect the difference in back-propagation time across layers. Combining the results from Table 1 and Figure 2, we argue that better performance can be expected if we construct perturbations on more hidden layers, at the cost of more computation.

4.2 Pani MixUp
Implementation Details
The experimental settings in this section strictly follow those in vanilla MixUp [30] and vanilla MixMatch [1] to ensure fair comparison. We conduct supervised image classification on the CIFAR-10 and CIFAR-100 datasets to further evaluate the generalization performance of Pani MixUp. In particular, we compare ERM (Empirical Risk Minimization, i.e., normal training), MixUp training and our approach for different neural architectures: PreAct ResNet-18, PreAct ResNet-34 and WideResNet-28-10. For fair comparison with input MixUp, we conduct our approach only on the input layer; better performance can naturally be expected if more layers are considered. More specifically, in Pani MixUp for all neural architectures, we uniformly choose patch size 16 and the parameter \alpha of the Beta distribution as 2.0 in the data-augmentation setting, while we opt for patch size 8 in the settings without data augmentation.
Mask Mechanism
To extend the flexibility of Pani MixUp, we additionally introduce a mask mechanism on the interpolation coefficients that randomly drops them with a certain ratio. The mask mechanism can be viewed as dropout or as enforcing sparsity, which helps to discard redundant information while conducting patch-level MixUp. We set the mask ratio to 0.6 in the data-augmentation setting and fix it to 0.4 in the scenario without data augmentation.
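A sketch of this mask operation on a coefficient array (Bernoulli dropping with the given ratio; the names are ours, not the paper's):

```python
import numpy as np

def mask_coefficients(W, ratio, rng=None):
    """Zero out roughly a `ratio` fraction of the interpolation coefficients,
    enforcing sparsity as in the described mask mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    return W * (rng.random(W.shape) >= ratio)
```

With ratio 0 the coefficients pass through unchanged; with ratio 1 they are all dropped.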
Our Results
Table 2 presents the consistent superiority of Pani MixUp over ERM (normal training) as well as vanilla MixUp across different deep neural network architectures. It is worth noting that the superiority of our approach is more pronounced in the setting without data augmentation than with it. Another interesting phenomenon is that MixUp suffers from a kind of collapse for some deep neural networks without data augmentation: its performance is even inferior to ERM on CIFAR-100 in this case. By contrast, our approach exhibits a consistent advantage across all settings.
Analysis of Computation Cost
To provide a comprehensive understanding of the computation cost of our method, we plot training time over 200 epochs against test accuracy, as shown in Figure 3, which illustrates both the computational efficiency and the better performance of our approach. To be more specific, we choose ResNet-18 as the base model and track the variation of test accuracy during training to compare the efficacy of the different approaches. We can easily observe the consistent performance advantage of our approach with comparable training time for the same number of epochs. The fourth subplot of Figure 3 also reveals the process of the "collapse" phenomenon: after the learning-rate decay completes around the 50th epoch, the performance of MixUp surprisingly drops steadily to a final result that is even inferior to plain ERM. By contrast, our method achieves consistent improvement in generalization without any "collapse" issue.
Further Extension to the MixMatch
To further demonstrate the superiority of our approach, we embed it into MixMatch [1], the current state-of-the-art approach that naturally extends MixUp to the semi-supervised setting. The resulting approach, Pani MixMatch, simply replaces the MixUp part of MixMatch with our Pani MixUp, thus imposing patch-neighbor MixUp by additionally incorporating patch neighborhood information. The results in Table 3 demonstrate that Pani MixMatch can further improve the performance of MixMatch in the standard semi-supervised setting, verifying the effectiveness and flexibility of our Patch-level Neighborhood Interpolation.
Methods | CIFAR-10
Π-Model [11] |
Pseudo-Label [13] |
MixUp [30] |
VAT [18] |
Mean Teacher [24] |
MixMatch [1] |
MixMatch (ours) |
Pani MixMatch |
5 Discussion
The recent tendency in the design of regularization attaches more importance to consistency and flexibility across various kinds of settings. For instance, Virtual Adversarial Training is a natural extension of Adversarial Training to the semi-supervised setting, constructed via virtual adversarial smoothness. MixMatch unified the dominant approaches related to MixUp and achieved remarkable performance in the semi-supervised scenario by simultaneously considering the MixUp operation on both labeled and unlabeled data. Along this line, we focus on a general regularization motivated by the additional leverage of neighboring information within a subgroup of samples, e.g., within one batch, which elegantly extends previous prestigious regularization approaches and generalizes well over both supervised and semi-supervised settings.
6 Conclusion
In this paper, we first analyze the benefit of leveraging non-i.i.d. information while developing more efficient regularization for deep neural networks, and propose a general and flexible patch-neighbor regularizer called Patch-level Neighborhood Interpolation that interpolates neighborhood representations. Furthermore, we customize our Patch-level Neighborhood Interpolation into VAT and MixUp, respectively. Extensive experiments have verified the effectiveness of the two derived approaches, demonstrating the benefit of our Patch-level Neighborhood Interpolation. Our work paves the way toward better understanding and leveraging the relationships between samples to design better regularization and improve generalization over a wide range of settings.
Since the proposed Pani framework is general and flexible, more applications could be considered in the future, such as adversarial training for improving model robustness, as well as natural language processing tasks. The theoretical properties of Pani also deserve further analysis.
References
 [1] David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. Mixmatch: A holistic approach to semi-supervised learning. Conference on Neural Information Processing Systems, 2019.
 [2] Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Ruslan R Salakhutdinov. Good semi-supervised learning that requires a bad GAN. In Advances in Neural Information Processing Systems, pages 6510–6520, 2017.
 [3] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
 [4] Murat Dundar, Balaji Krishnapuram, Jinbo Bi, and R Bharat Rao. Learning classifiers when the training data is not IID. In IJCAI, pages 756–761, 2007.
 [5] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2014.
 [6] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
 [7] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
 [8] Konstantinos Kamnitsas, Daniel C Castro, Loic Le Folgoc, Ian Walker, Ryutaro Tanno, Daniel Rueckert, Ben Glocker, Antonio Criminisi, and Aditya Nori. Semi-supervised learning via compact latent space clustering. arXiv preprint arXiv:1806.02679, 2018.
 [9] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations, 2016.
 [10] Abhishek Kumar, Prasanna Sattigeri, and Tom Fletcher. Semi-supervised learning with GANs: Manifold invariance with improved inference. In Advances in Neural Information Processing Systems, pages 5540–5550, 2017.
 [11] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
 [12] Bruno Lecouat, Chuan-Sheng Foo, Houssam Zenati, and Vijay R Chandrasekhar. Semi-supervised learning with GANs: Revisiting manifold regularization. arXiv preprint arXiv:1805.08957, 2018.
 [13] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013.
 [14] Chongxuan Li, Kun Xu, Jun Zhu, and Bo Zhang. Triple generative adversarial nets. arXiv preprint arXiv:1703.02291, 2017.
 [15] Yucen Luo, Jun Zhu, Mengxi Li, Yong Ren, and Bo Zhang. Smooth neighbors on teacher graphs for semi-supervised learning. arXiv preprint arXiv:1711.00258, 2017.
 [16] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2017.
 [17] Takeru Miyato, Andrew M Dai, and Ian Goodfellow. Adversarial training methods for semi-supervised text classification. International Conference on Learning Representations, 2016.
 [18] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. arXiv preprint arXiv:1704.03976, 2017.
 [19] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.

 [20] Guo-Jun Qi, Liheng Zhang, Hao Hu, Marzieh Edraki, Jingdong Wang, and Xian-Sheng Hua. Global versus localized generative adversarial nets. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [21] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
 [22] Ke Sun, Zhouchen Lin, Hantao Guo, and Zhanxing Zhu. Virtual adversarial training on graph convolutional networks in node classification. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 431–443. Springer, 2019.
 [23] Jan Svoboda, Jonathan Masci, Federico Monti, Michael M Bronstein, and Leonidas Guibas. PeerNets: Exploiting peer wisdom against adversarial attacks. International Conference on Learning Representations, 2018.
 [24] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weightaveraged consistency targets improve semisupervised deep learning results. In Advances in neural information processing systems, pages 1195–1204, 2017.

 [25] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. International Conference on Learning Representations, 2018.
 [26] Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity, pages 11–30. Springer, 2015.
 [27] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. International Conference on Learning Representations, 2017.
 [28] Bing Yu, Jingfeng Wu, Jinwen Ma, and Zhanxing Zhu. Tangent-normal adversarial regularization for semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10676–10684, 2019.
 [29] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. International Conference on Learning Representations, 2016.
 [30] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. Conference on Neural Information Processing Systems, 2017.
 [31] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled tradeoff between robustness and accuracy. International Conference on Machine Learning, 2019.