Yanbo Fan

is this you? claim profile


  • Learning with Average Top-k Loss

    In this work, we introduce the average top-k (AT_k) loss as a new ensemble loss for supervised learning, which is the average over the k largest individual losses over a training dataset. We show that the AT_k loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combines their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based method. We provide an intuitive interpretation of the AT_k loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT_k learning on the classification calibration of the AT_k loss and the error bounds of AT_k-SVM. We demonstrate the applicability of minimum average top-k learning for binary classification and regression using synthetic and real datasets.

    05/24/2017 ∙ by Yanbo Fan, et al. ∙ 0 share

    read it

  • Robust Localized Multi-view Subspace Clustering

    In multi-view clustering, different views may have different confidence levels when learning a consensus representation. Existing methods usually address this by assigning distinctive weights to different views. However, due to noisy nature of real-world applications, the confidence levels of samples in the same view may also vary. Thus considering a unified weight for a view may lead to suboptimal solutions. In this paper, we propose a novel localized multi-view subspace clustering model that considers the confidence levels of both views and samples. By assigning weight to each sample under each view properly, we can obtain a robust consensus representation via fusing the noiseless structures among views and samples. We further develop a regularizer on weight parameters based on the convex conjugacy theory, and samples weights are determined in an adaptive manner. An efficient iterative algorithm is developed with a convergence guarantee. Experimental results on four benchmarks demonstrate the correctness and effectiveness of the proposed model.

    05/22/2017 ∙ by Yanbo Fan, et al. ∙ 0 share

    read it

  • Self-Paced Learning: an Implicit Regularization Perspective

    Self-paced learning (SPL) mimics the cognitive mechanism of humans and animals that gradually learns from easy to hard samples. One key issue in SPL is to obtain better weighting strategy that is determined by minimizer function. Existing methods usually pursue this by artificially designing the explicit form of SPL regularizer. In this paper, we focus on the minimizer function, and study a group of new regularizer, named self-paced implicit regularizer that is deduced from robust loss function. Based on the convex conjugacy theory, the minimizer function for self-paced implicit regularizer can be directly learned from the latent loss function, while the analytic form of the regularizer can be even known. A general framework (named SPL-IR) for SPL is developed accordingly. We demonstrate that the learning procedure of SPL-IR is associated with latent robust loss functions, thus can provide some theoretical inspirations for its working mechanism. We further analyze the relation between SPL-IR and half-quadratic optimization. Finally, we implement SPL-IR to both supervised and unsupervised tasks, and experimental results corroborate our ideas and demonstrate the correctness and effectiveness of implicit regularizers.

    06/01/2016 ∙ by Yanbo Fan, et al. ∙ 0 share

    read it

  • Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning

    In existing visual representation learning tasks, deep convolutional neural networks (CNNs) are often trained on images annotated with single tags, such as ImageNet. However, a single tag cannot describe all important contents of one image, and some useful visual information may be wasted during training. In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of visual representation of the trained CNN model. To this end, we build a large-scale multi-label image database with 18M images and 11K categories, dubbed Tencent ML-Images. We efficiently train the ResNet-101 model with multi-label outputs on Tencent ML-Images, taking 90 hours for 60 epochs, based on a large-scale distributed deep learning framework,i.e.,TFplus. The good quality of the visual representation of the Tencent ML-Images checkpoint is verified through three transfer learning tasks, including single-label image classification on ImageNet and Caltech-256, object detection on PASCAL VOC 2007, and semantic segmentation on PASCAL VOC 2012. The Tencent ML-Images database, the checkpoints of ResNet-101, and all the training codehave been released at https://github.com/Tencent/tencent-ml-images. It is expected to promote other vision tasks in the research and industry community.

    01/07/2019 ∙ by Baoyuan Wu, et al. ∙ 0 share

    read it

  • Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

    In this work, we study the robustness of a CNN+RNN based image captioning system being subjected to adversarial noises. We propose to fool an image captioning system to generate some targeted partial captions for an image polluted by adversarial noises, even the targeted captions are totally irrelevant to the image content. A partial caption indicates that the words at some locations in this caption are observed, while words at other locations are not restricted.It is the first work to study exact adversarial attacks of targeted partial captions. Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables. Both the generalized expectation maximization algorithm and structural SVMs with latent variables are then adopted to optimize the problem. The proposed methods generate very successful at-tacks to three popular CNN+RNN based image captioning models. Furthermore, the proposed attack methods are used to understand the inner mechanism of image captioning systems, providing the guidance to further improve automatic image captioning systems towards human captioning.

    05/10/2019 ∙ by Yan Xu, et al. ∙ 0 share

    read it