Xavier Boix

is this you? claim profile


  • Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images

    The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. Ullman et al. 2016 show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. We found many cases where DNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported lack of invariance to minor modifications in object location in DNNs. Our results thus reveal a new failure mode of DNNs that also affects humans to a much lesser degree. They expose how fragile DNN recognition ability is for natural images even without adversarial patterns being introduced. Bringing the robustness of DNNs in natural images to the human level remains an open challenge for the community.

    02/08/2019 ∙ by Sanjana Srivastava, et al. ∙ 8 share

    read it

  • Theory IIIb: Generalization in Deep Networks

    A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution of the puzzle for linear networks used in binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy and the exp-loss yields asymptotic, "slow" convergence to the maximum margin solution for linearly separable datasets, independently of the initial conditions. Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss. The result holds for exponential-type losses but not for the square loss. In particular, we prove that the weight matrix at each layer of a deep network converges to a minimum norm solution up to a scale factor (in the separable case). Our analysis of the dynamical system corresponding to gradient descent of a multilayer network suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss.

    06/29/2018 ∙ by Tomaso Poggio, et al. ∙ 2 share

    read it

  • Herding Generalizes Diverse M -Best Solutions

    We show that the algorithm to extract diverse M -solutions from a Conditional Random Field (called divMbest [1]) takes exactly the form of a Herding procedure [2], i.e. a deterministic dynamical system that produces a sequence of hypotheses that respect a set of observed moment constraints. This generalization enables us to invoke properties of Herding that show that divMbest enforces implausible constraints which may yield wrong assumptions for some problem settings. Our experiments in semantic segmentation demonstrate that seeing divMbest as an instance of Herding leads to better alternatives for the implausible constraints of divMbest.

    11/14/2016 ∙ by Ece Ozkan, et al. ∙ 0 share

    read it

  • Comment on "Ensemble Projection for Semi-supervised Image Classification"

    In a series of papers by Dai and colleagues [1,2], a feature map (or kernel) was introduced for semi- and unsupervised learning. This feature map is build from the output of an ensemble of classifiers trained without using the ground-truth class labels. In this critique, we analyze the latest version of this series of papers, which is called Ensemble Projections [2]. We show that the results reported in [2] were not well conducted, and that Ensemble Projections performs poorly for semi-supervised learning.

    08/29/2014 ∙ by Xavier Boix, et al. ∙ 0 share

    read it

  • SEEDS: Superpixels Extracted via Energy-Driven Sampling

    Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color ho- mogeneity. The optimization is accomplished by sophis- ticated methods that progressively build the superpix- els, typically by adding cuts or growing superpixels. As a result, they are computationally too expensive for real-time applications. We introduce a new approach based on a simple hill-climbing optimization. Starting from an initial superpixel partitioning, it continuously refines the superpixels by modifying the boundaries. We define a robust and fast to evaluate energy function, based on enforcing color similarity between the bound- aries and the superpixel color histogram. In a series of experiments, we show that we achieve an excellent com- promise between accuracy and efficiency. We are able to achieve a performance comparable to the state-of- the-art, but in real-time on a single Intel i7 CPU at 2.8GHz.

    09/16/2013 ∙ by Michael Van den Bergh, et al. ∙ 0 share

    read it

  • Random Binary Mappings for Kernel Learning and Efficient SVM

    Support Vector Machines (SVMs) are powerful learners that have led to state-of-the-art results in various computer vision problems. SVMs suffer from various drawbacks in terms of selecting the right kernel, which depends on the image descriptors, as well as computational and memory efficiency. This paper introduces a novel kernel, which serves such issues well. The kernel is learned by exploiting a large amount of low-complex, randomized binary mappings of the input feature. This leads to an efficient SVM, while also alleviating the task of kernel selection. We demonstrate the capabilities of our kernel on 6 standard vision benchmarks, in which we combine several common image descriptors, namely histograms (Flowers17 and Daimler), attribute-like descriptors (UCI, OSR, and a-VOC08), and Sparse Quantization (ImageNet). Results show that our kernel learning adapts well to the different descriptors types, achieving the performance of the kernels specifically tuned for each image descriptor, and with similar evaluation cost as efficient SVM methods.

    07/19/2013 ∙ by Gemma Roig, et al. ∙ 0 share

    read it

  • Theory of Deep Learning III: explaining the non-overfitting puzzle

    A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamical systems associated with gradient descent minimization of nonlinear networks behave near zero stable minima of the empirical error as gradient system in a quadratic potential with degenerate Hessian. The proposition is supported by theoretical and numerical results, under the assumption of stable minima of the gradient. Our proposition provides the extension to deep networks of key properties of gradient descent methods for linear networks, that as, suggested in (1), can be the key to understand generalization. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converging to the minimum norm solution. This implies that there is usually an optimum early stopping that avoids overfitting of the loss (this is relevant mainly for regression). For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution which guarantees good classification error for "low noise" datasets. The implied robustness to overparametrization has suggestive implications for the robustness of deep hierarchically local networks to variations of the architecture with respect to the curse of dimensionality.

    12/30/2017 ∙ by Tomaso Poggio, et al. ∙ 0 share

    read it