
Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable to humans. Ullman et al. (2016) show that a slight modification of the location and size of the visible region of a minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a phenomenon common to humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. We found many cases where DNNs classified one region correctly and the other incorrectly, even though the two regions differed by only one row or column of pixels and were often larger than the average human minimal image size. We show that this phenomenon is independent of previously reported failures of invariance to minor changes in object location in DNNs. Our results thus reveal a new failure mode of DNNs that also affects humans, but to a much lesser degree. They expose how fragile DNN recognition of natural images is, even without adversarial patterns being introduced. Bringing the robustness of DNNs on natural images up to the human level remains an open challenge for the community.
02/08/2019 ∙ by Sanjana Srivastava, et al.

Theory IIIb: Generalization in Deep Networks
A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when the number of neurons or of iterations of gradient descent increases. This is surprising given the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution to the puzzle for linear networks used in binary classification. They prove that minimization of loss functions such as the logistic, cross-entropy, and exp-loss yields asymptotic, "slow" convergence to the maximum-margin solution for linearly separable datasets, independently of the initial conditions. Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss. The result holds for exponential-type losses but not for the square loss. In particular, we prove that the weight matrix at each layer of a deep network converges to a minimum-norm solution up to a scale factor (in the separable case). Our analysis of the dynamical system corresponding to gradient descent on a multilayer network suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss.
06/29/2018 ∙ by Tomaso Poggio, et al.
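The "slow" convergence to the maximum-margin solution in the linear, separable case is easy to see numerically. The sketch below is our own toy illustration, not the paper's construction: gradient descent on the logistic loss for a small 2-D separable dataset, where by symmetry the maximum-margin direction is (1, 0).

```python
import numpy as np

# Toy illustration (ours, not the paper's construction) of the linear separable
# case: gradient descent on the logistic loss. The weight norm grows without
# bound, but the direction w/||w|| converges slowly toward the maximum-margin
# separator, which by symmetry of this dataset is (1, 0).
X = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([0.3, 1.0])              # deliberately misaligned initialization
lr = 0.1
for _ in range(50000):
    margins = y * (X @ w)
    # Gradient of sum_i log(1 + exp(-margin_i)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
    w -= lr * grad

direction = w / np.linalg.norm(w)
print(direction)                      # close to (1, 0)
```

The direction converges only logarithmically, matching the "slow" convergence the abstract describes, while the norm of w itself diverges.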

Herding Generalizes Diverse M Best Solutions
We show that the algorithm for extracting diverse M solutions from a Conditional Random Field (called divMbest [1]) takes exactly the form of a Herding procedure [2], i.e., a deterministic dynamical system that produces a sequence of hypotheses respecting a set of observed moment constraints. This generalization enables us to invoke properties of Herding showing that divMbest enforces implausible constraints, which may yield wrong assumptions in some problem settings. Our experiments in semantic segmentation demonstrate that seeing divMbest as an instance of Herding leads to better alternatives to its implausible constraints.
11/14/2016 ∙ by Ece Ozkan, et al.
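The Herding dynamics referenced above can be sketched in a few lines. The toy below is our own illustration on a three-state domain with one-hot features, not the divMbest setting: emit the state the weights favor most, then update the weights by the residual between the target moments and the emitted sample's feature vector.

```python
import numpy as np

# Minimal Herding sketch (our toy, not the divMbest setting): a three-state
# domain with one-hot features. The deterministic sequence of emitted states
# has empirical frequencies that track the target moments p at a fast O(1/T)
# rate, since counts/T - p = (w_0 - w_T)/T and the weights stay bounded.
p = np.array([0.5, 0.3, 0.2])         # target moments (feature expectations)
w = p.copy()                          # herding weights
counts = np.zeros(3)
T = 1000
for _ in range(T):
    x = int(np.argmax(w))             # deterministic hypothesis selection
    counts[x] += 1
    w += p                            # add the target moments ...
    w[x] -= 1.0                       # ... subtract the emitted one-hot feature

print(counts / T)                     # close to [0.5, 0.3, 0.2]
```

The O(1/T) moment-matching rate is faster than the O(1/sqrt(T)) of i.i.d. sampling, which is the property of Herding that the paper exploits.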

Comment on "Ensemble Projection for Semi-supervised Image Classification"
In a series of papers by Dai and colleagues [1,2], a feature map (or kernel) was introduced for semi- and unsupervised learning. This feature map is built from the output of an ensemble of classifiers trained without using the ground-truth class labels. In this critique, we analyze the latest version in this series, called Ensemble Projections [2]. We show that the experiments reported in [2] were not well conducted, and that Ensemble Projections performs poorly for semi-supervised learning.
08/29/2014 ∙ by Xavier Boix, et al.

SEEDS: Superpixels Extracted via Energy-Driven Sampling
Superpixel algorithms aim to oversegment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions that enforce color homogeneity. The optimization is accomplished by sophisticated methods that progressively build the superpixels, typically by adding cuts or growing superpixels. As a result, they are computationally too expensive for real-time applications. We introduce a new approach based on a simple hill-climbing optimization. Starting from an initial superpixel partitioning, it continuously refines the superpixels by modifying their boundaries. We define a robust and fast-to-evaluate energy function based on enforcing color similarity between the boundaries and the superpixel color histogram. In a series of experiments, we show that we achieve an excellent compromise between accuracy and efficiency. We are able to achieve performance comparable to the state of the art, but in real time on a single Intel i7 CPU at 2.8 GHz.
09/16/2013 ∙ by Michael Van den Bergh, et al.
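The hill-climbing idea can be conveyed with a deliberately tiny 1-D sketch, a simplification of ours rather than the SEEDS implementation: two "superpixels" on a row of pixels, an energy that rewards color-histogram purity within each segment, and one-pixel boundary moves that are kept only when they increase the energy.

```python
import numpy as np

# Toy 1-D version of hill-climbing boundary refinement (our simplification,
# not the SEEDS implementation). The "image" is mostly 0s followed by mostly
# 1s, with one noisy pixel on each side of the transition.
colors = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1])

def energy(boundary):
    # Sum of squared histogram frequencies per segment: maximal when each
    # segment contains a single color.
    total = 0.0
    for seg in (colors[:boundary], colors[boundary:]):
        h = np.bincount(seg, minlength=2) / len(seg)
        total += (h ** 2).sum()
    return total

boundary = 2                          # poor initial partitioning
improved = True
while improved:
    improved = False
    for move in (boundary - 1, boundary + 1):
        if 1 <= move <= len(colors) - 1 and energy(move) > energy(boundary):
            boundary, improved = move, True
            break

print(boundary)                       # the boundary settles at index 4
```

Each accepted move only evaluates the energy locally, which is what makes this style of optimization cheap enough for real-time use in the full 2-D algorithm.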

Random Binary Mappings for Kernel Learning and Efficient SVM
Support Vector Machines (SVMs) are powerful learners that have led to state-of-the-art results in various computer vision problems. SVMs suffer from several drawbacks, in terms of selecting the right kernel (which depends on the image descriptors) as well as computational and memory efficiency. This paper introduces a novel kernel that addresses these issues. The kernel is learned by exploiting a large number of low-complexity, randomized binary mappings of the input feature. This leads to an efficient SVM, while also alleviating the task of kernel selection. We demonstrate the capabilities of our kernel on 6 standard vision benchmarks, in which we combine several common image descriptors, namely histograms (Flowers17 and Daimler), attribute-like descriptors (UCI, OSR, and aVOC08), and Sparse Quantization (ImageNet). Results show that our kernel learning adapts well to the different descriptor types, achieving the performance of the kernels specifically tuned for each image descriptor, and with an evaluation cost similar to that of efficient SVM methods.
07/19/2013 ∙ by Gemma Roig, et al.
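The flavor of the approach can be sketched as follows; this is our own illustration of the general idea, not the paper's exact mapping or SVM solver: replace an explicit nonlinear kernel by many low-complexity randomized binary mappings of the input, then train a linear model on the binary codes. On a toy two-circles problem, the raw 2-D features are not linearly separable, but a linear classifier on the random binary codes fits the data well.

```python
import numpy as np

# Sketch of the general idea (ours; not the paper's exact mapping or solver).
rng = np.random.default_rng(0)

# Toy two-circles data: not linearly separable in the raw 2-D features.
n = 200
r = np.concatenate([np.full(n, 1.0), np.full(n, 3.0)])
a = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.stack([r * np.cos(a), r * np.sin(a)], axis=1)
y = np.concatenate([-np.ones(n), np.ones(n)])

# Random binary mapping: threshold random affine projections of the input.
W = rng.standard_normal((2, 512))
t = rng.uniform(-3.0, 3.0, 512)
B = (X @ W + t > 0).astype(float)

# Linear classifier on the binary codes (ridge least squares for simplicity;
# the paper trains an SVM instead).
w = np.linalg.solve(B.T @ B + 1e-3 * np.eye(512), B.T @ y)
acc = np.mean(np.sign(B @ w) == y)
print(f"training accuracy on binary codes: {acc:.2f}")
```

Because the mappings are binary and random, they are cheap to evaluate and require no per-descriptor kernel tuning, which is the efficiency argument the abstract makes.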

Theory of Deep Learning III: explaining the non-overfitting puzzle
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamical systems associated with gradient descent minimization of nonlinear networks behave, near zero stable minima of the empirical error, as gradient systems in a quadratic potential with a degenerate Hessian. The proposition is supported by theoretical and numerical results, under the assumption of stable minima of the gradient. Our proposition extends to deep networks key properties of gradient descent methods for linear networks that, as suggested in (1), can be the key to understanding generalization. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, asymptotically converging to the minimum-norm solution. This implies that there is usually an optimal early stopping that avoids overfitting of the loss (relevant mainly for regression). For classification, the asymptotic convergence to the minimum-norm solution implies convergence to the maximum-margin solution, which guarantees good classification error for "low-noise" datasets. The implied robustness to overparametrization has suggestive implications for the robustness of deep hierarchically local networks to variations of the architecture with respect to the curse of dimensionality.
12/30/2017 ∙ by Tomaso Poggio, et al.
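The linear, square-loss side of the implicit-regularization claim is easy to check numerically. The sketch below is our own illustration: gradient descent from a zero initialization on an overparametrized least-squares problem stays in the row space of the data matrix and converges to the minimum-norm interpolant, i.e. the pseudoinverse solution.

```python
import numpy as np

# Numerical check (our illustration) of implicit regularization in the linear
# square-loss case: gradient descent initialized at zero on an overparametrized
# least-squares problem converges to the minimum-norm interpolant, i.e. the
# pseudoinverse solution, without any explicit regularizer.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))       # 5 samples, 20 parameters
y = rng.standard_normal(5)

w = np.zeros(20)                       # zero init: no null-space component
lr = 0.01
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)        # gradient of 0.5 * ||X w - y||^2

w_min_norm = np.linalg.pinv(X) @ y
print(np.linalg.norm(w - w_min_norm))  # essentially zero
```

Since the updates are linear combinations of the rows of X, the iterates never acquire a null-space component, so the interpolant they converge to is the one of minimum norm.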
Xavier Boix