Larry Davis

is this you? claim profile


  • Attributing Fake Images to GANs: Analyzing Fingerprints in Generated Images

    Research in computer graphics has been in pursuit of realistic image generation for a long time. Recent advances in machine learning with deep generative models have shown increasing success of closing the realism gap by using data-driven and learned components. There is an increasing concern that real and fake images will become more and more difficult to tell apart. We take a first step towards this larger research challenge by asking the question if and to what extend a generated fake image can be attribute to a particular Generative Adversarial Networks (GANs) of a certain architecture and trained with particular data and random seed. Our analysis shows single samples from GANs carry highly characteristic fingerprints which make attribution of images to GANs possible. Surprisingly, this is even possible for GANs with same architecture and same training that only differ by the training seed.

    11/20/2018 ∙ by Ning Yu, et al. ∙ 12 share

    read it

  • Unsupervised Data Uncertainty Learning in Visual Retrieval Systems

    We introduce an unsupervised formulation to estimate heteroscedastic uncertainty in retrieval systems. We propose an extension to triplet loss that models data uncertainty for each input. Besides improving performance, our formulation models local noise in the embedding space. It quantifies input uncertainty and thus enhances interpretability of the system. This helps identify noisy observations in query and search databases. Evaluation on both image and video retrieval applications highlight the utility of our approach. We highlight our efficiency in modeling local noise using two real-world datasets: Clothing1M and Honda Driving datasets. Qualitative results illustrate our ability in identifying confusing scenarios in various domains. Uncertainty learning also enables data cleaning by detecting noisy training labels.

    02/07/2019 ∙ by Ahmed Taha, et al. ∙ 10 share

    read it

  • STEP: Spatio-Temporal Progressive Learning for Video Action Detection

    In this paper, we propose Spatio-TEmporal Progressive (STEP) action detector---a progressive learning framework for spatio-temporal action detection in videos. Starting from a handful of coarse-scale proposal cuboids, our approach progressively refines the proposals towards actions over a few steps. In this way, high-quality proposals (i.e., adhere to action movements) can be gradually obtained at later steps by leveraging the regression outputs from previous steps. At each step, we adaptively extend the proposals in time to incorporate more related temporal context. Compared to the prior work that performs action detection in one run, our progressive learning framework is able to naturally handle the spatial displacement within action tubes and therefore provides a more effective way for spatio-temporal modeling. We extensively evaluate our approach on UCF101 and AVA, and demonstrate superior detection results. Remarkably, we achieve mAP of 75.0 datasets with 3 progressive steps and using respectively only 11 and 34 initial proposals.

    04/19/2019 ∙ by Xitong Yang, et al. ∙ 8 share

    read it

  • SSH: Single Stage Headless Face Detector

    We introduce the Single Stage Headless (SSH) face detector. Unlike two stage proposal-classification detectors, SSH detects faces in a single stage directly from the early convolutional layers in a classification network. SSH is headless. That is, it is able to achieve state-of-the-art results while removing the "head" of its underlying classification network -- i.e. all fully connected layers in the VGG-16 which contains a large number of parameters. Additionally, instead of relying on an image pyramid to detect faces with various scales, SSH is scale-invariant by design. We simultaneously detect faces with different scales in a single forward pass of the network, but from different layers. These properties make SSH fast and light-weight. Surprisingly, with a headless VGG-16, SSH beats the ResNet-101-based state-of-the-art on the WIDER dataset. Even though, unlike the current state-of-the-art, SSH does not use an image pyramid and is 5X faster. Moreover, if an image pyramid is deployed, our light-weight network achieves state-of-the-art on all subsets of the WIDER dataset, improving the AP by 2.5 SSH also reaches state-of-the-art results on the FDDB and Pascal-Faces datasets while using a small input size, leading to a runtime of 50 ms/image on a GPU. The code is available at

    08/14/2017 ∙ by Mahyar Najibi, et al. ∙ 0 share

    read it

  • Weakly Supervised Learning of Heterogeneous Concepts in Videos

    Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: 24 classification in the Casablanca dataset and 9 localization in the A2D dataset as compared to the most competitive baseline.

    07/12/2016 ∙ by Sohil Shah, et al. ∙ 0 share

    read it

  • Learning Discriminative Features via Label Consistent Neural Network

    Deep Convolutional Neural Networks (CNN) enforces supervised information only at the output layer, and hidden layers are trained by back propagating the prediction error from the output layer without explicit supervision. We propose a supervised feature learning approach, Label Consistent Neural Network, which enforces direct supervision in late hidden layers. We associate each neuron in a hidden layer with a particular class label and encourage it to be activated for input signals from the same class. More specifically, we introduce a label consistency regularization called "discriminative representation error" loss for late hidden layers and combine it with classification error loss to build our overall objective function. This label consistency constraint alleviates the common problem of gradient vanishing and tends to faster convergence; it also makes the features derived from late hidden layers discriminative enough for classification even using a simple k-NN classifier, since input signals from the same class will have very similar representations. Experimental results demonstrate that our approach achieves state-of-the-art performances on several public benchmarks for action and object category recognition.

    02/03/2016 ∙ by Zhuolin Jiang, et al. ∙ 0 share

    read it

  • Learning Structured Ordinal Measures for Video based Face Recognition

    This paper presents a structured ordinal measure method for video-based face recognition that simultaneously learns ordinal filters and structured ordinal features. The problem is posed as a non-convex integer program problem that includes two parts. The first part learns stable ordinal filters to project video data into a large-margin ordinal space. The second seeks self-correcting and discrete codes by balancing the projected data and a rank-one ordinal matrix in a structured low-rank way. Unsupervised and supervised structures are considered for the ordinal matrix. In addition, as a complement to hierarchical structures, deep feature representations are integrated into our method to enhance coding stability. An alternating minimization method is employed to handle the discrete and low-rank constraints, yielding high-quality codes that capture prior structures well. Experimental results on three commonly used face video databases show that our method with a simple voting classifier can achieve state-of-the-art recognition rates using fewer features and samples.

    07/09/2015 ∙ by Ran He, et al. ∙ 0 share

    read it

  • The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives

    Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the "gutters" between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called "closure". While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels. We construct a dataset, COMICS, that consists of over 1.2 million panels (120 GB) paired with automatic textbox transcriptions. An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot. We introduce three cloze-style tasks that ask models to predict narrative and character-centric aspects of a panel given n preceding panels as context. Various deep neural architectures underperform human baselines on these tasks, suggesting that COMICS contains fundamental challenges for both vision and language.

    11/16/2016 ∙ by Mohit Iyyer, et al. ∙ 0 share

    read it

  • Boundary-sensitive Network for Portrait Segmentation

    Compared to the general semantic segmentation problem, portrait segmentation has higher precision requirement on boundary area. However, this problem has not been well studied in previous works. In this paper, we propose a boundary-sensitive deep neural network (BSN) for portrait segmentation. BSN introduces three novel techniques. First, an individual boundary-sensitive kernel is proposed by dilating the contour line and assigning the boundary pixels with multi-class labels. Second, a global boundary-sensitive kernel is employed as a position sensitive prior to further constrain the overall shape of the segmentation map. Third, we train a boundary-sensitive attribute classifier jointly with the segmentation network to reinforce the network with semantic boundary shape information. We have evaluated BSN on the current largest public portrait segmentation dataset, , the PFCN dataset, as well as the portrait images collected from other three popular image segmentation datasets: COCO, COCO-Stuff, and PASCAL VOC. Our method achieves the superior quantitative and qualitative performance over state-of-the-arts on all the datasets, especially on the boundary area.

    12/22/2017 ∙ by Xianzhi Du, et al. ∙ 0 share

    read it

  • Face-MagNet: Magnifying Feature Maps to Detect Small Faces

    In this paper, we introduce the Face Magnifier Network (Face-MageNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small scale faces to the classifier without any skip or residual connections. To achieve this, Face-MagNet deploys a set of ConvTranspose, also known as deconvolution, layers in the Region Proposal Network (RPN) and another set before the Region of Interest (RoI) pooling layer to facilitate detection of finer faces. In addition, we also design, train, and evaluate three other well-tuned architectures that represent the conventional solutions to the scale problem: context pooling, skip connections, and scale partitioning. Each of these three networks achieves comparable results to the state-of-the-art face detectors. With extensive experiments, we show that Face-MagNet based on a VGG16 architecture achieves better results than the recently proposed ResNet101-based HR method on the task of face detection on WIDER dataset and also achieves similar results on the hard set as our other method SSH.

    03/14/2018 ∙ by Pouya Samangouei, et al. ∙ 0 share

    read it

  • Class Subset Selection for Transfer Learning using Submodularity

    In recent years, it is common practice to extract fully-connected layer (fc) features that were learned while performing image classification on a source dataset, such as ImageNet, and apply them generally to a wide range of other tasks. The general usefulness of some large training datasets for transfer learning is not yet well understood, and raises a number of questions. For example, in the context of transfer learning, what is the role of a specific class in the source dataset, and how is the transferability of fc features affected when they are trained using various subsets of the set of all classes in the source dataset? In this paper, we address the question of how to select an optimal subset of the set of classes, subject to a budget constraint, that will more likely generate good features for other tasks. To accomplish this, we use a submodular set function to model the accuracy achievable on a new task when the features have been learned on a given subset of classes of the source dataset. An optimal subset is identified as the set that maximizes this submodular function. The maximization can be accomplished using an efficient greedy algorithm that comes with guarantees on the optimality of the solution. We empirically validate our submodular model by successfully identifying subsets of classes that produce good features for new tasks.

    03/30/2018 ∙ by Varun Manjunatha, et al. ∙ 0 share

    read it