Jia Deng

is this you? claim profile


Assistant Professor for Computer Science and Engineering at University of Michigan

  • Learning to Prove Theorems via Interacting with Proof Assistants

    Humans prove theorems by relying on substantial high-level reasoning and problem-specific insights. Proof assistants offer a formalism that resembles human mathematical reasoning, representing theorems in higher-order logic and proofs as high-level tactics. However, human experts have to construct proofs manually by entering tactics into the proof assistant. In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. We construct CoqGym, a large-scale dataset and learning environment containing 71K human-written proofs from 123 projects developed with the Coq proof assistant. We develop ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs). Experiments show that ASTactic trained on CoqGym can generate effective tactics and can be used to prove new theorems not previously provable by automated methods. Code is available at https://github.com/princeton-vl/CoqGym.

    05/21/2019 ∙ by Kaiyu Yang, et al. ∙ 32 share

    read it

  • CornerNet-Lite: Efficient Keypoint Based Object Detection

    Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce CornerNet-Lite. CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture. Together these two variants address the two critical use cases in efficient object detection: improving efficiency without sacrificing accuracy, and improving accuracy at real-time efficiency. CornerNet-Saccade is suitable for offline processing, improving the efficiency of CornerNet by 6.0x and the AP by 1.0 improving both the efficiency and accuracy of the popular real-time detector YOLOv3 (34.4 YOLOv3 on COCO). Together these contributions for the first time reveal the potential of keypoint-based detection to be useful for applications requiring processing efficiency.

    04/18/2019 ∙ by Hei Law, et al. ∙ 20 share

    read it

  • DeepV2D: Video to Depth with Differentiable Structure from Motion

    We propose DeepV2D, an end-to-end differentiable deep learning architecture for predicting depth from a video sequence. We incorporate elements of classical Structure from Motion into an end-to-end trainable pipeline by designing a set of differentiable geometric modules. Our full system alternates between predicting depth and refining camera pose. We estimate depth by building a cost volume over learned features and apply a multi-scale 3D convolutional network for stereo matching. The predicted depth is then sent to the motion module which performs iterative pose updates by mapping optical flow to a camera motion update. We evaluate our proposed system on NYU, KITTI, and SUN3D datasets and show improved results over monocular baselines and deep and classical stereo reconstruction.

    12/11/2018 ∙ by Zachary Teed, et al. ∙ 18 share

    read it

  • Rethinking Numerical Representations for Deep Neural Networks

    With ever-increasing computational demand for deep learning, it is critical to investigate the implications of the numeric representation and precision of DNN model weights and activations on computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations as it relates to inference accuracy and efficiency to steer the improved design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1 relative to a state-of-the-art baseline platform representing the most sophisticated hardware using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration.

    08/07/2018 ∙ by Parker Hill, et al. ∙ 10 share

    read it

  • D3D: Distilled 3D Networks for Video Action Recognition

    State-of-the-art methods for video action recognition commonly use an ensemble of two networks: the spatial stream, which takes RGB frames as input, and the temporal stream, which takes optical flow as input. In recent work, both of these streams consist of 3D Convolutional Neural Networks, which apply spatiotemporal filters to the video clip before performing classification. Conceptually, the temporal filters should allow the spatial stream to learn motion representations, making the temporal stream redundant. However, we still see significant benefits in action recognition performance by including an entirely separate temporal stream, indicating that the spatial stream is "missing" some of the signal captured by the temporal stream. In this work, we first investigate whether motion representations are indeed missing in the spatial stream of 3D CNNs. Second, we demonstrate that these motion representations can be improved by distillation, by tuning the spatial stream to predict the outputs of the temporal stream, effectively combining both models into a single stream. Finally, we show that our Distilled 3D Network (D3D) achieves performance on par with two-stream approaches, using only a single model and with no need to compute optical flow.

    12/19/2018 ∙ by Jonathan C. Stroud, et al. ∙ 10 share

    read it

  • CornerNet: Detecting Objects as Paired Keypoints

    We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.1

    08/03/2018 ∙ by Hei Law, et al. ∙ 6 share

    read it

  • Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control

    Recent progress on physics-based character animation has shown impressive breakthroughs on human motion synthesis, through the imitation of motion capture data via deep reinforcement learning. However, results have mostly been demonstrated on imitating a single distinct motion pattern, and do not generalize to interactive tasks that require flexible motion patterns due to varying human-object spatial configurations. In this paper, we focus on one class of interactive task---sitting onto a chair. We propose a hierarchical reinforcement learning framework which relies on a collection of subtask controllers trained to imitate simple, reusable mocap motions, and a meta controller trained to execute the subtasks properly to complete the main task. We experimentally demonstrate the strength of our approach over different single level and hierarchical baselines. We also show that our approach can be applied to motion prediction given an image input. A video highlight can be found at https://youtu.be/3CeN0OGz2cA.

    08/20/2019 ∙ by Yu-Wei Chao, et al. ∙ 6 share

    read it

  • Feature Partitioning for Efficient Multi-Task Architectures

    Multi-task learning holds the promise of less data, parameters, and time than training of separate models. We propose a method to automatically search over multi-task architectures while taking resource constraints into consideration. We propose a search space that compactly represents different parameter sharing strategies. This provides more effective coverage and sampling of the space of multi-task architectures. We also present a method for quick evaluation of different architectures by using feature distillation. Together these contributions allow us to quickly optimize for efficient multi-task models. We benchmark on Visual Decathlon, demonstrating that we can automatically search for and identify multi-task architectures that effectively make trade-offs between task resource requirements while achieving a high level of final performance.

    08/12/2019 ∙ by Alejandro Newell, et al. ∙ 4 share

    read it

  • Learning to Generate Synthetic 3D Training Data through Hybrid Gradient

    Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they can help train a network to perform well on real images, because a graphics-based generation pipeline requires numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimizes the generation of 3D training data based on what we call "hybrid gradient". We parametrize the design decisions as a real vector, and combine the approximate gradient and the analytical gradient to obtain the hybrid gradient of the network performance with respect to this vector. We evaluate our approach on the task of estimating surface normals from a single image. Experiments on standard benchmarks show that our approach can outperform the prior state of the art on optimizing the generation of 3D training data, particularly in terms of computational efficiency.

    06/29/2019 ∙ by Dawei Yang, et al. ∙ 3 share

    read it

  • To Learn or Not to Learn: Analyzing the Role of Learning for Navigation in Virtual Environments

    In this paper we compare learning-based methods and classical methods for navigation in virtual environments. We construct classical navigation agents and demonstrate that they outperform state-of-the-art learning-based agents on two standard benchmarks: MINOS and Stanford Large-Scale 3D Indoor Spaces. We perform detailed analysis to study the strengths and weaknesses of learned agents and classical agents, as well as how characteristics of the virtual environment impact navigation performance. Our results show that learned agents have inferior collision avoidance and memory management, but are superior in handling ambiguity and noise. These results can inform future design of navigation agents.

    07/26/2019 ∙ by Noriyuki Kojima, et al. ∙ 3 share

    read it

  • Learning Single-Image Depth from Videos using Quality Assessment Networks

    Although significant progress has been made in recent years, depth estimation from a single image in the wild is still a very challenging problem. One reason is the lack of high-quality image-depth data in the wild. In this paper we propose a fully automatic pipeline based on Structure-from-Motion (SfM) to generate such data from arbitrary videos. The core of this pipeline is a Quality Assessment Network that can distinguish correct and incorrect reconstructions obtained from SfM. With the proposed pipeline, we generate image-depth data from the NYU Depth dataset and random YouTube videos. We show that depth-prediction networks trained on such data can achieve competitive performance on the NYU Depth and the Depth-in-the-Wild benchmarks.

    06/25/2018 ∙ by Weifeng Chen, et al. ∙ 2 share

    read it