Xin He


  • AutoML: A Survey of the State-of-the-Art

    Deep learning has penetrated all aspects of our lives and brought us great convenience. However, the process of building a high-quality deep learning system for a specific task is not only time-consuming but also requires lots of resources and relies on human expertise, which hinders the development of deep learning in both industry and academia. To alleviate this problem, a growing number of research projects focus on automated machine learning (AutoML). In this paper, we provide a comprehensive and up-to-date study of state-of-the-art AutoML. First, we introduce AutoML techniques in detail according to the machine learning pipeline. Then we summarize existing Neural Architecture Search (NAS) research, which is one of the most popular topics in AutoML. We also compare the models generated by NAS algorithms with human-designed models. Finally, we present several open problems for future research.

    08/02/2019 ∙ by Xin He, et al.

  • Simple Physical Adversarial Examples against End-to-End Autonomous Driving Models

    Recent advances in machine learning, especially techniques such as deep neural networks, are promoting a range of high-stakes applications, including autonomous driving, which often relies on deep learning for perception. While deep learning for perception has been shown to be vulnerable to a host of subtle adversarial manipulations of images, end-to-end demonstrations of successful attacks, which manipulate the physical environment and result in physical consequences, are scarce. Moreover, attacks typically involve carefully constructed adversarial examples at the level of pixels. We demonstrate the first end-to-end attacks on autonomous driving in simulation, using simple physically realizable attacks: the painting of black lines on the road. These attacks target deep neural network models for end-to-end autonomous driving control. A systematic investigation shows that such attacks are surprisingly easy to engineer, and we describe scenarios (e.g., right turns) in which they are highly effective, and others that are less vulnerable (e.g., driving straight). Further, we use network deconvolution to demonstrate that the attacks succeed by inducing activation patterns similar to entirely different scenarios used in training.

    03/12/2019 ∙ by Adith Boloor, et al.

  • Symmetry-constrained Rectification Network for Scene Text Recognition

    Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes. Recently, the community has paid increasing attention to the problem of recognizing text instances with irregular shapes. One intuitive and effective way to handle this problem is to rectify irregular text to a canonical form before recognition. However, these methods might struggle when dealing with highly curved or distorted text instances. To tackle this issue, we propose in this paper a Symmetry-constrained Rectification Network (ScRN) based on local attributes of text instances, such as center line, scale and orientation. Such constraints, together with an accurate description of text shape, enable ScRN to generate better rectification results than existing methods and thus lead to higher recognition accuracy. Our method achieves state-of-the-art performance on text with both regular and irregular shapes. Specifically, the system outperforms existing algorithms by a large margin on datasets that contain a considerable proportion of irregular text instances, e.g., ICDAR 2015, SVT-Perspective and CUTE80.

    08/06/2019 ∙ by Mingkun Yang, et al.

  • Linear-Time Succinct Encodings of Planar Graphs via Canonical Orderings

    Let G be an embedded planar undirected graph that has n vertices, m edges, and f faces but has no self-loops or multiple edges. If G is triangulated, we can encode it using (4/3)m - 1 bits, improving on the best previous bound of about 1.53m bits. In case exponential time is acceptable, roughly 1.08m bits have been known to suffice. If G is triconnected, we use at most (2.5 + log_2 3) min{n, f} - 7 bits, which is at most 2.835m bits and smaller than the best previous bound of 3m bits. Both of our schemes take O(n) time for encoding and decoding.

    01/27/2001 ∙ by Xin He, et al.
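
The bit bounds quoted in this abstract can be compared numerically. The sketch below is only an illustration, not the paper's encoder: it assumes the triangulated bound reads (4/3)m - 1 and the triconnected bound reads (2.5 + log_2 3) min{n, f} - 7, and it derives f from Euler's formula n - m + f = 2. The function names are hypothetical.

```python
import math

def triangulated_bits(m):
    """Assumed bound for a triangulated graph with m edges: (4/3)m - 1 bits."""
    return 4 * m / 3 - 1

def triconnected_bits(n, m):
    """Assumed bound for a triconnected graph: (2.5 + log2 3) * min(n, f) - 7 bits."""
    f = m - n + 2  # Euler's formula for a connected plane graph
    return (2.5 + math.log2(3)) * min(n, f) - 7

n = 10
m = 3 * n - 6  # a plane triangulation on n >= 3 vertices has 3n - 6 edges

# Triangulated case: new bound next to the two bounds cited as prior work.
print(triangulated_bits(m), 1.53 * m, 1.08 * m)

# Triconnected case: new bound next to 2.835m and the previous 3m bound.
print(triconnected_bits(n, m), 2.835 * m, 3 * m)
```

For this small instance the (4/3)m - 1 bound falls between the exponential-time figure of roughly 1.08m and the earlier 1.53m bound, as the abstract describes.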

  • A Fast General Methodology for Information-Theoretically Optimal Encodings of Graphs

    We propose a fast methodology for encoding graphs with information-theoretically minimum numbers of bits. Specifically, a graph with property pi is called a pi-graph. If pi satisfies certain properties, then an n-node m-edge pi-graph G can be encoded by a binary string X such that (1) G and X can be obtained from each other in O(n log n) time, and (2) X has at most beta(n)+o(beta(n)) bits for any continuous super-additive function beta(n) so that there are at most 2^(beta(n)+o(beta(n))) distinct n-node pi-graphs. The methodology is applicable to general classes of graphs; this paper focuses on planar graphs. Examples of such pi include all conjunctions over the following groups of properties: (1) G is a planar graph or a plane graph; (2) G is directed or undirected; (3) G is triangulated, triconnected, biconnected, merely connected, or not required to be connected; (4) the nodes of G are labeled with labels from 1, ..., ell_1 for ell_1 <= n; (5) the edges of G are labeled with labels from 1, ..., ell_2 for ell_2 <= m; and (6) each node (respectively, edge) of G has at most ell_3 = O(1) self-loops (respectively, ell_4 = O(1) multiple edges). Moreover, ell_3 and ell_4 are not required to be O(1) for the cases of pi being a plane triangulation. These examples are novel applications of small cycle separators of planar graphs and are the only nontrivial classes of graphs, other than rooted trees, with known polynomial-time information-theoretically optimal coding schemes.

    01/23/2001 ∙ by Xin He, et al.

  • Multilayer Nonlinear Processing for Information Privacy in Sensor Networks

    A sensor network wishes to transmit information to a fusion center to allow it to detect a public hypothesis, but at the same time prevent it from inferring a private hypothesis. We propose a multilayer nonlinear processing procedure at each sensor to distort the sensor's data before it is sent to the fusion center. In our proposed framework, sensors are grouped into clusters, and each sensor first applies a nonlinear fusion function on the information it receives from sensors in the same cluster and in a previous layer. A linear weighting matrix is then used to distort the information it sends to sensors in the next layer. We adopt a nonparametric approach and develop a modified mirror descent algorithm to optimize the weighting matrices so as to ensure that the regularized empirical risk of detecting the private hypothesis is above a given privacy threshold, while minimizing the regularized empirical risk of detecting the public hypothesis. Experiments on empirical datasets demonstrate that our approach is able to achieve a good trade-off between the error rates of the public and private hypothesis.

    11/13/2017 ∙ by Xin He, et al.

  • Scalable kernel-based variable selection with sparsistency

    Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm should be flexible, scalable, and backed by theoretical guarantees, yet most existing algorithms cannot attain these properties at the same time. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of the regression function and its gradient functions as well as hard thresholding. Its key advantage is that it makes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension, and can be further improved through parallel computing. The sparsistency of the proposed algorithm is established for general RKHS under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.

    02/26/2018 ∙ by Xin He, et al.
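
The three steps named in this abstract (kernel estimation of the regression function, estimation of its gradients, hard thresholding) can be sketched roughly as follows. This is a minimal illustration under assumptions that are not taken from the paper: the regression function is fitted by kernel ridge regression with a Gaussian kernel, and the bandwidth `sigma2`, ridge parameter `lam`, threshold value, and all function names are hypothetical choices made for this toy example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma2))

def select_variables(X, y, sigma2=2.0, lam=1e-3, threshold=0.5):
    n, p = X.shape
    # Step 1: kernel ridge estimate f(x) = sum_i alpha_i K(x, x_i).
    K = gaussian_kernel(X, X, sigma2)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # Step 2: partial derivatives of f at every sample point.
    # d/dx_l K(x, x_i) = -K(x, x_i) * (x_l - x_il) / sigma2
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i, l]
    grads = -np.einsum("ji,jil->jl", K * alpha[None, :], diff) / sigma2
    # Empirical squared norm of each partial derivative.
    norms = (grads ** 2).mean(axis=0)
    # Step 3: hard thresholding keeps variables with a large gradient norm.
    selected = [l for l in range(p) if norms[l] > threshold]
    return selected, norms

# Toy data: only the first two of five variables affect the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)

selected, norms = select_variables(X, y)
print(selected, np.round(norms, 3))
```

Because the selection relies only on the size of the estimated partial derivatives, the nonlinear effect of the second variable is treated the same way as the linear effect of the first, which reflects the model-free flavour the abstract emphasizes.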

  • TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

    Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with much more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widely-used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40

    07/04/2018 ∙ by Shangbang Long, et al.
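
The disk-based representation described in this abstract can be illustrated with a small data structure. The sketch below is not the TextSnake implementation: the `Disk` class, the `disks_to_mask` helper, and the synthetic curved center line are all hypothetical, and the geometry is written by hand here rather than predicted by an FCN.

```python
import math
from dataclasses import dataclass

import numpy as np

@dataclass
class Disk:
    cx: float      # x coordinate of the disk center on the text center line
    cy: float      # y coordinate of the disk center
    radius: float  # local half-height of the text
    theta: float   # local orientation of the center line, in radians

def disks_to_mask(disks, height, width):
    """Rasterize the union of an ordered disk sequence into a boolean mask."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for d in disks:
        mask |= (xs - d.cx) ** 2 + (ys - d.cy) ** 2 <= d.radius ** 2
    return mask

# A curved text instance: overlapping disks whose centers follow a sine-shaped line.
disks = [
    Disk(
        cx=10 + 5 * k,
        cy=20 + 8 * math.sin(k / 2),
        radius=6.0,
        theta=math.atan(0.8 * math.cos(k / 2)),  # slope of the center line
    )
    for k in range(10)
]

mask = disks_to_mask(disks, height=40, width=70)
print(mask.sum(), "pixels covered by the text region")
```

Since each disk carries its own radius and orientation, the same structure covers horizontal, oriented and curved text without switching between rectangles and quadrangles.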

  • AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference

    The intrinsic error tolerance of neural network (NN) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods---one actively searches for a network parameters distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with near-threshold computing and approximation multiplication strategies demonstrate AxTrain's ability to obtain resilient neural network parameters and system energy efficiency improvement.

    05/21/2018 ∙ by Xin He, et al.
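
The "passive" idea mentioned in this abstract, injecting hardware noise into the forward pass during training, can be sketched on a toy linear model. This is not AxTrain itself: the multiplicative Gaussian weight noise, the noise level, the learning rate, and the function name `noisy_forward` are assumptions made for illustration, and a real approximate multiplier would have its own measured error distribution.

```python
import numpy as np

def noisy_forward(x, W, noise_std, rng):
    """Linear layer whose weights are perturbed by multiplicative Gaussian noise."""
    noise = rng.normal(0.0, noise_std, size=W.shape)
    return x @ (W * (1.0 + noise)), noise

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
W_true = np.array([[1.0, -2.0], [0.5, 0.0], [-1.5, 1.0]])
Y = X @ W_true

W = np.zeros((3, 2))
initial_loss = np.mean((X @ W - Y) ** 2)

for _ in range(200):
    # Forward pass sees the perturbed weights, as the approximate hardware would.
    pred, noise = noisy_forward(X, W, noise_std=0.05, rng=rng)
    err = pred - Y
    # Gradient of the summed squared error (averaged over samples) with respect
    # to the clean weights; the chain rule through W * (1 + noise) gives the
    # (1 + noise) factor.
    grad = 2.0 * (X.T @ err) / X.shape[0] * (1.0 + noise)
    W -= 0.1 * grad

final_loss = np.mean((X @ W - Y) ** 2)
print(initial_loss, final_loss)
```

Training against the perturbed forward pass means the learned weights are judged on how well they perform under noise, which is the intuition behind learning resilient parameters.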

  • Scene Text Detection and Recognition: The Deep Learning Era

    With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach and performance. This survey aims to summarize and analyze the major changes and significant progress of scene text detection and recognition in the deep learning era. In this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead into future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that still remain. We expect that this review paper will serve as a reference book for researchers in this field. Related resources are also collected and compiled in our Github repository:

    11/10/2018 ∙ by Shangbang Long, et al.

  • On Measurement of the Spatio-Frequency Property of OFDM Backscattering

    Orthogonal frequency-division multiplexing (OFDM) backscatter systems, such as Wi-Fi backscatter, have recently been recognized as a promising technique for IoT connectivity, owing to their ubiquity and low cost. This paper investigates the spatial-frequency property of OFDM backscatter, taking distance and angle into account in different frequency bands. We deploy three typical scenarios for performing measurements to evaluate the received signal strength from the backscatter link. The impact of the distances among the transmitter, the tag and the receiver, as well as the angle between the transmitter and the tag, is observed through the obtained measurement data. The evaluation results show that the best location of the tag is close to either the receiver or the transmitter, depending on the frequency band, and that the best angle is 90 degrees between the transmitter and the receiver. This work sheds light on the spatial deployment of the backscatter tag in different frequency bands, with the aim of improving performance and reducing interference.

    10/27/2018 ∙ by Xiaoxue Zhang, et al.
