Yang Shi


  • Compact Tensor Pooling for Visual Question Answering

    Performing high-level cognitive tasks requires the integration of feature maps with drastically different structure. In Visual Question Answering (VQA), image descriptors have spatial structures, while lexical inputs inherently follow a temporal sequence. The recently proposed Multimodal Compact Bilinear pooling (MCB) forms the outer products, via count-sketch approximation, of the visual and textual representation at each spatial location. While this procedure preserves spatial information locally, outer products are taken independently for each fiber of the activation tensor, and therefore do not include spatial context. In this work, we introduce multi-dimensional sketch (MD-sketch), a novel extension of count-sketch to tensors. Using this new formulation, we propose Multimodal Compact Tensor Pooling (MCT) to fully exploit the global spatial context during bilinear pooling operations. In contrast to MCB, our approach preserves spatial context by directly convolving the MD-sketch from the visual tensor features with the text vector feature using higher-order FFT. Furthermore, we apply MCT incrementally at each step of the question embedding and accumulate the multi-modal vectors with a second LSTM layer before the final answer is chosen.

    06/20/2017 ∙ by Yang Shi, et al.
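The count-sketch step that the abstract above attributes to MCB can be illustrated with a small NumPy sketch. It is not the MCT method itself: it only shows that the circular convolution of two count sketches (computed with FFT) equals a count sketch of the outer product. The function names, dimensions, and hash construction are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_sketch(dim, sketch_dim, rng):
    """Draw a random bucket hash h and a random sign hash s for `dim` inputs."""
    h = rng.integers(0, sketch_dim, size=dim)   # bucket index per coordinate
    s = rng.choice([-1.0, 1.0], size=dim)       # random sign per coordinate
    return h, s

def count_sketch(x, h, s, sketch_dim):
    """Project x into `sketch_dim` buckets: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(sketch_dim)
    np.add.at(out, h, s * x)  # unbuffered add so colliding indices accumulate
    return out

def compact_bilinear(x, y, hx, sx, hy, sy, sketch_dim):
    """Circularly convolve the two count sketches via FFT."""
    fx = np.fft.rfft(count_sketch(x, hx, sx, sketch_dim))
    fy = np.fft.rfft(count_sketch(y, hy, sy, sketch_dim))
    return np.fft.irfft(fx * fy, n=sketch_dim)

def sketch_outer_direct(x, y, hx, sx, hy, sy, sketch_dim):
    """Count sketch of the full outer product, using the combined hash
    (hx[i] + hy[j]) mod sketch_dim and the combined sign sx[i] * sy[j]."""
    out = np.zeros(sketch_dim)
    idx = (hx[:, None] + hy[None, :]) % sketch_dim
    np.add.at(out, idx, np.outer(sx * x, sy * y))
    return out

rng = np.random.default_rng(0)
x, y, d = rng.normal(size=8), rng.normal(size=6), 16
hx, sx = make_sketch(x.size, d, rng)
hy, sy = make_sketch(y.size, d, rng)
fast = compact_bilinear(x, y, hx, sx, hy, sy, d)
slow = sketch_outer_direct(x, y, hx, sx, hy, sy, d)
print(np.allclose(fast, slow))
```

The FFT route never materializes the outer product, which is the point of the compact formulation; the direct version is included only to check the identity.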

  • Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

    Robust tensor CP decomposition involves decomposing a tensor into low-rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method) and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoherence conditions on the low-rank component and a bounded level of sparse perturbations. We compare our method with natural baselines that apply robust matrix PCA either to the flattened tensor or to the matrix slices of the tensor. Our method can provably handle a far greater level of perturbation when the sparse tensor is block-structured. This naturally occurs in many applications, such as the activity detection task in videos. Our experiments validate these findings. Thus, we establish that tensor methods can tolerate a higher level of gross corruptions compared to matrix methods.

    10/15/2015 ∙ by Prateek Jain, et al.

  • PoTrojan: powerful neural-level trojan designs in deep learning models

    With the popularity of deep learning (DL), artificial intelligence (AI) has been applied in many areas of human life. The neural network or artificial neural network (NN), the main technique behind DL, has been extensively studied to facilitate computer vision and natural language recognition. However, the more we rely on information technology, the more vulnerable we are. That is, malicious NNs could pose a huge threat in the so-called coming AI era. In this paper, for the first time in the literature, we propose a novel approach to design and insert powerful neural-level trojans, or PoTrojans, in pre-trained NN models. Most of the time, PoTrojans remain inactive, not affecting the normal functions of their host NN models. PoTrojans can only be triggered in very rare conditions. Once activated, however, the PoTrojans can cause the host NN models to malfunction, producing false predictions or classifications, which is a significant threat to human society in the AI era. We explain the principles of PoTrojans and the ease of designing and inserting them in pre-trained deep learning models. PoTrojans do not modify the existing architecture or parameters of the pre-trained models and require no re-training. Hence, the proposed method is very efficient.

    02/08/2018 ∙ by Minhui Zou, et al.

  • Question Type Guided Attention in Visual Question Answering

    Visual Question Answering (VQA) requires integration of feature maps with drastically different structures and a focus on the correct regions. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types. Many previous works use complex models to extract feature representations but neglect high-level summary information, such as question types, in learning. In this work, we propose Question Type-guided Attention (QTA). It uses the question type to dynamically balance between bottom-up and top-down visual features, extracted from ResNet and Faster R-CNN networks respectively. We experiment with multiple VQA architectures, with extensive input ablation studies over the TDIUC dataset, and show that QTA systematically improves the performance by more than 5% across multiple question type categories such as "Activity Recognition", "Utility" and "Counting" on the TDIUC dataset. By adding QTA on the state-of-the-art model MCB, we achieve 3% [...] extension to predict question types, which generalizes QTA to applications that lack question types, with minimal performance loss.

    04/06/2018 ∙ by Yang Shi, et al.

  • Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis

    Small p-values often need to be estimated accurately in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small p-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently calculating small p-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small p-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for improving existing test procedures and developing new test procedures in genomic studies.

    03/09/2018 ∙ by Yang Shi, et al.
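The cross-entropy principle mentioned in the abstract above can be illustrated on a toy problem. The sketch below estimates the tail probability P(Z > gamma) for a standard normal Z by adaptively shifting the mean of a Gaussian proposal and then applying importance sampling. This is a simplified assumption-laden example: the paper combines the cross-entropy method with MCMC sampling for general test statistics, whereas this toy uses a Gaussian family with a closed-form update. The function name and all parameter defaults are hypothetical.

```python
import math
import numpy as np

def ce_tail_probability(gamma, n=10_000, rho=0.1, max_iter=50, seed=0):
    """Estimate P(Z > gamma), Z ~ N(0, 1), with a cross-entropy-tuned proposal N(mu, 1)."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    for _ in range(max_iter):
        x = rng.normal(mu, 1.0, size=n)
        # Intermediate level: the (1 - rho) sample quantile, capped at the target gamma.
        level = min(gamma, float(np.quantile(x, 1.0 - rho)))
        elite = x[x >= level]
        # Likelihood ratio N(0,1) / N(mu,1) evaluated at the elite samples.
        w = np.exp(-mu * elite + 0.5 * mu**2)
        # Closed-form cross-entropy update for the mean of a unit-variance Gaussian.
        mu = float(np.sum(w * elite) / np.sum(w))
        if level >= gamma:
            break
    # Final importance-sampling estimate under the tuned proposal.
    x = rng.normal(mu, 1.0, size=n)
    w = np.exp(-mu * x + 0.5 * mu**2)
    return float(np.mean(w * (x > gamma))), mu

estimate, tuned_mu = ce_tail_probability(5.0)
exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))
print(estimate, exact, tuned_mu)
```

Plain Monte Carlo with the same 10,000 samples would almost never observe an exceedance at this level, which is why the proposal is shifted toward the rare region before the final estimate is taken.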

  • Connectivity-Preserving Coordination Control of Multi-Agent Systems with Time-Varying Delays

    This paper presents a distributed position synchronization strategy that also preserves the initial communication links for single-integrator multi-agent systems with time-varying delays. The strategy employs a coordinating proportional control derived from a specific type of potential energy, augmented with damping injected through a dynamic filter. The injected damping maintains all agents within the communication distances of their neighbours, and asymptotically stabilizes the multi-agent system, in the presence of time delays. Regarding the closed-loop single-integrator multi-agent system as a double-integrator system suggests an extension of the proposed strategy to connectivity-preserving coordination of Euler-Lagrange networks with time-varying delays. Lyapunov stability analysis and simulation results validate the two designs.

    03/21/2018 ∙ by Yuan Yang, et al.

  • Deep neural network based i-vector mapping for speaker verification using short utterances

    Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model training methods for i-vector modeling using full-length and short utterance evaluation tasks. The two methods are Gaussian mixture model (GMM) based and deep neural network (DNN) based methods. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system under various durations. However, the performance of both systems degrades significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods which train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vector can restore missing information and reduce the variance of the original short-utterance i-vectors. Both proposed methods model the joint representation of short and long utterance i-vectors by using an autoencoder. Experimental results using the NIST SRE 2010 dataset show that both methods provide significant improvement and result in a maximum of 28.43% relative improvement in Equal Error Rates over a baseline system, when using a deep encoder with residual blocks and adding an additional phoneme vector. When further testing the best-validated models of SRE10 on the Speaker In The Wild dataset, the methods result in a 23.12% [...] s) short-utterance conditions.

    10/16/2018 ∙ by Jinxi Guo, et al.

  • Multi-dimensional Tensor Sketch

    Sketching refers to a class of randomized dimensionality reduction methods that aim to preserve relevant information in large-scale datasets. They are memory-efficient and typically require just a single pass over the dataset. Efficient sketching methods have been derived for vector and matrix-valued datasets. When the datasets are higher-order tensors, a naive approach is to flatten the tensors into vectors or matrices and then sketch them. However, this is inefficient since it ignores the multi-dimensional nature of tensors. In this paper, we propose a novel multi-dimensional tensor sketch (MTS) that preserves higher-order data structures while reducing dimensionality. We build this as an extension to the popular count sketch (CS) and show that it yields an unbiased estimator of the original tensor. We demonstrate significant advantages in compression ratios when the original data has decomposable tensor representations such as the Tucker, CP, tensor train or Kronecker product forms. We apply MTS to tensorized neural networks where we replace fully connected layers with tensor operations. We achieve near state-of-the-art accuracy with significant compression on image classification benchmarks.

    01/31/2019 ∙ by Yang Shi, et al.
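One way to read the idea in the abstract above is to hash each mode of the tensor separately instead of flattening it first. The NumPy sketch below does this for a 2-way tensor (a matrix) and averages the recovered entries over many independent hash draws to illustrate unbiasedness. It is a guess at the general shape of such a sketch under stated assumptions, not the paper's definition of MTS; all function names and sizes are hypothetical.

```python
import numpy as np

def mode_hashes(shape, sketch_shape, rng):
    """Draw one bucket hash and one sign hash per tensor mode."""
    hashes = [rng.integers(0, m, size=n) for n, m in zip(shape, sketch_shape)]
    signs = [rng.choice([-1.0, 1.0], size=n) for n in shape]
    return hashes, signs

def md_sketch(t, hashes, signs, sketch_shape):
    """Sketch a matrix: out[h1[i], h2[j]] += s1[i] * s2[j] * t[i, j]."""
    (h1, h2), (s1, s2) = hashes, signs
    out = np.zeros(sketch_shape)
    # Unbuffered add so entries that collide in a bucket accumulate.
    np.add.at(out, (h1[:, None], h2[None, :]), np.outer(s1, s2) * t)
    return out

def md_recover(sk, hashes, signs):
    """Estimate t[i, j] as s1[i] * s2[j] * sk[h1[i], h2[j]]."""
    (h1, h2), (s1, s2) = hashes, signs
    return np.outer(s1, s2) * sk[h1[:, None], h2[None, :]]

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(4, 5))
sketch_shape = (3, 3)

# Average the estimator over independent hash draws; the mean should approach t.
trials = 2000
acc = np.zeros_like(t)
for _ in range(trials):
    hs, ss = mode_hashes(t.shape, sketch_shape, rng)
    acc += md_recover(md_sketch(t, hs, ss, sketch_shape), hs, ss)
print(np.max(np.abs(acc / trials - t)))
```

A single draw is noisy because a 4x5 matrix is squeezed into 3x3 buckets; the averaging loop is there only to make the unbiasedness visible, and a real application would keep one sketch.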

  • Metaflow: A DAG-Based Network Abstraction for Distributed Applications

    In the past decade, a growing number of network scheduling techniques have been proposed to boost the performance of distributed applications. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows, yet they cannot capture the semantics of communication in a cluster application. To address this problem, coflow was proposed as a new network abstraction. However, it is insufficient to reveal the dependencies between computation and communication. As a result, the real application performance can be hurt, especially in the absence of hard barriers. Based on the computation DAG of the application, we propose an expressive abstraction, namely metaflow, that resides between the two extremes of flows and coflows. Evaluation results show that metaflow-based scheduling can outperform the coflow-based algorithm by 1.78x.

    01/17/2019 ∙ by Jiawei Fei, et al.

  • Visual Analytics of Anomalous User Behaviors: A Survey

    The increasing accessibility of data provides substantial opportunities for understanding user behaviors. Unearthing anomalies in user behaviors is of particular importance as it helps signal harmful incidents such as network intrusions, terrorist activities, and financial fraud. Many visual analytics methods have been proposed to help understand user behavior-related data in various application domains. In this work, we survey the state of the art in visual analytics of anomalous user behaviors and classify the methods into four categories: social interaction, travel, network communication, and transaction. We further examine the research works in each category in terms of data types, anomaly detection techniques, visualization techniques, and interaction methods. Finally, we discuss the findings and potential research directions.

    05/14/2019 ∙ by Yang Shi, et al.