Hao Zhou

is this you? claim profile

0

PhD student, Department of Computer Science at The University of Maryland

  • A Deep Cascade Network for Unaligned Face Attribute Classification

    Humans focus attention on different face regions when recognizing face attributes. Most existing face attribute classification methods use the whole image as input. Moreover, some of these methods rely on fiducial landmarks to provide defined face parts. In this paper, we propose a cascade network that simultaneously learns to localize face regions specific to attributes and performs attribute classification without alignment. First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes. Then multiple part-based networks and a whole-image-based network are separately constructed and combined together by the region switch layer and attribute relation layer for final attribute classification. A multi-net learning method and hint-based model compression is further proposed to get an effective localization model and a compact classification model, respectively. Our approach achieves significantly better performance than state-of-the-art methods on unaligned CelebA dataset, reducing the classification error by 30.9

    09/12/2017 ∙ by Hui Ding, et al. ∙ 0 share

    read it

  • Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images

    Lighting estimation from face images is an important task and has applications in many areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image. Lacking massive ground truth lighting labels for face images in the wild, we use an existing method to estimate lighting parameters, which are treated as ground truth with unknown noises. To alleviate the effect of such noises, we utilize the idea of Generative Adversarial Networks (GAN) and propose a Label Denoising Adversarial Network (LDAN) to make use of synthetic data with accurate ground truth to help train a deep CNN for lighting regression on real face images. Experiments show that our network outperforms existing methods in producing consistent lighting parameters of different faces under similar lighting conditions. Moreover, our method is 100,000 times faster in execution time than prior optimization-based lighting estimation approaches.

    09/06/2017 ∙ by Hao Zhou, et al. ∙ 0 share

    read it

  • Solving Uncalibrated Photometric Stereo Using Fewer Images by Jointly Optimizing Low-rank Matrix Completion and Integrability

    We introduce a new, integrated approach to uncalibrated photometric stereo. We perform 3D reconstruction of Lambertian objects using multiple images produced by unknown, directional light sources. We show how to formulate a single optimization that includes rank and integrability constraints, allowing also for missing data. We then solve this optimization using the Alternate Direction Method of Multipliers (ADMM). We conduct extensive experimental evaluation on real and synthetic data sets. Our integrated approach is particularly valuable when performing photometric stereo using as few as 4-6 images, since the integrability constraint is capable of improving estimation of the linear subspace of possible solutions. We show good improvements over prior work in these cases.

    02/02/2017 ∙ by Soumyadip Sengupta, et al. ∙ 0 share

    read it

  • Dynamic Oracle for Neural Machine Translation in Decoding Phase

    The past several years have witnessed the rapid progress of end-to-end Neural Machine Translation (NMT). However, there exists discrepancy between training and inference in NMT when decoding, which may lead to serious problems since the model might be in a part of the state space it has never seen during training. To address the issue, Scheduled Sampling has been proposed. However, there are certain limitations in Scheduled Sampling and we propose two dynamic oracle-based methods to improve it. We manage to mitigate the discrepancy by changing the training process towards a less guided scheme and meanwhile aggregating the oracle's demonstrations. Experimental results show that the proposed approaches improve translation quality over standard NMT system.

    09/19/2017 ∙ by Zi-Yi Dou, et al. ∙ 0 share

    read it

  • Augmenting End-to-End Dialog Systems with Commonsense Knowledge

    Building dialog agents that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human responses in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialog. Our model represents the first attempt to integrating a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose the Tri-LSTM model to jointly take into account message and commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts in automatic evaluation.

    09/16/2017 ∙ by Tom Young, et al. ∙ 0 share

    read it

  • Chunk-Based Bi-Scale Decoder for Neural Machine Translation

    In typical neural machine translation (NMT), the decoder generates a sentence word by word, packing all linguistic granularities in the same time-scale of RNN. In this paper, we propose a new type of decoder for NMT, which splits the decode state into two parts and updates them in two different time-scales. Specifically, we first predict a chunk time-scale state for phrasal modeling, on top of which multiple word time-scale states are generated. In this way, the target sentence is translated hierarchically from chunks to words, with information in different granularities being leveraged. Experiments show that our proposed model significantly improves the translation performance over the state-of-the-art NMT model.

    05/03/2017 ∙ by Hao Zhou, et al. ∙ 0 share

    read it

  • Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

    Perception and expression of emotion are key factors to the success of dialogue systems or conversational agents. However, this problem has not been studied in large-scale conversation generation so far. In this paper, we propose Emotional Chatting Machine (ECM) that can generate appropriate responses not only in content (relevant and grammatical) but also in emotion (emotionally consistent). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor using three new mechanisms that respectively (1) models the high-level abstraction of emotion expressions by embedding emotion categories, (2) captures the change of implicit internal emotion states, and (3) uses explicit emotion expressions with an external emotion vocabulary. Experiments show that the proposed model can generate responses appropriate not only in content but also in emotion.

    04/04/2017 ∙ by Hao Zhou, et al. ∙ 0 share

    read it

  • Strongly Secure and Efficient Data Shuffle On Hardware Enclaves

    Mitigating memory-access attacks on the Intel SGX architecture is an important and open research problem. A natural notion of the mitigation is cache-miss obliviousness which requires the cache-misses emitted during an enclave execution are oblivious to sensitive data. This work realizes the cache-miss obliviousness for the computation of data shuffling. The proposed approach is to software-engineer the oblivious algorithm of Melbourne shuffle on the Intel SGX/TSX architecture, where the Transaction Synchronization eXtension (TSX) is (ab)used to detect the occurrence of cache misses. In the system building, we propose software techniques to prefetch memory data prior to the TSX transaction to defend the physical bus-tapping attacks. Our evaluation based on real implementation shows that our system achieves superior performance and lower transaction abort rate than the related work in the existing literature.

    11/12/2017 ∙ by Ju Chen, et al. ∙ 0 share

    read it

  • Modeling Past and Future for Neural Machine Translation

    Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which offers NMT systems the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves translation performance in Chinese-English, German-English and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in both of the translation quality and the alignment error rate.

    11/27/2017 ∙ by Zaixiang Zheng, et al. ∙ 0 share

    read it

  • Non-parametric sparse additive auto-regressive network models

    Consider a multi-variate time series (X_t)_t=0^T where X_t ∈R^d which may represent spike train responses for multiple neurons in a brain, crime event data across multiple regions, and many others. An important challenge associated with these time series models is to estimate an influence network between the d variables, especially when the number of variables d is large meaning we are in the high-dimensional setting. Prior work has focused on parametric vector auto-regressive models. However, parametric approaches are somewhat restrictive in practice. In this paper, we use the non-parametric sparse additive model (SpAM) framework to address this challenge. Using a combination of β and ϕ-mixing properties of Markov chains and empirical process techniques for reproducing kernel Hilbert spaces (RKHSs), we provide upper bounds on mean-squared error in terms of the sparsity s, logarithm of the dimension d, number of time points T, and the smoothness of the RKHSs. Our rates are sharp up to logarithm factors in many cases. We also provide numerical experiments that support our theoretical results and display potential advantages of using our non-parametric SpAM framework for a Chicago crime dataset.

    01/23/2018 ∙ by Hao Zhou, et al. ∙ 0 share

    read it

  • Why Do Neural Dialog Systems Generate Short and Meaningless Replies? A Comparison between Dialog and Translation

    This paper addresses the question: Why do neural dialog systems generate short and meaningless replies? We conjecture that, in a dialog system, an utterance may have multiple equally plausible replies, causing the deficiency of neural networks in the dialog application. We propose a systematic way to mimic the dialog scenario in a machine translation system, and manage to reproduce the phenomenon of generating short and less meaningful sentences in the translation setting, showing evidence of our conjecture.

    12/06/2017 ∙ by Bolin Wei, et al. ∙ 0 share

    read it