Fei Wang


Assistant Professor at Weill Cornell Medicine

  • Person-in-WiFi: Fine-grained Person Perception using WiFi

    Fine-grained person perception such as body segmentation and pose estimation has been achieved with many 2D and 3D sensors such as RGB/depth cameras, radars (e.g., RF-Pose) and LiDARs. These sensors capture 2D pixels or 3D point clouds of person bodies with high spatial resolution, such that existing Convolutional Neural Networks can be directly applied for perception. In this paper, we take one step forward to show that fine-grained person perception is possible even with 1D sensors: WiFi antennas. To our knowledge, this is the first work to perceive persons with pervasive WiFi devices, which are cheaper and more power-efficient than radars and LiDARs, invariant to illumination, and raise fewer privacy concerns than cameras. We used two sets of off-the-shelf WiFi antennas to acquire signals, i.e., one transmitter set and one receiver set. Each set contains three antennas lined up as in a regular household WiFi router. The WiFi signal generated by a transmitter antenna penetrates through and reflects off human bodies, furniture and walls, and then superposes at a receiver antenna as a 1D signal sample (instead of 2D pixels or 3D point clouds). We developed a deep learning approach that uses annotations on 2D images, takes the received 1D WiFi signals as inputs, and performs body segmentation and pose estimation in an end-to-end manner. Experimental results on over 100,000 frames under 16 indoor scenes demonstrate that Person-in-WiFi achieved person perception comparable to approaches using 2D images.

    03/30/2019 ∙ by Fei Wang, et al.

  • SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection

    Deep convolutional neural networks have significantly boosted the capability of salient object detection in handling large variations of scenes and object appearances. However, convolution operations seek to generate strong responses on individual pixels, but lack the ability to maintain the spatial structure of objects. Moreover, down-sampling operations, such as pooling and striding, lose spatial details of the salient objects. In this paper, we propose a simple yet effective Siamese Edge-Enhancement Network (SE2Net) to preserve the edge structure for salient object detection. Specifically, a novel multi-stage siamese network is built to aggregate the low-level and high-level features and to estimate the salient maps of edges and regions in parallel. As a result, the predicted regions become more accurate by enhancing the responses at edges, and the predicted edges become more semantic by suppressing false positives in the background. After the refined salient maps of edges and regions are produced by SE2Net, an edge-guided inference algorithm is designed to further improve the resulting salient masks along the predicted edges. Extensive experiments on several benchmark datasets show that our method is superior to the state-of-the-art approaches.

    03/29/2019 ∙ by Sanping Zhou, et al.

  • Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

    Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units, which are helpful in storing information in sequential contexts. However, when dealing with high-dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and huge computational cost. This makes the training of RNNs unscalable and difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which uses low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and provides a fundamental building block for RNNs in handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art competitors.

    11/19/2018 ∙ by Yu Pan, et al.

  • Non-technical Loss Detection with Statistical Profile Images Based on Semi-supervised Learning

    To keep track of the operational state of the power grid, the world's largest sensor system, the smart grid, was built by deploying hundreds of millions of smart meters. Such a system makes it possible to discover and respond quickly to any hidden threat to the entire power grid. Non-technical losses (NTLs) have always been a major concern because of the resulting security risks as well as immeasurable revenue loss. However, the various causes of NTL may have different characteristics reflected in the data. As a result, accurately capturing these anomalies in such a large volume of collected data records is rather tricky. In this paper, we propose a new methodology for detecting abnormal electricity consumption. We transform the collected time-series data into an image representation that reflects users' relatively long-term consumption behaviors. Inspired by the neural network architectures used for object detection in the computer vision domain, we designed a deep learning model that takes the transformed images as input and yields joint features inferred from the multiple aspects the input provides. Considering the limited labeled samples, especially the abnormal ones, we used our model in a semi-supervised fashion, following approaches introduced in recent years. The model is tested on samples that were verified by on-field inspections, and our method showed significant improvement.

    07/09/2019 ∙ by Jiangteng Li, et al.

  • The Devil of Face Recognition is in the Noise

    The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, and the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. The IMDb-Face dataset has been released at https://github.com/fwang91/IMDb-Face.

    07/31/2018 ∙ by Fei Wang, et al.

  • CSI-Net: Unified Human Body Characterization and Action Recognition

    Channel State Information (CSI) of WiFi signals has become increasingly attractive in human sensing applications due to the pervasiveness of WiFi, its robustness to illumination and viewpoints, and its low privacy concern compared to cameras. In the majority of existing works, CSI sequences are analyzed by traditional signal processing approaches. These approaches rely on strictly imposed assumptions about the propagation paths, reflection and attenuation of signals interacting with human bodies and the indoor background. This makes it very difficult for existing approaches to model delicate body characteristics and activities in real applications. To address these issues, we build CSI-Net, a unified Deep Neural Network (DNN) that fully utilizes the strength of deep feature representation and the power of existing DNN architectures for CSI-based human sensing problems. Using CSI-Net, we jointly solved two body characterization problems: biometrics estimation (including body fat, muscle, water and bone rates) and human identification. We also demonstrated the application of CSI-Net to two distinct action recognition tasks: hand sign recognition (fine-scale action of the hand) and fall detection (coarse-scale motion of the body). Besides the technical contribution of CSI-Net, we present major discoveries and insights on how multi-frequency CSI signals are encoded and processed in DNNs, which, to the best of our knowledge, is the first attempt to bridge WiFi sensing and deep learning in human sensing problems.

    10/07/2018 ∙ by Fei Wang, et al.

  • Joint Multi-frame Detection and Segmentation for Multi-cell Tracking

    Tracking living cells in video sequences is difficult because of cell morphology and high similarities between cells. Tracking-by-detection methods are widely used in multi-cell tracking. We perform multi-cell tracking based on cell centroid detection, and the performance of the detector has a high impact on tracking performance. In this paper, UNet is utilized to extract inter-frame and intra-frame spatio-temporal information of cells. Detection performance for cells in the mitotic phase is improved by multi-frame input. Good detection results facilitate multi-cell tracking. A mitosis detection algorithm is proposed to detect cell mitosis, and the cell lineage is built up. Another UNet is utilized to acquire a primary segmentation. By jointly using detection and primary segmentation, cells can be finely segmented in highly dense cell populations. Experiments are conducted to evaluate the effectiveness of our method, and results show its state-of-the-art performance.

    06/26/2019 ∙ by Zibin Zhou, et al.

  • Heuristic Search for Structural Constraints in Data Association

    Research on multi-object tracking (MOT) essentially aims to solve the data association assignment, the core of which is to design an association cost that is as discriminative as possible. Generally speaking, the match ambiguities caused by similar appearances of objects and by moving cameras make the data association perplexing and challenging. In this paper, we propose a new heuristic method to search for structural constraints (HSSC) of multiple targets when solving the problem of online multi-object tracking. We believe that the internal structure among multiple targets in adjacent frames can remain constant and stable even when the video sequences are captured by a moving camera. As a result, the structural constraints are able to cut down the match ambiguities caused by moving cameras as well as by similar appearances of the tracked objects. The proposed heuristic method aims to obtain a maximum match set under the minimum structural cost for each available match pair, which can be integrated with the raw association costs to make them more elaborate and discriminative compared with other approaches. In addition, this paper presents a new method to recover missing targets by minimizing a cost function generated from both motion and structure cues. Our online multi-object tracking (MOT) algorithm based on HSSC has achieved a multi-object tracking accuracy (MOTA) of 25.0 on the public dataset 2DMOT2015[1].

    11/08/2017 ∙ by Xiao Zhou, et al.

  • GaDei: On Scale-up Training As A Service For Deep Learning

    Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. The unique challenge of TaaS is that it must satisfy a wide range of customers who have neither the experience nor the resources to tune DL hyper-parameters, and meticulous tuning for each user's dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that are applicable to all users. The IBM Watson Natural Language Classifier (NLC) service, the most popular IBM cognitive service, used by thousands of enterprise-level clients around the globe, is a typical TaaS service. By evaluating the NLC workloads, we show that only a conservative hyper-parameter setup (e.g., small mini-batch size and small learning rate) can guarantee acceptable model accuracy for a wide range of customers. We further justify theoretically why such a setup guarantees better model convergence in general. Unfortunately, the small mini-batch size causes a high volume of communication traffic in a parameter-server based system. We characterize the high communication bandwidth requirement of TaaS using representative industrial deep learning workloads and demonstrate that none of the state-of-the-art scale-up or scale-out solutions can satisfy such a requirement. We then present GaDei, an optimized shared-memory based scale-up parameter server design. We prove that the designed protocol is deadlock-free and that it processes each gradient exactly once. Our implementation is evaluated on both commercial and public benchmarks to demonstrate that it significantly outperforms the state-of-the-art parameter-server based implementation while maintaining the required accuracy, and that it comes close to the best possible runtime performance, constrained only by the hardware limitation. Furthermore, to the best of our knowledge, GaDei is the only scale-up DL system that provides fault-tolerance.

    11/18/2016 ∙ by Wei Zhang, et al.

  • Google Map Aided Visual Navigation for UAVs in GPS-denied Environment

    We propose a framework for Google Map aided UAV navigation in GPS-denied environments. Geo-referenced navigation provides drift-free localization and does not require loop closures. The UAV position is initialized via correlation, which is simple and efficient. We then use optical flow to predict its position in subsequent frames. During pose tracking, we obtain inter-frame translation either by motion field or homography decomposition, and we use HOG features for registration on Google Map. We employ a particle filter to conduct a coarse-to-fine search to localize the UAV. Offline tests using aerial images collected by our quadrotor platform show promising results, as our approach eliminates the drift in dead-reckoning, and the small localization error indicates the superiority of our approach as a supplement to GPS.

    03/29/2017 ∙ by Mo Shan, et al.

  • Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study

    This paper presents Rudra, a parameter server based distributed computing framework tuned for training large-scale deep neural networks. Using variants of the asynchronous stochastic gradient descent algorithm we study the impact of synchronization protocol, stale gradient updates, minibatch size, learning rates, and number of learners on runtime performance and model accuracy. We introduce a new learning rate modulation strategy to counter the effect of stale gradients and propose a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Our empirical investigation reveals a principled approach for distributed training of neural networks: the mini-batch size per learner should be reduced as more learners are added to the system to preserve the model accuracy. We validate this approach using commonly-used image classification benchmarks: CIFAR10 and ImageNet.

    09/14/2015 ∙ by Suyog Gupta, et al.