Wei Yang

is this you? claim profile

0 followers

  • Hierarchical Macro Strategy Model for MOBA Game AI

    The next challenge of game AI lies in Real Time Strategy (RTS) games. RTS games provide partially observable gaming environments, where agents interact with one another in an action space much larger than that of GO. Mastering RTS games requires both strong macro strategies and delicate micro level execution. Recently, great progress has been made in micro level execution, while complete solutions for macro strategies are still lacking. In this paper, we propose a novel learning-based Hierarchical Macro Strategy model for mastering MOBA games, a sub-genre of RTS games. Trained by the Hierarchical Macro Strategy model, agents explicitly make macro strategy decisions and further guide their micro level execution. Moreover, each of the agents makes independent strategy decisions, while simultaneously communicating with the allies through leveraging a novel imitated cross-agent communication mechanism. We perform comprehensive evaluations on a popular 5v5 Multiplayer Online Battle Arena (MOBA) game. Our 5-AI team achieves a 48 teams which are ranked top 1

    12/19/2018 ∙ by Bin Wu, et al. ∙ 6 share

    read it

  • Identity-Aware Textual-Visual Matching with Latent Co-attention

    Textual-visual matching aims at measuring similarities between sentence descriptions and images. Most existing methods tackle this problem without effectively utilizing identity-level annotations. In this paper, we propose an identity-aware two-stage framework for the textual-visual matching problem. Our stage-1 CNN-LSTM network learns to embed cross-modal features with a novel Cross-Modal Cross-Entropy (CMCE) loss. The stage-1 network is able to efficiently screen easy incorrect matchings and also provide initial training point for the stage-2 training. The stage-2 CNN-LSTM network refines the matching results with a latent co-attention mechanism. The spatial attention relates each word with corresponding image regions while the latent semantic attention aligns different sentence structures to make the matching results more robust to sentence structure variations. Extensive experiments on three datasets with identity-level annotations show that our framework outperforms state-of-the-art approaches by large margins.

    08/07/2017 ∙ by Shuang Li, et al. ∙ 0 share

    read it

  • Learning Feature Pyramids for Human Pose Estimation

    Articulated human pose estimation is a fundamental yet challenging task in computer vision. The difficulty is particularly pronounced in scale variations of human body parts when camera view changes or severe foreshortening happens. Although pyramid methods are widely used to handle scale changes at inference time, learning feature pyramids in deep convolutional neural networks (DCNNs) is still not well explored. In this work, we design a Pyramid Residual Module (PRMs) to enhance the invariance in scales of DCNNs. Given input features, the PRMs learn convolutional filters on various scales of input features, which are obtained with different subsampling ratios in a multi-branch network. Moreover, we observe that it is inappropriate to adopt existing methods to initialize the weights of multi-branch networks, which achieve superior performance than plain networks in many tasks recently. Therefore, we provide theoretic derivation to extend the current weight initialization scheme to multi-branch network structures. We investigate our method on two standard benchmarks for human pose estimation. Our approach obtains state-of-the-art results on both benchmarks. Code is available at https://github.com/bearpaw/PyraNet.

    08/03/2017 ∙ by Wei Yang, et al. ∙ 0 share

    read it

  • Multi-Context Attention for Human Pose Estimation

    In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts. Hence our model has the ability to focus on different granularity from local salient regions to global semantic-consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive fields, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts.

    02/24/2017 ∙ by Xiao Chu, et al. ∙ 0 share

    read it

  • Resolving Scale Ambiguity Via XSlit Aspect Ratio Analysis

    In perspective cameras, images of a frontal-parallel 3D object preserve its aspect ratio invariant to its depth. Such an invariance is useful in photography but is unique to perspective projection. In this paper, we show that alternative non-perspective cameras such as the crossed-slit or XSlit cameras exhibit a different depth-dependent aspect ratio (DDAR) property that can be used to 3D recovery. We first conduct a comprehensive analysis to characterize DDAR, infer object depth from its AR, and model recoverable depth range, sensitivity, and error. We show that repeated shape patterns in real Manhattan World scenes can be used for 3D reconstruction using a single XSlit image. We also extend our analysis to model slopes of lines. Specifically, parallel 3D lines exhibit depth-dependent slopes (DDS) on their images which can also be used to infer their depths. We validate our analyses using real XSlit cameras, XSlit panoramas, and catadioptric mirrors. Experiments show that DDAR and DDS provide important depth cues and enable effective single-image scene reconstruction.

    06/14/2015 ∙ by Wei Yang, et al. ∙ 0 share

    read it

  • Data-Driven Scene Understanding with Adaptively Retrieved Exemplars

    This article investigates a data-driven approach for semantically scene understanding, without pixelwise annotation and classifier training. Our framework parses a target image with two steps: (i) retrieving its exemplars (i.e. references) from an image database, where all images are unsegmented but annotated with tags; (ii) recovering its pixel labels by propagating semantics from the references. We present a novel framework making the two steps mutually conditional and bootstrapped under the probabilistic Expectation-Maximization (EM) formulation. In the first step, the references are selected by jointly matching their appearances with the target as well as the semantics (i.e. the assigned labels of the target and the references). We process the second step via a combinatorial graphical representation, in which the vertices are superpixels extracted from the target and its selected references. Then we derive the potentials of assigning labels to one vertex of the target, which depend upon the graph edges that connect the vertex to its spatial neighbors of the target and to its similar vertices of the references. Besides, the proposed framework can be naturally applied to perform image annotation on new test images. In the experiments, we validate our approach on two public databases, and demonstrate superior performances over the state-of-the-art methods in both semantic segmentation and image annotation tasks.

    02/03/2015 ∙ by Xionghao Liu, et al. ∙ 0 share

    read it

  • Clothing Co-Parsing by Joint Image Segmentation and Labeling

    This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred as "image co-segmentation", iterates to extract consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM (E-SVM) technique [23]. In the second phase (i.e. "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices, and incorporate several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved using the efficient Graph Cuts algorithm. In addition to evaluate our framework on the Fashionista dataset [30], we construct a dataset called CCP consisting of 2098 high-resolution street fashion photos to demonstrate the performance of our system. We achieve 90.29 recognition rate on the Fashionista and the CCP datasets, respectively, which are superior compared with state-of-the-art methods.

    02/03/2015 ∙ by Wei Yang, et al. ∙ 0 share

    read it

  • Learning Contour-Fragment-based Shape Model with And-Or Tree Representation

    This paper proposes a simple yet effective method to learn the hierarchical object shape model consisting of local contour fragments, which represents a category of shapes in the form of an And-Or tree. This model extends the traditional hierarchical tree structures by introducing the "switch" variables (i.e. the or-nodes) that explicitly specify production rules to capture shape variations. We thus define the model with three layers: the leaf-nodes for detecting local contour fragments, the or-nodes specifying selection of leaf-nodes, and the root-node encoding the holistic distortion. In the training stage, for optimization of the And-Or tree learning, we extend the concave-convex procedure (CCCP) by embedding the structural clustering during the iterative learning steps. The inference of shape detection is consistent with the model optimization, which integrates the local testings via the leaf-nodes and or-nodes with the global verification via the root-node. The advantages of our approach are validated on the challenging shape databases (i.e., ETHZ and INRIA Horse) and summarized as follows. (1) The proposed method is able to accurately localize shape contours against unreliable edge detection and edge tracing. (2) The And-Or tree model enables us to well capture the intraclass variance.

    02/03/2015 ∙ by Liang Lin, et al. ∙ 0 share

    read it

  • Discriminatively Trained And-Or Graph Models for Object Shape Detection

    In this paper, we investigate a novel reconfigurable part-based model, namely And-Or graph model, to recognize object shapes in images. Our proposed model consists of four layers: leaf-nodes at the bottom are local classifiers for detecting contour fragments; or-nodes above the leaf-nodes function as the switches to activate their child leaf-nodes, making the model reconfigurable during inference; and-nodes in a higher layer capture holistic shape deformations; one root-node on the top, which is also an or-node, activates one of its child and-nodes to deal with large global variations (e.g. different poses and views). We propose a novel structural optimization algorithm to discriminatively train the And-Or model from weakly annotated data. This algorithm iteratively determines the model structures (e.g. the nodes and their layouts) along with the parameter learning. On several challenging datasets, our model demonstrates the effectiveness to perform robust shape-based object detection against background clutter and outperforms the other state-of-the-art approaches. We also release a new shape database with annotations, which includes more than 1500 challenging shape instances, for recognition and detection.

    02/02/2015 ∙ by Liang Lin, et al. ∙ 0 share

    read it

  • 3D Human Pose Estimation in the Wild by Adversarial Learning

    Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images because of the powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in the constrained lab environment, it is difficult to obtain the 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild is still a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild. We also observe that a carefully designed information source for the discriminator is essential to boost the performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves the performance compared with previous state-of-the-art approaches.

    03/26/2018 ∙ by Wei Yang, et al. ∙ 0 share

    read it

  • Evaluating Non-Motorized Transport Popularity of Urban Roads by Sports GPS Tracks

    Non-motorized transport is becoming increasingly important in urban development of cities in China. How to evaluate the non-motorized transport popularity of urban roads is an interesting question to study. The great amount of tracking data generated by smart mobile devices give us opportunities to solve this problem. This study aims to provide a data driven method for evaluating the popularity (walkability and bikeability) of urban non-motorized transport system. This paper defines a p-index to evaluate the popular degree of road segments which is based on the cycling, running, and walking GPS track data from outdoor activities logging applications. According to the p-index definition, this paper evaluates the non-motorized transport popularity of urban area in Wuhan city within different temporal periods.

    05/24/2018 ∙ by Wei Lu, et al. ∙ 0 share

    read it