Yuan-Ting Hu

is this you? claim profile


Graduate Student and Teaching Assistant at University of Illonois

  • Max-Sliced Wasserstein Distance and its use for GANs

    Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning. However, to model high-dimensional distributions, sequential training and stacked architectures are common, increasing the number of tunable hyper-parameters as well as the training time. Nonetheless, the sample complexity of the distance metrics remains one of the factors affecting GAN training. We first show that the recently proposed sliced Wasserstein distance has compelling sample complexity properties when compared to the Wasserstein distance. To further improve the sliced Wasserstein distance we then analyze its `projection complexity' and develop the max-sliced Wasserstein distance which enjoys compelling sample complexity while reducing projection complexity, albeit necessitating a max estimation. We finally illustrate that the proposed distance trains GANs on high-dimensional images up to a resolution of 256x256 easily.

    04/11/2019 ∙ by Ishan Deshpande, et al. ∙ 12 share

    read it

  • VideoMatch: Matching based Video Object Segmentation

    Video object segmentation is challenging yet important in a wide variety of applications for video analysis. Recent works formulate video object segmentation as a prediction task using deep nets to achieve appealing state-of-the-art performance. Due to the formulation as a prediction task, most of these methods require fine-tuning during test time, such that the deep nets memorize the appearance of the objects of interest in the given video. However, fine-tuning is time-consuming and computationally expensive, hence the algorithms are far from real time. To address this issue, we develop a novel matching based algorithm for video object segmentation. In contrast to memorization based classification techniques, the proposed approach learns to match extracted features to a provided template without memorizing the appearance of the objects. We validate the effectiveness and the robustness of the proposed method on the challenging DAVIS-16, DAVIS-17, Youtube-Objects and JumpCut datasets. Extensive results show that our method achieves comparable performance without fine-tuning and is much more favorable in terms of computational time.

    09/04/2018 ∙ by Yuan-Ting Hu, et al. ∙ 6 share

    read it

  • Descriptor Ensemble: An Unsupervised Approach to Descriptor Fusion in the Homography Space

    With the aim to improve the performance of feature matching, we present an unsupervised approach to fuse various local descriptors in the space of homographies. Inspired by the observation that the homographies of correct feature correspondences vary smoothly along the spatial domain, our approach stands on the unsupervised nature of feature matching, and can select a good descriptor for matching each feature point. Specifically, the homography space serves as the common domain, in which a correspondence obtained by any descriptor is considered as a point, for integrating various heterogeneous descriptors. Both geometric coherence and spatial continuity among correspondences are considered via computing their geodesic distances in the space. In this way, mutual verification across different descriptors is allowed, and correct correspondences will be highlighted with a high degree of consistency (i.e., short geodesic distances here). It follows that one-class SVM can be applied to identifying these correct correspondences, and boosts the performance of feature matching. The proposed approach is comprehensively compared with the state-of-the-art approaches, and evaluated on four benchmarks of image matching. The promising results manifest its effectiveness.

    12/13/2014 ∙ by Yuan-Ting Hu, et al. ∙ 0 share

    read it

  • MaskRNN: Instance Level Video Object Segmentation

    Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance -- a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent component and the localization component, our method is able to take advantage of long-term temporal structures of the video data as well as rejecting outliers. We validate the proposed algorithm on three challenging benchmark datasets, the DAVIS-2016 dataset, the DAVIS-2017 dataset, and the Segtrack v2 dataset, achieving state-of-the-art performance on all of them.

    03/29/2018 ∙ by Yuan-Ting Hu, et al. ∙ 0 share

    read it

  • Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation

    Unsupervised video segmentation plays an important role in a wide variety of applications from object identification to compression. However, to date, fast motion, motion blur and occlusions pose significant challenges. To address these challenges for unsupervised video segmentation, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and edge cues. Our approach leads to significantly better initial foreground-background estimates and their robust as well as accurate diffusion across time. We evaluate our proposed algorithm on the challenging DAVIS, SegTrack v2 and FBMS-59 datasets. Despite the usage of only a standard edge detector trained on 200 images, our method achieves state-of-the-art results outperforming deep learning based methods in the unsupervised setting. We even demonstrate competitive results comparable to deep learning based methods in the semi-supervised setting on the DAVIS dataset.

    09/04/2018 ∙ by Yuan-Ting Hu, et al. ∙ 0 share

    read it