Zhiwu Huang

is this you? claim profile


Postdoctoral Researcher at Computer Vision Lab, ETH Zurich

  • Sliced Wasserstein Generative Models

    In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as it is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks---Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate the state-of-the-art performance on challenging high resolution image and video generation in an unsupervised manner.

    04/10/2019 ∙ by Jiqing Wu, et al. ∙ 46 share

    read it

  • Energy-relaxed Wassertein GANs(EnergyWGAN): Towards More Stable and High Resolution Image Generation

    Recently, generative adversarial networks (GANs) have achieved great impacts on a broad number of applications, including low resolution(LR) image synthesis. However, they suffer from unstable training especially when image resolution increases. To overcome this bottleneck, this paper generalizes the state-of-the-art Wasserstein GANs (WGANs) to an energy-relaxed objective which enables more stable and higher-resolution image generation. The benefits of this generalization can be summarized in three main points. Firstly, the proposed EnergyWGAN objective guarantees a valid symmetric divergence serving as a more rigorous and meaningful quantitative measure. Secondly, EnergyWGAN is capable of searching a more faithful solution space than the original WGANs without fixing a specific k-Lipschitz constraint. Finally, the proposed EnergyWGAN offers a natural way of stacking GANs for high resolution image generation. In our experiments we not only demonstrate the stable training ability of the proposed EnergyWGAN and its better image generation results on standard benchmark datasets, but also show the advantages over the state-of-the-art GANs on a real-world high resolution image dataset.

    12/04/2017 ∙ by Jiqing Wu, et al. ∙ 0 share

    read it

  • Face Translation between Images and Videos using Identity-aware CycleGAN

    This paper presents a new problem of unpaired face translation between images and videos, which can be applied to facial video prediction and enhancement. In this problem there exist two major technical challenges: 1) designing a robust translation model between static images and dynamic videos, and 2) preserving facial identity during image-video translation. To address such two problems, we generalize the state-of-the-art image-to-image translation network (Cycle-Consistent Adversarial Networks) to the image-to-video/video-to-image translation context by exploiting a image-video translation model and an identity preservation model. In particular, we apply the state-of-the-art Wasserstein GAN technique to the setting of image-video translation for better convergence, and we meanwhile introduce a face verificator to ensure the identity. Experiments on standard image/video face datasets demonstrate the effectiveness of the proposed model in both terms of qualitative and quantitative evaluations.

    12/04/2017 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it

  • Towards an Understanding of Our World by GANing Videos in the Wild

    Existing generative video models work well only for videos with a static background. For dynamic scenes, applications of these models demand an extra pre-processing step of background stabilization. In fact, the task of background stabilization may very often prove impossible for videos in the wild. To the best of our knowledge, we present the first video generation framework that works in the wild, without making any assumption on the videos' content. This allows us to avoid the background stabilization step, completely. The proposed method also outperforms the state-of-the-art methods even when the static background assumption is valid. This is achieved by designing a robust one-stream video generation architecture by exploiting Wasserstein GAN frameworks for better convergence. Since the proposed architecture is one-stream, which does not formally distinguish between fore- and background, it can generate - and learn from - videos with dynamic backgrounds. The superiority of our model is demonstrated by successfully applying it to three challenging problems: video colorization, video inpainting, and future prediction.

    11/30/2017 ∙ by Bernhard Kratzwald, et al. ∙ 0 share

    read it

  • Generative Autotransporters

    In this paper, we aim to introduce the classic Optimal Transport theory to enhance deep generative probabilistic modeling. For this purpose, we design a Generative Autotransporter (GAT) model with explicit distribution optimal transport. Particularly, the GAT model owns a deep distribution transporter to transfer the target distribution to a specific prior probability distribution, which enables a regular decoder to generate target samples from the input data that follows the transported prior distribution. With such a design, the GAT model can be stably trained to generate novel data by merely using a very simple l_1 reconstruction loss function with a generalized manifold-based Adam training algorithm. The experiments on two standard benchmarks demonstrate its strong generation ability.

    06/08/2017 ∙ by Jiqing Wu, et al. ∙ 0 share

    read it

  • On the Relation between Color Image Denoising and Classification

    Large amount of image denoising literature focuses on single channel images and often experimentally validates the proposed methods on tens of images at most. In this paper, we investigate the interaction between denoising and classification on large scale dataset. Inspired by classification models, we propose a novel deep learning architecture for color (multichannel) image denoising and report on thousands of images from ImageNet dataset as well as commonly used imagery. We study the importance of (sufficient) training data, how semantic class information can be traded for improved denoising results. As a result, our method greatly improves PSNR performance by 0.34 - 0.51 dB on average over state-of-the art methods on large scale dataset. We conclude that it is beneficial to incorporate in classification models. On the other hand, we also study how noise affect classification performance. In the end, we come to a number of interesting conclusions, some being counter-intuitive.

    04/05/2017 ∙ by Jiqing Wu, et al. ∙ 0 share

    read it

  • Deep Learning on Lie Groups for Skeleton-based Action Recognition

    In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

    12/18/2016 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it

  • Building Deep Networks on Grassmann Manifolds

    Learning representations on Grassmann manifolds is popular in quite a few visual recognition tasks. In order to enable deep learning on Grassmann manifolds, this paper proposes a deep network architecture by generalizing the Euclidean network paradigm to Grassmann manifolds. In particular, we design full rank mapping layers to transform input Grassmannian data to more desirable ones, exploit re-orthonormalization layers to normalize the resulting matrices, study projection pooling layers to reduce the model complexity in the Grassmannian context, and devise projection mapping layers to respect Grassmannian geometry and meanwhile achieve Euclidean forms for regular output layers. To train the Grassmann networks, we exploit a stochastic gradient descent setting on manifolds of the connection weights, and study a matrix generalization of backpropagation to update the structured data. The evaluations on three visual recognition tasks show that our Grassmann networks have clear advantages over existing Grassmann learning methods, and achieve results comparable with state-of-the-art approaches.

    11/17/2016 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it

  • Geometry-aware Similarity Learning on SPD Manifolds for Visual Recognition

    Symmetric Positive Definite (SPD) matrices have been widely used for data representation in many visual recognition tasks. The success mainly attributes to learning discriminative SPD matrices with encoding the Riemannian geometry of the underlying SPD manifold. In this paper, we propose a geometry-aware SPD similarity learning (SPDSL) framework to learn discriminative SPD features by directly pursuing manifold-manifold transformation matrix of column full-rank. Specifically, by exploiting the Riemannian geometry of the manifold of fixed-rank Positive Semidefinite (PSD) matrices, we present a new solution to reduce optimizing over the space of column full-rank transformation matrices to optimizing on the PSD manifold which has a well-established Riemannian structure. Under this solution, we exploit a new supervised SPD similarity learning technique to learn the transformation by regressing the similarities of selected SPD data pairs to their ground-truth similarities on the target SPD manifold. To optimize the proposed objective function, we further derive an algorithm on the PSD manifold. Evaluations on three visual classification tasks show the advantages of the proposed approach over the existing SPD-based discriminant learning methods.

    08/17/2016 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it

  • A Riemannian Network for SPD Matrix Learning

    Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting Riemannian geometry of underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of SPD matrix non-linear learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices to more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. For training the proposed deep network, we exploit a new backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be simply trained and outperform existing SPD matrix learning and state-of-the-art methods in three typical visual classification tasks.

    08/15/2016 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it

  • Cross Euclidean-to-Riemannian Metric Learning with Application to Face Recognition from Video

    Riemannian manifolds have been widely employed for video representations in visual classification tasks including video-based face recognition. The success mainly derives from learning a discriminant Riemannian metric which encodes the non-linear geometry of the underlying Riemannian manifolds. In this paper, we propose a novel metric learning framework to learn a distance metric across a Euclidean space and a Riemannian manifold to fuse the average appearance and pattern variation of faces within one video. The proposed metric learning framework can handle three typical tasks of video-based face recognition: Video-to-Still, Still-to-Video and Video-to-Video settings. To accomplish this new framework, by exploiting typical Riemannian geometries for kernel embedding, we map the source Euclidean space and Riemannian manifold into a common Euclidean subspace, each through a corresponding high-dimensional Reproducing Kernel Hilbert Space (RKHS). With this mapping, the problem of learning a cross-view metric between the two source heterogeneous spaces can be expressed as learning a single-view Euclidean distance metric in the target common Euclidean space. By learning information on heterogeneous data with the shared label, the discriminant metric in the common space improves face recognition from videos. Extensive experiments on four challenging video face databases demonstrate that the proposed framework has a clear advantage over the state-of-the-art methods in the three classical video-based face recognition tasks.

    08/15/2016 ∙ by Zhiwu Huang, et al. ∙ 0 share

    read it