Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

02/24/2022
by Dacheng Yin, et al.

This paper addresses the unsupervised learning of content-style decomposed representations. We first give a definition of style and then model the content-style representation as a token-level bipartite graph. An unsupervised framework, named Retriever, is proposed to learn such representations. First, a cross-attention module is employed to retrieve permutation-invariant (P.I.) information, defined as style, from the input data. Second, a vector quantization (VQ) module is used, together with human-designed constraints, to produce interpretable content tokens. Last, a novel link attention module serves as the decoder, reconstructing the data from the decomposed content and style with the help of the linking keys. Being modality-agnostic, the proposed Retriever is evaluated in both the speech and image domains. Its state-of-the-art zero-shot voice conversion performance confirms the disentangling ability of the framework. Top performance is also achieved on the part discovery task for images, verifying the interpretability of the representation. In addition, the quality of its part-based style transfer demonstrates the potential of Retriever to support a variety of generative tasks. Project page at https://ydcustc.github.io/retriever-demo/.
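To make the permutation-invariance idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy dimensions, and the omission of learned key/value projections are all assumptions made for illustration. It shows two things suggested by the abstract: a set of fixed queries that attend over input tokens yields the same output when the tokens are shuffled (the "style" side), while nearest-neighbour vector quantization assigns one discrete index per token and therefore follows the token order (the "content" side).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Each query attends over all tokens.

    Assumption: tokens serve directly as keys and values (no learned
    projections), which keeps the sketch short.
    """
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, t) / scale for t in tokens])
        outputs.append(
            [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(len(q))]
        )
    return outputs


def vector_quantize(tokens, codebook):
    """Return, per token, the index of the nearest codebook entry."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(t, codebook[k]))
        for t in tokens
    ]


# Toy data; all sizes are hypothetical.
rng = random.Random(0)
dim, n_tokens, n_queries, n_codes = 4, 6, 3, 5


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


tokens = [rand_vec() for _ in range(n_tokens)]
style_queries = [rand_vec() for _ in range(n_queries)]
codebook = [rand_vec() for _ in range(n_codes)]

# Shuffle the input tokens.
order = list(range(n_tokens))
rng.shuffle(order)
shuffled = [tokens[i] for i in order]

# Style: unchanged (up to rounding) under the shuffle.
style = cross_attention(style_queries, tokens)
style_shuffled = cross_attention(style_queries, shuffled)
style_is_invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(style, style_shuffled)
    for a, b in zip(row_a, row_b)
)

# Content: the discrete indices are permuted along with the tokens.
content_ids = vector_quantize(tokens, codebook)
content_ids_shuffled = vector_quantize(shuffled, codebook)
content_follows_order = content_ids_shuffled == [content_ids[i] for i in order]
```

In this sketch `style_is_invariant` and `content_follows_order` both evaluate to true, which mirrors the split described above: the attention output depends only on the set of tokens, whereas the quantized indices retain their positions.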


research
08/01/2021

Enhancing Content Preservation in Text Style Transfer Using Reverse Attention and Conditional Layer Normalization

Text style transfer aims to alter the style (e.g., sentiment) of a sente...
research
05/27/2020

Arbitrary Style Transfer via Multi-Adaptation Network

Arbitrary style transfer is a significant topic with both research value...
research
06/12/2023

MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer

Unsupervised text style transfer task aims to rewrite a text into target...
research
06/16/2021

Global Rhythm Style Transfer Without Text Transcriptions

Prosody plays an important role in characterizing the style of a speaker...
research
02/21/2021

Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement

Content and style (C-S) disentanglement intends to decompose the underly...
research
04/23/2020

Unsupervised Speech Decomposition via Triple Information Bottleneck

Speech information can be roughly decomposed into four components: langu...
research
08/27/2020

Metrics for Exposing the Biases of Content-Style Disentanglement

Recent state-of-the-art semi- and un-supervised solutions for challengin...
