SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

03/16/2023
by Zipeng Xu, et al.

Contrastive Language-Image Pre-Training (CLIP) has refreshed the state of the art across a broad range of vision-language cross-modal tasks. In particular, it has opened an intriguing research line of text-guided image style transfer, dispensing with the style reference images required by traditional style transfer methods. However, directly using CLIP to guide the transfer of style leads to undesirable artifacts (mainly written words and unrelated visual entities) spread over the image, partly due to the entanglement of visual and written concepts inherent in CLIP. Inspired by the use of spectral analysis in filtering linguistic information at different granular levels, we analyse the patch embeddings from the last layer of the CLIP vision encoder from a spectral perspective and find that the presence of undesirable artifacts is highly correlated with certain frequency components. We propose SpectralCLIP, which implements a spectral filtering layer on top of the CLIP vision encoder, to alleviate the artifact issue. Experimental results show that SpectralCLIP effectively prevents the generation of artifacts, in both quantitative and qualitative terms, without impairing the stylisation quality. We further apply SpectralCLIP to text-conditioned image generation and show that it prevents written words from appearing in the generated images. Code is available at https://github.com/zipengxuc/SpectralCLIP.
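The filtering idea described in the abstract can be illustrated with a minimal sketch: apply a Fourier transform along the patch (sequence) axis of the encoder's output, suppress a chosen frequency band, and transform back. This is a hypothetical band-stop filter in NumPy, not the paper's actual implementation; the function name, the input shape (a 7x7 patch grid with 512-dimensional embeddings), and the choice of stop band are all assumptions for illustration.

```python
import numpy as np

def spectral_filter(patch_embeddings, stop_bands):
    """Band-stop filter applied along the patch (sequence) axis.

    patch_embeddings: (num_patches, dim) array, e.g. patch tokens from
        the last layer of a CLIP vision encoder (hypothetical input).
    stop_bands: list of (lo, hi) normalised-frequency ranges to zero out.
    """
    # Per-dimension spectrum over the patch sequence.
    spec = np.fft.fft(patch_embeddings, axis=0)
    freqs = np.fft.fftfreq(patch_embeddings.shape[0])

    # Build a mask that zeroes the selected frequency bands.
    mask = np.ones_like(freqs)
    for lo, hi in stop_bands:
        mask[(np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)] = 0.0

    # Filter and return to the spatial (patch) domain.
    return np.fft.ifft(spec * mask[:, None], axis=0).real

# Usage: suppress a mid-frequency band on toy embeddings.
x = np.random.randn(49, 512)            # 7x7 patches, 512-dim each
y = spectral_filter(x, [(0.2, 0.3)])    # same shape, band removed
```

Because the DC bin (frequency 0) is outside the stop band here, the per-dimension mean of the embeddings is preserved; only the selected oscillatory components along the patch sequence are removed.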


Related research

10/07/2022
FastCLIPStyler: Towards fast text-based image style transfer using style representation
Artistic style transfer is usually performed between two images, a style...

05/19/2022
Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning
In this work, we tackle the challenging problem of arbitrary image style...

03/09/2023
A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning
We present Unified Contrastive Arbitrary Style Transfer (UCAST), a novel...

08/27/2022
AesUST: Towards Aesthetic-Enhanced Universal Style Transfer
Recent studies have shown remarkable success in universal style transfer...

10/20/2022
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition
Creation of 3D content by stylization is a promising yet challenging pro...

04/10/2023
ITportrait: Image-Text Coupled 3D Portrait Domain Adaptation
Domain adaptation of 3D portraits has gained more and more attention. Ho...

08/16/2022
Language-guided Semantic Style Transfer of 3D Indoor Scenes
We address the new problem of language-guided semantic style transfer of...
