Multi-Modality Deep Network for Extreme Learned Image Compression

04/26/2023
by Xuhao Jiang, et al.

Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but they suffer from blur and severe semantic loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We thoroughly study the role of text descriptions in different components of the codec and demonstrate their effectiveness. In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, show that our method obtains visually pleasing results at extremely low bitrates and achieves comparable or even better performance than state-of-the-art methods, even though those methods operate at 2x to 4x our bitrate.


research
05/04/2023

Multi-Modality Deep Network for JPEG Artifacts Reduction

In recent years, many convolutional neural network-based models are desi...
research
08/24/2020

Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks

We propose a GAN-based image compression method working at extremely low...
research
12/08/2018

Semantically-Aware Attentive Neural Embeddings for Image-based Visual Localization

We present a novel method for fusing appearance and semantic information...
research
04/09/2018

Generative Adversarial Networks for Extreme Learned Image Compression

We propose a framework for extreme learned image compression based on Ge...
research
03/27/2020

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Scene text image contains two levels of contents: visual texture and sem...
research
05/04/2023

Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

The popular VQ-VAE models reconstruct images through learning a discrete...
research
12/15/2022

Text-guided mask-free local image retouching

In the realm of multi-modality, text-guided image retouching techniques ...
