WITT: A Wireless Image Transmission Transformer for Semantic Communications

11/02/2022
by Ke Yang, et al.

In this paper, we redesign the vision Transformer (ViT) as a new backbone for semantic image transmission, termed the wireless image transmission transformer (WITT). Previous works build upon convolutional neural networks (CNNs), which are inefficient at capturing global dependencies, resulting in degraded end-to-end transmission performance, especially for high-resolution images. To tackle this, the proposed WITT employs Swin Transformers as a more capable backbone to extract long-range information. Unlike ViTs in image classification tasks, WITT is highly optimized for image transmission while considering the effect of the wireless channel. Specifically, we propose a spatial modulation module that scales the latent representations according to channel state information, which enhances the ability of a single model to deal with various channel conditions. Extensive experiments verify that WITT attains better performance across different image resolutions, distortion metrics, and channel conditions. The code is available at https://github.com/KeYang8/WITT.
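The idea of scaling latent representations according to channel state information can be illustrated with a small sketch. This is not the authors' module (see the linked repository for that). It assumes, for illustration only, that the channel state is a single SNR value in dB, that the channel is real-valued AWGN, and that a tiny two-layer network with randomly initialized (untrained) weights maps the SNR to one scale factor per feature channel. The names `channel_modulation`, `power_normalize`, and `awgn` are hypothetical.

```python
import numpy as np


def power_normalize(z):
    """Rescale z so that its average power per symbol is 1."""
    return z / np.sqrt(np.mean(z ** 2))


def awgn(z, snr_db, rng):
    """Add real Gaussian noise for the given SNR in dB (assumes unit signal power)."""
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))
    return z + noise_std * rng.standard_normal(z.shape)


def channel_modulation(latent, snr_db, w1, b1, w2, b2):
    """Hypothetical SNR-conditioned scaling of a latent of shape (N, C).

    A two-layer network maps the scalar SNR to C factors in (0, 1),
    which then scale each feature channel of the latent.
    """
    hidden = np.maximum(0.0, w1 * snr_db + b1)          # ReLU, shape (H,)
    scale = 0.5 * (1.0 + np.tanh((w2 @ hidden + b2) / 2.0))  # sigmoid, shape (C,)
    return latent * scale                               # broadcast over N


# Illustrative sizes and untrained random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
C, H = 8, 16
w1, b1 = 0.1 * rng.standard_normal(H), 0.1 * rng.standard_normal(H)
w2, b2 = 0.1 * rng.standard_normal((C, H)), 0.1 * rng.standard_normal(C)
latent = rng.standard_normal((64, C))

for snr_db in (1.0, 10.0):
    tx = power_normalize(channel_modulation(latent, snr_db, w1, b1, w2, b2))
    rx = awgn(tx, snr_db, rng)
    print(snr_db, rx.shape)
```

In a trained system the weights would be learned jointly with the encoder and decoder, so that one model can adapt its latent features to the channel condition it is told about rather than requiring a separate model per SNR.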


