PanoSwin: a Pano-style Swin Transformer for Panorama Understanding

08/28/2023
by Zhixin Ling, et al.

In panorama understanding, the widely used equirectangular projection (ERP) entails boundary discontinuity and spatial distortion, which severely degrade the performance of conventional CNNs and vision Transformers on panoramas. In this paper, we propose a simple yet effective architecture named PanoSwin to learn panorama representations with ERP. To deal with the challenges brought by equirectangular projection, we explore a pano-style shift windowing scheme and novel pitch attention to address the boundary discontinuity and the spatial distortion, respectively. In addition, based on spherical distance and Cartesian coordinates, we adapt absolute positional embeddings and relative positional biases for panoramas to enhance panoramic geometry information. Realizing that planar image understanding might share some common knowledge with panorama understanding, we devise a novel two-stage learning framework to facilitate knowledge transfer from planar images to panoramas. We conduct experiments against state-of-the-art methods on various panoramic tasks, namely panoramic object detection, panoramic classification, and panoramic layout estimation. The experimental results demonstrate the effectiveness of PanoSwin in panorama understanding.
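The two ERP problems named in the abstract, and the spherical distance and Cartesian coordinates it builds positional information from, can be made concrete with a small calculation. The sketch below is a minimal illustration under stated assumptions, not PanoSwin's implementation: it maps ERP pixels to longitude/latitude and unit-sphere Cartesian coordinates, measures great-circle (spherical) distance, and circularly shifts columns. The pixel-to-angle convention and every function name (`erp_pixel_to_lonlat`, `lonlat_to_xyz`, `spherical_distance`, `pixel_distance_on_sphere`, `roll_columns`) are hypothetical; the paper's actual coordinate conventions, pano-style shift windowing, and pitch attention are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def erp_pixel_to_lonlat(u: int, v: int, width: int, height: int) -> Tuple[float, float]:
    """Map the center of ERP pixel (column u, row v) to (longitude, latitude) in radians.

    Assumed convention (hypothetical): longitude runs from -pi at the left
    edge to +pi at the right edge; latitude runs from +pi/2 at the top
    edge to -pi/2 at the bottom edge.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat


def lonlat_to_xyz(lon: float, lat: float) -> Vec3:
    """Convert (longitude, latitude) to Cartesian coordinates on the unit sphere."""
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def spherical_distance(p: Vec3, q: Vec3) -> float:
    """Great-circle distance in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # clamp rounding error before acos
    return math.acos(dot)


def pixel_distance_on_sphere(a: Tuple[int, int], b: Tuple[int, int], width: int, height: int) -> float:
    """Spherical distance between two ERP pixels given as (u, v)."""
    pa = lonlat_to_xyz(*erp_pixel_to_lonlat(a[0], a[1], width, height))
    pb = lonlat_to_xyz(*erp_pixel_to_lonlat(b[0], b[1], width, height))
    return spherical_distance(pa, pb)


def roll_columns(grid: Sequence[Sequence[float]], shift: int) -> List[List[float]]:
    """Circularly shift every row left by `shift` columns, wrapping around.

    Along the longitude axis the wrapped columns are true neighbors on the
    sphere, because the left and right ERP edges meet at longitude +/-pi.
    """
    out = []
    for row in grid:
        s = shift % len(row)
        out.append(list(row[s:]) + list(row[:s]))
    return out


# Usage on a hypothetical 360 x 180 ERP grid (one pixel per degree).
W, H = 360, 180
mid = H // 2
seam = pixel_distance_on_sphere((0, mid), (W - 1, mid), W, H)      # opposite image edges
equator_step = pixel_distance_on_sphere((0, mid), (1, mid), W, H)  # neighbors near the equator
pole_step = pixel_distance_on_sphere((0, 0), (1, 0), W, H)         # neighbors in the top row
```

On this grid, the first and last pixels of the middle row are 359 columns apart yet only about one degree apart on the sphere, which is the boundary discontinuity; horizontally adjacent pixels in the top row are more than a hundred times closer on the sphere than adjacent pixels near the equator, which is the spatial distortion. The wrap-around in `roll_columns` is only meaningful along longitude: in planar Swin the cyclic shift needs an attention mask because wrapped tokens are not real neighbors, whereas on an ERP image the two horizontal edges are.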
