Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection

08/07/2023
by Xinhao Deng, et al.

Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video. As an important pre-processing step, it has many potential applications in multimedia and vision tasks. With the advance of imaging devices, SOD on high-resolution images has recently come into great demand. However, traditional SOD methods are largely limited to low-resolution images, which makes them difficult to adapt to High-Resolution SOD (HRSOD). Although some HRSOD methods have emerged, there are no sufficiently large datasets for training and evaluating them. Besides, current HRSOD methods generally produce incomplete object regions and irregular object boundaries. To address the above issues, in this work we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution. To the best of our knowledge, it is the largest dataset for the HRSOD task, and it will significantly help future work in training and evaluating models. Furthermore, to improve HRSOD performance, we propose a novel Recurrent Multi-scale Transformer (RMFormer), which recurrently utilizes shared Transformers and multi-scale refinement architectures. Thus, high-resolution saliency maps can be generated with the guidance of lower-resolution predictions. Extensive experiments on both high-resolution and low-resolution benchmarks show the effectiveness and superiority of the proposed framework. The source code and dataset are released at: https://github.com/DrowsyMon/RMFormer.
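The core architectural idea above is that one shared refinement module is applied recurrently from coarse to fine resolution, so each higher-resolution saliency map is guided by the lower-resolution one. A toy sketch can illustrate that loop. It is not RMFormer's implementation; the linked repository contains the actual code. In this sketch, `shared_refiner` is a hypothetical stand-in for the shared Transformer and simply blends image intensity with the upsampled previous prediction. The average-pool and nearest-neighbour resizing, the blending weight `alpha`, and the number of scales are likewise illustrative assumptions.

```python
from typing import List, Optional

Grid = List[List[float]]  # a 2-D map stored as a list of rows


def downsample(img: Grid, factor: int) -> Grid:
    """Average-pool a 2-D map by an integer factor (toy stand-in for resizing)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample(mask: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a coarse saliency map."""
    return [
        [mask[i // factor][j // factor] for j in range(len(mask[0]) * factor)]
        for i in range(len(mask) * factor)
    ]


def shared_refiner(image: Grid, guidance: Grid, alpha: float = 0.5) -> Grid:
    """HYPOTHETICAL placeholder for the shared Transformer: the same function
    (same 'weights', here just alpha) is reused at every scale. It blends the
    image evidence with the upsampled lower-resolution prediction."""
    return [
        [alpha * g + (1 - alpha) * x for x, g in zip(img_row, g_row)]
        for img_row, g_row in zip(image, guidance)
    ]


def recurrent_multiscale(image: Grid, num_scales: int = 3) -> List[Grid]:
    """Coarse-to-fine loop: predict at the lowest resolution first, then refine
    at each higher resolution using the previous prediction as guidance.
    Assumes the image side length is divisible by 2 ** (num_scales - 1)."""
    predictions: List[Grid] = []
    prev: Optional[Grid] = None
    for s in reversed(range(num_scales)):  # s = num_scales - 1 is the coarsest scale
        img_s = downsample(image, 2 ** s)
        if prev is None:
            # No prior prediction exists at the coarsest scale.
            guidance = [[0.0] * len(img_s[0]) for _ in img_s]
        else:
            # Lift the previous (half-resolution) map to the current resolution.
            guidance = upsample(prev, 2)
        prev = shared_refiner(img_s, guidance)
        predictions.append(prev)
    return predictions
```

On an 8×8 input with three scales, `recurrent_multiscale` returns maps of size 2×2, 4×4 and 8×8. Each map is produced by the same `shared_refiner`, which mirrors the weight sharing described in the abstract. In a real model the refiner would be a learned network operating on features rather than raw intensities.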
