NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

08/23/2023
by   Ziyu Yang, et al.

Non-photorealistic videos are in increasing demand with the rise of the metaverse, yet they lack sufficient research attention. This work takes a step forward in understanding how humans perceive non-photorealistic videos through eye fixation (i.e., saliency detection), which is critical for enhancing media production, artistic design, and game user experience. To fill the gap left by the absence of a suitable dataset for this line of research, we present NPF-200, the first large-scale multi-modal dataset of purely non-photorealistic videos with eye fixations. Our dataset has three characteristics: 1) it contains soundtracks, which vision and psychology studies show to be essential; 2) it includes diverse semantic content, and the videos are of high quality; 3) it exhibits rich motion both across and within videos. We conduct a series of analyses to gain deeper insight into this task and compare several state-of-the-art methods to explore the gap between natural images and non-photorealistic data. Additionally, since the human attention system tends to extract visual and audio features at different frequencies, we propose a universal frequency-aware multi-modal non-photorealistic saliency detection model called NPSNet, which achieves state-of-the-art performance on our task. The results uncover strengths and weaknesses of multi-modal network design and multi-domain training, opening up promising directions for future work. Our dataset and code can be found at <https://github.com/Yangziyu/NPF200>.


