YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

04/12/2020
by   Shizhe Chen, et al.
16

The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos e.g. makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand visual effects of different actions expressed in natural language to the facial object. The second task is Step Ordering, which aims to measure cross-modal semantic alignments between untrimmed videos and multi-sentence texts. In this paper, we present the challenge guidelines, the dataset used, and performances of baseline models on the two proposed tasks. The baseline codes and models are released at <https://github.com/AIM3-RUC/YouMakeup_Baseline>.

READ FULL TEXT

page 2

page 3

research
01/20/2019

Visual Entailment: A Novel Task for Fine-Grained Image Understanding

Existing visual reasoning datasets such as Visual Question Answering (VQ...
research
01/22/2023

Champion Solution for the WSDM2023 Toloka VQA Challenge

In this report, we present our champion solution to the WSDM2023 Toloka ...
research
08/04/2017

Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering

Visual question answering (VQA) is challenging because it requires a sim...
research
03/28/2022

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Assembly101 is a new procedural activity dataset featuring 4321 videos o...
research
08/11/2023

Detecting and Preventing Hallucinations in Large Vision Language Models

Instruction tuned Large Vision Language Models (LVLMs) have made signifi...
research
01/31/2023

Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022

Sports video analysis is a widespread research topic. Its applications a...
research
12/16/2021

Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021

Sports video analysis is a prevalent research topic due to the variety o...

Please sign up or login with your details

Forgot password? Click here to reset