AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

03/08/2022
by   Benita Wong, et al.

A long-standing goal of intelligent assistants such as AR glasses/robots has been to assist users in affordance-centric real-world scenarios, such as "how can I run the microwave for 1 minute?". However, there is still no clear task definition or suitable benchmark. In this paper, we define a new task called Affordance-centric Question-driven Task Completion, where the AI assistant should learn from instructional videos and scripts to guide the user step-by-step. To support the task, we constructed AssistQ, a new dataset comprising 529 question-answer samples derived from 100 newly filmed first-person videos. Each question should be completed with multi-step guidance by inferring from visual details (e.g., buttons' positions) and textual details (e.g., actions like press/turn). To address this unique task, we developed a Question-to-Actions (Q2A) model that significantly outperforms several baseline methods while still leaving large room for improvement. We expect our task and dataset to advance the development of Egocentric AI Assistants. Our project page is available at: https://showlab.github.io/assistq
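To make the task setup concrete, the following is a minimal sketch of how one AQTC sample and a per-step accuracy metric might be represented. The field names (`video_id`, `candidates`, `answer`) and the `recall_at_1` helper are illustrative assumptions, not the dataset's actual schema or the paper's evaluation code.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # Textual action candidates for this step (hypothetical format).
    candidates: list
    # Index of the correct action among the candidates.
    answer: int

@dataclass
class AQTCSample:
    video_id: str
    question: str
    steps: list = field(default_factory=list)

def recall_at_1(sample, predictions):
    """Fraction of steps where the predicted action index is correct."""
    hits = sum(1 for step, pred in zip(sample.steps, predictions)
               if pred == step.answer)
    return hits / len(sample.steps)

sample = AQTCSample(
    video_id="microwave_01",
    question="How can I run the microwave for 1 minute?",
    steps=[
        Step(candidates=["press the power button", "turn the timer dial"],
             answer=1),
        Step(candidates=["press start", "open the door"], answer=0),
    ],
)

print(recall_at_1(sample, [1, 1]))  # first step correct, second wrong -> 0.5
```

The multi-step structure is the key difference from standard VideoQA: a model must rank candidate actions at every step, conditioned on both the question and the instructional video/script.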

Related research

- 06/20/2022 — Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric Approach
- 11/30/2021 — AssistSR: Affordance-centric Question-driven Video Segment Retrieval
- 06/26/2023 — A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference
- 03/06/2023 — Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset
- 05/02/2020 — A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
- 07/10/2022 — Human-Centric Research for NLP: Towards a Definition and Guiding Questions
- 12/17/2021 — From 3-DoF to 6-DoF: New Metrics to Analyse Users Behaviour in Immersive Applications
