Composed Image Retrieval (CoIR) has recently gained popularity as a task...
In this work, we introduce Vid2Seq, a multi-modal single-stage dense eve...
Video question answering (VideoQA) is a complex task that requires diver...
Recent methods for visual question answering rely on large-scale annotat...
We consider the problem of localizing a spatio-temporal tube in a video...
Modern approaches to visual question answering require large annotated...
Neural Architecture Search (NAS) is an exciting new field which promises...
The Neural Architecture Search (NAS) problem is typically formulated as ...