Composed Image Retrieval (CoIR) has recently gained popularity as a task...
In this work, we introduce Vid2Seq, a multi-modal single-stage dense eve...
Video question answering (VideoQA) is a complex task that requires diver...
Recent methods for visual question answering rely on large-scale annotat...
We consider the problem of localizing a spatio-temporal tube in a video...
Modern approaches to visual question answering require large annotated d...
Neural Architecture Search (NAS) is an exciting new field which promises...
The Neural Architecture Search (NAS) problem is typically formulated as ...