We propose Encyclopedic-VQA, a large-scale visual question answering (VQ...
Videos can be created by first outlining a global view of the scene and ...
Videos can often be created by first outlining a global description of t...
We investigate a strategy for improving the computational efficiency of ...
Predicting future frames for a video sequence is a challenging generativ...
There is growing interest in artificial intelligence to build socially i...
People can recognize scenes across many different modalities beyond natu...
People can recognize scenes across many different modalities beyond natu...