Large multimodal datasets have been instrumental in recent breakthroughs...
Open vocabulary models are a promising new paradigm for image classifica...
The last few years have witnessed substantial progress in the field of e...
We introduce Grounded Situation Recognition (GSR), a task that requires ...