Vision transformers (ViTs) have achieved impressive results on various
c...
We introduce an audiovisual method for long-range text-to-video retrieva...
Humans perceive a rich auditory experience, with distinct sounds heard by ea...
Person re-identification (re-ID) aims at recognizing the same person fro...
Audio-visual event localization requires one to identify the event which ...