Most existing methods in vision-language retrieval match two modalities ...
Video-text retrieval is a class of cross-modal representation learning
p...
In many real-world datasets, such as WebVision, the performance of DNN base...
Many advances in deep learning techniques originate from the efforts of
...