Interpretability is a crucial factor in building reliable models for var...
Foundation models have shown outstanding performance and generalization
...
Mixup is a commonly adopted data augmentation technique for image
classi...
Learning generic joint representations for video and text by a supervise...