Today, large language models (LLMs) are taught to use new tools by provi...
Deploying large language models (LLMs) is challenging because they are m...
In this paper, we tackle two challenges in multimodal learning for visua...
In Composed Image Retrieval (CIR), a user combines a query image with te...
Vision-language contrastive learning suggests a new learning paradigm by...
Sequence modeling has demonstrated state-of-the-art performance on natur...
Learning invariant representations is an important requirement when trai...
We introduce anomaly clustering, whose goal is to group data into semant...
Natural reading orders of words are crucial for information extraction f...
Anomaly detection (AD), separating anomalies from normal data, has vario...
Learning visual knowledge from massive weakly-labeled web videos has att...
Training multiple tasks jointly in one deep network yields reduced laten...
Semi-supervised learning (SSL) has promising potential for improving the...
In this work, we connect two distinct concepts for unsupervised domain a...
Deep multitask networks, in which one neural network produces multiple p...
This paper focuses on the task of room layout estimation from a monocula...
We present recursive recurrent neural networks with attention modeling (...
We seek to improve deep neural networks by generalizing the pooling oper...
One of the most promising ways of improving the performance of deep conv...
Our proposed deeply-supervised nets (DSN) method simultaneously minimize...