In recent years, differential privacy has seen significant advancements ...
Personalized Federated Learning (pFL) has emerged as a promising solutio...
We propose Image-to-Image Schrödinger Bridge (I^2SB), a new class of con...
Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...
Pre-trained vision-language models (e.g., CLIP) have shown promising zer...
We propose MinVIS, a minimal video instance segmentation (VIS) framework...
Autonomous agents have made great strides in specialist domains like Ata...
Language model (LM) pre-training has proven useful for a wide variety of...
Auditing trained deep learning (DL) models prior to deployment is vital ...
In this work, we study the problem of how to leverage instructional vide...
Generalization has been a long-standing challenge for reinforcement lear...
We address goal-based imitation learning, where the aim is to output the...
Recent learning-to-plan methods have shown promising results on planning...
Modeling and prediction of human motion dynamics has long been a challen...
We address one-shot imitation learning, where the goal is to execute a p...
We propose a new challenging task: procedure planning in instructional v...
We address weakly-supervised action alignment and segmentation in videos...
Predicting and forecasting human dynamics is a very interesting but chal...
Our goal is for a robot to execute a previously unseen task based on a s...
Our goal is to predict future video frames given a sequence of input fra...
We propose an unsupervised method for reference resolution in instructio...
We present an unsupervised representation learning approach that compact...
We propose a weakly-supervised framework for action labeling in video, w...
We develop predictive models of pedestrian dynamics by encoding the coup...
Modern machine learning-based recognition approaches require large-scale...