Text-to-image diffusion models understand spatial relationships between
o...
Neural fields, which represent signals as a function parameterized by a
...
Pre-trained multi-modal vision-language models (VLMs) are becoming
incre...
Language models have been shown to exhibit positive scaling, where
perfo...
Recent advancements in text-to-image generation have enabled significant...
Deep representation learning is a ubiquitous part of modern computer vis...
Pretraining on large natural image classification datasets such as Image...
Recent text-to-image generative models have exhibited remarkable abiliti...
Given the prevalence of 3D medical imaging technologies such as MRI and ...
Recent multi-modal contrastive learning models have demonstrated the abi...
The task of reconstructing 3D human motion has wide-ranging applications....
Open World Object Detection (OWOD) is a new and challenging computer vis...
Machine learning (ML) research has generally focused on models, while th...
Automatic surgical activity recognition enables more intelligent surgica...
The ability to perceive 3D human bodies from a single image has a multit...
We present modality gap, an intriguing geometric phenomenon of the
repre...
Open procedures represent the dominant form of surgery worldwide. Artifi...
We consider the task of semi-supervised video object segmentation (VOS)....
Creating representations of shapes that are invariant to isometric or
a...
In the biomedical domain, there is an abundance of dense, complex data w...
Instance segmentation is an active topic in computer vision that is usua...
While federated learning traditionally aims to train a single global mod...
Open, or non-laparoscopic, surgery represents the vast majority of all
o...
There exists a need for unsupervised 3D segmentation on complex volumetr...
The 3D world limits the human body pose and the human body pose conveys
...
We study the problem of medical symptoms recognition from patient text, ...
In mobile health (mHealth), reinforcement learning algorithms that adapt...
Active learning aims to develop label-efficient algorithms by querying t...
Homomorphic encryption enables arbitrary computation over data while it
...
Five billion people in the world lack access to quality surgical care.
S...
One in twenty-five patients admitted to a hospital will suffer from a
ho...
Understanding the simultaneously very diverse and intricately fine-grain...
We propose a viewpoint invariant model for 3D human pose estimation from...
In this work we introduce a fully end-to-end approach for action detecti...
Every moment counts in action recognition. A comprehensive understanding...
In this paper we present VideoSET, a method for Video Summary Evaluation...