Text-conditional diffusion models generate high-quality, diverse images....
We introduce the task of spotting temporally precise, fine-grained event...
An ideal learned representation should display transferability and robus...
Foundation models offer an exciting new paradigm for constructing models...
For machine learning models trained with limited labeled training data, ...
Human pose is a useful feature for fine-grained sports action understand...
Machine learning models are often deployed in different settings than th...
We accelerate deep reinforcement learning-based training in visually com...
We present a text-based tool for editing talking-head video that enables...
We focus on the real-world problem of training accurate deep models for ...
Cable TV news reaches millions of U.S. households each day, meaning that...
We present a system that converts annotated broadcast video of tennis ma...
Our goal is to enable machine learning systems to be trained interactive...
Weak supervision is a popular method for building machine learning model...
Since manually labeling training data is slow and expensive, recent indu...
Many real-world video analysis applications require the ability to ident...
High-quality computer vision models typically address the problem of und...
A growing number of visual computing applications depend on the analysis...