Following the success of GPT4, there has been a surge in interest in mul...
Text-conditioned image editing has emerged as a powerful tool for editin...
Advancements in large pre-trained generative models have expanded their ...
Text-to-image diffusion models have attracted considerable interest due ...
Recently, large language models (LLMs) have made significant advancement...
Instruction tuning large language models (LLMs) using machine-generated ...
Recent state-of-the-art methods in imbalanced semi-supervised learning (...
With generative models proliferating at a rapid rate, there is a growing...
Image-text contrastive learning models such as CLIP have demonstrated st...
Large-scale text-to-image diffusion models have made amazing advances. H...
We present X-Decoder, a generalized decoding model that can predict pixe...
Mix-up training approaches have proven to be effective in improving the ...
We introduce a new method for diverse foreground generation with explici...
Recent state-of-the-art methods in semi-supervised learning (SSL) combin...
Knowledge distillation aims to transfer useful information from a teache...
Learning visual representations from natural language supervision has re...
Training with an emphasis on "hard-to-learn" components of the data has ...
Edge detection has long been an important problem in the field of comput...
3D-aware generative models have shown that the introduction of 3D inform...
Masked autoencoding has achieved great success for self-supervised learn...
We propose a new approach for high resolution semantic image synthesis. ...
Timely detection of horse pain is important for equine welfare. Horses e...
Training generative models, such as GANs, on a target domain containing ...
Video inpainting aims to fill spatio-temporal "corrupted" regions with p...
We consider the novel task of learning disentangled representations of o...
We propose YolactEdge, the first competitive instance segmentation appro...
Aliasing refers to the phenomenon that high frequency signals degenerate...
Weakly supervised learning has emerged as a compelling tool for object d...
We present a method for weakly-supervised action localization based on g...
We present Audiovisual SlowFast Networks, an architecture for integrated...
Existing models often leverage co-occurrences between objects and their ...
We present a simple, fully-convolutional model for real-time (>30 fps) i...
Cameras are prevalent in our daily lives, and enable many useful systems...
We present MixNMatch, a conditional generative model that learns to dise...
We propose a novel unsupervised generative model, Elastic-InfoGAN, that ...
We present a novel deep neural network architecture for end-to-end scene...
We present a simple, fully-convolutional model for real-time instance se...
We propose FineGAN, a novel unsupervised GAN framework, which disentangl...
We propose 'Hide-and-Seek', a general-purpose data augmentation technique...
We introduce a novel multimodal machine translation model that utilizes ...
We propose the idea of transferring common-sense knowledge from source c...
There is an increasing concern in computer vision devices invading the p...
We introduce Spatial-Temporal Memory Networks (STMN) for video object de...
In human learning, it is common to use multiple sources of information j...
Content popularity prediction has been extensively studied due to its im...
We propose a weakly-supervised approach that takes image-sentence pairs ...
We propose an end-to-end deep convolutional network to simultaneously lo...
The status quo approach to training object detectors requires expensive ...
The increasing prominence of weakly labeled data nurtures a growing dema...