We evaluate the ability of semantic parsers based on large language mode...
In this paper, we study the denoising diffusion probabilistic model (DDP...
While recent language models have the ability to take long contexts as i...
Generative AI has made significant strides in computer vision, particula...
Despite the promising progress in multi-modal tasks, current large
multi...
When re-finding items, users who forget or are uncertain about identifyi...
We present a unified framework for camera-space 3D hand pose estimation ...
Model merging (e.g., via interpolation or task arithmetic) fuses multipl...
The most recent efforts in video matting have focused on eliminating tri...
This study explores the concept of equivariance in vision-language found...
We propose Text2Motion, a language-based planning framework enabling rob...
We propose MM-REACT, a system paradigm that integrates ChatGPT with a po...
We present Mesh Pre-Training (MPT), a new pre-training framework that
le...
Masked visual modeling (MVM) has been recently proven effective for visu...
Unified vision-language frameworks have greatly advanced in recent years...
In this paper, we design and train a Generative Image-to-text Transforme...
We present a cross-modal Transformer-based framework, which jointly enco...
This report proposes a combined optimal control and perception framework...
The canonical approach to video captioning dictates a caption generation...
A great challenge in video-language (VidL) modeling lies in the disconne...
Justice-centered approaches to equitable computer science (CS) education...
We introduce the task of open-vocabulary visual instance search (OVIS). ...
We present a graph-convolution-reinforced transformer, named Mesh Grapho...
The expansion of computer science (CS) education in K–12 and
higher-educ...
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct...
While many students now interact with web apps across a variety of smart...
It is highly desirable yet challenging to generate image captions that c...
Over the past decade, undergraduate Computer Science (CS) programs acros...
Standard test sets for supervised learning evaluate in-distribution
gene...
Nonparametric approaches have shown promising results on reconstructing ...
Since hardware resources are limited, the objective of training deep lea...
Text style transfer refers to the task of rephrasing a given text in a
d...
Answering compositional questions that require multiple steps of reasoni...
We introduce the first open-domain dataset, called QuaRTz, for reasoning...
A key component of successfully reading a passage of text is the ability...
The success of supervised deep learning depends on the training labels.
...
Detecting when the underlying distribution changes from the observed tim...
The sequence-to-sequence paradigm employed by neural text-to-SQL models
...
Educational researchers have increasingly drawn attention to how student...
Risk for autism can be influenced by genetic mutations in hundreds of ge...
Changepoint detection methods are used in many areas of science and
engi...
Although deep learning models perform remarkably across a range of tasks...
Fine-grained image search is still a challenging problem due to the
diff...
Generative adversarial networks (GANs) have great successes on synthesiz...
This paper presents a simple yet effective supervised deep hash approach...
In this paper we present two new approaches to efficiently solve large-s...