Pedestrian detection in the wild remains a challenging problem especiall...
Action understanding has evolved into the era of fine granularity, as mo...
Action in video usually involves the interaction of human with objects.
...
The knowledge graph (KG) stores a large amount of structural knowledge, ...
Training temporal action detection in videos requires large amounts of
l...
Supervised learning requires a large amount of training data, limiting i...
Spatio-temporal action detection in videos requires localizing the actio...
Weakly-supervised action localization problem requires training a model ...
Many interesting events in the real world are rare making preannotated
m...
Meta-learning has become a popular framework for few-shot learning in re...
Human action is naturally compositional: humans can easily recognize and...
Generating realistic images of complex visual scenes becomes very challe...
Given a video and a sentence, the goal of weakly-supervised video moment...
By leveraging the concept of mobile edge computing (MEC), massive amount...
We address the problem of temporal activity detection in continuous,
unt...
Many activities of interest are rare events, with only a few labeled exa...
Events defined by the interaction of objects in a scene often are of cri...
Video generation is an inherently challenging task, as it requires the m...
We propose a novel method capable of retrieving clips from untrimmed vid...
As a fine-grained video understanding task, dense video captioning invol...
Activity detection is a fundamental problem in computer vision. Detectin...
We address the problem of activity detection in continuous, untrimmed vi...
We address the problem of Visual Question Answering (VQA), which require...
Solving the visual symbol grounding problem has long been a goal of
arti...