The self-media era provides us tremendous high quality videos. Unfortuna...
Instant delivery services, such as food delivery and package delivery, h...
In the field of autonomous driving, accurate and comprehensive perceptio...
Early detection of dysplasia of the cervix is critical for cervical canc...
Panoptic Scene Graph Generation (PSG) parses objects and predicts their
...
Street-view imagery provides us with novel experiences to explore differ...
Real-world last-mile delivery datasets are crucial for research in logis...
Traffic forecasting plays a critical role in smart city initiatives and ...
We study the task of spatio-temporal extrapolation that generates data a...
With the rapid growth of Internet video data amounts and types, a unifie...
In today's Internet, HTTP Adaptive Streaming (HAS) is the mainstream sta...
Spatio-temporal graph neural networks (STGNN) have emerged as the domina...
Spatio-temporal graph neural networks (STGNN) have become the most popul...
Air pollution is a crucial issue affecting human health and livelihoods,...
Building robust multimodal models are crucial for achieving reliable
dep...
Automatic transfer of text between domains has become popular in recent
...
Contrastive self-supervised learning has attracted significant research
...
While transformers have shown great potential on video recognition tasks...
Sensors are the key to sensing the environment and imparting benefits to...
Deep learning models are modern tools for spatio-temporal graph (STG)
fo...
Given input images, scene graph generation (SGG) aims to produce
compreh...
Detecting human-object interactions (HOI) is an important step toward a
...
While there has been significant progress towards modelling coherence in...
In the era of MOOCs, online exams are taken by millions of candidates, w...
In this work, we explore a new problem of frame interpolation for speech...
Domain divergence plays a significant role in estimating the performance...
Visual relationship detection aims to reason over relationships among sa...
The abundance of information in web applications make recommendation
ess...
The overwhelming volume and complexity of information in online applicat...
Cross-network recommender systems use auxiliary information from multipl...
A major drawback of cross-network recommender solutions is that they can...
Indoor localization is a fundamental problem in location-based applicati...
Speech as a natural signal is composed of three parts - visemes (visual ...
Multimodal Sentiment Analysis is an active area of research that leverag...
Predicting the runtime complexity of a programming code is an arduous ta...
In this paper, we formulate keyphrase extraction from scholarly articles...
Recognizing emotions in conversations is a challenging task due to the
p...
Lipreading has a lot of potential applications such as in the domain of
...
Sarcasm is often expressed through several verbal and non-verbal cues, e...
There has been upsurge in the number of people participating in challeng...
In multilingual societies like the Indian subcontinent, use of code-swit...
Keyword extraction is a fundamental task in natural language processing ...
Multiple modalities represent different aspects by which information is
...
The literature in automated sarcasm detection has mainly focused on lexi...