The medical conversational question answering (CQA) system aims at provi...
Previous methods for dynamic facial expression recognition (DFER) in the...
With the development of deep learning, advanced dialogue generation meth...
What is an image, and how can latent features be extracted? Convolutional Netwo...
Conditional variational models, using either continuous or discrete late...
Complex dialogue mappings (CDM), including one-to-many and many-to-one
m...
We introduce a new task, named video corpus visual answer localization
(...
Multiple object tracking (MOT) is a task comprising detection and
asso...
This paper introduces the schemes of Team LingJing's experiments in
NLPC...
Infrared and visible images, as multi-modal image pairs, show significan...
Image rasterization is a mature technique in computer graphics, while im...
There is a growing interest in improving the conversational ability of m...
Previous methods for dynamic facial expression in the wild are mainly ba...
Generative dialogue models suffer badly from the generic response proble...
The medical conversational system can relieve the burden of doctors and
...
This paper describes the LingJing team's method for the Workshop on
Compu...
Convolutional neural networks (CNNs) have achieved great success on image
s...
The temporal answering grounding in the video (TAGV) is a new task natur...
Acronym disambiguation means finding the correct meaning of an ambiguous...
Acronym extraction aims to find acronyms (i.e., short-forms) and their
m...
In this paper, we extensively present our solutions for the MuSe-Stress
...
Sign language is commonly used by deaf or mute people to communicate but...
Domain adaptation aims to leverage information from the source domain to...
Medical Dialogue Generation (MDG) is intended to build a medical dialogu...
Generating personalized responses is one of the major challenges in natu...
Conditional Variational AutoEncoder (CVAE) effectively increases the
div...
Many existing conversation models that are based on the encoder-decoder
...
Zero-shot action recognition can recognize samples of unseen classes tha...
3D action recognition refers to the classification of action
seq...
Facial Expression Recognition (FER) in the wild is extremely challenging...
Sign language is a visual language that is used by deaf or speech impair...
Human dialogues are scenario-based and appropriate responses generally r...
Hyperspectral image (HSI) with high spectral resolution often suffers fr...
Segmentation of multiple organs-at-risk (OARs) is essential for radiatio...
Human pose estimation has made significant advances in recent years.
...
Deep learning has become popular in recent years primarily due to the
po...
There is an urgent need to apply face alignment in a memory-efficient an...
Multi-modal human motion analysis is a critical and attractive research
...
The task of cross-view image geo-localization aims to determine the
geo-...