
-
Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline
The control of traffic signals is fundamental and critical to alleviate ...
-
How to Train Your Agent to Read and Write
Reading and writing research papers is one of the most privileged abilit...
-
Semantics for Robotic Mapping, Perception and Interaction: A Survey
For robots to navigate and interact more richly with the world around th...
-
Memory-Gated Recurrent Networks
The essence of multivariate sequential learning is all about how to extr...
-
The Causal Learning of Retail Delinquency
This paper focuses on the expected difference in borrower's repayment wh...
-
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Texts appearing in daily scenes that can be recognized by OCR (Optical C...
-
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
When describing an image, reading text in the visual scene is crucial to...
-
P3-LOAM: PPP/LiDAR Loosely Coupled SLAM with Accurate Covariance Estimation and Robust RAIM in Urban Canyon Environment
Light Detection and Ranging (LiDAR) based Simultaneous Localization and ...
-
Generative Learning of Heterogeneous Tail Dependence
We propose a multivariate generative model to capture the complex depend...
-
A Recurrent Vision-and-Language BERT for Navigation
Accuracy of many visiolinguistic tasks has benefited significantly from ...
-
Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
The emerging vision-and-language navigation (VLN) problem aims at learni...
-
Language and Visual Entity Relationship Graph for Agent Navigation
Vision-and-Language Navigation (VLN) requires an agent to navigate in a ...
-
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
We propose a parsimonious quantile regression framework to learn the dyn...
-
MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model
Human activity recognition (HAR) using wearable Inertial Measurement Uni...
-
CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation
Scene graphs are semantic abstraction of images that encourage visual un...
-
Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze
This paper proposes a novel simultaneous localization and mapping (SLAM)...
-
Data-driven Meta-set Based Fine-Grained Visual Classification
Constructing fine-grained image datasets typically requires domain-speci...
-
Object-and-Action Aware Model for Visual Language Navigation
Vision-and-Language Navigation (VLN) is unique in that it requires turni...
-
Soft Expert Reward Learning for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an agent to find a specifi...
-
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Visual Question Answering (VQA) has achieved great success thanks to the...
-
Length-Controllable Image Captioning
The last decade has witnessed remarkable progress in the image captionin...
-
Referring Expression Comprehension: A Survey of Methods and Datasets
Referring expression comprehension (REC) aims to localize a target objec...
-
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue
Visual Dialogue task requires an agent to be engaged in a conversation w...
-
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review
Recent years have witnessed the remarkable developments made by deep lea...
-
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Fact-based Visual Question Answering (FVQA) requires external knowledge ...
-
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Conventional referring expression comprehension (REF) assumes people to ...
-
Structured Multimodal Attentions for TextVQA
Text based Visual Question Answering (TextVQA) is a recently raised chal...
-
Quantized Adam with Error Feedback
In this paper, we present a distributed variant of adaptive stochastic g...
-
Sub-Instruction Aware Vision-and-Language Navigation
Vision-and-language navigation requires an agent to navigate through a r...
-
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Referring expression comprehension (REF) aims at identifying a particula...
-
Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only
Home design is a complex task that normally requires architects to finis...
-
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Cross-modal retrieval between videos and texts has attracted growing att...
-
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Humans are able to describe image contents with coarse to fine details a...
-
Diva: A Declarative and Reactive Language for In-Situ Visualization
The use of adaptive workflow management for in situ visualization and an...
-
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
Different from Visual Question Answering task that requires to answer on...
-
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
This notebook paper presents our model in the VATEX video captioning cha...
-
Neural Learning of Online Consumer Credit Risk
This paper takes a deep learning approach to understand consumer credit ...
-
Understanding Distributional Ambiguity via Non-robust Chance Constraint
The choice of the ambiguity radius is critical when an investor uses the...
-
Show, Price and Negotiate: A Hierarchical Attention Recurrent Visual Negotiator
Negotiation, as a seller or buyer, is an essential and complicated aspec...
-
RERERE: Remote Embodied Referring Expressions in Real indoor Environments
One of the long-term challenges of robotics is to enable humans to commu...
-
You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding
Visual Grounding (VG) aims to locate the most relevant region in an imag...
-
What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions
One of the core challenges in Visual Dialogue problems is asking the que...
-
An Active Information Seeking Model for Goal-oriented Vision-and-Language Tasks
As Computer Vision algorithms move from passive analysis of pixels to ac...
-
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
The task in referring expression comprehension is to localise the object...
-
Deep Template Matching for Offline Handwritten Chinese Character Recognition
Just like its remarkable achievements in many computer vision tasks, the...
-
Topological Data Analysis Made Easy with the Topology ToolKit
This tutorial presents topological methods for the analysis and visualiz...
-
Learning Semantic Concepts and Order for Image and Sentence Matching
Image and sentence matching has made great progress recently, but it rem...
-
Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards
Despite significant progress in a variety of vision-and-language problem...
-
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
The Visual Dialogue task requires an agent to engage in a conversation a...