All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers

06/18/2021
by Carmelo Scribano, et al.

Combining natural language with vision is a distinctive challenge in artificial intelligence. Track 5 of the AI City Challenge, Natural Language-Based Vehicle Retrieval, addresses the problem of combining visual and textual information in a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution that correlates single-vehicle tracking sequences with natural language descriptions. The main building blocks of the proposed architecture are (i) BERT, which provides an embedding of the textual descriptions, and (ii) a convolutional backbone combined with a Transformer model, which embeds the visual information. To train the retrieval model, we propose a variation of the Triplet Margin Loss that learns a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
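
To make the training objective concrete, here is a minimal sketch of the standard Triplet Margin Loss, plus a distance-based ranking step. It is not the variation proposed in the paper, which the abstract does not specify. The function names, the Euclidean distance, the margin value, and the plain-list embeddings are all assumptions made for illustration. In the actual system the embeddings would come from BERT and from the convolutional backbone with a Transformer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's variant).

    anchor:   e.g. a text embedding
    positive: embedding of the matching vehicle track
    negative: embedding of a non-matching vehicle track
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rank_tracks(text_emb, track_embs):
    """Return track indices sorted from closest to farthest from the query."""
    return sorted(range(len(track_embs)),
                  key=lambda i: euclidean(text_emb, track_embs[i]))


if __name__ == "__main__":
    query = [0.0, 0.0]                                # hypothetical text embedding
    tracks = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]     # hypothetical track embeddings
    print(triplet_margin_loss(query, tracks[1], tracks[0]))
    print(rank_tracks(query, tracks))
```

At retrieval time, a learned distance of this kind lets each textual query be answered by sorting candidate tracking sequences by their distance to the query embedding, as `rank_tracks` does above.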


