Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation

by   Yifeng Shi, et al.

With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at


page 6

page 7


Visual Prompt Multi-Modal Tracking

Visible-modal object tracking gives rise to a series of downstream multi...

Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study

With the development of artificial intelligence and deep learning (DL) t...

OpenTripPlanner, OpenStreetMap, General Transit Feed Specification: Tools for Disaster Relief and Recovery

Open Trip Planner was identified as the most promising open source multi...

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

In this report, we present our champion solutions to five tracks at Ego4...

Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection

This technical report introduces the winning solution of the team \texti...

Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification

Multi-modal fact verification has become an important but challenging is...

Shipper collaboration matching: fast enumeration of triangular transports with high cooperation effects

The logistics industry in Japan is facing a severe shortage of labor. Th...

Please sign up or login with your details

Forgot password? Click here to reset