YOLO9000: Better, Faster, Stronger

by Joseph Redmon, et al.

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First, we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally, we propose a method to jointly train on object detection and classification. Using this method, we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO9000 can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real time.
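
The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that mAP is an unweighted mean of per-class AP and that the 19.7 and 16.0 figures come from the same evaluation; under those assumptions it backs out the mean AP implied for the 44 classes that do have COCO detection data, and converts the quoted frame rates into per-frame time budgets. The function names are hypothetical, and because the inputs are rounded, the results are approximate and are not numbers stated by the authors.

```python
def implied_subset_map(overall_map, n_total, other_map, n_other):
    """Mean AP over the remaining classes.

    Assumes mAP is an unweighted mean of per-class AP, so the overall
    figure is a class-count-weighted average of the two subsets.
    """
    n_rest = n_total - n_other
    return (overall_map * n_total - other_map * n_other) / n_rest


def ms_per_frame(fps):
    """Time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Figures quoted in the abstract: 19.7 mAP over 200 classes,
# 16.0 mAP over the 156 classes that have no COCO detection data.
seen_map = implied_subset_map(19.7, 200, 16.0, 156)
print(f"implied mAP on the 44 COCO-covered classes: {seen_map:.1f}")

# Frame rates quoted for YOLOv2 on VOC 2007.
for fps in (67, 40):
    print(f"{fps} FPS is about {ms_per_frame(fps):.1f} ms per frame")
```

Under these assumptions the 44 classes with detection data come out at roughly 32.8 mAP, about twice the figure for the 156 classes learned only from classification data, and the two YOLOv2 operating points correspond to roughly 14.9 ms and 25.0 ms per frame.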

