CBNetV2: A Composite Backbone Network Architecture for Object Detection
Advances in backbone networks bring consistent performance gains through exploring more effective network structures. In this paper, we propose a novel backbone network, namely CBNetV2, by constructing compositions of existing open-sourced pre-trained backbones. In particular, the CBNetV2 architecture groups multiple identical backbones, which are connected through composite connections. Specifically, CBNetV2 integrates the high- and low-level features of multiple backbone networks and gradually expands the receptive field to perform object detection more efficiently. We also propose a better training strategy with Assistant Supervision for CBNet-based detectors. Without additional pre-training, CBNetV2 can be adapted to various backbones, including manual-based and NAS-based, as well as CNN-based and Transformer-based ones. Experiments provide strong evidence that composite backbones are more efficient, effective, and resource-friendly than wider and deeper networks. CBNetV2 is compatible with most mainstream detectors, including one-stage and two-stage detectors, as well as anchor-based and anchor-free ones, and significantly improves their performance by more than 3.0 AP. Under single-scale testing, our HTC Dual-Swin-B achieves 58.6 AP on COCO test-dev, which is significantly better than the state-of-the-art result (i.e., 57.7 AP). Code is released at https://github.com/VDIGPKU/CBNetV2.
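The idea of grouping identical backbones through composite connections can be illustrated with a toy sketch. This is not the released implementation: the abstract does not specify the connection pattern, so the sketch assumes that each stage of a later backbone adds the same-level output of the previous backbone element-wise. The names `make_toy_backbone` and `composite_forward` are hypothetical, and each "stage" is a plain scaling function standing in for a real network stage.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Stage = Callable[[Feature], Feature]


def make_toy_backbone(scales: Sequence[float]) -> List[Stage]:
    """Hypothetical stand-in for a pre-trained backbone: one scaling stage per entry."""
    def make_stage(s: float) -> Stage:
        return lambda x: [s * v for v in x]
    return [make_stage(s) for s in scales]


def composite_forward(backbones: Sequence[List[Stage]], x: Feature) -> List[Feature]:
    """Run the backbones one after another on the same input.

    Assumed composite connection: the output of stage i in backbone k is
    summed with the output of stage i in backbone k-1 before it is passed on.
    Returns the per-stage features of the last (lead) backbone.
    """
    prev_feats: List[Feature] = []
    for stages in backbones:
        feats: List[Feature] = []
        h = x
        for i, stage in enumerate(stages):
            h = stage(h)
            if prev_feats:  # fuse with the assisting backbone's same-level feature
                h = [a + b for a, b in zip(h, prev_feats[i])]
            feats.append(h)
        prev_feats = feats
    return prev_feats


# Two identical toy backbones (a real setup would use separate weights).
backbone = make_toy_backbone([2.0, 3.0])
single = composite_forward([backbone], [1.0])
dual = composite_forward([backbone, backbone], [1.0])
```

In this toy setting the second backbone's features differ from the single-backbone ones because each stage also sees what the first backbone computed at that level. Assistant Supervision, downsampling between stages, and the detector head are deliberately left out.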