1 Introduction
Designing low-latency neural networks is critical for applications on mobile devices, e.g., mobile phones, security cameras, augmented-reality glasses, self-driving cars, and many others. Neural architecture search (NAS) is expected to automatically find a neural network whose performance surpasses human-designed ones under the most common constraints, such as latency, FLOPs and parameter count. One of the key factors in the success of NAS is the artificially designed search space. Many previous NAS methods search on the DARTS space
[19, 24, 20] or a MobileNet-like space [12, 23, 25]. The DARTS space contains a multi-branch topology, a combination of various paths with different operations and complexities. The multi-branch design can enrich the feature space and improve performance [22]. Multi-branch structures have also been used in earlier human-designed networks, e.g., the Inception models [22] and DenseNet models [13]. However, these multi-branch architectures result in a longer inference time that is unfriendly for real tasks. Therefore, many applicable NAS methods adopt a MobileNet-like space, keeping only two branches (skip connection and convolutional operation) in searched networks to trade off inference time against performance. To maintain the inference speed of a single branch while retaining the advantages of multiple branches, several Rep techniques [8, 4, 6, 7, 5] have been introduced. Rep techniques retain a multi-branch topology at training time while keeping a single path at inference time, and thus realize a balance of accuracy and efficiency. The branches can be fused into one path because the operations are linear; for example, a 3x3 Conv and a parallel 1x1 Conv can be replaced by a single 3x3 Conv by padding the 1x1 kernel to 3x3 and adding it elementwise to the 3x3 kernel. It is worth noting that the fused network has a lower inference time while keeping almost the same accuracy as the multi-branch network.
However, prior works that utilize Rep techniques to train models have some limitations. 1) The branch number and branch types of each block are fixed. Multi-branch training requires extra memory (increasing linearly with the number of branches) to save the intermediate feature representations. Therefore, the branch number and branch types of each block are manually fixed because of memory constraints. For instance, the RepVGG block only contains a 1x1 Conv, a 3x3 Conv and a skip connection, while the ACB has a 1x3 Conv and a 3x1 Conv. 2) The role of each branch is unclear, which means some branches may be counterproductive. Ding [6] experimented with various micro Diverse Branch Block (DBB) structures to explore which structure works better. However, on one hand, a manual micro DBB (all blocks are the same) is always suboptimal: many NAS works [10, 25, 12] revealed that the optimal structures of different blocks differ from each other. On the other hand, it is infeasible to manually design a macro DBB, whose search space grows combinatorially (assuming there are 30 blocks and each block has 4 branches).
To address limitation 1), we devise a multi-branch search space, called the Rep search space. Unlike previous works that use Rep techniques, each block can preserve an arbitrary number of branches and all block architectures are independent in this paper. Under a given memory budget, the total branch number of the model can be flexibly adjusted. To address limitation 2), a gradient-based NAS method, RepNAS, is proposed to automatically search the optimal branches for each block without parameter retraining. Compared with previous gradient-based NAS methods [19, 24] that only learn the importance within one edge, RepNAS learns the net-wise importance of each branch. Moreover, RepNAS can be used under low GPU memory conditions by setting a branch number constraint. In each training iteration, the branches of low importance are sequentially pruned until the memory constraint is met. The importance of branches is updated in the same round of backpropagation as the gradients of the network parameters. As training progresses, the importance of each branch is estimated more accurately; meanwhile, the redundant branches hardly participate in training. Once the training process is finished, the optimized DBB structure is obtained together with optimized network parameters.
To summarize, our main contributions are as follows:

A Rep search space is proposed in this paper, which allows the searched model to preserve arbitrary branches in training while being fused into one path in inference. To the best of our knowledge, it is the first time that Rep techniques are applied to NAS.

To fit the new search space, RepNAS is presented to automatically search the optimal DBB in one stage. The searched model can be converted to a single-path model and directly deployed without time-consuming retraining.

Extensive experiments on models of various sizes demonstrate that the searched ODBB outperforms both human-designed DBBs and NAS models under similar inference time.
2 Related Work
2.1 Structural Reparameterization
Structural reparameterization techniques have been widely used to improve model performance by injecting several branches in training and fusing these branches in inference. RepVGG [8] simply inserted a 1x1 Conv and a residual connection into each 3x3 Conv, which brought a huge performance improvement to VGG-like networks. A concurrent work, DBB [6], summarized more structural reparameterization techniques and proposed a Diverse Branch Block which can be inserted into any convolutional network. Here, we list the existing techniques as follows.

A Conv for Conv-BN. BN [14] can be fused into the parameters of its preceding Conv for inference. The $i$-th fused channel parameters $W'_i$ and $b'_i$ can be formulated as
$$W'_i = \frac{\gamma_i}{\sigma_i} W_i, \qquad b'_i = \beta_i - \frac{\mu_i \gamma_i}{\sigma_i} \tag{1}$$
where $\gamma_i$ and $\beta_i$ denote the learned scaling factor and bias term of BN, and $\mu_i$ and $\sigma_i$ denote the accumulated channel-wise mean and standard deviation, respectively.
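As a concrete illustration of Eq. (1), the Conv-BN folding can be sketched per output channel in a few lines of plain Python. The function name and the flat weight layout are ours for illustration; real implementations operate on full 4-D kernel tensors.

```python
import math

def fuse_bn_channel(w, gamma, beta, mu, sigma_sq, eps=1e-5):
    """Fold one BN channel into its preceding conv channel (sketch of Eq. 1).

    w        -- conv weights of the i-th output channel, as a flat list
    gamma    -- BN learned scaling factor, beta -- BN learned bias
    mu       -- BN accumulated mean, sigma_sq -- BN accumulated variance
    Returns (fused_weights, fused_bias).
    """
    scale = gamma / math.sqrt(sigma_sq + eps)
    fused_w = [wi * scale for wi in w]      # W' = (gamma / sigma) * W
    fused_b = beta - mu * scale             # b' = beta - mu * gamma / sigma
    return fused_w, fused_b
```

On any input, the fused channel then produces the same output as the original Conv followed by BN, which is why the fusion is lossless at inference time.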
A Conv for branch addition. Convs with different kernel sizes in different branches (without nonlinear activations between them) can be fused into one Conv. The kernel of each Conv is first zero-padded to the maximum size among them; then they can be merged into a single Conv by
$$W' = \sum_{i=1}^{N} \mathrm{PAD}(W_i), \qquad b' = \sum_{i=1}^{N} b_i \tag{2}$$
where $N$ denotes the branch number.
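Eq. (2) amounts to centring every kernel inside a zero canvas of the largest size and summing elementwise. A minimal sketch for single-channel 2-D kernels stored as nested lists (helper names are ours):

```python
def pad_kernel(k, target):
    """Zero-pad a square 2-D kernel (list of lists) to target x target, centred."""
    size = len(k)
    off = (target - size) // 2
    out = [[0.0] * target for _ in range(target)]
    for r in range(size):
        for c in range(size):
            out[off + r][off + c] = k[r][c]
    return out

def fuse_parallel_branches(kernels):
    """Fuse parallel conv branches into one kernel (sketch of Eq. 2):
    pad every kernel to the largest size, then add elementwise."""
    target = max(len(k) for k in kernels)
    padded = [pad_kernel(k, target) for k in kernels]
    return [[sum(p[r][c] for p in padded) for c in range(target)]
            for r in range(target)]
```

For example, fusing a 3x3 kernel with a parallel 1x1 kernel only changes the centre entry of the 3x3 kernel, exactly the RepVGG-style fusion described in the introduction.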
A Conv for sequential Convs. A sequence of a 1x1 Conv followed by a KxK Conv can be merged into one single Conv:
$$W' = W_2 \circledast \mathrm{TRANS}(W_1) \tag{3}$$
where $\mathrm{TRANS}$ represents the transpose of the two channel dimensions, and $W_1$ and $W_2$ denote the 1x1 Conv and the KxK Conv, respectively.
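Because the first kernel is 1x1, Eq. (3) reduces to mixing the channel dimension of the KxK kernel with the 1x1 weight matrix. A sketch with kernels as nested lists; the indexing convention (out x in x K x K) and function name are our assumptions:

```python
def merge_1x1_then_kxk(w1, w2):
    """Merge a 1x1 conv followed by a KxK conv into one kernel (sketch of Eq. 3).

    w1 -- 1x1 kernel as a D x C matrix (out_channels x in_channels)
    w2 -- KxK kernel as nested lists of shape E x D x K x K
    Returns an E x C x K x K kernel equivalent to applying w1 then w2
    (valid when the 1x1 conv uses no padding).
    """
    E, D = len(w2), len(w2[0])
    C, K = len(w1[0]), len(w2[0][0])
    # Each spatial tap of the merged kernel is a channel-wise mixture:
    # W'[e][c][u][v] = sum_d W2[e][d][u][v] * W1[d][c]
    return [[[[sum(w2[e][d][u][v] * w1[d][c] for d in range(D))
               for v in range(K)] for u in range(K)]
             for c in range(C)] for e in range(E)]
```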
A Conv for average pooling. A KxK average pooling can be viewed as a special KxK Conv. Its kernel parameter is
$$W' = \frac{1}{K^2} I \tag{4}$$
where $I$ is the identity matrix over the channel dimensions, i.e., each output channel averages only its own input channel.
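Eq. (4) can be materialized directly: the equivalent conv kernel is $1/K^2$ on the channel diagonal and zero elsewhere. A small sketch (the function name is ours):

```python
def avgpool_as_conv_kernel(channels, K):
    """Build the conv kernel equivalent to a KxK average pooling (sketch of Eq. 4):
    output channel c averages only input channel c, so every spatial tap on the
    channel diagonal is 1/K^2 and all off-diagonal taps are 0."""
    return [[[[1.0 / (K * K) if c == d else 0.0
               for _ in range(K)] for _ in range(K)]
             for d in range(channels)] for c in range(channels)]
```

Once expressed as a Conv, the average-pooling branch can be merged with the other branches via Eq. (2).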
However, the optimal architecture of each block is task-specific. For simple tasks or tiny-scale datasets, too many branches lead to overfitting. Besides, the output of each branch needs to be saved in GPU memory for the backward pass; limited GPU memory thus hinders the application of structural reparameterization techniques.
2.2 Neural Architecture Search
The purpose of neural architecture search is to automatically find the optimal network structure with reinforcement learning (RL) [28, 29], evolutionary algorithms (EA) [20, 10, 25], or gradient-based methods [19, 24]. RL-based and EA-based methods need to evaluate each sampled network by retraining it on the dataset, which is time-consuming. Gradient-based methods can simultaneously train and search for the optimal subnet by assigning a learnable weight to each candidate operation. However, gradient-based approaches can produce incorrect ranking results: a subnet may achieve top proxy accuracy yet not perform as expected when trained alone. Moreover, since gradient-based approaches need more memory for training, they are hard to apply to large-scale datasets. To address the above problems, some one-stage NAS methods [24, 12, 9] are proposed to simultaneously optimize the architecture and parameters. Once the supernet training is complete, the top-performing subnet is obtained without retraining.

3 Reparameterization Neural Architecture Search
In this section, we first present an overview of our proposed approach for searching optimal diverse branch blocks and discuss the difference from other NAS works. We then propose a new search space based on the Rep techniques mentioned in Eqs. (1)-(4). Afterward, the RepNAS approach is presented to fit the proposed search space.
3.1 Overview
The goal of Rep techniques is to improve the training of a CNN by inserting various branches with different kernel sizes. The inserted branches can be linearly fused into the original convolutional branch after training, such that no extra computational complexity (FLOPs, latency, memory footprint or model size) is added. However, training with various branches costs a large amount of GPU memory, and it is hard to optimize a network with too many branches. The core idea of the proposed method is to prune out some branches across different blocks in a differentiable way, as shown in Figure 2. It has two essential steps:
(1) Given a CNN architecture (e.g., MobileNets, VGGs), we insert, net-wise, several linear operations into the original convolutional operations as their branches. For each branch, a learnable architecture parameter that represents its importance is set. During training, we optimize both the architecture parameters and the network parameters simultaneously by discretely sampling branches. Once training finishes, we obtain a pruned architecture with optimized network parameters.
(2) In inference, the remaining branches are directly fused into the original convolutional operations, such that the multi-branch architecture is converted to a single-path architecture without a performance drop. No extra fine-tuning is required in this step.
Compared with many NAS works, cumbersome architectures with various branches and skip connections are no longer an obstacle to application in RepNAS. In contrast to prior structural reparameterization works, the block architectures in each layer can be automatically designed without any extra time cost. The whole optimization is in one stage.
3.2 Rep Search Space Design
NAS methods are usually designed to search for the optimal subnetwork in the DARTS space or a MobileNet-like space. The former contains multi-branch architectures, which makes the searched network difficult to deploy because of its large inference time. The latter, which draws on expert experience in designing mobile networks, includes efficient networks but takes multi-branch architectures into no consideration. Many human-designed networks [22, 13] demonstrate that multi-branch architectures can improve model performance by enhancing the feature representation at various scales. To combine the advantages of multi-branch and single-path architectures, a more flexible search space is proposed based on current structural reparameterization works. As shown in Figure 3, each block contains 7 candidate branches. Different from previous search spaces, we release the heuristic constraints to offer higher flexibility to the final architecture, which means each block can preserve arbitrary branches and all block architectures are independent. It is worth noting that the multi-branch structure will be fused into a single path after searching and thus has no impact on inference.
With 7 independent branches per block, the new search space reaches approximately $2^{7L}$ architectures for a network with $L$ blocks. Compared with either a micro or a macro search space alone, the proposed search space has greater potential to offer better architectures and to evaluate the effectiveness of NAS algorithms. However, such an enlarged search space brings challenges for preceding NAS approaches: 1) incorrect architecture ratings [3]; 2) large memory cost because of the multi-branch structure [1].
3.3 Weight Sharing for Network Parameters
Many previous NAS methods share weights across architectures during supernet training while decoupling the weights of different branches in the same block. However, this strategy does not work in the proposed Rep search space because of the surging number of subnets. In each training iteration, only a few branches can be sampled, so the weights are updated a limited number of times. Therefore, the performance of sampled subnets grows slowly, which limits the learning of the architecture parameters $\alpha$.
Inspired by BigNAS [25] and slimmable networks [26], we also share network parameters across the same block. For any branch that needs a convolutional operation, the weights of this branch can be inherited from the convolutional operation in the main branch (see the right part of Figure 3). We represent its weights as
$$W_{\text{branch}} = W_{\text{main}}\big[:, :, \; s : s + k, \; s : s + k\big], \qquad s = \left\lfloor \frac{K - k}{2} \right\rfloor \tag{5}$$
where $\lfloor \cdot \rfloor$ is the floor operation, and $k$ and $K$ denote the kernel sizes of the inherited branch and the main branch, respectively. Equipped with weight sharing, the supernet converges faster; meanwhile, the ranking of branches can be evaluated precisely. We discuss the details in the Ablation Study.
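Eq. (5) is a centre crop of the main kernel. For a single-channel kernel stored as nested lists, the inheritance can be sketched as follows (the function name is ours):

```python
def inherit_weights(main_kernel, k):
    """Slice a k x k branch kernel out of the centre of the main K x K kernel
    (sketch of Eq. 5): start offset s = floor((K - k) / 2)."""
    K = len(main_kernel)
    s = (K - k) // 2
    return [row[s:s + k] for row in main_kernel[s:s + k]]
```

Because the branch reuses the centre of the main kernel rather than holding its own copy, every sampled subnet updates the same underlying weights, which is what accelerates supernet convergence.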
3.4 Searching for Rep Blocks
To overcome the incorrect architecture ratings, an elegant solution is one-stage differentiable NAS, which simultaneously optimizes the architecture parameters $\alpha$ and the network parameters $W$ in a differentiable way. It can be formulated as the following optimization:
$$\min_{\alpha, W} \; \mathcal{L}_{\text{train}}(\alpha, W) \tag{6}$$
where $\mathcal{L}_{\text{train}}$ denotes the loss function computed on a specified training dataset.
Many previous NAS works are designed to search in the DARTS search space. With the heuristic rule, each edge can preserve only one branch. Therefore, in prior differentiable NAS works [19, 24], the probability $p_i^j$ of the $i$-th branch in the $j$-th block (with $N$ branches) can be written as
$$p_i^j = \frac{\exp(\alpha_i^j)}{\sum_{k=1}^{N} \exp(\alpha_k^j)} \tag{7}$$
Though Eq. (6) can be optimized with gradient descent as in most neural network training [19], it would suffer from the huge performance gap between the supernet and its child networks [1]. Instead, a continuous and differentiable reparameterization trick, Gumbel-Softmax [15], is used in NAS approaches [9, 24]. With a Gumbel random variable $g_i^j$, Eq. (7) can be rewritten as
$$p_i^j = \frac{\exp\big((\alpha_i^j + g_i^j)/\tau\big)}{\sum_{k=1}^{N} \exp\big((\alpha_k^j + g_k^j)/\tau\big)} \tag{8}$$
where $g_i^j = -\log(-\log(u))$ and $u$ is a uniform random variable on $(0, 1)$. $p^j$ can be approximated as a one-hot vector if the temperature $\tau \to 0$. This relaxation realizes the discretization of the probability distribution.
In the multi-branch search space proposed in Rep Search Space Design, each block can preserve an arbitrary number of branches, so we cannot directly obtain the probability as in Eq. (7) or Eq. (8). To fit the new space, we regard whether each branch is retained or not as a binary classification. Hence, the discretization probability of the $i$-th branch in the $j$-th block can be given as
$$P(X_i^j = 1) = \frac{\exp\big((\alpha_i^j + g_1)/\tau_i^j\big)}{\exp\big((\alpha_i^j + g_1)/\tau_i^j\big) + \exp\big(g_0/\tau_i^j\big)} \tag{9}$$
$$P(X_i^j = 0) = 1 - P(X_i^j = 1) \tag{10}$$
where $X_i^j = 1$ and $X_i^j = 0$ represent preserving and pruning this branch, respectively, and $g_0, g_1$ are i.i.d. Gumbel noises. Different from Eq. (8), each branch in the new space is independent; thus, the temperature $\tau_i^j$ can be set differently according to requirements. Furthermore, we can combine Eq. (9) and Eq. (10) into a sigmoid form
$$P(X_i^j = 1) = \frac{1}{1 + \exp\big(-(\alpha_i^j + g)/\tau_i^j\big)} \tag{11}$$
where $g = g_1 - g_0$ follows a logistic distribution. We only need to optimize $\alpha_i^j$, instead of $g_1$ and $g_0$, through gradient descent. Its gradient can be given as
$$\frac{\partial \mathcal{L}}{\partial \alpha_i^j} = \frac{\partial \mathcal{L}}{\partial O^j}\, O_i^j \,\frac{\partial P(X_i^j = 1)}{\partial \alpha_i^j} \tag{12}$$
where $O^j$ and $O_i^j$ denote the output of the $j$-th block and the $i$-th branch, respectively. In each training iteration, we first compute a threshold $\epsilon$ according to the global ranking of the architecture parameters. Subsequently, the branches whose ranking is below $\epsilon$ are pruned out and do not participate in the current forward or backward pass. Thanks to the independence of each branch, we can easily control the activation of each branch through its temperature $\tau_i^j$:
$$\tau_i^j = \begin{cases} \tau, & \mathrm{rank}(\alpha_i^j) \le \epsilon \\ \tau \to 0^+, & \text{otherwise} \end{cases} \tag{13}$$
where $\epsilon$ is the threshold derived from the branch number constraint, and $\mathrm{rank}(\cdot)$ denotes the global ranking of the branch.
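Putting Eqs. (9)-(13) together, one training-iteration sketch of the branch gating could look as follows. This is our simplified reading, not the paper's exact algorithm: we sample the Gumbel-sigmoid gate of Eq. (11) for kept branches, and collapse pruned branches to a hard zero as a stand-in for the $\tau \to 0^+$ limit of Eq. (13). All function names are ours.

```python
import math
import random

def gumbel_sigmoid(alpha, tau):
    """Sample a soft binary gate for one branch (sketch of Eqs. 9-11).
    The difference of two i.i.d. Gumbel noises follows a logistic
    distribution, so g = g1 - g0 can be sampled from a single uniform u."""
    u = min(max(random.random(), 1e-12), 1.0 - 1e-12)  # clamp away from {0, 1}
    g = math.log(u) - math.log(1.0 - u)                # logistic noise
    return 1.0 / (1.0 + math.exp(-(alpha + g) / tau))

def sample_branches(alphas, keep):
    """Gate a block's branches under a branch-number constraint (sketch of
    Eq. 13): the `keep` branches with the largest importance alpha get a
    moderate temperature, while the rest are collapsed to zero."""
    ranked = sorted(range(len(alphas)), key=lambda i: alphas[i], reverse=True)
    kept = set(ranked[:keep])
    return [gumbel_sigmoid(a, tau=1.0) if i in kept else 0.0
            for i, a in enumerate(alphas)]
```

Multiplying these gates onto the branch outputs reproduces the chain-rule path of Eq. (12): pruned branches contribute neither to the forward pass nor to the gradient of $\alpha$.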
Table 1: Training settings.

Dataset | Arch. | Epochs | Batch size | Init LR | Weight decay | Data augmentation
--- | --- | --- | --- | --- | --- | ---
CIFAR-10 | VGG-16 | 600 | 128 | 0.1 | | same as [6]
ImageNet | ODBB-A0/A1/A2/B1/B2/ResNet-18 | 150 | 256 | 0.1 | | same as [8]
ImageNet | ODBB-B3/ResNet-101 | 240 | 256 | 0.1 | | same as [8]
COCO | ResNet-18/ODBB-A0 | 140 | 114 | 5e-4 | 0 | same as
COCO | ResNet-101/ODBB-B3 | 140 | 96 | 3.75e-4 | 0 | same as
In implementation, we sort the importance of each branch by Eq. (11) and only keep the top-k branches in the forward pass. The importance of each unpruned branch converges to 1 by Eq. (13). Afterward, we simply multiply $P(X_i^j = 1)$ with the output of each unpruned branch, such that $\partial \mathcal{L} / \partial \alpha_i^j$ can be obtained by the chain rule. The whole algorithm is shown in Alg. 1.

Table 2: Comparison with models searched from other search spaces on ImageNet-1k.

Model | Parameters (M) | FLOPs (B) | Inference (s) | Search space | Search+retrain cost (GPU days) | Top-1 (%)
--- | --- | --- | --- | --- | --- | ---
ODBB(A1) | 12.78 | 2.4 | 0.031/0.028 | Rep (ours) | 24+0 | 75.24
DARTS | 4.9 | 0.59 | 0.067/0.178 | DARTS | 1+24 | 73.1
SNAS | 4.3 | 0.52 | 0.063/0.167 | DARTS | 1.5+24 | 72.7
NASNet-A | 5.3 | 0.56 | 0.078/0.195 | DARTS | 1800+24 | 74.0
AmoebaNet-B | 4.9 | 0.56 | 0.218 | DARTS | 3150+24 | 74.0
ODBB(A2) | 25.49 | 5.1 | 0.038/0.031 | Rep (ours) | 24+0 | 76.86
SPOS | 3.5 | 0.32 | 0.061/0.053 | ShuffleNet | 24+24 | 74.0
BigNASModel-S | 4.5 | 0.24 | 0.045/0.049 | MobileNet | 40+0 | 76.5
Once-For-All | 4.4 | 0.23 | 0.051/0.057 | MobileNet | 40+24 | 76.4
ODBB(B3) | 110.96 | 26.2 | 0.107/0.106 | Rep (ours) | 50+0 | 80.97
BigNASModel-L | 9.5 | 1.1 | 0.186/0.265 | MobileNet | 80+0 | 80.9
4 Experiments
We first compare ODBB with the baseline, random search, DBB [6] and ACB [4] on CIFAR-10 for a quick sanity check. To further demonstrate the effectiveness of ODBB at various model sizes, experiments on a large-scale dataset, ImageNet-1k [17], are conducted. At last, the impact of weight sharing and the effect of the branch number constraint on model performance are given in the ablation study.
4.1 Quick Sanity Check on CIFAR10
We use VGG-16 [21] as the benchmark architecture. The convolutional operations in the benchmark architecture are replaced by ODBB, DBB and ACB, respectively. For a fair comparison, the data augmentation techniques and training hyperparameter settings follow DBB and ACB, as given in Table 1. To optimize the architecture parameters $\alpha$ simultaneously, we use the Adam [16] optimizer. One indicator of the effectiveness of a NAS approach is whether the search result exceeds that of random search. To produce the random search result, 25 architectures are randomly sampled from the supernet. We train each of them for 100 epochs and pick the best one for an entire 600-epoch retraining. The comparison results are shown in Table 3. ODBB improves VGG-16 on CIFAR-10 by 0.85% and surpasses DBB and ACB. The architecture generated by random search can also slightly improve the performance of the benchmark model but falls behind ODBB by 0.77%, which demonstrates the effectiveness of our proposed NAS algorithm.
4.2 Performance Improvements on ImageNet
To reveal the generalization ability of our method, we then search for a series of models on ImageNet-1k, which comprises 1.28M training images and 50K validation images from 1000 classes. The RepVGG series [8] is used as the benchmark architectures. We replace each original RepVGG block with the block designed in Rep Search Space Design. For a fair comparison, the total branch number of every ODBB model is limited to equal that of the corresponding RepVGG. The training strategies of each network are listed in Table 1. We again use the Adam optimizer to optimize $\alpha$. We will discuss the impact of the branch number on performance in the Ablation Study.
Table 4: Comparison with the original RepVGG models on ImageNet-1k.

Arch. | Param. (M) | Branches | Original (%) | ODBB (%)
--- | --- | --- | --- | ---
A0 | 8.3 | 66 | 72.41 | 72.96
A1 | 12.78 | 66 | 74.46 | 75.24
A2 | 25.49 | 66 | 76.48 | 76.86
B1 | 14.33 | 84 | 78.37 | 78.61
B2 | 80.31 | 84 | 78.78 | 79.10
B3 | 110.96 | 84 | 80.52 | 80.97
Results are summarized in Table 4. RepVGG is stacked with several multi-branch blocks, each containing a 3x3 Conv, a 1x1 Conv and a skip connection, and can also be fused into a single path in inference. We search for more powerful architectures for the blocks in RepVGG. For RepVGG-A0 to A2, which have 22 layers, ODBB achieves 0.55%, 0.78% and 0.38% better accuracy than the original baselines, respectively. To the best of our knowledge, RepVGG-B3 with ODBB refreshes the performance of plain models from 80.52% to 80.97%. Noting that ACB, DBB and the RepVGG block are all special cases of our proposed search space, our proposed NAS can indeed search for optimal architectures beyond human-defined ones.
4.3 Comparison with Other NAS
We compare the ODBB-series models with models searched from the DARTS and mobile search spaces on ImageNet-1k. We measure the real inference time on two hardware platforms: a GPU (NVIDIA Tesla V100) and an embedded device (NVIDIA Xavier). Table 2 presents the results and shows that: 1) ODBB-series models searched from the Rep search space consistently outperform networks searched from other search spaces with lower inference time, since ODBB-series networks are only stacked with 3x3 convolutions and have no branches at inference time. 2) RepNAS can directly search on ImageNet-1k and combines training and search, which brings high performance at a low GPU-day cost. We show the architectures of the searched ODBB series in the Appendix.
4.4 Transferability
To verify the transferability of the searched models to another computer vision task, object detection experiments are conducted on the MS COCO dataset [18] with the CenterNet framework [27]. The detailed implementation follows [27]: ImageNet-1k pretrained backbones with three up-convolutional networks are fine-tuned on COCO, with the input resolution also following [27]. We use Adam to optimize the overall parameters. Specifically, the baseline backbones of CenterNet are the ResNet series, and we replace them with our searched ODBB series. All backbone networks are pretrained on ImageNet-1k with the training schedule in Table 1. After fine-tuning on COCO, the ODBB-series backbones are fused into one path for fast inference. The results are shown in Table 5. In both slight and heavy models, ODBB-A0 and ODBB-B3 surpass ResNet-18 and ResNet-101 by 3.3% and 2.7% AP with comparable inference speed, respectively, which demonstrates that our searched models have outstanding performance on other computer vision tasks.
4.5 Ablation Study
Table 5: Object detection results on COCO.

Backbone | COCO AP (%) | Inference (s)
--- | --- | ---
ResNet-18 | 28.1 | 0.065
ResNet-101 | 34.6 | 0.141
ODBB(A0) | 31.4 | 0.048
ODBB(B3) | 37.3 | 0.148
Training Under a Branch Number Constraint. Training multi-branch models requires linearly increasing GPU memory to save the intermediate feature maps for the forward and backward passes. How to search and train multi-branch models in a memory-friendly way is significant for the application of structural reparameterization techniques. For simplicity, we reduce the branch number of the network to meet various memory constraints by pushing the temperature of low-ranked branches towards $0^+$. To prove the efficiency and effectiveness over other human-designed blocks, we also train DBB and ACB on the benchmark RepVGG-A0 by replacing the original RepVGG block, respectively. All experiments in this subsection are conducted on RepVGG-A0 with the same training settings.
The results shown in Figure 4 reveal that ODBB can surpass other blocks with fewer branches. For instance, ODBB containing only 50 branches has higher top-1 accuracy than the original RepVGG block (66 branches), DBB (88 branches) and ACB (66 branches). Besides, we found that ODBB with 75 branches is enough to obtain the same results as ODBB with more branches.
Efficacy of Weight Sharing. We train and search ODBB(A0) with the following two methods: 1) the weights of the different branches in the same block are shared; 2) the weights of the different branches are updated independently. Figure 5 presents the comparison on ImageNet-1k with ODBB(A0). It is apparent that the ODBB(A0) updated with weight sharing converges faster than the one updated independently. Besides, we sample the high-performing subnet every five training epochs according to the ranking of the architecture parameters $\alpha$. Each sampled model is trained from scratch with the scheme in Table 1. As shown in Figure 5, the performance of the subnets sampled from the weight-sharing supernet is more stable and higher in each period. This phenomenon illustrates that the weight-sharing supernet serves as a good indicator of ranking.
5 Limitations
The searched networks have the same channels and depth as the RepVGG [8] networks, which contain a huge number of parameters. Paying little attention to the parameter count therefore hinders the deployment of our searched models on CPU devices. However, VGG-like models can be easily pruned by many existing fast channel pruning methods [5, 2, 11] for parameter reduction.
6 Conclusion
In this work, we first propose a new search space, the Rep space, where each block is architecture-independent and can preserve arbitrary branches. Any subnet searched from the Rep space has fast inference speed since it can be converted to a single-path model in inference. To efficiently train the supernet, block-wise weight sharing is used in supernet training. To fit the new search space, a new one-stage NAS method is presented, and the optimal diverse branch blocks can be obtained without retraining. Extensive experiments demonstrate that the proposed RepNAS can search multi-branch networks of various sizes, named the ODBB series, which strongly outperform previous NAS networks under similar inference time. Moreover, compared with other networks utilizing Rep techniques, the ODBB series also achieves state-of-the-art top-1 accuracy on ImageNet at various model sizes. In future work, we will consider adding the channel number of each block to the search space, which can further boost the performance of RepNAS.
References

[1] (2019) Progressive differentiable architecture search: bridging the depth gap between search and evaluation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1294–1303.
[2] (2020) Towards efficient model compression via learned global ranking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1518–1528.
[3] (2019) FairNAS: rethinking evaluation fairness of weight sharing neural architecture search. arXiv preprint arXiv:1907.01845.
[4] (2019) ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1911–1920.
[5] (2020) Lossless CNN channel pruning via decoupling remembering and forgetting. arXiv preprint arXiv:2007.03260.
[6] (2021) Diverse branch block: building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10886–10895.
[7] (2021) RepMLP: re-parameterizing convolutions into fully-connected layers for image recognition. arXiv preprint arXiv:2105.01883.
[8] (2021) RepVGG: making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733–13742.
[9] (2019) Searching for a robust neural architecture in four GPU hours. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1761–1770.
[10] (2020) Single path one-shot neural architecture search with uniform sampling. In European Conference on Computer Vision, pp. 544–560.
[11] (2017) Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1389–1397.
[12] (2020) DSNAS: direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12084–12092.
[13] (2014) DenseNet: implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869.
[14] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456.
[15] (2016) Categorical reparameterization with Gumbel-Softmax.
[16] (2015) Adam: a method for stochastic optimization. In ICLR (Poster).
[17] (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105.
[18] (2014) Microsoft COCO: common objects in context. In European Conference on Computer Vision, pp. 740–755.
[19] (2018) DARTS: differentiable architecture search. In International Conference on Learning Representations.
[20] (2019) Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4780–4789.
[21] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[22] (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
[23] (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pp. 6105–6114.
[24] (2018) SNAS: stochastic neural architecture search. In International Conference on Learning Representations.
[25] (2020) BigNAS: scaling up neural architecture search with big single-stage models. In European Conference on Computer Vision, pp. 702–717.
[26] (2018) Slimmable neural networks. In International Conference on Learning Representations.
[27] (2019) Objects as points. arXiv preprint arXiv:1904.07850.
[28] (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.
[29] (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710.
Appendix: Model Architecture
The channels and depth of the ODBB-series networks follow RepVGG [8]; we show them in Table 6.
Name | Layers of each stage | Channels of each stage
--- | --- | ---
A0 | 1, 2, 4, 14, 1 | 64, 48, 96, 192, 1280
A1 | 1, 2, 4, 14, 1 | 64, 64, 128, 256, 1280
A2 | 1, 2, 4, 14, 1 | 64, 96, 192, 384, 1408
B1 | 1, 4, 6, 16, 1 | 64, 128, 256, 512, 2048
B2 | 1, 4, 6, 16, 1 | 64, 160, 320, 640, 2560
B3 | 1, 4, 6, 16, 1 | 64, 192, 384, 768, 2560