1 Introduction
As a fundamental topic in computer vision, semantic image segmentation [24, 44, 7, 8] aims at predicting pixel-level labels for an image. Leveraging the strong representation ability of CNNs, many works have achieved state-of-the-art performance on popular semantic segmentation benchmarks [13, 15, 4]. To achieve higher accuracy, state-of-the-art models have become increasingly large and deep, requiring high computational resources and large memory overhead, which makes them difficult to deploy on resource-constrained platforms such as mobile devices, robots, and self-driving cars.

Recently, much research has focused on designing and improving CNN models with light computation cost and high segmentation accuracy. For example, [1, 30] reduce the computation cost via pruning algorithms, and [43] uses an image cascade network to incorporate multi-resolution input. BiSeNet [41] and DFANet [21] utilize lightweight backbones to speed up inference, equipped with a well-designed feature fusion or aggregation module to remedy the accuracy drop. Normally, researchers acquire expertise in architecture design through enormous trial and error to carefully balance accuracy and resource efficiency.
To design more effective segmentation networks for embedded devices, some researchers have explored automatic neural architecture search (NAS) methods [23, 46, 27, 20, 32, 5, 39] and achieved excellent results. For example, Auto-DeepLab [22] searches the cell structure and the downsampling strategy together in the same round. CAS [42] searches for an architecture with customized resource constraints and a multi-scale module that has been widely used in the semantic segmentation field [7, 44].
In particular, CAS has achieved state-of-the-art segmentation performance among lightweight models [43, 21, 41]. Like general NAS methods such as ENAS [32], DARTS [23] and SNAS [39], CAS searches for a few types of cells (i.e., a normal cell and a reduction cell) and then repeatedly stacks the same cells throughout the network. This simplifies the search process, but also makes it harder to find a good trade-off between performance and speed due to the limited cell diversity. For example, without any resource constraint the cell is prone to learn a complicated structure in pursuit of high performance. As shown in Fig. 2(a), a whole network stacked with such complicated cells results in high latency. When a low-computation constraint is applied, the cell structure tends to be over-simplified, as shown in Fig. 2(b), which may not achieve satisfactory performance.
Different from traditional search algorithms with a simplified search space, in this paper we propose a novel search mechanism with a new search space, in which a lightweight model with high performance can be fully explored through well-designed cell-level diversity and a latency-oriented constraint. On one hand, to encourage cell-level diversity, we make each cell structure independent, so that cells with different computation costs can be flexibly stacked to form a lightweight network, as shown in Fig. 2(c). For example, simple cells can be applied to stages with high computation cost to achieve low latency, while complicated cells can be chosen in deep layers with low computation cost for high accuracy. On the other hand, we apply a real-world latency-oriented constraint in the search process, through which the searched model can achieve a better trade-off between performance and latency.
However, simply endowing cells with independence in exploring their own structures enlarges the search space and makes the optimization more difficult, which causes accuracy degradation as shown in Table 3. To address this issue, we incorporate a Graph Convolution Network (GCN) [19] as a communication medium between cells, and name the method Graph-Guided Architecture Search (GAS). Our idea is inspired by [26]: different cells can be treated as multiple agents whose achievement of social welfare may require communication between them. Specifically, in the forward process, starting from the first cell, the information of each cell is propagated to the next adjacent cell with a GCN. Our ablation study shows that this communication mechanism tends to guide cells to select less-parametric operations, thus achieving a balance between accuracy and latency.
We conduct extensive experiments on the standard Cityscapes [13] and CamVid [4] benchmarks. Compared to other state-of-the-art methods, the proposed method achieves the best performance while maintaining competitive latency. In particular, our method is located in the top-right area of Fig. 1, achieving the state-of-the-art trade-off between speed and performance.
The main contributions can be summarized as follows:

We propose a novel search framework for the real-time semantic segmentation task, with a new search space in which a lightweight model with high performance can be effectively explored.

We integrate the graph convolution network seamlessly into neural architecture search as a communication mechanism between cells.

The lightweight segmentation network searched with GAS is customizable in real applications. Notably, GAS has achieved 73.3% mIoU on the Cityscapes test set at 102 FPS on an NVIDIA Titan Xp with a 769×1537 input image.
2 Related Work
Efficient Semantic Segmentation Methods
The fully convolutional network [24] is the pioneering work in semantic segmentation. Some remarkable networks have achieved state-of-the-art performance by introducing heavy backbones (VGGNet [34], ResNet [17], DenseNet [18], Xception [12]), and some outstanding works introduce effective modules to capture multi-scale context information [44, 8, 9]. In terms of efficient segmentation methods, there are two mainstream approaches. One is to employ a relatively light backbone (e.g., ENet [30]) or to introduce efficient operations (e.g., depth-wise dilated convolution); DFANet [21] utilizes a lightweight backbone to speed up inference and equips it with a cross-level feature aggregation module to remedy the accuracy drop. The other is multi-branch algorithms that consist of more than one path; for example, ICNet [43] proposes a multi-scale image cascade to speed up inference, and BiSeNet [41] decouples the extraction of spatial and context information using two paths.
Neural Architecture Search
Neural Architecture Search (NAS) aims at automatically searching for network architectures. Most existing architecture search works are based on either reinforcement learning [45, 16] or evolutionary algorithms [33, 11]. Though they can achieve satisfactory performance, they need thousands of GPU hours. To solve this time-consuming problem, one-shot methods [2, 3] train a parent network from which each sub-network can inherit its weights. They can be roughly divided into cell-based and layer-based methods according to the type of search space. Among cell-based methods, ENAS [32] proposes a parameter-sharing strategy among sub-networks, and DARTS [23] relaxes the discrete architecture distribution to continuous deterministic weights, such that both can be optimized with gradient descent. SNAS [39] proposes novel search gradients that train neural operation parameters and architecture distribution parameters in the same round of back-propagation. Moreover, there are also excellent works [10, 29] that reduce the difficulty of optimization by gradually shrinking the size of the search space. Among layer-based methods, FBNet [37], MnasNet [35] and ProxylessNAS [5] use a multi-objective search approach that optimizes both accuracy and real-world latency. In the field of semantic segmentation, [6] is the pioneering work, introducing meta-learning techniques into the network search problem. Auto-DeepLab [22] searches the cell structure and the downsampling strategy together in the same round. More recently, CAS [42] searches for an architecture with customized resource constraints and a multi-scale module that has been widely used in the semantic segmentation field, and [28] over-parameterises the architecture during training via a set of auxiliary cells using reinforcement learning.
Graph Convolution Network
Convolutional neural networks on graph-structured data are an emerging topic in deep learning research. Kipf and Welling [19] presented a scalable approach for graph-structured data based on an efficient variant of convolutional neural networks that operates directly on graphs for better information transfer. Since then, Graph Convolution Networks (GCNs) [19] have been widely used in many domains, such as video classification [36] and action recognition [40]. In this paper, we apply GCNs [19] to model the relationship of adjacent cells in network architecture search. As far as we know, ours is the first work that applies graph-based neural networks to the network architecture search task.
3 Methods
As shown in Fig. 3, GAS, equipped with the GCN-Guided Module (GGM), searches for an optimal network constructed from a series of independent cells. In the search process, we take latency into consideration to obtain a computationally efficient network. The search problem can be formulated as:
a^* = \mathop{\arg\min}_{a \in \mathcal{A}} \; \mathcal{L}_{val}(a) + \lambda \, \mathcal{L}_{lat}(a)   (1)

where \mathcal{A} denotes the search space, and \mathcal{L}_{val} and \mathcal{L}_{lat} are the validation loss and the latency loss, respectively. Our goal is to search for an optimal architecture a^* that achieves the best trade-off between performance and speed.
In this section, we describe the three main components of GAS: 1) Network Architecture Search; 2) GCN-Guided Module; 3) Latency-Oriented Optimization.
3.1 Network Architecture Search
As shown in Fig. 3(a), the backbone takes an image as input, which is first filtered by three convolutional layers followed by a series of independent cells. The ASPP [7] module is subsequently used to extract multi-scale context for the final prediction.
A cell is a directed acyclic graph (DAG) as shown in Fig. 4. Each cell consists of N ordered nodes, denoted by {x_1, ..., x_N}, each of which represents a latent representation (e.g., a feature map) in the network. Each directed edge (i, j) in this DAG represents an operation transformation o^{(i,j)} (e.g., conv, pooling). Each cell has two input nodes, c_{k-2} and c_{k-1} (the outputs of the two previous cells), and outputs the concatenation of all intermediate nodes {x_1, ..., x_N}. In our work, we set N = 2. So intermediate node x_1 has two inputs {c_{k-2}, c_{k-1}}, and intermediate node x_2 has three inputs {c_{k-2}, c_{k-1}, x_1}. An intermediate node is computed as:

x_j = \sum_{i < j} o^{(i,j)}(x_i)   (2)

where o^{(i,j)} is the final operation at edge (i, j).
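The cell computation of Eq. (2) can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the `ops` dictionary uses toy stand-in operations rather than the real convolution/pooling candidates, and the edge indexing is one plausible convention for the DAG described above.

```python
import numpy as np

# Toy "operations": stand-ins for the real conv/pooling candidates.
def identity(x): return x
def scale(x):    return 0.5 * x

def cell_forward(c_prev2, c_prev1, edge_ops):
    """Forward pass of a DAG cell with N = 2 intermediate nodes.

    edge_ops maps (input_index, node_index) -> operation; inputs are
    indexed 0 (c_{k-2}), 1 (c_{k-1}), and 2.. for intermediate nodes.
    """
    states = [c_prev2, c_prev1]
    n_intermediate = 2
    for j in range(n_intermediate):
        node_idx = 2 + j
        # Eq. (2): x_j is the sum of transformed predecessor states.
        x_j = sum(edge_ops[(i, node_idx)](states[i]) for i in range(node_idx))
        states.append(x_j)
    # The cell outputs the concatenation of all intermediate nodes.
    return np.concatenate(states[2:], axis=-1)

ops = {(0, 2): identity, (1, 2): scale,
       (0, 3): scale, (1, 3): identity, (2, 3): identity}
out = cell_forward(np.ones((1, 4)), np.ones((1, 4)), ops)
print(out.shape)  # (1, 8)
```

In the searched network, each `edge_ops` entry would be the single operation selected for that edge; during search it is replaced by the masked mixture described next.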
To search the final operation o^{(i,j)}, we use the method described in SNAS [39], where the search space is represented with a set of one-hot random variables from a fully factorizable joint distribution p(Z). Concretely, each edge (i, j) is associated with a one-hot random variable Z^{(i,j)} which is multiplied as a mask with all possible operations O^{(i,j)} = (o_1, o_2, ..., o_m) on this edge. We denote the one-hot random variable as Z^{(i,j)} = (z_1^{(i,j)}, z_2^{(i,j)}, ..., z_m^{(i,j)}), where m is the number of candidate operations. The intermediate nodes during the search process are then:

x_j = \sum_{i < j} \tilde{o}^{(i,j)}(x_i) = \sum_{i < j} \sum_{k=1}^{m} z_k^{(i,j)} \, o_k(x_i)   (3)
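The masked-edge computation above can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the candidate operations are toy stand-ins, and the sample z is drawn with the Gumbel-softmax reparameterization that the paper adopts (cf. Eq. 4); at low temperature z approaches a one-hot mask.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(alpha, temperature=1.0):
    """Relaxed one-hot sample z from positive architecture weights alpha."""
    u = rng.uniform(1e-8, 1.0, size=alpha.shape)
    g = -np.log(-np.log(u))                      # Gumbel noise
    logits = (np.log(alpha) + g) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Candidate operations on one edge (stand-ins for conv/pool/skip/zero).
candidates = [lambda x: x, lambda x: 0.0 * x, lambda x: 2.0 * x]

def masked_edge(x, alpha, temperature=1.0):
    """Eq. (3): the edge output is the z-weighted sum of all candidates."""
    z = gumbel_softmax(alpha, temperature)
    return sum(z_k * op(x) for z_k, op in zip(z, candidates))

alpha = np.array([0.2, 0.3, 0.5])   # unnormalized op weights on this edge
y = masked_edge(np.ones(4), alpha, temperature=0.1)
print(y.shape)  # (4,)
```

At temperature 0.1 the sample is nearly one-hot, so the edge effectively applies a single candidate while remaining differentiable in alpha.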
To make p(Z) differentiable, the reparameterization trick [25] is used to relax the discrete architecture distribution to be continuous:

z_k^{(i,j)} = \frac{\exp\big((\log \alpha_k^{(i,j)} + G_k^{(i,j)})/\lambda\big)}{\sum_{l=1}^{m} \exp\big((\log \alpha_l^{(i,j)} + G_l^{(i,j)})/\lambda\big)}   (4)

where \alpha^{(i,j)} is the architecture parameter at edge (i, j), G_k^{(i,j)} = -\log(-\log(U_k^{(i,j)})) is a vector of Gumbel random variables, U_k^{(i,j)} is a uniform random variable, and \lambda is used to control the temperature of the softmax.

For the set of candidate operations O, we use only the following 7 kinds of operations to better balance speed and performance: 3×3 separable conv, 3×3 max pooling, 3×3 conv, skip connection, zero operation, 3×3 dilated separable conv (dilation=2), and 3×3 dilated separable conv (dilation=4).

3.2 GCN-Guided Module
With cells independent of each other, the inter-cell relationship becomes very important for searching efficiently. We propose a novel GCN-Guided Module (GGM) to naturally bridge the operation information between adjacent cells. The overall architecture of our GGM is shown in Fig. 3(b). Inspired by [36], the GGM represents the communication between adjacent cells as a graph and performs reasoning on the graph for information delivery. Specifically, we utilize the similarity relations of edges in adjacent cells to construct the graph, where each node represents one edge in the cells. In this way, the state changes of the previous cell can be delivered to the current cell by reasoning on this graph.
Let \alpha_i represent the architecture parameter matrix for cell i; its dimension is K × m, where K and m represent the number of edges and the number of candidate operations, respectively. Likewise, the architecture parameter \alpha_{i+1} for cell i+1 is also a K × m matrix. Given an edge in cell i+1, we calculate the similarity between this edge and all edges in cell i. The adjacency matrix A_i of the graph between two adjacent cells i and i+1 can therefore be established by:

A_i = \phi(\alpha_{i+1}) \, \psi(\alpha_i)^{\mathsf{T}}   (5)

where \phi(\alpha_{i+1}) = \alpha_{i+1} W_\phi and \psi(\alpha_i) = \alpha_i W_\psi are two different transformations of the original matrices, and the parameters W_\phi and W_\psi are both m × m weights that can be learned via back-propagation. The result A_i is a K × K matrix.
Based on this adjacency matrix A_i, we use Graph Convolution Networks (GCNs) [19] to perform reasoning on the graph, efficiently propagating information from cell i to cell i+1. The reasoning process includes the following three steps.
Firstly, to get the graph node feature representation, we apply a convolutional operation to the architecture parameters \alpha_i and obtain the K × d node feature representation matrix H_i in the embedding space:

H_i = \mathrm{conv}(\alpha_i; W_1)   (6)

where W_1 represents the convolutional operation weight.
Secondly, with the graph node representations H_i and the adjacency matrix A_i, we use the GCNs [19] to perform information propagation on the graph, as shown in Equation 7; a residual connection is added to each layer of the GCN. The GCNs allow us to compute the response of a node based on its neighbors defined by the graph relations, so performing graph convolution is equivalent to performing message propagation on the graph:

H_i' = A_i H_i W_g + H_i   (7)

where W_g denotes the GCN weight with dimension d × d, which can be learned via back-propagation.
Finally, the output H_i' of the GCN is still d-dimensional, so we use another convolutional operation with weight W_2 to map the representation from the embedding space back to the source space, yielding \Delta\alpha_{i+1} in Equation 8. We then add \Delta\alpha_{i+1} and the original \alpha_{i+1} element-wise to obtain the updated \hat{\alpha}_{i+1} in Equation 9, where \gamma controls the weight between \Delta\alpha_{i+1} and \alpha_{i+1}. The current cell i+1 has then fused the parameter information of the previous cell i, and we use \hat{\alpha}_{i+1} as the new architecture parameter of cell i+1:

\Delta\alpha_{i+1} = \mathrm{conv}(H_i'; W_2)   (8)

\hat{\alpha}_{i+1} = \alpha_{i+1} + \gamma \, \Delta\alpha_{i+1}   (9)
The proposed GGM thus seamlessly integrates the graph convolution network into neural architecture search, bridging the operation information between adjacent cells.
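The three reasoning steps above can be sketched as follows. This is a NumPy sketch under our own assumptions: the weight shapes (m×m, m×d, d×d, d×m), the raw dot-product adjacency, and the residual fusion form are our reconstruction of Eqs. (5)-(9), with randomly initialized stand-in weights instead of learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m, d = 5, 7, 16      # edges per cell, candidate ops, embedding dim

# Learnable weights (randomly initialized stand-ins).
W_phi, W_psi = rng.normal(size=(m, m)), rng.normal(size=(m, m))
W1 = rng.normal(size=(m, d))
Wg = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, m))

def ggm_update(alpha_i, alpha_next, gamma=0.5):
    """Propagate architecture information from cell i to cell i+1."""
    # Eq. (5): K x K adjacency from edge-wise similarity of the two cells.
    A = (alpha_next @ W_phi) @ (alpha_i @ W_psi).T
    # Eq. (6): embed the previous cell's parameters as node features.
    H = alpha_i @ W1
    # Eq. (7): one graph convolution with a residual connection.
    H = A @ H @ Wg + H
    # Eq. (8): map back from the embedding space to parameter space.
    delta = H @ W2
    # Eq. (9): fuse with the current cell's own parameters.
    return alpha_next + gamma * delta

alpha_hat = ggm_update(rng.normal(size=(K, m)), rng.normal(size=(K, m)))
print(alpha_hat.shape)  # (5, 7)
```

The updated matrix keeps the K × m shape of the original architecture parameters, so it can directly replace \alpha_{i+1} in the Gumbel-softmax sampling of the next cell.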
3.3 Latency-Oriented Optimization
Similar to many excellent NAS works [5, 37, 42, 35], we take the real-world latency of a network into consideration during the search process, which orients the search toward an optimal lightweight model. Specifically, we create a GPU-latency lookup table that records the inference latency of each candidate operation. During the search process, each candidate operation o_k at edge (i, j) is assigned a cost \mathrm{lat}_k^{(i,j)} given by the lookup table. In this way, the total latency of a cell is accumulated as:

\mathrm{lat}_{cell} = \sum_{(i,j)} \sum_{k=1}^{m} z_k^{(i,j)} \, \mathrm{lat}_k^{(i,j)}   (10)

where z_k^{(i,j)} is the architecture parameter for operation o_k at edge (i, j) and m is the number of candidate operations. Given an architecture a, the total latency cost is estimated as:

\mathrm{LAT}(a) = \sum_{l=1}^{N_{cell}} \mathrm{lat}_{cell_l}   (11)

where N_{cell} refers to the number of cells in architecture a. The latency of each operation is a constant, and thus the total latency loss is differentiable with respect to the architecture parameters \alpha.
Different from the exponentiated latency loss of [37], we simply define the total loss function as follows:

\mathcal{L}(a, w_a) = \mathcal{L}_{CE}(a, w_a) + \lambda \, \mathrm{LAT}(a)   (12)

where \mathcal{L}_{CE}(a, w_a) denotes the cross-entropy loss of architecture a with weights w_a, \mathrm{LAT}(a) denotes the overall latency of architecture a, measured in microseconds, and the coefficient \lambda controls the balance between accuracy and latency. During the search phase, we directly optimize the architecture parameters \alpha and the weights w_a in the same round of back-propagation, rather than using iterative optimization [37].
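The lookup-table latency loss can be sketched as follows. This is an illustrative NumPy sketch: the per-operation latencies are made-up numbers (not measured values from the paper's table), and the relaxed op distribution is taken as a softmax over each edge's parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical lookup table: latency of each of m = 4 candidate ops.
lat_table = np.array([0.30, 0.05, 0.80, 0.00])   # conv, skip, sep-conv, zero

def expected_latency(alphas, lat_table):
    """Eqs. (10)-(11): sum of op-probability-weighted latencies over all
    edges of all cells; differentiable in the architecture parameters."""
    total = 0.0
    for alpha in alphas:                  # one (K x m) matrix per cell
        probs = softmax(alpha)            # relaxed op distribution per edge
        total += (probs * lat_table).sum()
    return total

def total_loss(ce_loss, alphas, lam=0.005):
    """Eq. (12): cross-entropy plus lambda-weighted latency."""
    return ce_loss + lam * expected_latency(alphas, lat_table)

alphas = [np.zeros((5, 4)) for _ in range(16)]    # 16 cells, uniform ops
print(round(expected_latency(alphas, lat_table), 4))  # 23.0
```

Because the table entries are constants, the latency term is just a linear function of the (relaxed) architecture parameters, which is why it can be optimized by gradient descent together with the cross-entropy loss.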
4 Experiments
In this section, we conduct extensive experiments to verify the effectiveness of GAS. Firstly, we compare the network searched by our method with state-of-the-art works on two standard benchmarks. Secondly, we perform an ablation study on the GCN-Guided Module and the latency optimization settings, and close with an insight into the GCN-Guided Module.
4.1 Benchmark and Evaluation Metrics
Datasets. To verify the effectiveness and robustness of our method, we evaluate it on the Cityscapes [13] and CamVid [4] datasets. Cityscapes [13] is a public dataset for semantic urban scene understanding. It contains 5,000 high-quality pixel-level finely annotated images (2975, 500, and 1525 for the training, validation, and testing sets, respectively) of size 1024×2048 collected from 50 cities. The dense annotation contains 30 common classes, 19 of which are used in training and testing following [13]. CamVid [4] is another public dataset with object-class semantic labels. It contains 701 images in total, of which 367 are for training, 101 for validation, and 233 for testing. The images have a resolution of 960×720 and 11 semantic categories.
Evaluation Metrics. For evaluation, we use three metrics: mean class-wise intersection over union (mIoU), network forward time (Latency), and frames per second (FPS).
4.2 Implementation Details
We conduct all experiments with PyTorch 0.4 [31] on a workstation with Titan Xp GPU cards under CUDA 9.0; inference time in all experiments is also reported on an Nvidia Titan Xp GPU. We first conduct the architecture search with GAS on the segmentation dataset and obtain the target lightweight network architecture according to the optimized \alpha. We then use the ImageNet [14] dataset to pretrain the searched network from scratch. Finally, we finetune the network on the specific segmentation dataset for 200 epochs.
In the search process, the architecture contains 16 cells and each cell has N = 2 intermediate nodes. With speed in mind, the initial channel count of the network is 8. For the training hyper-parameters, the mini-batch size is set to 16. The architecture distribution parameters are optimized by Adam with initial learning rate 0.001, \beta = (0.5, 0.999), and weight decay 0.0001. The network parameters are optimized using SGD with momentum 0.9, weight decay 0.001, and a cosine learning-rate scheduler that decays the learning rate from 0.025 to 0.001. For the Gumbel softmax, we set the initial temperature \lambda in Equation 4 to 1.0 and gradually decrease it to a minimum of 0.03.
For the finetuning details, we train the network with mini-batch size 8 and an SGD optimizer with a poly learning-rate scheduler that decays the learning rate from 0.01 to zero. Following [38], an online bootstrapping strategy is applied during training. For data augmentation, we use random flipping and random resizing with a scale between 0.5 and 2.0, and finally randomly crop the image to a fixed size for training.
For the GCN-Guided Module, we use one Graph Convolution Network (GCN) [19] between every pair of adjacent cells, and each GCN contains one layer of graph convolution. The kernel size of the graph-convolution weight is 64×64.
Method | InputSize | mIoU (%) | Latency (ms) | FPS
FCN-8S | 512×1024 | 65.3 | 227.23 | 4.4
PSPNet | 713×713 | 81.2 | 1288.0 | 0.78
DeepLabV3 | 769×769 | 81.3 | 769.23 | 1.3
SegNet | 640×320 | 57.0 | 30.3 | 33
ENet | 640×320 | 58.3 | 12.7 | 78.4
SQ | 1024×2048 | 59.8 | 46.0 | 21.7
ICNet | 1024×2048 | 69.5 | 26.5 | 37.7
BiSeNet | 768×1536 | 68.4 | 9.52 | 105.8
DFANet A | 1024×1024 | 71.3 | 10.0 | 100.0
DFANet A* | 1024×1024 | 71.3 | 11.48 | 87.1
CAS | 768×1536 | 70.5 | 9.25 | 108.0
CAS† | 768×1536 | 72.3 | 9.25 | 108.0
GAS | 769×1537 | 71.6 | 9.80 | 102.0
GAS A | 769×1537 | 70.4 | 8.68 | 115.1
GAS† | 769×1537 | 73.3 | 9.80 | 102.0

* Speed re-measured on our Titan Xp. † Trained with additional coarse-annotated data.
4.3 Realtime Semantic Segmentation Results
In this part, we compare the model searched by GAS with existing state-of-the-art real-time segmentation models on the semantic segmentation datasets. Inference time is measured on one Nvidia Titan Xp GPU, and the speeds of other methods reported in [42] are used for comparison; the speed is re-measured on the Titan Xp if the original paper reports it on a different GPU.
Results on Cityscapes. We evaluate the network searched by GAS on the Cityscapes test set. The validation set is added to the training data before submitting to the Cityscapes server. Following [41, 42], GAS takes as input an image of size 769×1537, resized from the original 1024×2048. Overall, GAS achieves the best performance among all methods while maintaining a comparable speed of 102 FPS. With only fine data and without any evaluation tricks, GAS yields 71.6% mIoU, which is the state-of-the-art trade-off for lightweight semantic segmentation; the performance reaches 73.3% when coarse data is added to the training set. The full comparison results are shown in Table 1. Compared to BiSeNet and CAS, which have a slight speed advantage, GAS outperforms them by 3.2% and 1.1%, respectively. Compared to other methods such as SegNet, ENet, SQ and ICNet, our method achieves a significant improvement in speed while outperforming them by about 14.6%, 13.3%, 11.8% and 2.1%, respectively. GAS A is searched with the latency weight set to 0.01.
Results on CamVid. We also run the whole GAS pipeline on the CamVid dataset to further verify our method's ability. Table 2 shows the comparison with other methods. With input size 720×960, we achieve 71.9% mIoU at 142 FPS, which is also the state-of-the-art trade-off between accuracy and speed.
Method | mIoU (%) | Latency (ms) | FPS
SegNet | 55.6 | 34.01 | 29.4
ENet | 51.3 | 16.33 | 61.2
ICNet | 67.1 | 28.98 | 34.5
BiSeNet | 65.6 | – | –
DFANet A | 64.7 | 8.33 | 120
CAS | 71.2 | 5.92 | 169
GAS | 71.9 | 7.04 | 142.0
4.4 Ablation Study
In this part, we verify the effect of each component of our framework in detail, performing ablation experiments on the GCN-Guided Module and the latency loss. Furthermore, we give an insight into what role the GCN-Guided Module plays in the search process.
Methods | mIoU (%)
a) Cell shared | 63.0
b) Cell independent | 60.7
c) Cell independent + FC | 63.6
d) Cell independent + GCN | 66.3
4.4.1 Effectiveness of the GCN-Guided Module
We propose the GCN-Guided Module (GGM) to build connections between cells. To verify its advantage, we conducted a series of experiments with different strategies: a) a network stacked from a shared cell; b) a network stacked from independent cells; c) based on b, using a fully connected layer to infer the relationship between cells; d) based on b, using the GCN-Guided Module to infer the relationship between cells. The results are shown in Table 3. The performance reported here is the average mIoU over five repeated experiments on the Cityscapes validation set during the search phase, without the latency loss. Overall, with only independent cells, performance degrades due to the enlarged search space, which makes optimization more difficult. This reduction is mitigated by adding a communication mechanism between cells via the GCN. In particular, our GCN-Guided Module brings about a 2.7-point performance improvement compared to setting (c).
We illustrate the network structure searched by GAS in Fig. 6. An interesting observation is that the operations selected by GAS with the GGM have fewer parameters and less computational complexity than those selected by GAS without the GGM, with more dilated or separable convolution kernels preferred. This exhibits the emergence of a concept of burden sharing in a group of cells when they know how much the others are willing to contribute. It also explains why GAS with the GGM enabled is less over-fitted.
4.4.2 Effectiveness of the Latency Constraint
As mentioned earlier, GAS provides the ability to flexibly achieve a good trade-off between performance and speed via the latency-oriented optimization. We conduct a series of experiments with different loss weights \lambda in Equation 12. Fig. 5 shows the variation of mIoU and latency as \lambda changes: with a smaller \lambda we obtain a model with higher accuracy, and vice versa. When \lambda increases from 0.0005 to 0.005, the latency decreases rapidly while the performance falls slowly; but when \lambda increases from 0.005 to 0.05, the performance drops quickly while the decline in latency is fairly limited. We therefore set \lambda to 0.005 in our experiments. Clearly, the latency-oriented optimization is effective at balancing accuracy and latency.
4.4.3 Analysis of the GCN-Guided Module
One concern is what kind of role the GCN plays in the search process. We suspect that its effectiveness derives from two aspects: 1) To learn a lightweight network, we do not share cell structures with each other, so as to encourage structural diversity. Apparently, learning each cell independently makes the search more difficult and does not guarantee better performance, so the GCN-Guided Module can be regarded as a regularization term on the search process. 2) We discussed above that p(Z) is a fully factorizable joint distribution. As shown in Equation 4, the distribution for the current cell becomes a conditional probability p(Z_{i+1} | Z_i) if its architecture parameter \hat{\alpha}_{i+1} depends on the distribution p(Z_i) of the previous cell. In this case, the GCN-Guided Module plays the role of modeling the condition in the probability distribution p(Z_{i+1} | Z_i).
5 Conclusion & Discussion
In this paper, a novel Graph-Guided Architecture Search (GAS) framework is proposed to tackle the real-time semantic segmentation task. Different from existing NAS approaches that stack the same searched cell into a whole network, GAS searches for different cell architectures and adopts a graph convolution network to bridge the information flow among cells. In addition, a latency-oriented constraint is incorporated into the search process to balance accuracy and speed. Extensive experiments have demonstrated that GAS achieves much better performance than state-of-the-art real-time segmentation approaches.
In the future, we will extend GAS in the following directions: 1) searching networks directly for the segmentation and detection tasks without retraining; 2) deeper research on how to effectively combine NAS and graph convolution networks; 3) exploring other approaches to applying the latency constraint.
References
 [1] (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. PAMI 39 (12), pp. 2481–2495. Cited by: §1.
 [2] (2018) Understanding and simplifying one-shot architecture search. In ICML, pp. 549–558. Cited by: §2.
 [3] (2017) SMASH: one-shot model architecture search through hypernetworks. arXiv:1708.05344. Cited by: §2.
 [4] (2008) Segmentation and recognition using structure from motion point clouds. In ECCV (1), pp. 44–57. Cited by: §1, §1, §4.1.
 [5] (2018) Proxylessnas: direct neural architecture search on target task and hardware. arXiv:1812.00332. Cited by: §1, §2, §3.3.
 [6] (2018) Searching for efficient multi-scale architectures for dense image prediction. In NeurIPS, pp. 8713–8724. Cited by: §2.
 [7] (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. PAMI 40 (4), pp. 834–848. Cited by: §1, §1, §3.1.
 [8] (2017) Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587. Cited by: §1, §2.
 [9] (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, pp. 833–851. Cited by: §2.
 [10] (2019) Progressive differentiable architecture search: bridging the depth gap between search and evaluation. arXiv:1904.12760. Cited by: §2.
 [11] (2018) Reinforced evolutionary neural architecture search. CoRR abs/1808.00193. Cited by: §2.
 [12] (2017) Xception: deep learning with depthwise separable convolutions. In CVPR, pp. 1800–1807. Cited by: §2.

 [13] (2016) The cityscapes dataset for semantic urban scene understanding. In CVPR, pp. 3213–3223. Cited by: §1, §1, §4.1.
 [14] (2009) Imagenet: a large-scale hierarchical image database. In CVPR, pp. 248–255. Cited by: §4.2.
 [15] (2015) The pascal visual object classes challenge: a retrospective. IJCV 111 (1), pp. 98–136. Cited by: §1.
 [16] (2018) IRLAS: inverse reinforcement learning for architecture search. CoRR abs/1812.05285. Cited by: §2.
 [17] (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778. Cited by: §2.
 [18] (2016) Densely connected convolutional networks. CVPR, pp. 1–9. Cited by: §2.
 [19] (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Cited by: §1, §2, §3.2, §3.2, §4.2.
 [20] (2017) Dynamic evaluation of neural sequence models. arXiv:1709.07432. Cited by: §1.

 [21] (2019) Dfanet: deep feature aggregation for real-time semantic segmentation. In CVPR, pp. 9522–9531. Cited by: §1, §1, §2.
 [22] (2019) Autodeeplab: hierarchical neural architecture search for semantic image segmentation. CoRR abs/1901.02985. Cited by: §1, §2.
 [23] (2019) DARTS: differentiable architecture search. In ICLR, Cited by: §1, §1, §2.
 [24] (2015) Fully convolutional networks for semantic segmentation. In CVPR, pp. 3431–3440. Cited by: §1, §2.

 [25] (2016) The concrete distribution: a continuous relaxation of discrete random variables. arXiv:1611.00712. Cited by: §3.1.
 [26] (1988) The society of mind. Simon & Schuster. Cited by: §1.
 [27] (2017) Deeparchitect: automatically designing and training deep architectures. arXiv:1704.08792. Cited by: §1.
 [28] (2019) Fast neural architecture search of compact semantic segmentation models via auxiliary cells. In CVPR, pp. 9126–9135. Cited by: §2.
 [29] (2019) ASAP: architecture search, anneal and prune. arXiv:1904.04123. Cited by: §2.
 [30] (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147. Cited by: §1, §2.
 [31] (2017) Automatic differentiation in pytorch. Cited by: §4.2.
 [32] (2018) Efficient neural architecture search via parameter sharing. In ICML, pp. 4092–4101. Cited by: §1, §1, §2.

 [33] (2018) Regularized evolution for image classifier architecture search. CoRR abs/1802.01548. Cited by: §2.
 [34] (2014) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556. Cited by: §2.
 [35] (2019) Mnasnet: platform-aware neural architecture search for mobile. In CVPR, pp. 2820–2828. Cited by: §2, §3.3.
 [36] (2018) Videos as space-time region graphs. In ECCV, pp. 399–417. Cited by: §2, §3.2.
 [37] (2019) Fbnet: hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR, pp. 10734–10742. Cited by: §2, §3.3, §3.3.
 [38] (2016) High-performance semantic segmentation using very deep fully convolutional networks. CoRR abs/1604.04339. Cited by: §4.2.
 [39] (2019) SNAS: stochastic neural architecture search. In ICLR, Cited by: §1, §1, §2, §3.1.
 [40] (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI, Cited by: §2.
 [41] (2018) BiSeNet: bilateral segmentation network for real-time semantic segmentation. In ECCV, pp. 334–349. Cited by: §1, §1, §2, §4.3.
 [42] (2019) Customizable architecture search for semantic segmentation. In CVPR, pp. 11641–11650. Cited by: §1, §2, §3.3, §4.3, §4.3.
 [43] (2018) Icnet for real-time semantic segmentation on high-resolution images. In ECCV, pp. 405–420. Cited by: §1, §1, §2.
 [44] (2017) Pyramid scene parsing network. In CVPR, pp. 2881–2890. Cited by: §1, §1, §2.
 [45] (2017) Neural architecture search with reinforcement learning. In ICLR, Cited by: §2.
 [46] (2018) Learning transferable architectures for scalable image recognition. In CVPR, pp. 8697–8710. Cited by: §1.