The UniNAS framework: combining modules in arbitrarily complex configurations with argument trees

12/03/2021
by Kevin Alexander Laube
Universität Tübingen

Designing code to be simplistic yet offer choice is a tightrope walk. Additional modules such as optimizers and data sets make a framework useful to a broader audience, but the added complexity quickly becomes a problem. Framework parameters may apply only to some modules but not others, be mutually exclusive, or depend on each other, often in unclear ways. Even so, many frameworks are limited to a few specific use cases. This paper presents the underlying concept of UniNAS, a framework designed to incorporate a variety of Neural Architecture Search approaches. Since they differ in the number of optimizers and networks, hyper-parameter optimization, network designs, candidate operations, and more, a traditional approach cannot solve the task. Instead, every module defines its own hyper-parameters and a local tree structure of module requirements. A configuration file specifies which modules are used, their parameters, and which other modules they use in turn. This concept of argument trees enables combining and reusing modules in complex configurations while avoiding many of the problems mentioned above. Argument trees can also be configured from a graphical user interface, so that designing and changing experiments becomes possible without writing a single line of code. UniNAS is publicly available at https://github.com/cogsys-tuebingen/uninas.


1 Preface

This paper primarily serves as a reference for my Ph.D. dissertation, which I am currently writing. As a consequence, the framework is not under active development. The presented concepts, problems, and solutions may be interesting regardless, even for problems other than Neural Architecture Search (NAS). The framework’s name, UniNAS, is a wordplay on University and Unified NAS, since the framework was intended to incorporate almost any architecture search approach.

2 Introduction and Related Work

An increasing supply of and demand for automated machine learning causes the amount of published code to grow by the day. Although advantageous, the benefit of this code is often impaired by technical shortcomings. This section lists common code bases and some of their disadvantages.

2.1 Available NAS frameworks

The landscape of NAS codebases is severely fragmented, owing to the vast differences between various NAS methods and the deep-learning libraries used to implement them. Some of the best supported or most widely known ones are:

  • NASLib (Ruchte et al., 2020)

  • Microsoft NNI (Zhang, 2019) and Archai (Shah et al., 2020)

  • Huawei Noah Vega (Jiajin, 2020)

  • Google TuNAS (Bender et al., 2020) and PyGlove (Peng et al., 2020) (closed source)

Counterintuitively, the overwhelming majority of publicly available NAS code is not based on any such framework or service but on simple, typical network training code. Such code is generally quick to implement but lacks exact comparability, scalability, and configuration power, which may be a secondary concern for many researchers. In addition, since the official code is often released late or never, and generally only in either TensorFlow (Abadi et al., 2016) or PyTorch (Paszke et al., 2019), popular methods are sometimes re-implemented in third-party repositories.

Further projects include the newly available and closed-source cloud services by, e.g., Google (https://cloud.google.com/automl/) and Microsoft (https://www.microsoft.com/en-us/research/project/automl/). Since they require very little user knowledge in addition to the training data, they are excellent for deep learning in industrial environments.

2.2 Common disadvantages of code bases

With so many frameworks available, why start another one? The development of UniNAS started in early 2020, before most of these frameworks arrived at their current feature availability or were even made public. In addition, the frameworks rarely provide current state-of-the-art methods even now and sometimes lack the flexibility to include them easily. Further problems that UniNAS aims to solve are detailed below:

Research code is rigid

The majority of published NAS code is very simplistic. While that is an advantage to extract important method-related details, the ability to reuse the available code in another context is severely impaired. Almost all details are hard-coded, such as:

  • the used gradient optimizer and learning rate schedule

  • the architecture search space, including candidate operations and network topology

  • the data set and its augmentations

  • weight initialization and regularization techniques

  • the used hardware device(s) for training

  • most hyper-parameters

This inflexibility is sometimes accompanied by the redundancy of several code pieces that differ slightly between experiments or phases in NAS methods. Such redundancy easily introduces subtle bugs and inconsistencies and makes the code confusing to follow. Hard-coded details are also easy to forget, which is especially problematic in research, where reproducibility depends strongly on seemingly unimportant details. Finally, if any hard-coded component is ever changed, such as the optimizer, the configurations of previous experiments become misleading: since the details were hard-coded, they are not part of the documented configuration, so earlier results can no longer be interpreted correctly.

Configuration clutter

In contrast to such simplistic single-purpose code, frameworks usually offer a variety of optimizers, schedules, search spaces, and more to choose from. By configuring the related hyper-parameters, an optimizer can be trivially and safely exchanged for another. Since doing so is a conscious and intended choice, it is also documented in the configuration. By comparison, replacing hard-coded classes was never intended when the code was initially written. The disadvantages of this approach stem from the wealth of configurable hyper-parameters, in several ways:

Firstly, the parametrization is often cluttered. While implementing more classes (such as optimizers or schedules) adds flexibility, the list of available hyper-parameters becomes increasingly bloated and opaque. This wealth of parametrization is intimidating and impractical since it is often nontrivial to understand which hyper-parameters are used and which are ineffective. As an example, the widely used PyTorch Image Models framework (Wightman, 2019) implements an intimidating mix of regularization and data augmentation settings that are partially exclusive (https://github.com/rwightman/pytorch-image-models/blob/ba65dfe2c6681404f35a9409f802aba2a226b761/train.py, checked Dec. 1st 2021; see lines 177 and below). The example was chosen due to the framework’s popularity; it is no worse than others in this respect.

Secondly, to reduce the clutter, parameters can be used by multiple mutually exclusive choices. In the case of the aforementioned PyTorch Image Models framework, one example would be the selection of gradient-descent optimizers. Sharing common parameters such as the learning rate and the momentum generally works well, but can be confusing since, once again, finding out which parameters affect which modules necessitates reading the code or documentation.

Thirdly, even with an intimidating wealth of configuration choices, not every option is covered. To simplify and reduce the clutter, many settings of lesser importance always use a sensible default value. If changing such a parameter becomes necessary, the framework configurations become more cluttered or changing the hard-coded default value again results in misleading configurations of previous experiments.

To summarize, the hyper-parametrization design of a framework is a delicate decision, aiming for completeness without clutter. While both goals appear to be mutually exclusive, they can be successfully united with the underlying configuration approach of UniNAS: argument trees.

Nonetheless, it is great if code is available at all. Many methods are published without any code that enables verifying their training or search results, impairing their reproducibility. Additionally, even if code is overly simplistic or accompanied by cluttered configurations, reading it is often the best way to clarify a method’s exact workings and obtain detailed information about omitted hyper-parameter choices.

3 Argument trees

The core design philosophy of UniNAS is built on so-called argument trees. This concept solves the problems of Section 2.2 while also providing immense configuration flexibility. As its basis, we observe that any algorithm or code piece can be represented hierarchically. For example, the task to train a network requires the network itself and a training loop, which may use callbacks and logging functions.

Sections 3.1 and 3.2 briefly explain two requirements: strict modularity and a global register. As described in Section 3.3, this allows each module to define which other types of modules are needed. In the previous example, a training loop may use callbacks and logging functions. Sections 3.4 and 3.5 explain how a configuration file can fully detail these relationships and how the desired code class structure can be generated. Finally, Section 3.6 shows how a configuration file can be easily manipulated with a graphical user interface, allowing the user to create and change complex experiments without writing a single line of code.

3.1 Modularity

As practiced in most non-simplistic codebases, the core of the argument tree structure is strict modularity. The framework code is fragmented into components with clearly defined purposes, such as training loops and data sets. If all implemented classes of the same type inherit from one base class (e.g., AbstractOptimizer) that guarantees specific class methods for stable interactions, they can be treated interchangeably; exchanging one module of a type for another, for example gradient-descent optimizers, then becomes simple. In object-oriented programming, this design is termed polymorphism.
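
As a minimal sketch of this design (the method names are illustrative, not UniNAS’ actual interface), a shared base class guarantees the methods the rest of the framework relies on:

from abc import ABC, abstractmethod

class AbstractOptimizer(ABC):
    # the interface that every optimizer module guarantees

    @abstractmethod
    def step(self) -> None:
        # apply one update to the managed parameters
        raise NotImplementedError

    @abstractmethod
    def zero_grad(self) -> None:
        # reset all parameter gradients
        raise NotImplementedError

class SGDOptimizer(AbstractOptimizer):
    # one concrete choice; any other subclass can take its place
    def __init__(self, params, lr: float = 0.01):
        self.params, self.lr = list(params), lr

    def step(self) -> None:
        for p in self.params:
            if p.grad is not None:
                p.data -= self.lr * p.grad

    def zero_grad(self) -> None:
        for p in self.params:
            p.grad = None

Any code written against AbstractOptimizer works with every subclass, so exchanging optimizers does not require touching the training loop.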

UniNAS extends typical PyTorch (Paszke et al., 2019) classes with additional functionality. An example is image classification data sets, which ordinarily do not contain information about image sizes. Adding this specification makes it possible to use fake data easily and to precompute the tensor shapes in every layer throughout the neural network.
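
A hedged sketch of the idea (the attribute and method names are illustrative, loosely borrowed from the class names in the appendix):

import torch

class Cifar10Data:
    # an image classification data set that knows its own tensor shapes
    data_shape = (3, 32, 32)   # channels, height, width
    num_classes = 10

    def fake_batch(self, batch_size: int = 2) -> torch.Tensor:
        # random tensors of the correct shape, e.g. to run a quick test
        # or to precompute the tensor shapes of every network layer
        return torch.randn(batch_size, *self.data_shape)

Passing such a fake batch through a network once is enough to record the input and output shapes of every layer, as displayed in Figure 8.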

@Register.task(search=True)
class SingleSearchTask(SingleTask):

    @classmethod
    def args_to_add(cls, index=None) -> [Argument]:
        return [
            Argument('is_test_run', default='False', type=str, is_bool=True),
            Argument('seed', default=0, type=int),
            Argument('save_dir', default='path_tmp', type=str),
        ]

    @classmethod
    def meta_args_to_add(cls) -> [MetaArgument]:
        methods = Register.methods.filter_match_all(search=True)
        return [
            MetaArgument('cls_device', Register.devices_managers, num=1),
            MetaArgument('cls_trainer', Register.trainers, num=1),
            MetaArgument('cls_method', methods, num=1),
        ]
Figure 1: UniNAS code excerpt for a SingleSearchTask. The decorator function in Line 1 registers the class with type ”task” and additional information. The method in Line 5 returns all arguments for the task to be set in a config file. The method in Line 13 defines the local tree structure by stating how many modules of which types are needed. It is also possible to specify additional requirements, as done in Line 14.

3.2 A global register

A second requirement for argument trees is a global register for all modules. Its functions are:

  • Allow any module to register itself with additional information about its purpose. The example code in Figure 1 shows this in Line 1.

  • List all registered classes, including their type (task, model, optimizer, data set, and more) and their additional information (search, regression, and more).

  • Filter registered classes by types and matching information.

  • Given only the name of a registered module, return the class code located anywhere in the framework’s files.

As seen in the following Sections, this functionality is indispensable to UniNAS’ design. The only difficulties in building such a register are that the code should remain readable and that every module has to register itself when the framework is used. Both can be achieved by scanning all code files whenever a new job starts, which takes less than five seconds. In doing so, Python executes the decorators (see Figure 1, Line 1), which handle registration in an easily readable fashion.
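
The following simplified sketch (illustrative, not the actual implementation, which supports many more module types and scans all code files automatically) shows how decorator-based registration and filtering can work; compare the usage in Figure 1:

class RegisterDict(dict):
    # one dictionary per module type, e.g. Register.tasks or Register.methods
    def filter_match_all(self, **info):
        # keep only classes whose registered info matches all given values
        return RegisterDict({name: (c, i) for name, (c, i) in self.items()
                             if all(i.get(k) == v for k, v in info.items())})

class Register:
    tasks = RegisterDict()
    methods = RegisterDict()
    trainers = RegisterDict()

    @classmethod
    def task(cls, **info):
        # decorator factory, used as @Register.task(search=True) in Figure 1
        def decorator(c):
            cls.tasks[c.__name__] = (c, info)
            return c
        return decorator

    @classmethod
    def get(cls, name: str) -> type:
        # return the class code by name, wherever it is defined
        for group in (cls.tasks, cls.methods, cls.trainers):
            if name in group:
                return group[name][0]
        raise KeyError(name)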

3.3 Tree-based dependency structures

Figure 2: Part of a visualized SingleSearchTask configuration, which describes the training of a one-shot super-network with a specified search method (omitted for clarity; the complete tree is visualized in Figure 10). The white-colored tree nodes state the type and number of requested classes, the turquoise boxes the specific classes used. For example, the SingleSearchTask requires exactly one type of hardware device to be specified, but the SimpleTrainer accepts any number of callbacks or loggers.
 1  "cls_task": "SingleSearchTask",
 2  "{cls_task}.save_dir": "{path_tmp}/",
 3  "{cls_task}.seed": 0,
 4  "{cls_task}.is_test_run": true,
 5
 6  "cls_device": "CudaDevicesManager",
 7  "{cls_device}.num_devices": 1,
 8
 9  "cls_trainer": "SimpleTrainer",
10  "{cls_trainer}.max_epochs": 3,
11  "{cls_trainer}.ema_decay": 0.5,
12  "{cls_trainer}.ema_device": "cpu",
13
14  "cls_exp_loggers": "TensorBoardExpLogger",
15  "{cls_exp_loggers#0}.log_graph": false,
16
17  "cls_callbacks": "CheckpointCallback",
18  "{cls_callbacks#0}.top_n": 1,
19  "{cls_callbacks#0}.key": "train/loss",
20  "{cls_callbacks#0}.minimize_key": true,
Figure 3: Example content of the configuration text file (JSON format) for the tree in Figure 2. The first line in each text block specifies the used class(es), the other lines their detailed settings. For example, the SimpleTrainer is set to train for three epochs and to track an exponential moving average of the network weights on the CPU.

A SingleSearchTask requires exactly one hardware device and exactly one training loop (named trainer, to train an over-complete super-network), which in turn may use any number of callbacks and logging mechanisms. Their relationship is visualized in Figure 2.

Argument trees are extremely flexible since they allow every hierarchical one-to-any relationship imaginable. Multiple optional callbacks can be rearranged in their order and configured in detail. Moreover, module definitions can be reused in other constellations, including their requirements. The ProfilingTask does not need a training loop to measure the runtime of different network topologies on a hardware device, reducing the argument tree in size. While not implemented, a MultiSearchTask could use several trainers in parallel on several devices.

The hierarchical requirements are made available using so-called MetaArguments, as seen in Line 16 of Figure 1. They specify the local structure of argument trees by stating which other modules are required; stating the required module type and amount is sufficient. As seen in Line 14, filtering the modules is also possible to allow only a specific subset. This particular example defines the upper part of the tree visualized in Figure 2. The names of all MetaArguments start with ”cls_”, which improves readability and is reflected in the visualized argument tree (Figure 2, white-colored boxes).

3.4 Tree-based argument configurations

While it is possible to define such a dynamic structure, how can it be represented in a configuration file? Figure 3 presents an excerpt of the configuration that matches the tree in Figure 2. As stated in Lines 6 and 9 of the configuration, CudaDevicesManager and SimpleTrainer fill the roles of the requested modules of types ”device” and ”trainer”. Lines 14 and 17 list one class each of the types ”logger” and ”callback”, but could provide any number of comma-separated names. Together with the stated ”task” type in Line 1, these lines state strictly which code classes are used and, given the knowledge about their hierarchy, define the tree structure.

Additionally, every class has some arguments (hyper-parameters) that can be modified. In the visualized example, SingleSearchTask defines three such arguments (Lines 7 to 9 in Figure 1), which are represented in the configuration (Lines 2 to 4 in Figure 3). If the configuration is missing an argument, perhaps to keep it short, its default value is used. Another noteworthy mechanism in Line 2 is that ”{cls_task}.save_dir” references whichever class is currently set as ”cls_task” (Line 1), without naming it explicitly. Such wildcard references simplify automated changes to configuration files since, independently of the used task class, overwriting ”{cls_task}.save_dir” is always an acceptable way to change the save directory. A less general but perhaps more readable notation is ”SingleSearchTask.save_dir”, which is also accepted here.
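
A sketch of how such wildcard references could be resolved (illustrative only, not UniNAS’ actual parsing code):

import re

def resolve_wildcards(config: dict) -> dict:
    # replace "{cls_task}.save_dir" with "SingleSearchTask.save_dir",
    # depending on which class is currently set as "cls_task"
    resolved = {}
    for key, value in config.items():
        match = re.match(r'^\{(cls_\w+)(#\d+)?\}(\..+)$', key)
        if match:
            cls_key, index, suffix = match.groups()
            # the referenced entry may list several comma-separated classes
            names = [n.strip() for n in config[cls_key].split(',')]
            name = names[int(index[1:])] if index else names[0]
            resolved[name + suffix] = value
        else:
            resolved[key] = value
    return resolved

cfg = {'cls_task': 'SingleSearchTask', '{cls_task}.seed': 0}
assert resolve_wildcards(cfg) == {'cls_task': 'SingleSearchTask',
                                  'SingleSearchTask.seed': 0}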

A very interesting property of such dynamic configuration files is that they contain only the hyper-parameters (arguments) of the used code classes. Adding any additional arguments will result in an error since the configuration-parsing mechanism, described in Section 3.5, is then unable to piece the information together. Even though UniNAS implements several different optimizer classes, any such configuration only contains the hyper-parameters of those used. Generated configuration files are always complete (contain all available arguments), sparse (contain only the available arguments), and never ambiguous.

A debatable design decision of the current configuration files, as seen in Figure 3, is that they do not explicitly encode any hierarchy levels. Since that information is already known from their class implementations, the flat representation was chosen primarily for readability. It is also beneficial when arguments are manipulated, either automatically or from the terminal when starting a task. The disadvantage is that the argument names for class types can only be used once (”cls_device”, ”cls_trainer”, and more); an unambiguous assignment is otherwise not possible. For example, since the SingleSearchTask already owns ”cls_device”, no other class that could be used in the same argument tree can use that particular name. While this limitation is not too significant, it can be mildly confusing at times.

Finally, how is it possible to create configuration files? Since the dynamic tree-based approach offers a wide variety of possibilities, only a tiny subset is valid. For example, providing two hardware devices violates the defined tree structure of a SingleSearchTask and results in a parsing failure. If that happens, the user is provided with details of which particular arguments are missing or unexpected. While the best way to create correct configurations is surely experience and familiarity with the code base, the same could be said about any framework. Since UniNAS knows about all registered classes, which other (possibly specified) classes they use, and all of their arguments (including defaults, types, help string, and more), an exhaustive list can be generated automatically. However, resulting in almost 1600 lines of text, this solution is not optimal either. The most convenient approach is presented in Section 3.6: Creating and manipulating argument trees with a graphical user interface.

Input: the content of the configuration file
Require: all modules in the code are registered
# parse() is the recursive parsing function that builds the tree

function parse(node):                                 # e.g. a node for SingleSearchTask
    # first parse all arguments (hyper-parameters) of this tree node
    for (index, argument) in the arguments of node:   # e.g. (0, "save_dir")
        look up the configured value (or the default) and store it in node
    end for

    # then recursively parse all child classes, for each module type ...
    for meta_argument in the meta arguments of node:  # e.g. "cls_trainer"
        class_names = the classes configured for meta_argument
        assert the number of class_names is within the specified limits

        # ... for each module type, check all configured classes
        for (index, class_name) in class_names:       # e.g. (0, "SimpleTrainer")
            child = a new tree node for the registered class named class_name
            parse(child)
            attach child to node
        end for
    end for
    return node
end function

parse(root)  # recursively parse the tree, the task node is the entry point
assert every argument in the configuration has been parsed

Algorithm 1: Pseudo-code for building the argument tree, best understood together with Figures 2 and 3. For a consistent terminology of code classes and tree nodes: if the class of one node uses the class of another, the latter is, in that context, the child. Lines starting with # are comments.

3.5 Building the argument tree and code structure

The arguably most important function of a research code base is to run experiments. In order to do so, valid configuration files must be translated into their respective code structure. This comes with three major requirements:

  • Classes in the code that implement the desired functionality. As seen in Section 3.3 and Figure 1, each class also states the types, argument names, and numbers of additionally requested classes for the local tree structure.

  • A configuration that describes which code classes are used and which values their parameters take. This is described in Section 3.4 and visualized in Figure 3.

  • To connect the configuration content to classes in the code, it is required to reference code modules by their names. As described in Section 3.2 this can be achieved with a global register.

Algorithm 1 realizes the first step of this process: parsing the hierarchical code structure and its arguments from the flat configuration file. The result is a tree of ArgumentTreeNodes, each of which refers to exactly one class in the code, is connected to all related tree nodes, and knows all relevant hyper-parameter values. While the nodes do not yet contain actual class instances, this final step is no longer difficult.
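
The resulting tree nodes can be imagined roughly like the following sketch (the attribute names are assumptions; the real class offers considerably more functionality):

from dataclasses import dataclass, field

@dataclass
class ArgumentTreeNode:
    cls: type                                       # the registered class this node refers to
    arguments: dict = field(default_factory=dict)   # parsed hyper-parameter values
    children: dict = field(default_factory=dict)    # meta-argument name -> list of child nodes

    def instantiate(self):
        # recursively build the actual class structure:
        # children are created first and handed to the parent constructor
        built = {name: [child.instantiate() for child in nodes]
                 for name, nodes in self.children.items()}
        return self.cls(**self.arguments, **built)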

Figure 4: The graphical user interface (left) that can manipulate the configurations of argument trees (visualized right). Since many nodes are missing classes of some type (”cls_device”, …), their parts in the GUI are highlighted in red. The eight child nodes of DartsSearchMethod are omitted for visual clarity.

3.6 Creating and manipulating argument trees with a GUI

Manually writing a configuration file can be perplexing since one must keep track of tree specifications, argument names, available classes, and more. The graphical user interface (GUI) visualized in Figures 4 and 9 solves these problems to a large extent, by providing the following functionality:

  • Interactively adding and removing nodes in the argument tree, thus also in the configuration and class structure, and highlighting violations of the tree specification.

  • Setting the hyper-parameters of each node, using checkboxes (booleans), dropdown menus (choices from a selection), and text fields (other cases such as strings or numbers) where appropriate.

  • Saving and loading argument trees. Since it makes sense to separate the configurations for the training procedure and the network design to swap between different constellations easily, loading partial trees is also supported. Additional functions enable visualizing, resetting, and running the current argument tree.

  • A search function that highlights all matches, since the size of some argument trees can make finding specific arguments tedious.

In order to do so, the GUI manipulates ArgumentTreeNodes (Section 3.5), which can be easily converted into configuration files and code. As long as the required classes (for example, the data set) are already implemented, the GUI enables creating and changing experiments without ever touching any code or configuration files. While not among the original intentions, this property may be especially interesting for non-programmers who want to solve their problems quickly.

Still, the current version of the GUI is a proof of concept. It favors functionality over design, written with the plain Python Tkinter GUI framework and based on little previous GUI programming experience. Nonetheless, since the GUI (frontend) and the functions manipulating the argument tree (backend) are separated, a continued development with different frontend frameworks is entirely possible. The perhaps most interesting would be a web service that runs experiments on a server, remotely configurable from any web browser.

3.7 Using external code

There is a variety of reasons why it makes sense to include external code into a framework. Most importantly, the code either solves a standing problem or provides the users with additional options. Unlike newly written code, many popular libraries are also thoroughly optimized, reviewed, and empirically validated.

External code is also a perfect match for a framework based on argument trees. As shown in Figure 5, external classes of interest can be thinly wrapped to ensure compatibility, register the module, and specify all hyper-parameters for the argument tree. The integration is seamless so that finding out whether a module is locally written or external requires an inspection of its code. On the other hand, if importing the AdaBelief (Zhuang et al., 2020) code fails, the module will not be registered and therefore not be available in the graphical user interface. UniNAS fails to parse configurations that require unregistered modules but informs the user which external sources can be installed to extend its functionality.

Due to this logistic simplicity, several external frameworks extend the core of UniNAS. Some of the most important ones are:

  • pymoo (Blank and Deb, 2020), a library for multi-objective optimization methods.

  • Scikit-learn (Pedregosa et al., 2011), which implements many classical machine learning algorithms such as Support Vector Machines and Random Forests.

  • PyTorch Image Models (Wightman, 2019), which provides the code for several optimizers, network models, and data augmentation methods.

  • albumentations (Buslaev and Kalinin, 2018), a library for image augmentations.

from uninas.register import Register
from uninas.training.optimizers.abstract import WrappedOptimizer

try:
    from adabelief_pytorch import AdaBelief

    # if the import was successful,
    # register the wrapped optimizer
    @Register.optimizer()
    class AdaBeliefOptimizer(WrappedOptimizer):
        # wrap the original ...

except ImportError as e:
    # if the import failed,
    # inform the user that optional libraries are not installed
    Register.missing_import(e)
Figure 5: Excerpt of UniNAS wrapping the official AdaBelief optimizer code. The complete text has just 45 lines, half of which specify the optimizer parameters for the argument trees.

4 Dynamic network designs

As seen in the previous Sections, the unique design of UniNAS enables powerful customization of all components. In most cases, a significant portion of the architecture search configuration belongs to the network design. The FairNAS search example in Figure 10 contains 25 configured classes, of which 11 belong to the search network. While it would be easy to create a single configurable class for each network architecture of interest, that would ignore the advantages of argument trees. On the other hand, there are many technical difficulties with highly dynamic network topologies. Some of them are detailed below.

4.1 Decoupling components

In many published research codebases, network and architecture weights jointly exist in the network class. This design decision is disadvantageous for multiple reasons. Most importantly, changing the network or NAS method requires a lot of manual work. The reason is that different NAS methods need different amounts of architecture parameters, use them differently, and optimize them in different ways. For example:


  • DARTS (Liu et al., 2019) requires one weight vector per architecture choice. The weights are used to compute a weighted sum over all paths (candidate operations). They are updated via gradient descent, using an additional optimizer (ADAM).

  • MDENAS (Zheng et al., 2019) uses a similar vector for a weighted sample of a single candidate operation that is used in this particular forward pass. Global network performance feedback is used to increase or decrease the local weightings.

  • Single-Path One-Shot (Guo et al., 2020) does not use architecture weights at all. Paths are always sampled uniformly at random. The trained network serves as an accuracy prediction model and is used by a hyper-parameter optimization method.

  • FairNAS (Chu et al., 2019b) extends Single-Path One-Shot to make sure that all candidate operations are used frequently and equally often. It thus needs to track which paths are currently available.

Figure 6: The network and architecture weights are decoupled. Top: The structure of a fully sequential super-network. Every layer (cell) uses the same set of candidate operations and weight strategy. Bottom left: One set of candidate operations that is used multiple times in the network. This particular experiment uses the NAS-Bench-201 candidate operations. Bottom right: A weight strategy that manages everything related to the used NAS method, such as creating the architecture weights or which candidates are used in each forward pass.

The same is also true for the set of candidate operations, which affects the sizes of the architecture weights. Once the definitions of the search space, the candidate operations, and the NAS method (including the architecture weights) are mixed, changing any part is tedious. Therefore, strictly separating them is the best long-term approach. Similar to other frameworks presented in Section 2.1, architectures defined in UniNAS do not use an explicit set of candidate operations but allow a dynamic configuration. This is supported by a WeightStrategy interface, which handles all NAS-related operations such as creating and updating the architecture weights. The interaction between the architecture definition, the candidate operations, and the weight strategy is visualized in Figure 6.

The easy exchange of any component is not the only advantage of this design. Some NAS methods, such as DARTS, update network and architecture weights using different gradient descent optimizers. Correctly disentangling the weights is trivial if they are already organized in decoupled structures but hard otherwise. Another advantage is that standardizing functions to create and manage architecture weights makes it easy to present relevant information to the user, such as how many architecture weights exist, their sizes, and which are shared across different network cells. An example is presented in Figure 8.
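
A hedged sketch of such an interface (the real WeightStrategy API differs in its details), with the uniform sampling of Single-Path One-Shot as a concrete strategy:

from abc import ABC, abstractmethod
import torch

class AbstractWeightStrategy(ABC):
    @abstractmethod
    def make_weight(self, name: str, num_choices: int) -> None:
        # create the architecture weight(s) for one mixed operation
        raise NotImplementedError

    @abstractmethod
    def combine(self, name: str, x: torch.Tensor, candidates: list) -> torch.Tensor:
        # decide how the candidate operations process x in this forward pass
        raise NotImplementedError

class RandomChoiceStrategy(AbstractWeightStrategy):
    # SPOS-like: no learnable weights, sample one candidate uniformly at random
    def __init__(self):
        self.num_choices = {}

    def make_weight(self, name: str, num_choices: int) -> None:
        self.num_choices[name] = num_choices  # nothing to learn

    def combine(self, name: str, x: torch.Tensor, candidates: list) -> torch.Tensor:
        idx = int(torch.randint(len(candidates), (1,)).item())
        return candidates[idx](x)

A DARTS-like strategy would instead create one learnable weight vector per name in make_weight and return a softmax-weighted sum of all candidate outputs in combine.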

"cell_3": {
    "name": "SingleLayerCell",
    "kwargs": {
        "name": "cell_3",
        "features_mult": 1,
        "features_fixed": -1
    },
    "submodules": {
        "op": {
            "name": "MobileInvConvLayer",
            "kwargs": {
                "kernel_size": 3,
                "kernel_size_in": 1,
                "kernel_size_out": 1,
                "stride": 1,
                "expansion": 6.0,
                "padding": "same",
                "dilation": 1,
                "bn_affine": true,
                "act_fun": "relu6",
                "act_inplace": true,
                "att_dict": null,
                "fused": false
            }
        },
        ...

Figure 7: A high-level view of the MobileNet V2 architecture (Sandler et al., 2018) in the top left, and a schematic of the inverted bottleneck block in the bottom left. This design uses two 1×1 convolutions to change the channel count n by an expansion factor of 6, and a spatial 3×3 convolution in their middle. The text on the right-hand side represents the cell structure by referencing the modules by their names (”name”) and their keyword arguments (”kwargs”).

4.2 Saving, loading, and finalizing networks

As mentioned before, argument trees enable a detailed configuration of every aspect of an experiment, including the network topology itself. As visualized in Figure 10, such network definitions can become almost arbitrarily complex. This becomes disadvantageous once models have to be saved or loaded or when super-networks are finalized into discrete architectures. Unlike TensorFlow (Abadi et al., 2016), the used PyTorch (Paszke et al., 2019) library saves only the network weights without execution graphs. External projects like ONNX (Bai et al., 2019) can be used to export limited graph information but not to rebuild networks using the same code classes and context.

The implemented solution is inspired by the official ProxylessNAS code (Cai et al., 2019) (https://github.com/mit-han-lab/proxylessnas/tree/master/proxyless_nas), where every code module defines two functions that enable exporting and importing the entire module state and context. As typical for hierarchical structures, the state of an outer module contains the states of all modules within. An example is visualized in Figure 7, where one cell of the well-known MobileNet V2 architecture is represented as readable text. The global register can provide any class definition by name (see Section 3.2) so that an identical class structure can be created and parameterized accordingly.

The same approach that enables saving and loading arbitrary class compositions can also be used to change their structure. More specifically, an over-complete super-network containing all possible candidate operations can export only a specific subset of its configuration. The network recreated from this reduced configuration is the result of the architecture search. This is made possible since the weight strategy controls the use of all candidate operations, as visualized in Figure 6. Similarly, when their configuration is exported, the weight strategy controls which candidates should be part of the finalized network architecture. In another use case, some modules behave differently in super-networks and in finalized architectures. For example, Linear Transformers (Chu et al., 2019a) supplement skip connections with linear 1×1 convolutions in super-networks to stabilize training with variable network depths. When the network topology is finalized, it suffices to export the configuration of a plain skip connection instead.

Another practical way of rebuilding code structures is available through the argument tree configuration, which defines every detail of an experiment (see Section 3.4). Parsing the network design and loading the trained weights of a previous experiment requires no further user interaction than specifying its save directory. This specific way of recreating experiment environments is used extensively in Single-Path One-Shot tasks. In the first step, a super-network is trained to completion. Afterward, when the super-network is used to make predictions for a hyper-parameter optimization method (such as Bayesian optimization or evolutionary algorithms), the entire environment of its training can be recreated. This includes the network design, the data set, data augmentations, which parts were reserved for validation, regularization techniques, and more.

5 Discussion and Conclusions

We presented the underlying concepts of UniNAS, a PyTorch-based framework with the ambitious goal of unifying a variety of NAS algorithms in one codebase. Even though the use cases for this framework changed over time, mostly from DARTS-based to SPOS-based experiments, its underlying design approach made reusing old code possible at every step. However, several technical details could be changed or improved in hindsight. Most importantly, configuration files should reflect the hierarchy levels (see Section 3.4) for code simplicity and to avoid concerns about using module types multiple times. The current design favors readability, which is now a minor concern thanks to the graphical user interface. Other considered changes would improve the code readability but were not implemented due to a lack of necessity and time.

In summary, the design of UniNAS fulfills all original requirements. Modules can be arranged and combined in almost arbitrary constellations, giving the user an extremely flexible tool to design experiments. Furthermore, using the graphical user interface does not require writing even a single line of code. The resulting configuration files contain only the relevant information and thus do not grow cluttered with the many options the framework offers. These features also enable an almost arbitrary network design, combined with any NAS optimization method and any set of candidate operations. Despite that, networks can still be saved, loaded, and changed in various ways. Although not covered here, several unit tests ensure that the essential framework components keep working as intended.

Finally, what is the advantage of using argument trees over writing code with the same results? Compared to configuration files, code is more powerful and versatile but will likely suffer from the problems described in Section 2.2. Argument trees make any considerations about which parameters to expose unnecessary and can enforce the use of specific module types and subsets thereof. However, their strongest advantage is the visualization and manipulation of the entire experiment design with a graphical user interface. This aligns well with Automated Machine Learning (AutoML), which is also intended to make machine learning available to a broader audience.

References

  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2016) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. In 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp. 265–283. Note: Software available from tensorflow.org External Links: Link Cited by: §2.1, §4.2.
  • J. Bai, F. Lu, K. Zhang, et al. (2019) ONNX: Open Neural Network Exchange. GitHub. Note: https://github.com/onnx/onnx Cited by: §4.2.
  • G. Bender, H. Liu, B. Chen, G. Chu, S. Cheng, P. Kindermans, and Q. Le (2020) Can weight sharing outperform random architecture search? An investigation with TuNAS. GitHub. Note: https://github.com/google-research/google-research/tree/master/tunas Cited by: 4th item.
  • J. Blank and K. Deb (2020) pymoo: Multi-Objective Optimization in Python. IEEE Access 8, pp. 89497–89509. Cited by: 1st item.
  • A. Buslaev and A. A. Kalinin (2018) Albumentations: fast and flexible image augmentations. ArXiv e-prints. External Links: 1809.06839 Cited by: 4th item.
  • H. Cai, L. Zhu, and S. Han (2019) ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, External Links: Link Cited by: §4.2.
  • X. Chu, B. Zhang, J. Li, Q. Li, and R. Xu (2019a) ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search. CoRR abs/1908.06022. External Links: Link, 1908.06022 Cited by: §4.2.
  • X. Chu, B. Zhang, R. Xu, and J. Li (2019b) FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. CoRR abs/1907.01845. External Links: Link, 1907.01845 Cited by: 4th item.
  • Z. Guo, X. Zhang, H. Mu, W. Heng, Z. Liu, Y. Wei, and J. Sun (2020) Single Path One-Shot Neural Architecture Search with Uniform Sampling. In European Conference on Computer Vision, pp. 544–560. External Links: 1904.00420, Link Cited by: 3rd item.
  • Z. Jiajin (2020) Huawei Noah Vega. GitHub. Note: https://github.com/huawei-noah/vega/ Cited by: 3rd item.
  • H. Liu, K. Simonyan, and Y. Yang (2019) DARTS: Differentiable Architecture Search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, External Links: Link Cited by: 1st item.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. External Links: Link Cited by: §2.1, §3.1, §4.2.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: Machine learning in Python. Journal of machine learning research 12 (Oct), pp. 2825–2830. Cited by: 2nd item.
  • D. Peng, X. Dong, E. Real, M. Tan, Y. Lu, G. Bender, H. Liu, A. Kraft, C. Liang, and Q. Le (2020) PyGlove: symbolic programming for automated machine learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: 4th item.
  • M. Ruchte, A. Zela, J. Siems, J. Grabocka, and F. Hutter (2020) NASLib: a modular and flexible neural architecture search library. GitHub. Note: https://github.com/automl/NASLib Cited by: 1st item.
  • M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520. Cited by: Figure 7.
  • S. Shah, D. Dey, S. Majumdar, and D. Sagar (2020) Archai. GitHub. Note: https://github.com/microsoft/archai Cited by: 2nd item.
  • R. Wightman (2019) PyTorch image models. GitHub. Note: https://github.com/rwightman/pytorch-image-models External Links: Document Cited by: §2.2, 3rd item.
  • Q. Zhang (2019) NNI - Neural Network Intelligence. GitHub. Note: https://github.com/microsoft/nni Cited by: 2nd item.
  • X. Zheng, R. Ji, L. Tang, B. Zhang, J. Liu, and Q. Tian (2019) Multinomial Distribution Learning for Effective Neural Architecture Search. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 1304–1313. External Links: Link, Document Cited by: 2nd item.
  • J. Zhuang, T. Tang, Y. Ding, S. Tatikonda, N. Dvornek, X. Papademetris, and J. Duncan (2020) AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients. Conference on Neural Information Processing Systems. Cited by: §3.7.

Appendix A Additional resources

>---------------------------------------------- Args ----------------------------------------------<
 > cls_task                                  SingleSearchTask                                task
 > cls_device                                CudaDevicesManager                              device manager
 > cls_trainer                               SimpleTrainer                                   trainer
 > cls_method                                UniformRandomMethod                             method
 > cls_benchmark                                                                             immediately look up the search result in this benchmark set (optional)
 > cls_callbacks                             CheckpointCallback                              training callbacks
 > cls_clones                                                                                training clones
 > cls_exp_loggers                           TensorBoardExpLogger                            experiment logger
 > cls_data                                  Cifar10Data                                     data set
 > cls_network                               SearchUninasNetwork                             network
 > cls_criterion                             CrossEntropyCriterion                           criterion
 > cls_metrics                               AccuracyMetric                                  training metric
 > cls_initializers                                                                          weight initializer
 > cls_regularizers                          DropOutRegularizer                              regularizer
 > cls_optimizers                            SGDOptimizer                                    optimizer
 > cls_schedulers                            CosineScheduler                                 scheduler
 > cls_augmentations                         DartsCifarAug                                   data augmentation
 > cls_network_body                          StackedCellsNetworkBody                         network
 > cls_network_stem                          ConvStem                                        network stem
 > cls_network_heads                         Bench201Head                                    network heads
 > cls_network_cells                         Bench201CNNSearchCell, Bench201ReductionCell    network cells
 > cls_network_cells_primitives              Bench201Primitives, Bench201Primitives          network cells primitives
 > SingleSearchTask.is_test_run              True                                            test runs stop epochs early
 > SingleSearchTask.seed                     0                                               random seed for the experiment
 > SingleSearchTask.is_deterministic         False                                           use deterministic operations
 > SingleSearchTask.note                     s1 SPOS-like training                           just to take notes
 > SingleSearchTask.save_dir                 /tmp/demo/icw/train_supernet/                   where to save
 > SingleSearchTask.save_del_old             True                                            wipe the save dir before starting
 > CudaDevicesManager.num_devices            1                                               number of available devices
 > CudaDevicesManager.use_cudnn              True                                            try using cudnn
 > CudaDevicesManager.use_cudnn_benchmark    True                                            use cudnn benchmark
 > SimpleTrainer.max_epochs                  10                                              max training epochs, affects schedulers + regularizers
 > SimpleTrainer.stop_epoch                  -1                                              stop after training n epochs anyway, if > 0
 > SimpleTrainer.log_fs                      False                                           log file system usage
 > SimpleTrainer.log_ram                     False                                           log RAM usage
 > SimpleTrainer.log_device                  True                                            log device usage
 > SimpleTrainer.eval_last                   10                                              run eval for the last n epochs, always if <0
 > SimpleTrainer.test_last                   10                                              run test for the last n epochs, always if <0
 > SimpleTrainer.accumulate_batches          1                                               accumulate gradients over n batches before stepping updating. Does not change the learning rate, may cause issues when there are multiple alternating optimizers
...
 > StackedCellsNetworkBody.cell_order        n, n, r, n, n, r, n, n                          arrangement of cells
 > ConvStem.features                         16                                              num output features of this stem
...

>----------------------------------------- setting up... ------------------------------------------<
Data Set: splitting the training set, will use 5000 data points as validation set
Building StackedCellsNetworkBody:
    cell index          name    class              input shapes               output shapes          #params
                        -       ConvStem           Shape(3, 32, 32)           [Shape(16, 32, 32)]    464
    0                   n       Bench201CNNCell    [Shape(16, 32, 32)]        [Shape(16, 32, 32)]    18160
    1                   n       Bench201CNNCell    [Shape(16, 32, 32)]        [Shape(16, 32, 32)]    18160
    2                   r       SingleLayerCell    [Shape(16, 32, 32)]        [Shape(32, 16, 16)]    14464
    3                   n       Bench201CNNCell    [Shape(32, 16, 16)]        [Shape(32, 16, 16)]    67040
    4                   n       Bench201CNNCell    [Shape(32, 16, 16)]        [Shape(32, 16, 16)]    67040
    5                   r       SingleLayerCell    [Shape(32, 16, 16)]        [Shape(64, 8, 8)]      57600
    6                   n       Bench201CNNCell    [Shape(64, 8, 8)]          [Shape(64, 8, 8)]      256960
    7                   n       Bench201CNNCell    [Shape(64, 8, 8)]          [Shape(64, 8, 8)]      256960
                        -       Bench201Head       Shape(64, 8, 8)            Shape(10)              778
    complete network                               Shape(3, 32, 32)           [Shape(10)]            757626
Network built, it has 757626 parameters!
Using device: CudaDeviceMover([0])
Continuously logging (devices=CudaDeviceMover([0]), RAM=False, file_system=False) each 5s
...

>---------------------------------------- Weight strategy -----------------------------------------<
RandomChoiceStrategy("default", 6 architecture weights)
Weights:
   name                num choices    used
 > n/block-0/1/op-0    5              6x
 > n/block-1/2/op-0    5              6x
 > n/block-1/2/op-1    5              6x
 > n/block-2/3/op-0    5              6x
 > n/block-2/3/op-1    5              6x
 > n/block-2/3/op-2    5              6x
Figure 8: Excerpts of UniNAS’ text output. Top: The names, values, and help texts of all (meta-) arguments. The effect of the last two can be observed in the network structure. Center: Since the network code is well-defined, it is possible to generate an overview of layers, inputs, outputs, and parameters. Bottom: The weight strategy can present interesting information about the used architecture weights. There are five candidates in the chosen operation set (Bench201Primitives) and six cells ”n” that share their architecture weights.
Figure 9: Additional images for the graphical user interface (GUI). Left: Hovering the mouse cursor over any name brings up a tooltip, describing the comment in the code and in which file it is implemented. Pressing the Plus and Minus dropdown buttons on the right side enables adding and removing any appropriate classes in the tree structure. Right: By adding a search text (top), matches are highlighted in blue. The text ”as” can be present in argument names (”cls_task”, ”mask_indices”), module names (”SingleSearchTask”), or argument values (”save_dir” has ”…/uninas/…”).
Figure 10: The full argument tree to train a FairNAS-like super-network, extending Figure 2. The model is trained on ImageNet using the CrossEntropy loss, SGD, and a cosine learning rate schedule. The tracked accuracy is used for checkpointing, and dropout is enabled. The model has six stages with different cell definitions and numbers of channels (32, 40, 80, 96, 192, 320); the last cell definition retains the current input and output sizes.

@Register.network_mixed_op()
class MixedOp(SumParallelModules):
    def __init__(self, submodules: list, name, strategy_name):
        # store all arguments, thus including them in the config
        super().__init__(submodules)
        self._add_to_kwargs(name=name, strategy_name=strategy_name)

        # create the needed architecture weights
        self.sm = StrategyManager()  # singleton class
        self.ws = self.sm.make_weight(strategy_name, name, submodules)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # let the weight strategy decide how to forward inputs
        return self.ws.combine(self.name, x, self.submodules)

    def config(self, finalize=True, **_) -> dict:
        # describe this module, so that it can be rebuilt later
        if finalize:
            # only a subset of the candidates are requested,
            # ask the weight strategy which candidates are best
            indices = self.ws.get_finalized_indices(self.name)
            modules = [self.submodules[i] for i in indices]
            if len(modules) == 1:
                return modules[0].config(finalize, **_)
            return SumParallelModules(modules).config(finalize, **_)
        else:
            # the entire super-network is requested
            return super().config(finalize=finalize, **_)

    @classmethod
    def from_config(cls, **kwargs):
        # the rebuilding of owned code sub-modules is omitted;
        # the global register is used to create Modules by name
        submodules_ = ...
        submodule_lists_ = ...
        submodule_dicts_ = ...
        # rebuild this module with the exact same arguments as before
        return cls(**submodules_, **submodule_lists_, **submodule_dicts_, **kwargs)
Figure 11: Excerpt of the UniNAS MixedOp code, an operation that manages multiple candidate operations. They are stored (Lines 5 and 6) and used in a forward pass (Line 14). The methods starting in Lines 16 and 31 define how this MixedOp module is exported as a JSON description and later rebuilt from it. The from_config function belongs to a super-class that every UniNAS network module inherits from and does not need to be re-implemented in any new class; it is displayed only for completeness.