CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms

10/08/2018 · by Jinwoong Kim, et al. · NAVER Corp.

Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance. Here we propose a novel cloud-based HyperOpt (CHOPT) framework which can efficiently utilize shared computing resources while supporting various HyperOpt algorithms. We incorporate convenient web-based user interfaces, visualization, and analysis tools, enabling users to easily control optimization procedures and build up valuable insights with an iterative analysis procedure. Furthermore, our framework can be incorporated with any cloud platform, thus complementarily increasing the efficiency of conventional deep learning frameworks. We demonstrate applications of CHOPT with tasks such as image recognition and question-answering, showing that our framework can find hyperparameter configurations competitive with previous work. We also show that CHOPT is capable of providing interesting observations through its analysis tools.


1 Introduction

Deep neural networks (DNNs) have become an essential method for solving difficult tasks in computer vision, signal processing, and natural language processing He et al. (2016); Choi et al. (2018); Han et al. (2017); Van Den Oord et al. (2016); Seo et al. (2016); Vaswani et al. (2017). As the capabilities of deep learning have expanded with more modular architectures and advanced optimization methods, the number of hyperparameters has generally increased. This growth in the number of hyperparameters makes it more difficult for a researcher to optimize a model, wasting a lot of human effort and potentially leading to unfair comparisons. This reinforces the importance of efficient automated hyperparameter tuning methods and interfaces.

To address this problem, several hyperparameter optimization (HyperOpt) methods have been proposed Jaderberg et al. (2017); Falkner et al. (2018); Li et al. (2017). These methods have many advantages, such as strong final performance, parallelism, and early stopping, which significantly improve computing resource efficiency and optimization time. However, to use these methods, users must implement them in their own systems or code, and even when source code is provided, considerable time and effort are required for integration.

Recently, HyperOpt frameworks have been proposed to eliminate this integration cost Liaw et al. (2018); Tsirigotis et al. (2018); Golovin et al. (2017). These frameworks host hyperparameter optimization methods and provide visualization tools for analyzing the results of trained models. However, these frameworks do not consider computing resource management. Moreover, their visualization tools plot only a single view of all hyperparameter information and do not support interactive analysis of properties and trends among user-specified hyperparameters.

In this work, we propose an efficient HyperOpt framework called Cloud-based Hyperparameter OPTimization (CHOPT). CHOPT efficiently improves resource usage in flexible computing environments through a Stop-and-Go approach while supporting state-of-the-art hyperparameter optimization algorithms. Our CHOPT framework also provides a web-based analytic visual tool with which users can monitor the current tuning progress of models or compare results with previous ones. Moreover, users can fine-tune hyperparameters through the visualization tool.

The main contributions of this work are as follows:

  • We develop an automated HyperOpt framework that hosts hyperparameter optimization methods. We show that our framework can find hyperparameters competitive with results reported in previous papers.

  • We propose Stop-and-Go to address the early stopping problem and maximize utilization of shared-cluster resources, which we demonstrate in our evaluation.

  • We introduce an implemented analytic visual tool and show a step-by-step use case of hyperparameter fine-tuning with a real-world example.

2 Related Work

2.1 Hyperparameter Optimization

AutoML is a general concept covering diverse techniques for automated model learning, including automatic data preprocessing, architecture search, and model selection. HyperOpt is one of the core components of AutoML because hyperparameter tuning has a large influence on performance even for a fixed model structure, particularly in neural network-based models. We mainly focus on automated HyperOpt throughout this paper.

To this end, several hyperparameter optimization algorithms have been proposed Jaderberg et al. (2017); Falkner et al. (2018); Li et al. (2017). Population-Based Training (PBT) Jaderberg et al. (2017) is an asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. In practice, Hyperband Li et al. (2017) works very well and typically outperforms random search and Bayesian optimization methods operating on the full function evaluation budget for small to medium total budgets. However, its convergence to the global optimum is limited by its reliance on randomly-drawn configurations, and with large budgets its advantage over random search typically diminishes. BOHB Falkner et al. (2018) combines Hyperband and Bayesian optimization. Hyperband already satisfies most of the desiderata (in particular, strong anytime performance, scalability, robustness, and flexibility), and BOHB adds the desideratum of strong final performance.

Although these hyperparameter optimization methods outperform traditional methods in terms of resource utilization and optimization time, applying them to a new environment or codebase is costly. Moreover, it is hard to know which method performs best before applying all of them to the model. For this reason, there have been efforts to solve these problems through systematic improvements.

2.2 Hyperparameter Optimization Framework

Tune Liaw et al. (2018) is a scalable hyperparameter optimization framework for deep learning that provides many hyperparameter optimization methods. However, Tune requires user code modification to access intermediate training results, while our framework does not require any user code modification. Also, Tune provides a visual tool via a web interface, but users cannot interact with it for fine-tuning, analysis, or changing hyperparameter sets. Oríon Tsirigotis et al. (2018) is another hyperparameter optimization tool which mainly focuses on version control for experiments. It is designed to work with any language or framework since it has a very simple configuration interface; however, it does not manage computing resources such as GPUs to improve optimization time and resource efficiency. Google Vizier Golovin et al. (2017) is a Google-internal service. Although it provides many hyperparameter optimization methods, parallel execution, early stopping, and many other features, it is a closed-source project, so it is hard to apply to a user's code or system.

2.3 NSML

NSML Sung et al. (2017) is a platform designed to help machine learning developers and researchers. Although machine learning libraries such as TensorFlow Abadi et al. (2016) and PyTorch Paszke et al. (2017) make researchers' lives easier by simplifying model implementation, they still suffer from a non-trivial amount of manual tasks such as computing resource allocation, monitoring of training progress, and comparison of models with different hyperparameter settings.

In order to handle all these tasks so that researchers can focus on their models, NSML includes automatic GPU allocation, learning status visualization, and handling of model parameter snapshots, as well as comparison of performance metrics between models via a leaderboard.

We chose NSML as the cloud machine learning platform for implementing and evaluating the proposed CHOPT due to the following properties: 1) NSML has well-designed interfaces, including a command line interface (CLI) and a web interface, so it is easy to assign hyperparameters to a model and retrieve results from the model; 2) NSML enables us to easily manage cluster computing resources and effectively monitor the progress of model training.

3 CHOPT: Cloud-based Hyperparameter OPTimization

3.1 Design goals and Requirements

The main goals of designing our framework are as follows.

Efficient resource management. We need our framework to manage shared resources efficiently so that it balances them across CHOPT users and non-CHOPT users while maximizing resource utilization.

Minimum configuration and code modification. By hosting hyperparameter optimization methods in our system, users should be able to easily tune their models without implementation overheads. In addition, configurations should be simple yet flexible.

Web-based analytic visual tool. As hyperparameter optimization requires many training sessions, it is essential to have powerful visualization tools to analyze hundreds of training results.

3.2 System Architecture

Figure 1: System Architecture of CHOPT

In this section, we introduce our hyperparameter optimization system step by step; Figure 1 shows the architecture of the proposed CHOPT framework.

Users can run CHOPT sessions through either a command line interface (CLI) or a web interface. The CLI is light-weight and therefore suitable for simple tasks such as running and stopping sessions. The web interface is relatively heavy but provides many features, making it more suitable for analyzing and monitoring training results. To run a CHOPT session, the user submits code compatible with NSML and a configuration file containing details on the tuning method, hyperparameter sets, and more. While holding the submitted code, a CHOPT session is initialized with the configuration file and inserted into a queue, which stores CHOPT sessions before they run. When a CHOPT session manager (Agent) is available, it takes a CHOPT session from the queue and runs it.

3.2.1 Agent

An Agent is a module responsible for running CHOPT sessions as well as monitoring the progress of an AutoML session so that users can check and analyze the corresponding NSML training sessions with the given interfaces. Note that an NSML session corresponds to a single training model.

To manage and maximize computing resources, CHOPT sessions maintain three session pools: a live pool, a stop pool, and a dead pool. All running NSML sessions are in the live pool, which has resource limits set by the configuration file. For NSML sessions in the live pool, the agent periodically compares their performance and tunes them according to the configuration file. When an NSML session exits, it is moved to the stop pool or the dead pool. NSML sessions in the stop pool can be restarted, while NSML sessions in the dead pool are completely removed from the system, since AutoML systems commonly create a large number of models which can take up too much storage space. Users can specify the stop ratio, i.e., the fraction of exited NSML sessions that go to the stop pool; sessions in the stop pool can then be resumed when the system becomes under-utilized or recovered by the user.
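The following minimal Python sketch illustrates how such pool bookkeeping might look; the class and method names are our own illustration under the description above, not CHOPT's actual implementation.

import random

class SessionPools:
    """Illustrative sketch of the three session pools (names and structure are assumptions)."""

    def __init__(self, stop_ratio=0.5):
        self.live, self.stop, self.dead = [], [], []
        self.stop_ratio = stop_ratio  # fraction of exited sessions kept for possible revival

    def on_session_exit(self, session):
        # An exited NSML session either stays recoverable (stop pool) or is
        # removed entirely (dead pool) to save storage, according to the stop ratio.
        self.live.remove(session)
        if random.random() < self.stop_ratio:
            self.stop.append(session)
        else:
            self.dead.append(session)

    def resume_one(self):
        # A stopped session can be revived when the cluster becomes under-utilized.
        return self.stop.pop() if self.stop else None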

3.2.2 Master Agent

The master agent assigns a session to one of the agents according to the configuration file whenever a user submits a CHOPT session into the queue. The master agent is elected from among the agents, similar to ZooKeeper's leader election Hunt et al. (2010); if the master agent fails, any agent can become the next master agent. One of the most important roles of the master agent is shifting available computing resources such as GPUs between CHOPT users and non-CHOPT users; otherwise, users would face serious computing resource imbalances. We describe the details in Section 3.3.

3.3 Stop-and-Go

Automated optimization methods and frameworks often consume a lot of computing resources because many models are trained in parallel. If not carefully managed, this leads to severe load imbalance across users in shared cluster environments. To avoid this problem, our framework fairly distributes available computing resources such as GPUs between CHOPT users and non-CHOPT users while maximizing resource utilization. Here, we introduce a new feature called Stop-and-Go. The key idea is that the master agent controls available resources among NSML and CHOPT sessions according to the cluster's condition. By doing so, we can maximize the utilization of the shared cluster environment as well as mitigate the problem of early stopping in CHOPT environments.

3.3.1 Efficient Resource Management

Whenever the cluster is under-utilized, the master agent assigns more resources (GPUs) to CHOPT sessions so that they can finish hyperparameter optimization more quickly. On the other hand, if the cluster is over-utilized, the master agent takes GPUs away from CHOPT sessions so that other non-CHOPT users can train their models on the shared cluster. This feature maximizes the utilization of computing resources and allows models to be trained efficiently.
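As a rough sketch of this decision rule (the utilization thresholds and step size below are assumptions for illustration, not values from the paper):

def rebalance_gpus(cluster_utilization, chopt_gpus, total_gpus,
                   low=0.6, high=0.9, step=1):
    # Under-utilized cluster: lend an idle GPU to CHOPT sessions.
    if cluster_utilization < low and chopt_gpus + step <= total_gpus:
        return chopt_gpus + step
    # Over-utilized cluster: reclaim a GPU for non-CHOPT users.
    if cluster_utilization > high and chopt_gpus - step >= 0:
        return chopt_gpus - step
    return chopt_gpus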

3.3.2 Resolving Early Stopping Problem

Many hyperparameter optimization methods and frameworks have a feature called early stopping Liaw et al. (2018); Golovin et al. (2017); Jaderberg et al. (2017); Li et al. (2017), which terminates unpromising model training early rather than letting it continue. This feature can significantly save computing resources and also enables researchers to train more models in the same amount of time.

However, early stopping can be harmful since models with high final performance are not guaranteed to train quickly. Figure 2 presents a history of model search by our framework with a step size of 7 epochs, where the step size refers to the checking interval for early stopping. After a few steps, CHOPT with early stopping only searches a space with shallow depth, because shallow models perform better than others in the early stage of training. This may cause premature convergence of CHOPT, ending up with poor models. More importantly, this is difficult to avoid because a small number of epochs is the natural first choice of step size for a CHOPT beginner.

The proposed Stop-and-Go method can help users avoid this pitfall. In our framework, when the master agent takes GPUs from a CHOPT session, the session randomly splits running NSML sessions between the stop pool and the dead pool. If the CHOPT session is later allocated more GPUs, it attempts to resume NSML sessions from the stop pool instead of creating new sessions; new sessions are created only when the stop pool is empty. By reviving early-stopped sessions which still have potential, we can avoid the problem of early stopping.
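Building on the pool sketch from Section 3.2.1, the shrink/grow behavior could look roughly like the following; again an illustrative sketch, where the 50/50 split probability and the spawn_new callable are assumptions.

import random

def shrink(pools, n_sessions_to_stop):
    # When GPUs are taken away, running sessions are split at random
    # between the stop pool (recoverable) and the dead pool (removed).
    for s in random.sample(pools.live, n_sessions_to_stop):
        pools.live.remove(s)
        (pools.stop if random.random() < 0.5 else pools.dead).append(s)

def grow(pools, n_gpus_granted, spawn_new):
    # When GPUs are granted, early-stopped sessions are revived first;
    # new NSML sessions are created only once the stop pool is empty.
    for _ in range(n_gpus_granted):
        pools.live.append(pools.stop.pop() if pools.stop else spawn_new())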

Figure 2: Hyperparameter Optimization with Early Stopping. Colors indicate the depth of the model as a hyperparameter to showcase the negative effects of early stopping.

3.4 Configuration

config = {
    'h_params': {
        'lr': {'parameters': [0.01, 0.09], 'distribution': 'log_uniform',
               'type': 'float', 'p_range': [0.001, 0.1]},
        'depth': {'parameters': [5, 10], 'distribution': 'uniform', 'type': 'int',
                  'p_range': [5, 10]},
        'activation': {'parameters': ['relu', 'sigmoid'], 'distribution': 'categorical',
                       'type': 'str', 'p_range': []}
    },
    'h_params_conditions': [],
    'h_params_conjunctions': [],
    'measure': 'test/accuracy',
    'order': 'descending',
    'step': 5,
    'population': 5,
    'tune': {'pbt': {'exploit': 'truncation', 'explore': 'perturb'}},
    'termination': {'max_session_number': 50}
}
Listing 1: Example of configuration

Many hyperparameter optimization frameworks require users to modify their code. However, it takes time to understand and perform these modifications. In contrast with other frameworks, CHOPT does not require any code modification and needs only minimal configuration to tune models. In this section, we introduce the typical form of our framework's configuration file, which uses a dictionary-based structure. We provide many example configuration files in our system for user convenience.

3.4.1 Hyperparameter Space Definition

Defining the hyperparameter space is one of the most important steps in automated machine learning. The hyperparameter space should be large enough to cover many points, yet small enough to be searched efficiently. We chose a Python dictionary format for hyperparameters because Python is one of the most popular programming languages among ML researchers. The code in Listing 1 shows an example of an interface for the hyperparameter space, which has four components: parameters, distribution, type, and range. The framework supports various distributions other than uniform, such as Gaussian distributions. With the simple Python-dictionary-style configuration, users can easily add or remove any hyperparameters, and also compare two CHOPT sessions with different hyperparameter configurations via the web interface. As deep learning models become more complex to achieve better performance, they often require hierarchical hyperparameters. For instance, if a model uses stochastic gradient descent Ruder (2016) as an optimizer, training would require momentum and learning rate as hyperparameters. To fulfill these requirements, CHOPT also supports hierarchical hyperparameter spaces.
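As an illustration only (the exact nesting and condition syntax below is our assumption and does not appear in the paper), a conditional hyperparameter could be expressed along these lines, following the style of Listing 1:

# Hypothetical sketch: 'momentum' is only meaningful when 'optimizer' is 'sgd'.
hierarchical_config = {
    'h_params': {
        'optimizer': {'parameters': ['sgd', 'adam'], 'distribution': 'categorical',
                      'type': 'str', 'p_range': []},
        'lr': {'parameters': [0.01, 0.09], 'distribution': 'log_uniform',
               'type': 'float', 'p_range': [0.001, 0.1]},
        'momentum': {'parameters': [0.5, 0.99], 'distribution': 'uniform',
                     'type': 'float', 'p_range': [0.5, 0.99]},
    },
    # Assumed condition format tying a child hyperparameter to its parent's value.
    'h_params_conditions': [{'child': 'momentum', 'parent': 'optimizer', 'equals': 'sgd'}],
}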

3.4.2 CHOPT Session Configuration

As shown in Listing 1, there are a few configurations for CHOPT other than defining the hyperparameter space. It is necessary to define the goal of CHOPT so that it can compare many models and find the best-performing one. Users define the goal of a task by specifying measure and order. For example, we can select either top-1 accuracy or loss on the validation set as the measure, and set descending or ascending as the order, respectively, depending on the objective.

As mentioned in Section 3.3, our framework supports early stopping. Setting the checking interval for stopping is important because it controls the performance of early stopping. Users configure this interval by setting step. Note that if step is -1, the CHOPT session disables early stopping.

Every iterative optimization process needs a termination condition to avoid running indefinitely. We support three options for termination: time, max_session_number, and performance_threshold. When multiple conditions are used, our framework stops as soon as the first condition is reached.
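A hedged sketch of a termination block using these three options (the value formats, e.g. the time string, are assumptions):

termination = {
    'time': '24:00:00',             # wall-clock budget (format assumed)
    'max_session_number': 200,      # stop after 200 trained models
    'performance_threshold': 0.80,  # stop once the chosen measure reaches this value
}
# With several conditions set, the session stops at whichever is reached first.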

The CHOPT algorithm is selected via tune. In this paper, we support three algorithms: random search with or without early stopping, population based training Jaderberg et al. (2017), and Hyperband Li et al. (2017). Each algorithm requires a different set of parameters. For instance, population based training requires an exploit function to compare models and an explore function to search new hyperparameter space, while Hyperband requires a resource limit.
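For example, the PBT form below follows Listing 1, while the Hyperband key names are assumptions made for illustration only:

tune_pbt = {'pbt': {'exploit': 'truncation', 'explore': 'perturb'}}   # as in Listing 1
tune_hyperband = {'hyperband': {'max_resource': 300, 'eta': 3}}       # key names assumed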

3.5 Web-Based Analytic Environment

Although a hyperparameter optimization framework enables users to train hundreds or thousands of models in parallel, the results can be useless if they are difficult to analyze. CHOPT provides powerful visualization tools that allow users to monitor and analyze the results to gain constructive insights. In this section, we describe our visual tool's design goals, implementation, and features.

3.5.1 Design Goals

  • Scalable representation: illustrate tuning results in a scalable way without loss of information, even as users add more and more hyperparameters.

  • Compatible environment: integrate multiple results even when they have different configurations, such as the number of hyperparameter sets, tuning algorithm, and others.

  • Fine-tuning method: provide an interface to easily re-configure the hyperparameter spaces for searching unexplored ranges or shrinking existing ones.

  • Expandable hyperparameter dimension: allow users to add a new hyperparameter to be tuned.

  • Model analysis: serve many features for analyzing models.

3.5.2 Scalable Hyperparameter Configuration Visualization

For scalable representation, we use parallel coordinates visualization Inselberg & Dimsdale (1987); Heinrich & Weiskopf (2013). Parallel coordinates visualization has an advantage in representing high-dimensional or multivariate data in a 2D plane by arranging each dimension in parallel. Additionally, the visualization can show the overall distribution of the dataset along each dimension. Other HyperOpt visualization systems have also used parallel coordinates to represent hyperparameter tuning results Golovin et al. (2017); Liaw et al. (2018); Tsirigotis et al. (2018).

In the visualization shown in Figure 3, each line represents a machine learning model (an NSML session) created by the CHOPT session, and each axis represents a hyperparameter such as learning rate, activation, or dropout rate. By arranging the hyperparameters in parallel, the visualization can scale with the number of hyperparameters and present a large number of hyperparameter configurations. We expect this visualization to help users analyze the relationships between hyperparameters as well as the results themselves.

Figure 3:

An example of hyperparameter tuning results shown on the parallel coordinates visualization for a CIFAR-100 Random Erasing model. Each axis represents a hyperparameter (except the rightmost axis, which shows the evaluation metric), and each line represents a trained model's hyperparameter configuration.
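To illustrate the underlying technique (this is not CHOPT's web tool, just a stand-alone sketch with toy data), a few lines of pandas/matplotlib can render the same kind of view from a table of trial results:

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Toy results: one row per trained model, one column per hyperparameter.
results = pd.DataFrame({
    'lr':       [0.001, 0.01, 0.03, 0.08],
    'momentum': [0.90, 0.95, 0.93, 0.99],
    'dropout':  [0.1, 0.3, 0.5, 0.2],
    'accuracy': [0.62, 0.71, 0.69, 0.66],
})
# Bin accuracy so that lines can be colored by performance.
results['bucket'] = pd.qcut(results['accuracy'], 2, labels=['low', 'high'])

parallel_coordinates(results.drop(columns='accuracy'), 'bucket', colormap='coolwarm')
plt.title('Hyperparameter configurations (toy data)')
plt.show()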

For the rest of the design goals, we implement several features, which are described in the following section.

3.5.3 Visualization Feature List

The CHOPT visual tool provides many useful features. Users can monitor and analyze their models by interacting with the visual tools. We list selected features as follows.

  • Model Selection and View

    • Masking top-K sessions: users can set thresholds and select the models with the highest performance at once (top of Figure 4).

    • Multiple range selection: users can select models within specific hyperparameter ranges (bottom of Figure 4).

    • Highlighting axes: users can highlight lines with specific conditions to find relations between hyperparameters and model performance.

    Figure 4: An example of how to select models of interest on the parallel coordinates. Users can select the top-K performing models with one click and select models within certain ranges by dragging over the ranges on each axis.
  • Axis Control

    • Filtering axes: if there are too many hyperparameters to analyze, users can specify the hyperparameters of interest by setting the visibility of each axis.

    • Controlling range scale: if there are too many lines to see the details on the parallel coordinates, users can easily adjust the scale of each axis by selecting the ranges of interest on the density plot.

  • Exhaustive Analysis

    • Merging or switching sessions of interest: users can load and view multiple AutoML sessions, as well as merge them and view the results simultaneously.

    • Scalar plot view: users can analyze the details of selected models, such as loss values per step or wall-clock time.

    • Model summary view: users can easily see the precise hyperparameter values and the performance of selected models.

    Figure 5: An example of further analytic tools. The learning duration view is a bar plot of the models' learning duration; the hyperparameter set clustered view is a t-SNE plot giving a structural overview of created models; the hierarchical view is a node-link diagram showing parent-child relations.
  • Further Analytic Tools

    • Rerun interface: users can change the configuration of a CHOPT session and submit a new one by selecting different hyperparameter ranges or adding hyperparameters on the parallel coordinates visualization.

    • Parameter analytic view: users can select a visualization type and parameters to see the distribution of a parameter of interest or the relation between two parameters via histogram or scatter plot.

    • Session overview: users can see a summarized overview of hyperparameter configurations by selecting from learning duration, hierarchical, and clustered views.

Figure 6: Basic usage flow of hyperparameter tuning with CHOPT visualization

3.5.4 Fine Tuning with Analytic Visual Tool

When optimizing hyperparameters, a typical user would start with a small set of hyperparameters and incrementally expand the set while optimizing. This fine-tuning process can be done with the following steps, which are also illustrated in Figure 6.

  1. Run a CHOPT session with initial hyperparameter sets: users run a CHOPT session with an initial configuration which contains the hyperparameter sets to be tuned, the ranges to be explored, the tuning method, and so on.

  2. Analyze hyperparameter tuning results: users analyze the CHOPT results with a single session or multiple sessions, depending on whether the session is the initial one or not. We expect users to obtain insight into which values (for numerical) or types (for categorical) of hyperparameters affect the performance of the model at this step.

  3. (Optional) Rerun with unexplored or narrowed ranges of hyperparameter sets: users can select unexplored or narrowed ranges of the hyperparameter sets so that the next CHOPT session explores those ranges based on the previous results (see the configuration sketch after this list). For unexplored ranges, users may want to see the effect of other ranges on the performance of their models; for narrowed ranges, they may want to find more precise values affecting the performance of the models. Users can also change other configurations for the next CHOPT session, such as the tuning algorithm, step size for early stopping, and population size, according to what they aim at in this step.

  4. (Optional) Append a new hyperparameter to be tuned: if users obtain an optimal combination of a given hyperparameter set through a single CHOPT session or multiple sessions, they can append a new hyperparameter to be tuned which was a constant value in the previous sessions.

  5. Get the desired models with fine-tuned hyperparameter sets.
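A hedged sketch of the re-configuration in steps 3 and 4 (the ranges below are illustrative values, not results from the paper): starting from Listing 1, 'lr' is narrowed to where the top models clustered, and 'momentum' is appended as a new hyperparameter to tune.

refined_h_params = {
    'lr':       {'parameters': [0.03, 0.09], 'distribution': 'log_uniform',
                 'type': 'float', 'p_range': [0.03, 0.09]},    # narrowed from [0.001, 0.1]
    'momentum': {'parameters': [0.1, 0.999], 'distribution': 'uniform',
                 'type': 'float', 'p_range': [0.1, 0.999]},    # newly appended hyperparameter
}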

4 Practical use cases

In this section, we describe how CHOPT visualization works with a real-world example. Figure 7 shows an overview of CHOPT visualization with six CHOPT sessions on the CIFAR-100 dataset.

Figure 7: An overview of the CHOPT visualization system: six sessions with different hyperparameter configurations are represented on the parallel coordinates, and the three top-performing models are shown in detail. The tooltips on the parallel coordinates are activated by hovering the mouse over the 'train/loss' line plot in the middle of the figure.

Following the usage flow described in Section 3.5.4, we fine-tuned six hyperparameters step by step. First, we ran an initial CHOPT session (purple in Figure 7) to tune 'learning rate' with the other hyperparameters held fixed.

For the next step, we selected the top ten models with our masking method (Figure 4) and obtained the optimal ranges associated with the selected models. With the optimized range of 'learning rate', we ran a second CHOPT session with the additional hyperparameter 'momentum' to be tuned (red), so that the CHOPT session explored the space defined by the optimal range of 'learning rate' and the initial range of 'momentum'.

Subsequently, we obtained the optimized ranges that performed well and ran sessions with additional hyperparameters, as in the earlier steps. In this way, we gradually added other hyperparameters one by one and fine-tuned each optimized range.

However, in the fifth step (yellow), we added the 'depth' hyperparameter and obtained only slightly improved performance (70.54%), and we found that the experiment was strongly biased by early stopping because most models with depths greater than twenty were dropped very quickly. Therefore, we reran the CHOPT session (blue) without early stopping and obtained better results, including the highest-performing model with 79.37% accuracy, shown at the top-right side of the parallel coordinates in Figure 7. As mentioned in Section 3.3.2, early stopping does not guarantee better results than training without early stopping.

Table 1 shows how the configuration (hyperparameter spaces) changed as the fine-tuning progressed. A constant value means there was no exploration by CHOPT for that hyperparameter. Even though the CHOPT sessions have different configurations of hyperparameter spaces to be tuned, our visualization allows users to integrate the sessions by setting the constant values. For example, the 'depth' hyperparameter was a constant value of 20 in all sessions (1st-4th in the table) except for the yellow and blue sessions (5th-6th in the table), as shown in Figure 7.

no. Top Acc. early stopped learning rate momentum prob sh depth
1st 69.62 True 0.001 - 0.2 0.9 0.0 0.4 20
2nd 69.78 True 0.0334 - 0.0868 0.1 - 0.999 0.0 0.4 20
3rd 70.4 True 0.0334 - 0.0868 0.9 - 0.9381 0.0 - 0.9 0.4 20
4th 70.36 True 0.0334 - 0.0868 0.9 - 0.9381 0.2378 - 0.579 0.2 - 0.9 20
5th 70.54 True 0.0266 - 0.0399 0.9 - 0.9381 0.2111 - 0.380 0.2237 - 0.3496 [20, 92, 110, 122, 134, 140]
6th 79.37 False 0.0266 - 0.0399 0.9 - 0.9381 0.2111 - 0.380 0.2237 - 0.3496 [20, 92, 110, 122, 134, 140]
Table 1: Fine tuning results and configurations for each session in Figure 7

Meanwhile, the right side of the figure shows further analysis of the AutoML results and a summary table of the selected models. The scatter plot between the 'prob' hyperparameter and the 'test/accuracy' metric is visualized (top right of the figure), and it can be seen that 'sh' values in the range of 0.22 to 0.35 are associated with high performance.

The horizontal bar plot (middle right) shows the learning duration of each model. The x-axis represents the last learning step of each model. In practice, this plot can help users find biased experiments: we found that the fifth session never explored a depth of 140 due to early stopping. Figure 2 shows the difference between the fifth and sixth sessions, which have the same hyperparameter spaces but differ in their use of early stopping. The fifth session was biased toward a depth of 20 and never explored a depth of 140, whereas the sixth session was not biased.

The selected model summary table (bottom right) lets users see at a glance the precise values of the selected models, such as performance and hyperparameter configuration.

5 Evaluation

In order to evaluate the performance of our system, we conduct experiments on two different tasks: image classification and question-answering. For these experiments, we use the CIFAR-100 dataset Krizhevsky & Hinton (2009) and the SQuAD 1.1 dataset Rajpurkar et al. (2016), respectively. Both evaluations show that our proposed framework finds configurations that outperform the results reported in the original papers.

5.1 Image Classification and Question-Answering

Image classification and question-answering are among the most challenging and most popular tasks in the deep learning community He et al. (2016); Seo et al. (2016). There were huge breakthroughs in convolutional neural networks (CNNs) for image classification after residual networks He et al. (2016), and in attention architectures for question-answering using the SQuAD dataset. We test our CHOPT framework on various CNN structures for the image classification task with CIFAR-100. We choose CIFAR-100 because it is large enough to compare the performance of deeper models, unlike MNIST LeCun (1998) or CIFAR-10, and at the same time small enough to train many models in a short time, unlike ImageNet Deng et al. (2009). We examine our framework with the residual network (ResNet) and wide residual network (WRN) Zagoruyko & Komodakis (2016). In addition, we test a regularization method, specifically data augmentation by Random Erasing (RE) Zhong et al. (2017), with ResNet and WRN to show that our framework works on high-dimensional search spaces. CHOPT is also evaluated on SQuAD 1.1 for question-answering, where we use BiDAF Seo et al. (2016). In these experiments, we use random search with early stopping, population based training (PBT), and Hyperband, and report the best result among these methods.


Tasks Models References CHOPT
IC ResNet 76.27 77.75
IC WRN 81.51 81.66
IC ResNet with RE 77.9 79.45
IC WRN with RE 82.27 83.1
QA BiDAF 77.3 77.93
Table 2: Best top-1 accuracy (%) obtained with CHOPT. Image classification (IC) and question-answering (QA) tasks were performed with various models. Bold indicates higher performance. ResNet He et al. (2016), WRN Zagoruyko & Komodakis (2016), RE Zhong et al. (2017)

Table 2 compares the results of the models reported in the references with the best models found by our CHOPT framework. We use top-1 accuracy as the measure, and in most cases, CHOPT succeeds in finding better-performing models than the references.

However, it may seem unfair to claim that CHOPT is better than the human-tuned models from the references because of different model parameter sizes. For instance, CHOPT's best model for WRN with RE in Table 2 has 172.07M parameters, while the best result from the reference uses only 36.54M. We therefore ran one more experiment on WRN with RE while limiting the maximum number of parameters.


Top-1 # of parameters
Baseline 82.27% 36.54M
CHOPT w/ constraint 82.41% 36.54M
CHOPT w/o constraint 83.1% 172.07M
Table 3: Best model with parameter limit

Table 3 shows the result of this experiment. The best model found by CHOPT performs slightly better than, or at least on par with, the human-tuned model even when limited to the same model size. Specifically, both models share the same architecture of depth 28 and widen factor 10, but they have different hyperparameters for regularization and optimization.

5.2 Stop-and-Go

Our early stopping method, Stop-and-Go, is one of the key features of our framework. In the best case, early stopping helps users save both resources and time while still finding the best model. However, naive early stopping can fail and drop potentially good training sessions. In this section, we show how early stopping affects performance and resource efficiency in CHOPT. We chose ResNet with RE as the target architecture for image classification on CIFAR-100, with a termination condition of 200 models. Each model is configured to run for up to 300 epochs. We used population based training (PBT) for all experiments with early stopping, and random search for the experiment without early stopping.


GPU time Top-1
Without early stopping 60+ days 79.75%
Large step size (25 epochs) 22 days 79.45%
Small step size (3 epochs) 2 days 77.42%
Table 4: GPU time and performance by step size

Both resource efficiency and performance by step size are presented in Table 4. We report GPU time as the time that would be taken if only one GPU were used for the entire CHOPT pipeline. Without early stopping, CHOPT generates the best model of all settings, including the human-tuned model (77.9%), but takes more than two months of GPU time. When we enable early stopping with a small step size, it takes only 2 days of GPU time but cannot find a good enough model. However, with the right step size, as in the second row of the table, we can more than double GPU efficiency while obtaining performance similar to that achieved without early stopping.

Figure 8: Adaptive available GPUs control between NSML and CHOPT sessions
Figure 9: Fully trained result of a revived early stopped model. Top: Hyperparameter configuration of early stopped model. Bottom: Fully trained result of early stopped model

As another advantage, the Stop-and-Go feature allows us to achieve the same performance in a shorter time by assigning idle GPUs to CHOPT sessions. Figure 8 shows how the proposed Stop-and-Go method improves resource utilization. We divide the time period into multiple zones, A, B, C, D, and E, for better explanation. In zone A, no CHOPT session is running, so the green and yellow lines overlap. Then, several CHOPT sessions start hyperparameter optimization in zone B, and the green line rises slightly. During zone C, the master agent assigns idle GPUs to CHOPT sessions to maximize cluster utilization and speed up tuning, since the cluster is under-utilized. In zone D, other users suddenly request GPUs for their models (the yellow line rises), so the master agent takes GPUs back from the CHOPT sessions; the CHOPT allocation briefly exceeds its maximum GPU count, but not by much. In the last part of zone D and in zone E, the CHOPT sessions are nearly finished optimizing hyperparameters, so the green line goes down and finally overlaps with the yellow line again.

By reviving stopped sessions, the Stop-and-Go method sometimes improves the performance of CHOPT. Figure 9 presents one example of a successful Stop-and-Go case. As shown at the top of Figure 9, the model was stopped by CHOPT at an early stage because of its poor performance. However, after it was revived by Stop-and-Go, the model ended up with 76.61% top-1 accuracy. Although it was not the best model CHOPT found (77.42%), the result is competitive, so Stop-and-Go can save valuable hyperparameter configurations that would otherwise be lost.

6 Conclusion

Although several HyperOpt methods have been proposed, many researchers still struggle to optimize hyperparameters due to difficult usage and inconvenient interfaces. In this paper, we propose a novel cloud-based HyperOpt (CHOPT) framework that efficiently handles flexible computing resources while supporting various HyperOpt algorithms. In addition, we propose Stop-and-Go to address the early stopping problem while maximizing the utilization of shared-cluster resources. We presented our analytic visual tool and showed our system's capabilities with a practical use case of hyperparameter fine-tuning. Finally, we demonstrated the performance of CHOPT on two real-world tasks, image classification and question-answering. The results show that our framework can find hyperparameter configurations competitive with previously reported results.

Although Stop-and-Go is an effective method, it is still hard to pick which session to revive among all early-stopped models. For future work, we would like to find more effective policies than random selection. In addition, we would like to extend our AutoML method by adding more criteria to our framework. Currently, CHOPT is used only for selecting the best model, but we would also like to find models that have fewer parameters, are resource efficient, and train fast.

References