The global express delivery industry has become a trillion-scale market that serves people’s daily lives around the world. In 2017, the industry revenue was 248 billion USD (IBISWorld, 2018), and in China in particular the annual gross express volume has surpassed 30 billion USD since 2016 (Fan et al., 2017). Over the past two years, a new type of shipping warehouse has emerged, in which intelligent robots sort thousands of parcels per hour (People’s Daily, 2017). As shown in Figure 0(a) and 0(b), autonomous robots carry parcels across the warehouse and unload them into target holes, each of which connects to a vehicle heading to a particular destination. The layout of the warehouse, i.e. the matching between holes and destinations, is usually designed by human experts. This manual design is challenging and likely to be suboptimal, especially when the number of holes is large, as in Figure 0(b). Moreover, the demand for such layout design is not one-off: the distribution of parcel destinations is not fixed, so the layout should adapt to it to achieve the best performance.
In this paper, we present an evolution-based method for automatically designing warehouse layouts. To tackle the efficiency issue arising from the time-consuming evaluation of each candidate layout, we train a neural network to predict the outcome of a layout without actually running agents in it, which is known as fitness approximation in the context of evolution (Jin, 2005). We further propose a novel two-layer population structure that incorporates the prediction model into the evolution framework to improve efficiency; it can be categorised as a multiple-deme parallel genetic algorithm (Cantú-Paz, 1998). In particular, the higher layer consists of layouts that are actually evaluated and occupies a small fraction of the whole population, while the lower layer contains layouts whose fitness is predicted by the learned model. Compared to existing methods for combining fitness approximation with evolution (de Jong, Thierens, and Watson, 2004; Hong, Lee, and Tahk, 2003), the proposed two-layer evolutionary algorithm explicitly manages evaluated and predicted individuals in two separate sub-populations and trains the approximation model online using the samples evaluated by the original fitness function. As such, the proposed method incorporates fitness approximation into the multiple-deme parallel genetic algorithm naturally. Moreover, an evaluation of a warehouse layout yields not only the final outcome but also agent trajectories, which carry hidden information about the causes of that outcome. To exploit this additional information and improve the quality of the prediction model, we construct an auxiliary objective: predicting the heatmap of the environment, in which each value is the total number of visits to a point.
Our experiments on warehouse layout design demonstrate improved efficiency and better performance compared to both manual design and vanilla evolution-based methods without fitness approximation. Such a two-layer evolution-based environment optimization framework is promising for a variety of environment design tasks.
2. Related Work
Many real-world scenarios can be regarded as environment design problems, ranging from game-level design with a desired level of difficulty (Togelius et al., 2011) and shopping space design that encourages impulse purchases and longer stays (Penn, 2005) to traffic signal control for improving transportation efficiency (Ceylan and Bell, 2004). In a recent work, Zhang et al. (2018) formulate these environment design problems within a reinforcement learning framework. In this paper, we focus on a new environment design scenario, i.e. warehouse layout design, emerging from the rapidly growing express industry.
Traditional warehouse design problems can be categorised into three levels: strategic, tactical and operational (Rouwenhorst et al., 2000). At the strategic level, long-term decisions are considered, including the size of a warehouse (Roll, Rosenblatt, and Kadosh, 1989) and the selection of component systems (Oser, 1996; Keserla and Peters, 1994). At the tactical level, medium-term decisions are made, such as the layout of a conventional warehouse (Bassan, Roll, and Rosenblatt, 1980; Berry, 1968). At the operational level, detailed control policies are studied, e.g. batching (Elsayed and Stern, 1983) and storage policies (Goetschalckx and Donald Ratliff, 1991). The problem discussed in this paper, warehouse layout design, traditionally belongs to the tactical level. However, in the era of big data, the layout of a warehouse can adapt to changes in the external environment. Specifically, the layout can be redesigned at intervals according to the changing destination distribution of the parcels. Thus, this problem is better categorised as an operational-level problem.
To solve this problem, we adopt evolutionary algorithms. Obtaining a guiding signal means evaluating each candidate design on the target task, which leads to unacceptable computational requirements in scenarios where evaluation is expensive. To reduce the number of expensive evaluations on real data needed before a satisfying result can be obtained, some works propose to learn a model that predicts the outcome of a candidate design without actually running it on real data (Baker et al., 2018; Liu et al., 2017). A similar idea has been explored in the field of evolution and is known as fitness approximation (Jin, 2005). Because fitness approximation is inaccurate, it is essential to use the approximation model together with the original fitness function (Grierson and Pak, 1993; Ratle, 1998). To incorporate the fitness model into simulation-based evolutionary algorithms, individual-based (Bull, 1999) and generation-based (Ratle, 1998) methods have been studied. In contrast, our approach explicitly manages two sub-populations whose individuals are evaluated by the approximation model and the original fitness function respectively. Similar approaches are known as multiple-deme parallel genetic algorithms (Cantú-Paz, 1998). Our work can be classified as a multiple-deme parallel genetic algorithm with a two-layer sub-population topology that balances exploitation and exploration.
3. Problem Definition
In this section, we formulate the environment design problem and introduce the particular robotic warehouse environment. We fix the agent policy in the robotic warehouse environment and focus on the remaining task, assigning destinations to the holes, which can be viewed as an environment design problem.
3.1. Environment Design
In many scenarios, there are agents taking actions in a designable environment, such as cars running in a transportation system, consumers shopping in a mall, and so on. Denote the agent’s policy as and the environment is parametrized as , where denote state space, action space, transition function, reward function and reward discount respectively. After the agents play in the environment in an episode, a joint trajectory is produced and a cumulative reward is given to the agent, where and denote state and joint action respectively. Moreover, the objective of the environment designer is given as , whose function form can be defined specifically, and the designer intends to design an optimal environment to maximize the expectation of its objective
Note that the randomness of is derived from the possible randomness of when selecting actions.
3.2. Robotic Warehouse Environment
In this paper, we consider a robotic warehouse environment abstracted from a real-world express system as shown in Figure 0(a), where there is a warehouse for sorting parcels from a mixed input stream to separate output streams according to their respective destinations. The sorting process is done by the robots carrying parcels from the input positions (sources) to the appropriate output positions (holes) in the plane warehouse as Figure 0(b) illustrates. In order to maximize the efficiency of sorting, we should set the robots’ cooperative pathfinding algorithm and assign the destinations to the holes. In this task, the agents share a common reward and the environment also takes as its design objective, i.e. . We set as a joint policy model for the agents. As such, the problem is formulated as
For solving Eq. (2), we should firstly set a sound cooperative pathfinding algorithm for the robots. After, we focus on optimizing the environment parameter , i.e. optimizing the layout of the warehouse (the assignment of the destinations to the holes) via
Note that the demand for such layout design is not one-off. Since external variables (such as the destination distribution of the parcels) may change, the best layout of the warehouse changes accordingly. Thus, the layout should be redesigned at intervals, which motivates an efficient layout design approach.
3.3. Detailed Environment Description
The warehouse is abstracted as a grid containing cells. Among them, cells are sources and cells are holes, whose locations are given. There are robots available to carry parcels from sources to holes. Each cell can hold only one robot at a time.
In each time-step, each robot can move to an adjacent cell. When an empty robot moves into a source, it loads a new parcel whose destination follows a distribution over destinations (cities) with the proportions . When a loaded robot moves into a hole whose destination is the same as that of the parcel it carries, it unloads the parcel into that hole. The rates of the input and output flows are not restricted in our setting: a parcel is always available when a robot moves into a source.
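The loading and unloading rules can be illustrated with a minimal sketch of what happens when a single robot enters a cell. The names `Robot` and `enter_cell`, the cell labels and the example proportions are hypothetical and are not taken from the paper; collision handling and movement are deliberately left out.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Robot:
    # Destination index of the carried parcel, or None when the robot is empty.
    parcel: Optional[int] = None


def enter_cell(robot: Robot, cell_kind: str, hole_destination: Optional[int],
               proportions: Sequence[float], rng: random.Random) -> int:
    """Apply the load/unload rule; return 1 if a load or unload happened, else 0."""
    if cell_kind == "source" and robot.parcel is None:
        # Parcels are always available; the destination is drawn from the proportions.
        robot.parcel = rng.choices(range(len(proportions)), weights=proportions)[0]
        return 1
    if cell_kind == "hole" and robot.parcel is not None and robot.parcel == hole_destination:
        # Unload only into a hole assigned to the parcel's own destination.
        robot.parcel = None
        return 1
    return 0
```

In this sketch a loaded robot passing over a source, or over a hole assigned to a different destination, simply keeps its parcel.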
Our objective is to sort as many parcels as possible in a given time period . We could achieve this objective by designing the layout of the warehouse, i.e. assigning the proper destinations to the holes. Specifically, we should determine the parameter of the environment , where for . Intuitively, the assignment of the destinations to the holes will affect the robots’ paths and hence the efficiency of the whole warehouse.
The notations defined in this section are listed in Table 1.
3.4. Problem Complexity
For the problem defined above, the scale of the layout assignment space is , where denotes the number of the holes and denotes the number of the parcel destinations. Since the robot pathfinding algorithm acts as a black box when evaluating each layout assignment, it is hard to determine a global optimum without exploring the solution space completely. Thus, solving this optimization problem exactly takes exponential time. Even for a small setting, such as , the number of assignments is about trillion, which is too large to explore completely.
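To make the growth of the search space concrete, the following sketch counts layouts under the assumption that every hole may independently receive any one of the destinations, so the count is the number of destinations raised to the number of holes. The function name and the example sizes (16 holes, 5 destinations) are hypothetical and are not the settings used in the paper.

```python
def layout_space_size(num_holes: int, num_destinations: int) -> int:
    """Number of ways to assign one destination to each hole (assumed independent)."""
    return num_destinations ** num_holes


# Hypothetical small setting: 16 holes and 5 destinations.
size = layout_space_size(16, 5)
print(size)  # 152587890625
```

Adding a single hole multiplies the count by the number of destinations, which is why exhaustive evaluation by simulation quickly becomes impractical.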
Table 1: Notations.
|Description|Symbol|Role|
|Height of warehouse||Input|
|Width of warehouse||Input|
|Number of source cells||Input|
|Number of hole cells||Input|
|Locations of source cells||Input|
|Locations of hole cells||Input|
|Number of robots||Input|
|Number of parcel destinations||Input|
|Proportions of parcel destinations||Input|
|Length of timestep||Input|
|Assignment of destinations to holes||Output|
3.5. Robot Pathfinding Algorithms
In our problem, the robot pathfinding algorithm is fixed. As the robots are quite dense in a real-world warehouse, jam prevention is the key point. We considered two cooperative pathfinding algorithms designed with jam prevention in mind. The first adopts WHCA* (Silver, 2005) as a planner, which searches for the shortest path from an origin to a destination for each robot in turn and ensures that no collisions occur. The second is a greedy algorithm, which guides the robots with a look-up table at each position and reduces conflicts by setting one-way roads in the map, as illustrated in Figure 1(a). We studied these two algorithms, and the results showed that the greedy one has a significant advantage in running time and a minor disadvantage in performance. Because testing environment parameters requires a large number of simulations, we selected the time-saving greedy algorithm as the agent policy in our experiments. However, the proposed warehouse layout design solution can work with other robot pathfinding algorithms as well.
4. Solution
In this section, we first introduce an evolution framework for automatically designing warehouse layouts, and then present the auxiliary objective fitness approximation and the two-layer population structure for improving efficiency.
Figure 1: (a) An illustration of one-way roads: i) the odd-row cells allow moving right and forbid moving left, while the even-row cells allow moving left and forbid moving right; ii) the odd-column cells allow moving down and forbid moving up, while the even-column cells allow moving up and forbid moving down. The bottom-left cell is in Row 1 and Column 1. (b) A layout sample as an individual in the evolutionary algorithm. (c) An example of the heatmap.
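The one-way road rule in panel (a) can be written as a small parity check. The sketch below uses 1-indexed rows and columns with the bottom-left cell at Row 1, Column 1, as in the caption; the function name `allowed_moves` and the direction labels are hypothetical, and map boundaries are ignored.

```python
def allowed_moves(row: int, col: int) -> set:
    """Directions permitted from a cell under the one-way road rule (1-indexed)."""
    # Odd rows carry traffic to the right, even rows to the left.
    horizontal = "right" if row % 2 == 1 else "left"
    # Odd columns carry traffic downwards, even columns upwards.
    vertical = "down" if col % 2 == 1 else "up"
    return {horizontal, vertical}
```

For example, `allowed_moves(1, 1)` permits moving right and down, while `allowed_moves(2, 2)` permits moving left and up, so two robots never face each other head-on in the same row or column.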
4.1. Evolution with Robot Policy Simulation
Under the evolution framework, we maintain a population containing warehouse layout individuals, i.e. assignments of the destinations to the holes (Figure 1(b)), and evolve the population for generations. Within each generation, we perform crossover, mutation and selection in order:
In the crossover phase, we randomly select pairs of samples. For each pair of samples, we flatten their two hole matrices into two lines. Then, we randomly select a common breakpoint for both lines and cross the two lines, just like chromosomal crossover. Finally, we generate two square matrices by reshaping the two lines.
In the mutation phase, we randomly select samples generated in the crossover phase. For each sample, we randomly select holes and randomly permute their destinations.
In the selection phase, we evaluate the generated samples in the crossover and mutation phases by robot policy simulations, then merge the original and the generated samples. The best ones are selected for the next generation.
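The three phases can be sketched as follows. This is a minimal illustration rather than the authors' implementation: a layout is represented as a list of rows of destination indices, "randomly permute their destinations" is interpreted as shuffling the destinations among the selected holes, and the `fitness` callable stands in for a robot policy simulation. All function names are hypothetical.

```python
import random


def flatten(layout):
    """Turn a matrix of destinations into a single line."""
    return [dest for row in layout for dest in row]


def reshape(line, n_cols):
    """Turn a line of destinations back into rows of length n_cols."""
    return [line[i:i + n_cols] for i in range(0, len(line), n_cols)]


def crossover(parent_a, parent_b, rng):
    """One-point crossover on the flattened layouts; returns two children."""
    n_cols = len(parent_a[0])
    line_a, line_b = flatten(parent_a), flatten(parent_b)
    cut = rng.randrange(1, len(line_a))  # common breakpoint for both lines
    child_a = line_a[:cut] + line_b[cut:]
    child_b = line_b[:cut] + line_a[cut:]
    return reshape(child_a, n_cols), reshape(child_b, n_cols)


def mutate(layout, num_holes, rng):
    """Pick num_holes holes at random and shuffle their destinations among them."""
    n_cols = len(layout[0])
    line = flatten(layout)
    picked = rng.sample(range(len(line)), num_holes)
    shuffled = [line[i] for i in picked]
    rng.shuffle(shuffled)
    for index, dest in zip(picked, shuffled):
        line[index] = dest
    return reshape(line, n_cols)


def select(population, fitness, keep):
    """Keep the best individuals from the merged original and generated samples."""
    return sorted(population, key=fitness, reverse=True)[:keep]
```

Because `flatten` copies the data, neither `crossover` nor `mutate` modifies its input, so parents can safely remain in the population until selection.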
4.2. Two-layer Evolutionary Algorithm with Fitness Approximation
In this section, we propose a novel evolutionary algorithm that trains an auxiliary objective fitness function to evaluate a large population for providing promising individuals to a small population evaluated by simulations.
4.2.1. Auxiliary Objective Fitness Approximation
In practice, simulating the robots in the environment is time-consuming. A promising way to reduce the simulation time is to use an approximation function to compute fitness:
where is the fitness approximation function, is a sample of environment parameter and is the predicted fitness of , whose learning target is the expectation of the reward .
Moreover, since a simulation generates a trajectory in addition to the reward , we consider utilizing to help training fitness function . Although is the exact objective for fitness function to learn, we may extract additional information from that helps training the fitness function, under the assumption that and are correlated. We set an auxiliary training objective and use a neural network to capture this:
where is a neural network consisting of three sub-networks: is the bottom network that captures the common features and outputs ; and are the two separate networks on the top of that predict and respectively.
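One common way to train a network with a main output and an auxiliary output is to add the two prediction errors. The sketch below assumes squared-error losses for both outputs and a weighting factor `aux_weight`; the weighting factor and all function names are assumptions made for illustration and are not stated in the text.

```python
def mean_squared_error(predicted, target):
    """MSE between two equally shaped 2D grids given as lists of rows."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(predicted, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def joint_loss(pred_reward, true_reward, pred_heatmap, true_heatmap, aux_weight=1.0):
    """Reward error plus a weighted auxiliary heatmap error (aux_weight is assumed)."""
    reward_loss = (pred_reward - true_reward) ** 2
    heatmap_loss = mean_squared_error(pred_heatmap, true_heatmap)
    return reward_loss + aux_weight * heatmap_loss
```

Setting `aux_weight` to zero recovers training on the reward alone, which gives a simple way to compare models with and without the auxiliary objective.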
In the robotic warehouse layout design problem, represents the assignment of the destinations to the holes and represents the movements of the robots. Furthermore, we define as the heatmap of the movements as Figure 1(c) shows. Intuitively, the distribution of busy areas should be correlated with the efficiency of sorting and the reward. The process of learning the fitness function in the warehouse layout problem is illustrated in Figure 3.
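A heatmap of this kind can be computed by counting visits per cell. The sketch below assumes each trajectory is a list of zero-indexed `(row, col)` positions, one per time-step; the function name `build_heatmap` and this trajectory format are assumptions made for illustration.

```python
def build_heatmap(trajectories, height, width):
    """Count how many times each grid cell is visited across all robot trajectories."""
    heatmap = [[0] * width for _ in range(height)]
    for path in trajectories:
        for row, col in path:
            heatmap[row][col] += 1
    return heatmap


# Two short hypothetical trajectories on a 2 x 2 grid.
print(build_heatmap([[(0, 0), (0, 1)], [(0, 1), (1, 1)]], 2, 2))  # [[1, 2], [0, 1]]
```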
Since obtaining simulation samples is time-consuming, we train the fitness model online. Specifically, the fitness model is trained with the samples simulated along the process of the evolutionary algorithm. There is no pre-training in our approach.
4.2.2. Two-layer Population
The fitness model provides a less accurate but much faster evaluation than the simulation. These properties indicate that the simulation is better suited to locating a local optimum exactly, while the fitness model is better suited to exploring the global space quickly. In standard simulation-based evolution, the mutation rate is usually set small enough to ensure convergence within an acceptable time, so the search remains relatively local. Therefore, we incorporate the fitness model into the standard simulation-based evolution as an additional component for exploring the global space.
Specifically, we maintain two sub-populations. The first one is of the same size as the population set in the standard simulation-based evolution. Also, the individuals in the first sub-population are evaluated by simulations. The second sub-population is multiple times larger than the first one and the samples in it are evaluated by the fitness model. We view the second sub-population as a candidate population whose top individuals have a chance of joining the first sub-population. On the other hand, the bottom individuals in the first sub-population may be moved to the second sub-population. We name the first-layer sub-population noble and the second civilian. Noble population and civilian population evolve separately while keeping a channel for migration.
In detail, the two-layer population evolves as Figure 4 and Algorithm 1 show. In general, and maintain individuals evaluated by the simulation and the fitness model respectively. In each generation, migration takes place. Specifically, from the civilian layer go up to the noble layer and from the noble layer go down to the civilian layer. In addition, the civilian layer discards the worst population and absorbs randomly generated population .
There are parameters related to the proposed two-layer evolutionary algorithm. They are noble population number , civilian population number , crossover rate , mutation rate , for the number of civilian individuals migrate to the noble layer, for the number of the randomly generated individuals, and for the number of model updates in each generation. Other variables can be determined by these parameters. In each generation, simulations, model updates and model predictions are performed. Since the time cost of training the network and use it to predict is negligible compared to the simulations (see Table 4), the time complexity of the two-layer evolutionary algorithm for generations is .
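The migration between the two layers can be sketched as a single function. This is a simplified reading of the description, not Algorithm 1 itself: crossover, mutation and model training inside each layer are omitted, and the names `migrate`, `simulate`, `predict`, `random_layout`, `num_up` and `num_random` are hypothetical.

```python
def migrate(noble, civilian, simulate, predict, random_layout,
            noble_size, civilian_size, num_up, num_random):
    """One migration step.

    noble: list of (layout, simulated_fitness) pairs.
    civilian: list of layouts scored only by the fitness model.
    """
    # Promote the civilians the model ranks highest; each gets a real simulation.
    ranked = sorted(civilian, key=predict, reverse=True)
    promoted, remaining = ranked[:num_up], ranked[num_up:]
    noble = noble + [(layout, simulate(layout)) for layout in promoted]

    # Keep the best simulated individuals as nobles; the rest move down.
    noble.sort(key=lambda pair: pair[1], reverse=True)
    demoted = [layout for layout, _ in noble[noble_size:]]
    noble = noble[:noble_size]

    # Civilians absorb demoted nobles and fresh random layouts, then the worst
    # individuals by predicted fitness are discarded.
    pool = remaining + demoted + [random_layout() for _ in range(num_random)]
    civilian = sorted(pool, key=predict, reverse=True)[:civilian_size]
    return noble, civilian
```

In this sketch only promoted civilians consume simulations, so the simulation budget per generation is controlled directly by `num_up` together with whatever is spent on noble offspring.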
5. Experiments
We set up a virtual intelligent warehouse environment based on real-world settings and test our proposed approach against the baselines. Our experiments are repeatable, and the source code is provided in the supplementary material.
5.1. Experiment Settings
Environment. We test our proposed approach in maps. The positions of the sources and holes follow real-world scenarios. The detailed parameters are given in Table 2. The destination distributions are set according to long-tail functions to reflect reality. In our experiments, the reward is defined as the total number of parcel loadings and unloadings (roughly twice the number of parcels processed).
Robots. As introduced above, we adopt a greedy algorithm as the cooperative pathfinding algorithm for the robots. First, we set one-way roads in the map, as Figure 1(a) shows, to avoid head-on conflicts, while right-angled conflicts are avoided by setting priorities. On the one-way roads, the robots decide their moves using a look-up table containing records, each of which indicates the first step towards a particular source or hole from a particular cell.
Baselines. We test baselines to compare with our proposed two-layer evolutionary algorithm (TLEA).
Random: The holes are assigned destinations uniformly at random.
Heuristic: Destinations select holes in turn according to their proportions. For example, if parcels are going to destination A, then A selects of the holes. This process starts from the destination with the largest proportion. Each destination greedily selects each hole that minimizes the sum of the average distance from the sources to the selected holes.
Simu: The evolutionary algorithm with simulations as introduced in the Solution section.
SimuInd: An implementation of the individual-based evolution control algorithm (Bull, 1999). This approach maintains a single large population for evolution whose individuals are evaluated by the fitness model. In each generation, the best individuals according to the fitness model are evaluated once again by the simulation. The fitness model is trained online with the samples produced by the simulations.
SimuGen: An implementation of the generation-based evolution control algorithm (Ratle, 1998). Like SimuInd, this approach maintains a single large population. The difference is that SimuGen uses the simulations intensively in one generation and uses the fitness model in the next several generations.
Hyper-parameters. To ensure fairness, for Simu, SimuInd, SimuGen and TLEA, the number of generation is set as and the number of simulations in each generation is set as . The model update and prediction times are also fixed as and respectively for SimuInd, SimuGen and TLEA. The population of Simu is ; in each generation individuals are generated by crossover; of them are mutated. For SimuInd and SimuGen, the populations are ; are generated by crossover in each generation; of them are mutated. For the TLEA, are set to be respectively.
Fitness model. Our network is composed of three sub-networks: a bottom network, a heatmap network and a reward network. The output of the bottom network is used as the input of the other two. The bottom network has two fully connected layers, with 128 and 400 units respectively, whose output is a vector reshaped to match the size of the map, followed by a 2D transposed convolution layer with 16 filters. The heatmap network has one transposed convolution layer with a single filter that generates the heatmap. The reward network contains three fully connected layers, with 256, 128 and 1 units respectively, that predict the reward. All layers except the output layers use the ReLU activation function, and both outputs are trained with an MSE loss.
Hardware. We use two computers, one with an Intel Core i7-4790K and the other with an Intel Core i7-6900K. The i7-4790K machine also has an NVIDIA Titan X GPU.
runs. The reward samples pass the Shapiro-Wilk normality test. T-tests are performed for TLEA against Simu, SimuInd and SimuGen, and the results show that the superiority of TLEA is statistically significant.
We run the baselines and TLEA. The results are shown in Table 3. Heuristic performs considerably better than Random but is inferior to the evolutionary algorithms. Moreover, TLEA outperforms all the baselines.
Figure 5 shows the layouts designed by the baselines and TLEA together with their heatmaps. The tracks of the robots running in the TLEA maps are better balanced, indicating that there are fewer traffic jams.
shows the learning curves. Since SimuInd and SimuGen mix individuals evaluated by the simulation with individuals evaluated by the fitness model, their current best individuals may be ones that the inaccurate fitness model has over-estimated, which may lead to discarding the truly best individuals. TLEA solves this problem by separating the two populations and ensuring that the truly best individual is always kept in the noble population.
In addition, TLEA and Simu are more stable than SimuInd and SimuGen, because in SimuInd and SimuGen the current best individual may have been evaluated only by the fitness model, and its score may be corrected by a simulation in later generations. The slight fluctuations of Simu and TLEA are caused by the variance of the simulations: the best samples can be over-estimated (much less severely than by the fitness model), and this over-estimation is averaged out by extra simulations in later generations.
Time cost. The time costs of the tested algorithms are listed in Table 4. They show that the fitness model accounts for less than of the total time cost. In our experiments, we therefore ignore the time difference between Simu and the other algorithms.
|Simulation||Model Update||Model Predicting||Time|
Effectiveness of heatmap. We evaluate randomly generated samples by simulation and use them to train the fitness functions with and without the heatmap as an auxiliary objective. We compare their MSE and Pearson correlation in Table 5, which shows that the heatmap provides a significant improvement to the fitness function.
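The two metrics used in this comparison can be computed as in the following sketch, where `predicted` holds the rewards given by a fitness model and `simulated` holds the rewards measured by simulation. The function names and the toy numbers are illustrative only.

```python
import math


def mse(predicted, simulated):
    """Mean squared error between predicted and simulated rewards."""
    return sum((p - s) ** 2 for p, s in zip(predicted, simulated)) / len(predicted)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)
```

For selection in an evolutionary algorithm, a high correlation matters at least as much as a low MSE, since ranking the individuals correctly is what decides which ones survive.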
Simulation allocation. Since simulations are a scarce resource when running an evolutionary algorithm, the allocation of simulations between the noble layer and the civilian layer is important; it also determines the migration rate between the two layers. We test different values of , the ratio of simulations allocated to the noble layer, and find that is a proper setting (see Table 6). In other words, three fourths of the simulations are allocated to ensure the accuracy of the noble layer and one fourth gives chances to the civilian layer.
Impact of civilian population. We are interested in how much the civilian population contributes to the evolution of the noble population. We compute a quantity named purity, which measures how much the evolved noble population inherits from the initial noble population. As Figure 5(b) shows, the purity of the noble population declines rapidly as the reward (fitness) increases. In the end, the civilian population contributes more than percent to the noble population.
6. Conclusion
In this paper, we study the problem of automatic warehouse layout design. The proposed two-layer evolutionary algorithm takes advantage of a fitness approximation model, augmented with an auxiliary objective of predicting the heatmap. Our approach enhances the exploration of the evolutionary algorithm with the help of the fitness model. The experiments demonstrate the superiority of our approach over the heuristic and the traditional evolution-based methods. For future work, we plan to apply the proposed two-layer evolutionary algorithm to other environment design scenarios, such as shopping mall design, game design and traffic light control.
References
- Baker et al. (2018) Baker, B.; Gupta, O.; Raskar, R.; and Naik, N. 2018. Accelerating neural architecture search using performance prediction.
- Bassan, Roll, and Rosenblatt (1980) Bassan, Y.; Roll, Y.; and Rosenblatt, M. J. 1980. Internal layout design of a warehouse. AIIE Transactions 12(4):317–322.
- Bello et al. (2017) Bello, I.; Zoph, B.; Vasudevan, V.; and Le, Q. V. 2017. Neural optimizer search with reinforcement learning. ICML.
- Berry (1968) Berry, J. R. 1968. Elements of warehouse layout. The International Journal of Production Research 7(2):105–121.
- Bull (1999) Bull, L. 1999. On model-based evolutionary computation. Soft Computing 3(2):76–82.
- Cai et al. (2018) Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; and Wang, J. 2018. Efficient architecture search by network transformation. AAAI.
- Cantú-Paz (1998) Cantú-Paz, E. 1998. A survey of parallel genetic algorithms. Calculateurs paralleles, reseaux et systems repartis 10(2):141–171.
- Ceylan and Bell (2004) Ceylan, H., and Bell, M. G. 2004. Traffic signal timing optimisation based on genetic algorithm approach, including drivers’ routing. Transportation Research Part B: Methodological.
- Chen et al. (2018) Chen, T.; Moreau, T.; Jiang, Z.; Shen, H.; Yan, E.; Wang, L.; Hu, Y.; Ceze, L.; Guestrin, C.; and Krishnamurthy, A. 2018. TVM: End-to-end optimization stack for deep learning. arXiv preprint arXiv:1802.04799.
- de Jong, Thierens, and Watson (2004) de Jong, E. D.; Thierens, D.; and Watson, R. A. 2004. Hierarchical genetic algorithms. In International Conference on Parallel Problem Solving from Nature, 232–241. Springer.
- Domhan, Springenberg, and Hutter (2015) Domhan, T.; Springenberg, J. T.; and Hutter, F. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In IJCAI.
- Elsayed and Stern (1983) Elsayed, E. A., and Stern, R. G. 1983. Computerized algorithms for order processing in automated warehousing systems. The International Journal of Production Research 21(4):579–586.
- Fan et al. (2017) Fan, W.; Xu, M.; Dong, X.; and Wei, H. 2017. Considerable environmental impact of the rapid development of China’s express delivery industry. Resources, Conservation and Recycling 126:174–176.
- Goetschalckx and Donald Ratliff (1991) Goetschalckx, M., and Donald Ratliff, H. 1991. Optimal lane depths for single and multiple products in block stacking storage systems. IIE Transactions 23(3):245–258.
- Grierson and Pak (1993) Grierson, D., and Pak, W. 1993. Optimal sizing, geometrical and topological design using a genetic algorithm. Structural Optimization 6(3):151–159.
- Hong, Lee, and Tahk (2003) Hong, Y.-S.; Lee, H.; and Tahk, M.-J. 2003. Acceleration of the convergence speed of evolutionary algorithms using multi-layer neural networks. Engineering Optimization 35(1):91–102.
- IBISWorld (2018) IBISWorld. 2018. Global courier and delivery services - global market research report. goo.gl/h6fdWq.
- Jin (2005) Jin, Y. 2005. A comprehensive survey of fitness approximation in evolutionary computation. Soft computing 9(1):3–12.
- Keserla and Peters (1994) Keserla, A., and Peters, B. A. 1994. Analysis of dual-shuttle automated storage/retrieval systems. Journal of Manufacturing Systems 13(6):424–434.
- Liu et al. (2017) Liu, C.; Zoph, B.; Shlens, J.; Hua, W.; Li, L.-J.; Fei-Fei, L.; Yuille, A.; Huang, J.; and Murphy, K. 2017. Progressive neural architecture search. arXiv preprint arXiv:1712.00559.
- Oser (1996) Oser, J. 1996. Design and analysis of an automated transfer car storage and retrieval system. Progress in material handling research: 1996.
- Penn (2005) Penn, A. 2005. The complexity of the elementary interface: shopping space. In Proceedings to the 5th International Space Syntax Symposium. Akkelies van Nes.
- People’s Daily (2017) People’s Daily, C. 2017. Robots sorting system helps chinese company finish at least 200,000 packages a day in the warehouse. goo.gl/hLbYhV.
- Ramachandran, Zoph, and Le (2018) Ramachandran, P.; Zoph, B.; and Le, Q. V. 2018. Searching for activation functions.
- Ratle (1998) Ratle, A. 1998. Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In International Conference on Parallel Problem Solving from Nature, 87–96. Springer.
- Real et al. (2017) Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y. L.; Tan, J.; Le, Q.; and Kurakin, A. 2017. Large-scale evolution of image classifiers. ICML.
- Real et al. (2018) Real, E.; Aggarwal, A.; Huang, Y.; and Le, Q. V. 2018. Regularized evolution for image classifier architecture search. arXiv preprint arXiv:1802.01548.
- Roll, Rosenblatt, and Kadosh (1989) Roll, Y.; Rosenblatt, M. J.; and Kadosh, D. 1989. Determining the size of a warehouse container. The International Journal of Production Research 27(10):1693–1704.
- Rouwenhorst et al. (2000) Rouwenhorst, B.; Reuter, B.; Stockrahm, V.; van Houtum, G.-J.; Mantel, R.; and Zijm, W. H. 2000. Warehouse design and control: Framework and literature review. European Journal of Operational Research 122(3):515–533.
- Silver (2005) Silver, D. 2005. Cooperative pathfinding. AIIDE 1:117–122.
- Togelius et al. (2011) Togelius, J.; Yannakakis, G. N.; Stanley, K. O.; and Browne, C. 2011. Search-based procedural content generation: A taxonomy and survey. IEEE Transactions on Computational Intelligence and AI in Games 3(3):172–186.
- Zhang et al. (2018) Zhang, H.; Wang, J.; Zhou, Z.; Zhang, W.; Wen, Y.; Yu, Y.; and Li, W. 2018. Learning to design games: Strategic environments in reinforcement learning. In IJCAI, 3068–3074.
- Zoph and Le (2016) Zoph, B., and Le, Q. V. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.