The bin packing problem (BPP) is a classical and important optimization problem in logistics and production systems. There are many variants of BPP, but the most meaningful and challenging one is the 3D BPP, in which a number of cuboid-shaped items of different sizes must be packed into bins orthogonally. The size and cost of the bins are fixed and known, and the objective is to minimize the number of bins used, i.e., to minimize the total cost. BPP is a typical combinatorial optimization problem and is NP-hard ([Coffman et al.1980]), which makes it a very popular research direction in the optimization area. In addition, BPPs have many applications in practice. An effective bin packing algorithm reduces computation time and total packing cost and increases the utilization of resources.
The cost of packing materials, which is mainly determined by their surface area, accounts for most of the packing cost. Moreover, we have found that in many real business scenarios there is no bin of fixed size; for example, in cross-border e-commerce, flexible and soft packing materials are used to pack items instead of cartons or other bins. We therefore propose a new type of 3D BPP in our research. The objective of this new type of 3D BPP is to pack all items into a bin with minimized surface area.
Because optimal solutions of BPPs are difficult to obtain, many researchers have proposed approximation or heuristic algorithms. To achieve good results, heuristic algorithms have to be designed specifically for different types of problems or situations, so they are limited in generality. In recent years, artificial intelligence, especially deep reinforcement learning (DRL), has received intense research attention and achieved remarkable results in many fields. In addition, DRL methods have shown great potential for solving combinatorial optimization problems ([Vinyals et al.2015], [Bello et al.2016]). In this paper, a DRL-based method is applied to solve this new type of 3D BPP, and numerical experiments based on real data are designed and conducted to demonstrate the effectiveness of the method.
2 Related Work
2.1 3D bin packing problem
The bin packing problem is a classical and popular optimization problem. Since the 1970s, it has attracted great interest from many researchers, and valuable results have been obtained. The two-dimensional BPP is NP-hard ([Coffman et al.1980]), so 3D BPP, as a generalization of 2D BPP, is strongly NP-hard. For this reason, a lot of research focuses on approximation algorithms and heuristic algorithms. [Scheithauer1991] proposed the first approximation algorithm for 3D BPP and investigated its performance bound. Many effective heuristic algorithms have also been proposed, such as tabu search ([Lodi et al.2002], [Crainic et al.2009]), guided local search ([Faroe et al.2003]), extreme point-based heuristics ([Crainic et al.2008]) and a hybrid genetic algorithm ([Kang et al.2012]). There is also some research on exact solution methods for 3D BPP. [Chen et al.1995] considered the problem of loading containers with cartons of non-uniform size, which is a generalization of 3D BPP where bins may have different sizes, and developed a mixed integer programming model to obtain optimal solutions. An exact branch-and-bound algorithm for 3D BPP was proposed in [Martello et al.2000], and many instances with up to 90 items can be solved to optimality within a reasonable time limit. Some variants of BPP from the real world have also been studied, such as the variable size bin packing problem ([Kang and Park2003]), the bin packing problem with conflicts ([Khanafer et al.2010], [Gendreau et al.2004]) and the bin packing problem with fragile objects ([Clautiaux et al.2014]).
Another class of packing problem, the strip packing problem, is also worth mentioning here because it is very similar to our proposed problem. In the strip packing problem, a given set of cuboid-shaped items must be packed orthogonally into a given strip so as to minimize the packing height. The length and width of the strip are fixed and limited, and the height is infinite (for the two-dimensional strip packing problem, the width of the strip is fixed and the length is infinite). This type of problem has many applications in the steel and textile industries, and different types of algorithms have been proposed to solve it, such as the exact algorithms in [Martello et al.2003] and [Kenmochi et al.2009], the approximation algorithm in [Steinberg1997], the heuristic algorithm in [Bortfeldt and Mack2007] and the meta-heuristic algorithms in [Bortfeldt2006] and [Hopper and Turton2001].
2.2 DRL in combinatorial optimization
Even though machine learning and combinatorial optimization have each been studied for decades, there are few investigations of the application of machine learning methods to combinatorial optimization problems. One research direction is the design of hyper-heuristics based on reinforcement learning (RL) ideas. An overview of hyper-heuristics is presented in [Burke et al.2013], in which some hyper-heuristics based on learning mechanisms are discussed. In [Nareyek2003], the heuristic selection probability is updated based on non-stationary RL. In addition, various score updating methods have been proposed in the area of hyper-heuristics, such as binary exponential backoff ([Remde et al.2009]), tabu search ([Burke et al.2003]) and choice function ([Cowling et al.2000]).
Recent advances in sequence-to-sequence models ([Sutskever et al.2014]) have motivated research on neural combinatorial optimization. The attention mechanism, which is used to augment neural networks, has contributed a great deal in areas such as machine translation ([Bahdanau et al.2014]) and algorithm learning ([Graves et al.2014]). In [Vinyals et al.2015], a neural network with a specific attention mechanism, named Pointer Net, was proposed, and a supervised learning method was applied to solve the Traveling Salesman Problem. [Bello et al.2016] developed a neural combinatorial optimization framework with RL, and some classical problems, such as the Traveling Salesman Problem and the Knapsack Problem, are solved in this framework. Because of the effectiveness and generality of the methodology proposed in [Bello et al.2016], our research is mainly based on their framework and methods.
3 Deep Reinforcement Learning Method for 3D Bin Packing Problem
3.1 Definition of the problem
In a typical 3D BPP, a set of items must be packed into fixed-sized bins in a way that minimizes the number of bins used. Unlike the typical BPP with fixed-sized bins, we focus on the problem of designing the bin with the least surface area that can pack all the items. In real business scenarios, such as cross-border e-commerce, no fixed-sized bin is available, and flexible and soft materials are used to pack all the items. At the same time, the cost of a bin is directly proportional to its surface area. In this case, minimizing the surface area of the bin would bring great economic benefits.
The exact formulation of our problem is given below. We are given a set of cuboid-shaped items, and each item $i$ is characterized by its length ($l_i$), width ($w_i$) and height ($h_i$). Our target is to find the bin with the least surface area that can pack all items. We define $(x_i, y_i, z_i)$ as the left-bottom-back (LBB) coordinate of item $i$ and $(0, 0, 0)$ as the left-bottom-back coordinate of the bin. The details of the decision variables are shown in Table 1. Based on the problem description and notation, the new type of 3D BPP is formulated over these variables, with the surface area of the bin as the objective to be minimized.
In the formulation, for each pair of items $i$ and $j$, three binary variables indicate whether item $i$ is to the left of item $j$, under item $j$, or behind item $j$. For each item $i$, six binary variables indicate whether its orientation is front-up, front-down, side-up, side-down, bottom-up or bottom-down.
One group of constraints defines the length, width and height of item $i$ after it is oriented. A second group guarantees that there is no overlap between any two packed items, while a third group guarantees that no item is placed outside the bin.
We have tried to solve the problem with optimization solvers, such as IBM CPLEX Optimizer, but it is very difficult to solve within a reasonable time limit. We prove that this problem is NP-hard in Appendix B.
Table 1: Decision variables.
|Type||Description|
|Continuous||the length of the bin|
|Continuous||the width of the bin|
|Continuous||the height of the bin|
|Continuous||LBB coordinate of item i on the x axis|
|Continuous||LBB coordinate of item i on the y axis|
|Continuous||LBB coordinate of item i on the z axis|
|Binary||item i is to the left of item j or not|
|Binary||item i is under item j or not|
|Binary||item i is behind item j or not|
|Binary||orientation of item i is front-up or not|
|Binary||orientation of item i is front-down or not|
|Binary||orientation of item i is side-up or not|
|Binary||orientation of item i is side-down or not|
|Binary||orientation of item i is bottom-up or not|
|Binary||orientation of item i is bottom-down or not|
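The objective and the two families of feasibility constraints described above (no overlap, no item outside the bin) can be illustrated with a small checker. This is only a sketch under assumptions: the `Placed` record and all function names are hypothetical, items are taken to be already oriented, and the bin is assumed to be the bounding box anchored at the origin.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placed:
    """An oriented item: LBB corner (x, y, z) and oriented size (l, w, h)."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float


def overlaps(a: Placed, b: Placed) -> bool:
    """True if the interiors of two boxes intersect (touching faces is allowed)."""
    return (a.x < b.x + b.l and b.x < a.x + a.l and
            a.y < b.y + b.w and b.y < a.y + a.w and
            a.z < b.z + b.h and b.z < a.z + a.h)


def bounding_bin(items):
    """Smallest bin anchored at the origin that contains every placed item."""
    length = max(p.x + p.l for p in items)
    width = max(p.y + p.w for p in items)
    height = max(p.z + p.h for p in items)
    return length, width, height


def surface_area(length, width, height):
    """Surface area of a cuboid bin."""
    return 2.0 * (length * width + length * height + width * height)


def is_feasible(items):
    """No item has a negative coordinate and no two items overlap."""
    inside = all(p.x >= 0 and p.y >= 0 and p.z >= 0 for p in items)
    return inside and not any(overlaps(a, b) for a, b in combinations(items, 2))


# Two items placed side by side along the length axis.
plan = [Placed(0, 0, 0, 2, 1, 1), Placed(2, 0, 0, 1, 1, 1)]
print(is_feasible(plan), bounding_bin(plan), surface_area(*bounding_bin(plan)))
```

Any packing routine, heuristic or learned, can be scored with `surface_area(*bounding_bin(plan))` once `is_feasible(plan)` holds.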
3.2 A DRL-based method
In this section, we describe the DRL-based method for solving this new type of 3D BPP. Since solving the problem exactly is intractable, we use a constructive approach, i.e., packing items one by one in sequence. There are three classes of decisions to make:
- the sequence in which the items are packed into the bin;
- the orientation of each item when it is put into the bin;
- the strategy that selects an empty maximal space in which to put the item.
We design a heuristic algorithm to choose the sequence, orientation and empty maximal space. When placing an item, the algorithm goes over all empty maximal spaces and all 6 orientations of the item and chooses the empty maximal space and orientation that yield the least surface area. After that, it goes over all the remaining items and identifies the one that yields the least waste space. The detailed algorithm is described in Appendix A. In this paper, DRL is used to find a better sequence in which to pack the items; the strategies for choosing the item orientation and the empty maximal space are the same as in the heuristic described above. Our aim here is limited to demonstrating that DRL can find better solutions than a well-designed heuristic. In future work, we will investigate how to incorporate item sequence, orientation and empty maximal space choice into the DRL framework.
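The least-surface-area placement step can be sketched as follows. This is a deliberately simplified illustration, not the heuristic of Appendix A: candidate positions are given as a hypothetical list of free corner points rather than empty maximal spaces, no overlap test is performed, and the names `orientations` and `best_placement` are assumptions made for the example.

```python
from itertools import permutations


def orientations(size):
    """All distinct axis-aligned orientations (at most six) of an (l, w, h) item."""
    return sorted(set(permutations(size)))


def surface_area(length, width, height):
    return 2 * (length * width + length * height + width * height)


def best_placement(bbox, corners, size):
    """Pick the (corner, orientation) pair that minimizes the surface area
    of the bounding bin after the item is added.

    bbox    -- current bounding bin (L, W, H) of the items packed so far
    corners -- candidate LBB positions, assumed to be free of other items
    size    -- unoriented item size (l, w, h)
    """
    best = None
    for (x, y, z) in corners:
        for (l, w, h) in orientations(size):
            new_bbox = (max(bbox[0], x + l),
                        max(bbox[1], y + w),
                        max(bbox[2], z + h))
            area = surface_area(*new_bbox)
            # Strict comparison keeps the first candidate among ties.
            if best is None or area < best[0]:
                best = (area, (x, y, z), (l, w, h), new_bbox)
    return best


# A 2x2x2 block is already packed; place a 1x2x2 item next to it.
area, corner, oriented, bbox = best_placement(
    bbox=(2, 2, 2),
    corners=[(2, 0, 0), (0, 2, 0), (0, 0, 2)],
    size=(1, 2, 2),
)
print(area, corner, oriented, bbox)
```

A cube has a single distinct orientation and an item with two equal sides has three, so deduplicating the permutations avoids evaluating the same placement several times.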
3.2.1 Architecture of the network
In our research, the design of the network architecture is inspired by the work of [Vinyals et al.2015] and [Bello et al.2016]. In their studies, a neural network architecture named Pointer Net (Ptr-Net) is proposed and used to solve some classical combinatorial optimization problems, such as the Traveling Salesman Problem (TSP) and the Knapsack Problem. For example, when solving TSP, the coordinates of points on a two-dimensional plane are fed to the model step by step, and the sequence in which the points are visited is the predicted result. This architecture is similar to the sequence-to-sequence model, which is proposed in [Sutskever et al.2014] and is a powerful method in machine translation. There are two main differences between Ptr-Net and the sequence-to-sequence model. First, the number of target classes in each step of the output of the sequence-to-sequence model is fixed, whereas in Ptr-Net the size of the output dictionary is variable. Second, in the sequence-to-sequence model the attention mechanism is used to blend the hidden units of the encoder into a context vector, whereas Ptr-Net uses attention as a pointer to select a member of the input sequence as the output.
The neural network architecture in our research is shown in Figure 1. The input to this network is a sequence of size data (length, width and height) of the items to be packed, and the output is another sequence that represents the order in which we pack those items. The network consists of two RNNs: an encoder network and a decoder network. At each step of the encoder network, the size data (length, width and height) of one item are embedded and given as input to the LSTM cell, and the cell output is collected. After the final step of the encoder network, the cell state and outputs are passed to the decoder network. At each step of the decoder network, one of the outputs of the encoder network is selected as the input of the next step. For example, as shown in Figure 1, the output of the 3rd step of the decoder network is 4, so the output of the 4th step of the encoder network is selected (pointed to) and given as the input to the 4th step of the decoder network. The attention mechanism and glimpse mechanism proposed in [Bello et al.2016] are also used to integrate the information from the output of the decoder cell and the outputs of the encoder network in order to predict which item will be selected in each step.
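The pointing step can be sketched in NumPy as additive attention over the encoder outputs, in the form introduced for Pointer Net in [Vinyals et al.2015]. The sketch makes several assumptions: the weights are random rather than trained, the glimpse mechanism is omitted, items that are already packed are masked out, and the names `pointer_probs`, `W_ref`, `W_q` and `v` are illustrative.

```python
import numpy as np


def pointer_probs(enc_outputs, dec_state, W_ref, W_q, v, packed_mask):
    """Additive pointer attention over encoder outputs.

    enc_outputs -- (n, d) array, one row per item
    dec_state   -- (d,) decoder output at the current step
    W_ref, W_q  -- (d, d) weight matrices; v -- (d,) vector
    packed_mask -- (n,) boolean array, True for items already selected;
                   at least one entry must be False
    """
    # One score per input item: v^T tanh(e_j W_ref + d W_q).
    scores = np.tanh(enc_outputs @ W_ref + dec_state @ W_q) @ v
    # Items that are already packed can never be pointed to again.
    scores = np.where(packed_mask, -np.inf, scores)
    # Numerically stable softmax over the remaining items.
    scores = scores - scores[~packed_mask].max()
    weights = np.exp(scores)
    return weights / weights.sum()


rng = np.random.default_rng(0)
n_items, hidden = 5, 8
enc = rng.normal(size=(n_items, hidden))
dec = rng.normal(size=hidden)
W_ref = rng.normal(size=(hidden, hidden))
W_q = rng.normal(size=(hidden, hidden))
v = rng.normal(size=hidden)
mask = np.array([False, True, False, False, True])

probs = pointer_probs(enc, dec, W_ref, W_q, v, mask)
print(probs.round(3))
```

Because the softmax runs over input positions rather than a fixed vocabulary, the same weights can be applied to orders with 8, 10 or 12 items.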
3.2.2 Policy-based reinforcement learning method
In this paper, a reinforcement learning methodology is used to train the neural network. The input of the network can be denoted as $s = \{(l_i, w_i, h_i)\}_{i=1}^{n}$, where $(l_i, w_i, h_i)$ represents the length, width and height of the $i$-th item respectively. The output of the network is the sequence in which the items are packed into the bin, which can be denoted as $o$. If the items are packed in this sequence, there is a smallest bin that can pack all the items. We use the surface area (SA) of this bin to evaluate the sequence, and we use $SA(o \mid s)$ to denote the surface area. The stochastic policy of the neural network can be defined as $p(o \mid s)$, i.e., the probability of choosing the packing sequence $o$ given the items $s$. The goal of training is to give high probabilities to sequences that correspond to small surface areas. To be more specific, we use $\theta$ to denote the parameters of the neural network, and the training objective is the expected surface area, which is defined as:
$$J(\theta \mid s) = \mathbb{E}_{o \sim p_\theta(\cdot \mid s)}\, SA(o \mid s)$$
In [Williams1992], a general class of associative reinforcement learning algorithms, called REINFORCE algorithms, is proposed. These algorithms make weight adjustments in a direction that lies along the gradient of expected reinforcement. Based on the ideas of these algorithms, in each step of training, once the reward, baseline value and probability distribution of the prediction are obtained, the parameters of the neural network, $\theta$, are updated using the gradient
$$\nabla_\theta J(\theta \mid s) = \mathbb{E}_{o \sim p_\theta(\cdot \mid s)}\big[(SA(o \mid s) - b(s))\, \nabla_\theta \log p_\theta(o \mid s)\big]$$
where $b(s)$ denotes the baseline value of the surface area and is used to reduce the variance of the gradients. If we randomly draw $B$ samples $s_1, s_2, \ldots, s_B$, the above gradient can be approximated by:
$$\nabla_\theta J(\theta) \approx \frac{1}{B} \sum_{i=1}^{B} \big(SA(o_i \mid s_i) - b(s_i)\big)\, \nabla_\theta \log p_\theta(o_i \mid s_i)$$
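The sampled policy-gradient update can be tried out on a toy problem. The sketch below is an illustration under strong assumptions and not the paper's network: the policy is a vector of per-item logits from which items are sampled without replacement, the cost is a stand-in for the surface area (the position at which item 0 is packed), the baseline is simply the previous batch's mean cost, and `sample_sequence`, `reinforce_step` and `toy_cost` are hypothetical names.

```python
import numpy as np


def sample_sequence(logits, rng):
    """Sample a permutation item by item; return it with d log p / d logits."""
    n = len(logits)
    remaining = np.ones(n, dtype=bool)
    order, grad = [], np.zeros(n)
    for _ in range(n):
        z = np.where(remaining, logits, -np.inf)
        p = np.exp(z - z[remaining].max())
        p /= p.sum()
        k = int(rng.choice(n, p=p))
        # Gradient of log softmax: onehot(k) - p (zero for packed items).
        grad -= p
        grad[k] += 1.0
        remaining[k] = False
        order.append(k)
    return order, grad


def reinforce_step(logits, cost_fn, baseline, rng, batch=64, lr=0.5):
    """One descent step on the expected cost using the REINFORCE estimator."""
    g = np.zeros_like(logits)
    costs = []
    for _ in range(batch):
        order, grad_logp = sample_sequence(logits, rng)
        c = cost_fn(order)
        g += (c - baseline) * grad_logp
        costs.append(c)
    # Minimizing the cost, so move against the estimated gradient.
    return logits - lr * g / batch, float(np.mean(costs))


def toy_cost(order):
    """Stand-in for the surface area: cheapest when item 0 is packed first."""
    return float(order.index(0))


rng = np.random.default_rng(0)
logits = np.zeros(4)
baseline = 1.5  # expected toy cost under the initial uniform policy
for _ in range(60):
    logits, baseline = reinforce_step(logits, toy_cost, baseline, rng)
print(logits.round(2), baseline)
```

Subtracting the baseline leaves the expected gradient unchanged, because the expectation of the score function is zero, while reducing the variance of the estimate.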
3.2.3 Baseline iteration: memory replay
For each sample, the baseline value is initialized with the surface area of a packing plan generated by a heuristic algorithm. In each subsequent training step, the baseline value of the sample is updated using the surface area calculated at that step.
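A per-sample baseline memory of this kind might look as follows. The update rule here is an assumption: an exponential moving average with a hypothetical weight `alpha`, which is one common way to blend a stored baseline with a newly observed value. The sample identifiers and the numbers are made up for illustration.

```python
def update_baseline(baseline, surface_area, alpha):
    """Move the stored baseline toward the newly observed surface area.

    alpha is a hypothetical smoothing weight in [0, 1]; alpha = 0 keeps the
    old baseline and alpha = 1 replaces it with the new observation.
    """
    return baseline + alpha * (surface_area - baseline)


# One stored baseline per training sample, initialized from the heuristic.
baselines = {"order-001": 100.0, "order-002": 250.0}

# Surface areas observed for the sequences sampled in the current step.
observed = {"order-001": 90.0, "order-002": 260.0}

for sample_id, area in observed.items():
    baselines[sample_id] = update_baseline(baselines[sample_id], area, alpha=0.5)

print(baselines)
```

Keeping one baseline per sample means that each order is compared with its own history instead of with a batch-wide average.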
3.2.4 Random sampling and beam search
During the training stage, the output of the neural network for each sample is randomly sampled from the probability distribution given by the policy network. In the testing stage, a greedy strategy is applied, i.e., in each step the prediction with the maximal probability is selected as the output. In addition, a beam search method is used in the testing procedure to enhance the performance of the neural network, i.e., the predictions with the top-k highest probabilities are selected and maintained in each step.
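The difference between greedy decoding and beam search can be sketched with a stand-in policy. In this illustration `step_log_probs` is a hypothetical callback that returns the log-probability of each remaining item given the prefix chosen so far; in practice it would be backed by the trained decoder, and the toy policy below only serves to make the example runnable.

```python
import math


def beam_search(n_items, step_log_probs, beam_width=3):
    """Keep the beam_width most probable partial sequences at every step.

    step_log_probs(prefix) must return {item: log-probability} for every
    item that is not yet in prefix.
    """
    beams = [((), 0.0)]
    for _ in range(n_items):
        candidates = []
        for prefix, score in beams:
            for item, log_p in step_log_probs(prefix).items():
                candidates.append((prefix + (item,), score + log_p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_policy(prefix):
    """Toy stand-in: fixed preference weights, renormalized over unpacked items."""
    weights = {0: 1.0, 1: 2.0, 2: 3.0}
    rest = {i: w for i, w in weights.items() if i not in prefix}
    total = sum(rest.values())
    return {i: math.log(w / total) for i, w in rest.items()}


greedy = beam_search(3, toy_policy, beam_width=1)   # beam of size 1 is greedy
beams = beam_search(3, toy_policy, beam_width=3)
for seq, score in beams:
    print(seq, round(math.exp(score), 4))
```

With a packing routine available, each sequence returned by the beam would presumably be evaluated and the one with the smallest surface area kept, since the most probable sequence is not necessarily the cheapest one.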
In summary, the training procedure of the neural network is shown in Algorithm 1.
4 Experiments
To test the performance of the model, a series of experiments on real data are conducted. The experiments are classified into three categories based on the number of items in one customer order, i.e., 8, 10 and 12. In all of the experiments, we use 150,000 training samples and 150,000 test samples. Despite the difference in the number of items, we use the same hyper-parameters to train the model. We use mini-batches of size 128 and LSTM cells with 128 hidden units. We train the model with the Adam optimizer and decay the initial learning rate every 5000 steps by a factor of 0.96. All parameters are initialized randomly, and we clip the L2 norm of our gradients to 1.0. We use the surface area calculated by the heuristic algorithm as the initial baseline input and update it during the baseline iteration. We train the model for 1,000,000 steps, which takes about 12 hours on a Tesla M40 GPU machine. When testing, we use beam search (BS) of size 3. A TensorFlow implementation of the model will be made available soon. The performance indicator is the average surface area (ASA).
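The learning-rate schedule and gradient clipping can be written out in a few lines. The sketch makes two assumptions: the decay is applied as a staircase, multiplying by 0.96 once every 5000 steps, and the clipping is applied to the joint norm of all gradient arrays. The initial learning rate is a placeholder argument, and the function names are hypothetical.

```python
import numpy as np


def decayed_learning_rate(initial_lr, step, decay_steps=5000, decay_rate=0.96):
    """Staircase exponential decay: multiply by decay_rate every decay_steps."""
    return initial_lr * decay_rate ** (step // decay_steps)


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total = float(np.sqrt(sum(float(np.sum(g * g)) for g in grads)))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads], total


# Placeholder initial learning rate of 1.0 to expose the decay factor itself.
for step in (0, 4999, 5000, 10000):
    print(step, decayed_learning_rate(1.0, step))

clipped, norm_before = clip_by_global_norm([np.array([3.0, 4.0])])
print(norm_before, clipped[0])
```

Over 1,000,000 steps the staircase is applied 200 times, so the final learning rate is the initial value multiplied by 0.96 to the power of 200.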
The results of testing are shown in Table 2. Using beam search (BS), the proposed method improves on the heuristic algorithm for Bin8, Bin10 and Bin12. Optimal sequences for the samples of Bin8 are obtained by exhaustive search; measured against the gap between the results of the heuristic algorithm and the optimal solutions, the RL BS results are very close to those of the optimal sequences.
|No. of bins||Random||Heuristic||RL Sampling||RL BS|
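Exhaustive search over packing sequences, as used to obtain the optimal sequences for the 8-item samples, is feasible because 8 items admit only 8! = 40,320 orders. A minimal sketch follows, assuming a hypothetical `cost_fn` that maps a sequence to a cost; the toy cost below is an order-dependent stand-in, not a packing routine.

```python
import math
from itertools import permutations


def exhaustive_best(n_items, cost_fn):
    """Return (best_cost, best_order) over all n_items! sequences."""
    return min((cost_fn(order), order) for order in permutations(range(n_items)))


def toy_cost(order, weights=(3, 1, 2)):
    """Order-dependent toy cost: an item's weight times its position."""
    return sum(position * weights[item] for position, item in enumerate(order))


best_cost, best_order = exhaustive_best(3, toy_cost)
print(best_cost, best_order)
print(math.factorial(8), math.factorial(10), math.factorial(12))
```

The factorial growth is the reason this check is practical for 8 items but quickly becomes expensive: 10 items give 3,628,800 orders and 12 items give 479,001,600.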
5 Conclusion
In this paper, a new type of 3D bin packing problem is proposed. Unlike the classical 3D BPP, the objective of the new problem is to minimize the surface area of the smallest bin that can pack all items. Due to the complexity of the problem, it is very difficult to obtain an optimal solution, and heuristic algorithms may lack generality. Therefore, we apply the Pointer Net framework and a DRL-based method to optimize the sequence of items to be packed. The model is trained and tested on a large amount of real data. Numerical experiment results show that the DRL-based method significantly outperforms a well-designed, effective heuristic algorithm. Our main contributions are as follows: first, a new type of 3D BPP is proposed; second, the DRL technique is applied to the bin packing problem for the first time. In future research, we will focus on investigating more effective network architectures and training algorithms. In addition, integrating the selection of orientation and empty maximal space into the architecture of the neural network is also worthy of study.
Acknowledgments
We are extremely grateful to our colleagues, including Rong Jin, Shenghuo Zhu and Sen Yang from iDST of Alibaba, Qing Da, Shichen Liu and Yujing Hu from the Search Business Unit of Alibaba, and Lijun Zhu, Ying Zhang and Yujie Chen from the AI Department of Cainiao, for their insight and expertise, which greatly assisted the research and the presentation of the paper. We would also like to express our sincere appreciation for the support of Jun Yang and Siyu Wang from Alibaba Cloud in the implementation of the DRL network in TensorFlow.
References
- [Bahdanau et al.2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- [Bello et al.2016] Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
- [Bortfeldt and Mack2007] Andreas Bortfeldt and Daniel Mack. A heuristic for the three-dimensional strip packing problem. European Journal of Operational Research, 183(3):1267–1279, 2007.
- [Bortfeldt2006] Andreas Bortfeldt. A genetic algorithm for the two-dimensional strip packing problem with rectangular pieces. European Journal of Operational Research, 172(3):814–837, 2006.
- [Burke et al.2003] Edmund K Burke, Graham Kendall, and Eric Soubeiga. A tabu-search hyperheuristic for timetabling and rostering. Journal of Heuristics, 9(6):451–470, 2003.
- [Burke et al.2013] Edmund K Burke, Michel Gendreau, Matthew Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and Rong Qu. Hyper-heuristics: A survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, 2013.
- [Chen et al.1995] CS Chen, Shen-Ming Lee, and QS Shen. An analytical model for the container loading problem. European Journal of Operational Research, 80(1):68–76, 1995.
- [Clautiaux et al.2014] François Clautiaux, Mauro Dell’Amico, Manuel Iori, and Ali Khanafer. Lower and upper bounds for the bin packing problem with fragile objects. Discrete Applied Mathematics, 163:73–86, 2014.
- [Coffman et al.1980] Edward G Coffman, Jr, Michael R Garey, David S Johnson, and Robert Endre Tarjan. Performance bounds for level-oriented two-dimensional packing algorithms. SIAM Journal on Computing, 9(4):808–826, 1980.
- [Cowling et al.2000] Peter Cowling, Graham Kendall, and Eric Soubeiga. A hyperheuristic approach to scheduling a sales summit. In International Conference on the Practice and Theory of Automated Timetabling, pages 176–190. Springer, 2000.
- [Crainic et al.2008] Teodor Gabriel Crainic, Guido Perboli, and Roberto Tadei. Extreme point-based heuristics for three-dimensional bin packing. INFORMS Journal on Computing, 20(3):368–384, 2008.
- [Crainic et al.2009] Teodor Gabriel Crainic, Guido Perboli, and Roberto Tadei. TS2PACK: A two-level tabu search for the three-dimensional bin packing problem. European Journal of Operational Research, 195(3):744–760, 2009.
- [Faroe et al.2003] Oluf Faroe, David Pisinger, and Martin Zachariasen. Guided local search for the three-dimensional bin-packing problem. INFORMS Journal on Computing, 15(3):267–283, 2003.
- [Gendreau et al.2004] Michel Gendreau, Gilbert Laporte, and Frédéric Semet. Heuristics and lower bounds for the bin packing problem with conflicts. Computers & Operations Research, 31(3):347–358, 2004.
- [Graves et al.2014] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
- [Hopper and Turton2001] Eva Hopper and Brian CH Turton. A review of the application of meta-heuristic algorithms to 2d strip packing problems. Artificial Intelligence Review, 16(4):257–300, 2001.
- [Kang and Park2003] Jangha Kang and Sungsoo Park. Algorithms for the variable sized bin packing problem. European Journal of Operational Research, 147(2):365–372, 2003.
- [Kang et al.2012] Kyungdaw Kang, Ilkyeong Moon, and Hongfeng Wang. A hybrid genetic algorithm with a new packing strategy for the three-dimensional bin packing problem. Applied Mathematics and Computation, 219(3):1287–1299, 2012.
- [Kenmochi et al.2009] Mitsutoshi Kenmochi, Takashi Imamichi, Koji Nonobe, Mutsunori Yagiura, and Hiroshi Nagamochi. Exact algorithms for the two-dimensional strip packing problem with and without rotations. European Journal of Operational Research, 198(1):73–83, 2009.
- [Khanafer et al.2010] Ali Khanafer, François Clautiaux, and El-Ghazali Talbi. New lower bounds for bin packing problems with conflicts. European Journal of Operational Research, 206(2):281–288, 2010.
- [Lodi et al.2002] Andrea Lodi, Silvano Martello, and Daniele Vigo. Heuristic algorithms for the three-dimensional bin packing problem. European Journal of Operational Research, 141(2):410–420, 2002.
- [Martello et al.2000] Silvano Martello, David Pisinger, and Daniele Vigo. The three-dimensional bin packing problem. Operations Research, 48(2):256–267, 2000.
- [Martello et al.2003] Silvano Martello, Michele Monaci, and Daniele Vigo. An exact approach to the strip-packing problem. INFORMS Journal on Computing, 15(3):310–319, 2003.
- [Nareyek2003] Alexander Nareyek. Choosing search heuristics by non-stationary reinforcement learning. In Metaheuristics: Computer decision-making, pages 523–544. Springer, 2003.
- [Remde et al.2009] Stephen Remde, Keshav Dahal, Peter Cowling, and Nic Colledge. Binary exponential back off for tabu tenure in hyperheuristics. In European Conference on Evolutionary Computation in Combinatorial Optimization, pages 109–120. Springer, 2009.
- [Scheithauer1991] Guntram Scheithauer. A three-dimensional bin packing algorithm. Elektronische Informationsverarbeitung und Kybernetik, 27(5/6):263–271, 1991.
- [Steinberg1997] A Steinberg. A strip-packing algorithm with absolute performance bound 2. SIAM Journal on Computing, 26(2):401–409, 1997.
- [Sutskever et al.2014] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
- [Vinyals et al.2015] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700, 2015.
- [Williams1992] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
Appendix A 3D Bin Packing Heuristic Algorithm
The detailed 3D bin packing heuristic algorithm is:
The heuristic algorithm uses both the least surface area heuristic and the least waste space heuristic, while our DRL method uses only the least surface area heuristic.
Appendix B NP-hardness of New Type of 3D BPP
The new type of 3D BPP proposed in this paper is NP-hard.
Proof: First, we prove that the new type of 2D BPP is NP-hard by giving a reduction from the 1D bin packing problem.
A one-dimensional bin packing problem consists of items with integer sizes and bins with an integer capacity. The objective is to minimize the number of bins used to pack all items.
To convert it into the new type of 2D BPP, we generate a set of items whose widths and heights are determined by the one-dimensional instance, together with one additional item, which is called the Base Item. The new type of 2D BPP is to find the bin with the least surface area that packs the generated items.
Without loss of generality, we assume the Base Item is at the left-bottom of the bin. Adding one item on the right side of the Base Item increases the total surface area by at least a fixed amount, whereas even if all the items are added on the upper side of the Base Item, the total increase in area is at most a smaller amount. Thus, all the items will be put on the upper side of the Base Item.
Next, we prove that the length and width of an item will not be swapped. If the width and length of one item are swapped, the increase in area for this item alone is at least as large as a bound that exceeds the total increase for all items when no item is swapped.
If we can find a bin with the least surface area to pack these items, we also find the least number of bins of the given capacity that can contain the items of the given sizes. Therefore, if we could solve the new type of 2D BPP in polynomial time, the one-dimensional bin packing problem could be solved in polynomial time, which completes the proof that this new type of 2D BPP is NP-hard.
For the new type of 3D bin packing problem, we add a length to each item of the 2D case, chosen so that no item will be added on the length side. The proof is the same as in the 2D case.