1 Introduction
AI planning techniques generally require domain experts to provide background knowledge about the dynamics of the planning domains. However, specifying a complete domain model is time-consuming, which has been a bottleneck for applying AI planning techniques in many real-world scenarios. Take arranging production lines in a smart factory as an example: there are a vast number of actions and predicates, and it is difficult for humans to design an appropriate domain model that covers all actions.
However, most traditional domain-model learning approaches output a domain model in a declarative planning language, such as STRIPS or PDDL, where the preconditions and effects of actions are given declaratively. With the learned domain models, a planner for the planning language is invoked to compute plans for new planning problems. But whether a plan can be found is sensitive to the accuracy of the learned domain model. Once some critical effect of an action is learned incorrectly, the error accumulates as the plan grows, which finally leads to no plan to the goal being computed. One promising way is to move away from learning domain models in a declarative language and to find a new representation in which to learn and then compute plans. The new planning representation must satisfy at least the following two conditions: states can be represented correctly, and there is an effective way to compute a plan. The former allows a new planning instance to be represented, and the latter should be as efficient as possible, which requires a suitable heuristic function for forward-search planning.
Inspired by word embedding (Mikolov et al., 2013) and knowledge graph embedding (Bordes et al., 2013), which have shown great success in natural language processing and knowledge graphs, it is constructive to represent propositions, states, and actions in the form of vectors. To capture the relationship between propositions and states, we consider them jointly as vertices with a real-valued attribute vector in a graph, where the interpretation of propositions in a state is captured by directed edges. In this paper, we propose a novel learning and planning framework based on graph neural networks (GNNs), called
LPGNN, short for Learning to Plan based on GNN. LPGNN integrates model-free learning from partially observed traces and model-based planning based on proposition-state graphs. Due to the representation of proposition-state graphs, a new state which has not occurred in any of the plan traces can still be denoted. This provides the possibility of generalizing the planning system to handle new planning instances. To improve the performance of planning, researchers in the planning community have proposed a number of heuristics, such as relaxed planning graph heuristics (Hoffmann and Nebel, 2001), additive heuristics (Bonet and Geffner, 2001), pattern database heuristics (Edelkamp, 2002), etc. This suggests choosing an appropriate heuristic function for specific domains. Based on proposition-state graphs, the relationship between states and actions is captured naturally, which may help us find an appropriate heuristic function to guide planning. Therefore, we propose an MLP-based approach to learn a heuristic that guides action selection towards the goal state. To evaluate the learning and planning performance, we compare LPGNN with the classical domain-model learning system ARMS (Yang et al., 2007) on five well-known planning domains and show that LPGNN outperforms ARMS and is more robust at solving real planning problems.
2 Background
We follow the notation of Francès et al. (2017) for classical planning. We consider a set of propositions P and take a state s to be a subset of P, where the interpretation of a proposition is given by the inclusion relation. In other words, if p ∈ s, the proposition p is true in the state s; otherwise, p is false in s. A classical planning problem is given as a tuple ⟨S, s0, SG, A, α, δ⟩ where S is a set of states, s0 ∈ S is an initial state, SG ⊆ S is a goal state set, A is an action set, α: S → 2^A is an applicable function, and δ: S × A → S is a state-transition function. Intuitively, α(s) indicates the actions applicable in state s and δ(s, a) represents the state resulting from performing the action a in the state s. A domain model is a tuple ⟨S, A, α, δ⟩ and a planning instance is a tuple ⟨s0, SG⟩. The solution to a classical planning problem is a plan, an action sequence ⟨a1, ..., an⟩, satisfying that there exists a state sequence ⟨s0, s1, ..., sn⟩ such that ai ∈ α(si−1), si = δ(si−1, ai), and sn ∈ SG.
A plan executed on a planning instance yields a plan trace, which is an alternating sequence of states and actions ⟨s0, a1, s1, ..., an, sn⟩. We suppose the initial state and goal state are fully observed while the intermediate states are not. Formally, a partially observed plan trace is a sequence ⟨s0, a1, o1, ..., an−1, on−1, an, sn⟩, where oi ⊆ si. Note that a proposition not in oi may be either false or unobserved.
We say a domain model interprets a partially observed trace if ⟨a1, ..., an⟩ is a plan of the corresponding classical planning problem and the yielded plan trace ⟨s0, a1, s1, ..., an, sn⟩ satisfies oi ⊆ si for 1 ≤ i ≤ n − 1.
A domain-model learning problem is a tuple ⟨P, A, T⟩ where T is a set of partially observed plan traces. A solution to the problem is a domain model which interprets all plan traces in T.
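The classical-planning semantics above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation: states are frozensets of propositions, and `applicable` and `transition` play the roles of the functions α and δ (the toy domain and all names are our assumptions).

```python
# Check the plan conditions: a_i in alpha(s_{i-1}), s_i = delta(s_{i-1}, a_i),
# and the final state satisfies some goal.
def is_plan(plan, s0, goals, applicable, transition):
    """Return True iff `plan` maps s0 to a state satisfying some goal."""
    s = s0
    for a in plan:
        if a not in applicable(s):      # a_i must be applicable in s_{i-1}
            return False
        s = transition(s, a)            # s_i = delta(s_{i-1}, a_i)
    return any(g <= s for g in goals)   # goals given as required proposition sets

# Toy single-action domain: "pickup" is applicable while the block is on the
# table and replaces "on_table" by "holding".
applicable = lambda s: {"pickup"} if "on_table" in s else set()
transition = lambda s, a: (s - {"on_table"}) | {"holding"}
```

With these definitions, `is_plan(["pickup"], frozenset({"on_table"}), [frozenset({"holding"})], applicable, transition)` holds, while the empty plan does not reach the goal.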
3 An Overview of Our Approach
In this section, we give an overview of our approach LPGNN, which is based on GNNs (Battaglia et al., 2018). The framework consists of two modules: the learning module, which takes partially observed traces as input and outputs a GNN-based domain model and a heuristic function; and the planning module, which is based on the learned heuristic function.
As states are not fully known in the partially observed traces, the hidden parts must be estimated in order to learn the domain model. The estimation task is accomplished via a recurrent graph network framework with a cost function evaluating the difference between estimated states and observed states. When the framework converges, it outputs a set of sequences of completed states, where every state is represented as a unique vector. It also returns the vector representations of actions and propositions. In such a representation, a domain model which interprets almost all partially observed traces is learned.
The vector representations learned in the learning module provide a way to learn a heuristic function via an MLP. Every pair of states occurring in order in the estimated plan traces makes up a training example for an action selection network. The action selection network is trained to return the actions executed in the former state of every pair, which are considered appropriate actions towards the latter state. The heuristic function is obtained by computing the distances to the appropriate actions. The learned heuristic function then helps to choose, in the current state, a suitable action towards the goal state during planning.
4 Domain-model Learning
In this section, we propose a sequence-to-sequence domain-model learning framework based on GNNs (Battaglia et al., 2018) to handle plan traces, which are in the form of sequences.
To avoid ambiguity, we use bold type for vectors: p, s, and a stand for the vectors of the proposition p, the state s, and the action a, respectively, and P stands for the set of proposition vectors. For a unified representation, we consider them all to be d-dimensional real-valued vectors.
We define a proposition-state graph as a tuple ⟨V, E, a⟩ where V = P ∪ {s} is a set of vertices, E is a set of directed edges from every proposition vertex to the state vertex, and a is an action vector with the meaning that action a will be executed in state s. Every vertex is equipped with an attribute of real numbers, which is considered its vector representation. Every edge (p, s) has a boolean attribute e_{p,s}, which captures the interpretation of p in s. Formally, e_{p,s} = 1 if p ∈ s; otherwise, e_{p,s} = 0.
4.1 Updating Proposition-state Graphs
At the start of the learning phase, the proposition vectors and the action vectors are first initialized randomly from uniform distributions. For every partially observed plan trace, we initialize the proposition-state graph by assigning the edge attributes according to the initial state s0. As every state can be represented by propositions and their interpretations, in a proposition-state graph the unique state vector is obtained via the vectors of the propositions and the edge attributes. Formally, for a state s, we use a state update function f_s to get its vector s:

s = f_s(P, E_s)    (1)

where E_s is the set of the edge attributes. Obviously, the state vector is determined by the proposition vectors and their interpretations in the state, so a state uniquely corresponds to a state vector. To learn the state update function f_s, we use an MLP which takes the concatenation of the proposition vectors and the edge attributes as input.
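The state update function can be sketched as follows. This is a toy stand-in, not the trained network: the MLP is replaced by a single random linear layer with tanh, and the dimensions, seed, and weights are all illustrative assumptions.

```python
import numpy as np

# Sketch of f_s in eq. (1): a network over the concatenation of all
# proposition vectors and the edge attributes. Random weights stand in
# for the trained MLP.
rng = np.random.default_rng(0)
d, m = 4, 3                                    # embedding dim, #propositions
props = rng.uniform(-0.6, 0.6, size=(m, d))    # proposition vectors
edges = np.array([1.0, 0.0, 1.0])              # e_{p,s}: 1 iff p is true in s
W = rng.normal(size=(m * d + m, d))            # stand-in for the MLP weights

def state_vector(props, edges):
    x = np.concatenate([props.reshape(-1), edges])  # concatenated input of f_s
    return np.tanh(x @ W)                           # s = f_s(P, E_s)

s = state_vector(props, edges)
```

Note that changing any edge attribute changes the input vector, so two different interpretations of the propositions yield different state vectors, matching the uniqueness claim above.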
To formalize the progression of a state caused by an action execution, we define how a proposition-state graph is updated by applying the action vector a. In our sequence-to-sequence framework, the sequential actions in the input plan trace cause the proposition-state graph to be updated continuously.
In the proposition-state graph, the action vector first changes the edge attributes. Formally, for the edge from p to s, we use an edge update function f_e to obtain its estimated probability ê_{p,s}:

ê_{p,s} = f_e(p, s, a)    (2)

By concatenating all the edges, we generalize the edge update function to the edge set:

Ê_s = f_E(P, s, a)    (3)
Similarly, to learn the function f_e, we use an MLP which ends with a sigmoid function and outputs an estimated probability ê_{p,s} for every edge attribute e_{p,s}. To keep consistency with the interpretation of propositions, the estimated probability needs to be decoded to the boolean edge attribute by the decoder function d.

4.2 Learning Applicable and State-transition Functions
The change of the edge attributes directly causes a change of the state vector via the state update function f_s. We then define a state-transition function δ for the state vector and the action vector:

s′ = δ(s, a)    (4)

According to equations (1) and (3), once all proposition vectors are learned, P will not change, and s′ is determined by the state vector s and the action vector a. In other words, the next state is uniquely determined by the current state and the action executed.
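One transition step composes the edge update, the decoder, and the state update. The sketch below is a toy illustration of equations (2)-(4), with random weights standing in for the trained networks; all dimensions, seeds, and the 0.5 decoding threshold are our assumptions.

```python
import numpy as np

# Sketch of one step of delta: edge MLP with sigmoid (eqs. (2)-(3)),
# decoding of probabilities to booleans, then the state update of eq. (1).
rng = np.random.default_rng(1)
d, m = 4, 3
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
We = rng.normal(size=(3 * d,))                 # stand-in weights for f_e
Ws = rng.normal(size=(m * d + m, d))           # stand-in weights for f_s

def transition(props, s_vec, a_vec):
    # eq. (2): estimated probability for every edge (p, s)
    probs = np.array([sigmoid(np.concatenate([p, s_vec, a_vec]) @ We)
                      for p in props])
    new_edges = (probs > 0.5).astype(float)    # decoder: probability -> boolean
    # eq. (1)/(4): new state vector from propositions and updated edges
    new_s = np.tanh(np.concatenate([props.reshape(-1), new_edges]) @ Ws)
    return new_edges, new_s

props = rng.uniform(-0.6, 0.6, size=(m, d))
s_vec, a_vec = rng.normal(size=d), rng.normal(size=d)
new_edges, new_s = transition(props, s_vec, a_vec)
```

Since the proposition vectors are fixed after training, the output depends only on the current state vector and the action vector, mirroring the determinism argument above.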
A partially observed plan trace yields a sequence of proposition-state graphs by replaying the actions. Formally, for a plan trace ⟨s0, a1, o1, ..., an, sn⟩, we use Gi, for 0 ≤ i ≤ n, to denote the i-th proposition-state graph in the corresponding sequence. Then every edge attribute set stands for an estimated state in the sequence. To train the functions f_s, f_e and the vectors of propositions and actions correctly, we define a loss function to evaluate the differences between the estimated states and the observed states. The estimation of the propositions in a state can be considered a logistic regression problem over the observed propositions in the state, which suggests employing the cross-entropy loss function.
After propagating the gradients to the functions and the vectors of propositions and actions, when the loss function converges, the state-transition function is learned.
To learn the applicable function α, for every action a, we consider the intersection of the estimated states in which action a is executed as the precondition of action a, denoted by pre(a). Then for every state s, we define its applicable action set as α(s) = {a | pre(a) ⊆ s}. From a safety perspective, actions never occurring in the input plan traces are not considered applicable in any state.
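This precondition-by-intersection rule can be sketched directly. A minimal illustration over symbolic states (the trace format and names are our assumptions, not the paper's implementation):

```python
# Learn alpha: pre(a) is the intersection of all states in which a was
# executed; a is applicable in s iff pre(a) is a subset of s. Actions never
# seen in the traces are applicable nowhere.
def learn_applicable(traces):
    """traces: list of lists of (state, action) pairs; states are sets."""
    pre = {}
    for trace in traces:
        for state, action in trace:
            pre[action] = pre.get(action, set(state)) & set(state)
    return lambda s: {a for a, p in pre.items() if p <= set(s)}

# "a1" is executed in {p, q} and in {p}, so pre(a1) = {p}.
alpha = learn_applicable([[({"p", "q"}, "a1")], [({"p"}, "a1")]])
```

Here `alpha({"p"})` contains "a1" while `alpha({"q"})` is empty, illustrating both the intersection and the safety default.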
When the model converges, the functions f_s, f_e, the proposition vectors, and all action vectors are learned and fixed. We can then bridge every state and its vector uniquely. Based on the embedding of propositions, states, and actions, we represent a planning problem as a tuple ⟨S, s0, SG, A, α, δ⟩ where S is a set of state vectors, s0 is the initial state vector, SG is the set of goal state vectors, A is a set of action vectors, and α and δ are the applicable and state-transition functions, respectively.
5 Planning with Heuristics
5.1 Learning Heuristics Function
The heuristic function plays an important role in forward-search planning techniques, helping the planner select suitable actions towards the goal state. A suitable heuristic function may speed up problem-solving significantly. Although various heuristic functions have been proposed, there is no approach for choosing a suitable heuristic function automatically for different planning domains. For that, we propose an approach to learn the heuristic function based on the embedding of states and actions.
Given a set of fully observed plan traces T, we define the action selection function π such that π(si, sj) is the set of actions executed in the state si in some trace of T in which sj occurs later. As the same state pair may occur in different traces, more than one action may be executed in the former state si.
With the embedding of states, we generate tuples ⟨si, sj, L⟩, where L is the set of action labels, as training examples from the estimated trace set obtained from the learning module. This is in fact a multi-label learning task (Zhang and Zhou, 2014). We then construct an MLP which takes the concatenation of two state vectors as input and outputs a list of recommendation confidences, one for every action. We train the network to minimize the sigmoid cross-entropy loss between the recommendation confidences and the action labels L.
Considering the latter state as the goal state, the action selection function provides a set of recommended actions leading towards the goal state. For the current state s and the goal state g, we define a goal-driven heuristic function for every action as its recommendation confidence output by the action selection network.
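The construction of the training pairs and the resulting heuristic can be sketched as follows. A frequency table stands in for the trained MLP here, so the "confidence" is just a normalized count; the trace format and all names are our assumptions.

```python
from collections import defaultdict

# For each ordered state pair (s_i, s_j) with j > i in a trace, record the
# action executed in s_i (the multi-label target); the heuristic value of
# action a for (state, goal) is its normalized recommendation count.
def build_selector(traces):
    """traces: list of alternating sequences [s_0, a_1, s_1, ..., a_n, s_n]."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in traces:
        states, actions = t[0::2], t[1::2]
        for i, a in enumerate(actions):
            for goal in states[i + 1:]:        # every later state as a "goal"
                counts[(states[i], goal)][a] += 1

    def h(s, goal, a):                         # heuristic: recommendation confidence
        c = counts.get((s, goal), {})
        total = sum(c.values()) or 1
        return c.get(a, 0) / total
    return h

h = build_selector([("s0", "a1", "s1", "a2", "s2")])
```

For the single trace above, "a1" gets full confidence for the pair ("s0", "s2") and "a2" gets none, since only "a1" was ever executed in "s0".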
5.2 Planning with Heuristics Learned
Based on the learned domain model, we propose a progression-based algorithm to compute a plan for a planning instance, as shown in Algorithm 1. To implement backtracking, we use one list to record each visited state with the action executed in it and another list to record each visited state with the plan executed up to it. We first set the current state to the initial state and initialize the two lists to be empty. Then we start to find a plan by selecting a goal state from the goal state set. By repeatedly selecting actions to execute, once the search reaches one of the goal states, the algorithm finds a plan (lines 11-12). Observe that the action selection function outputs an action set with at most three actions: at every step we choose one of the top-3 recommended actions which are also applicable in the current state (line 5). Formally, we use the set of actions with the three highest recommendation confidences. Once an action is executed, we update the current state, the current plan, and the two lists (lines 7-12). When all applicable actions in the current state have been visited, the algorithm has to backtrack to the last state via a Pop operation (lines 15-16). The current plan is then regressed by removing its last action, via a Regress operation (line 17). Once the list becomes empty again, it means no possible recommended action sequence can achieve the selected goal state, and another goal state must be chosen (lines 13-14). When every goal state has been tried and no plan is found, the algorithm returns failure.
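The search loop of Algorithm 1 can be sketched as below. This is a simplified reading of the algorithm, not its exact pseudocode: `recommend`, `applicable`, and `transition` stand for the learned action selection function, α, and δ, and the backtracking bookkeeping is condensed into two lists.

```python
# Progression search with top-3 recommended actions and backtracking:
# try each goal state in turn; at every step execute one of the top-3
# recommended actions that is applicable and not yet tried in this state;
# pop back to the previous state when no candidate remains.
def plan(s0, goals, recommend, applicable, transition, max_steps=1000):
    for goal in goals:
        s, path, visited = s0, [], []          # path: [(state, action), ...]
        for _ in range(max_steps):
            if s == goal:
                return [a for _, a in path]    # plan found
            tried = {a for st, a in visited if st == s}
            cands = [a for a in recommend(s, goal)[:3]
                     if a in applicable(s) and a not in tried]
            if cands:
                a = cands[0]
                visited.append((s, a))
                path.append((s, a))
                s = transition(s, a)
            elif path:                         # backtrack to the last state
                s, _ = path.pop()
            else:
                break                          # give up on this goal state
    return None                                # failure: every goal tried
```

On a toy chain domain where each action moves to the next numbered state, the search walks straight to the goal and returns the corresponding action sequence.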
6 Experiment
We apply LPGNN to five classical planning domains, including Logistics, ZenoTravel, Depots, Ferry, and Mprime, and compare LPGNN to the classical domain-model learning system ARMS (Yang et al., 2007), which invokes a MAX-SAT solver. LPGNN is implemented in TensorFlow with the GNN framework¹, and it takes approximately three hours to train on a single GeForce RTX 2080 Ti GPU².

¹ https://github.com/deepmind/graph_nets
² Due to the size limit on the supplemental materials, the omitted data, code, and supporting materials are available at https://tinyurl.com/NeurIPS19231.

6.1 Data

Domain      | propositions | actions
Logistics   | 137          | 150
Depot       | 110          | 115
ZenoTravel  | 131          | 279
Ferry       | 99           | 126
Mprime      | 216          | 791

Table 1: Upper bounds on the number of propositions and actions in each domain.
We first obtained the problem generators from the FF planner homepage³ (for Logistics, Ferry, and Mprime) and the International Planning Competition website⁴ (for Depots and ZenoTravel). We randomly generate 2100 disjoint planning instances for each domain and take 2000 instances as the training set and 100 instances as the testing set. Table 1 shows the upper bound on the number of propositions and actions in each domain. For the plan traces with fewer propositions, we use a zero-padding method. By invoking the FF planner, we generate a plan for each planning instance and thereby obtain 2100 plan traces. To capture partial observation, we randomly remove propositions according to the partial observation percentages (0%, 20%, 40%, 60%, 80%, 100%) from every intermediate state in the plan traces.

³ http://fai.cs.uni-saarland.de/hoffmann/ff-domains.html
⁴ http://ipc02.icaps-conference.org/
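The removal of propositions from intermediate states can be sketched as follows. A minimal illustration under our own assumptions: `pct` is read as the fraction of propositions removed, the trace layout is [s_0, a_1, s_1, ..., a_n, s_n], and initial and final states are left fully observed, as stated above.

```python
import random

# Drop each proposition of every intermediate state independently with
# probability pct; the first and last states stay fully observed.
def make_partial(trace, pct, seed=0):
    rng = random.Random(seed)
    out = list(trace)
    for i in range(2, len(trace) - 1, 2):      # indices of intermediate states
        out[i] = {p for p in trace[i] if rng.random() >= pct}
    return out

t = [{"a"}, "act1", {"a", "b"}, "act2", {"b"}]
```

With `pct = 0.0` the intermediate state is unchanged, and with `pct = 1.0` it becomes empty while the endpoints are untouched.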
6.2 Training Details
The training phase of LPGNN is divided into two parts. First, we train the sequence-to-sequence model to acquire a domain model with the functions f_s, f_e and the vector representations of propositions and actions. These functions are each designed as a two-layer MLP, where each layer has 100 neurons and layer normalization. Second, we train an action selection network on the estimated plan traces, which is designed as a three-layer network with layer normalization and 150 neurons in each hidden layer. The action vectors and proposition vectors are represented as 100-dimensional vectors (d = 100). They are initialized uniformly at random within the range [-0.6, 0.6] (Glorot and Bengio, 2010). We train our model using the Adam optimizer (Kingma and Ba, 2015) with a batch size of 20 and an initial learning rate of .
6.3 Metrics
Learning Performance Metrics. The learning performance of our approach is measured with the precision and recall metrics, by comparing the estimated state sequences with the real ones in the testing set. Intuitively, precision gives a notion of soundness while recall gives a notion of the completeness of the estimated state sequences. We use TP to denote the propositions in both the real and the estimated state, TN to denote the propositions in neither the real state nor the estimated state, FP to denote the propositions not in the real state but in the estimated state, and FN to denote the propositions in the real state but not in the estimated state. Then for an estimated state, we compute its precision by TP/(TP + FP) and its recall by TP/(TP + FN). To evaluate the estimation performance of the learning approaches on the testing set, we generalize these two metrics to state sequence sets by computing the average precision and recall over every state in every sequence.
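These per-state metrics and their averaging over sequences can be sketched directly (a minimal illustration with states as sets of propositions; names are ours):

```python
# precision = TP/(TP+FP), recall = TP/(TP+FN), averaged over every state in
# every sequence; an empty denominator counts as a perfect score.
def precision_recall(real_seqs, est_seqs):
    ps, rs = [], []
    for real_seq, est_seq in zip(real_seqs, est_seqs):
        for real, est in zip(real_seq, est_seq):
            tp = len(real & est)                 # in both real and estimated
            fp = len(est - real)                 # estimated but not real
            fn = len(real - est)                 # real but not estimated
            ps.append(tp / (tp + fp) if tp + fp else 1.0)
            rs.append(tp / (tp + fn) if tp + fn else 1.0)
    return sum(ps) / len(ps), sum(rs) / len(rs)
```

For example, a real state {a, b} estimated as {a, c} yields one true positive, one false positive, and one false negative, so both precision and recall are 0.5.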
Planning Performance Metric. As mentioned before, domain models in declarative languages are sensitive to their accuracy. Even if the learned domain models interpret the partially observed plan traces perfectly, it is possible that they cannot be used to solve the real planning problems. It is therefore more important to evaluate domain-model learning approaches on their ability to solve real problems. More specifically, for the learned domain model, we generate plans under it for the planning instances in the testing set and test whether these plans are solutions to these planning instances under the original domain models. If so, the testing instance is considered solved by the learned domain model. We then introduce a metric as the percentage of solved instances among all testing instances.
6.4 Results
Domain  0%  20%  40%  
LPGNN  ARMS  LPGNN  ARMS  LPGNN  ARMS  
P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  
Logistics  87.09  66.33  88.39  46.68  99.37  98.64  90.21  100  99.53  99.34  90.21  100 
ZenoTravel  82.71  61.3  95.19  68.08  99.11  98.69  100  95.96  99.77  99.75  100  95.96 
Depot  83.14  82.08  95.62  71.85  98.65  99.74  93.46  92.93  98.49  99.88  93.46  92.93 
Mprime  91.21  67.13  97.09  61.85  92.62  88.84  91.29  99.91  95.21  93.38  90.3  99.91 
Ferry  96.91  79.51  96.42  66.43  99.98  99.81  98.58  100  100  99.84  98.58  100 
Domain  60%  80%  100%  
LPGNN  ARMS  LPGNN  ARMS  LPGNN  ARMS  
P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  P(%)  R(%)  
Logistics  99.88  98.96  90.21  100  99.85  99.5  90.21  100  99.99  99.83  100  100 
ZenoTravel  99.85  99.79  100  95.96  99.9  99.9  100  95.96  99.83  99.79  100  95.96 
Depot  98.64  99.96  93.46  92.93  98.21  99.89  93.46  92.93  99.97  99.97  93.46  92.93 
Mprime  97.51  96.08  91.29  99.91  98.16  95.89  91.29  99.91  98.29  96.49  97.58  99.98 
Ferry  100  99.84  100  99.06  100  99.83  99.22  100  100  100  100  100 
Table 2: P = the average precision, R = the average recall; 0%, 20%, 40%, 60%, 80%, and 100% are observation percentages.
Table 2 shows the learning performance of our approach LPGNN and ARMS on the testing set. As the observation percentage increases, both approaches perform better and better at estimating states. The results show that LPGNN and ARMS are comparable in learning performance. In LPGNN, the loss on the training set is negligibly small under all observation percentages, which means that the learned domain models interpret almost all training plan traces and can be considered solutions to the learning problems. In ARMS, all plan traces are interpreted, because it is based on a MAX-SAT solver.
SVM and RF are variants of our approach obtained by replacing the action selection MLP with an SVM and a Random Forest, respectively. Instances solved are the testing instances which are solved under the original domain model by the plans computed according to the learned domain model.
To evaluate real problem-solving ability, we compare our approach LPGNN against ARMS on the percentage of instances solved on the testing set; the experimental results are shown in Figure 2. As ARMS outputs domain models in STRIPS, we call the FF planner to generate plans. Our approach significantly outperforms ARMS in the ability to solve real problems, and the domain model learned by ARMS fails to solve any real problems except in the ZenoTravel domain.
For the model ablation, we replace the action selection MLP in LPGNN by an SVM and a Random Forest, and modify the GNNPlan planning algorithm accordingly. We evaluate the effectiveness, on the real planning problems, of the plans computed by these three variants under the same learned domain models. The results show that learning the action selection policy via an MLP outperforms the other two approaches. In fact, for the solved instances, the GNNPlan planning algorithm with the MLP generates plans almost identical to the plans generated under the original domain model by the FF planner, which shows the excellent ability of our heuristic learning approach to guide planning.
6.5 Analysis
The reason why the domain models learned by ARMS hardly solve any real problems is rooted in the fact that plan search in the declarative language is extremely sensitive to model accuracy. From the experimental results, we observe that in the Logistics domain, the proposition '(airport ?location)' is learned as an effect of the action '(load-truck ?object ?truck ?location)' in the domain model learned by ARMS. Once the action is executed, the city center where the package is loaded into the truck becomes an airport, so the airplane can fly to the city center. Thus, plans including the action of the plane flying to the city center are generated by the FF planner, but this is not allowed in the original domain model. Consequently, most plans generated under such a learned domain model are ineffective, and no instances are solved.
The failure of the plans generated by LPGNN on some instances is attributed to the learned preconditions. Because some actions do not occur sufficiently often in the plan traces, including the intersection of the false propositions in the action precondition would make the precondition too strong to be satisfied by other states. So we only focus on the true propositions, which, on the other hand, yields preconditions that are too weak, so that the planning algorithm may execute an inapplicable action.
7 Related Work
Domain-model learning has received a lot of attention and there exist a number of approaches (Arora et al., 2018). In this paper, we focus on learning approaches which return domain models in a declarative language, such as PDDL and its fragments. LOCM (Cresswell et al., 2013) and its successor LOCM2 (Cresswell and Gregory, 2011) learn an object-centered representation based on a set of parameterized finite state machines. But these two approaches can only learn action effects on dynamic predicates and fail to handle static predicates, which do not change under action executions. NLOCM (Gregory and Lindsay, 2016) extends finite state machines with numeric weights to learn action costs. PELA (Martínez et al., 2016) refines the input domain model based on top-down induction of decision trees, but assumes the input domain model to be correct. OBSERVER (Wang, 1994) is an incremental learning system which refines the learned domain model by observing the execution traces of sampled problems. However, its performance is sensitive to the sampled problems, and it may suffer from incomplete or incorrect domain knowledge. LAMP (Zhuo et al., 2010) is a framework for learning more complex domain models with quantifiers and logical implications. Aineto et al. (2018) propose an approach to compile the learning problem into a classical planning problem, which may suffer from scaling issues. Another related work is Mourão et al. (2010), which considers action-effect learning problems as classification problems and proposes a learning approach based on a bank of kernel perceptrons. But it only learns action effects and needs a large number of training examples to perform well.
Some approaches require a fully observed environment, whereas we consider a partially observed one. LOPE (García-Martínez and Borrajo, 2000) learns domain models in STRIPS by repeatedly executing actions based on reinforcement learning. Stern and Juba (2017) provide a safe domain-model learning approach which guarantees that the output domain model generates safe plans. There are also learning approaches taking noisy plan traces as input, which suppose that the input actions may be incorrect. AMAN (Zhuo and Kambhampati, 2013) learns domain models from noisy plan traces via probabilistic graphical models and reinforcement learning. The line of work by Pasula et al. [2004; 2007] focuses on learning STRIPS-like planning rules by adding noisy outcomes to their probabilistic model, but fails to handle incomplete observations.
As mentioned before, ARMS (Yang et al., 2007) is one of the most classical domain-model learning approaches and has inspired a series of learning approaches. For example, from the perspective of transfer learning, LAWS (Zhuo et al., 2011b) takes other domain models into account and measures the similarity between the source domains and the target domain via web searching. For another example, Lammas (Zhuo et al., 2011a) learns multi-agent domain models by constructing constraints about agent actions and invoking a MAX-SAT solver. Besides, CAMA (Zhuo, 2015) integrates the intelligence of crowds into action-model acquisition based on a MAX-SAT solver. Later, Zhuo et al. (2014) proposed a learning system HTN-Learner to learn hierarchical task network planning domain models based on a weighted MAX-SAT solver. Other domain-model learning approaches concentrate on various inputs. TRAMP (Zhuo and Yang, 2014) and t-LAMP (Zhuo et al., 2008) use transfer learning techniques and require other domains as inputs, as does LAWS. LatPlan (Asai and Fukunaga, 2018) proposes an approach to learn action models from fully observed images.
Approaches              | Input                         | Limitations/Features
LOCM, LOCM2             | Action sequences              | Only handle dynamic predicates
NLOCM                   | Action sequences with costs   | Can learn action costs
PELA                    |                               |
OBSERVER                |                               |
LOPE                    | Repeated action executions    | Requires FO environment
(Stern and Juba, 2017)  | FO plan traces                | Requires FO plan traces
LAMP                    | PO plan traces                | Learns ADL domain models
(Aineto et al., 2018)   |                               | Compiles into a planning problem
(Mourão et al., 2010)   |                               |
AMAN                    | Noisy plan traces             | No background knowledge needed
ARMS                    | Plan traces                   | Calls a MAX-SAT solver
Lammas                  | Multi-agent plan traces       |
CAMA                    |                               |
LAWS, TRAMP, tLAMP      | Plan traces and other domains | Use transfer learning
LatPlan                 | Action sequences and images   | Requires FO images
8 Discussion and Conclusion
Similar to the perspective of Stern and Juba (2017) on the safety of plans generated by learned domain models, in this paper we focus on the effectiveness of the plans on real problems. This motivates us to find another way to model domains, distinct from the classical declarative languages. Indeed, we aim to learn vector representations of actions, states, and propositions in a GNN, which actually provides an interpretation of state changes caused by action executions. By embedding propositions and actions in a graph, the latent relationships between them are exploited to form a domain-specific heuristic. Its strength in guiding planning has been demonstrated by the experimental results, and we believe it opens a line of future work on learning domain-specific heuristic functions.
To sum up, we propose a novel approach LPGNN to learn domain models based on GNNs from a set of partially observed plan traces. We first learn the vector representations of propositions, states, and actions by putting them into a proposition-state graph. The representation in the proposition-state graph allows us to denote new states in the domain, further enabling us to solve new planning instances. Finally, we propose a more robust planning framework equipped with a domain-specific heuristic function, which is demonstrated to be more effective at solving real planning problems.
References
Aineto et al. [2018] Diego Aineto, Sergio Jiménez, and Eva Onaindia. Learning STRIPS action models with classical planning. In Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling, ICAPS 2018, Delft, The Netherlands, June 24-29, 2018, pages 399–407, 2018.
 Arora et al. [2018] Ankuj Arora, Humbert Fiorino, Damien Pellier, Marc Métivier, and Sylvie Pesty. A review of learning planning action models. Knowledge Eng. Review, 33:1–25, 2018.

Asai and Fukunaga [2018] Masataro Asai and Alex Fukunaga. Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 6094–6101, 2018.
Battaglia et al. [2018] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
Bonet and Geffner [2001] Blai Bonet and Hector Geffner. Planning as heuristic search. Artif. Intell., 129(1-2):5–33, 2001.
Bordes et al. [2013] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS-13), pages 2787–2795, 2013.
Cresswell and Gregory [2011] Stephen Cresswell and Peter Gregory. Generalised domain model acquisition from action traces. In Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011, 2011.
 Cresswell et al. [2013] Stephen Cresswell, Thomas Leo McCluskey, and Margaret Mary West. Acquiring planning domain models using LOCM. Knowledge Eng. Review, 28(2):195–213, 2013.
 Edelkamp [2002] Stefan Edelkamp. Symbolic pattern databases in heuristic search planning. In Proceedings of the 6th International Conference on Artificial Intelligence Planning Systems, pages 274–283, 2002.
 Francès et al. [2017] Guillem Francès, Miquel Ramírez, Nir Lipovetzky, and Hector Geffner. Purely declarative action descriptions are overrated: Classical planning with simulators. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), pages 4294–4301, 2017.
 García-Martínez and Borrajo [2000] Ramón García-Martínez and Daniel Borrajo. An integrated approach of learning, planning, and execution. Journal of Intelligent and Robotic Systems, 29(1):47–78, 2000.
 Glorot and Bengio [2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, Chia Laguna Resort, Sardinia, Italy, May 13-15, 2010, pages 249–256, 2010.
 Gregory and Lindsay [2016] Peter Gregory and Alan Lindsay. Domain model acquisition in domains with action costs. In Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling, ICAPS 2016, London, UK, June 12-17, 2016, pages 149–157, 2016.
 Hoffmann and Nebel [2001] Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. J. Artif. Intell. Res., 14:253–302, 2001.
 Kingma and Ba [2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
 Martínez et al. [2016] David Martínez, Guillem Alenyà, Carme Torras, Tony Ribeiro, and Katsumi Inoue. Learning relational dynamics of stochastic domains for planning. In Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling, ICAPS 2016, London, UK, June 12-17, 2016, pages 235–243, 2016.
 Mikolov et al. [2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
 Mourão et al. [2010] Kira Mourão, Ronald P. A. Petrick, and Mark Steedman. Learning action effects in partially observable domains. In ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pages 973–974, 2010.
 Pasula et al. [2004] Hanna Pasula, Luke S. Zettlemoyer, and Leslie Pack Kaelbling. Learning probabilistic relational planning rules. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), June 3-7, 2004, Whistler, British Columbia, Canada, pages 73–82, 2004.
 Pasula et al. [2007] Hanna M. Pasula, Luke S. Zettlemoyer, and Leslie Pack Kaelbling. Learning symbolic models of stochastic domains. J. Artif. Intell. Res., 29:309–352, 2007.
 Stern and Juba [2017] Roni Stern and Brendan Juba. Efficient, safe, and probably approximately complete learning of action models. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 4405–4411, 2017.
 Wang [1994] Xuemei Wang. Learning by observation and practice: A framework for automatic acquisition of planning operators. In Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, USA, July 31 - August 4, 1994, Volume 2, page 1496, 1994.
 Yang et al. [2007] Qiang Yang, Kangheng Wu, and Yunfei Jiang. Learning action models from plan examples using weighted MAX-SAT. Artif. Intell., 171(2-3):107–143, 2007.
 Zhang and Zhou [2014] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819–1837, 2014.
 Zhuo and Kambhampati [2013] Hankz Hankui Zhuo and Subbarao Kambhampati. Action-model acquisition from noisy plan traces. In IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, pages 2444–2450, 2013.
 Zhuo and Yang [2014] Hankz Hankui Zhuo and Qiang Yang. Action-model acquisition for planning via transfer learning. Artif. Intell., 212:80–103, 2014.
 Zhuo et al. [2008] Hankui Zhuo, Qiang Yang, Derek Hao Hu, and Lei Li. Transferring knowledge from another domain for learning action models. In PRICAI 2008: Trends in Artificial Intelligence, 10th Pacific Rim International Conference on Artificial Intelligence, Hanoi, Vietnam, December 15-19, 2008, Proceedings, pages 1110–1115, 2008.
 Zhuo et al. [2010] Hankz Hankui Zhuo, Qiang Yang, Derek Hao Hu, and Lei Li. Learning complex action models with quantifiers and logical implications. Artif. Intell., 174(18):1540–1569, 2010.
 Zhuo et al. [2011a] Hankz Hankui Zhuo, Hector Muñoz-Avila, and Qiang Yang. Learning action models for multi-agent planning. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, May 2-6, 2011, Volume 1-3, pages 217–224, 2011.
 Zhuo et al. [2011b] Hankz Hankui Zhuo, Qiang Yang, Rong Pan, and Lei Li. Cross-domain action-model acquisition for planning via web search. In Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS 2011, Freiburg, Germany, June 11-16, 2011, 2011.
 Zhuo et al. [2014] Hankz Hankui Zhuo, Héctor Muñoz-Avila, and Qiang Yang. Learning hierarchical task network domains from partially observed plan traces. Artif. Intell., 212:134–157, 2014.
 Zhuo [2015] Hankz Hankui Zhuo. Crowdsourced action-model acquisition for planning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 3439–3446, 2015.