1 Introduction
Research on machine learning has achieved great success in improving models' accuracy and efficiency. Successful models such as support vector machines (SVMs), random forests, and deep neural nets have been applied to a vast range of industrial applications Mitchell (1999). However, in many applications, users may need not only a prediction model but also suggestions on courses of action to achieve desirable goals. For practitioners, a complex model such as a random forest is often of limited use, even when its accuracy is high, because it lacks actionability. Given a learning model, extracting actionable knowledge entails finding a set of actions that change the input features of a given instance so that it achieves a desired output from the learning model. We illustrate this problem with an example.

Example 1. In a credit card company, a key task is to decide on promotion strategies that maximize long-term profit. The customer relationship management (CRM) department collects data about customers, such as education, age, card type, the channel through which the card was initiated, the number and effect of different kinds of promotions, the number and time of phone contacts, etc.
Data scientists need to build models that predict the profit brought by each customer. In a real case, a company built a random forest involving 35 customer features. The model predicts the profit (with a probability) for each customer. A more important task, however, is to extract actionable knowledge to revert “negative profit” customers and retain “positive profit” customers. In general, it is much cheaper to maintain existing “positive profit” customers than to revert “negative profit” ones. It is especially valuable to retain high-profit, large, enterprise-level customers.
There are certain actions the company can take, such as making phone contacts and sending promotional coupons. Each action can change the value of one or more attributes of a customer, and each action incurs a cost for the company. For instance, there are 7 different kinds of promotions, and each promotion is associated with two features: the number and the accumulation effect of sending that kind of promotion. Performing the action “sending promotion_amt_N” changes the features “nbr_promotion_amt_N” and “s_amt_N”, the number and the accumulation effect of sending the sales promotion, respectively. For a customer with “negative profit”, the goal is to extract a sequence of actions that changes the customer profile so that the model gives a “positive profit” prediction while minimizing the total action cost. For a customer with “positive profit”, the goal is to find actions so that the customer keeps a “positive profit” prediction with a higher prediction probability.
Research on extracting actionability from machine learning models is still limited, and only a few works exist. Statisticians have adopted stochastic models to find specific rules describing customers' response behavior DeSarbo and Ramaswamy (1994); Levin and Zahavi (1996). There have also been efforts to develop ranking mechanisms that account for business interests Hilderman and Hamilton (2000); Cao et al. (2007a), and to prune and summarize learnt rules by considering similarity Liu and Hsu (1996); Liu et al. (1999); Cao et al. (2007b, 2010). However, such approaches are not suitable for the problems studied in this paper because of two major drawbacks. First, they cannot provide customized actionable knowledge for each individual, since the rules or rankings are derived from the entire population of training data. For example, a low-income housewife may be more sensitive to sales promotions driven by consumption targets, while a social housewife may be more interested in promotions related to social networks. Second, they do not consider action costs when building the rules or rankings. Thus, these rule-based and ranking algorithms cannot tackle such problems well, since they are neither personalized for each customer nor cost-aware.
Another line of related work extracts actionable knowledge from decision tree and additive tree models by bounded tree search and integer linear programming Yang et al. (2003, 2007); Cui et al. (2015). Yang's work aims to find optimal strategies by using a greedy strategy to search one or multiple decision trees Yang et al. (2003, 2007). Cui et al. use an integer linear programming (ILP) method to find actions that change sample membership on an ensemble of trees Cui et al. (2015). A limitation of these works is that each action is assumed to change only one attribute. As discussed above, actions like “sending promotion_amt_N” may change multiple features, such as “nbr_promotion_amt_N” and “s_amt_N”. Moreover, Yang's greedy method is fast but cannot guarantee optimal solutions Yang et al. (2003), while Cui's optimization method is optimal but very slow Cui et al. (2015).

To address these challenges, we propose a novel approach to extract actionable knowledge from random forests, one of the most popular learning models. Our approach leverages planning, one of the core and extensively researched areas of AI. We first rigorously formulate the knowledge extraction problem as a sub-optimal actionable plan (SOAP) problem, defined as finding a sequence of actions that transforms a given input into a desirable goal while minimizing the total action cost. Our approach then consists of two phases. In the offline preprocessing phase, we use an anytime state-space search on an action graph to find a preferred goal for each instance in the training dataset and store the results in a database. In the online phase, for any given input, we translate the SOAP problem into a SAS+ planning problem, which is solved by an efficient MaxSAT-based approach capable of optimizing plan metrics.
We perform empirical studies to evaluate our approach on a real-world credit card company dataset obtained through an industrial research collaboration, as well as on several standard benchmark datasets. We compare the quality and efficiency of our method with several other state-of-the-art methods. The experimental results show that our method achieves near-optimal solution quality with real-time online search, compared with other existing methods.
2 Preliminaries
2.1 Random forest
Random forest is a popular model for classification, one of the main learning tasks. We choose random forests for two reasons: 1) In addition to superior classification/regression performance, random forests enjoy many appealing properties that many other models lack Friedman et al. (2001), including support for multiclass classification and natural handling of missing values and data of mixed types. 2) Often referred to as one of the best off-the-shelf classifiers Friedman et al. (2001), random forests have been widely deployed in industrial products such as Kinect Shotton et al. (2013) and face detection in cameras Viola and Jones (2004), and are a popular method in competitions such as web search ranking Mohan et al. (2011).

Consider a dataset $\{X, Y\}$, where $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ is the set of training samples and $Y = \{y_1, \dots, y_N\}$ is the set of classification labels. Each vector $\mathbf{x}_i = (x_{i,1}, \dots, x_{i,D})$ consists of $D$ attributes, where each attribute $x_{i,d}$ can be either categorical or numerical and has a finite or infinite domain $\mathrm{Dom}(x_d)$. Note that we use $x_d$ to represent $x_{i,d}$ when there is no confusion. All labels $y_i$ have the same finite categorical domain $\mathrm{Dom}(y)$.
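A random forest $H$ contains $K$ decision trees $h_t$, $t = 1, \dots, K$, where each decision tree $h_t$ takes an input $\mathbf{x}$ and outputs a label $y \in \mathrm{Dom}(y)$, denoted as $h_t(\mathbf{x}) = y$. For any label $c \in \mathrm{Dom}(y)$, the probability of output $c$ is
$$p(y = c \mid \mathbf{x}) = \sum_{t=1}^{K} w_t\, I\big(h_t(\mathbf{x}) = c\big) \qquad (1)$$
where $w_t$ are the weights of the decision trees and $I(\cdot)$ is an indicator function that evaluates to 1 if $h_t(\mathbf{x}) = c$ and 0 otherwise. The overall predicted label is
$$H(\mathbf{x}) = \arg\max_{c \in \mathrm{Dom}(y)} p(y = c \mid \mathbf{x}) \qquad (2)$$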
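The weighted vote in Eqs. (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: each tree is modeled as a plain callable that returns a label, the three toy trees and their thresholds are hypothetical, and the weights are assumed to be non-negative and to sum to one so that the result is a probability.

```python
from typing import Callable, Hashable, Iterable, List, Sequence

# A decision tree is modeled as any callable mapping a feature vector to a label.
Tree = Callable[[Sequence[float]], Hashable]


def class_probability(trees: List[Tree], weights: List[float],
                      x: Sequence[float], c: Hashable) -> float:
    """Eq. (1): total weight of the trees whose output on x equals c."""
    return sum(w for h, w in zip(trees, weights) if h(x) == c)


def predict(trees: List[Tree], weights: List[float],
            x: Sequence[float], labels: Iterable[Hashable]) -> Hashable:
    """Eq. (2): the label with the largest weighted vote (ties go to the first label)."""
    return max(labels, key=lambda c: class_probability(trees, weights, x, c))


# Hypothetical three-tree forest over three features, with equal weights.
trees: List[Tree] = [
    lambda x: 1 if x[1] > 3 else 0,
    lambda x: 1 if x[2] > 1000 else 0,
    lambda x: 1 if x[1] > 4 and x[2] > 800 else 0,
]
weights = [1 / 3, 1 / 3, 1 / 3]

print(class_probability(trees, weights, (1, 5, 500), 1))  # only the first tree votes 1
print(predict(trees, weights, (1, 5, 500), [0, 1]))
print(predict(trees, weights, (1, 5, 1500), [0, 1]))
```

Raising the third feature from 500 to 1500 flips the two remaining trees, which is exactly the kind of change the SOAP problem searches for.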
A random forest is generated as follows Breiman (2001). For $t = 1, \dots, K$:

Sample $N'$ ($N' \le N$) instances from the dataset with replacement.

Train an unpruned decision tree $h_t$ on the sampled instances. At each node, choose the split point from a number of randomly selected features rather than from all features.
2.2 SAS+ formalism
In classical planning, there are two popular formalisms, STRIPS and PDDL Fox and Long (2003). In recent years, another indirect formalism, SAS+, has been increasingly used because of its many favorable features, such as compact encoding with multi-valued variables, natural support for invariants, and associated domain transition graphs (DTGs) and causal graphs (CGs), which capture vital structural information Bäckström and Nebel (1995); Jonsson and Bäckström (1998); Helmert (2006).
In the SAS+ formalism, a planning problem is defined over a set of multi-valued state variables $\mathcal{V} = \{v_1, \dots, v_n\}$. Each variable $v \in \mathcal{V}$ has a finite domain $\mathrm{Dom}(v)$. A state $s$ is a full assignment of all the variables. If a variable $v$ is assigned to $f \in \mathrm{Dom}(v)$ at a state $s$, we denote it as $s(v) = f$. We use $\mathcal{S}$ to represent the set of all states.
Definition 1
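(Transition) Given a multi-valued state variable $v$ with a domain $\mathrm{Dom}(v)$, a transition is defined as a tuple $(v, f, f')$, where $f, f' \in \mathrm{Dom}(v)$, written as $\delta^{v}_{f \to f'}$. A transition $\delta^{v}_{f \to f'}$ is applicable to a state $s$ if and only if $s(v) = f$. We use $\oplus$ to represent applying a transition to a state. Let $s' = s \oplus \delta^{v}_{f \to f'}$ be the state after applying the transition to $s$; we have $s'(v) = f'$. We also simplify the notation $\delta^{v}_{f \to f'}$ as $\delta^{v}$ or $\delta$ when there is no confusion.
A transition $\delta^{v}_{f \to f'}$ is a regular transition if $f \neq f'$ or a prevailing transition if $f = f'$. In addition, $\delta^{v}_{* \to f'}$ denotes a mechanical transition, which can be applied to any state and changes the value of $v$ to $f'$.
For a variable $v$, we denote the set of all transitions that affect $v$ as $\mathcal{T}(v)$, i.e., $\mathcal{T}(v) = \{\delta^{v}_{f \to f'}\}$ for all $f, f' \in \mathrm{Dom}(v)$. We also denote the set of all transitions as $\mathcal{T}$, i.e., $\mathcal{T} = \bigcup_{v \in \mathcal{V}} \mathcal{T}(v)$.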
Definition 2
(Transition mutex) For two different transitions $\delta^{v}_{f_1 \to f_1'}$ and $\delta^{v}_{f_2 \to f_2'}$, if at least one of them is a mechanical transition and $f_1' = f_2'$, they are compatible; otherwise, they are mutually exclusive (mutex).
Definition 3
(Action) An action $a$ is a set of transitions $\{\delta_1, \dots, \delta_m\}$ in which no two transitions are mutually exclusive. An action $a$ is applicable to a state $s$ if and only if all transitions in $a$ are applicable to $s$. Each action $a$ has a cost $\mathrm{cost}(a)$.
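Definitions 1-3 map naturally onto small immutable records. The Python sketch below is illustrative only: the class and function names are hypothetical, variables are identified by strings, and `None` encodes the wildcard `*` of a mechanical transition. The example action mirrors “sending promotion_amt_N” from Example 1, which changes two features at once; the partition indices used are made up.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

State = Dict[str, int]  # full assignment: variable -> value


@dataclass(frozen=True)
class Transition:
    var: str
    pre: Optional[int]  # None stands for '*' (mechanical transition)
    post: int

    def applicable(self, s: State) -> bool:
        # Definition 1; a mechanical transition applies to any state.
        return self.pre is None or s[self.var] == self.pre


def transitions_mutex(d1: Transition, d2: Transition) -> bool:
    """Definition 2: two different transitions on the same variable are
    compatible only if one is mechanical and both end in the same value."""
    if d1 == d2 or d1.var != d2.var:
        return False
    compatible = (d1.pre is None or d2.pre is None) and d1.post == d2.post
    return not compatible


@dataclass(frozen=True)
class Action:
    name: str
    transitions: FrozenSet[Transition]
    cost: float

    def __post_init__(self) -> None:
        # Definition 3: an action may not contain two mutex transitions.
        ts = list(self.transitions)
        for i, d1 in enumerate(ts):
            for d2 in ts[i + 1:]:
                if transitions_mutex(d1, d2):
                    raise ValueError(f"{self.name}: {d1} and {d2} are mutex")

    def applicable(self, s: State) -> bool:
        return all(d.applicable(s) for d in self.transitions)

    def apply(self, s: State) -> State:
        # s (+) a; order is irrelevant because compatible transitions agree on the result.
        result = dict(s)
        for d in self.transitions:
            result[d.var] = d.post
        return result


# Hypothetical action touching two features, as in "sending promotion_amt_N".
send_promo = Action("send_promotion_amt_N",
                    frozenset({Transition("nbr_promotion_amt_N", 1, 2),
                               Transition("s_amt_N", 1, 2)}),
                    cost=2.0)
s = {"nbr_promotion_amt_N": 1, "s_amt_N": 1, "gender": 1}
print(send_promo.applicable(s), send_promo.apply(s))
```

Validating mutex-freedom in the constructor keeps every `Action` object a legal SAS+ action by construction.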
Definition 4
(SAS+ planning) A SAS+ planning problem is a tuple $\Pi = (\mathcal{V}, \mathcal{O}, s_I, \mathcal{G})$ defined as follows:

$\mathcal{V}$ is a set of state variables.

$\mathcal{O}$ is a set of actions.

$s_I \in \mathcal{S}$ is the initial state.

$\mathcal{G} = \{g_1, \dots, g_m\}$ is a set of goal conditions, where each goal condition $g_j$ is a partial assignment of some state variables. A state $s$ is a goal state if there exists $g \in \mathcal{G}$ such that $s$ agrees with every variable assignment in $g$.
Note that this is a slight generalization of the original SAS+ planning, in which $\mathcal{G}$ includes only one goal condition. For a state $s$ with an applicable action $a$, we use $s \oplus a$ to denote the resulting state after applying all the transitions in $a$ to $s$ (in an arbitrary order, since they are mutex-free).
Definition 5
(Action mutex) Two different actions $a_1$ and $a_2$ are mutually exclusive if and only if at least one of the following conditions is satisfied:

There exists a non-prevailing transition $\delta$ such that $\delta \in a_1$ and $\delta \in a_2$.

There exist two transitions $\delta_1 \in a_1$ and $\delta_2 \in a_2$ such that $\delta_1$ and $\delta_2$ are mutually exclusive.
A set of actions $P$ is applicable to $s$ if each action $a \in P$ is applicable to $s$ and no two actions in $P$ are mutex. We denote the resulting state after applying a set of actions $P$ to $s$ as $s \oplus P$.
Definition 6
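(Solution plan) For a SAS+ problem $\Pi = (\mathcal{V}, \mathcal{O}, s_I, \mathcal{G})$, a solution plan is a sequence $P = (P_1, P_2, \dots, P_L)$, where each $P_t$, $t \in [1, L]$, is a set of actions, and there exists $g \in \mathcal{G}$ such that
$g \subseteq s_I \oplus P_1 \oplus P_2 \oplus \cdots \oplus P_L$.
Note that in a solution plan, multiple non-mutex actions can be applied at the same time step. $s \oplus P_t$ means applying all actions in $P_t$ in any order to state $s$. In this work, we want to find a solution plan that minimizes a quality metric, the total action cost $\sum_{t=1}^{L} \sum_{a \in P_t} \mathrm{cost}(a)$.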
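Definitions 5 and 6 can be checked mechanically: a plan is a list of time steps, each step is a set of pairwise non-mutex actions applied simultaneously to the current state, and the plan is valid if the final state satisfies some goal condition. The sketch below is self-contained and hypothetical (names and the tuple encoding are ours). When testing the first condition of Definition 5, it treats a mechanical transition as non-prevailing, which is an assumption.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Sequence

State = Dict[str, int]
Goal = Dict[str, int]  # partial assignment


class Transition(NamedTuple):
    var: str
    pre: Optional[int]  # None encodes the mechanical '*'
    post: int


class Action(NamedTuple):
    name: str
    transitions: FrozenSet[Transition]
    cost: float


def transitions_mutex(d1: Transition, d2: Transition) -> bool:
    if d1 == d2 or d1.var != d2.var:
        return False
    return not ((d1.pre is None or d2.pre is None) and d1.post == d2.post)


def actions_mutex(a1: Action, a2: Action) -> bool:
    """Definition 5."""
    shared = a1.transitions & a2.transitions
    if any(d.pre != d.post for d in shared):  # shared non-prevailing transition
        return True
    return any(transitions_mutex(d1, d2)
               for d1 in a1.transitions for d2 in a2.transitions)


def apply_step(s: State, step: Sequence[Action]) -> Optional[State]:
    """Return s (+) P_t, or None if the action set is not applicable to s."""
    for i, a1 in enumerate(step):
        if any(actions_mutex(a1, a2) for a2 in step[i + 1:]):
            return None
    if not all(d.pre is None or s[d.var] == d.pre
               for a in step for d in a.transitions):
        return None
    result = dict(s)
    for a in step:
        for d in a.transitions:
            result[d.var] = d.post
    return result


def plan_cost(s_init: State, plan: List[Sequence[Action]],
              goals: List[Goal]) -> Optional[float]:
    """Definition 6: total action cost if the plan is valid, otherwise None."""
    s: Optional[State] = s_init
    for step in plan:
        s = apply_step(s, step)
        if s is None:
            return None
    if not any(all(s[v] == val for v, val in g.items()) for g in goals):
        return None
    return sum(a.cost for step in plan for a in step)


# Hypothetical actions on partition indices of two soft features.
raise_x2 = Action("raise_x2", frozenset({Transition("x2", 1, 2)}), 1.0)
raise_x3 = Action("raise_x3", frozenset({Transition("x3", 1, 2)}), 2.5)
jump_x2 = Action("jump_x2", frozenset({Transition("x2", 1, 3)}), 1.5)
s0 = {"x1": 1, "x2": 1, "x3": 1}
goals = [{"x2": 2, "x3": 2}]
print(plan_cost(s0, [[raise_x2, raise_x3]], goals))
```

Because `raise_x2` and `raise_x3` touch different variables, they can share one time step; `raise_x2` and `jump_x2` cannot.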
3 Sub-Optimal Actionable Plan (SOAP) Problem
We first give an intuitive description of the SOAP problem. Given a random forest and an input x, the SOAP problem is to find a sequence of actions that, when applied to x, changes it into a new instance that has a desirable output label from the random forest. Since each action incurs a cost, the total action cost must also be minimized. In general, the actions and their costs are determined by domain experts. For example, analysts in a credit card company can decide which actions they can perform and how much each action costs.
There are two kinds of features: soft attributes, which can be changed at a reasonable cost, and hard attributes, such as gender, which cannot be changed at a reasonable cost Yang et al. (2003). We only consider actions that change soft attributes.
Definition 7
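(SOAP problem) A SOAP problem is a tuple $(H, \mathbf{x}, c, A)$, where $H$ is a random forest, $\mathbf{x}$ is a given input, $c \in \mathrm{Dom}(y)$ is a class label, and $A$ is a set of actions. The goal is to find a sequence of actions $P = (a_1, a_2, \dots, a_n)$, $a_i \in A$, to solve:
$$\min_{P} \sum_{a \in P} \mathrm{cost}(a) \qquad (3)$$
$$\text{subject to:} \quad p(y = c \mid \mathbf{x}^{P}) \ge z \qquad (4)$$
where $\mathrm{cost}(a)$ is the cost of action $a$, $z$ is a constant, $p(y = c \mid \mathbf{x}^{P})$ is the output of $H$ as defined in (1), and $\mathbf{x}^{P}$ is the new instance after applying the actions in $P$ to $\mathbf{x}$.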
Example 2. A random forest with two trees and three features is shown in Figure 1. is a hard attribute, and are soft attributes. Given and an input , the output from is 0. The goal is to change x to a new instance that has an output of 1 from . For example, two actions changing from 2 to 5 and from 500 to 1500 is a plan and the new instance is .
4 A Planning Approach to SOAP
The SOAP problem has been proven NP-hard, even when an action can change only one feature Cui et al. (2015). Therefore, we cannot expect an efficient algorithm that solves it optimally. We propose a planning-based approach to solve the SOAP problem. Our approach consists of an offline preprocessing phase that only needs to be run once for a given random forest, and an online phase that is used to solve each SOAP problem instance.
4.1 Action graph and preferred goals
Since the number of possible instances in the feature space is typically prohibitively large, it is too expensive, and unnecessary, to explore the entire space. We reason that the training dataset used to build the random forest gives a representative distribution of the instances. Therefore, in the offline preprocessing, we form an action graph and identify a preferred goal state for each training sample.
Definition 8
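(Feature partitions) Given a random forest $H$, we split the domain $\mathrm{Dom}(x_d)$ of each feature $x_d$ ($d = 1, \dots, D$) into a number of partitions according to the following rules.

$\mathrm{Dom}(x_d)$ is split into $m$ partitions if $x_d$ is categorical and has $m$ categories.

$\mathrm{Dom}(x_d)$ is split into $m + 1$ partitions if $x_d$ is numerical and has $m$ branching nodes across all the decision trees in $H$. Suppose the branching values are $b_1 < b_2 < \cdots < b_m$; then the partitions are
$(-\infty, b_1], (b_1, b_2], \dots, (b_{m-1}, b_m], (b_m, +\infty)$.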
In Example 2, is split into , and are split into and , respectively.
Definition 9
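(State transformation) For a given instance $\mathbf{x} = (x_1, \dots, x_D)$, let $n_d$ be the number of partitions and $k_d \in \{1, \dots, n_d\}$ the partition index for feature $x_d$. We transform $\mathbf{x}$ into a SAS+ state $s = (s_1, \dots, s_D)$, where $s_d = k_d$ and $d = 1, \dots, D$.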
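For simplicity, we use $s$ to represent the state transformed from $\mathbf{x}$ when there is no confusion. Note that if two instances $\mathbf{x}$ and $\mathbf{x}'$ transform to the same state $s$, then they have the same output from the random forest, since they fall within the same partition for every feature and hence follow the same branch at every node of every tree. In that case, we can use $s$ in place of $\mathbf{x}$ and $\mathbf{x}'$, e.g., writing $p(y = c \mid s)$.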
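Definitions 8 and 9 amount to collecting the split thresholds of every numerical feature across all trees and locating each feature value among them. A minimal Python sketch, assuming a hypothetical flat list of `(feature_index, threshold)` split pairs extracted from the forest, 1-based partition indices, and the convention that a value equal to a threshold falls into the left partition (i.e., trees branch on `x <= b`). The thresholds and categories below are made up for illustration, not taken from Figure 1.

```python
from bisect import bisect_left
from typing import Dict, Hashable, List, Sequence, Tuple


def collect_thresholds(splits: Sequence[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Definition 8: sorted distinct thresholds b_1 < ... < b_m per numerical feature."""
    per_feature: Dict[int, set] = {}
    for feature, b in splits:
        per_feature.setdefault(feature, set()).add(b)
    return {d: sorted(bs) for d, bs in per_feature.items()}


def to_state(x: Sequence, thresholds: Dict[int, List[float]],
             categories: Dict[int, List[Hashable]]) -> Tuple[int, ...]:
    """Definition 9: map an instance to its vector of 1-based partition indices."""
    state = []
    for d, value in enumerate(x):
        if d in categories:  # categorical: one partition per category
            state.append(categories[d].index(value) + 1)
        else:  # numerical: partitions (-inf, b_1], (b_1, b_2], ..., (b_m, +inf)
            state.append(bisect_left(thresholds.get(d, []), value) + 1)
    return tuple(state)


splits = [(1, 3.0), (2, 1000.0), (1, 4.0), (2, 800.0), (1, 3.0)]  # hypothetical
thresholds = collect_thresholds(splits)  # {1: [3.0, 4.0], 2: [800.0, 1000.0]}
categories = {0: ["F", "M"]}

print(to_state(("M", 2, 500), thresholds, categories))
print(to_state(("M", 5, 1500), thresholds, categories))
```

`bisect_left` returns the number of thresholds strictly below the value, so a value equal to some `b_j` stays in the partition that ends at `b_j`; a numerical feature that no tree splits on gets a single partition.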
Given the states, we can define SAS+ transitions and actions according to Definitions 1 and 3. For Example 2, can be transformed to state , . For an input , the corresponding state is . The action changing from 2 to 5 can be represented as . Thus, the resulting state of applying is .
Definition 10
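(Action graph) Given a SOAP problem $(H, \mathbf{x}, c, A)$, the action graph is a graph $G = (V, E)$, where $V$ is the set of transformed states and an edge $(s, s') \in E$ if and only if there is an action $a \in A$ such that $s' = s \oplus a$. The weight of this edge is $\mathrm{cost}(a)$.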
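The SOAP problem in Definition 7 is equivalent to finding the shortest path on the action graph from a given state to a goal state, where a node $s$ is a goal state if it satisfies constraint (4), i.e., $p(y = c \mid s) \ge z$. Given the training data $\{X, Y\}$, we use a heuristic search to find a preferred goal state for each $\mathbf{x} \in X$ such that $p(y = c \mid \mathbf{x}) < z$. For each such $\mathbf{x}$, we find a path in the action graph from its state $s$ to a state $s^*$ such that $p(y = c \mid s^*) \ge z$ while minimizing the cost of the path.

Algorithm 1 shows the heuristic search. The search uses a standard evaluation function $f(s) = g(s) + h(s)$, where $g(s)$ is the cost of the path leading up to $s$. Let the path be $s_0, s_1, \dots, s_k = s$ with $s_i = s_{i-1} \oplus a_i$ for $i = 1, \dots, k$; then $g(s) = \sum_{i=1}^{k} \mathrm{cost}(a_i)$. We define the heuristic function as $h(s) = 0$ if $p(y = c \mid s) \ge z$, and $h(s) = \lambda\,\big(z - p(y = c \mid s)\big)$ otherwise.
For any state $s$ satisfying $p(y = c \mid s) < z$, $h(s) > 0$. Since the goal is to achieve $p(y = c \mid s) \ge z$, $z - p(y = c \mid s)$ measures how far $s$ is from the goal. $\lambda$ is a controlling parameter; in our experiments, $\lambda$ is set to the mean of all the action costs.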
Algorithm 1 maintains two data structures, a min heap and a closed list, and performs the following main steps:

Initialize $\mathit{count} = 0$, $s^* = \mathrm{null}$, and $g^* = \infty$, where $\mathit{count}$ represents the number of expanded states, $s^*$ is the best goal state found so far, and $g^*$ records the cost of the path leading up to $s^*$. Add the initial state $s_0$ to the min heap (Lines 1-2).

Pop the state $s$ with the smallest $f(s)$ from the heap (Line 4).

If $p(y = c \mid s) \ge z$ and $g(s) < g^*$, update $g^* = g(s)$, $\mathit{count} = 0$, and the best goal state $s^* = s$ (Lines 5-6).

If the termination condition ($\mathit{count} > M$) is met, stop the search and return $s^*$ (Line 8).

Add $s$ to the closed list and, for each edge $(s, s') \in E$, add $s'$ to the min heap if $s'$ is not in the closed list and $s$ is not a goal state (Lines 10-12).

Repeat from Step 2.
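The steps above can be sketched in Python with `heapq` as the min heap and a set as the closed list. This is a hedged sketch under stated assumptions rather than the authors' code: `successors`, `is_goal`, and `h` are caller-supplied (hypothetical names), the counter of expanded states is incremented on every pop and reset whenever a cheaper goal is found, stale heap entries are skipped lazily, and goal states are not expanded because, with positive action costs, any goal reached through a goal state costs more than the goal state itself. The toy problem in the usage lines (two soft features, a made-up probability function, $z = 0.75$, and $\lambda$ set to the mean action cost) is hypothetical.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, Optional, Tuple

State = Hashable


def preferred_goal(s0: State,
                   successors: Callable[[State], Iterable[Tuple[float, State]]],
                   is_goal: Callable[[State], bool],
                   h: Callable[[State], float],
                   M: int) -> Tuple[Optional[State], float]:
    """Anytime best-first search in the spirit of Algorithm 1; returns (s*, g*)."""
    tie = itertools.count()  # tie-breaker so states are never compared directly
    heap = [(h(s0), next(tie), 0.0, s0)]  # entries: (f, tie, g, state)
    closed = set()
    best, best_g, count = None, float("inf"), 0
    while heap:
        _, _, g, s = heapq.heappop(heap)
        if s in closed:  # stale duplicate entry
            continue
        count += 1
        if is_goal(s) and g < best_g:  # better goal found: reset the counter
            best, best_g, count = s, g, 0
        if count > M:  # no improvement for a long time
            break
        closed.add(s)
        if is_goal(s):
            continue
        for cost, s2 in successors(s):
            if s2 not in closed:
                g2 = g + cost
                heapq.heappush(heap, (g2 + h(s2), next(tie), g2, s2))
    return best, best_g


# Hypothetical toy problem: states are partition indices (x2, x3) in {1, 2, 3}.
def successors(s):
    x2, x3 = s
    if x2 < 3:
        yield 1.0, (x2 + 1, x3)  # cheaper action on x2
    if x3 < 3:
        yield 2.0, (x2, x3 + 1)  # pricier action on x3


def prob(s):  # stand-in for p(y = c | s) from the forest
    return (s[0] + s[1] - 2) / 4


z, lam = 0.75, 1.5  # lam = mean action cost


def is_goal(s):
    return prob(s) >= z


def h(s):
    return 0.0 if is_goal(s) else lam * (z - prob(s))


result = preferred_goal((1, 1), successors, is_goal, h, M=100)
print(result)
```

On this toy graph the search settles on raising the cheap feature twice and the expensive one once; a large `M` trades preprocessing time for a better chance of finding the cheapest goal.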
The closed list is implemented as a set with efficient hashing-based duplicate detection. The search terminates when it has not found a better plan for a long time, i.e., when more than $M$ states have been expanded since the last improvement. We set $M$ to a large value in our experiments. Note that Algorithm 1 does not have to search all states, since it stops as soon as the termination condition is met (Line 8).
By the end of the offline phase, for each and the corresponding state , we find a preferred goal state . For an input in Example 2, the corresponding initial state is . An optimal solution is where , , , and the preferred goal state is .
4.2 Online SAS+ planning
Once the offline phase is done, the results can be used to repeatedly solve SOAP instances. We now describe how to handle a new instance and find the actionable plan.
In online SAS+ planning, we find a number of preprocessed states closest to the initial state $s_I$ and use the combination of their preferred goal states to construct the goal $\mathcal{G}$. This is inspired by similarity-based learning methods such as k-nearest-neighbor (kNN). We first define the similarity between two states.
Definition 11
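(Feature similarity) Given two states $s = (s_1, \dots, s_D)$ and $s' = (s'_1, \dots, s'_D)$, the similarity of the $i$th feature variable is defined as:

if the $i$th feature is categorical, $\mathrm{sim}_i(s, s') = 1$ if $s_i = s'_i$, otherwise $\mathrm{sim}_i(s, s') = 0$;

if the $i$th feature is numerical, $\mathrm{sim}_i(s, s') = 1 - \frac{|s_i - s'_i|}{n_i}$, where $s_i$ and $s'_i$ are the partition indices of the $i$th feature in $s$ and $s'$, and $n_i$ is the number of partitions of the $i$th feature.
Note that $0 \le \mathrm{sim}_i(s, s') \le 1$: $\mathrm{sim}_i(s, s') = 1$ means the two values are in the same partition, while $\mathrm{sim}_i(s, s') = 0$ means they are totally different.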
Definition 12
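(State similarity) The similarity between two states $s$ and $s'$ is 0 if there exists $d$ such that $x_d$ is a hard attribute and $s_d$ and $s'_d$ are not in the same partition ($s_d \neq s'_d$). Otherwise, the similarity is
$$\mathrm{sim}(s, s') = \sum_{d=1}^{D} w_d\, \mathrm{sim}_d(s, s') \qquad (5)$$
where $w_d$ is the weight of feature $d$ in the random forest.
Note that $0 \le \mathrm{sim}(s, s') \le 1$. A larger $\mathrm{sim}(s, s')$ means higher similarity. Given two vectors and in Example 2, the corresponding states are and . Their feature similarities are , , and . Suppose , then .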
Given two vectors and , the corresponding states are and . Since is a hard attribute and , are not in the same partition, .
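Definitions 11 and 12 translate directly into code. The sketch below assumes the numerical-feature form $1 - |s_i - s'_i| / n_i$ given in Definition 11, 1-based partition indices, and feature weights that sum to one; the feature layout and weights in the usage lines are hypothetical.

```python
from typing import Sequence


def feature_similarity(si: int, ti: int, n_i: int, categorical: bool) -> float:
    """Definition 11 for one feature, computed on partition indices."""
    if categorical:
        return 1.0 if si == ti else 0.0
    return 1.0 - abs(si - ti) / n_i


def state_similarity(s: Sequence[int], t: Sequence[int], n_parts: Sequence[int],
                     categorical: Sequence[bool], hard: Sequence[bool],
                     weights: Sequence[float]) -> float:
    """Definition 12 / Eq. (5): weighted sum of feature similarities, or 0 when
    some hard attribute falls in different partitions."""
    if any(hard[d] and s[d] != t[d] for d in range(len(s))):
        return 0.0
    return sum(weights[d] * feature_similarity(s[d], t[d], n_parts[d], categorical[d])
               for d in range(len(s)))


# Hypothetical layout: feature 0 is a hard categorical attribute with 2 categories;
# features 1 and 2 are soft numerical attributes with 3 partitions each.
n_parts = [2, 3, 3]
categorical = [True, False, False]
hard = [True, False, False]
weights = [0.2, 0.5, 0.3]

print(state_similarity((2, 1, 1), (2, 3, 2), n_parts, categorical, hard, weights))
print(state_similarity((2, 1, 1), (1, 1, 1), n_parts, categorical, hard, weights))
```

The hard-attribute check comes first because no sequence of actions can move an instance across hard-attribute partitions, so such neighbors are useless as goal sources.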
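SAS+ formulation. Given a SOAP problem $(H, \mathbf{x}, c, A)$, we define a SAS+ problem $\Pi = (\mathcal{V}, \mathcal{O}, s_I, \mathcal{G})$ as follows:

$\mathcal{V} = \{v_1, \dots, v_D\}$ is a set of state variables. Each variable $v_d$ has a finite domain $\mathrm{Dom}(v_d) = \{1, \dots, n_d\}$, where $n_d$ is the number of partitions of the $d$th feature of $\mathbf{x}$.

$\mathcal{O}$ is a set of SAS+ actions directly mapped from $A$ in $(H, \mathbf{x}, c, A)$.

$s_I$ is transformed from $\mathbf{x}$ according to Definition 9.

Let $s^1, \dots, s^k$ be the $k$ nearest neighbors of $s_I$ ranked by the state similarity in Definition 12, and let $g^1, \dots, g^k$ be their corresponding preferred goal states; the goal in SAS+ is $\mathcal{G} = \{g^1, \dots, g^k\}$. $k$ is a user-defined integer.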
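Constructing the goal set amounts to a k-nearest-neighbor lookup over the preprocessed states. A minimal sketch, assuming the offline results are held in a hypothetical in-memory dictionary from each preprocessed initial state to its preferred goal state, and that a similarity function (such as the one in Definition 12) is passed in; the toy similarity in the usage lines is a stand-in, not Eq. (5).

```python
import heapq
from typing import Callable, Dict, Hashable, List

State = Hashable


def goal_set(s_init: State, preferred: Dict[State, State],
             sim: Callable[[State, State], float], k: int) -> List[State]:
    """Return the preferred goals of the k preprocessed states most similar to s_init."""
    neighbors = heapq.nlargest(k, preferred, key=lambda s: sim(s_init, s))
    return [preferred[s] for s in neighbors]


# Hypothetical offline results: preprocessed initial state -> preferred goal state.
preferred = {(1, 1): (3, 2), (1, 3): (2, 3), (3, 3): (3, 3)}


def toy_sim(a, b):  # stand-in similarity: larger when partition indices are closer
    return 1.0 / (1 + sum(abs(x - y) for x, y in zip(a, b)))


print(goal_set((1, 1), preferred, toy_sim, k=2))
```

`heapq.nlargest` avoids sorting the whole table, which matters when the offline database holds millions of states; an indexed nearest-neighbor structure would be the next step for larger tables.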
In Example 2, if we preprocessed three initial states , , , then three preferred goal states , , and will be found in the offline phase. In the online phase, given a new input , the corresponding state is . Suppose , then , , and . If , the 2 nearest neighbors of are and , and the goal of the SAS+ problem is .
In the online phase, for a given $\mathbf{x}$, we solve the SAS+ instance defined above. Beyond classical SAS+ planning, we also want to minimize the total action cost. Since some existing classical planners do not perform well at optimizing plan quality, we employ a SAT-based method.
Our method follows the bounded SAT solving strategy originally proposed in SATPlan Kautz and Selman (1992) and Graphplan Blum and Furst (1997). It starts from a lower bound on the makespan ($L = 1$), encodes the SAS+ problem as a weighted partial MaxSAT (WPMaxSAT) instance Lu et al. (2014), and either proves the instance unsatisfiable, in which case $L$ is increased by one and the process repeats, or finds a plan while simultaneously minimizing the total action cost.
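For a SAS+ problem $\Pi = (\mathcal{V}, \mathcal{O}, s_I, \mathcal{G})$, given a makespan $L$, we define a WPMaxSAT problem with the following variable set $\mathcal{U}$ and clause set $\mathcal{C}$. The variable set includes three types of variables:

Transition variables: $U_{\delta,t}$, $\forall \delta \in \mathcal{T}$ and $t \in [1, L]$.

Action variables: $U_{a,t}$, $\forall a \in \mathcal{O}$ and $t \in [1, L]$.

Goal variables: $U_{g}$, $\forall g \in \mathcal{G}$.
Each variable in $\mathcal{U}$ represents the assignment of a transition or an action at time $t$, or of a goal condition $g$.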
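The clause set $\mathcal{C}$ has two types of clauses: soft clauses and hard clauses. The soft clause set is constructed as $\mathcal{C}_s = \{\neg U_{a,t} \mid a \in \mathcal{O},\ t \in [1, L]\}$. For each clause $\neg U_{a,t} \in \mathcal{C}_s$, its weight is defined as $w(\neg U_{a,t}) = \mathrm{cost}(a)$. For each clause in the hard clause set $\mathcal{C}_h$, its weight is $\infty$, so that it must be true. $\mathcal{C}_h$ has the following hard clauses:

Initial state: $\forall v \in \mathcal{V}$ with $s_I(v) = f$, $\bigvee_{\delta = \delta^{v}_{f \to f'} \in \mathcal{T}(v)} U_{\delta,1}$.

Goal state: $\bigvee_{g \in \mathcal{G}} U_{g}$. It means that at least one goal condition must be true.

Goal condition: $\forall g \in \mathcal{G}$, $\forall (v = f) \in g$, $U_{g} \rightarrow \bigvee_{\delta = \delta^{v}_{f' \to f} \in \mathcal{T}(v)} U_{\delta,L}$. If $U_g$ is true, then for each assignment $v = f$ in $g$, at least one transition changing variable $v$ to value $f$ must be true at time $L$.

Progression: $\forall \delta = \delta^{v}_{f' \to f} \in \mathcal{T}$ and $t \in [1, L-1]$, $U_{\delta,t} \rightarrow \bigvee_{\delta' = \delta^{v}_{f \to f''} \in \mathcal{T}(v)} U_{\delta',t+1}$.

Regression: $\forall \delta = \delta^{v}_{f \to f'} \in \mathcal{T}$ and $t \in [2, L]$, $U_{\delta,t} \rightarrow \bigvee_{\delta' = \delta^{v}_{f'' \to f} \in \mathcal{T}(v)} U_{\delta',t-1}$.

Mutually exclusive transitions: for each mutually exclusive transition pair $(\delta_1, \delta_2)$ and $t \in [1, L]$, $\neg U_{\delta_1,t} \vee \neg U_{\delta_2,t}$.

Mutually exclusive actions: for each mutually exclusive action pair $(a_1, a_2)$ and $t \in [1, L]$, $\neg U_{a_1,t} \vee \neg U_{a_2,t}$.

Composition of actions: $\forall a \in \mathcal{O}$ and $t \in [1, L]$, $U_{a,t} \rightarrow \bigwedge_{\delta \in a} U_{\delta,t}$.

Action existence: for each non-prevailing transition $\delta \in \mathcal{T}$ and $t \in [1, L]$, $U_{\delta,t} \rightarrow \bigvee_{a \in \mathcal{O},\ \delta \in a} U_{a,t}$.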
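To make the weighting concrete, the following Python sketch emits a subset of the clause families above in the classic weighted partial MaxSAT (WCNF) text format used by MaxSAT Evaluation solvers of that period: a header `p wcnf <vars> <clauses> <top>` followed by one clause per line, each prefixed by its weight, where clauses weighted `top` are hard. It is a hypothetical illustration, not the paper's encoder. It covers the soft action clauses, composition of actions, action mutexes, the goal-state clause, and goal-condition clauses; initial-state, progression, regression, transition-mutex, and action-existence clauses are omitted for brevity. Action costs are assumed to be positive integers (real-valued costs would need scaling), mechanical transitions are ignored, and transitions are written as `(variable, from, to)` tuples.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Trans = Tuple[str, int, int]  # (variable, from-value, to-value)


class WcnfBuilder:
    """Numbers propositional variables and collects weighted clauses."""

    def __init__(self) -> None:
        self.ids: Dict[Hashable, int] = {}
        self.hard: List[List[int]] = []
        self.soft: List[Tuple[int, List[int]]] = []

    def var(self, key: Hashable) -> int:
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1
        return self.ids[key]

    def to_wcnf(self) -> str:
        top = sum(w for w, _ in self.soft) + 1  # larger than all soft weights combined
        lines = [f"p wcnf {len(self.ids)} {len(self.hard) + len(self.soft)} {top}"]
        lines += [" ".join(map(str, [top, *c, 0])) for c in self.hard]
        lines += [" ".join(map(str, [w, *c, 0])) for w, c in self.soft]
        return "\n".join(lines)


def encode(actions: Dict[str, Tuple[int, List[Trans]]],
           domains: Dict[str, List[int]],
           goals: List[Dict[str, int]],
           mutex_pairs: Sequence[Tuple[str, str]],
           L: int) -> WcnfBuilder:
    b = WcnfBuilder()
    for t in range(1, L + 1):
        for name, (cost, trans) in actions.items():
            a = b.var(("act", name, t))
            b.soft.append((cost, [-a]))  # soft: not U_{a,t}, weight cost(a)
            for d in trans:  # composition: U_{a,t} -> U_{d,t}
                b.hard.append([-a, b.var(("tr", d, t))])
        for a1, a2 in mutex_pairs:  # mutex actions: not U_{a1,t} or not U_{a2,t}
            b.hard.append([-b.var(("act", a1, t)), -b.var(("act", a2, t))])
    goal_vars = [b.var(("goal", j)) for j in range(len(goals))]
    b.hard.append(goal_vars)  # goal state: at least one goal condition holds
    for j, g in enumerate(goals):
        for v, val in g.items():  # goal condition: U_g -> some transition into val at L
            reaching = [b.var(("tr", (v, f, val), L)) for f in domains[v]]
            b.hard.append([-goal_vars[j]] + reaching)
    return b


actions = {"a": (1, [("x2", 0, 1)]), "b": (3, [("x3", 0, 1)])}
domains = {"x2": [0, 1], "x3": [0, 1]}
goals = [{"x2": 1}, {"x3": 1}]
print(encode(actions, domains, goals, mutex_pairs=[], L=1).to_wcnf())
```

A solver minimizing the total weight of falsified soft clauses then prefers satisfying the goal with action `a` (weight 1) over action `b` (weight 3), which is how the encoding turns plan cost into the MaxSAT objective.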
There are three main differences between our approach and a related work, the SASE encoding Huang et al. (2010, 2012). First, our encoding transforms the SAS+ problem into a WPMaxSAT problem aimed at finding a plan with minimal total action cost, while SASE transforms it into a SAT problem that only tries to find a valid plan. Second, besides transition and action variables, our encoding has extra goal variables, since the goal of our SAS+ problem is a combination of several goal states, while in SASE it is a partial assignment of some variables. Third, the goal clauses of our encoding contain two kinds of clauses (goal state and goal condition), while SASE has only one, since our goal definition is more complicated than that of SASE.
The above encoding can be solved by any WPMaxSAT solver; MaxSAT solvers have been extensively studied. Using soft clauses to optimize the plan in our WPMaxSAT encoding is similar to the work of Balyo et al. (2014), which uses a MaxSAT-based approach for plan optimization (removing redundant actions).
5 Experimental Results
To test the proposed approach (denoted as “Planning”), in the offline preprocessing the termination threshold in Algorithm 1 is set to a fixed large value. In the online search, we use a fixed neighborhood size $k$ and use WPM-2014-in (http://www.maxsat.udl.cat/) to solve the encoded WPMaxSAT instances. For comparison, we also implement three other methods: 1) an iterative greedy algorithm, denoted as “Greedy”, which in each iteration chooses one action that increases $p(y = c \mid \mathbf{x})$ while minimizing the total action cost, and keeps iterating until there are no more variables to change; 2) a sub-optimal state-space method, denoted as “NS” Lu et al. (2016); 3) an integer linear programming (ILP) method Cui et al. (2015), one of the state-of-the-art algorithms for solving the SOAP problem, which gives exact optimal solutions.
Table 1: Datasets and offline preprocessing cost.
Dataset | N | D | C | T (s) | #S | T_total (days)
Credit | 17714 | 14 | 2 | 1.22 | 3.40E+09 | 4.81E+01
A1a | 32561 | 123 | 2 | 365.25 | 1.68E+07 | 7.09E+01
Australian | 690 | 14 | 2 | 0.06 | 1.14E+08 | 7.34E-02
Breast | 683 | 10 | 2 | 2.43 | 7.07E+07 | 1.99E+00
Dna scale | 2000 | 180 | 3 | 161.89 | 3.36E+07 | 6.29E+01
Heart | 270 | 13 | 2 | 0.35 | 2.07E+08 | 8.37E-01
Ionosphere scale | 351 | 34 | 2 | 64.06 | 8.39E+06 | 6.22E+00
Liver disorders | 345 | 6 | 2 | 0.05 | 2.33E+05 | 1.40E-04
Mushrooms | 8124 | 112 | 2 | 0.01 | 2.05E+03 | 1.80E-07
Vowel | 990 | 10 | 11 | 0.15 | 5.96E+08 | 1.06E+00
We test these algorithms on a real-world credit card company dataset (“Credit”) and nine other benchmark datasets from the UCI repository (https://archive.ics.uci.edu/ml/datasets.html) and the LibSVM website (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), which were used in ILP's original experiments Cui et al. (2015). Information about the datasets is listed in Table 1. N, D, and C are the numbers of instances, features, and classes, respectively. A random forest is built on the training set using the Random Trees library in OpenCV 2.4.9. The GNU C++ 4.8.4 and Python 2.7 runtime systems are used.
In the offline preprocessing, we generate all possible initial states and use Algorithm 1 to find a preferred goal state for each of them. For each dataset, we generate problems with the same parameter settings as in the ILP experiments. Specifically, we use a weighted Euclidean distance as the action cost function. For an action $a$ that changes state $s$ to $s'$, the cost is
$$\mathrm{cost}(a) = \sqrt{\sum_{d=1}^{D} w_d\,(s'_d - s_d)^2} \qquad (6)$$
where $w_d$ is the cost weight on variable $v_d$, randomly generated. Since the offline preprocessing tasks are fully independent, they can be solved in parallel on a large number of workstation nodes. We run the offline preprocessing in parallel on a workstation with 125 computational nodes. Each node has a 2.50GHz processor with 8 cores and 64GB memory. For each instance, the time limit is set to 1800 seconds. If the preprocessing search does not finish within 1800 seconds, we record the best solution found in terms of net profit and the total search time (1800 seconds).
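Eq. (6) is a weighted Euclidean distance between partition-index vectors. A minimal sketch; the weight range passed to the hypothetical `random_weights` helper is a placeholder, not the range used in the experiments.

```python
import math
import random
from typing import List, Sequence


def action_cost(s: Sequence[int], s_next: Sequence[int], w: Sequence[float]) -> float:
    """Eq. (6): sqrt(sum_d w_d * (s'_d - s_d)^2)."""
    return math.sqrt(sum(wd * (b - a) ** 2 for wd, a, b in zip(w, s, s_next)))


def random_weights(dim: int, low: float, high: float, seed: int = 0) -> List[float]:
    """Draw one cost weight per variable uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(dim)]


print(action_cost((1, 1, 1), (1, 3, 2), [1.0, 0.5, 2.0]))  # sqrt(0.5*4 + 2.0*1) = 2.0
```

Only the variables an action actually changes contribute to its cost, so actions that move several features at once, such as “sending promotion_amt_N”, are naturally more expensive than single-feature actions of the same magnitude.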
Table 1 shows, for each dataset, the average preprocessing time per state (T, in seconds) and the total number of possible initial states (#S). The last column of Table 1, T_total, is the number of days needed to finish all preprocessing when solving in parallel on 1000 cores (125 nodes with 8 cores each). Even though the total number of preprocessed states is very large, parallel solving reduces the total preprocessing time to an acceptable range.
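As a quick consistency check (our own arithmetic, assuming perfect parallel speedup), the days column of Table 1 matches T × #S spread over 1000 cores and converted to days; small deviations come from the rounding of T in the table.

```python
SECONDS_PER_DAY = 86_400


def parallel_days(avg_seconds: float, num_states: float, cores: int = 1000) -> float:
    """Wall-clock days to preprocess num_states states on `cores` cores."""
    return avg_seconds * num_states / cores / SECONDS_PER_DAY


for name, t, n in [("Credit", 1.22, 3.40e9), ("Heart", 0.35, 2.07e8)]:
    print(f"{name}: {parallel_days(t, n):.3g} days")
```

Because the work is embarrassingly parallel, doubling the number of cores halves these figures, which is what makes exhaustive preprocessing of billions of states practical.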
In the offline preprocessing, the percentage of initial states actually preprocessed, out of all possible initial states in the transformed state space, is a key factor determining the online search quality. For each preprocessing percentage, we randomly sample that fraction of all possible initial states and use Algorithm 1 to find their preferred goals. Then, in the online search, we randomly sample 100 instances from the test set and generate 100 problems based on these preferred goals. We report the online search time in seconds and the total action costs of the solutions, averaged over 100 runs. Figure 2 shows that the total offline preprocessing time increases linearly with the percentage, while the average total action cost decreases almost linearly with it. Since the offline preprocessing time is almost unlimited in practice, we can always increase the preprocessing percentage and eventually reach 100%.
Table 2: Online search comparison (averages over 100 runs).
Dataset | Greedy T (s) | Greedy Cost | Greedy L | Greedy M (GB) | NS T (s) | NS Cost | NS L | NS M (GB) | Planning T (s) | Planning Cost | Planning L | Planning M (GB) | ILP T (s) | ILP Cost | ILP L
Credit | 1.06 | 525.61 | 12.07 | 0.01 | 1.65 | 33.20 | 3.37 | 0.05 | 0.08 | 33.20 | 3.17 | 15.21 | 6.59 | 33.20 | 3.37
A1a | 1.24 | 462.07 | 8.47 | 0.01 | 6.56 | 68.07 | 3.10 | 0.11 | 0.05 | 62.17 | 3.40 | 3.85 | 7.56 | 60.60 | 3.33
Australian | 0.04 | 215.10 | 9.30 | 0.01 | 0.06 | 6.03 | 1.37 | 0.01 | 0.03 | 6.03 | 1.37 | 2.98 | 108.89 | 6.03 | 1.37
Breast | 0.02 | 375.70 | 16.77 | 0.01 | 0.65 | 74.97 | 11.70 | 0.01 | 0.11 | 74.97 | 11.70 | 1.20 | 30.58 | 74.97 | 11.70
Dna scale | 0.11 | 775.26 | 16.68 | 0.01 | 4.59 | 75.30 | 3.00 | 0.08 | 0.05 | 75.30 | 3.00 | 11.26 | 34.54 | 75.30 | 3.00
Heart | 0.02 | 569.07 | 9.13 | 0.01 | 0.05 | 83.37 | 2.03 | 0.01 | 0.04 | 83.37 | 2.03 | 5.03 | 5.54 | 83.37 | 2.03
Ionosphere scale | 0.04 | 1219.12 | 25.62 | 0.01 | 62.33 | 460.33 | 12.23 | 0.52 | 0.13 | 445.40 | 12.17 | 0.54 | 47.97 | 444.90 | 12.17
Liver disorders | 0.04 | 212.67 | 4.90 | 0.01 | 0.07 | 83.17 | 2.50 | 0.01 | 0.04 | 83.17 | 2.50 | 0.01 | 30.47 | 83.17 | 2.50
Mushrooms | 0.00 | 58.71 | 1.00 | 0.01 | 0.01 | 30.27 | 1.13 | 0.01 | 0.03 | 30.27 | 1.13 | 0.01 | 3.74 | 30.27 | 1.13
Vowel | 0.02 | 425.29 | 9.83 | 0.01 | 0.49 | 61.63 | 4.20 | 0.01 | 0.06 | 61.63 | 4.20 | 11.11 | 66.92 | 61.63 | 4.20
Table 2 gives a comprehensive comparison in terms of average search time, solution quality measured by total action cost, number of actions in the solutions, and memory usage, with a preprocessing percentage of 100%. We report the search time (T) in seconds, the total action cost of the solutions (Cost), the number of actions in the solutions (L), and the memory usage (M, in GB; not reported for ILP), averaged over 100 runs.
From Table 2, we can see that even though our method spends considerable time on offline preprocessing, its online search is very fast. Since our method finds near-optimal plans for all training samples, its solution quality is much better than that of Greedy at almost the same search time. Compared with NS, our method is much faster in online search, achieves better solution quality on A1a and Ionosphere scale, and achieves equal solution quality on the other 8 datasets. Compared with ILP, our method is much faster in online search at the cost of losing optimality. A trained random forest model is typically used for a long time. Since our offline preprocessing only needs to be run once, its cost is well amortized over a large number of repeated uses of the online search. In short, our planning approach gives a good quality-efficiency tradeoff: it achieves near-optimal quality using search time close to that of greedy search. Note that since we need to store all preprocessed states and their preferred goal states for the online phase, the memory usage of our method is much larger than that of the Greedy and NS approaches.
6 Conclusions
We have studied the problem of extracting actionable knowledge from random forests, one of the most widely used and best off-the-shelf classifiers. We have formulated the sub-optimal actionable plan (SOAP) problem, which aims to find an action sequence that changes an input instance's predicted label to a desired one with minimum total action cost. We have then proposed a SAS+ planning approach to solve the SOAP problem. In an offline phase, we construct an action graph and identify a preferred goal for each input instance in the training dataset. In the online planning phase, for each given input, we formulate the SOAP problem as a SAS+ planning instance based on a nearest-neighbor search over the preferred goals, encode the SAS+ problem as a WPMaxSAT instance, and solve it by calling a WPMaxSAT solver.
Our approach is heuristic and sub-optimal, but we have leveraged SAS+ planning and carefully engineered the system so that it performs well. Empirical results on a credit card company dataset and nine other benchmarks show that our algorithm achieves near-optimal solution quality and is ultra-efficient, representing a much better quality-efficiency tradeoff than several other methods.
With the great advancements in data science, an ultimate goal of extracting patterns from data is to facilitate decision making. We envision that machine learning models will be part of larger AI systems that make rational decisions. The support for actionability by these models will be crucial. Our work represents a novel and deep integration of machine learning and planning, two core areas of AI. We believe that such integration will have broad impacts in the future.
Note that the proposed action extraction algorithm can be easily extended to other additive tree models (ATMs) Lu et al. (2016), such as AdaBoost Freund and Schapire (1997) and gradient boosting trees Friedman. Thus, the proposed action extraction algorithm has very wide applications.

In our SOAP formulation, we only consider actions with deterministic effects. However, in many realistic applications, we may have to tackle nondeterministic actions. For instance, pushing a promotional coupon may only increase the accumulation effect with a certain probability, since people do not always accept the coupon. We plan to add nondeterministic actions to our model in future work.
This work has been supported in part by National Natural Science Foundation of China (Nos. 61502412, 61033009, and 61175057), Natural Science Foundation of the Jiangsu Province (No. BK20150459), Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 15KJB520036), National Science Foundation, United States (IIS-0534699, IIS-0713109, CNS-1017701), and a Microsoft Research New Faculty Fellowship.
References
 Bäckström and Nebel (1995) C. Bäckström and B. Nebel. Complexity results for SAS+ planning. Computational Intelligence, 11(4):625–655, 1995.
 Balyo et al. (2014) Tomáš Balyo, Lukáš Chrpa, and Asma Kilani. On different strategies for eliminating redundant actions from plans. In Seventh Annual Symposium on Combinatorial Search, 2014.
 Blum and Furst (1997) A. Blum and M. L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90(1–2):281–300, 1997.
 Breiman (2001) L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
 Cao et al. (2007a) L. Cao, D. Luo, and C. Zhang. Knowledge actionability: satisfying technical and business interestingness. International Journal of Business Intelligence and Data Mining, 2(4):496–514, 2007a.
 Cao et al. (2007b) L. Cao, C. Zhang, D. Taniar, E. Dubossarsky, W. Graco, Q. Yang, D. Bell, M. Vlachos, B. Taneri, E. Keogh, et al. Domain-driven, actionable knowledge discovery. IEEE Intelligent Systems, (4):78–88, 2007b.
 Cao et al. (2010) L. Cao, Y. Zhao, H. Zhang, D. Luo, C. Zhang, and E. K. Park. Flexible frameworks for actionable knowledge discovery. IEEE Transactions on Knowledge and Data Engineering, 22(9):1299–1312, 2010.
 Cui et al. (2015) Z. Cui, W. Chen, Y. He, and Y. Chen. Optimal action extraction for random forests and boosted trees. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.
 DeSarbo and Ramaswamy (1994) W. S. DeSarbo and V. Ramaswamy. CRISP: Customer response based iterative segmentation procedures for response modeling in direct marketing. Journal of Direct Marketing, 8(3):7–20, 1994.
 Fox and Long (2003) M. Fox and D. Long. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research, 20:61–124, 2003.
 Freund and Schapire (1997) Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.
 Friedman et al. (2001) J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. 2001.
 Friedman (2001) J. H. Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232, 2001.
 Helmert (2006) M. Helmert. The fast downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.
 Hilderman and Hamilton (2000) R. J. Hilderman and H. J. Hamilton. Applying objective interestingness measures in data mining systems. In Proc. Principles of Data Mining and Knowledge Discovery, pages 432–439. Springer, 2000.
 Huang et al. (2010) R. Huang, Y. Chen, and W. Zhang. A novel transition based encoding scheme for planning as satisfiability. In Proc. AAAI Conference on Artificial Intelligence, 2010.
 Huang et al. (2012) R. Huang, Y. Chen, and W. Zhang. SAS+ planning as satisfiability. Journal of Artificial Intelligence Research, 43:293–328, 2012.
 Jonsson and Bäckström (1998) P. Jonsson and C. Bäckström. State-variable planning under structural restrictions: Algorithms and complexity. Artificial Intelligence, 100(1–2):125–176, 1998.
 Kautz and Selman (1992) H. Kautz and B. Selman. Planning as satisfiability. In Proc. European Conference on Artificial Intelligence, 1992.
 Levin and Zahavi (1996) N. Levin and J. Zahavi. Segmentation analysis with managerial judgment. Journal of Direct Marketing, 10(3):28–47, 1996.
 Liu and Hsu (1996) B. Liu and W. Hsu. Post-analysis of learned rules. In Proc. AAAI Conference on Artificial Intelligence, pages 828–834, 1996.
 Liu et al. (1999) B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 125–134, 1999.
 Lu et al. (2014) Q. Lu, R. Huang, Y. Chen, Y. Xu, W. Zhang, and G. Chen. A SAT-based approach to cost-sensitive temporally expressive planning. ACM Transactions on Intelligent Systems and Technology, 5(1):18:1–18:35, 2014.
 Lu et al. (2016) Q. Lu, Z. Cui, Y. Chen, and X. Chen. Extracting optimal actionable plans from additive tree models. Frontiers of Computer Science, (In press), 2016.
 Mitchell (1999) T. M. Mitchell. Machine learning and data mining. Communications of the ACM, 42(11):30–36, 1999.
 Mohan et al. (2011) A. Mohan, Z. Chen, and K. Q. Weinberger. Web-search ranking with initialized gradient boosted regression trees. In Journal of Machine Learning Research, volume 14, pages 77–89, 2011.
 Shotton et al. (2013) J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore. Real-time human pose recognition in parts from single depth images. Communications of the ACM, 56(1):116–124, 2013.
 Viola and Jones (2004) P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
 Yang et al. (2003) Q. Yang, J. Yin, C. Ling, and T. Chen. Postprocessing decision trees to extract actionable knowledge. In Proc. IEEE International Conference on Data Mining, pages 685–688, 2003.
 Yang et al. (2007) Q. Yang, J. Yin, C. Ling, and R. Pan. Extracting actionable knowledge from decision trees. IEEE Transactions on Knowledge and Data Engineering, (1):43–56, 2007.