Introduction
In recent years, crowdfunding has rapidly developed into a popular form of financial investment. It is an emerging approach that solicits funds from individuals rather than from traditional investors such as angel investors and banks. More and more people are willing to launch a project (also called a campaign) on the Internet for different purposes. Accordingly, researchers have made tremendous efforts to understand the internal mechanisms of crowdfunding.
Most of the existing works focus on analyzing factors that affect the final results and on predicting the probability of success. However, dynamics tracking, i.e., predicting the funding progress of campaigns, remains an open problem. As shown in Figure 1, the funding progress of a campaign is the cumulative funding amount expressed as a percentage of the pledged goal (e.g., this ended campaign reached 102% of the goal). The task is to forecast a series of percentages for campaigns that are still funding (e.g., on the 12th day, the future progress series is 44.7%, 49.2%, 50.9%, …). This question matters to both fundraisers and investors. Fundraisers can learn what to expect of their campaigns and adapt quickly to the market. Potential backers, before making a backing decision, get more detailed guidance for estimating how a campaign will develop. Some methods have been explored in the literature, including hierarchical regression [38] and basis-synthesis techniques [24]. Others predict the backing distribution of campaigns through a Seq2Seq framework [12]. However, challenges remain in modeling the funding process and in utilizing series patterns.
On the one hand, few of these works treat dynamics tracking as a decision-making process. In fact, the transformation of campaigns and the decision process of investors affect and depend on each other, which evolves into a complicated system [32]. Hence, rather than viewing the dynamics of a funding series as a single process, the inner relationship between investors and campaigns that leads to the exterior results should not be ignored. For instance, the future tendencies of campaigns (e.g., expectation of success) influence the investment decisions of investors, and vice versa. Nevertheless, when modeling the strong relationship between investors and campaigns, it is difficult to reflect how previous backers’ behaviors affect latecomers’ choices, and how backers make decisions based on prior contribution performance and future estimates.
On the other hand, it is also important to incorporate series patterns when tracking the dynamics in crowdfunding. Though pattern-decomposition techniques have been investigated, the in-depth implications of patterns have not been reported. Indeed, the overall pattern of funding series, i.e., the U-shaped pattern, has been examined by Kuppuswamy and Bayus (2017). Figure 1 shows a typical example of the U-shaped pattern: more contributions are likely to occur at the very beginning and the very end of the funding period than in the middle. The sharp increases in the initial stage are partly due to raisers’ social effect and partly due to irrational investment behaviors, which can be explained by the “Herd Effect”, while the rises in the last phase are caused by the “Goal Gradient Effect” [15]. Hence, to precisely utilize the entire U-shaped pattern, an automatic switching mechanism is required to change between subpatterns in pace with the different periods of the funding cycle.
To tackle the challenges above, we first propose a model named Trajectory-based Continuous Control for Crowdfunding (TC3). Specifically, we adopt a Markov decision process (MDP) to describe the interactions between investors and campaigns. To clearly indicate all factors that influence the decisions of investors, reinforcement learning methods, especially actor-critic frameworks, are employed. In our approach, the inner transformation of campaigns is regarded as an environment, while the agent, which interacts with the environment, is the union of investors, along with a critic subcomponent that estimates future expectations. Secondly, to explicitly discriminate different subpatterns within the entire U-shaped pattern, we propose to subdivide the entire pattern into fast-growing and slow-growing parts. Then, inspired by the idea from hierarchical reinforcement learning of segmenting the states and generating corresponding subpolicies, we propose TC3Options to predict the funding progress of campaigns. With the help of an options structure, TC3Options can switch between different subpatterns automatically, which means the typical U-shaped pattern behind the funding series can be precisely utilized. Finally, we conduct extensive experiments on a real-world dataset. The experimental results validate that our method predicts more accurately than other state-of-the-art methods and properly selects subpolicies according to different subpatterns.
Related Work
The related works of our study can be divided into two categories: crowdfunding and reinforcement learning.
Crowdfunding. With the growing popularity of crowdfunding, scholars have studied it from different perspectives [36, 20, 37, 35]. Most previous works can be grouped into three categories: analyzing the influential factors [3, 15, 23, 11], predicting the funding results (i.e., success or failure) [17, 16, 33, 34, 13], and tracking the funding dynamics [38, 24]. Among qualitative factors, it is worth mentioning that some scholars explore the social effects in crowdfunding, especially the “Herd Effect” and the “Goal Gradient Effect” [26, 9, 15], which uncover a typical and significant pattern in funding series, i.e., the U-shaped pattern. For the success prediction task, accuracy can be improved by combining deep learning (DL), natural language processing (NLP) and transfer learning (TL) techniques. However, simply predicting final outcomes cannot reveal the detailed process over the rest of the funding cycle. When it comes to dynamics tracking, Zhao et al. (2017) employ a hierarchical regression model that predicts funding amounts at both the campaign level and the perk level, while other researchers adopt the Fourier transform to capture various patterns hidden behind the funding series [24]. However, it seems that none of these works considers the inner decision-making process between investors and campaigns, which leads to the exterior funding results.
Reinforcement Learning. Developed from Markov decision processes (MDP) [29], deep reinforcement learning (DRL) has proved highly successful in many domains, such as games [22, 10], robotics [14, 7] and recommender systems [4, 19]. Existing methods can be divided into two categories: value-based methods, where policies are indirectly acquired from the estimated value function, and policy-based methods, where policies are directly parameterized [30]. Gradually, actor-critic (AC) frameworks, which combine policy gradient methods with value estimation techniques, have become mainstream [6]. Among AC methods, Lillicrap et al. (2015) proposed the Deep Deterministic Policy Gradient (DDPG) algorithm, which is more effective for continuous action spaces. In the area of hierarchical reinforcement learning, the options structure is a popular framework for temporal abstraction [31]. In this framework, states, actions and policies each have a hierarchical structure when seen from different views. Moreover, the option-critic architecture was proposed under the actor-critic framework [1].
Although reinforcement learning suits circumstances in which previous outputs affect the following inputs, leading to complex changes in series, it can hardly be applied directly to tracking funding dynamics, for two reasons. First, primitive objective functions of reinforcement learning only aim to maximize future rewards, while the prediction of history still needs to be considered when forecasting funding progress. Secondly, the intra-option policy gradient theorem should be adapted for the deterministic case.
TC3 and TC3Options
In this section, we first formally introduce the research problem, followed by the overview of basic TC3 model and final TC3Options model. Then, we introduce the technical details in both of the models.
Symbol  Description 

static features of campaign  
dynamic features of campaign in day  
true funding progress of campaign in day  
estimated funding progress of campaign in day  
state from environment in day  
action from actor in day  
reward from environment in day  
option chosen by actor in day  
deterministic policy that chooses actions  
function that evaluates the action in state  
stochastic policy that chooses options  
termination probability in state and option 
Problem Statement
First, we assume the decision-making process in crowdfunding is as follows. Before an investor determines whether to contribute, she is likely to read a detailed description of the campaign, including the whole story. Along with static information, some changeable information, such as the current funding progress, the number of backers, and all the updates and comments, is also visible. Furthermore, the estimate of the future trend is a crucial factor that deserves to be taken into account. Finally, if the investor decides to support the campaign, she can select one perk; perks vary in the funds required and the returns offered.
Specifically, campaign can be represented by a tuple . Precisely, denotes static features which consist of basic information of a campaign, i.e., campaign description, perk information, a pledged goal, etc. and stand for dynamic features and cumulative funding progress respectively. They are both sequential data. For example, , where is the funding duration that the campaign pledges. Given the previous trajectory of campaign (i.e., and ), the goal is to predict the series of funding progress in the following (e.g., ) days (i.e., ). Here, is a percentage between 0 and 1. Moreover, the dynamic features of campaign (i.e., ) in the
th day are composed of a comments vector and a day information vector .
An Overview of TC3 and TC3Options
The overview of our basic TC3 is shown in Figure 2. After modeling the problem as an MDP, our approach can be viewed as two parts, namely an environment and an agent, along with reward signals that measure the prediction results. Specifically, the environment applies a GRU layer to integrate heterogeneous features from campaigns, while the agent comprises an actor and a critic. The predictions of funding progress are the outputs of the actor. The critic estimates future trends that guide the actor, and it improves the accuracy of its estimates through the reward signals. Finally, we propose TC3Options to capture the U-shaped pattern, in which the actor is specially designed with a structure of options.
MDP Formulation
To model the mutual influence between the behaviors of investors and the dynamics of campaigns, we regard the former as a whole as one agent and the latter as an environment that can be changed by the agent. We then apply single-agent reinforcement learning techniques to let them interact with each other. In particular, the environment is simulated from the true transformation of campaigns and is partly unchangeable (e.g., comments) and partly variable (e.g., funding progress). Therefore, we can define the reward function from the errors between true and estimated dynamics. Specifically, we model the problem described above as an MDP which comprises: a state space , an action space and a reward function . Without defining the state transition distribution , we adopt model-free methods. In addition, a policy is directly applied to select actions according to states, which is also our learning goal. Formally, we define the state, action and reward in this problem as follows.
State. Here, we use a Gated Recurrent Unit (GRU) [5] layer to capture the information of dynamic inputs (i.e., day information and comments). We denote the hidden states produced by the GRU layer as , which is also the defined state of the environment, i.e., . Note that only the dynamic features of the current day are fed in, and the GRU layer aggregates useful information since the first day of the campaign, so this should not contradict the Markov assumption in the decision-making process of investors.
Action. The possible percentages of the pledged goals make up the action space, which is continuous. Due to the unbalanced popularity of campaigns, some may rise to hundreds of times their goals while others achieve less than one percent. Applying a deterministic policy, the output of the actor component directly gives the estimated funding progress in the next day, i.e., . Then, to learn from the varied results that come from a series of changes, we replace the true funding progress in the dynamic features of the next day with the estimated one.
Reward. After observing state and taking action , an immediate reward with respect to needs to be returned, measuring the error between the selected action (i.e., estimated funding progress) and the optimal action (i.e., true funding progress) in the current day. The primitive goal of reinforcement learning is to maximize the discounted return from the current th day to the end of the funding cycle, i.e., , where is the discount factor. Hence, we select a positive, continuous and differentiable function that decreases monotonically as the absolute error increases. In addition, to avoid violent fluctuations, a moving average might be required.
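The reward design described above can be illustrated with a minimal sketch. The source only requires a positive, continuous, differentiable function that decreases monotonically with the absolute error; the Gaussian-shaped form `exp(-scale * err**2)`, the trailing moving average, and all names and default values below are assumptions chosen for illustration, not the function used in the paper.

```python
import math

def reward(estimated, true, scale=1.0):
    """Positive, smooth reward that shrinks as |estimated - true| grows.

    The Gaussian shape is an assumed example of such a function.
    """
    err = estimated - true
    return math.exp(-scale * err * err)

def moving_average(values, window=3):
    """Trailing moving average; early entries average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * r_k from the current day to the end of the cycle."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Toy trajectory: true and estimated funding progress for three days.
true_progress = [0.40, 0.45, 0.49]
estimates = [0.42, 0.44, 0.55]
rewards = [reward(e, y) for e, y in zip(estimates, true_progress)]
smoothed = moving_average(rewards)
ret = discounted_return(rewards)
```

A perfect estimate yields the maximum reward of 1, and larger errors push the reward toward 0 without ever making it negative.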
Components of TC3 and TC3Options
In this subsection, we will first introduce the loss functions of the actor and critic components in the basic TC3. They are denoted by and respectively. Furthermore, the actor is extended with a structure of options, and the intra-option deterministic policy gradient is derived for that setting.
Basic TC3.
Here, we derive how the actor predicts the funding progress based on the loss functions with respect to future estimates (i.e., ) and past experiences (i.e., ).
Both the actor and the critic components are parametric neural networks. The learning goal of our models is exactly the policy, namely the function approximated by the actor, which directly maps the hidden state space to the action space, i.e.,
where denotes the parameters in the actor component. Meanwhile, the critic component evaluates the policy with respect to state-action pairs, denoted by . Equivalently, it learns to estimate the future expectations of accumulated rewards, which measure the errors between possible and true future transformation after taking the current action. In the th day, once the agent receives the immediate reward , the critic can be updated by minimizing the mean square of the one-step temporal differences , as shown in the following equation:
(1) 
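As a numerical illustration of the critic update described above, the following sketch computes a mean squared one-step temporal-difference loss. It assumes the standard form in which the target is the immediate reward plus the discounted value of the next state-action pair; the function names, the tuple layout of a transition, and the discount value are hypothetical.

```python
def td_error(r, q_current, q_next, gamma=0.9, terminal=False):
    """One-step temporal difference: r + gamma * Q(s', a') - Q(s, a)."""
    target = r if terminal else r + gamma * q_next
    return target - q_current

def critic_loss(transitions, gamma=0.9):
    """Mean of squared one-step TD errors over sampled transitions.

    Each transition is a tuple (r, q_current, q_next, terminal), where the
    Q-values are assumed to come from the critic (and its target copy).
    """
    errors = [td_error(r, qc, qn, gamma, term) for r, qc, qn, term in transitions]
    return sum(e * e for e in errors) / len(errors)

# Two toy transitions: one mid-trajectory, one on the last day of a campaign.
batch = [(1.0, 2.0, 2.0, False), (0.5, 0.5, 0.0, True)]
loss = critic_loss(batch)
```

In this toy batch only the first transition contributes, since the terminal transition is already predicted exactly.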
On the other hand, taking advantage of estimated values from the critic, the actor is partly aimed at taking actions that could maximize discounted return , which equally means selecting actions that can minimize the after the th day, where
(2) 
Here, means the distribution of state under the policy and there is no need to compute the gradient of discounted return with respect to this state distribution [27]. Because the reward is determined by the action and , the is ultimately a function of the policy , whose parameters are denoted by .
While according to Deterministic Policy Gradient Theorem [27], the actor can be improved in the direction of the gradient of the critic, i.e.,
(3) 
As shown in Equation 3, the gradient of is the negative expectation of the product of two gradients. The former is the gradient of the policy with respect to , and the latter is the gradient of the value function with respect to the action , where is determined by . Here, all expectations are estimated by Monte Carlo sampling.
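The product-of-gradients structure can be made concrete with a toy example. The sketch below assumes a one-parameter linear actor and a hand-written quadratic critic, both hypothetical, so that the two gradients can be written in closed form; it is not the network architecture of TC3.

```python
def actor(theta, s):
    """Toy deterministic policy: action = theta * state."""
    return theta * s

def dq_da(s, a):
    """Gradient with respect to a of a toy critic Q(s, a) = -(a - 2*s)**2."""
    return -2.0 * (a - 2.0 * s)

def actor_loss_grad(theta, states):
    """Negative mean of d(actor)/d(theta) * dQ/da over sampled states.

    For the linear actor, d(actor)/d(theta) is simply the state s.
    The mean over `states` plays the role of the Monte Carlo estimate.
    """
    grads = [s * dq_da(s, actor(theta, s)) for s in states]
    return -sum(grads) / len(grads)

theta = 0.0
states = [0.2, 0.5, 1.0]
for _ in range(200):
    # Gradient descent on the actor loss, i.e. ascent on the critic's value.
    theta -= 0.1 * actor_loss_grad(theta, states)
```

Because the toy critic is maximized at `a = 2 * s`, the parameter `theta` approaches 2, which shows the actor following the critic's gradient.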
Additionally, the actor should not only estimate the future influence but also predict the funding progress based on experienced real trajectories, namely minimizing the mean square errors between actual funding progress and estimated one before the th day. Considering the relationship between and that , this part of loss function can be written as:
(4) 
From a practical viewpoint, it could also be regarded as a form of regularization because we want the agent to simulate future changes as closely as possible to the real ones. In other words, the actor should be updated in the direction with the maximum likelihood, especially in the initial stage where the gradient direction of the critic is uncertain.
Actor with Options.
Here, we introduce how the actor component in TC3 is extended with a structure of options. After adapting previous loss functions, a defined termination loss function is added to the final .
To begin with, we provide an informal intuition for utilizing the entire U-shaped pattern under the framework of options. After judging which stage a campaign is in, the actor takes a subpolicy that captures the fast-growing subpatterns (i.e., gives an optimistic prediction) if the campaign is in the beginning period, or in the ending period and close to its goal. Otherwise, it switches to another subpolicy that captures the slow-growing subpatterns (i.e., gives a smoother result) if the campaign is in the phase where the increase is gentle. To that end, the primitive policy defined in the previous subsection is diversified with separate parameters and a high-level policy is needed to select proper .
Formally, an option could be represented by a tuple . Specifically, , and denote the initial states, low-level policy and termination function of option respectively. The set of initial states with respect to option is a subset of the state space. In this work, we follow the assumption that all states are available to every option [1]. The primitive policy is the deterministic low-level policy, as opposed to the stochastic high-level policy . The termination function gives the probability that the agent will quit the current option.
As shown in Figure 3, in the th day, the observation is the state and the option of the last day . If terminating, the agent selects a new option according to the stochastic high-level policy , otherwise . Then, the action to be taken is determined by . With the next state received, the agent terminates this option in the next day with the probability of .
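The execution loop just described can be sketched as follows. All functions here are hypothetical stand-ins: the state is reduced to the elapsed fraction of the funding cycle, the high-level policy and the termination function are hand-written (and deterministic, whereas the high-level policy in the model is stochastic), and the two low-level policies simply add a larger or smaller increment.

```python
import random

def run_options(states, low_level, high_level, termination, rng):
    """Roll out one trajectory, switching options when the current one terminates."""
    option = None
    history = []
    for s in states:
        # Pick a new option at the start, or when the previous option
        # terminates in the newly observed state.
        if option is None or rng.random() < termination(s, option):
            option = high_level(s, rng)
        history.append((option, low_level[option](s)))
    return history

def stage_option(s):
    """Toy rule: option 0 (fast-growing) near both ends, option 1 in the middle."""
    return 0 if s < 0.25 or s > 0.75 else 1

high_level = lambda s, rng: stage_option(s)
termination = lambda s, option: 0.0 if option == stage_option(s) else 1.0
low_level = {0: lambda s: s + 0.05, 1: lambda s: s + 0.01}

states = [0.1, 0.2, 0.5, 0.6, 0.9]
history = run_options(states, low_level, high_level, termination, random.Random(0))
options_used = [o for o, _ in history]
```

With these toy rules the agent starts with the fast-growing option, switches to the slow-growing one in the middle of the cycle, and switches back near the end.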
The Intra-Option Policy Gradient Theorem has been derived for the case where the low-level policy is stochastic [1]. However, we employ a deterministic one here, so the corresponding loss functions should be modified. The basic idea behind the following equations is that the state-option pair now acts as an extension of the primitive state. Hence, , and are adjusted to , and respectively.
We first modify the loss function of . The idea of the one-step temporal difference is still effective, provided that the probability of termination with respect to the current option and the next state is considered. Specifically, is introduced to compute one-step estimated values. If termination does not happen, the original estimated value can be applied directly. However, if the option terminates, the greedy approach is employed to estimate through the maximum over all options, i.e.,
(5) 
As a result, could still be represented by the mean square error of modified :
(6) 
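The modified one-step estimate can be illustrated numerically. The sketch below assumes the usual option-critic form, in which the value upon arrival mixes the value of continuing the current option and the maximum value over all options, weighted by the termination probability; the names `next_value` and `option_td_error` and the dictionary representation of option values are hypothetical.

```python
def next_value(q_next, option, beta):
    """Value of arriving in the next state while executing `option`.

    q_next maps each option to its estimated value in the next state;
    beta is the probability that `option` terminates there.
    """
    keep = q_next[option]            # continue with the current option
    switch = max(q_next.values())    # greedy choice after termination
    return (1.0 - beta) * keep + beta * switch

def option_td_error(r, q_current, q_next, option, beta, gamma=0.9):
    """One-step temporal difference on the extended (state, option) pair."""
    return r + gamma * next_value(q_next, option, beta) - q_current

q_next = {0: 1.0, 1: 3.0}
u = next_value(q_next, option=0, beta=0.25)
delta = option_td_error(r=0.5, q_current=1.0, q_next=q_next, option=0, beta=0.25)
```

Squaring and averaging `delta` over sampled transitions then gives a critic loss of the same mean-square form as in the basic model.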
When it comes to , analogous to Equation 3, the deterministic form of the intra-option (i.e., low-level) policy gradient with respect to the extended state can be written as:
(7) 
where the gradient of with respect to should be computed in the case of .
Training Strategy
While the final in TC3 and TC3Options are directly described in Equations 1 and 6 respectively, the in both of our models are composed of different parts.
Considering another obvious prior, namely that the funding progress of a campaign increases monotonically, we add the following restriction:
(8) 
where the square errors of adjacent predicted funding progress are penalized only when is greater than .
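A minimal sketch of such a monotonicity penalty is given below. It penalizes the squared difference between adjacent predictions only when the earlier prediction exceeds the later one; averaging over adjacent pairs and the function name are assumptions made for illustration.

```python
def monotonic_penalty(predicted):
    """Mean squared drop between adjacent predictions; rises cost nothing."""
    drops = [max(0.0, prev - nxt) ** 2 for prev, nxt in zip(predicted, predicted[1:])]
    return sum(drops) / len(drops) if drops else 0.0

increasing = monotonic_penalty([0.10, 0.20, 0.30])   # no violation
violating = monotonic_penalty([0.50, 0.30])          # one drop of 0.2
```

A non-decreasing prediction series incurs zero penalty, so the term only acts when the predicted cumulative progress falls.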
As a result, when it comes to in the basic TC3, the losses of future estimates, i.e., and past experiences, i.e., are combined, along with :
(9) 
While in the TC3Options still needs to integrate the loss of termination function, i.e., :
(10) 
Here, we do not prescribe how to acquire the high-level policy since many approaches could be used, such as primitive policy gradient, planning or temporal difference updates. However, computing in addition to seems wasteful. Therefore, we obtain according to , with an adaptive epsilon-greedy policy adopted to balance exploration and exploitation.
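Selecting options from estimated option values with an epsilon-greedy rule can be sketched as follows. The source does not specify how epsilon adapts; the linear decay schedule, its default values, and the function names are assumptions.

```python
import random

def select_option(option_values, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise take the best-valued option."""
    if rng.random() < epsilon:
        return rng.randrange(len(option_values))
    return max(range(len(option_values)), key=lambda o: option_values[o])

def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linear decay from start to end; one possible reading of 'adaptive'."""
    frac = min(1.0, step / decay_steps)
    return start + frac * (end - start)

rng = random.Random(0)
greedy_choice = select_option([0.1, 0.7], epsilon=0.0, rng=rng)
eps_mid = decayed_epsilon(500)
```

Early in training the agent mostly explores options at random, and it increasingly trusts the critic's option values as epsilon decays.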
Some other tricks need to be explained. The first is copying the actor and critic networks to target networks, whose parameter updates are delayed [18]. Secondly, we store complete trajectories in the experience replay (i.e., ) instead of one-step interactions (i.e., ) [8]. Both aim at more stable learning and faster convergence.
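Both tricks can be sketched in a few lines. The soft (Polyak-style) update below is only one common way to delay target-network updates and is an assumption here, as are the flat parameter lists, the class name `TrajectoryReplay`, and the default values.

```python
import random
from collections import deque

def soft_update(target, online, tau=0.01):
    """Move each target parameter a small step toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]

class TrajectoryReplay:
    """Stores whole trajectories rather than single transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest trajectories are evicted
        self.rng = random.Random(seed)

    def add(self, trajectory):
        self.buffer.append(list(trajectory))

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

target_params = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)

replay = TrajectoryReplay(capacity=2)
for campaign in ([0.1, 0.2], [0.3, 0.5, 0.6], [0.2, 0.9]):
    replay.add(campaign)
batch = replay.sample(5)
```

Sampling whole trajectories keeps the day-by-day order within each campaign, which a recurrent state encoder needs.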
Experiments
In this section, we first introduce the dataset we collect from Indiegogo. Then, the detailed experimental setup follows. Finally, the results of experiments are demonstrated, especially the validation of the U-shaped pattern.
Dataset Description
We collect a real-world dataset from Indiegogo, a well-known reward-based crowdfunding platform. The dataset includes 14,143 campaigns launched from July 2011 to May 2016, soliciting over 18 billion funds from 217,156 backers. In total, there are 98,923 perks and 240,922 comments, along with 1,862,097 backing records. According to the statistics, 62.54% of the campaigns in our dataset have a pledged funding duration between 30 and 60 days. However, 7.14% of the campaigns still have a funding duration between 15 and 25 days.
Level  Features  Type
Static  campaign description  textual
Static  perk description  textual
Static  campaign’s category  categorical
Static  creator’s type  categorical
Static  funding duration  numerical
Static  pledged goal  numerical
Static  number of perks  numerical
Static  number of comments  numerical
Static  max/min/avg price of perks  numerical
Dynamic  comments  textual
Dynamic  number of day started  numerical
Dynamic  number of day left  numerical
Dynamic  current schedule  numerical
Experimental Setup
Parameter Setting.
For the static features of campaigns, we adopt one-hot encoding for categorical features and word2vec embedding [21] for textual features (each with a 50-dimensional vector). Finally, all kinds of static features are concatenated into 182-dimensional vectors. The dynamic features are 19-dimensional vectors composed of textual comments (16-dimensional by word2vec embedding), day information (2-dimensional) and current funding progress. All kinds of features are scaled by Min-Max normalization.
Additionally, considering that the shortest funding duration of campaigns in our training and testing data is 15 days, we set the number of days to be predicted to 6, 7, 8, 9 and 10 respectively. With respect to the coefficients of the regularization terms, we set , and to be 100, 1 and 1 respectively.
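The feature construction described above can be sketched on toy data. The example below shows one-hot encoding, Min-Max normalization and concatenation in plain Python; the category vocabulary, the goal values and the 3-dimensional stand-in for a text embedding are all made up for illustration, and a real pipeline would use trained word2vec vectors.

```python
def min_max_scale(column):
    """Scale a numeric column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def one_hot(value, vocabulary):
    """Indicator vector with a 1 at the position of `value` in `vocabulary`."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def build_static_vector(category, scaled_goal, text_embedding, vocabulary):
    """Concatenate categorical, numerical and textual parts into one vector."""
    return one_hot(category, vocabulary) + [scaled_goal] + list(text_embedding)

vocabulary = ["film", "music", "tech"]            # hypothetical categories
goals = [1000.0, 5000.0, 3000.0]                  # hypothetical pledged goals
scaled_goals = min_max_scale(goals)
vector = build_static_vector("music", scaled_goals[2], [0.2, 0.4, 0.6], vocabulary)
```

The same concatenation idea applies to the dynamic features, with the comment embedding, the day information and the current funding progress in place of the static fields.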
Evaluation Metrics. First, we randomly select 10% of all campaigns in our dataset as the testing set. Then, since the task is to predict the series of future funding progress for campaigns, we adopt three metrics to evaluate the performance: root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Specifically, for campaign with a pledged duration of days, given its series of real funding progress in the final days (i.e., ) and the series of predicted funding progress (i.e., ), the performance can be measured by:
(11)  
(12)  
(13) 
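For reference, the three metrics can be sketched with their standard definitions for a single campaign. How the scores are aggregated across campaigns is not recoverable from the text above, so the per-series form, the function names, and the choice to report MAPE in percent are assumptions.

```python
import math

def rmse(true, pred):
    """Root mean square error over one series."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(true, pred)) / len(true))

def mae(true, pred):
    """Mean absolute error over one series."""
    return sum(abs(y - p) for y, p in zip(true, pred)) / len(true)

def mape(true, pred):
    """Mean absolute percentage error, in percent; true values must be nonzero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(true, pred)) / len(true)

true_progress = [0.5, 1.0]
predicted = [0.4, 1.1]
scores = (rmse(true_progress, predicted),
          mae(true_progress, predicted),
          mape(true_progress, predicted))
```

Because MAPE divides by the true progress, the same absolute error weighs more for campaigns that have raised little, which is relevant when comparing it with MAE and RMSE.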
Benchmark Methods.

VAR (Vector Autoregression) [28] models generalize the univariate autoregressive (AR) model by allowing for more than one evolving variable.

RFR (Random Forest Regression) is an ensemble method that balances the regression results of all its decision trees.

SWR (Switching Regression) [38] is a variant of the regression model that combines campaign-level and perk-level regression results.

MLP (Multilayer Perceptron) [2] is a kind of artificial neural network that performs well in dealing with high-dimensional features.

SMPA [12] is a variant of the Seq2Seq model that uses an encoder to track the history dynamics and a decoder to predict the future dynamics, along with the monotonically increasing prior.

TC3 is our proposed basic model, which utilizes the actor-critic architecture to simulate the decision-making process between investors and campaigns.

TC3Options is the complete model, which combines the basic TC3 with a structure of options to utilize the U-shaped pattern in crowdfunding.
Test Length  Metric  VAR  RFR  SWR  MLP  SMPA  TC3  TC3Options
6-day  MAE  0.1245  0.1392  0.1104  0.0648  0.0372  0.0234  0.0201
6-day  RMSE  0.1935  0.2115  0.1695  0.1368  0.0927  0.0681  0.0435
6-day  MAPE  41.23%  40.23%  35.03%  22.54%  12.82%  11.72%  9.05%
7-day  MAE  0.1668  0.1533  0.1365  0.0990  0.0399  0.0258  0.0237
7-day  RMSE  0.2790  0.2535  0.2046  0.1671  0.0963  0.0771  0.0495
7-day  MAPE  52.65%  46.15%  36.21%  26.97%  13.66%  12.27%  9.27%
8-day  MAE  0.1515  0.1443  0.1317  0.0921  0.0420  0.0267  0.0252
8-day  RMSE  0.2574  0.2401  0.1989  0.1542  0.1080  0.0792  0.0504
8-day  MAPE  47.84%  43.63%  35.65%  23.82%  12.58%  11.26%  8.76%
9-day  MAE  0.1494  0.1437  0.1257  0.0960  0.0414  0.0258  0.0249
9-day  RMSE  0.2466  0.2268  0.1944  0.1581  0.0942  0.0759  0.0483
9-day  MAPE  44.84%  40.96%  34.15%  24.69%  11.91%  10.37%  8.18%
10-day  MAE  0.1467  0.1338  0.1209  0.0987  0.0408  0.0249  0.0237
10-day  RMSE  0.2394  0.2197  0.1905  0.1647  0.1011  0.0741  0.0465
10-day  MAPE  42.38%  38.73%  32.20%  25.91%  11.31%  9.71%  7.60%
Experimental Results
Performance on Funding Progress Prediction.
Here, we compare performance on the funding progress prediction task. Table 3 reports the RMSE, MAE and MAPE metrics when testing on the last 6, 7, 8, 9 and 10 days respectively. Overall, both of our proposed models (TC3, TC3Options) outperform the baselines in all cases, which indicates that modeling the decision process between investors and campaigns is helpful when tracking the dynamics in crowdfunding, especially when future tendencies are explicitly considered. Secondly, TC3Options performs better than TC3, which suggests that utilizing a well-learned pattern improves the accuracy of prediction. However, the improvements from TC3 to TC3Options are more evident in the RMSE and MAPE metrics than in the MAE metric. It is possible that the RMSE decreases because of a smoother distribution of errors, while the MAPE falls due to more accurate prediction for campaigns with fewer contributions. Thirdly, the neural network models (MLP, SMPA, TC3, TC3Options) outperform the regression-based models (VAR, RFR, SWR) as a whole, which suggests that this kind of method deals better with high-dimensional features.
Furthermore, as the test length grows, the metrics do not increase monotonically, which does not agree with intuition. To explore this further, we measure the per-day performance of our proposed TC3 and TC3Options (with 2 options) from the 1st to the 5th day when the test length is 10 days. The results are shown in Figure 4, which demonstrates that the errors of the first and second days are evidently smaller than those of the other days. A likely explanation is that, since the test length is over one week, most campaigns are going through the gentle rise phase at the beginning of the test period. As a result, smaller fluctuations seem to make prediction easier for the algorithm. Indeed, the MAPE of the first 5 days is 5.62%, compared with 9.54% for the following 5 days.
#options  1  2  3  4  5 

MAE  0.0234  0.0201  0.0237  0.0246  0.0261 
RMSE  0.0681  0.0435  0.0468  0.0531  0.0582 
MAPE  11.72%  9.05%  10.40%  11.52%  11.06% 
Parameters Effects.
In this subsection, we conduct a group of experiments to test the influence of the number of options, with the other parameters (test length, learning rate, training steps, etc.) kept the same. Note that when the number of options is 1, the TC3Options model degenerates to the basic TC3 model. The results are shown in Table 4. Our model performs best when the number of options is 2. However, the model does not learn better as the number increases further, although it still outperforms the basic TC3 model on RMSE and MAPE. A likely explanation is that the model is forced to learn more subpatterns hidden in the data, while the data may not be complicated enough for so many patterns; hence, the model could be confused when selecting the proper option.
Ushaped Verification.
Before validating the learned U-shaped pattern, we conduct a statistical experiment to show the U-shaped pattern hidden in the dataset. Concretely, we divide the whole funding cycle into eight parts according to the trajectory length and then calculate the mean percentage of the contributions in each part. The results are shown in Figure 4(a), in which the U-shaped pattern is obvious.
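The statistic just described can be sketched as follows, assuming each campaign is given as a list of daily contribution amounts. The function names, the rule for assigning days to parts, and the two toy campaigns are hypothetical.

```python
def share_by_period(daily_amounts, parts=8):
    """Fraction of a campaign's total funding raised in each of `parts` equal slices."""
    total = sum(daily_amounts)
    n = len(daily_amounts)
    shares = [0.0] * parts
    if total == 0:
        return shares
    for day, amount in enumerate(daily_amounts):
        # Map the day index onto one of the equal-length parts of the cycle.
        shares[min(parts - 1, day * parts // n)] += amount / total
    return shares

def mean_shares(campaigns, parts=8):
    """Average the per-campaign shares part by part."""
    per_campaign = [share_by_period(c, parts) for c in campaigns]
    return [sum(col) / len(col) for col in zip(*per_campaign)]

campaigns = [
    [4, 1, 1, 1, 1, 1, 1, 6],   # toy 8-day campaign with strong start and finish
    [1] * 16,                   # toy 16-day campaign with flat contributions
]
profile = mean_shares(campaigns)
```

Normalizing by trajectory length makes campaigns of different durations comparable, and a U shape shows up as larger values in the first and last parts of `profile` than in the middle ones.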
Finally, we verify whether the pattern learned by our model is the U-shaped pattern when the number of options is 2. Ideally, a well-trained high-level policy selects proper options according to the input states, and the termination function gives an appropriate probability of terminating the current option. Notably, the high-level policy does not select an option at every step. Hence, a more effective approach is to observe the values of the termination function at every step. Owing to the experimental setting, the options would terminate with the probability of . If the termination value of the current option is low, the model is more likely to terminate and select a new option in the next day.
The average values of and in different periods are shown in Figure 4(b). The partition of the whole funding cycle is the same as above. The results illustrate that the first option has a high termination probability in the start and end stages, while the second option shows a relatively high termination value in the middle phase, which implies that the low-level policy of the first option learns from the fast-growing subpatterns while the policy of the other option learns from the slow-growing subpatterns. This is further supported by the average difference of outputs between the two subpolicies, plotted on the other y-axis, which shows that the mean outputs from of Option 1 are all greater than those from of Option 2.
Conclusions
In this paper, we presented a focused study on forecasting dynamics in crowdfunding with an exploratory insight. Inspired by techniques of reinforcement learning, especially hierarchical reinforcement learning, we first proposed a basic model that forecasts the funding progress based on the decision-making process between investors and campaigns. Then, having observed the typical U-shaped pattern behind the crowdfunding series, we designed a specific actor component with a structure of options to fit the various subpatterns in different stages of funding cycles. We validated the effectiveness of our proposed TC3 and TC3Options models by comparing them with other state-of-the-art methods. Moreover, extra experiments were conducted to demonstrate that the entire pattern learned by TC3Options is exactly the U-shaped one.
In the future, we will generalize this framework to capture the dynamic pattern-switching process in other tasks that can be modeled as sequential decision-making processes.
Acknowledgements
This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000904), the National Natural Science Foundation of China (Grants No. 61672483, 61922073), and the Science Foundation of Ministry of Education of China & China Mobile (No. MCM20170507). Qi Liu acknowledges the support of the Young Elite Scientist Sponsorship Program of CAST and the Youth Innovation Promotion Association of CAS (No. 2014299).
References

[1] (2017) The option-critic architecture. In Thirty-First AAAI Conference on Artificial Intelligence.
[2] (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning 2 (1), pp. 1–127.
[3] (2013) An empirical examination of the antecedents and consequences of contribution patterns in crowdfunded markets. Information Systems Research 24 (3), pp. 499–519.
[4] (2019) Top-k off-policy correction for a REINFORCE recommender system. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 456–464.
[5] (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[6] (2012) Model-free reinforcement learning with continuous action in practice. In 2012 American Control Conference (ACC), pp. 2177–2182.
[7] (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290.
[8] (2015) Memory-based control with recurrent neural networks. arXiv preprint arXiv:1512.04455.
[9] (2011) Strategic herding behavior in peer-to-peer loan auctions. Journal of Interactive Marketing 25 (1), pp. 27–36.
[10] (2018) Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[11] (2018) How do investors decide? An interdisciplinary review of decision-making in crowdfunding. Electronic Markets 28 (3), pp. 339–365.
[12] (2019) Estimating the days to success of campaigns in crowdfunding: a deep survival perspective. In Thirty-Third AAAI Conference on Artificial Intelligence.
[13] (2019) Predicting outcomes in crowdfunding campaigns with textual, visual, and linguistic signals. Small Business Economics, pp. 1–23.
[14] (2013) Reinforcement learning in robotics: a survey. The International Journal of Robotics Research 32 (11), pp. 1238–1274.
[15] (2017) Crowdfunding creative ideas: the dynamics of project backers in Kickstarter. A shorter version of this paper is in "The Economics of Crowdfunding: Startups, Portals, and Investor Behavior", L. Hornuf and D. Cumming (eds.).
[16] (2018) Content-based success prediction of crowdfunding campaigns: a deep learning approach. In Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 193–196.
[17] (2016) Project success prediction in crowdfunding environments. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 247–256.
[18] (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[19] (2019) Exploiting cognitive structure for adaptive learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4–8, 2019, pp. 627–635.
[20] (2017) Enhancing campaign design in crowdfunding: a product supply optimization perspective. In IJCAI, pp. 695–702.
[21] (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
[22] (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529.
[23] (2014) The dynamics of crowdfunding: an exploratory study. Journal of Business Venturing 29 (1), pp. 1–16.
[24] (2018) Tracking and forecasting dynamics in crowdfunding: a basis-synthesis approach. In 2018 IEEE International Conference on Data Mining (ICDM), pp. 1212–1217.
[25] (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
[26] (2010) Follow the profit or the herd? Exploring social effects in peer-to-peer lending. In 2010 IEEE Second International Conference on Social Computing, pp. 137–144.
[27] (2014) Deterministic policy gradient algorithms. In International Conference on Machine Learning, pp. 387–395.
[28] (1980) Macroeconomics and reality. Econometrica: Journal of the Econometric Society, pp. 1–48.
[29] (2018) Reinforcement learning: an introduction. MIT Press.
[30] (2000) Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pp. 1057–1063.
[31] (1999) Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112 (1–2), pp. 181–211.
[32] (2019) Success factors and complex dynamics of crowdfunding: an empirical research on Taobao platform in China. Electronic Markets 29 (2), pp. 187–199.
[33] (2018) Prediction of crowdfunding project success with deep learning. In 2018 IEEE 15th International Conference on e-Business Engineering (ICEBE), pp. 1–8.
[34] (2019) Interactive attention transfer network for cross-domain sentiment classification. In Thirty-Third AAAI Conference on Artificial Intelligence.
[35] (2019) Personalized recommendation for crowdfunding platform: a multi-objective approach. In 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 3316–3324.
[36] (2019) Voice of charity: prospecting the donation recurrence & donor retention in crowdfunding. IEEE Transactions on Knowledge and Data Engineering.
[37] (2017) A sequential approach to market state modeling and analysis in online P2P lending. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48 (1), pp. 21–33.
[38] (2017) Tracking the dynamics in crowdfunding. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 625–634.