Computational cognitive modeling is an approach in the cognitive sciences that explores human cognition by implementing detailed computational models. This enables researchers to execute their models and simulate human behavior [Sun (2008)]. Due to their executability, computational models have to be defined precisely, which eliminates the ambiguities that appear in verbal-conceptual models. By conducting the same experiments with humans and an executable cognitive model, the plausibility of a model can be tested and the model gradually improved.
To implement cognitive models, it is helpful to introduce cognitive architectures, which bundle well-investigated research results from several disciplines of psychology into a unified theory. On the basis of such an architecture, researchers can implement domain-specific computational models without having to remodel fundamental psychological findings. Additionally, cognitive architectures ideally constrain modeling to plausible models, which facilitates the modeling process [Taatgen et al. (2006)].
One of the most popular cognitive architectures is Adaptive Control of Thought – Rational (ACT-R), a production rule system introduced by John R. Anderson [Anderson and Lebiere (1998), Anderson et al. (2004)]. It has been used to model cognitive tasks like learning the past tense [Taatgen and Anderson (2002)], but is also used in human-computer interaction or to improve educational software by simulating human students [Anderson et al. (2004), p. 1045 sqq.]. Although providing a theory of the psychological foundations, ACT-R lacks a formal definition of its underlying concepts from a mathematical-computational point of view. This led to a reference implementation full of assumptions and technical artifacts beyond the theory, making it difficult to survey and inhibiting adaptability and extensibility. The situation improved with the modularization of the psychological theory, but it is still difficult to exchange more central parts of the implementation like conflict resolution [Stewart and West (2007)].
To overcome these drawbacks, we have formalized parts of the implementation, closing the gap between the psychological theory and the technical implementation. We describe an implementation of ACT-R which has been derived from our formalization using Constraint Handling Rules (CHR). Due to the power of logic programming, our implementation is very close to the formalization and leads to short and concise code covering the fundamental parts of the ACT-R theory. For the compilation of ACT-R models to CHR programs, source-to-source transformation is used. Our implementation is highly adaptable. In this paper, this is demonstrated by integrating four different conflict resolution strategies. Despite its proximity to the theory, the implementation can reproduce the results of the original implementation, as exemplified in the evaluation of our work. The formalization may support the understanding of the details of our implementation, hence we refer to [Gall (2013)] and the online appendix (A).
In section 2, we give an overview of the fundamental concepts of ACT-R and briefly describe their implementation in CHR. Section 3 describes the general conflict resolution process of ACT-R. Then the implementation of four different conflict resolution strategies proposed in the literature is presented. To evaluate our implementations, in section 4 we use an example to compare the results of our implementation with those of the reference implementations where available. Finally, related work is presented in section 5 and a conclusion is given in section 6.
2 A CHR implementation of ACT-R
In the following, a short overview of the fundamental concepts of the ACT-R theory and their transfer to CHR is given. For reasons of space, we refer to the literature for an introduction to CHR [Frühwirth (2009)]. For a more detailed introduction to ACT-R, see [Anderson et al. (2004)] and [Taatgen et al. (2006)]. The reference implementation of ACT-R is written in Lisp and can be obtained from the ACT-R website [ACT-R (2014)]. Details of our implementation including the formalization it is based on can be found in [Gall (2013)]. Parts of the formalization are located in the online appendix (A).
2.1 ACT-R
ACT-R is a production rule system which distinguishes two types of knowledge: declarative knowledge holding static facts, and procedural knowledge representing the processes controlling human cognition. For example, in a model of the game rock, paper, scissors, a declarative fact could be “The opponent played scissors”, whereas procedural information could be that a round is won if we play rock and the opponent plays scissors. Declarative knowledge is represented as chunks. Each chunk consists of a symbolic name and labeled slots which hold symbolic values. The values can refer to other chunk names, i.e. chunks can be connected. Chunks are typed, i.e. the number and names of the slots provided by a chunk are determined by a type. As usual for production rule systems, procedural knowledge is represented as rules of the form IF conditions THEN actions. Conditions match values of chunks; actions modify them.
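As a language-neutral illustration of typed chunks, the structure described above can be sketched in Python. This is a hypothetical rendering for the running example only; the names (CHUNK_TYPES, make_chunk) are ours and are not part of ACT-R or our CHR implementation:

```python
# A chunk type determines the number and names of the slots a chunk provides.
CHUNK_TYPES = {"game": ("me", "opponent", "result")}

def make_chunk(name, ctype, **slots):
    """Build a typed chunk: a symbolic name plus labeled slots."""
    allowed = CHUNK_TYPES[ctype]
    # consistency: a chunk may only provide the slots permitted by its type,
    # and each slot holds at most one value (enforced by the dict)
    assert all(s in allowed for s in slots)
    return {"name": name, "type": ctype, "slots": dict(slots)}

c = make_chunk("g1", "game", me="rock", opponent="scissors")
```

Slot values such as "rock" could equally well be the names of other chunks, connecting chunks into a network.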
The psychological theory of ACT-R is modular: there are modules for each function of the human mind, like a declarative module holding the declarative facts, a goal module keeping track of the current goal of a task and buffering information, and a procedural module holding the procedural information and controlling the cognitive process. There are also modules to interact with the environment, like a visual module perceiving the visual field. The modules are independent of each other, i.e. there is no direct communication between them. Each module has a fixed number of buffers associated with it. A buffer can hold at most one piece of information at a time, i.e. one chunk. Modules can put chunks into their associated buffers.
The core of the system is the procedural module, which can access the buffers of all other modules but does not have a buffer of its own. It consists of a procedural memory with a set of production rules. The conditions of a production rule refer to the contents of the buffers, i.e. they match the values of the chunks’ slots. The formal applicability condition of rules can be found in the online appendix (A).
There are three types of actions, whose arguments are encoded as chunks as well: First, buffer modifications change the content of a buffer, i.e. the values of some of the slots of a chunk in a buffer. Second, the procedural module can state requests to other modules, which then change the contents of their buffers. Finally, buffer clearings remove the chunk from a buffer. Although our implementation can handle requests and clearings, we only consider buffer modifications in this work for the sake of simplicity.
Consider the following rule:
(p recognize-win
   =goal>
      isa       game
      me        rock
      opponent  scissors
==>
   =goal>
      result    win
)
It recognizes a win situation in the game rock, paper, scissors if the model has realized that the opponent played scissors and the agent played rock (which could be accomplished by a corresponding production rule interacting with the visual module). The situation is represented by a chunk of type game providing the slots me, opponent and result. As a result, the rule adds the information that the round has been won by modifying the result slot of the goal buffer.
Furthermore, the procedural module controls the match-select-apply cycle of the production rule system: it searches for matching rules. As soon as a matching rule has been selected to fire, it takes 50 ms for the rule to fire, based on theories of human cognition [Anderson (2007), p. 54]. During this time, the matching process is inhibited and no other rule can be selected until the selected rule has been applied. Hence, productions are executed serially. The production system is called free if no rule has been selected and is waiting for execution. As long as the procedural module is free, it searches for matching rules.
The modules act in parallel. When a request is sent to a module by a production, the procedural module becomes free while the request is completed. Hence, new production rules can match while other modules might be busy with requests.
ACT-R can be extended by arbitrary modules communicating through buffers with the procedural system. However, to exchange more fundamental parts of the architecture it needs more than only architectural modules as shown in section 3.
2.2 The Procedural Module in CHR
The procedural module is the core of ACT-R’s production rule system. Our implementation is based on the translation of production rule systems to CHR as presented in [Frühwirth (2009), chapter 6.1]. However, we have to account for the concepts of chunks and buffers, since ACT-R differs in those particular points from other production systems. Details of the implementation can be found in [Gall (2013)].
The set of chunks can be represented in CHR by constraints chunk(C,T), where C is the name of the chunk and T its type. The slots provided by a chunk and their values can be stored in constraints chunk_has_slot(C,S,V) denoting that chunk C has the value V in slot S. Special consistency rules ensure that no chunk has two values in the same slot and that it only provides the slots allowed by its type. Analogously, a buffer is represented by a constraint buffer(B,M,C) denoting that the buffer B is affiliated with the module M and holds chunk C. The formal definitions of chunks and buffers can be found in the online appendix (A).
A production rule can now match and modify the information of the buffer system. The actions are implemented by trigger constraints buffer_action(B,C) which get the name of the buffer B and a chunk description C represented by a term chunk(C,T,[(S,V),...]) describing a chunk with name C, type T and a list of slot-value pairs representing the values of the chunk’s slots. Note that such chunk descriptions can be partial, simply by leaving some arguments unspecified.
The rule from example 1 can be translated to the following CHR rule:
buffer(goal,_,C), chunk(C,game),
chunk_has_slot(C,me,rock), chunk_has_slot(C,opponent,scissors) ==>
    buffer_modification(goal,chunk(_,_,[(result,win)])).
The name and type of the chunk in the modification are not specified in the original rule and are therefore left blank, as are the slots that are not modified.
2.3 Timing and Phases
As mentioned before, the production system of ACT-R is occupied for 50 ms after a rule has been selected. To model such latencies, an event queue has to be added. It keeps track of the current time and holds an ordered set of events which can be dequeued one after another according to their scheduled times. In our implementation, the event queue is implemented as a priority queue sorting its elements by time and a priority that determines the order of application for simultaneous events. Events are arbitrary Prolog goals and can be added by add_q(Time,Priority,Event). The current time can be queried by get_time(Now).
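The event queue can be sketched in Python with a binary heap. This is a simplified, hypothetical rendering: add_q mirrors the operation of our implementation, while the class and method names are assumptions made for this sketch:

```python
import heapq

class EventQueue:
    """Priority queue ordering events by time; at equal times,
    higher priorities are dequeued first (so a low priority like -10
    runs after all simultaneous higher-priority events)."""
    def __init__(self):
        self.now = 0.0
        self.heap = []
        self.seq = 0  # insertion counter breaks remaining ties

    def add_q(self, time, priority, event):
        # negate the priority: heapq pops the smallest key first
        heapq.heappush(self.heap, (time, -priority, self.seq, event))
        self.seq += 1

    def dequeue(self):
        time, _, _, event = heapq.heappop(self.heap)
        self.now = time  # advance the simulation clock
        return event

q = EventQueue()
q.add_q(0.05, 0, "apply_rule")
q.add_q(0.05, -10, "match")  # same time, lower priority: dequeued second
```

With this ordering, the apply_rule event at time 0.05 is handled before the match event scheduled at the same time with priority -10, which reflects the scheduling scheme described in the following.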
To ensure that a production rule only matches when the module is free, we replace each CHR rule of the form
C ==> A according to the following scheme consisting of two rules:
C \ match <=> get_time(Now), add_q(Now + 0.05, 0, apply_rule(rule(r,C))).
C \ apply_rule(rule(r,C)) <=> A, get_time(Now), add_q(Now, -10, match).
The constraint match indicates that the procedural module is free and searches for a matching rule. For the matching rule, an apply_rule event is scheduled 50 ms from the current time. This event will actually fire the rule. The actions A schedule their effects on the buffers at the current time with different priorities. Requests are only sent to the corresponding module; their effects on the requested buffer are scheduled at a later time. Finally, a new match event is scheduled at the current time Now, but with a low priority of -10. This ensures that all current actions are performed before the next rule is scheduled to fire.
Otherwise, if no rule matches and the procedural module is free (i.e. a match constraint is present), a rule can only become applicable if the contents of the buffers change. Hence, a new match constraint is added directly after the next event in the queue. This models the fact that the procedural module permanently searches for matching rules while it is free, without adding unnecessary match events to the queue.
3 Conflict Resolution
Only one matching production rule can fire at a time. Hence, if there are multiple applicable productions, the system has to decide which to fire. This process is called conflict resolution [McDermott and Forgy (1977)]. In most implementations, CHR simply chooses the rule to fire by textual order, which is a valid conflict resolution mechanism. However, in ACT-R a more advanced approach using subsymbolic concepts is needed to faithfully model human cognition.
3.1 General Conflict Resolution Process
In [Frühwirth (2009), p. 151] a general method to implement different conflict resolution mechanisms in CHR is given. This method is adapted to our CHR implementation of ACT-R. The first rule of each CHR rule pair from section 2.3 can be replaced by:
match, C ==> G | conflict_set(rule(r,C)).
Hence, the application of a matching production is delayed: instead of choosing the first matching rule to be applied by scheduling apply_rule/1 as explained in section 2.3, the rule is added to the conflict set. Thereby all matching rules are collected in conflict_set/1 constraints, which can then be reduced to one single constraint containing only the rule to be applied, according to an arbitrary strategy.
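The reduction of the conflict set to a single rule can be sketched as follows. This is a minimal Python sketch under the assumption that utilities act as dynamic priorities and that ties fall back to textual (list) order; the function name is ours:

```python
def select_rule(conflict_set, utility):
    """Reduce the conflict set to the single rule to be applied:
    the rule with the highest current utility wins. Python's max()
    returns the first maximal element, so ties fall back to the
    order in which rules entered the conflict set."""
    if not conflict_set:
        return None
    return max(conflict_set, key=lambda r: utility.get(r, 0.0))

winner = select_rule(["play-rock", "play-paper"],
                     {"play-rock": 0.3, "play-paper": 1.2})
```

The strategy by which the utility values themselves are learned is what the following sections vary.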
As a last production rule, the rule match <=> select. occurs in the program. This rule is always applied last (since rules are applied in textual order in CHR). It removes the remaining match constraint and adds a constraint select which triggers the selection process: the conflict resolution chooses one rule from the conflict set constraints and removes all other such constraints. If no rule matches, a new match constraint is scheduled after the next event.
With the introduction of the select constraint, the system commits to the rule to be applied by scheduling the corresponding apply_rule/1 event as explained in section 2.3. The chosen production then performs its actions, since its second CHR rule is applicable. After the actions are performed, the next matching phase is scheduled.
The strategy by which the conflict set is reduced to the single rule to be applied may vary and is exchangeable. In the following section, several strategies are presented and implemented.
3.2 Conflict Resolution Strategies
There have been several conflict resolution strategies proposed for ACT-R over time. To demonstrate the adaptability of our CHR implementation, we implement some of those strategies. In the reference implementation of ACT-R, such adaptations might need a lot of knowledge about its internal structures [Stewart and West (2007)].
In general, ACT-R conflict resolution strategies use the subsymbolic concept of production utilities. The production utility of a production p is a function U_n(p) which expresses the utility value of p at its n-th application and which may be adapted according to a learning strategy. In the conflict resolution process, the current utility values of all matching productions are compared and the production with the highest utility is chosen. The production utility can therefore be seen as a dynamic rule priority which is adapted according to a certain strategy.
In the following, we present different learning strategies to adapt the utility of a production. Finally, the concept of rule refraction is introduced, which is a general conflict resolution concept and can be applied with all of the presented learning strategies.
3.2.1 Reinforcement-Learning-Based Utility Learning
The current implementation of ACT-R 6.0 uses a conflict resolution mechanism which is motivated by the Rescorla-Wagner learning equation [Rescorla and Wagner (1972)]. The basic concept is that there are special production rules which recognize a successful state (by some model-specific definition) and then trigger a certain amount of reward, measured in units of time as a representation of the effort a person is willing to spend to receive a certain reward [Anderson (2007), p. 161]. All productions which lead to the successful state, i.e. all productions which have been applied, receive a part of the triggered amount of reward, which diminishes the more time lies between the application of the production rule and the triggering of the reward. The utility of a production p is then adapted as follows:

U_n(p) = U_{n-1}(p) + α [R_n(p) − U_{n-1}(p)]    (1)

The reward R_n(p) for the n-th application of the rule is the difference of the external reward and the time between the selection of the rule and the triggering of the reward; α is the learning rate. The utility adapts gradually to the average reward a rule receives. Its calculation can be extended by noise to enable rules with initially low utilities to fire, which then may boost their utility values.
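One step of this utility adaptation can be sketched numerically in Python. The function name is ours, and the default learning rate of 0.2 is an assumption taken from ACT-R's canonical parameter setting:

```python
def update_utility(u_prev, external_reward, dt, alpha=0.2):
    """One utility-learning step motivated by the Rescorla-Wagner
    equation: the effective reward is the external reward minus the
    time dt between rule selection and reward triggering; alpha is
    the learning rate (0.2 is assumed as the ACT-R default)."""
    r = external_reward - dt
    return u_prev + alpha * (r - u_prev)

# a reward of 2 s triggered 0.5 s after the rule was selected
u = update_utility(0.0, 2.0, 0.5)
```

Starting from a utility of 0, the effective reward is 2.0 − 0.5 = 1.5, so the utility moves a fifth of the way toward it, to 0.3.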
In CHR, this strategy can be implemented as follows: For each production rule, a utility/2 constraint stores its current utility value. For rules marked with a reward, a reward/2 constraint holds the amount of reward. When a production rule is applied, this information is stored together with the application time by the rule

apply_rule(rule(P,_)) ==> get_time(Now), applied([(P,Now)]).

With a corresponding rule, the applied/1 constraints are merged respecting the application times of the rules, since the adaptation strategy depends on the last utility value of a rule and rules might be applied more than once before they receive a reward. This leads to one applied/1 constraint containing a sorted list of rules and their application times.
If a rule which is marked with a reward is about to be applied, the reward can be triggered by

apply_rule(rule(P,_)), reward(P,R) ==> trigger_reward(R).

The triggering of the reward simply adapts the utilities according to equation 1 for all productions which have been applied (as indicated by the applied/1 constraint), respecting the order of application. Afterwards, this constraint is deleted, because a rule that has received a reward is not considered in the next adaptation.
3.2.2 Success-/Cost-Based Utility Learning
In prior implementations of ACT-R, utility learning is based on a success/cost approach [Anderson et al. (2004), Taatgen et al. (2006)]. A detailed description can be found in [ACT-R Tutorial (2004), unit 6]. Each production rule is associated with a value P denoting the success probability of the production and a value C denoting its costs. In this approach, the utility of a production rule is defined as:

U = P · G − C

Note that the current utility does not depend on the value of the last utility, but can be calculated from the current values of the parameters. Hence, the order of application does not play a role. Usually, C is measured in units of time to achieve a goal, whereas G – the goal value – is an architectural parameter and usually set to 20 s. The parameters P and C are obtained by the following equations:

P = Successes / (Successes + Failures)
C = Efforts / (Successes + Failures)

The values Successes and Failures count all applications of a rule which have been identified as a success or a failure respectively. Similarly to the reinforcement-based learning, some productions which identify a success or failure trigger an event which adapts the counters of successes or failures of all production rules which have been applied since the last triggering. The efforts are estimated by the difference between the time of the triggering and the selection of a rule. The values are initialized with Successes = 1 and Efforts = 0.05 s, which is the selection time of one firing. Analogously to the reward-based strategy, utilities can be extended by noise.
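The utility computation of this strategy can be sketched in Python (the function name is ours; the initialization values follow the text above):

```python
def pg_c_utility(successes, failures, efforts, g=20.0):
    """Success-/cost-based utility U = P*G - C with
    P = successes / (successes + failures) and
    C = efforts / (successes + failures);
    G is the goal value (20 s by default)."""
    n = successes + failures
    p = successes / n
    c = efforts / n
    return p * g - c

# initial values: one assumed success, efforts of one 50 ms firing
u0 = pg_c_utility(successes=1, failures=0, efforts=0.05)
```

With the initial values, P = 1 and C = 0.05 s, giving a starting utility of 19.95.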
Similarly to the implementation of the reinforcement learning rule, the triggering of a success or failure can be achieved by constraints success(P) and failure(P), which encode that a production P is marked as a success or failure respectively. Combined with an apply_rule event, a success/0 or failure/0 constraint can be propagated which triggers the utility adaptation. The following rules show the adaptation of the efforts and successes when a success is triggered and rule P has been applied before:
success \ applied(P,T), efforts(P,E), successes(P,S) <=>
    get_time(Now), efforts(P,E+Now-T), successes(P,S+1).
success <=> true.
The numbers of successes and failures are stored in the respective binary constraints; if a success is triggered, they are incremented for all applied production rules and the efforts are adjusted. The rules for failures are analogous. The adaptation of one of those parameters triggers the rules which replace the constraints holding the old P and C values by new values. When a P or C constraint is replaced, the calculation of the new utility value is triggered. To ensure that only one utility value is in the store, a destructive update rule is used.
3.2.3 Random Estimated Costs
In [Belavkin and Ritter (2004)], a conflict resolution strategy motivated by research results in decision-making is presented. The current implementation varies slightly from this description [Belavkin (2005)] and we stick to this most recent approach for better comparability of the results. The strategy is based on the success-/cost-based utility learning from section 3.2.2 and uses the same subsymbolic information (the counts of successes and failures and the efforts). However, instead of calculating the average cost C, the expected costs θ of achieving a success by a rule are estimated:

θ = Efforts / Successes

From the expected costs θ of a rule, the random estimated costs Z are derived by drawing a random number r from a uniform distribution on [0, 1) and setting Z = −θ ln(1 − r). Eventually, production utilities are calculated analogously to the success-/cost-based strategy: U = P · G − Z. The influence of the random estimated costs can be varied by adapting the parameter G. If G = 0, the production rule with minimal random estimated costs will be fired (as suggested in [Belavkin and Ritter (2004)]).
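These quantities can be sketched numerically in Python. The function name and the rng parameter are ours; the computation mirrors the formulas above:

```python
import math, random

def random_estimated_utility(successes, failures, efforts,
                             g=20.0, rng=random):
    """Random-estimated-cost utility: theta = efforts/successes is
    the expected cost of achieving a success, the random estimated
    cost is z = -theta * ln(1 - r) with r uniform in [0,1), and the
    utility is U = P*G - z."""
    theta = efforts / successes
    p = successes / (successes + failures)
    r = rng.random()
    z = -theta * math.log(1.0 - r)
    return p * g - z

# seed the generator to make the draw reproducible
u = random_estimated_utility(1, 0, 0.05, rng=random.Random(0))
```

Since z is non-negative, the utility never exceeds P · G; its variance grows with θ, so rules with poor success/cost ratios fire occasionally, which replaces explicit utility noise.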
Since this method uses the same parameters as the success-/cost-based variant, almost all of the code can be reused for an implementation. However, instead of the costs, the expected costs are computed and saved in a constraint whenever the success/failure ratio changes. Additionally, the random costs must be calculated in every conflict resolution step and not only when the parameters change since they vary each time due to randomization. Hence, a rule must be added which calculates the utility value as soon as a production rule enters the conflict set:
conflict_set(rule(P,_)), theta(P,T), succ_prob(P,SP) ==>
    random(R), Z is -T * log(1 - R), U is SP*20-Z, set_utility(P,U).
The rest of the implementation like the calculation of the success/failure counters, efforts or the pruning of the conflict set is identical to the success-/cost-based strategy.
3.2.4 Production Rule Refraction
In contrast to the previous strategies which only exchange the utility learning part, production rule refraction adapts the general conflict resolution mechanism and can be combined with all of the other presented strategies. It was first suggested in [Young (2003)] to avoid over-programming of models in the sense that the order of application of a set of rules is fixed in advance by adding artificial signals to ensure the desired order. Rule refraction can avoid such operational concepts by inhibiting the application of the same rule instantiation more than once. To the best of our knowledge, our implementation is the first of its kind for ACT-R.
Refraction can be implemented by saving the instantiation of each applied production using the rule

apply_rule(R) ==> instantiation(R).

When building the conflict set, the following rule eliminates from the set all productions which have already been applied:

instantiation(R) \ conflict_set(R) <=> true.

This pruning rule must be performed before the rule selection process, so that such productions are never considered as firing candidates.
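The pruning step can be sketched in Python (the names are ours; instantiations are rendered as rule-name/binding pairs for illustration):

```python
def prune_conflict_set(conflict_set, applied_instantiations):
    """Production rule refraction: drop every instantiation that has
    already fired, so the same instantiation never fires twice."""
    return [inst for inst in conflict_set
            if inst not in applied_instantiations]

cs = [("play-rock", "g1"), ("play-paper", "g1")]
pruned = prune_conflict_set(cs, {("play-rock", "g1")})
```

Note that only the exact instantiation is blocked; the same rule may still fire again with different bindings.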
4 Evaluation

After having implemented the different conflict resolution strategies, we test their validity with an example model of the game rock, paper, scissors. The idea is that the model simulates a player playing against three opponents with different preferences among the three choices of the game. We then observe how the model adapts its strategy under the different conflict resolution mechanisms and test whether the results of the ACT-R implementation and our CHR implementation match.
The player is basically modeled by the production rules play-rock, play-paper and play-scissors, standing for the three choices a player has in the game. At the beginning, the production rules have equal utilities, which are then adapted by the utility learning mechanisms of the three conflict resolution strategies. Since we only want to test our conflict resolution implementations, we try to rule out all other factors which could influence the behavior of our model. Hence, we only use the procedural module with the goal buffer and do not simulate any declarative knowledge or even perceptual and motor modules. That is, the model is not a realistic psychological hypothesis of the game play, but only a test of our implementation. Furthermore, we disable noise where possible to better compare our results. In ACT-R, it is not recommended to change the canonical parameter setting without justification [Stewart and West (2007), sec. 1.1]. For our experiment, we used this setting.

The moves of the opponents are randomly generated in advance according to their defined preferences: player 1 simply chooses rock for every move, player 2 chooses only between rock and paper, and player 3 chooses equally between all three possibilities. For each player, we produced 20 samples of 20 moves (except for player 1, with only one sample of 20 moves). Their choices are put into the goal buffer one after another by host-language instructions (Lisp and Prolog/CHR). The game is played for 20 rounds until a restart with a new sample, which corresponds to 2 s simulation time. Finally, the utility values at the end of each run are collected and compared to the reference implementation. We denote the averages of those values over all 20 samples as a triple (for rock, paper and scissors respectively). In the following, the implementation of the production rule play-rock is shown:
(p play-rock
   =goal>
      isa       game
      me        nil
      opponent  nil
==>
   =goal>
      me        rock
      opponent  =x
   !output! (rock =x)
)
This rule simply puts the symbol rock into the goal buffer, indicating that the model chose rock. The variable =x is set by built-in functions of the host language (omitted in the listing), modeling the choice of the opponent derived from a given list of moves. The rules for paper and scissors are defined analogously. The model has been translated to CHR by our compiler; the translation of Lisp built-ins to Prolog built-ins was performed by hand.
Furthermore, the model contains production rules detecting a win, draw or defeat situation (similar to example 1) and resetting the choices of the two players in the goal buffer to indicate that the next round begins. Those rules are marked with a reward (positive or negative) or as a success/failure respectively. In the case of a draw, no reward, success or failure will be triggered. Hence, the utility learning algorithms will adapt the values of the fired rules depending on their success.
If the highest utilities in the conflict set are equal, the strategy of ACT-R is undocumented. It depends on the order of the rules in the source code and may vary between the implementations (e.g. the strategy of ACT-R 6.0 differs from ACT-R 5.0 as we found in our experiments). We adapted the order of rules in our translated CHR model to match the strategy of ACT-R. Usually, noise would rule out such differences.
For the reference implementations, we used Clozure Common Lisp version 1.9-r15757. The CHR implementation has been run on SWI-Prolog version 6.2.6. The relevant data collected in our experiments can be found in the online appendix (B).
4.2 Availability of the Strategies
Our approach enables the user to exchange the complete conflict resolution strategy without relying on provided interfaces and hooks except for the very basic information that a rule is part of the conflict set or about to be applied. This information relies on the fundamental concept of the match-select-apply cycle of ACT-R. In the reference implementations of the strategies, there are deeper dependencies and assumptions on when and how subsymbolic information is adapted and stored.
This leads to incompatibilities: The reinforcement-learning-based strategy is only available for ACT-R 6.0. Although the success-/cost-based strategy is shipped with ACT-R 6.0, it was not executable for us and hence we had to use ACT-R 5.0 to run it. This leads to further incompatibility problems when using modules not available for ACT-R 5.0 (which is in general difficult to extend due to the lack of architectural modules). Since the method of random-estimated costs relies on the success-/cost-based strategy, it is also only available for ACT-R 5.0.
Our implementation of the refraction-based method is to the best of our knowledge the only existing implementation for ACT-R, although it has been suggested in [Young (2003)].
4.3 Reinforcement-Learning-Based Utility Learning
For the reinforcement-learning-based strategy, we marked the win-detecting production rules with a reward of 2 and the defeat-detecting rules with 0 which leads to negative rewards for all applied rules when a defeat is detected. Draws do not lead to adjustments of the strategy in our configuration. We executed the model on ACT-R 6.0 version 1.5-r1451 and our CHR implementation.
Our implementation matches the results of the reference implementation exactly when rounded to the same decimal precision (see online appendix B.2). Differences in floating point precision did not influence the results, since ACT-R rounds the final results to three decimal places. As expected, the model usually rewards the paper rule most when playing against players 1 and 2 (average utility at end of round for player 1: ; player 2: (0, 0.81, 0.49)). Exceptions are rounds where the opponent chooses paper above average, especially as first moves (e.g. sample 10: 75% rate of paper; first 9 moves; ). In such cases, scissors has the highest utility. This is reinforced by the relatively high reward of successes compared to the punishment of defeats. However, the winning rate is still very high (15 wins, 5 defeats, no draws). Overall, the behavior of the model is very successful (on average 10.4 wins, 3.9 draws and 5.7 defeats in each sample). For player 3 – as expected – no unique result can be learned; wins, draws and defeats are very close on average (6.6 wins, 6.7 draws, 6.7 defeats).
4.4 Success-/Cost-Based Utility Learning
For the success-/cost-based strategy, the production rules recognizing a win situation are marked as a success and analogously the production rules for the defeat situations as a failure. We used ACT-R 5.0 to test our implementation against the reference implementation, since it is not available for ACT-R 6.0. Again, noise is disabled for better comparability. Because the selection mechanism for rules with same utility differs from ACT-R 6.0, we adapted the order in which the rules appear in the source code.
Our implementation matches the results of the reference implementation exactly (see online appendix B.3). It can be seen that this strategy is not able to detect the optimal moves for player 1. Analyses showed that due to the order of the rules, the model first selects to play rock. This leads to a draw and hence no adaptation of the utilities; rock is therefore played repeatedly. In real-world models, noise would help to overcome such problems. For player 2, the model correctly chose to play paper on average, even for the samples where the opponent chooses paper more often than rock. However, on average, the model only won 8.9 out of 20 rounds in a sample and produced 9.1 draws. For each of the samples, only two rounds were lost.
4.5 Random Estimated Costs
Due to the randomness of this strategy, no exact match of the results can be expected. Hence, we executed the models on 3 samples (the first of each opponent) with 50 runs for each sample. The reference implementation has been run on ACT-R 5.0.
The average utilities are close to the reference implementation (error squares of the average utilities for player 1: (0.145, 0.000, 0.000); player 2: (0.850, 0.000, 0.098); player 3: (2.823, 0.503, 0.003), see online appendix B.4 for details). It can be seen that for most runs the productions with the highest, medium and lowest utility values coincide. For player 1, the random estimated costs overcome the problem of the success-/cost-based implementation discussed in section 4.4.
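The comparison measure used here, the error square of the utilities averaged over all runs against the reference values, can be sketched as follows (the run data and all names are made up for illustration):

```python
# Sketch of the comparison measure: the squared difference between
# the reference implementation's average utilities and the averages
# over all runs of our implementation, per production
# (rock, paper, scissors). The data below is invented.

def error_squares(reference_avg, runs):
    """Per-production squared error of the averages over all runs."""
    n = len(runs)
    avg = [sum(run[i] for run in runs) / n for i in range(3)]
    return [(a - r) ** 2 for a, r in zip(avg, reference_avg)]

reference = [2.0, 3.5, 1.0]                # hypothetical reference averages
runs = [[2.2, 3.4, 1.1], [1.8, 3.6, 0.9]]  # hypothetical runs
errs = error_squares(reference, runs)       # close to [0, 0, 0] here
```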
5 Related Work
There are several implementations of the ACT-R theory in different programming languages. First of all, there is the official ACT-R implementation in Lisp [ACT-R (2014)], which we used as a reference. There are many extensions to this implementation, some of which have been incorporated into the original package in later versions, like the ACT-R/PM extension included in ACT-R 6.0 [Bothell (Bothell), p. 264]. The implementation comes with an experiment environment offering a graphical user interface to load, execute and observe models.
In [Stewart and West (2006), Stewart and West (2007)], a Python implementation is presented which also aims to simplify and harmonize parts of the ACT-R theory by identifying its central components. The architecture has been reduced to only the procedural and the declarative memory, which are combined and adapted in different ways to build other models. However, traditional ACT-R models cannot be translated automatically to Python code, since the way of modeling differs too much from the original implementation.
Furthermore, there are two implementations in Java: jACT-R [jACT-R (b)] and ACT-R: The Java Simulation & Development Environment [Salvucci (b)]. The latter is capable of executing original ACT-R models and offers an advanced graphical user interface. The focus of that project was to make ACT-R more portable with the help of Java [Salvucci (a)]. The focus of jACT-R was to offer a clean and exchangeable interface to all components, so that different versions of the ACT-R theory can be mixed [jACT-R (a)]; models are defined using XML, and there is no compiler from original ACT-R models to the XML models of jACT-R. Due to its modular design defining various exchangeable interfaces, jACT-R is highly adaptable to personal needs. However, both approaches lack the proximity to a formal representation.
In this work, we have presented an implementation of ACT-R using Constraint Handling Rules which closes the gap between the theory of ACT-R and its technical realization. Our implementation abstracts from technical artifacts and stays close to the theory, yet reproduces the results of the reference implementation. Furthermore, the formalization enables other implementations to be checked against this reference. The implementation of the different conflict resolution strategies has shown the adaptability of our approach: most of the implemented strategies are not available in the current implementation of ACT-R, and our implementation of production rule refraction is unique.
In the future, the implementation can be extended by other modules like the perceptual/motor modules provided by ACT-R. Currently, an ongoing student project is implementing a temporal module which may be used to investigate time perception. The formalization and the CHR translation pave the way to developing analysis tools (e.g. a confluence test) on the basis of the results for CHR programs.
- ACT-R (2014) ACT-R 2014. The ACT-R Homepage. http://act-r.psy.cmu.edu/.
- ACT-R Tutorial (2004) ACT-R Tutorial 2004. The ACT-R 5.0 tutorial. http://act-r.psy.cmu.edu/tutorials-5-0/.
- Anderson (2007) Anderson, J. R. 2007. How can the human mind occur in the physical universe? Oxford University Press.
- Anderson et al. (2004) Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., and Qin, Y. 2004. An integrated theory of the mind. Psychological Review 111, 4, 1036–1060.
- Anderson and Lebiere (1998) Anderson, J. R. and Lebiere, C. 1998. The Atomic Components of Thought. Lawrence Erlbaum Associates, Inc.
- Belavkin (2005) Belavkin, R. 2005. OPTIMIST conflict resolution overlay for the ACT-R cognitive architecture. http://www.eis.mdx.ac.uk/staffpages/rvb/software/optimist/optimist-for-actr.pdf.
- Belavkin and Ritter (2004) Belavkin, R. and Ritter, F. E. 2004. OPTIMIST: A new conflict resolution algorithm for ACT-R. In ICCM. 40–45.
- Bothell (Bothell) Bothell, D. ACT-R 6.0 Reference Manual – Working Draft. Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213.
- Frühwirth (2009) Frühwirth, T. 2009. Constraint Handling Rules. Cambridge University Press.
- Gall (2013) Gall, D. 2013. A rule-based implementation of ACT-R using constraint handling rules. Master Thesis, Ulm University.
- jACT-R (a) jACT-R. Benefits of jACT-R (part of the FAQ section of the homepage). http://jactr.org/node/50.
- jACT-R (b) jACT-R. The Homepage of jACT-R. http://jactr.org/.
- McDermott and Forgy (1977) McDermott, J. and Forgy, C. 1977. Production system conflict resolution strategies. SIGART Bull. 63 (June), 37–37.
- Rescorla and Wagner (1972) Rescorla, R. A. and Wagner, A. R. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Appleton-Century-Crofts, New York, Chapter 3, 64–99.
- Salvucci (a) Salvucci, D. About ACT-R: The Java Simulation & Development Environment. http://cog.cs.drexel.edu/act-r/about.html.
- Salvucci (b) Salvucci, D. ACT-R: The Java Simulation & Development Environment – Homepage. http://cog.cs.drexel.edu/act-r/.
- Sarna-Starosta and Ramakrishnan (2007) Sarna-Starosta, B. and Ramakrishnan, C. R. 2007. Compiling constraint handling rules for efficient tabled evaluation. In 9th International Symposium on Practical Aspects of Declarative Languages (PADL).
- Stewart and West (2006) Stewart, T. C. and West, R. L. 2006. Deconstructing ACT-R. In Proceedings of the Seventh International Conference on Cognitive Modeling. 298–303.
- Stewart and West (2007) Stewart, T. C. and West, R. L. 2007. Deconstructing and reconstructing ACT-R: exploring the architectural space. Cognitive Systems Research 8, 3 (Sept.), 227–236.
- Sun (2008) Sun, R. 2008. Introduction to computational cognitive modeling. In The Cambridge Handbook of Computational Psychology, R. Sun, Ed. Cambridge University Press, New York, 3–19.
- Taatgen and Anderson (2002) Taatgen, N. A. and Anderson, J. R. 2002. Why do children learn to say “broke”? a model of learning the past tense without feedback. Cognition 86, 2, 123–155.
- Taatgen et al. (2006) Taatgen, N. A., Lebiere, C., and Anderson, J. 2006. Modeling paradigms in ACT-R. In Cognition and Multi-Agent Interaction: From Cognitive Modeling to Social Simulation. Cambridge University Press, 29–52.
- Young (2003) Young, R. M. 2003. Should ACT-R include production refraction? In Proceedings of 10th Annual ACT-R Workshop.
Appendix A Formalization of ACT-R
In this section, the fundamental concepts of ACT-R are formalized and transferred to CHR.
A.1 The Basic Unit of Knowledge: Chunks
ACT-R is a symbolic production rule system, i.e. all declarative information is represented in the form of symbols and associations of symbols, and the procedural information is stored in the form of production rules transforming the declarative information. Hence, the ACT-R production system is defined over a set of symbols $\Sigma$. The smallest unit of declarative information is a chunk, which basically is a structured assembly of symbols. It has a unique name and a number of labeled slots, each of which can hold one single symbol. The chunk names and the slot labels are symbols themselves. If a chunk has a symbol naming another chunk in one of its slots, the two chunks are connected. We require the unique-name assumption for symbols. The concept of chunks and their connections in the form of chunk stores is defined in section A.2.
A.2 Chunk Stores
We extend the abstract notion of chunks given in section A.1 to a definition of chunk descriptions embedded into chunk stores which represent a network of chunks with the help of three relations.
Definition 1 (Chunk Description)
A chunk with name $c$, type $t$ and corresponding slots $s_1, \dots, s_n$ and values $v_1, \dots, v_n$ can be represented as a term $\mathit{chunk}(c, t, [(s_1, v_1), \dots, (s_n, v_n)])$.
Definition 2 (Chunk Store)
A chunk-store over a set of symbols $\Sigma$ is a tuple $\Delta = (C, E, \mathcal{T}, \mathit{type}, \mathit{has\_slot})$, where $C$ is a set of chunk identifiers and $E$ a set of primitive elements, both identified by unique names. The values of $\Delta$ are defined by the set $V := C \cup E$. $\mathcal{T}$ is a set of chunk-types. The set $T := \{\tau \mid (\tau, S) \in \mathcal{T}\}$ then denotes the set of all type names. A chunk-type is a tuple $(\tau, S)$ with a unique type name $\tau$ and a set of slots $S \subseteq \mathcal{S}$, where $\mathcal{S}$ is the set of all slot names. The sets $C$, $E$, $T$ and $\mathcal{S}$ are pairwise disjoint.
$\mathit{type} \subseteq C \times T$ and $\mathit{has\_slot} \subseteq C \times \mathcal{S} \times V$ are relations connecting chunks with their types and their slot values.
The relation $\mathit{type}$ has to be right-unique and left-total, so each chunk has exactly one type. A chunk-store is type-consistent iff the following two conditions hold: first, a chunk only fills slots that its type defines, i.e. for all $(c, s, v) \in \mathit{has\_slot}$ with $(c, \tau) \in \mathit{type}$ and $(\tau, S) \in \mathcal{T}$ it holds that $s \in S$; second, each slot of a chunk holds at most one value, i.e. $\mathit{has\_slot}$ is right-unique in its first two arguments.
With this definition, a chunk store can be implemented directly in CHR by defining the sets and relations as constraints. The constraint chunk(C,T) is a condensed representation of the set of chunk symbols and the relation $\mathit{type}$. This is possible since each chunk in $C$ has exactly one type. The ternary relation $\mathit{has\_slot}$ is represented by constraints of the form chunk_has_slot(C,S,V) stating that $(C, S, V) \in \mathit{has\_slot}$. Chunk types can be represented by a constraint chunk_type(T,S), where T is the symbol denoting the chunk type and S is a list of symbols for the slots. Note that rules can be added to ensure type-consistency and the uniqueness of the relations as defined in definition 2.
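To make the constraint representation concrete, here is a small Python analogue of the three relations behind chunk(C,T), chunk_has_slot(C,S,V) and the chunk-type constraint (our actual implementation is in CHR; the chunk, type and slot names below are invented for illustration):

```python
# Python analogue of the chunk store constraints. Dictionaries make
# the relations right-unique by construction: each chunk has exactly
# one type, and each (chunk, slot) pair has at most one value.

chunk_types = {"game": {"opponent_move", "own_move"}}  # type -> slots

chunk = {"g1": "game"}                 # mirrors chunk(C,T)
chunk_has_slot = {                     # mirrors chunk_has_slot(C,S,V)
    ("g1", "opponent_move"): "rock",
    ("g1", "own_move"): "paper",
}

def type_consistent():
    """Type-consistency check: every filled slot must be a slot
    declared by the chunk's type."""
    return all(slot in chunk_types[chunk[c]]
               for (c, slot) in chunk_has_slot)
```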
A.3 Buffer Systems
Definition 3 (buffer system)
A buffer system is a tuple $(B, \Delta, \mathit{holds})$, where $B$ is a set of buffer names, $\Delta$ a type-consistent chunk-store and $\mathit{holds} \subseteq B \times C$ a right-unique relation that assigns every buffer at most one chunk that it holds. Buffers which do not appear in the relation are called empty.
A buffer system is consistent if every chunk that appears in $\mathit{holds}$ is a member of $C$ and $\Delta$ is a type-consistent chunk-store. It is clean if its chunk-store only contains chunks which appear in $\mathit{holds}$.
In CHR, the set $B$ and the relation $\mathit{holds}$ can be represented by a constraint buffer/3 which holds the name of the buffer, the corresponding module (needed for requests) and the name of the chunk it holds as a reference to the chunk store. This is possible since each buffer holds at most one chunk. Empty buffers can be represented by the empty symbol nil. For each buffer, there must be exactly one buffer constraint. This transforms the relation $\mathit{holds}$ to a left-total and right-unique relation.
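The buffer/3 representation can be mimicked in Python as follows (the module and chunk names are illustrative, not part of the formalization):

```python
# Python analogue of buffer/3: exactly one entry per buffer holding
# its module (needed for requests) and the chunk it holds, with nil
# marking an empty buffer.

NIL = "nil"

# buffer name -> (module, held chunk)
buffers = {
    "goal":      ("goal",        "g1"),
    "retrieval": ("declarative", NIL),  # empty buffer
}

def holds(buffer):
    """The holds relation: right-unique by construction, and made
    left-total by the nil entries for empty buffers."""
    return buffers[buffer][1]
```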
A.3.1 Production Rules
Definition 4 (Production Rules)
An ACT-R production rule is of the form (p name buffer_test* ==> action*), where name is a unique symbol indicating the name of the rule. Buffer tests are also denoted as the left-hand side (LHS), actions as the right-hand side (RHS) of a rule. A buffer test has the form =buffer> isa t s1 v1 ... sn vn, where the symbol buffer references the name of the tested buffer and the rest stands for a chunk description of a chunk with arbitrary name, type t, slots s1, ..., sn and values v1, ..., vn. The values can be symbols or variable symbols, where variable symbols are indicated by the prefix =. An action has the form #buffer> s1 v1 ... sn vn, where the # is a place-holder for the available actions =, + and -, denoting modifications, requests and clearings respectively. The other symbols are defined as for the buffer tests. Note that for requests, the first slot symbol must be isa followed by a chunk type as value. The values may be variables again, but they have to be bound on the left-hand side of the rule, i.e. appear on the LHS.
Definition 5 (Applicability of a Production Rule)
A production rule with buffer tests =b_i> isa t_i s_i1 v_i1 ... is applicable in a buffer system iff there is a substitution $\theta$ that binds the variable symbols used on the LHS to values in $V$ such that for every buffer test there is a chunk $c_i$ with $(b_i, c_i) \in \mathit{holds}$, $(c_i, t_i) \in \mathit{type}$ and $(c_i, s_{ij}, \theta(v_{ij})) \in \mathit{has\_slot}$ for all tested slots $s_{ij}$.
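The applicability condition can be made concrete with a small Python sketch that searches for a consistent binding of the LHS variables (the data structures and names are ours, not part of the formalization; variables are marked with a leading "=" as in ACT-R syntax):

```python
# Sketch of the applicability condition: a rule is applicable iff all
# of its buffer tests match the buffer contents under one consistent
# binding of the LHS variables.

def applicable(tests, buffers, chunks):
    """tests: list of (buffer, type, {slot: value-or-variable}).
    buffers: buffer name -> chunk name (the holds relation).
    chunks: chunk name -> (type, {slot: value}).
    Returns the variable bindings, or None if not applicable."""
    bindings = {}
    for buf, typ, slots in tests:
        c = buffers.get(buf)
        if c is None:
            return None                      # empty buffer: test fails
        ctype, cslots = chunks[c]
        if ctype != typ:
            return None                      # type test fails
        for slot, v in slots.items():
            actual = cslots.get(slot)
            if isinstance(v, str) and v.startswith("="):  # variable
                if bindings.setdefault(v, actual) != actual:
                    return None              # inconsistent binding
            elif v != actual:
                return None                  # constant test fails
    return bindings

chunks = {"g1": ("game", {"opponent_move": "rock", "own_move": "rock"})}
buffers = {"goal": "g1"}
# Test: the goal buffer holds a game chunk whose two moves are equal.
tests = [("goal", "game", {"opponent_move": "=m", "own_move": "=m"})]
```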
A.3.2 Translation of Rules
The production rules as defined in definition 4 operate on the buffer system: they match the contents of the buffers and transform them with a defined set of actions. Hence, an ACT-R rule can be transferred to a CHR rule H ==> G | B, where the head H and guard G represent the applicability condition of the rule as defined in definition 5 and the body B contains the actions.
The applicability condition of definition 5 can be translated directly to the CHR counterparts of the relations, i.e. each relational condition in the applicability condition is expressed by the respective constraint in the head of the rule. The guard is filled with the conditions stemming from the variable bindings. Note that the applicability condition has a set-based semantics (since duplicates can be reduced by idempotency in classical logic), whereas CHR heads match distinct constraints. Hence, for the special case of duplicate tests on the LHS of a production rule, additional rules have to be generated with all possible combinations of unifications of duplicate pairs to implement the set-based semantics in CHR, as shown in [Sarna-Starosta and Ramakrishnan (2007)]. In the following, we assume that the production rules are duplicate-free.
The actions of a production rule transform the buffer system as defined in section 2.1. The transformations of the buffer system can be realized in CHR by using destructive updates as described in [Frühwirth (2009), p. 32], i.e. each action has a trigger constraint action/2 which gets the name of the buffer and the specification of the action encoded as a chunk description (see definitions 1 and 4). The trigger constraints then use abstract operations to access the buffer system, like set_buffer to set the content of a buffer. This simplifies the compilation and the form of the resulting rules, since the constraints representing the relations of the buffer system only appear in the kept head of the resulting CHR rules and never in the removed head. Additionally, it simplifies extensions and adaptations of the actions, since only the framework implementing the actions has to be changed, not the compiler. One adaptation of the simplest form of actions, which directly apply the changes to the buffers, is shown in section 2.3 where we introduce scheduling to postpone the actual application of the changes an action performs.
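The trigger-constraint scheme can be pictured in Python as a small action interpreter layered over an abstract set_buffer operation (all names and the request stand-in are illustrative; the real implementation compiles to CHR constraints). The action kinds mirror the =/+/- symbols of the ACT-R syntax:

```python
# Sketch of the action stage: each RHS action is reified as a trigger
# that a small framework interprets via an abstract set_buffer
# operation, so compiled rules never touch the buffer relations
# directly. Names and the module stand-in are illustrative.

buffers = {"goal": {"type": "game", "own_move": None}}

def set_buffer(name, content):
    """Abstract destructive update of a buffer's content."""
    buffers[name] = content

def request_module(buffer, spec):
    # stand-in for a module computing the chunk answering a request
    return {"type": spec.get("isa"),
            **{k: v for k, v in spec.items() if k != "isa"}}

def apply_action(kind, buffer, spec):
    if kind == "=":                      # modification: change slots
        buffers[buffer].update(spec)
    elif kind == "-":                    # clearing: empty the buffer
        set_buffer(buffer, None)
    elif kind == "+":                    # request: delegate to module
        set_buffer(buffer, request_module(buffer, spec))

apply_action("=", "goal", {"own_move": "paper"})
```

Extending the actions then only means extending this interpreter, not the compiler, which matches the design rationale above.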
Appendix B Evaluation Results
In this appendix we list the results of our experiments as described in section 4.
B.1 Samples
Tables 1 and 3 list the samples used for players 2 and 3. Tables 2 and 4 show the frequencies of rock, paper and scissors within each sample. The sum is a control value ensuring that 20 moves have been produced per sample. The values denote the probabilities of rock, paper and scissors respectively.
B.2 Reinforcement-Learning-Based Utility Learning
Tables 5, 7 and 9 show the results of the ACT-R implementation for players 1, 2 and 3 respectively. Tables 6, 8 and 10 contain the results of our CHR implementation. The values denote the utilities for rock, paper and scissors respectively; the other values show the performance of the model in the corresponding sample as a control of equal program flows in the two implementations.
B.3 Success-/Cost-Based Utility Learning
Tables 11, 13 and 15 show the results of the ACT-R implementation for players 1, 2 and 3 respectively. Tables 12, 14 and 16 contain the results of our CHR implementation. The meaning of the values corresponds to Appendix B.2.
B.4 Random Estimated Costs
Tables 17, 19 and 21 show the results of the ACT-R implementation for players 1, 2 and 3 respectively. Tables 18, 20 and 22 contain the results of our CHR implementation. The results have been produced with the first sample of each player, run 50 times. The meaning of the values corresponds to Appendix B.2. In the tables containing the results of the CHR implementation, we added the error squares of the averages over all runs to compare them to the reference implementation.
The error squares of the averages over all runs (CHR implementation vs. reference implementation) are:

| Player | Utility (rock) | Utility (paper) | Utility (scissors) | Wins | Draws | Defeats |
|---|---|---|---|---|---|---|
| 1 | 0.145 | 0.000 | 0.000 | 0.10 | 0.10 | 0.00 |
| 2 | 0.850 | 0.000 | 0.098 | 0.032 | 0.090 | 0.014 |
| 3 | 2.823 | 0.503 | 0.003 | 0.014 | 0.048 | 0.010 |