We investigate the trade-off between the processing and storage functions of working memory in Sudoku, one of the most popular puzzles. During a game, players visually scan the Sudoku grid in a continuous manner searching for cells containing information that can be propagated throughout the grid to narrow down the degrees of freedom of empty cells. These scanning skills are an integral part of the skill set required for solving the Sudoku puzzle, and the players learn gradually how to use them properly.
Since our interest is to understand the cognitive functions and working memory within Sudoku, we only consider players playing the game without note-taking abilities. External note-taking extends players’ working memory with an external holder of information. This type of game play is not suitable for this investigation because it does not give us the right insight into the player’s cognitive capacity.
Depending on the complexity of the scanning patterns used for improving situation awareness, the scanning activity can become strenuous. This is because the player must use working memory both for the current scanning task (space for executing skills) and for the information resulting from scanning (situation awareness). Given the finite capacity of working memory, there is always a trade-off between these two functions. Understanding how they interact would help us design training and education programs that are cognitively plausible and that account for this trade-off.
2 Background on working memory
Early memory models emphasized one of the two functions of working memory. On one side, the multi-store model [1] suggested that memory was a series of stores, i.e. the sensory memory, the short-term memory, and the long-term memory, focusing on the storage function of memory. On the other side, the “levels of processing” model [9] concentrates on the processes involved in memory. It considers memory as a consequence of the depth of information processing, with no clear distinction between short-term memory, long-term memory or other stores.
Later, Baddeley et al. [2] introduced the concept of Working Memory (WM) and showed that short-term memory is more than just one simple store. WM is still short-term memory, but instead of all information going into one single store, there are different subsystems with different functions (e.g. visual, auditory, etc.). The resource-sharing approach to memory builds on Baddeley’s work and emphasizes the “storage versus processing” paradigm. Case et al. [7] consider that WM accounts for the processing resources of an individual, and is the sum of a storage space and an operating space. They show that the more complex the current processing task is, the more operating space it uses, leaving less space for information storage. This was further supported by other studies which assumed that human performance in various cognitive tasks is strongly related to working memory [12, 3].
Another aspect involved in the study of working memory is the forgetting mechanism associated with its normal operation. The displacement theory assumes that the working memory is a first-in-first-out (FIFO) queue with limited capacity, and explains why the most recent information stored in the memory is easier to recall (recency effect). The trace decay theory assumes that the events between learning and recall have no effect on recall, the essential influencing factor being the period of time the information has been retained. The interference theory challenges the decay concepts, and attributes forgetting to the existence of interference [8, 5]. A review of the forgetting theories can be found in [15].
From a computational perspective, numerous studies proposed various combinations of working memory models and theories of forgetting, in order to instantiate plausible behaviors. Two major approaches have been proposed over the years: the localist and the distributed representation of working memory. In the localist approach, the memory is seen from an item-unit perspective, in which the informational items are stored in corresponding units situated in certain locations of the memory. The Competitive Queueing and the Primacy [13] models propose time-related displacement and displacement only, respectively, whereas the Start-End [10] model proposes a multi-level displacement-based approach. The distributed approach sees an informational item as distributed throughout the memory space, rather than as occupying a single unit as in the item-unit view. Thus, in the distributed approaches, the focus shifts from the localist practice of retrieving items from their corresponding locations to identifying patterns of activation which recompose informational items from their parts distributed through multiple layers of interconnected units. A displacement-based distributed approach was proposed by Lewandowsky and colleagues [11] based on the Theory of Distributed Associative Memory. Later, models such as OSCAR [4] and SIMPLE [5] considered interference-based implementations in which the working memory is hierarchical in structure, oscillatory in time, and contextually activated.
From the Sudoku perspective, the scanning skills build the situation awareness picture required to solve the game, which is then stored in the WM. The scanning pattern is defined over the cells of the Sudoku grid. This suggests a localist approach to the WM, in which each step in the scanning pattern produces a scanned item corresponding to one memory unit. We assume that the items are stored in a FIFO queue, as in the displacement theory.
Figure 1 presents the social learning framework and the internal structure of an agent. The agents in the society are endowed with a fixed working memory and an adaptive ability to choose which scanning skills are loaded into memory and executed. Their resulting Sudoku proficiency is tested in a tournament with several rounds of increasing difficulty. In each round one game is proposed, which all agents try to solve. If no agent is able to fill the grid successfully at the first attempt, the agents learn socially from each other how to adapt their skill selection towards better proficiency, and the game is replayed under the new conditions. The learning process continues until at least one agent becomes proficient enough to complete the grid or until learning brings no further improvement. The tournament then continues with the next round, where a game of increased difficulty is proposed and the process is repeated.
3.1 The agent
The scanning skill set: Each agent can choose from a set of scanning skills, which is the same for the whole society. The complexity of a skill is related to the amount of information that must be stored in the working memory to describe the scanning pattern. The table below lists, for each skill, the number of scanned cells and the number of memory units needed for storing them. Since we adopt a localist approach to working memory, the two numbers are equal. Each skill is named ROW for scanning a row, COL for scanning a column, or BOX for scanning a box. The digit in the name gives the number of cells covered by the scan, and hence the amount of storage the skill requires.

| Skill code | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Skill name | ROW3 | ROW5 | ROW7 | ROW9 | COL3 | COL5 | COL7 | COL9 | BOX5 | BOX9 |
| Grid cells | 3 | 5 | 7 | 9 | 3 | 5 | 7 | 9 | 5 | 9 |
| Mem. units | 3 | 5 | 7 | 9 | 3 | 5 | 7 | 9 | 5 | 9 |

The skill selector is a vector containing one weight for each skill in the skill set. The agent loads the skills with the highest weights that fit in the skill memory. The selection vector is initialized at the beginning of the simulation for each agent, then it is updated during the tournament as part of the learning process.
The working memory: The working memory has two components: the skill memory and the situation awareness memory. The sizes of the two components are fixed for each agent throughout the tournament and differ from one agent to another, but their sum is identical for all agents in the society, as shown in Equation 1. In other words, the total working memory space is the same for every agent in the society.
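The loading rule can be illustrated with a minimal sketch. The names `SKILLS` and `select_skills` are hypothetical, and the handling of a skill that does not fit (it is skipped and lower-weighted skills are still tried) is an assumption, since the text does not specify it.

```python
# Skill set from the skill table: skill name -> memory units needed.
SKILLS = {
    "ROW3": 3, "ROW5": 5, "ROW7": 7, "ROW9": 9,
    "COL3": 3, "COL5": 5, "COL7": 7, "COL9": 9,
    "BOX5": 5, "BOX9": 9,
}


def select_skills(weights, skill_memory_size):
    """Load skills in decreasing weight order while they still fit.

    Assumption: a skill that does not fit is skipped and the next
    skill in weight order is tried; the paper does not say whether
    loading stops at the first skill that does not fit.
    """
    loaded, free = [], skill_memory_size
    for name in sorted(weights, key=weights.get, reverse=True):
        cost = SKILLS[name]
        if cost <= free:
            loaded.append(name)
            free -= cost
    return loaded


# Example: 12 memory units, COL9 has the highest weight.
example = select_skills(
    {"COL9": 0.9, "ROW9": 0.8, "ROW3": 0.7, "BOX5": 0.1}, 12
)
print(example)
```

In this example COL9 takes 9 units, ROW9 no longer fits, and ROW3 fills the remaining 3 units.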
The skill memory stores the representation of the skills selected by the skill selector. At the end of each game or round, the skill memory is erased and then reloaded with the new skills selected by the skill selector as a result of the learning process. The new skills are to be used in the next game if the game must be replayed or in the next round if a new round must start. Thus, the skill memory is rewritten through incremental social learning, each time a game is replayed in a round of the tournament.
The situation awareness memory stores the results of the scanning process and is modelled, following the displacement theory, as a FIFO queue whose length is the size of the situation awareness memory. Information discovered by applying the skills currently held in skill memory is appended after the information stored at previous steps. Older information that exceeds the queue size is therefore pushed out and lost.
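A fixed-size FIFO queue of this kind can be sketched with Python's `collections.deque`. The class name `SituationAwarenessMemory` and the `(row, column, digit)` item format are illustrative assumptions, not taken from the paper.

```python
from collections import deque


class SituationAwarenessMemory:
    """FIFO store with displacement: the oldest items are lost first."""

    def __init__(self, size):
        # A deque with maxlen silently discards items from the
        # opposite end once the capacity is reached.
        self.items = deque(maxlen=size)

    def store(self, scanned_items):
        """Append newly scanned items after those already stored."""
        self.items.extend(scanned_items)

    def recall(self):
        """Return the items still held, oldest first."""
        return list(self.items)


# Hypothetical scanned items: (row, column, digit) triples.
memory = SituationAwarenessMemory(size=3)
memory.store([(0, 0, 5), (0, 1, 3)])
memory.store([(0, 4, 7), (0, 8, 1)])
print(memory.recall())  # the first item (0, 0, 5) has been displaced
```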
3.2 The tournament
Playing a game: During a game an agent scans the Sudoku grid in order to form its situation awareness picture, which allows the propagation of Sudoku constraints/rules. The agent visits each empty cell of the grid and applies the selected skills to it. After the agent has applied the skills to all empty cells and propagated the Sudoku rules, the game is given a score reflecting the agent’s proficiency.
Scoring a game: Agents receive scores based on the remaining degrees of freedom of the empty cells in the grid after a fixed number of allowed steps. The degree of freedom of an empty cell is the number of possible candidates left after propagation of domains. If the Sudoku grid is complete at the end of the game, no degrees of freedom remain. If the grid still has empty cells, the degrees of freedom of all empty cells are added to give the agent's total degrees of freedom. The performance of an agent decreases as this total grows: the fewer degrees of freedom remaining at the end of a game, the better the agent performs. The score is defined in Equation 2 in terms of the agent's total degrees of freedom and the maximum total degrees of freedom over the society.
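As a rough illustration, the total degrees of freedom of a grid can be computed as below. This sketch uses the full row, column and box constraints for every empty cell, so it ignores the agents' limited scanning skills and memory. The `scores` function is only one plausible normalization (one minus the ratio to the society maximum); it is an assumption and may differ from Equation 2. All function names are hypothetical.

```python
def candidates(grid, r, c):
    """Digits not yet used in the row, column, or 3x3 box of cell (r, c).

    grid is a 9x9 list of lists with 0 marking an empty cell.
    """
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def total_degrees_of_freedom(grid):
    """Sum of candidate counts over all empty cells."""
    return sum(
        len(candidates(grid, r, c))
        for r in range(9)
        for c in range(9)
        if grid[r][c] == 0
    )


def scores(dof_per_agent):
    """Assumed normalization: 1 for a completed grid, 0 for the worst agent."""
    d_max = max(dof_per_agent)
    if d_max == 0:  # every agent completed the grid
        return [1.0] * len(dof_per_agent)
    return [1.0 - d / d_max for d in dof_per_agent]


# A valid completed grid built from a standard shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in solved]
puzzle[0][0] = 0  # remove one digit
print(total_degrees_of_freedom(puzzle), scores([0, 5, 10]))
```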
Learning: At the end of a game, with or without completion of the grid, each agent updates its skill selection vector using the experience of the other agents. The update can be viewed from two perspectives, which we combine: the current agent searches for the agent with the highest score and most similar memory. Thus, the agents in the society learn from other agents with as similar memory (cognitive capacity) as possible, and score as high as possible.
First, we define a similarity metric for the working memory of agents, as in Equation 3. It is computed from the awareness memory sizes of the current agent and of another agent from the society, together with the maximum and minimum awareness memory sizes in the society. In the special case where the two agents coincide, Equation 3 gives the memory similarity of an agent to itself.
Then, we couple the memory similarity metric with the score, as in Equation 4, which defines the fitness between the current agent and an agent from the society. The agent with the maximum fitness is the agent in the society best suited to participate in the current agent's learning process. We note that the fitness of an agent to itself equals its score in the Sudoku game, which follows from its memory similarity to itself.
The amount by which the best-fit agent participates in the current agent’s learning is given by the value of the maximum fitness coefficient. The resulting update function is given in Equation 5, together with the condition under which it applies.
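The sketch below shows one way the three learning steps could fit together. The functional forms are assumptions made for illustration and may differ from Equations 3 to 5: similarity is taken as one minus the normalized difference in awareness memory size, fitness as similarity times score, and the update as a move of the current weights towards the best-fit agent's weights by a fraction equal to the fitness. All function names are hypothetical.

```python
def similarity(m_i, m_j, m_min, m_max):
    """Assumed form: 1 for identical memory sizes, 0 for the two extremes."""
    if m_max == m_min:  # degenerate society with a single memory size
        return 1.0
    return 1.0 - abs(m_i - m_j) / (m_max - m_min)


def best_teacher(i, sa_sizes, scores):
    """Return (index, fitness) of the agent with maximum fitness for agent i.

    Assumed fitness: similarity(i, j) * score(j), so that the fitness
    of an agent to itself equals its own score.
    """
    m_min, m_max = min(sa_sizes), max(sa_sizes)
    fitness = [
        similarity(sa_sizes[i], m, m_min, m_max) * s
        for m, s in zip(sa_sizes, scores)
    ]
    j = max(range(len(fitness)), key=fitness.__getitem__)
    return j, fitness[j]


def update_weights(w_i, w_j, fit):
    """Assumed update: move each weight of agent i towards agent j by `fit`."""
    return [a + fit * (b - a) for a, b in zip(w_i, w_j)]


# Three agents with awareness memory 9, 27 and 45 units.
sa_sizes = [9, 27, 45]
game_scores = [0.2, 1.0, 0.4]
j, fit = best_teacher(0, sa_sizes, game_scores)
if j != 0:  # assumed condition: an agent does not learn from itself
    new_weights = update_weights([0.0, 1.0], [1.0, 0.0], fit)
```

Here agent 0 selects agent 1: agent 1 is only half as similar as agent 0 is to itself, but its score is five times higher.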
3.3 The experimental setup
The skill selection vector is initialized with random weights for each agent at the beginning of the tournament. Ten sets of experiments are run with different seeds.
The total working memory for each agent in the society is 54 units, corresponding to 54 digits associated with the Sudoku puzzle. Within the total memory, the skill memory and the situation awareness memory can each be tuned in the range from 9 to 45 units. The agents in the society are endowed with a situation awareness memory ranging from 9 to 45 units in increments of 2, and the skill memory takes the complementary value (from 45 down to 9). Consequently, the society has 19 agents.
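The society described above can be generated in a few lines. This is a minimal sketch: the dictionary layout, the name `make_society`, and drawing the initial weights uniformly from [0, 1) are assumptions, since the text only states that the weights are random.

```python
import random

TOTAL_MEMORY = 54   # working memory units per agent
NUM_SKILLS = 10     # size of the skill set


def make_society(seed):
    """Build the 19 agents with awareness memory 9, 11, ..., 45."""
    rng = random.Random(seed)  # one seed per experiment set
    society = []
    for sa_size in range(9, 46, 2):
        society.append({
            "sa_memory": sa_size,
            "skill_memory": TOTAL_MEMORY - sa_size,
            # Assumed range for the initial random weights.
            "weights": [rng.random() for _ in range(NUM_SKILLS)],
        })
    return society


society = make_society(seed=0)
print(len(society), society[0]["skill_memory"], society[-1]["skill_memory"])
```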
The tournament consists of nine rounds of increasing level of difficulty, where difficulty is associated with the initial number of non-empty cells existent in the grid. In this study, grids with 76, 74, 71, 67, 62, 56, 49, 41 and 32 initial non-empty cells are used for the 9 tournament rounds in that order.
4 Results and discussion
Figure 2(a) shows that skill 8 (COL9) is the most used by the agent society throughout the Sudoku tournament, followed by skill 1 (ROW3). However, a skill being chosen often does not imply that it produces high scores in Sudoku games. We test this by investigating which skills are most used when the highest scores are obtained. Figure 2(b) shows that skill 8 and skill 1 are indeed also the most effective scanning skills under the experimental setup used in this paper, with skill 8 the most effective, followed by skill 1.
Further, we select the skill with the highest proficiency (skill 8) and investigate for which size of the situation awareness memory it is most used. This is equivalent to searching for the agent that used this skill most, since the agents differ from each other through the ratio between skill memory and situation awareness memory. Figure 2(c) shows that the most effective scanning skill is used more by agents with a large amount of memory reserved for situation awareness storage, and less by other agents.
We continue by investigating the overall influence on performance of the scanning skills and situation awareness components, in order to see whether skills (and hence the processing function) prevail over situation awareness (the storage function) or the other way round. Figure 2(d) displays the score at the end of the games for all agents (clue memory sizes) and all difficulty levels throughout the tournament. For low difficulty games, the score is always maximum, with the games being completed regardless of the size of memory or skill complexity. As the game difficulty increases, the difference in score between agents with low and high situation awareness memory becomes significant. The score drops significantly for agents with high situation awareness memory, with the drop starting early in the tournament, at difficulty 5. Recall that a large situation awareness memory leaves little working memory for loading skills, so only a limited number and/or complexity of skills can be used. On the other hand, the results show that performance is less affected in the opposite situation, when the situation awareness memory is low. This suggests that a scarcity of skills can entirely jeopardize the ability of an agent to complete the game, whereas a severe limitation of situation awareness memory still allows a certain level of performance. We conclude that the presence of scanning skills (the inherent ability to process) prevails over the storage of situation awareness (the ability to store) in the working memory.
We investigated the trade-off between the processing and storage functions of working memory in Sudoku. We used a society of agents capable of learning from each other how to use their existing skills effectively in conjunction with their working memory in order to solve Sudoku games of various difficulty levels. We established the most used skills and, most importantly, the effective skills, i.e. the ones that contribute to acquiring high scores in a Sudoku game. The main finding is that the scanning skills tend to be more important than the space available for situation awareness.
This project is supported by the Australian Research Council Discovery Grant DP140102590, entitled “Challenging systems to discover vulnerabilities using computational red teaming”.
This is a pre-print of an article published in Lecture Notes in Computer Science, vol 8836, Springer. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-12643-2_69
- [1] Atkinson, R.C., Shiffrin, R.M.: The psychology of learning and motivation, vol. 2, chap. Human memory: A proposed system and its control processes, pp. 89–195. New York: Academic Press (1968)
- [2] Baddeley, A.D., Hitch, G.: Working memory. In: Bower, G.H. (ed.) The psychology of learning and motivation: Advances in research and theory, vol. 8, pp. 47–89. New York: Academic Press (1974)
- [3] Baddeley, A., Chincotta, D., Adlam, A.: Working memory and the control of action: evidence from task switching. J. of Exper. Psyc.: General 130(4), 641 (2001)
- [4] Brown, G.D.A., Preece, T., Hulme, C.: Oscillator-based memory for serial order. Psyc. Rev. 107(1), 127–181 (2000)
- [5] Brown, G.D.A., Neath, I., Chater, N.: A temporal ratio model of memory. Psyc. Rev. 114(3), 539–576 (2007)
- [6] Burgess, N., Hitch, G.: Computational models of working memory: putting long-term memory into context. Trends in Cog. Sc. 9(11), 535–541 (2005)
- [7] Case, R., Kurland, D.M., Goldberg, J.: Operational efficiency and the growth of short-term memory span. J. of Exper. Child Psyc. 33(3), 386–404 (1982)
- [8] Chandler, C.C.: Specific retroactive interference in modified recognition tests: Evidence for an unknown cause of interference. J. of Exper. Psyc. 15, 256–265 (1989)
- [9] Craik, F.I.M., Lockhart, R.S.: Levels of processing: A framework for memory research. J. of Verbal Learning and Verbal Behavior 11, 671–684 (1972)
- [10] Henson, R.N.A.: Short-term memory for serial order: the start-end model. Cog. Psyc. 36, 73–137 (1998)
- [11] Lewandowsky, S.: Redintegration and response suppression in serial recall: A dynamic network model. Int. J. of Psyc. 34, 434–446 (1999)
- [12] Lovett, M.C., Reder, L.M., Lebiere, C.: Models of working memory: Mechanisms of active maintenance and executive control, chap. Modeling working memory in a unified architecture, pp. 135–182. NY: Cambridge University Press (1999)
- [13] Page, M.P.A., Norris, D.: The primacy model: A new model of immediate serial recall. Psyc. Rev. 105, 761–781 (1998)
- [14] Towse, J., Hitch, G.: Is there a relationship between task demand and storage space in tests of working memory capacity? Q. J. of Exp. Psyc. 48, 108–124 (1995)
- [15] Wixted, J.T.: The psychology and neuroscience of forgetting. Annual Rev. of Psyc. 55(1), 235–269 (2004)