Al Aïn et al. (2008) measured the abilities of three African Grey parrots (Psittacus erithacus) to discriminate between discrete and continuous quantities, showing that their accuracy in choosing between two small quantities depended on the ratio of the difference between the two quantities to the larger one: the larger this ratio, the better their accuracy.
Generalizing the experimental protocol described and implemented by Al Aïn et al. (2008) to other subjects or species presents some difficulties. The fact that the experimenter knows which answer is expected from the subjects was not an issue in their study, because it had previously been verified that the three subjects were unable to read such cues from human experimenters; but it means that replication of the protocol is limited to individuals (from the same or from other species) whose inability to read cues has been previously demonstrated. Beyond this weakness, the cost of the experimental set-up and of the analysis of the video recordings of the experiments reduces the probability that such a protocol will be replicated with other subjects from the same species, or with subjects from the many other parrot species around the world.
Touchscreens have been successfully used for experiments in life enrichment (Perdue et al., 2012; Kohn, 1994; Coghlan et al., 2021a) and in Comparative Psychology (Egelkamp and Ross, 2018), with individuals from various nonhuman species. Could digital life enrichment techniques allow Al Aïn et al. (2008)’s results to be replicated at a lower cost, but with better precision and less potential experimental bias? What additional advantages could a digital variant bring?
Inspired by previous informal Digital Life Enrichment experiments, such as a cockatoo playing the video game Candy Crush (Figure 8), or Monk Parakeets learning to use touch interfaces by playing music on them (Figures 8 and 8), we designed, tested and used a web application, InCA-WhatIsMore, to digitally replicate and extend Al Aïn et al. (2008)’s experimental setup. We obtained results similar to those of Al Aïn et al. for two individuals of a distinct species of parrots, Monk Parakeets (Myiopsitta monachus), using an experimental protocol with increased guarantees against potential experimental biases, at a lower set-up cost, and with additional advantages brought by the digital context, such as automatic logging and increased subject agency. After describing a selection of concepts and results in the research area of comparative psychology (Section 2), we describe the application InCA-WhatIsMore (Section 3), an experimental protocol (including separate development, training and testing phases) based upon it (Section 4), and an implementation of this protocol together with an analysis of its results (Section 5); we conclude with a recapitulation of our results, a discussion of their potential weaknesses, and a perspective on future research (Section 6).
2. Nonhuman Cognition
The cognitive abilities of nonhuman animals, traditionally less studied than those of humans, have been receiving more attention over the last half century. Such studies started with the animals perceived as “closest” to humankind, such as apes, and have spread more recently to birds (Pepperberg, 1999; Al Aïn et al., 2008; Cunha and Clubb, 2018). We give a general overview of some projects and results about the cognitive abilities of some ape and bird species (Section 2.1); Al Aïn et al. (2008)’s study of the discrimination abilities of some parrots (Section 2.2); how devices (analogical and digital) were introduced to nonhumans in order to improve their well-being, and often study their abilities at the same time (Section 2.3); how distrust in results obtained by improper experimental protocols has plagued scientific research in this area in the past (Section 2.4); and how some general guiding principles in the design of experimental protocols permit scientists to avoid such experimental biases (Section 2.5).
2.1. Comparative Psychology
Comparative psychology refers to the scientific study of the behavior and mental processes of non-human animals (referred to as “nonhumans” thereafter), especially as these relate to the phylogenetic history, adaptive significance, and development of behavior in many different species, from insects to primates. Pepperberg (2020) describes the history of the field of Comparative Psychology of Intelligence in the last 30 years.
2.2. Discrimination Abilities in African Grey parrots
Al Aïn et al. (2008) tested the discrimination abilities of African Grey parrots (Psittacus erithacus) on discrete and continuous amounts. More precisely, they investigated the ability of three African Grey parrots to select the larger of two amounts of food, in two types of experiments. In the first experiment type, the subjects were tested on discrete quantities via the presentation of two quantities of sunflower seeds (Deli nature Beyers Belgium), of between 1 and 5 seeds each. In the second experiment type, the subjects were tested on continuous quantities via the presentation of two quantities of parrot formula, with amounts of 0.2, 0.4, 0.6, 0.8 or 1 ml. For each experiment, the two amounts were presented simultaneously and were visible at the time of choice. Although the subjects sometimes failed to choose the larger value, they always performed above chance, and their performance improved as the difference between amounts grew.
The experimental setup was completely analogical. A permanent table was set up in the aviary, and two black pieces of cardboard were used to present the food items (sunflower seeds or parrot formula). For each experiment, different amounts of either seeds or parrot formula were placed on each piece of cardboard. The experimenter held the subject for 5 seconds in a position from which they could observe the two sets, then placed them on the table at equal distances from the two sets, letting them choose one set while removing the other. The position of the sets (small and large) was pseudo-randomized: the larger set was never presented more than two times in a row on the same side, and was presented as often on the right side as on the left side.
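The pseudo-randomization constraint described above can be sketched in a few lines (our reconstruction, not the authors' procedure): sample left/right placements for the larger set that are balanced across sides and never repeat the same side more than twice in a row.

```python
# Hedged sketch (our reconstruction) of the pseudo-randomization of the
# larger set's side: balanced left/right counts, no run longer than two.
import random

def pseudo_random_sides(n_trials, seed=None):
    """Return a list of 'L'/'R' placements for the larger set."""
    rng = random.Random(seed)
    while True:  # rejection sampling keeps the sketch simple
        sides = ['L'] * (n_trials // 2) + ['R'] * (n_trials - n_trials // 2)
        rng.shuffle(sides)
        # Reject any sequence with three identical sides in a row.
        if all(not (sides[i] == sides[i - 1] == sides[i - 2])
               for i in range(2, n_trials)):
            return sides

sides = pseudo_random_sides(10, seed=1)
assert sides.count('L') == sides.count('R')
```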
In the experimental setup described by Al Aïn et al. (2008), subjects could potentially read involuntary cues from the experimenter: even though the experimenter stood behind the subject, at equal distances from the two sets, not pointing at either, looking at the subject, and aiming to avoid communicating any cue, the experimenter knew where the larger quantity was. While this was not an issue in Al Aïn et al. (2008)’s study, because the authors had demonstrated in a previous study that the subjects were unable to use any gazing cue, the protocol should not be applied as such to other subjects without first verifying their inability to read such cues, adding to the cost of implementing the protocol.
Avoiding giving cues to the subject is hard even for a professionally trained experimenter (Trestman, 2015). Requiring either such training, or a separate study to ensure that the subject cannot read cues from the experimenter, restricts the applicability of a protocol to laboratories. For example, in citizen science projects (Gura, 2013) where non-professional experimenters (such as zoo personnel or ordinary citizens) guide the experiments, a masked protocol (defined in Section 2.5), in which the experimenters do not know the correct answer (because they did not receive the information that the subject did), would be more robust against subjects reading cues from the experimenter. We describe in Section 3 an application allowing such an alternative experimental setup which, if not exactly equivalent to that of Al Aïn et al. (2008) (e.g. the reward is not proportional to the quantity selected), presents the advantage of being “experimenter-masked”, inspired by some of the life enrichment experiments described in the next section.
2.3. Life Enrichment and Cognition studies
One can study the cognitive abilities of nonhumans through life enrichment activities in general, and through digital ones in particular. General preoccupation for the welfare of captive nonhumans is at least 150 years old. Kohn (1994) dates the first legislation about zoo animal welfare to 1876, with the “Cruelty to Animals Act” in the “Criminal Code of Canada”. Since then, the list of duties of such institutions has grown to include not only the basic welfare tenets of adequate feed, water, shelter, sanitation and veterinary care of their nonhuman residents, but also higher-level concerns such as the handling and training of the nonhuman residents, their psychological well-being, the design of their enclosures, the preservation of their species, environmental and conservation issues, and programs to breed captive nonhumans. Kohn (1994) mentions (in 1994) the “emerging field of psychological well-being in captive animals”, incorporating physical health, normal and captive behavior, and interactions with the enclosure environments, and mentions how environmental enrichment is an important component of this issue. He goes on to list innovations in life enrichment such as specialized toys and puzzle feed boxes (but no digital applications).
Yet the use of digital applications to measure nonhuman abilities seems to predate Kohn (1994)’s report by at least 10 years. In his discussion of the impact of game-like computerized tasks designed to promote and assess the psychological well-being of captive nonhumans, Washburn (2015) refers to a three-decade history as of 2015, placing the beginning of such use sometime around 1985. In 1990, Richardson et al. (1990) described a fairly complete Computerized Test System. They tested their system with a population of rhesus monkeys, but defended its potential as a “rich and robust testing environment for the cognitive and neuro-psychological capacities of great apes, rhesus monkeys, mentally retarded and normally developing children, and adults”, so that subjects from various populations can be tested under comparable conditions in such a way that “control is increased over testing conditions”. They mention that “the animals readily started to work even when the reward was a small pellet of chow very similar in composition to the chow just removed from the cage”, and that “the tasks have some motivating or rewarding [properties] of their own”.
Nonhuman subjects seem to enjoy participating in cognitive studies involving game-like digital applications. Washburn (2015) describes, among various other anecdotes, how game-like applications for apes were developed as early as 1984, and how the subjects “chose to work on joystick-based tasks, even though they did not need to perform the game-like tests in order to receive food”, and “opted for computer task activity over other potential activities that were available to them”. He goes on to mention how such game-like activities have been used to study various cognitive phenomena such as learning, memory, attention, perception, categorization, numerical cognition, problem solving, reasoning, decision making, meta-cognition, social cognition and language. Among the methodological details reported, he mentions that incorrect responses typically produced auditory feedback, frequently accompanied by a time-out period, but that no other punitive method was used to promote productivity, accuracy or rapid responding. Lastly, he describes evidence that the nonhumans are motivated not only by food rewards, but also by the enjoyment of the tasks themselves: when given a choice between completing trials for pellets or receiving pellets for free but not being able to play the game-like tasks during the free-pellet period, the monkeys chose to work for their reward.
The use of digital applications might benefit nonhumans in less direct ways too, by raising awareness of and respect for their cognitive abilities among the public. Coghlan et al. (2021b) examined how digital technologies can be used to improve ethical attitudes towards nonhumans (focusing on nonhuman apes kept in zoos) by introducing digital technologies in zoos for both animal enrichment and visitor education. Both analogical and digital setups must be careful to avoid experimental biases: we describe two particularly relevant to this work in the next section.
2.4. Experimental Biases
The history of Comparative Psychology has been rife with disputes about the validity of methodologies and results: Pepperberg (2016) describes various such tensions between researchers about the cognition of animals, with some accusing other researchers in the field of being “liars, cheats and frauds”, and she highlights how sign language researchers were accused of “cuing their apes by ostensive signals” and of “consistently over-interpreting the animals’ signs”. We explore here two issues relevant to the experimental protocol described in this work, namely selective reporting bias (Section 2.4.1) and the “Clever Hans” effect (Section 2.4.2).
2.4.1. Selective Reporting Bias
Selection biases occur in survey or experimental data when the selection of data points is not sufficiently random to support a general conclusion. Selective reporting biases are a specific form of selection bias in which only interesting or relevant examples are cited. Cognitive skills can be particularly hard to study in nonhumans, requiring unconventional approaches that often present the risk of such biases. For example, an experimenter who repeatedly presents a subject with the same exercise could be tempted to omit or exclude bad performances (perhaps attributing them to a “bad mood” of the subject, which remains a real possibility) and report only good performances, creating a biased representation of the subject’s abilities: a selective reporting bias.
Whereas Bates and Byrne (2007) defend the use of anecdotes in comparative psychology, they do so “provided certain conditions are met”, so as to avoid such biases, defining an anecdotal method in five steps:
Source Material Assembly;
Definition of the extraction process;
Categorization of extracted records;
Labeling of each record with a level of evidence (from ambiguous to highly suggestive);
Hypothesis generation to guide future studies.
They emphasise the “need to use neutral descriptions of behaviour that avoid implicit interpretation, recorded by experienced and knowledgeable observers immediately after the occurrence of the event”, and that all observations of rare events should be made available for later analysis by the anecdotal method. We describe in Section 3 how a digital application can systematically log results and easily avoid such bias. Another type of bias, that of the subject reading cues from the experimenter, is described in the next section.
2.4.2. “Clever Hans” effect
Among such methodological issues resulting in experimental biases, the most iconic might be that of the eponymous horse nicknamed “Clever Hans”, which appeared to be able to perform simple intellectual tasks, but in reality relied on involuntary cues given not only by his human handler, but also by a variety of human experimenters, as related by Trestman (2015):
“In the early years of the 20th century an unlikely and controversial celebrity rose to fame in Germany and internationally: a horse, aptly named Clever Hans, who apparently displayed a startling aptitude in mathematics as well as music theory, not to mention the ability to identify colors and individual people by name, read German, and answer a variety of questions in normal spoken German. He responded to questions primarily via a code based on repeatedly tapping his right hoof, in combination with other responses such as nodding or shaking his head to indicate yes or no, and pointing to objects or photographs with his nose.”
The story is famous, of course, for illustrating how nonhumans can often learn to read cues from the experimenter more easily than to solve the problem asked of them. One ignores such risks not only at one’s own peril, but at the risk of hurting a whole research area: in her recapitulation of the history of animal language studies, Pepperberg (2016) describes how media coverage in 1980 of issues about the validity of Comparative Psychology methodologies and results moved government agencies to respond to the blow-back by cutting off the funding for all related studies. While issues such as over-interpreting a subject’s signs or selectively reporting experimental results can be avoided with an appropriate amount of rigor (possibly with some computerized help, as discussed in Section 6.1.8), preventing subjects from reading the experimenter’s cues requires special care in the design of the experimental protocol: we describe in the next section some guiding principles which exclude the very possibility of such biases from experimental results.
2.5. Masked Experimental Protocols
It is possible to avoid confusing a subject’s ability to read cues from the experimenter with their ability to answer the tests presented by that experimenter. The principle is quite simple: make sure that the experimenter does not know the test, by having a third party, out of the subject’s sight, prepare it. Whereas such an experimental setup was historically referred to as a “blind” or “blinded” setup, we follow the recommendations of Morris et al. (Morris D, 2007) and prefer the term “masked” to the term “blind” when describing the temporary and purposeful restriction of the experimenter’s access to the testing information.
In an analogical setting, the set-up of a masked experimental protocol is more costly than that of less careful ones. For example, Cunha and Clubb (2018) describe an experimental protocol where an assistant prepares a pile of prompt cards, which the experimenter presents to the subject without looking at their content until after the subject has responded, in order to know whether to praise and reward them or not. We describe our digital set-up for a masked experimental protocol in Section 3.2: the digital device completely replaces the assistant, and assists the experimenters by telling them whether the subject answered correctly or not. As long as the device is placed so that the subject can see the visual display but the experimenter cannot, there is physically no way for the subject to read cues from the experimenter, hence avoiding the “Clever Hans” effect.
In the next section, we describe an application designed to facilitate a type of “masked” experimental set-up, which guarantees that the subject’s ability to read cues from the experimenter cannot affect the result of the experiment, as the experimenter does not know the question (and hence its correct answer) being asked of the subject.
3.1. Application’s Structure
The web application is composed of four views. The first two, the Main Menu (described in Figures 12 and 16) and the Gaming View (which can be seen in Figures 26 and 19, among others), are especially designed to be navigable by nonhuman subjects. Access to the other two, the settings view (see Figures 16 to 16) and the information view, is tentatively restricted to the experimenters by requiring a long press on a button.
3.1.1. Main Menu
The main menu view is displayed when the application is opened: see Figure 12 for a screenshot, and Figure 12 for a picture of a subject using it to select a display mode. From this view, the user can navigate to the other views of the application. At the center of the screen are four figures, each representing a different visualisation mode applied to a random value. Two of the modes are discrete: one represents the value as the number of dots on a dice face, the other as a heap of dots on a 3 by 3 grid (like that of the dice). The other two modes are more continuous: one represents each value by a container filled with liquid in proportion to the value, the other by a circle whose radius grows with the value. The user (be it the experimenter or the subject) can pick any of the four display modes in order to start the game in it. At the bottom of the screen stands a button to access the settings section, activated by a long press (whose duration is set in the settings view).
3.1.2. Gaming view
The most important view is the gaming view, which allows the subject to “play”: see Figure 26 for a screenshot, and Figures 2, 2, 23, 23, and 26 for pictures of subjects playing the game. The view displays a set of values in some display mode, requesting the user to choose the largest one. Each action triggers an audio feedback indicating whether it was correct or wrong. After a given number of exercises, the game ends and gives an audio feedback about how the score for this game compares with two thresholds (thresholds which can be modified in the settings page, as can the words vocalized in each audio feedback). The view also has an exit button in the top left corner, intended to be usable by the subject, and a settings button activated by a long press of parameterizable duration.
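The trial-and-feedback logic of the gaming view can be summarized with a minimal sketch (hypothetical code, not the actual InCA-WhatIsMore implementation; the function and threshold names are our own): each trial shows randomly chosen values, audio feedback signals correctness, and the final score is compared against two configurable thresholds.

```python
# Hypothetical sketch of the gaming-view logic: random values per trial,
# per-trial audio feedback (stand-in: print), end-of-game score vs.
# two configurable thresholds. Names and defaults are our assumptions.
import random

def play_game(n_trials, value_set, n_values=2, low=0.4, high=0.7,
              choose=max, rng=None):
    """Run one game; `choose` stands in for the subject's choice."""
    rng = rng or random.Random()
    score = 0
    for _ in range(n_trials):
        values = rng.sample(value_set, n_values)  # distinct quantities
        answer = choose(values)                   # subject touches a value
        correct = answer == max(values)
        print("correct!" if correct else "wrong")  # audio feedback stand-in
        score += correct
    rate = score / n_trials
    if rate >= high:
        return "great score"   # upper-threshold vocalization
    elif rate >= low:
        return "good score"
    return "keep trying"

print(play_game(5, [1, 2, 3, 4, 5], choose=max))  # a perfect player
```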
Whereas the subject can choose the display mode in which they prefer to play, and exit the game view to change it at any time, other settings are accessible in a more technical view designed for the experimenter to set up other aspects of the software: the settings view (Figures 16 and 16). As the visual and sound outputs of touch-screen devices were designed for humans, and as very little is known about the adequacy of such outputs for nonhumans, the software was designed to maximize the experimenter’s control over the visual and sound output of the application, so that each experimenter can find the settings best adapted to the characteristics of the species and/or subjects with which the software is used. As such, the software permits the experimenter to change, among other things, the color schemes and the number of values displayed, the domain from which such values are randomly chosen, and the number of questions before a game ends: it is hoped that such parameters will be useful in future studies.
This last view, a priori accessed only by the experimenters, displays various information about the application, such as its version, an overview of recently added and upcoming features, usage instructions, references, and acknowledgments to collaborators.
We now describe how to use this web application to implement a masked experimentation protocol in which the subject cannot read any cue from the experimenter, because the experimenter does not know the question given to the subject.
3.2. Masked Experimental Setup
Among other features, the web application was designed to facilitate digital experiments similar to that performed by Al Aïn et al. (2008), but in such a way that the experimenter does not know on which side the “correct” answer lies: a masked experimental setup. This ensures that the subject cannot receive any voluntary or involuntary cue from the experimenter. This purpose is achieved through the extensive audio feedback system, which aims to notify the experimenter of any event requiring their intervention (e.g. rewarding or encouraging the subject, or acknowledging that the subject does not want to play this game any more), so that they never need to check the screen of the device.
3.3. Logging structure
In non-digital experiments in comparative psychology, the experiments are usually recorded on video, and the recording is later processed to generate an extensive log of the subject’s interactions during the experiment. Such a task is long and tedious, and no video processing software is yet able to automate it. An important advantage of a digital experimental set-up such as that enabled by the software InCA-WhatIsMore is the ability to automatically log the subject’s interactions with the application.
The logs are exported from the top part of the settings page of the application (previously described in Figure 16, on page 16). Two formats are available: the first, .txt, is designed to be easily readable by humans, while the second, .csv, is more adequate for machine processing. The software InCA-WhatIsMore generates logs with data to be analyzed by researchers, including information on both the test performed and the subject’s performance (see Figure 21 for a short extract, and Figure 27 for a longer one).
| Test no | Test Name | C0 | C1 | C2 | C3 | C4 | Value selected | Correction | Date | Other Parameters |
|---------|-----------|----|----|----|----|----|----------------|------------|------|------------------|
| 1 | dice | 1 | 4 | | | | 4 | true | [2022-05-19 17:02(25.981)] | Value Set [1,2,3,4,5] |
| 81 | rect | 4 | 2 | 3 | | | 3 | false | [2022-05-19 17:26(55.124)] | Value Set [1,2,3,4,5] |
| 180 | heap | 3 | 2 | 1 | | | 2 | false | [2022-05-19 17:35(06.6)] | Value Set [1,2,3,4,5] |
We describe here the format of the logs generated by the software. The first three rows of the log record information about the test session:
Test Name: The test performed; the value is the type of representation, which can be “dice”, “heap”, “rect” or “disc”.
Learner: The name of the test subject, used to run subsequent analyses such as performance over time, or to separate statistics by subject.
Experimenter: The name of the experimenter. This could be used in later studies where various experimenters apply the same test to the same subject, to check for variance in performance from one experimenter to another.
The following columns indicate quantitative information about the distribution of quantities within the test, as well as information about the test subject’s performance.
C0–C4: The quantities presented, either discrete values or their continuous representations; the order in which they are listed matches the left-to-right order in which they were displayed within the application.
Value Selected: The value chosen by the test subject.
Correction: The correctness of the value selected by the test subject, “true” if it is the largest amount and “false” otherwise.
The last columns record additional values about the test, providing information about both the subject’s performance and the setup of the test.
Date: The date and time at which the test was performed, in timestamp format, with millisecond precision.
Answering Time (ms): The time taken by the test subject to respond, from the display of the quantities to the choice, in milliseconds. Note that this is more precise than the simple difference between two consecutive timestamps, as the application includes (parameterizable) waiting times between tests, pauses between games, returns to the menu, etc.
Other Parameters: Parameters that describe the visual display of the quantities, such as the background color and the color in which the representations of the quantities are displayed. These parameters were modified only in the development phase, in order to find an adequate color scheme for the two subjects (see Section 6.2.3 for a discussion of the sensory variability of test subjects in general), but they could be used in the future to adapt the software to other individuals, potentially from other species with distinct sensory ranges, and to formally study (ethically; see Section 6.3.5 for a discussion of the related challenges) the limits of the sensory range of any subject.
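A machine-readable log of this kind lends itself to simple automated analyses. The following is a hedged sketch (the column names in the sample are our assumptions, simplified from the columns described above, not the exact .csv header produced by the software) of how accuracy and mean answering time per display mode could be computed from an exported log.

```python
# Hedged sketch: compute accuracy and mean answering time per display
# mode from a CSV log. The sample's column names are assumed, simplified
# from the columns described in the text.
import csv
from collections import defaultdict
from io import StringIO

SAMPLE_LOG = """\
test_no,test_name,value_selected,correction,answering_time_ms
1,dice,4,true,1200
2,dice,2,false,950
3,rect,5,true,1400
"""

def accuracy_by_mode(csv_text):
    """Return {mode: (accuracy, mean answering time in ms)}."""
    stats = defaultdict(lambda: [0, 0, 0])  # correct, total, total_ms
    for row in csv.DictReader(StringIO(csv_text)):
        s = stats[row["test_name"]]
        s[0] += row["correction"] == "true"
        s[1] += 1
        s[2] += int(row["answering_time_ms"])
    return {mode: (c / n, ms / n) for mode, (c, n, ms) in stats.items()}

print(accuracy_by_mode(SAMPLE_LOG))
# dice: accuracy 0.5, mean 1075.0 ms; rect: accuracy 1.0, mean 1400.0 ms
```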
In the next section, we describe the training and experimental protocol which was used to generate the data measured by such logs.
4. Experimentation Protocol
The experimental protocol was divided into three phases, which we describe in Section 4.1. The precautions taken to protect the well-being of the nonhuman subjects (described in Section 4.2) were validated by the Institutional Animal Care and Use Committee (IACUC) (in Spanish, “COMITÉ INSTITUCIONAL de CUIDADO y USO de ANIMALES (CICUA)”) of the researchers’ institution. The statistical analyses (described in Section 4.3) were scheduled as part of the experimental protocol, independently of the results of the experiments.
4.1. Phases of the protocol
The protocol was implemented in three phases: a development phase (of the software) with only one subject (the first one); a training phase with two subjects and a mix of unmasked and masked protocols; and a testing phase using the masked protocol and collecting data from both subjects.
During the development phase, a single subject (hereafter referred to as “the first subject”) interacted with the various prototypes of the software, in an unmasked experimental setting where the experimenter could observe the display of the screen. Each time the software was modified, it was tested by two human subjects before being used by any of the nonhuman subjects, to minimize the nonhuman subjects’ potential frustration with a malfunctioning application.
During the testing phase, both subjects were invited to use the software, each on their own device, this time in a masked experimental setting where the experimenter could not observe the display of the screen, so that they did not know the question asked of each subject and could not cue them, limiting themselves to encouraging and rewarding each subject according to the feedback vocalized by the application (see Figures 19 to 19, on page 19, for examples of masked experimental setups).
The subjects’ welfare was cared for during each of those three phases: we describe some of the precautions taken in the next section.
4.2. Ethical Precautions
These precautions accord with recommendations from the Animal-Computer Interaction (ACI) literature (Paci, Mancini and Nuseibeh, 2022; Mancini and Nannoni, 2022; Ruge and Mancini, 2019; Mancini, 2017; North and Mancini, 2016) and the ACI board, and serve both the validity of the experiments and the mental health of the researchers.
4.2.1. Physical settings
The subjects were hosted in a private residence comprising three aviaries, each large enough to allow some amount of flight: one “laboratory” building with meshed windows containing a “Nest” aviary with a simple door, containing a nest, a plant, and various toys and nesting material; one corridor-shaped “South” aviary with two sets of double doors; and one “North” aviary with one set of double doors. The subjects were mostly hosted in the “Nest” aviary, but were transported to the other aviaries (with their consent) to allow them to fly over slightly larger distances (6 m), to get sun exposure, to access distinct physical games, and more generally to vary their stimuli. The sessions of the development, training and testing phases were almost always held in a training area next to the opening of the “Nest” aviary, and on a few occasions inside the larger “North” aviary. At no point were the subjects food- or water-deprived: at any point they could fly to their housing space, where food and water were available. The sessions always occurred on one of three similar wood frames (see Figure 23), so as to offer a familiar setting even when the location of the training changed (e.g. in the “North” aviary). Even though the digital devices had to be replaced at some point, they were always held on the same wood structure (etched with the name of the subject to which it was assigned), to facilitate the recognition of which device was assigned to which subject. The subjects were weighed on a weekly basis, to detect any variation which could indicate a potential health issue, and brought to a licensed veterinarian twice a year.
4.2.2. Application Usability
In order to minimize the potential frustration of the subjects when facing inadequate answers from the application, and following Ruge and Mancini (2019)’s recommendations on evaluating animal usability, each version of the application was systematically tested by two human subjects, and any issue detected during this phase was corrected before the application was presented to the nonhuman subjects. During the software development phase, when a feature of the application (whether due to an error or to a setting which proved inadequate) was found to frustrate the subjects, the use of the application was replaced by another activity until the software was corrected, tested and separately validated by two human subjects.
4.2.3. Sense of Agency
Both physical and virtual aspects of the protocol were designed so as to maintain a sense of agency in the subjects, in line with the principles of consent for an animal-centered research ethics described by Mancini and Nannoni (2022). The physical setting of the experimentation was designed to ensure that the subjects’ participation was voluntary during all three phases of the process: the subjects were invited to come to the training area (but could, and sometimes did, refuse); at any time the subjects could fly from the training area back to their aviary, to a transportation pack with a large amount of seeds suspended above the training area, or to an alternate training area on the side, presenting an alternate choice of training exercises. Concerning the psychological aspects, the main menu of the application was designed so that each subject can choose in which visualisation mode they wish to play (see Figures 12 and 12), and a large orange “exit” button is present on the playing screen, allowing the subject to signal that they do not wish to play this game any more, prompting the experimenter to propose alternatives.
The page for adjusting the parameters controlling the difficulty of the games (e.g. the domain and number of values displayed, the length of a game, etc.), more complex display and sound choices (e.g. the colors and spaces used in the display, the words pronounced by the software in various situations, etc.), and the details of the application logs, is accessed via a special button requiring a longer press, making it harder for nonhuman subjects to access.
4.2.4. Nonhuman Privacy issues
Following Paci, Mancini and Nuseibeh (2022)’s analysis of animal privacy in the design of technologically supported environments, we note that the log data collected by the application does not constitute information which could be inferred to be considered private by the nonhuman subjects.
4.2.5. Approval of the experimental protocol by CICUA
All interactions with animals were governed by a protocol reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) (in Spanish “COMITÉ INSTITUCIONAL de CUIDADO y USO de ANIMALES (CICUA)”) of the researchers’ institution, through a form of Experimentation Protocol of Management and Care of Animals (“Protocolo de Manejo y Cuidado de Animales”).
4.3. Statistical Analysis Process
The statistical analysis of the experimental results was designed as part of the experimental protocol, with the objectives to compute the accuracy of each subject for each display mode and each size of the set of values presented to the subject (Section 4.3.1), to compare it with the accuracy of selecting a value uniformly at random (Section 4.3.2) and to search for correlation between the answer’s accuracy and some measure on the values presented (Section 4.3.3).
4.3.1. Statistical tools used
The statistical analysis was performed in a Python notebook, executed and shared via the collaborative website https://colab.research.google.com: this platform was chosen because it makes it easy to collaborate among peers as well as to run and replicate statistical analyses. This Python notebook was developed and tested on the logs generated during the (masked and unmasked) training sessions, to be used later without major modification on the logs generated during the masked experimental sessions of the testing phase. The computation made use of the following libraries:
pandas is a library written as an extension of numpy to facilitate the manipulation and analysis of data.
seaborn and matplotlib are libraries for the visualisation of statistical data. seaborn was used for the creation of correlation graphs and matplotlib for heat maps.
scipy is a free and open source library for Python consisting of mathematical tools and algorithms; from this library we use scipy.stats for the chi-square and binomial tests.
The Python notebook operates on the log files via the pandas library. These logs can be processed individually or concatenated to obtain an overall analysis for each test subject.
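As an illustration of the underlying operations (the actual notebook uses pandas), the loading and concatenation of session logs and the computation of accuracies can be sketched in dependency-free Python; the log excerpts and column names below are hypothetical, not actual experimental data:

```python
import csv
import io

# Hypothetical log excerpts: the real logs contain more columns
# (timestamps, parameters, etc.); only Mode and Correction are used here.
LOG_SESSION_1 = """Mode,Correction
Dice,True
Dice,False
Disc,True
"""
LOG_SESSION_2 = """Mode,Correction
Heap,True
Dice,True
"""

def load_log(text):
    """Parse one session log into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

# Logs can be analyzed individually or concatenated for an overall analysis.
rows = load_log(LOG_SESSION_1) + load_log(LOG_SESSION_2)

def accuracy(rows, mode=None):
    """Average of the Correction entry (True -> 1, False -> 0), optionally per mode."""
    selected = [r for r in rows if mode is None or r["Mode"] == mode]
    return sum(r["Correction"] == "True" for r in selected) / len(selected)

print(accuracy(rows))          # overall accuracy: 4/5 = 0.8
print(accuracy(rows, "Dice"))  # per-mode accuracy: 2/3
```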
4.3.2. Binomial Tests
The average accuracy of each subject for each display mode and each size of the set of values presented to the subject is the average of the Correction entry in the log (replacing True by 1 and False by 0) over all data points matching the criteria. For each such accuracy, we performed a binomial test in order to decide whether such accuracy was substantially better than that achieved by selecting a value uniformly at random. To compute the binomial test we count the “successes” among all the points of the dataset, and apply the binom_test method of scipy.stats, where the number of trials is the total number of attempts: over tests selecting the maximal value out of two, the chance probability of success is 1/2; over tests selecting the maximal value out of three, 1/3; and over tests selecting the maximal value out of four, 1/4. The greater alternative is used since we are looking for an accuracy greater than or equal to 1/2, 1/3 and 1/4 respectively. We performed such statistical analysis on the data of each particular session and on their union, on each particular visualization mode, on each type of visualisation mode (discrete or continuous) and on all visualisation modes (see Tables 2, 3, 5 and 6).
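As an illustration, the one-sided (“greater”) binomial test used here can be computed exactly as a tail sum, without depending on scipy; the counts in the usage example are hypothetical, not actual experimental results:

```python
from math import comb

def binom_tail(k, n, p):
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p).
    This is the p-value of a test with the 'greater' alternative."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts: e.g. 80 correct answers out of 100 trials,
# compared against chance levels of 1/2, 1/3 and 1/4 for sets of 2, 3 and 4 values.
print(binom_tail(80, 100, 1/2))  # far below 0.05: unlikely to be chance
print(binom_tail(80, 100, 1/3))
print(binom_tail(80, 100, 1/4))
```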
4.3.3. Pearson Correlation Analysis
In order to compare our results with those of Al Aïn et al. (2008)’s experiments, we performed a Pearson correlation analysis of the relation between the accuracy of the subjects’ answers when asked to select the maximal out of two values on one hand, and the three variables they considered on the other hand:
the sum of the values presented within a trial,
the difference between the two extreme values presented within a trial, and
the ratio of the quantities presented, obtained by dividing the smallest presented value by the largest one.
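For illustration, the Pearson correlation coefficient used in this analysis can be sketched directly from its definition; the per-pair ratios and accuracies below are hypothetical values, not experimental results:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-pair accuracies against the ratio small/large of each pair:
ratios     = [0.2, 0.4, 0.5, 0.6, 0.75, 0.8]
accuracies = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65]
print(pearson(ratios, accuracies))  # close to -1: accuracy drops as values get closer
```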
We describe the results of the experiments and their statistical analysis in the next section.
After relatively long phases of development and training (15 months) using various domains of values, the experimental phase was quite short (one week), with all experiments performed using a masked setup and a domain restricted to a set of five values, in order to stay as close as possible to the settings of Al Aïn et al. (2008)’s study. We summarize the number and content of the logs obtained (Section 5.1), perform binomial tests on the experimental results when choosing the maximal value out of two for both subjects (Section 5.2), perform binomial tests on the experimental results when choosing the maximal value out of three and four for the first subject (Section 5.3), and perform various correlation tests between the performance of the subjects and the sum, difference and ratio of the values presented (Section 5.2.3).
5.1. Log results
A testing session typically lasted some 5 to 10 games of 20 questions each, resulting in a log of 100 to 200 data points: see Figures 20 and 21 for a shortened example of a log, and Figure 27 for a longer one.
The testing phase occurred between the 19th of May 2022 and the 26th of May 2022. These experiments used four different display modes (“Dice”, “Heap”, “Disc” and “Rectangle”), requesting the subject to select the maximal value out of a set of 2, 3 or 4 values randomly chosen among a set of five values, in order to produce a setup relatively similar to that of Al Aïn et al. (2008), with the vast majority of experiments selecting the maximal out of two values, and only a few out of three or four values. Each log corresponds to a separate session and device, containing between 80 and 400 entries (each entry being a separate question and answer). In total, 14 logs were collected for the first subject and 5 logs for the second subject: the first subject was requested to select the maximal value out of 2, 3 or 4 values, while the second subject was requested to select the maximal value only out of 2 values. Concerning the selection of the maximal out of 2 values (the setting most similar to that of Al Aïn et al. (2008)), the first subject answered 449 dice tests, 400 heap tests, 262 rectangle tests and 103 disc tests, making a total of 1214 tests, while the second subject answered 190 dice tests, 26 rectangle tests and 193 disc tests, making a total of 409 tests. Concerning the selection of the maximal out of 3 values, the first subject answered 249 dice tests, 120 heap tests, 120 rectangle tests and 99 disc tests, making a total of 588 tests. Concerning the selection of the maximal out of 4 values, the first subject answered 154 dice tests, 51 heap tests and 13 disc tests, making a total of 218 tests.
See Table 1 for a summary of the number of data points collected, separated by display mode (“Dice”, “Heap”, “Disc” and “Rectangle”), accumulated by type of display mode (“Discrete” or “Continuous”) and accumulated over all display modes (“Total”). Even though the care taken to respect the agency of the subjects introduced great imbalances between the numbers of data points collected for each display mode and set size, 2429 data points were gathered in only one week through the voluntary participation of the subjects: this is much higher than what could be achieved in the same amount of time via traditional analogical protocols such as that of Al Aïn et al. (2008).
We analyze those results statistically in the following sections.
5.2. Selecting the maximal value out of two
Both subjects played the game in the four display modes, the first subject showing much more interest in participating than the second one, but neither of them showing a particular preference for any display mode. The first subject’s average accuracy is analyzed in Section 5.2.1, and the second subject’s in Section 5.2.2. Both performed better when the values were very different and worse when the values were close (Section 5.2.3), exactly as the three African Grey parrots in Al Aïn et al. (2008)’s study.
5.2.1. First Subject
The results show a clear ability of the first subject to discriminate the maximal value out of two quantities. Over all experimentations requesting to select the maximal value out of two, the first subject responded correctly in a large majority of the trials (see Table 2 for a more detailed description of the results, by session and by display mode). A simple binomial test indicates that the probability of achieving such an accuracy by answering such binary questions uniformly at random is vanishingly small: it is very likely that the subject did not answer uniformly at random. Barring a bias in the experimental protocol, this seems to indicate a clear ability to discriminate between the two values being presented.
Analyzing separately the results for each display mode or type of display mode corroborates the result and points out some interesting facts. First, one does not observe any relevant improvement over time, which is explained by the relatively long period of training before the relatively short period of testing. Second, overall, the subject performed with a slightly better accuracy for continuous display modes than for discrete ones and, surprisingly (because one would expect the reverse for humans, and similar accuracies for nonhumans), with a much better accuracy for the “Heap” display mode than for the “Dice” display mode.
5.2.2. Second Subject
The second subject was more reluctant to play, but showed a similar ability. Over all experimentations requesting to select the maximal value out of two during the testing phase, the second subject responded correctly in a large majority of the trials. A simple binomial test indicates that the probability of answering correctly that many or more such binary questions uniformly at random is vanishingly small: here again, it is very likely that the subject did not answer uniformly at random. Barring a bias in the experimental protocol, this seems to indicate a clear ability to discriminate between the two values being presented.
5.2.3. Relation between accuracy and variables
When selecting the maximal value out of two, both subjects showed a lower accuracy when the two values were close (small difference, ratio close to one): see Table 4 for the percentages of correct answers for each subject and each of the sets of values presented (ignoring the order). Such results corroborate those of the three African Grey parrots in Al Aïn et al. (2008)’s study.
Pearson’s correlation tests for the first subject (see Figure 29 for the corresponding heat map and scatter plots) suggest an inverse correlation between the accuracy of the subject’s selection and the ratio between the two values: for example, for a combination with a small ratio, the subject is more likely to correctly select the maximal value.
There is a strong negative correlation between the accuracy and the ratio, and a positive correlation between the accuracy and the difference (see the heat map in Figure 29): the scatter plots show a decreasing relationship between the accuracy and the ratio, and an increasing relationship between the accuracy and the difference.
There is a similar correlation between accuracy and ratio in the results of the second subject (see the heat map and scatter plots in Figure 31): a strong negative correlation between the ratio and the accuracy, while the correlation between the difference and the accuracy is much weaker.
5.3. Selecting the maximal value out of three and four values
Only the first subject was tested on selecting the maximal value out of three and four values: the second subject chose to stay in the “Nest” aviary or to play the digital piano for the remaining sessions. The first subject showed a lower accuracy when asked to select the maximal value out of 3 or 4 values than out of 2, but still much better than what would be expected (1/3 and 1/4 respectively) if the subject chose uniformly at random among the values proposed (see Tables 5 and 6 for the detailed performances, separated by display mode and session).
Two simple binomial tests give a more formal measure of how much better the subject performed compared to someone choosing uniformly at random: the probabilities of obtaining an equivalent or superior accuracy by randomly choosing the same number of answers (with probability of success 1/3 for selecting the maximal out of 3 values, and 1/4 for selecting the maximal out of 4 values) are vanishingly small: with very high probability, the subject showed their ability to discriminate between three and four values.
We conclude with a summary of what the project achieved to the date (Section 6.1), a discussion of the potential issues with the results presented (Section 6.2) and some perspective for future research (Section 6.3).
Whereas Al Aïn et al. (2008)’s protocol requested the subject to choose between two pieces of cardboard holding distinct amounts of food, for discrete and continuous types of food material, we proposed a protocol which requests the subject to choose the largest among a set of values (of parameterized size) on a visual display, using discrete and continuous representations of values, by touching a touchscreen on the representation of the largest value. By developing a simple but extensively parameterized web application requesting the user to select the largest among two to four values chosen at random, using discrete and continuous representations of values and providing visual and audio feedback about the correctness of the answer, we achieved a solution with various advantages, which we tentatively list as follows.
6.1.1. Better guarantees against subjects reading potential cues from the experimenter
In the context of the measurement of the discrimination abilities between discrete and continuous quantities of subjects, we designed a variant of Al Aïn et al. (2008)’s experimental protocol which presents better guarantees against subjects reading potential cues from the experimenter. Whereas their protocol is performed in the presence of a human experimenter who knows the complete set-up of the experiment, in our variant the experimenter can remain unaware of the options offered to the subjects, receiving only audio feedback indicating whether or not to reward the subject (see Section 2.5 for the definition of a masked experimental set-up).
6.1.2. Generalization of results to Monk Parakeets
Using such a protocol, we replicated and generalized the results obtained by Al Aïn et al. (2008) on the discrimination abilities of three African Grey parrots (Psittacus erithacus) to two Monk Parakeets (Myiopsitta monachus). Concerning the ability to discriminate the largest between 2 values chosen randomly in a domain of 5 distinct values, in discrete or continuous quantities, the two Monk Parakeets performed as well as the three African Grey parrots from Al Aïn et al. (2008)’s study, with high global accuracies for both subjects (see Section 5.2 for the detailed results). Similarly to the results described by Al Aïn et al. (2008), we found a strong correlation between the accuracy of the subject and the ratio between the smallest and largest values: close values are harder to discriminate than others.
6.1.3. Increased agency of the subject
A subject’s sense of agency, defined as the faculty of the subject to take decisions and to act upon them, has been shown to be an important factor in the well-being of captive nonhuman animals (Mancini, 2017; Perdue et al., 2012; Kohn, 1994). In addition to features of the experimental protocol aiming to promote the subject’s sense of agency, the web application itself provides various means for the subject to exert their agency, from the ability to choose the mode of display of the values to the ability to interrupt the game at any time and to choose a different mode of display.
6.1.4. Extension to tuples
Taking advantage of the extensive parametrization of the web application, we slightly extended the settings of Al Aïn et al. (2008)’s study from pairs to tuples: whereas their protocol requested the subject to choose between only two quantities, we were able to study the discrimination abilities not only between pairs of values, but also between triples and quadruples of values, showing a reduction of accuracy as the size of the set increased.
6.1.5. Diversifying Discrete and Continuous Representations
6.1.6. Increased Number of Experiments
The web application used in the experiments is similar to other digital life enrichment applications made available to nonhuman animals by their guardians. Similarly to the behavior described by Washburn (Washburn, 2015; Richardson et al., 1990) of apes presented with digital life enrichment applications serving as cognition tests, the subjects often chose to play with this web application over other toys made available to them, and often asked to continue playing after the end of a game. This allowed for multiple repetitions of the experiments, and for the gathering of a large amount of data points without incommoding the subjects: the two subjects of this study voluntarily answered a total of 2429 tests in one week (see Table 1 for a summary), without any observable negative consequences during or after the end of the testing phase.
6.1.7. True Randomness
The web application generates the instances presented to the subjects uniformly at random, whereas the high organisational cost of Al Aïn et al. (2008)’s protocol limited it to testing the exhaustive enumeration of pairs of values from a specific domain, in a random order. The latter could yield some issues if the domain is sufficiently small that a subject could deduce the answer to some questions by an elimination process, based on previous answers. As Al Aïn et al. (2008) considered a domain of five values, the amount of distinct unordered pairs is 10, a list which subjects with working memory abilities similar to humans’ might be able to manage. Beyond the fact that the web application allows the use of much larger domains (which raises the amount of distinct unordered pairs accordingly), and of sets of values of size larger than two, the fact that the sets of values presented to the subject are generated at random completely suppresses the possibility for a subject to deduce the answer to some questions by an elimination process based on previous answers.
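A minimal sketch of the two quantities discussed above, using Python’s random.sample as a stand-in for the application’s generator (the function name and the larger domain size of 100 are hypothetical):

```python
import random
from math import comb

# Number of distinct unordered pairs in a domain of n values:
print(comb(5, 2))    # 10 pairs for a domain of five values, as in Al Aïn et al. (2008)
print(comb(100, 2))  # 4950 pairs for a (hypothetical) domain of 100 values

def random_instance(domain_size, set_size, rng=random):
    """Draw a set of distinct values uniformly at random,
    mimicking the application's random generation (name hypothetical)."""
    return rng.sample(range(1, domain_size + 1), set_size)

print(random_instance(5, 2))  # e.g. [3, 1]
```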
6.1.8. Automatic generation of the experimental logs
The web application automatically generates locally a log of the subject’s interactions with it. This greatly reduces the cost of generating such logs, reduces the probability of errors in them, and increases the amount of information they capture, such as the exact time of each answer, allowing for instance the computation of the amount of time taken to answer each question, or studies of the relation between performance and the time of day and/or the weather (albeit we did not take advantage of such information in the present study).
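For instance, the time taken to answer each question can be derived from logged timestamps; the timestamp format and values below are hypothetical:

```python
from datetime import datetime

# Hypothetical log timestamps (format assumed): one per answered question.
timestamps = [
    "2022-05-19 10:00:03",
    "2022-05-19 10:00:11",
    "2022-05-19 10:00:16",
]
times = [datetime.fromisoformat(t) for t in timestamps]
# Time taken (in seconds) to answer each question after the first:
deltas = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(deltas)  # [8.0, 5.0]
```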
6.1.9. Reduction of Experimental Cost
As the web application can be run on a simple pocket device, the cost of running such experiments is reduced to the extreme that an experiment can be run with the subject on the experimenter’s shoulder while the device is held by hand (at the cost of some accuracy in the results of such an experiment). Such a lowered cost might prove to be key in the design of citizen science projects extending this work.
Our digital adaptation of Al Aïn et al. (2008)’s experimental protocol presents some other key differences, which might make the results of our study relatively difficult to compare to those of Al Aïn et al. (2008). We attempt to list such differences as follows:
6.2.1. Non proportional rewards and reward withdrawal
The protocol defined by Al Aïn et al. (2008) instructs to reward the subject with the content of the container they chose: the importance of the reward is proportional to the value being selected. The protocol we defined instructs to reward the subject with a single type of reward each time they select the maximal value of the set, and to withdraw such reward when the subject fails to do so. Such a difference might alter the experiment in at least two distinct ways:
The proportionality of rewards could result in a larger incentive to select the maximal value when the difference between the two values is large, and a reduced incentive when the difference is small; Al Aïn et al. (2008) indeed noticed a correlation between the gap between the two values and the accuracy of the answers of the subjects of their experiment. The absence of such proportionality in our experiments might have reduced such an incentive, but we observed the same correlation as they did (described in Section 5.2.3).
The withdrawal of rewards when the subject fails to select the largest value of the set is likely to affect the motivation of the subject to continue to participate in the exercise in the short term, and in the experiment in the long term. To palliate the frustration caused by such withdrawal, extensive care was taken to progressively increase the difficulty of the exercises (first through the size of the domain from which the values were taken, then through the size of the set of values from which to select the maximal one). No frustration was observed, with both subjects often choosing to continue playing at the end of a game.
Implementing the proportionality of rewards is not incompatible with the use of a digital application. For instance, it would be relatively easy to extend the web application to vocalize the value selected by the subject, so that the experimenter could reward the subject with the corresponding amount of food. Such an extension was not implemented mostly because it would slow down the experimentation, for relatively meagre benefits.
6.2.2. Irregular pairs and tuples
The web application generates the sets of values presented to the subject uniformly at random (without repetitions) from the domain of values set in the parameter page. While such a random generation yields various advantages, it has a major drawback concerning the statistical analysis of the results, as some sets of values might be under-represented: a balanced representation of each possible set of values is guaranteed only on average and for a large number of exercises, whereas Al Aïn et al. (2008)’s protocol, using a systematic enumeration of the possible sets of values (presented in a random order to the subject), does not yield such issues. This issue was deliberately accepted in order to develop a solution able to measure discrimination abilities on values taken from large domains (assuming that some nonhuman species might display abilities superior to those of humans in this regard): presenting the subject with a systematic enumeration of the possible sets of values is practical only for small domains (e.g. values from 1 to 5), not for large ones. For a domain of five values (as that of Al Aïn et al. (2008)), enough data points were generated that no pair was under-represented (see Table 4).
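The imbalance discussed above can be illustrated by simulating the uniform random generation of unordered pairs from a domain of five values (the seed and number of draws are arbitrary):

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed for reproducibility
# Simulate 1000 uniform random draws of unordered pairs from a domain of 5 values:
counts = Counter(frozenset(rng.sample(range(1, 6), 2)) for _ in range(1000))
print(len(counts), min(counts.values()), max(counts.values()))
# All 10 pairs are represented, but each only approximately 100 times:
# the counts are balanced on average, not exactly.
```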
6.2.3. Extension to sensory diverse species
The colors displayed by digital screens and the sound frequencies played by devices are optimized for the majority of humans, and it is not always clear how much and which colours and sounds can be seen and heard by individuals of each species: the web application hence presents extensive parameters to vary the colours displayed and the sounds played to the subject. Even less intuitively, species can differ in their Critical Flicker Fusion Frequency (CFFF) (ND et al., 2021), the frequency at which they perceive the world and can react to it (in some species, such frequency even varies depending on the time of the day or on the season (Reas, 2014; Healy et al., 2013)). For instance, dogs have a higher CFFF than humans while cats have a lower one, and the CFFF of reptiles varies with the ambient temperature. Such variations might affect not only their ability to comprehend the visual displays and sounds played by devices, but also how they comprehend some application designs over others. The web application presents extensive parameters to vary the time between each exercise and each game, so that part of the rhythm of the application can be adjusted by the experimenter to the CFFF of the subject, but more research is required in order to automatically adapt the rhythm of such applications to the CFFF of individuals from a variety of species.
6.3. Perspective on future work
Some issues with the results presented in this work are not related to any difference with Al Aïn et al. (2008)’s experimental protocol, but rather with limitations of the current one. We list them along with some tentative solutions, to be implemented in the future.
6.3.1. Random Dice and Heap representations
The discrete representation modes Dice and Heap associate each value with a fixed arrangement of a number of points corresponding to the value being represented. This differs from Al Aïn et al. (2008)’s experimental protocol, where the seeds are not arranged in any particular configuration on the cardboard. This might affect the results of the experiments, in that a subject could learn to select a particular symbol (e.g. the one corresponding to the largest value of the domain) anytime it is present, without any need to compare the presented values: whether value sets including the largest value of the domain yield a better accuracy ratio than others remains to be checked, as this could indicate that the subjects learned to select the corresponding symbol anytime it is present. The development of randomized Dice and Heap representations, and the evaluation of their impact on the discrimination abilities of human and nonhuman subjects, will be the topic of a future study.
6.3.2. Systematic logs
The ease with which logs are generated tends to make one forget about them, to the point that the bottleneck could become the transfer of the logs from the device used to perform the experiments to a central repository. As one guardian might get more excited about transferring the logs of sessions where the subjects excelled at the activities than those of less positive sessions, this might create a bias toward positive results in their reports. While not an issue when implemented by personnel with a scientific training, such a risk of bias might become more problematic in the context of a citizen science project (Association, 2021). The development of a website serving as a central repository of experimental data sent by web applications such as the one presented in this work will be the topic of a future study. The roles of such a central “back-end” website could include the automation of the most frequent statistical tests on the data received; a greater ease of separation between the roles of experimenter and researcher, which will be an important step toward a true citizen science generalisation of this project; and the aggregation of sensory and cognitive data from distinct applications, individuals and species.
6.3.3. Adaptive Difficulty
The great amount of parameters available in the settings page of the web application makes it possible to adapt the difficulty of the activities to the level of abilities of the subject. Such abilities evolve with time, most often advancing and only rarely receding (such as after a long period without using the web application). Choosing which values of the parameters are the most adequate for the current level of abilities of the subject requires an extensive understanding of the mechanisms of the application. An extension of the web application presenting the subject with a sequence of parametrizations of increasing difficulty, along with a mechanism raising or lowering the difficulty of the activities presented to the subject, would greatly simplify the task of the experimenter, and will be the topic of a future study.
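A minimal sketch of such a raising/lowering mechanism, with hypothetical thresholds and window size:

```python
def adjust_difficulty(level, recent_correct, window=20,
                      promote=0.85, demote=0.55):
    """Raise or lower a difficulty level based on the accuracy over the
    last `window` answers (thresholds and window size hypothetical)."""
    accuracy = sum(recent_correct[-window:]) / min(len(recent_correct), window)
    if accuracy >= promote:
        return level + 1       # subject masters this level: raise difficulty
    if accuracy < demote:
        return max(0, level - 1)  # subject struggles: lower difficulty
    return level               # otherwise keep the current level

print(adjust_difficulty(3, [True] * 18 + [False] * 2))  # 0.9 accuracy -> 4
```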
6.3.4. Cardinal Discrimination
Pepperberg (2006) recounts how the African Grey parrot (Psittacus erithacus) Alex, after being trained to identify Arabic numerals from 1 to 6 (but not to associate Arabic numerals with their relevant physical quantities), was able to label which of two Arabic numerals is the biggest, having inferred the relationship between the Arabic numerals and the quantities, and having understood the ordinal relationship of his numbers. Modifying the web application InCA-WhatIsMore so as to replace the graphical representations of values by Arabic numerals would be easy. Ethically testing the ability or inability of subjects to replicate Pepperberg (2006)’s results without frustrating those subjects might require more sophistication in the design of the experimental protocol. Such protocols, concerning the measurement of skills that a subject might lack, are the topic of the next section.
6.3.5. Ethical Measurement of Inabilities
The frustration potentially caused by the withdrawal of rewards (described in Section 6.2.1) when measuring skills that a subject might lack (an example of which was given in Section 6.2.1) points to another issue, of an ethical dimension: how can one ethically demonstrate, through experimentation, the inability of subjects to perform a given action, without hurting the well-being of the subjects by exposing them to the frustration of failing to perform the requested action? Note that this issue is not specific to the withdrawal of rewards when a subject fails: proportional rewards can also generate frustration. One solution could be to mix potentially “difficult” requests with other, similar but known to be “easy”, requests, in such a way that the proportion and frequency of the former is a fraction of the proportion and frequency of the “easy” requests that the subject fails (through inattention or other reasons). One can hypothesize 1) that the frustration generated by such questions would be minimal; 2) that a statistical analysis of the correctness of the answers to the difficult requests would yield useful information about the ability or inability of the subject to answer them; and 3) that a small proportion of “difficult” requests helps to further motivate the subject, making the exercise more of a challenge.
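The mixing scheme proposed above can be sketched as follows. This is a hypothetical illustration only: the function name, the trial labels and the default fraction of 0.5 are assumptions, not part of the study. “Hard” probes are interleaved at a rate that is a fraction of the rate at which the subject already fails “easy” trials, so that failed probes blend in with ordinary lapses.

```python
import random

def build_session(n_trials, easy_failure_rate, fraction=0.5, seed=None):
    """Return a sequence of trial labels in which 'hard' probes appear
    at rate fraction * easy_failure_rate, i.e. rarely enough that
    failing them is no more frequent than ordinary lapses on 'easy'
    trials (inattention, distraction, ...)."""
    rng = random.Random(seed)
    hard_rate = fraction * easy_failure_rate
    return ["hard" if rng.random() < hard_rate else "easy"
            for _ in range(n_trials)]

# Example: a subject failing 20% of easy trials would see hard probes
# on about 10% of trials.
sequence = build_session(1000, easy_failure_rate=0.2, seed=1)
```

The correctness of the answers to the “hard” trials, pooled over many such sessions, could then be tested against chance without any single session containing enough probes to frustrate the subject.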
6.3.6. Citizen Science Extensions
The term “Citizen Science” refers to scientific projects conducted, in whole or in part, by amateur (or nonprofessional) scientists (Gura, 2013). It is sometimes described as “public participation in scientific research”, with the dual objectives of improving the scientific community's capacity and improving the public's understanding of science and awareness of the research's themes (Association, 2021). Citizen Science has become a means of encouraging curiosity and greater understanding of science whilst providing an unprecedented engagement between professional scientists and the general public.
Such a methodology must be used with care, in particular regarding the validity of volunteer-generated data. Projects using complex research methods or requiring a lot of repetitive work may not be suitable for volunteers, and the lack of proper training in research and monitoring protocols among participants might introduce bias into the data (Thelen and Thiet, 2008). Nevertheless, in many cases the low cost per observation can compensate for the lack of accuracy of the resulting data (Gardiner et al., 2012), especially if proper data processing methods are used (McClure et al., 2020).
Scientific researchers in comparative psychology could certainly benefit from such help, with so many cognitive aspects to explore in so many species. In the process of defining the anecdotal method of investigation for creative and cognitive processes, Bates and Byrne (2007) mentioned that “collation of records of rare events into data-sets can illustrate much about animal behaviour and cognition”. Now that the technology is ready to analyze extremely large data-sets, what comparative psychology lacks is the means to gather such large data-sets.
Delegating part of the experimental process to citizens without proper scientific training is not without risk. Given the conflicted history of Comparative Psychology (Pepperberg, 2020) in general and Animal Language Studies (Pepperberg, 2016) in particular, the challenge of avoiding “Clever Hans” biases and related ones will be of tremendous importance. Could applications and experimental protocols such as the one described in this work help to design citizen science projects for the study of sensory and cognitive abilities in nonhuman species living in close contact with humans?
Jérémy Barbay programmed the first versions of the software, managed the interactions with the subjects during the development, training and testing phases, obtained the approval of the Institutional Animal Care and Use Committee, structured the article and supervised the work of Fabián Jaña Ubal and Cristóbal Sepulveda Álvarez. Fabián Jaña Ubal improved and maintained the software, and described it (Sections 3.1 and 3.2). Cristóbal Sepulveda Álvarez reviewed the experimental results, performed their statistical analysis, and described the log structure (Section 3.3), the statistical analysis process (Section 4.3) and the results (Section 5). All authors are aware of the submission of this work and of its content.
Acknowledgements. We wish to thank Joachim Barbay for his suggestion of using Svelte and his mentoring in the development of the various versions of the application; Jennifer Cunha from Parrot Kindergarten for sharing images and video of parrots using touchscreens, suggesting the study by Al Aïn et al. (2008), and helping during the design and test of the preliminary versions of the software InCA-WhatIsMore (as well as other software projects); Corinne Renguette for her help concerning the bibliography and the ethical aspects of the experiments and of their description; Cristina Doelling for pointing out some of the existing literature about the use of touchscreens by apes in zoos; and Francisco Gutierrez and Jenny Stamm for suggesting alternatives to the problematic term “blind” in expressions such as “blind setup”, and for pointing out some bibliography supporting such a replacement.
- Al Aïn et al. (2008) Syrina Al Aïn, Nicolas Giret, Marion Grand, Michel Kreutzer, and Dalila Bovet. 2008. The discrimination of discrete and continuous amounts in African grey parrots (Psittacus erithacus). Animal Cognition 12 (09 2008), 145–154. https://doi.org/10.1007/s10071-008-0178-8
- Association (2021) Citizen Science Association. 2021. CitizenScience.org. Website https://citizenscience.org/. Last accessed on [2022-05-27 Fri].
- Bates and Byrne (2007) Lucy Bates and Richard Byrne. 2007. Creative or created: Using anecdotes to investigate animal cognition. Methods (San Diego, Calif.) 42 (06 2007), 12–21. https://doi.org/10.1016/j.ymeth.2006.11.006
- Coghlan et al. (2021a) Simon Coghlan, Sarah Webber, and Marcus Carter. 2021a. Improving ethical attitudes to animals with digital technologies: the case of apes and zoos. Ethics and Information Technology 23 (12 2021), 1–15. https://doi.org/10.1007/s10676-021-09618-7
- Cunha and Clubb (2018) Jennifer Cunha and Susan Clubb. 2018. Advancing Communication with Birds: Can They Learn to Read? https://www.academia.edu/45183882/Advancing_Communication_with_Birds_Can_They_Learn_to_Read
- Egelkamp and Ross (2018) Crystal Egelkamp and Stephen Ross. 2018. A review of zoo-based cognitive research using touchscreen interfaces. Zoo Biology 38 (11 2018), 220–235. https://doi.org/10.1002/zoo.21458
- Gardiner et al. (2012) Mary M Gardiner, Leslie L Allee, Peter MJ Brown, John E Losey, Helen E Roy, and Rebecca Rice Smyth. 2012. Lessons from lady beetles: accuracy of monitoring data from US and UK citizen-science programs. Frontiers in Ecology and the Environment 10, 9 (2012), 471–476. https://doi.org/10.1890/110185
- Gura (2013) Trisha Gura. 2013. Citizen science: Amateur experts. Nature 496 (2013), 259–261.
- Healy et al. (2013) Kevin Healy, Luke McNally, Graeme D. Ruxton, Natalie Cooper, and Andrew L. Jackson. 2013. Metabolic rate and body size are linked with perception of temporal information. Animal Behaviour 86, 4 (2013), 685–696. https://doi.org/10.1016/j.anbehav.2013.06.018
- Kohn (1994) B. Kohn. 1994. Zoo animal welfare. Rev Sci Tech 13, 1 (1994), 233–245. https://doi.org/10.20506/rst.13.1.764
- Mancini (2017) Clara Mancini. 2017. Towards an animal-centred ethics for Animal–Computer Interaction. International Journal of Human-Computer Studies 98 (2017), 221–233. https://doi.org/10.1016/j.ijhcs.2016.04.008
- McClure et al. (2020) Eva C. McClure, Michael Sievers, Christopher J. Brown, Christina A. Buelow, Ellen M. Ditria, Matthew A. Hayes, Ryan M. Pearson, Vivitskaia J.D. Tulloch, Richard K.F. Unsworth, and Rod M. Connolly. 2020. Artificial Intelligence Meets Citizen Science to Supercharge Ecological Monitoring. Patterns 1, 7 (09 Oct 2020). https://doi.org/10.1016/j.patter.2020.100109
- Morris D (2007) D. Morris, S. Fraser, and R. Wormald. 2007. Masking is better than blinding. BMJ: British Medical Journal 334, 7597 (Apr 2007).
- ND et al. (2021) N.D. Mankowska, A.B. Marcinkowska, M. Waskow, R.I. Sharma, J. Kot, and P.J. Winklewski. 2021. Critical Flicker Fusion Frequency: A Narrative Review. Medicina 57, 10 (Oct 2021), 1096.
- Pepperberg (2006) Irene Pepperberg. 2006. Ordinality and inferential abilities of a grey parrot (Psittacus erithacus). Journal of comparative psychology (Washington, D.C. : 1983) 120 (09 2006), 205–16. https://doi.org/10.1037/0735-7036.120.3.205
- Pepperberg (2016) Irene Pepperberg. 2016. Animal language studies: What happened? Psychonomic Bulletin & Review 24 (07 2016). https://doi.org/10.3758/s13423-016-1101-y
- Pepperberg (2020) Irene Pepperberg. 2020. The Comparative Psychology of Intelligence: Some Thirty Years Later. Frontiers in Psychology 11 (05 2020). https://doi.org/10.3389/fpsyg.2020.00973
- Pepperberg (1999) Irene Maxine Pepperberg. 1999. The Alex Studies: Cognitive and Communicative Abilities of Grey Parrots. Harvard University Press, Cambridge, Massachusetts and London, England.
- Perdue et al. (2012) Bonnie M. Perdue, Andrea W. Clay, Diann E. Gaalema, Terry L. Maple, and Tara S. Stoinski. 2012. Technology at the Zoo: The Influence of a Touchscreen Computer on Orangutans and Zoo Visitors. Zoo Biology 31, 1 (2012), 27–39. https://doi.org/10.1002/zoo.20378
- Reas (2014) E. Reas. 2014. Small Animals Live in a Slow-Motion World. Scientific American Mind 25, 4 (2014).
- Richardson et al. (1990) W. K. Richardson, D. A. Washburn, W. D. Hopkins, E. S. Savage-Rumbaugh, and D. M. Rumbaugh. 1990. The NASA/LRC computerized test system. Behavior Research Methods, Instruments, and Computers 22 (1990), 127–131. https://doi.org/10.3758/BF03203132.
- Thelen and Thiet (2008) Brett Thelen and Rachel Thiet. 2008. Cultivating connection: Incorporating meaningful citizen science into Cape Cod National Seashore’s estuarine research and monitoring programs. Park Science 25 (06 2008).
- Trestman (2015) Michael Trestman. 2015. Clever Hans, Alex the Parrot, and Kanzi: What can Exceptional Animal Learning Teach us About Human Cognitive Evolution? Biological Theory 10 (03 2015). https://doi.org/10.1007/s13752-014-0199-2
- Washburn (2015) David Washburn. 2015. The Four Cs of Psychological Wellbeing: Lessons from Three Decades of Computer-based Environmental Enrichment. Animal Behavior and Cognition 2 (08 2015), 218–232. https://doi.org/10.12966/abc.08.02.2015