On the notion of number in humans and machines

06/27/2019 ∙ by Norbert Bátfai, et al. ∙ 0

In this paper, we performed two types of software experiments to study numerosity classification (subitizing) in humans and machines. The experiments focus on a particular kind of task, referred to as Semantic MNIST or simply SMNIST, in which the numerosity of objects placed in an image must be determined. The experiments called SMNIST for Humans are intended to measure the capacity of the Object File System (OFS) in humans. In this type of experiment the measured result agrees well with the value known from the cognitive psychology literature. The experiments called SMNIST for Machines serve a similar purpose, but they investigate deep learning computer programs, both existing, well-known ones (originally developed for other purposes) and ones under development. These measurement results can be interpreted similarly to the results from SMNIST for Humans. The main thesis of this paper can be formulated as follows: in machines, image classification artificial neural networks can learn to distinguish numerosities with better accuracy when these numerosities are smaller than the capacity of the OFS in humans. Finally, we outline a conceptual framework to investigate the notion of number in humans and machines.


1 Introduction

In the movie Rain Man, there is a scene in which Dustin Hoffman, as the autistic Raymond Babbitt, counts the exact number of toothpicks on the floor in the blink of an eye. This scene gave us the idea to implement it as a machine learning example. However, to simplify the task, we do not count toothpicks but dots in images. Let us compare this with the classical machine learning problem of recognizing MNIST handwritten digits [LBBH98], [LCB]. In the classical MNIST task, a typical classifier program takes images of handwritten digits and recognizes them. Our own image of the digit 8 can be seen in Fig. 1(a).

The semantic MNIST program, SMNIST for short, does not take images of digits but images that contain fewer than 10 dots. An image of 8 dots is shown in Fig. 1(b).

(a) If the MNIST classifier takes this image of the digit 8 then it will say it’s 8.
(b) Provided the SMNIST classifier takes this image of 8 dots then it should say it’s 8.
Figure 1: Two typical input images for MNIST and SMNIST.

1.1 Cognitive Neuropsychological and Computer Science Background and Aims

Research into the biological and psychological factors behind numerical abilities dates back to the 1930s. This extraordinary ability has been studied from many angles. For insects such as the honeybee, it was found that they can identify, and thereby count, up to four different landmarks for a food reward [DVS08]. In the neuropsychological literature there are two main topics in this type of research: the OFS (Object File System, also called Subitizing, Object Tracking System or Parallel Individuation System) and the ANS (Analogue Number System, also known as Approximate Number System or Analog Magnitude System) [DN16], [Nie16], [Gea00], [Gea95], [Hyd11], [FDS04], [FCH02]. The OFS includes the so-called “numerosity”, the ability to tell exactly how many objects of a given kind are present just by looking at them, without counting. In humans this ability extends to at most four different objects throughout life [Gea00], [Gea95]. The OFS is therefore a system that helps us determine the numerosity of a small quantity of items (at most four) by using a different marker for each object [DN16]. Many studies have been carried out with vertebrates (for example, cats [RKRC70], [DP88], chimpanzees [BB89], [DM82] or parrots [Pep10]) in which researchers examined the biological and evolutionary features of this particular ability. Many studies have also examined this ability in infants ([SD83], [Sta92], [SSSG83], [SSSG90], [Tri92], [VLS90]) and demonstrated the early, innate presence of numerosity [Gea95]. The ANS is a system that is present in a wide range of animals and in humans alike; it helps us determine the numerosity of a small group of monitored objects without using or needing any kind of symbol or language system.
In the course of biological development this system is able to advance, and it can be regarded as a main foundation stone for the progression of numerical thinking [PIP04], [PPLBD07], [Pia10]. In many cases mathematical simulation models have proven fruitful in cognitive neuropsychology research; for example, [DC93] designed a simulation of a natural neural architecture in which the distance effect [MDM80] and the size effect [FBAH66] can be measured. Purely mathematical models are not rare either; for example, [vOGV82] can explain the measured capacity of the OFS. The OFS and ANS systems for processing numbers clearly have evolutionary roots [DDLC98], [VF04]. In this light, it should be noted that while they have presumably evolved over many hundreds of millions of years [Nie16], mathematics has been developed over just a few thousand years. Of course it is still possible that mathematics has evolutionary roots; see the example about Newton’s second law in [Sza00, 1674] or Darwinian neurodynamics [SZF17].

From the viewpoint of computer science, the numerical abilities of computers are of an analogue or digital nature [Neu58]. In today’s digital computers, numbers are represented in either fixed-point or floating-point format [Knu97]. Obviously, in contrast with the previously cited neuropsychological systems, the numerical fundamentals of computers are fully known because they were developed as results of targeted research and engineering processes, as was also mentioned by McCulloch in [vN63, 319]. But it should be noted that this will not necessarily be true for systems that include some deep learning black box AI [Cas16] elements. With this paper we would like to extend the above non-exhaustive list of cited works, which ranges from vertebrates to human infants, with studies of numerosity classification in machine learning computer programs. In another context, this process has already begun. For example, the page at https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html presents the current state of the art in several standard machine learning tasks. The listed models and their implementations are typically based on artificial neural networks (ANNs) such as the convolutional neural network (ConvNet or CNN) [LBD89], [ZF13] or the multilayer perceptron (MLP). We will run some of these well-known programs, for example MNIST [LBBH98] or CIFAR-10 [Kri09] ones, in the second part of this paper. Nowadays, deep learning and artificial neural networks have already surpassed human performance in several areas, for example playing old computer games [MKS15], playing Go [SSS17], playing Quake III Arena Capture the Flag [JCD18] or playing StarCraft [VEB17]. These cited milestone works use reinforcement learning. There are many early roots of the success of these projects and of deep machine learning in general, such as the dataflow programming paradigm [Kah74], the mathematical model of a neuron [MP43] or the concept of the perceptron [Ros58]. By now all key players in the AI industry have their own frameworks for researching AGI (Artificial General Intelligence); for example, Microsoft uses MALMO-Minecraft [MKTD16], Google uses DeepMind Lab-Quake III Arena [BLT16] and so on, see [HOBB17]. The games that serve as the basis of these artificial environments are (or were) typically famous and popular computer games. Finally, it should be noted that the Lamarckian evolutionary approach has already arisen in this field as well [JDO17], [ACT19].

The research experiments undertaken in this paper are divided into two main sections. The first is Semantic MNIST for Humans and the second is Semantic MNIST for Machines. While the aim of the Semantic MNIST for Humans experiment is clearly to investigate the capacity of the OFS, the purpose of the Semantic MNIST for Machines experiment is less clear-cut: we introduce a standard benchmark and several datasets of images for it. We would like to investigate the numerosity classification abilities of deep learning programs that were originally developed for other purposes. A similar work can be found in [WZS18], where the notion of number in machines is investigated. Compared with that work, our subitizing problem is simpler. Our more distant and utopian goal is to create a computer program that would be able to simulate the cognitive evolution of numbers, in the sense of Merlin Donald [Don91], and would be able to develop some kind of notion of number.

2 Semantic MNIST for Humans

The Android mobile application called “SMNIST for Humans” is a benchmark program intended to investigate the capacity of the parallel individuation system in humans. It is available in source code form from the GitLab project [Bát19b] under the directory forHumans/SMNISTforHumansExp3. It can also be downloaded as a built APK file and installed directly on Android devices from http://smartcity.inf.unideb.hu/~norbi/SMNIST/SMNISTforHUMANS/Exp3/.

(a) A first rapid prototype for testing the “gaming experience” of SMNIST for Humans. At each step of the program, a random number of dots is drawn into the central circle; the player must indicate their numerosity by touching the appropriate smaller circle containing a numerical digit.
(b) A screenshot of the “SMNIST for Humans, Experiment 3” edition in action. The program displays the level changes and the means of the numerosities of dots (in the second row) and the millisecond values corresponding to the level changes (in the first row). Further details can be found in the text.
Figure 2: SMNIST for Humans screenshots.

As can be seen in Fig. 2(a), the program draws a given number of dots on the screen, and the user must then touch the appropriate numerical digit within a certain time window. The players start at level 3, where 0, 1 or 2 dots can appear randomly on the screen. If players detect the right number of dots 10 consecutive times, they move to the next level of the benchmark program. The achieved levels are indicated in the second row of numbers shown in Fig. 2(b). Here the row (9) 4/0 5/1 6/2 7/2 8/2 9/2 0/0 <0.14078243> tells us that the current level (in round brackets) is 9. The 4/0 indicates that when the level changed from 3 to 4, the integer part of the mean of the 10 randomly picked (and consecutively, successfully detected) integers (numerosities of dots) was 0. This is possible, for example, if the ten consecutive successfully detected numerosities are 0, 0, 2, 1, 1, 1, 0, 1, 2, 1, where the integer part of the mean (0+0+2+1+1+1+0+1+2+1)/10 is equal to 0. At the change from level 4 to 5 it was 1, from level 5 to 6 it was 2, and so on. Finally, the 0/0 shows that the player has not unlocked level 10 yet. The last value, in angle brackets, is a heuristic value computed from the mean numerosity of dots at the i-th level change and the millisecond value corresponding to that level change. These millisecond values are displayed in the first row of numbers. The computed heuristic value serves only as a simple gamification element of the benchmark program. The greater this value, the better the performance of the player.
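The reading of the status row described above can be illustrated with a short Python sketch. The function names `parse_status_row` and `integer_part_of_mean` are hypothetical and are not taken from the Android application; the row format is assumed to be exactly the one quoted in the text.

```python
import re


def integer_part_of_mean(numerosities):
    """Integer part of the mean of the successfully detected numerosities."""
    return sum(numerosities) // len(numerosities)


def parse_status_row(row):
    """Split a row such as '(9) 4/0 5/1 ... <0.14>' into its three parts."""
    level = int(re.search(r"\((\d+)\)", row).group(1))
    # Each 'L/m' pair: level L was reached, m is the integer part of the mean.
    pairs = [(int(a), int(b)) for a, b in re.findall(r"(\d+)/(\d+)", row)]
    heuristic = float(re.search(r"<([\d.]+)>", row).group(1))
    return level, pairs, heuristic


if __name__ == "__main__":
    print(integer_part_of_mean([0, 0, 2, 1, 1, 1, 0, 1, 2, 1]))
    print(parse_status_row("(9) 4/0 5/1 6/2 7/2 8/2 9/2 0/0 <0.14078243>"))
```

The first call reproduces the worked example from the text: the ten numerosities sum to 9, so the integer part of their mean is 0.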

Figure 3: This figure shows the relationship between the theoretical and measured mean of the number of dots. The label lvl{n} denotes the event of changing level from n-1 to n. For example, level 4 means that 0, 1 or 2 dots had already been successfully detected (and now the player is playing with 0, 1, 2 and 3 dots). The theoretical value denotes the expected value of the mean of 10 randomly picked integers from 0 to level-2, inclusive, that is, $\frac{1}{n-1}\sum_{k=0}^{n-2} k$ for level $n$. For example, it is equal to $1$ at level 4 and to $4$ at level 10, and in general it equals $\frac{n-2}{2}$. The measured value denotes the mean of the integer parts of the means of the 10 consecutive successfully detected integers.

Fig. 3 shows our measurement results. These results are in accordance with the well-known observations from the cognitive psychology literature [Gea00], [Hyd11], [FDS04], [FCH02] that the capacity of the parallel individuation system in humans is smaller than 4. This is demonstrated well by Fig. 3, where the measured average of the integer parts of the means of 10 randomly picked (and of course consecutively, successfully detected) integers lags far behind the theoretical expected value of the mean of 10 randomly picked integers from 0 to level-2, inclusive. The latter simply grows linearly with the level $n$, namely it is equal to $\frac{n-2}{2}$ (where $n$ starts from 4), because we use the uniform distribution, that is, $P(X=k)=\frac{1}{n-1}$, $k=0,1,\dots,n-2$. In this interpretation of levels, level n denotes the event of changing level from n-1 to n. For example, level 4 means that 0, 1 or 2 dots had already been successfully detected (and now the values 0, 1, 2 and 3 are being selected randomly).
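As a small check of the theoretical curve, the following Python sketch computes the expected mean of integers drawn uniformly from 0 to level-2 and, for comparison, simulates the quantity that the application actually records (the integer part of the mean of 10 picks). The function names, the number of trials and the seed are assumptions chosen for illustration; the simulation models an ideal player with uniformly random picks and is not the measurement procedure of the paper.

```python
import random
from fractions import Fraction


def theoretical_mean(level):
    """Expected value of a uniform pick from 0..level-2, i.e. (level-2)/2."""
    values = range(0, level - 1)
    return Fraction(sum(values), len(values))


def simulated_mean_of_integer_parts(level, trials=20000, seed=0):
    """Average, over many trials, of the integer part of the mean of 10 picks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        picks = [rng.randint(0, level - 2) for _ in range(10)]
        total += sum(picks) // 10  # integer part, as displayed by the app
    return total / trials


if __name__ == "__main__":
    for level in range(4, 11):
        print(level, theoretical_mean(level),
              round(simulated_mean_of_integer_parts(level), 3))
```

Taking the integer part can only lower each recorded value, and by less than 1, so a gap of that size between the two curves would not by itself indicate a capacity limit.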

Data were collected in UDPROG, the closed Facebook group of 680+ current and former students of the BSc course “High Level Programming Languages” at the University of Debrecen, where the students posted their results as screenshots. A total of 104 Android device screenshots were received. One such screenshot can be seen in Fig. 2(b).

3 Semantic MNIST for Machines

SMNIST for Machines is an attempt to develop a standardized task for assessing the ability of computer programs to recognize the numerosity of dots in an image. In the case of SMNIST for Humans, it is obvious that we do not need a training dataset, only test data (the random dots), which can be generated online while the “game” is running. In contrast, in the case of SMNIST for Machines, we need both training and test datasets.

3.0.1 The Generator Program

The SMNIST datasets used in this study are generated by our own generator program. It and its variants generate images that contain fewer than 10 dots. Their output is fully binary compatible with the format of the original MNIST training and test data [LCB], so we can immediately start the first experiments using existing MNIST programs.
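To make the notion of binary compatibility concrete, here is a minimal Python sketch that encodes images and labels in the MNIST IDX layout: a big-endian header with magic number 2051 (images) or 2049 (labels), followed by unsigned bytes. This is not the authors' generator; the function names are hypothetical, and images are assumed to be flat lists of pixel values in the range 0 to 255.

```python
import struct


def encode_idx_images(images, rows, cols):
    """Encode flat pixel lists (values 0..255) as an MNIST-style image file."""
    header = struct.pack(">IIII", 2051, len(images), rows, cols)
    body = b"".join(bytes(image) for image in images)
    return header + body


def encode_idx_labels(labels):
    """Encode labels (here: the number of dots, 0..9) as an MNIST-style label file."""
    header = struct.pack(">II", 2049, len(labels))
    return header + bytes(labels)


if __name__ == "__main__":
    # One blank 28x28 image labelled with 0 dots.
    blank = [0] * (28 * 28)
    data = encode_idx_images([blank], 28, 28)
    print(len(data), struct.unpack(">IIII", data[:16]))
    print(encode_idx_labels([0]))
```

A file written with these bytes can be read by any loader that expects the original MNIST files, which is what allows the existing MNIST programs to be reused unchanged.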

3.0.2 SMNIST datasets

The datasets are organized into two releases (namely SMNIST for Machines and SMNIST for Anyone) and two series per release according to their development. The first series of SMNIST for Machines contains six pairs of train and test sets of images with the following properties:

  • Naive: this is a set of 60,000 training and 10,000 test images of 28x28 pixels that contain fewer than 10 randomly placed and then centered dots. Dots are drawn as small blocks of pixels. The histograms of the generated train and test images are 0: 6025, 1: 5977, 2: 5965, 3: 5928, 4: 6075, 5: 6067, 6: 6004, 7: 5930, 8: 6051, 9: 5978 and 0: 986, 1: 1008, 2: 980, 3: 963, 4: 1064, 5: 970, 6: 996, 7: 1010, 8: 1036, 9: 987.

  • No-centering: this set is generated by the same method as the previous one, but the randomly placed dots on the images are not centered.

  • Disjunct: in this set all generated random images are unique. It follows that training images are excluded from the test images. (The exception is the special case of 0 dots, because there is only one such image; it occurs several times in both sets.) The histograms of the generated train and test images are 0: 436, 1: 436, 2: 7390, 3: 7166, 4: 7482, 5: 7491, 6: 7299, 7: 7352, 8: 7498, 9: 7450 and 0: 49, 1: 48, 2: 1215, 3: 1213, 4: 1227, 5: 1263, 6: 1282, 7: 1210, 8: 1209, 9: 1284.

  • Disjunct 1PX: in the previous sets each dot is a block of several pixels; here each dot is a single pixel (1x1).

  • Hard: in this case, the set of all possible coordinate pairs of pixels is divided into two disjoint sets. Then the training images are generated from one set and the test images are generated from the other set. The histograms of the generated train and test images are 0: 6751, 1: 425, 2: 6651, 3: 6656, 4: 6531, 5: 6646, 6: 6678, 7: 6482, 8: 6715, 9: 6465 and 0: 1107, 1: 59, 2: 1146, 3: 1089, 4: 1045, 5: 1101, 6: 1113, 7: 1118, 8: 1141, 9: 1081 where the 22*22=484 pixels are divided into two disjoint sets of sizes 425 and 59.

  • Hard 1PX: this set is generated by the same method as the previous one, but each dot is a single pixel (1x1).
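The idea behind the Hard sets, splitting the pixel coordinates into two disjoint pools and drawing training and test images from different pools, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' generator: the function names and the seed are hypothetical, and only the sizes 22x22, 425 and 59 are taken from the description above.

```python
import random


def split_pixels(side, train_size, seed=0):
    """Shuffle all (row, col) coordinates and cut them into two disjoint pools."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(side) for c in range(side)]
    rng.shuffle(coords)
    return coords[:train_size], coords[train_size:]


def place_dots(pool, n_dots, rng):
    """Pick n_dots distinct coordinates from one pool (one image's dot positions)."""
    return sorted(rng.sample(pool, n_dots))


if __name__ == "__main__":
    train_pool, test_pool = split_pixels(22, 425)
    rng = random.Random(1)
    print(len(train_pool), len(test_pool))
    print(place_dots(train_pool, 5, rng))
    print(place_dots(test_pool, 5, rng))
```

Because no coordinate belongs to both pools, a test image can never reuse a dot position seen during training, which is what makes this variant harder than Disjunct.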

The second series contains only training and test images of size 10x10 pixels with dots of 1x1 pixel, described precisely by the following:

  • Disjunct: the same as above but there is exactly one training and exactly one test image that contain no dots. The histograms of the generated train and test images are 0: 1, 1: 90, 2: 4455, 3: 7926, 4: 7806, 5: 8008, 6: 7940, 7: 8069, 8: 7872, 9: 7833 and 0: 1, 1: 10, 2: 438, 3: 1382, 4: 1315, 5: 1352, 6: 1347, 7: 1441, 8: 1379, 9: 1335.

  • Hard: as in the previous case and here also the 0 dots are handled standalone. The histograms of the generated train and test images are 0: 1, 1: 84, 2: 3486, 3: 8126, 4: 7943, 5: 8034, 6: 8061, 7: 8115, 8: 8003, 9: 8147 and 0: 1, 1: 16, 2: 120, 3: 560, 4: 1567, 5: 1571, 6: 1518, 7: 1501, 8: 1534, 9: 1612, where the 10*10=100 pixels are divided into two disjoint sets of sizes 84 and 16.

  • Disjunct pow 102x+, Hard pow 102x+: as in the previous ones, but the number of dots in the training images is not drawn uniformly. It follows a probability distribution that is defined up to the maximum digit, which is equal to 9 for these training and test sets, and that strongly favors larger numbers of dots. The reason for choosing this distribution is that the number of possible images containing exactly n dots grows rapidly with n: if the dots are treated as different (order matters), it is given by the variations without repetition. To be more precise, in our case it is given by the combinations without repetition, because all n pixels have the same color (order does not matter). (In practice, the histograms of the generated images follow the case of combinations without repetition due to the uniqueness condition of the Disjunct (and all further) datasets.) It should be noted that the number of dots in the test images still follows the uniform distribution. In addition, because using the above distribution alone produces histograms such as 0: 1, 7: 5, 8: 600, 9: 59394, where the dot counts 2, 3, 4, 5 and 6 are missing, ten percent of the training images are generated from the uniform distribution. In the case labelled “Disjunct pow 102x+” the histograms of the generated train and test images are 0: 1, 1: 71, 2: 719, 3: 717, 4: 720, 5: 770, 6: 763, 7: 792, 8: 1302, 9: 54145 and 0: 1, 1: 29, 2: 1246, 3: 1234, 4: 1222, 5: 1275, 6: 1260, 7: 1248, 8: 1204, 9: 1281. In the other (“Hard pow 102x+”) case the histograms are 0: 1, 1: 84, 2: 732, 3: 751, 4: 711, 5: 724, 6: 773, 7: 766, 8: 1199, 9: 54259 and 0: 1, 1: 16, 2: 120, 3: 560, 4: 1525, 5: 1617, 6: 1523, 7: 1574, 8: 1512, 9: 1552, where the 10*10=100 pixels are divided into two disjoint sets of sizes 84 and 16.

  • From 4H-102x+ to 8H-102x+: the sets are the same as Hard pow 102x+, but the maximum number of dots is 4, 5, 6, 7 and 8, respectively. The histograms of one of these cases, labelled “4H-102x+”, can be seen in Table 1.

72/28     theoretical               statistics
dots      train          test       train     test
0         1              1          1         1
1         72             28         72        28
2         2556           378        1925      378
3         59640          3276       2574      3276
4         1.02879e+06    20475      55428     6317
Table 1: The histogram of the dataset SMNIST for Machines Series 2/4H-102x+. The total of 100 pixels of an image of size 10x10 is divided into two disjoint sets, where the size of one is 72 and the size of the other is 28. The column labelled “theoretical” shows the possible number of images that contain the given number of dots placed at different positions, while the column labelled “statistics” contains the number of generated images.
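The “theoretical” columns of Table 1 are binomial coefficients: the number of ways to choose n distinct pixels out of the 72 training pixels or the 28 test pixels. A short Python sketch (the function name is hypothetical; `math.comb` requires Python 3.8 or later) reproduces them:

```python
from math import comb


def theoretical_counts(train_pixels, test_pixels, max_dots):
    """Rows of (dots, possible train images, possible test images)."""
    return [(n, comb(train_pixels, n), comb(test_pixels, n))
            for n in range(max_dots + 1)]


if __name__ == "__main__":
    for row in theoretical_counts(72, 28, 4):
        print(row)
```

For 4 dots the exact training value is 1028790, which the table shows in the rounded form 1.02879e+06.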

All data used in this paper can be found at http://smartcity.inf.unideb.hu/~norbi/SMNIST/. The same data can also be found at GitLab [Bát19b] under the directory Datasets/SMNIST.

3.1 Running results

For the measurements we used the following well-known programs and models, with default or modified settings and, in certain cases, with minor modifications.

Tables 2, 3 and 4 contain the test accuracies of the runs of these investigated programs. All datasets shown in these tables, as well as those shown in Tables 5 and 6, contain 60,000 train and 10,000 test images. Finally, it should be noted that some of the investigated programs are very similar to each other, which also plays a validating role.

3.2 Measurements with SMNIST for Machines Series 1

It is quite obvious that all programs perform well on the original MNIST dataset, as can be seen in the first column of Table 2. The results for Series 1 of our datasets are shown in the further columns. In the first two rows, it can be seen that the softmax regression models do not perform well, but this is not surprising if we take a look at Fig. 4(a), 4(b) and 4(c), where we can compare, for example, the visualizations of the weights for the classification of the digit 3. In contrast, more sophisticated models such as the deep CNNs perform significantly better on the SMNIST for Machines datasets than the softmax regression.

Out of curiosity, we transferred the original PyTorch model into a DQN [MKS13] model and tested its performance. By transferring the model, we guaranteed that the difference between them could only originate from the different approaches (supervised learning vs. reinforcement learning). We implemented our own environment, where at every step the model had to guess the numbers on a specific number of images. If the share of correct guesses was above a certain threshold, we allowed the model to continue playing, but at the same time we increased the threshold. If the model’s performance dropped below this threshold, the episode ended. The images were all sampled randomly from the original (Series 1/Naive) dataset. The model’s accuracy improved steadily over time; however, despite our efforts, the DQN model overall produced significantly worse results, with accuracies around 0.4, 0.3 or even 0.2. We tried changing the number of episodes, the sampling procedure, and other hyperparameters such as gamma, epsilon and memory size, but all to no avail.
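The episode logic described above can be sketched as a small Python class. This is only an illustration of the rising-threshold idea and not the authors' environment: the class name, the batch size, the starting threshold, the step size and the use of batch accuracy as the reward are all assumptions.

```python
import random


class ThresholdGuessingEnv:
    """Guess the dot counts of a batch; the episode ends when accuracy falls short."""

    def __init__(self, samples, batch_size=10, start_threshold=0.5,
                 threshold_step=0.05, seed=0):
        self.samples = samples              # list of (image, label) pairs
        self.batch_size = batch_size
        self.start_threshold = start_threshold
        self.threshold_step = threshold_step
        self.rng = random.Random(seed)
        self.threshold = start_threshold
        self.batch = []

    def reset(self):
        """Start a new episode and return the first batch of images."""
        self.threshold = self.start_threshold
        self.batch = self.rng.choices(self.samples, k=self.batch_size)
        return [image for image, _ in self.batch]

    def step(self, guesses):
        """Score the guesses; raise the threshold if the episode continues."""
        labels = [label for _, label in self.batch]
        accuracy = sum(g == y for g, y in zip(guesses, labels)) / len(labels)
        done = accuracy < self.threshold
        if not done:
            self.threshold = min(1.0, self.threshold + self.threshold_step)
        self.batch = self.rng.choices(self.samples, k=self.batch_size)
        # Assumed reward: the batch accuracy itself.
        return [image for image, _ in self.batch], accuracy, done
```

An agent would call `reset()` once per episode and then `step()` with its guesses until `done` is true; the longer it survives, the higher the accuracy it has to sustain.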

Program                                     MNIST   Naive   No-Ctrg  Disjunct  D-1PX   Hard    H-1PX
Tensorflow 0.9.0, mnist_softmax.py          0.9166  0.6078  0.6233   0.5616    0.3888  0.5779  0.1107
Tensorflow 0.9.0, mnist_softmax.py, UDPROG  0.9187  0.6249  0.6072   0.5959    0.4397  0.6025  0.1107
Tensorflow 1.4, mnist_deep.py               0.9925  0.9787  0.9558   0.9608    0.9903  0.9592  0.9941
Keras 2.2.4, mnist_cnn.py                   0.9908  0.9415  0.9268   0.9446    0.9997  0.911   0.9997
Keras/Hierarchical RNN                      0.9858  0.965   0.9828   0.9754    0.9974  0.9386  0.9655
PyTorch, cifar10_tutorial.py                0.9907  1.0     0.9932   0.8973    0.9957  0.8661  0.88
deeplearning4j LeNet MNIST                  0.9848  0.9929  0.9842   0.9638    0.9886  0.9496  0.9957
MXNet 1.2.1, smnist_mxnet.py                0.991   0.9717  0.9763   0.9436    0.9842  0.8911  0.9843
Lasagne, mnist.py                           0.9924  0.9362  0.9238   0.9235    0.9874  0.8970  0.9856
Table 2: Measurements with SMNIST for Machines Series 1.
(a) The well known typical weights for classification of 3 in Tensorflow 0.9.0, mnist_softmax.py (UDPROG) using the classical MNIST dataset. It may be noticed that the positive weight values draw out the silhouette of the digit 3.
(b) Weights for classification of 3 in Tensorflow 0.9.0, mnist_softmax.py (UDPROG) using the SMNIST Series 1/No-Ctrg dataset. The images of the Series 1 datasets have a rectangular border of some pixels because the coordinates of the dots are generated from a restricted range.
(c) Weights for classification of 3 in Tensorflow 0.9.0, mnist_softmax.py (UDPROG) using the SMNIST Series 1/H-1PX dataset. The 59 test pixels (and the 300 pixels of the border) are white.
Figure 4: The well-known visualizations of weights of regression MNIST tutorials.

3.3 Measurements with SMNIST for Machines Series 2

In this series we moved from images of size 28x28 to images of size 10x10. In addition, we manipulate the distribution of images in the training datasets. The probability of generating an image that contains n dots is roughly proportional to the number of ways of placing n pixels on an image of 10x10 pixels. The precise details can be found in the previous description of the datasets. The performance of most of the tested programs deteriorated on the H-102x+ data (see the last column of Table 3). Therefore it was split into five further parts (from 4H-102x+ to 8H-102x+) for further investigation, the results of which can be found in Table 4. Based on the SMNIST for Humans experiments, we would intuitively expect performance to start deteriorating as the number of dots increases. In most tested cases this expectation is met, but there are also models for which it is not true; see, for example, the row of Keras/Hierarchical RNN. It can intuitively be summarized that all tested programs show good performance when the number of dots does not exceed the capacity limit of the OFS measured in humans. That is, the tested ANNs appear to be able to learn to distinguish numerosities with better accuracy when these numerosities are roughly smaller than 4. A direct experiment with a smaller number of dots (H3-102x+) can be found in the next section.
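A sampling scheme of this kind can be sketched in Python as follows. The sketch assumes weights exactly proportional to the number of combinations, mixed with a ten percent uniform share; the text only says “roughly proportional”, and the exact formula used by the generator is not reproduced here, so the function names and the weighting are assumptions.

```python
import random
from math import comb


def dot_count_weights(pixels, max_dots):
    """Number of ways to place n indistinguishable dots on `pixels` positions."""
    return [comb(pixels, n) for n in range(max_dots + 1)]


def sample_dot_count(pixels, max_dots, rng, uniform_share=0.1):
    """Draw a dot count: mostly combination-weighted, sometimes uniform."""
    if rng.random() < uniform_share:
        return rng.randint(0, max_dots)
    weights = dot_count_weights(pixels, max_dots)
    return rng.choices(range(max_dots + 1), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_dot_count(100, 9, rng) for _ in range(60000)]
    print({n: draws.count(n) for n in range(10)})
```

Under these assumptions the largest dot count dominates the training histogram, while the uniform share keeps the small counts from disappearing entirely.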

Program                                     Disjunct  Hard    D-102x+  H-102x+
Tensorflow 0.9.0, mnist_softmax.py, UDPROG  0.6066    0.056   0.1281   0.1512
Keras 2.2.4, mnist_cnn.py                   0.8822    0.7648  0.8145   0.4625
Keras/Hierarchical RNN                      0.9995    0.9999  0.9965   0.9897
PyTorch, cifar10_tutorial.py                0.9528    0.6243  0.8776   0.5365
deeplearning4j LeNet MNIST                  0.8488    0.4895  0.2757   0.2388
MXNet 1.2.1, smnist_mxnet.py                0.653     0.4013  0.3668   0.3005
Table 3: Measurements with SMNIST for Machines Series 2.

Program                             4H-102x+  5H-102x+  6H-102x+  7H-102x+  8H-102x+
Tensorflow 0.9.0, mnist_softmax.py  0.6317    0.3188    0.2334    0.1936    0.1658
Keras 2.2.4, mnist_cnn.py           0.9099    0.7055    0.7599    0.7285    0.6568
Keras/Hierarchical RNN              0.9993    0.9996    0.9442    0.9996    0.9993
PyTorch, cifar10_tutorial.py        0.8758    0.8589    0.723     0.556     0.6733
deeplearning4j LeNet MNIST          0.7743    0.5329    0.4770    0.3671    0.2977
Swift TF MNIST                      0.6432    0.4906    0.3102    0.2896    0.1819
Swift TF CIFAR PyTorch              0.6729    0.6218    0.4796    0.4156    0.4802
Table 4: Measurements with SMNIST for Machines Series 2 with particular attention to the further breakdown of the set Hard pow 102x (H-102x+).

4 Semantic MNIST for Anyone

SMNIST for Anyone is a natural further development of SMNIST for Machines. Machines can perform this test, and so can humans. But at this moment we have no test-taking program for humans (technically, it will be based on the previously presented SMNIST for Humans Android application). The SMNIST for Anyone datasets are organized into two series. They are the same as the 4H-102x+, …, 9H-102x+ (=H-102x+) datasets of the previous section, except that dots are replaced by 3x3 pixel patterns of the objects ’X’, ’O’, ’+’ and square outline (’S’), as can be seen in Fig. 5 and 6. It is important to highlight that this test is not uniquely determined because in many cases it is not clear how many objects have been placed on the images.
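The ambiguity mentioned above can be illustrated with a small Python sketch that stamps 3x3 patterns onto a 10x10 grid. The pixel shapes in `PATTERNS` are hypothetical guesses at what such symbols might look like and are not taken from the datasets; the function names are likewise assumptions.

```python
# Hypothetical 3x3 shapes; '#' marks a lit pixel.
PATTERNS = {
    "X": ["#.#", ".#.", "#.#"],
    "O": [".#.", "#.#", ".#."],
    "+": [".#.", "###", ".#."],
    "S": ["###", "#.#", "###"],
}


def blank_image(side=10):
    """A side x side image with all pixels off."""
    return [[0] * side for _ in range(side)]


def stamp(image, symbol, top, left):
    """Draw one 3x3 symbol with its top-left corner at (top, left)."""
    for dr, row in enumerate(PATTERNS[symbol]):
        for dc, mark in enumerate(row):
            if mark == "#":
                image[top + dr][left + dc] = 1


def lit_pixels(image):
    """Number of pixels that are switched on."""
    return sum(map(sum, image))


if __name__ == "__main__":
    image = blank_image()
    stamp(image, "X", 0, 0)
    stamp(image, "X", 0, 2)   # overlaps the first 'X' in one column
    for row in image[:3]:
        print("".join("#" if p else "." for p in row))
    print(lit_pixels(image))
```

When two symbols overlap, they share pixels, so the resulting picture no longer determines a unique number of objects.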

4.1 Measurements with SMNIST for Anyone Series 1

The images of Series 1 contain only 3x3 pixel binary patterns of ’X’s. In all cases the performance deteriorates as the number of objects increases, as can be seen in Table 5.

(a) smnistg-train-6-7
(b) smnistg-train-6-8
Figure 5: SMNIST for Anyone, Series 1. Both images contain exactly 6 ’X’s.

Program                             H4-102x+  H5-102x+  H6-102x+  H7-102x+  H8-102x+  H9-102x+
Tensorflow 0.9.0, mnist_softmax.py  0.6317    0.3188    0.2334    0.1942    0.1671    0.1402
Keras 2.2.4, mnist_cnn.py           0.836     0.7546    0.6914    0.6702    0.6233    0.5913
Keras/Hierarchical RNN              0.8498    0.7152    0.6896    0.6537    0.5144    0.5498
deeplearning4j LeNet MNIST          0.6862    0.6764    0.3845    0.3394    0.3008    0.2245
Table 5: Measurements with SMNIST for Anyone Series 1.

4.2 Measurements with SMNIST for Anyone Series 2

The images of Series 2 may contain any of the symbols ’X’, ’O’, ’+’ and square outline (’S’). As shown in Table 6, we observe the same trend in performance as in the previous series of experiments.

(a) SSOS+O,
(b) SXXO+X,
(c) SSOOXXXSX,
(d) SXXO+X++S,
Figure 6: SMNIST for Anyone, Series 2. Images contain exactly 6 or 9 symbols of the following: ’X’, ’O’, ’+’ and square outline (’S’).

Program                             H4-102x+  H5-102x+  H6-102x+  H7-102x+  H8-102x+  H9-102x+
Tensorflow 0.9.0, mnist_softmax.py  0.6317    0.3217    0.2388    0.192     0.1638    0.1399
Keras 2.2.4, mnist_cnn.py           0.7959    0.5627    0.6036    0.5504    0.516     0.4696
Keras/Hierarchical RNN              0.7366    0.595     0.5198    0.5014    0.5099    0.4406
deeplearning4j LeNet MNIST          0.6558    0.4166    0.3566    0.3203    0.2743    0.2304
Table 6: Measurements with SMNIST for Anyone Series 2.

Finally, we also performed a direct experiment with at most 3 objects. The properties of its dataset, called H3-102x+, can be seen in Table 7. As we expected according to our thesis, the tested programs perform well in this experiment; for example, “Keras 2.2.4, mnist_cnn.py” produces an accuracy of 0.9436 and “Keras/Hierarchical RNN” produces an accuracy of 0.9522.

57/43     theoretical          statistics
dots      train     test       train     test
0         1         1          1         1
1         57        43         57        43
2         1596      903        1567      903
3         29260     12341      28375     9053
Table 7: The histogram of the dataset Series 2/H3-102x+. This contains 10,000 test and only 30,000 train images. The 10*10=100 pixels are divided into two disjoint sets of sizes 57 and 43. The column labelled “theoretical” shows the possible number of images that contain the given number of dots at different positions, while the column labelled “statistics” contains the number of generated images.

5 Conclusion and Further Work

In all software experiments of this study we investigate the numerosity of quantities. The experimental results of SMNIST for Humans are well in accordance with observations from the cognitive psychology literature. Based on our experience with SMNIST for Humans and SMNIST for Anyone, we can intuitively formulate our thesis as follows: image classification ANNs (such as those for MNIST or CIFAR-10) can learn to distinguish numerosities with better accuracy when these numerosities are smaller than the capacity of the OFS measured in humans (that is, roughly smaller than 4).

[Figure 7. Dates shown: 300.000.000, 1.500.000, 400.000, BC 40.000, BC 25.000, BC 5.000, 1936, 2019. Labels shown: OFS, wolf bone, digit, AlphaStar, episodic, mimetic, mythic, theoretic, esport, prenatal, newborn, nursery, preschool, school age, university, machines, SMNIST programs, human+computer.]
Figure 7: This Haeckel-like figure contains four timelines. Intuitively, the first one tries to outline a mental evolutionary phylogeny of humans, focusing especially on the notion of number [Kli85]. The second one shows Merlin Donald’s distinction between the stages of mental evolution [Don91]. The third one presents the stages of the ontogeny of the notion of number in humans. Finally, the last timeline tries to introduce the ontogeny of the notion of number in machines. It should be noted that the timelines are not linear: the concrete dates do not matter, but their order does.

Fig. 7 outlines a conceptual framework for analyzing the notion of number in humans and machines. The first timeline of the figure indicates that the OFS may have begun evolving 300 million years ago [Nie16]. This is followed by the “wolf bone”, an assumed tally-based external counting device [Kli85]. Then the first numerical digit appears [Kli85]. The next item is a number that had already been approximated in sexagesimal arithmetic by the ancient Babylonians [FR98]; it stands for the appearance of the notion of numeral systems. The imaginary unit simply denotes the extension of the notion of number with the complex numbers. Turing’s famous study [Tur36] of 1936 indicates the onset of digital computers. There are numbers, such as the one discussed in [Cha04], that simply would not exist without digital computers. Finally, the breakthrough machine learning application called AlphaStar [VBC19] represents today’s computer programs. It is important to emphasize that all the mentioned devices, including software, are based on Donaldian external storage systems of theoretical culture, as expressed by the second timeline of the figure. Here we expand Donald’s three stages of mental evolution [Don91] with an additional stage called “esport”. Such an expansion of the stages is not rare in the literature; see, for example, [JD02], where some timeline dates are a little different from ours but, similarly, the additional stage is focused on computer programs. Esport and computer gaming, like computer programs in general, have also been implemented in the Donaldian external memory of theoretical culture. Our utopian goal is to create an open source esport game that would be able to function as Leibniz’s “characteristica universalis” and as such could express some notion of number [Bát19a].

The stages of the ontogeny of the notion of number in humans are presented on the third timeline. By the time children reach school age they have acquired language. Preschool children already play electronic games, but the minimum age for participating in esport tournaments varies from 12 to 18 years. Moreover, our utopian interpretation of the new stage labelled esport is shifted to the university years because here we are thinking of the aforementioned esport game to be developed.

Digital computers are products of pure theoretical culture. For example, in this sense, the “AI winter” can be interpreted as the time required for machines of theoretical culture to learn to work on a different, lower Donaldian layer of culture [Bát19a]. In this interpretation, it may be a possible solution to Moravec’s paradox. It is also clear that this does not work backwards; just think of the antinomies of naive set theory, where the source of the problems is that we tried to handle entities of theoretical culture at a lower layer of culture, specifically with the tools of natural spoken (or written) language that are specific to the Donaldian mythic culture. From this point of view, the SMNIST programs, which are entities of theoretical culture, now seem to be able to solve the problem of subitizing, which is part of the episodic culture. This is presented on the fourth timeline. Why, for example, is AlphaStar not shown on the right side of this timeline? Because today’s computer programs use man’s notion of number rather than their own, which does not yet exist.

6 Acknowledgment

The authors thank the students of the BSc course titled “High Level Programming Languages” at the University of Debrecen and the members of the UDPROG Facebook community https://www.facebook.com/groups/udprog for performing the SMNIST for Humans test. In addition, the authors thank Krisztina Győri for the proofreading of the manuscript.

This work was supported by the construction EFOP-3.6.3-VEKOP-16-2017-00002. The project was supported by the European Union, co-financed by the European Social Fund.

Author contributions were the following: N. B. conceived the idea of SMNIST for Machines, SMNIST for Humans and SMNIST for Anyone, developed the generator and Android programs, collected the SMNIST for Humans data from the UDPROG community and analyzed the measurements. D. P. provided cognitive psychological background. N. B., G. B., M. Sz., M. B., G. Sz., F. K., E. Sz. V. performed SMNIST for Machines computations. V. Sz. S. and L. K. performed some computations with a previous version of SMNIST for Machines datasets. All authors wrote the paper and discussed the results.

References