Estimation of perceptual scales using ordinal embedding

08/21/2019 ∙ by Siavash Haghiri, et al. ∙ 0

In this paper, we address the problem of measuring and analysing sensation, the subjective magnitude of one's experience. We do this in the context of the method of triads: the sensation of the stimulus is evaluated via relative judgments of the form: "Is stimulus S_i more similar to stimulus S_j or to stimulus S_k?". We propose to use ordinal embedding methods from machine learning to estimate the scaling function from the relative judgments. We review two relevant and well-known methods in psychophysics which are partially applicable in our setting: non-metric multi-dimensional scaling (NMDS) and the method of maximum likelihood difference scaling (MLDS). We perform an extensive set of simulations, considering various scaling functions, to demonstrate the performance of the ordinal embedding methods. We show that, in contrast to existing approaches, our ordinal embedding approach allows, first, reasonable scaling functions to be obtained from comparatively few relative judgments; second, the estimation of non-monotonic scaling functions; and third, the estimation of multi-dimensional perceptual scales. In addition to the simulations, we analyse data from two real psychophysics experiments using ordinal embedding methods. Our results show that for one-dimensional, monotonically increasing perceptual scales our ordinal embedding approach works as well as MLDS, while in higher dimensions only our ordinal embedding methods can produce a desirable scaling function. To make our methods widely accessible, we provide an R implementation and general rules of thumb on how to use ordinal embedding in the context of psychophysics.




1 Introduction

The quantitative study of human behavior dates back to at least 1860, when the experimental physicist Gustav Theodor Fechner published Die Elemente der Psychophysik (Fechner, 1860). Since Fechner’s seminal work the “measurement of sensation magnitude”, nowadays typically referred to as “psychophysical scaling”, has been one of the central aims of psychophysics (Gescheider, 1988) [1]. Psychophysical scaling is formally defined as the problem of quantifying the magnitude of sensation induced by a physical stimulus (Marks and Gescheider, 2002, Krantz et al., 2007).

[1] Other central aims are to measure detection and discrimination thresholds, or just-noticeable-differences (JNDs), reaction times (RT) and confidence ratings, see e.g. Wichmann and Jäkel (2018).

In the following we assume that there exists a physical quantity, the external stimulus, which we can objectively measure. The perception (or sensation, the subjective or internal experience) of the stimulus, however, is usually hard to measure and quantify. The (difference) scaling problem refers to experiments and methods designed to find the functional relation between the perceived (internal) magnitude and the (external) stimulus. An example of a scaling function is shown in Figure 1. In this figure, the physical stimulus and its perceived counterpart are shown on the X and Y axes, respectively. Throughout the rest of the paper, we refer to this function as the scaling function.

Figure 1: An example of a scaling function. The X-axis shows the physical stimulus values ($S$) with 10 discrete steps. The Y-axis denotes the perceived value ($\psi$).

1.1 Traditional scaling methods

Early attempts to obtain the scaling function by Fechner were based on the concatenation of just-noticeable-differences (JNDs), where a JND is the smallest amount of change in the stimulus level which is noticeable by a human observer. Fechner assumed that each JND in $S$ corresponds to one fixed-size unit of the perceptual scale $\psi$, and attempted to reconstruct the scaling function based on this assumption (Fechner, 1860, Luce and Edwards, 1958). Fechner thus tried to link discriminability and subjective magnitude in a simple way. However, the Fechnerian approach, albeit sometimes successful, has been vigorously criticised for both theoretical and empirical reasons and cannot serve as a generic method to obtain scaling functions (e.g. Norris and Oliver, 1898, Stevens, 1957, Gescheider, 1988). Thurstonian scaling is an alternative approach proposed to solve the scaling problem in the tradition of linking discriminability to subjective magnitude, incorporating an internally variable mapping from stimulus to sensation (internal “noise” in modern parlance) (Thurstone, 1927). Thurstonian scaling is based on the discrimination of stimulus pairs. The perceptual distance of two stimuli is determined by the probability that a human observer can discriminate them. However, like Fechner’s JND approach, Thurstonian scaling is criticised because discriminability is, at best, only indirectly and in yet to be understood ways related to sensory magnitude (Krantz, 1972, Stevens, 1961).

Another well-known approach to scaling, but this time not based on discriminability, is termed direct magnitude estimation (Stevens, 1957). In this approach a human observer is asked to provide intensity values corresponding to physical stimuli in such a way that the ratios of the given values represent the ratios of perception. However, Shepard (1981) pointed out that there might exist an unknown and undesirable response transformation function which the direct magnitude estimation method neglects.

For a much more detailed and in-depth overview of the traditional psychophysical scaling methods see, e.g. Gescheider (1988).

[Figure 2, top row: eight stimulus images with slant angles of 0, 10, 20, 30, 40, 50, 60 and 70 degrees.]

Figure 2: Top: Eight stimuli used in the slant-from-texture experiment (Aguilar et al., 2017). Bottom, left: An example of a triplet question used for the experiment. The triplet question is: “Which of the bottom images, $S_j$ or $S_k$, is more similar to the top image $S_i$?” Bottom, right: The scaling function estimated by the comparison-based embedding method (t-STE). The red points on the X-axis correspond to three stimuli ($S$), while the yellow points on the Y-axis represent their perceived values ($\psi$). In Section 2 (Embedding methods), we describe in detail how the positions of the yellow points correspond to the ordinal embedding obtained from the triplet questions.

1.2 Scaling and the method of triads

An alternative approach to data acquisition, based neither on JND-style discrimination nor on direct magnitude estimation, relies on triplet comparisons (Torgerson, 1958). This approach is often referred to as the method of triads in the psychophysics literature. Based on a fixed discretization of the physical stimulus, say $S_1, \ldots, S_n$, the method of triads asks participants to make comparisons of the form “Is stimulus $S_i$ more similar to stimulus $S_j$ or to stimulus $S_k$?”. In the computer science and machine learning literature such a question is called a triplet question (or, interchangeably, a triplet comparison).

Rather than attempting accurate quantitative measurements of a particular phenomenon, triplet questions aim at qualitative (ordinal) observations. The obvious potential of such an approach is that the statements do not depend as much on the response transformation function of the observers, and that, e.g. the issue of scaling answers across many observers becomes easier. In addition, there exist studies in the machine learning literature that indicate the robustness of the triplet comparison approach (Demiralp et al., 2014, Li et al., 2016). The obvious challenge of the method of triads is how we can use the participants’ answers to estimate the scaling function. More precisely, we need to estimate the magnitudes of perception in a way that is consistent with the answers to the queried triplet questions.

Let us give an example to clarify the procedure of scaling using triplet questions. A psychophysical “slant-from-texture” experiment is designed to find the functional relation of the perceived angle with the true angle of a tilted flat plane with a dotted texture (Rosas et al., 2004, 2005, 2007, Aguilar et al., 2017). Figure 2 (Top) shows the various stimuli used in the experiment by Rosas et al. (2004) and Aguilar et al. (2017). The bottom, left image of Figure 2 depicts an example of a triplet question designed for this task. The participant is asked “Which of the two bottom images, $S_j$ or $S_k$, is more similar to the top image $S_i$?” Based on the answers to a set of such triplet questions, the goal is then to reconstruct the scaling function that describes the relation of the perceived angle $\psi$ and the slant degree $S$. Figure 2 (Bottom, right) shows the function that has been estimated with the t-STE method described below.

The approach of triplet comparisons—the method of triads—is not new to psychophysics; there has been a very long tradition in psychology to explore methods to estimate perceptual (difference) scales from clearly visible supra-threshold differences in stimulus appearance (Torgerson, 1958, Coombs et al., 1970, Marks and Gescheider, 2002). The earlier approaches are based on inferring a similarity matrix using the triplet comparisons. Recently a more generic approach, called “maximum likelihood difference scaling (MLDS)”, has become popular in vision science (Maloney and Yang, 2003, Knoblauch and Maloney, 2010). There have been reports that both naive, as well as seasoned observers, find the method of triads with supra-threshold stimuli intuitive and fast, requiring less training (Aguilar et al., 2017, Wichmann et al., 2017) than for the more traditional methods in psychophysics such as direct magnitude estimation or in particular methods based on JNDs.

Whilst clearly attractive, MLDS has some limitations, however: First, it makes a strong model assumption, namely that the scaling function is monotonic with respect to the stimulus. Second, the MLDS method can only be used to estimate one-dimensional scaling functions. Thus, it cannot deal with cases when perception is intrinsically multi-dimensional (e.g., color perception). Both issues are of potential relevance in a general psychophysical scaling setting.

On the other hand, the evaluation of comparison-based data has been an active field of research in computer sciences and machine learning (Schultz and Joachims, 2003, Agarwal et al., 2007, Tamuz et al., 2011, Ailon, 2011, Jamieson and Nowak, 2011, Van Der Maaten and Weinberger, 2012, Kleindessner and von Luxburg, 2014, Terada and von Luxburg, 2014, Ukkonen et al., 2015, Arias-Castro, 2015, Jain et al., 2016, Haghiri et al., 2017). The core question of these studies is to use the answers to triplet comparisons to find a Euclidean representation of the items (in our case, psychophysical stimuli). This problem is systematically studied in the machine learning literature under the name of ordinal embedding. A number of fast and accurate algorithms have been developed to solve the ordinal embedding problem (Agarwal et al., 2007, Van Der Maaten and Weinberger, 2012, Terada and von Luxburg, 2014). As we will show in this paper, these algorithms may also be useful in psychophysics, vision science and the cognitive sciences in general.

This paper is organized as follows: In Section 2 (Embedding methods) we review two traditional psychophysical scaling methods, NMDS and MLDS, that are used to analyze data from triplet comparisons [2]. We then introduce the ordinal embedding problem of the machine learning literature and discuss its advantages in comparison to the traditional embedding methods of psychophysics. Section 3 (Simulations) is dedicated to extensive simulations comparing the performance of ordinal embedding to the applicable competitors in psychophysics. In Section 4 (Experiments) we examine the ordinal embedding methods in two real psychophysics experiments. In Section 5 (How to apply ordinal embedding methods in psychophysics), we provide instructions and rules of thumb on how to use the comparison-based approach and the ordinal embedding algorithms in psychophysics experiments. In the last section, we conclude the paper by discussing the advantages of ordinal embedding for the scaling problem and mentioning open problems.

[2] NMDS is not directly based on triplet comparisons; instead it relies on the rank order of dissimilarities, which is, however, closely related to triplet comparisons.

2 Embedding methods

2.1 Non-metric multi-dimensional scaling (NMDS)

Non-metric multi-dimensional scaling (NMDS) by Shepard and Kruskal is a well-established method to analyze dissimilarity data (Shepard, 1962, Kruskal, 1964a, b). It assumes that a complete matrix of dissimilarities (not necessarily metric distances) between pairs of items is given. We denote the dissimilarity of items $i$ and $j$ by $\delta_{ij}$. In the context of psychophysics, this matrix usually comes from a human (psychophysical) experiment. Shepard posed the problem of estimating a $d$-dimensional Euclidean representation of the items, say $y_1, \ldots, y_n \in \mathbb{R}^d$, such that the pairwise distances of the estimates are consistent with a monotonic transform of the given dissimilarities. Key to the method is that it only takes the rank order of the dissimilarities into consideration. The reason is that in many psychophysics experiments, the magnitude of dissimilarities cannot be quantitatively measured, whereas the rank order of distances is considered to be more reliable, an argument we have made in favour of ordinal embedding, too (see above).

If $d_{ij} = \|y_i - y_j\|$ is the Euclidean distance of the embedded items $y_i$ and $y_j$ in $\mathbb{R}^d$, then the quality of a Euclidean representation is measured by a quantity called stress (Kruskal, 1964a):

$$\text{Stress} = \sqrt{\frac{\sum_{i<j} \left( f(\delta_{ij}) - d_{ij} \right)^2}{\sum_{i<j} d_{ij}^2}}, \tag{1}$$

where $f$ is a monotonic function to be determined. The smaller the stress, the better the Euclidean representation. The numerator measures the squared loss between the transformed input dissimilarities $f(\delta_{ij})$ and the Euclidean distances $d_{ij}$. By minimizing the stress we try to achieve that the distances $d_{ij}$ are as close as possible to the monotonic transform of the dissimilarities $f(\delta_{ij})$. The role of the denominator is to prevent the degenerate solution in which $f(\delta_{ij})$ and $d_{ij}$ become infinitesimal together.

The goal of NMDS is to find the Euclidean representation of the items that minimizes the stress function, where $f$ can be chosen from the set of all monotonic transform functions. The approach by Kruskal (1964b) finds an estimate of the optimal solution through a two-step optimization procedure. In the first step, a configuration of embedding points is fixed; this means that the distance values $d_{ij}$ are also fixed. Then a greedy algorithm (later called isotonic regression) is used to find the monotonic function $f$ that minimizes the stress function. In the second step, the values of $f(\delta_{ij})$ are fixed and the embedding points are adjusted by a gradient descent algorithm to minimize the stress. The two steps are repeated alternately until the stress value shows no further improvement or becomes smaller than a certain threshold.
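The first of the two steps, fitting the monotonic transform for a fixed configuration and evaluating the stress, can be sketched as follows. This is a minimal illustration rather than Kruskal's original implementation: the function names `isotonic_fit` and `stress` are hypothetical, ties between dissimilarities receive no special treatment, and the stress is assumed to take the normalized form with a square root (Kruskal's stress-1).

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: best non-decreasing least-squares fit to `values`."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge neighbouring blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(dissimilarities, distances):
    """Stress of a fixed configuration.

    dissimilarities[p] and distances[p] must refer to the same item pair p.
    """
    # Order the embedding distances by increasing input dissimilarity.
    order = sorted(range(len(dissimilarities)), key=lambda p: dissimilarities[p])
    d_sorted = [distances[p] for p in order]
    # The isotonic fit plays the role of the transformed dissimilarities.
    disparities = isotonic_fit(d_sorted)
    num = sum((f - d) ** 2 for f, d in zip(disparities, d_sorted))
    den = sum(d ** 2 for d in d_sorted)
    return math.sqrt(num / den)
```

A configuration whose distances are already in the same order as the dissimilarities has zero stress; the second step of the procedure would then move the points to reduce this value when it is positive.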

The NMDS algorithm has been used extensively in psychology (Reed, 1972, Smith and Ellsworth, 1985, Barsalou, 2014), neuroscience (de Beeck et al., 2001, Kayaert et al., 2005, Kaneshiro et al., 2015) and broader fields (Liberti et al., 2014, Machado et al., 2015). The non-parametric flavor of the method makes it a general-purpose algorithm that is easy to apply. In addition, it can find representations in multi-dimensional spaces. However, there are two major drawbacks. First, the proposed optimization algorithm tries to solve a highly non-convex optimization problem, and typically gets stuck in a local, but not the global, minimum of the stress function. This local optimum can be arbitrarily far from the global optimum. The second issue is the requirement on the input data: as described above, the algorithm needs the full dissimilarity matrix as input. Alternatively, in a setting of triplet comparisons one can also implement the algorithm with just the knowledge of the ranking (ordering) of all the distance values $\delta_{ij}$. This ordering can be computed from triplet questions, but sorting all pairwise distances requires on the order of $\binom{n}{2} \log \binom{n}{2}$ triplet questions. This property makes NMDS infeasible for many applications in psychophysics, as the number of required triplet comparisons gets too large.

To get a feeling for the numbers, suppose that we have $n$ stimuli. There exist $\binom{n}{2}$ dissimilarity values. The NMDS method requires the full order of these distances, which means that on average it needs to ask on the order of $\binom{n}{2} \log \binom{n}{2}$ triplet questions. The number of triplet comparisons needed by the ordinal embedding methods depends on the embedding dimension $d$. Let us assume the embedding is in a 2-dimensional space. Then the embedding methods require on the order of $d\,n \log n$ triplet comparisons. Figure 6 shows the result of a simulation in a similar setting with a 2-dimensional embedding. Even though the embedding method (Figure 6 (c)) uses less information, it produces an embedding of higher quality. The difference is even more drastic for larger $n$: with the same calculations, NMDS requires about 12570 triplet comparisons, whereas the ordinal embedding methods require about 570 triplet comparisons.
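The two budgets can be compared with a few lines of arithmetic. The sketch below assumes base-2 logarithms, a sorting cost of $\binom{n}{2}\log_2\binom{n}{2}$ comparisons for NMDS and a budget of $d\,n\log_2 n$ for ordinal embedding; both function names are hypothetical. Under these assumptions, $n = 50$ and $d = 2$ give roughly 12567 and 564, close to the figures of about 12570 and about 570 quoted above; that $n = 50$ is the intended value is our inference, not something stated in the text.

```python
import math

def nmds_triplet_budget(n):
    """Comparisons needed to sort all C(n, 2) pairwise distances (m * log2 m)."""
    m = math.comb(n, 2)
    return m * math.log2(m)

def embedding_triplet_budget(n, d):
    """Assumed order-of-magnitude budget for ordinal embedding (d * n * log2 n)."""
    return d * n * math.log2(n)

for n in (10, 50, 100):
    print(n, round(nmds_triplet_budget(n)), round(embedding_triplet_budget(n, d=2)))
```

The gap widens quickly because the first quantity grows roughly quadratically in $n$, while the second grows only slightly faster than linearly.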

2.2 Maximum likelihood difference scaling (MLDS)

Decades after the introduction of NMDS, maximum likelihood difference scaling (MLDS) was proposed to solve a specific instance of the difference scaling problem (Knoblauch et al., 1998, Maloney and Yang, 2003, Krantz et al., 2007). Originally MLDS asked quadruplet questions which involve four stimulus levels. If we denote the perceptual scale values of the four stimuli by $\psi_i, \psi_j, \psi_k, \psi_l$, then a quadruplet question asks whether the difference in perception $|\psi_i - \psi_j|$ is larger or smaller than the difference in perception $|\psi_k - \psi_l|$. Note, however, that triplet questions are a subset of quadruplet questions, implying that the MLDS method is also applicable to triplet questions.

There are two main assumptions in the MLDS model. First, it assumes that the perceptual scale is a scalar (one-dimensional), denoted by $\psi$. Second, the MLDS method assumes that the perceptual scale grows monotonically with respect to the stimulus. In particular, it assumes that the order of two stimuli in the physical space implies the same order in the perceptual scale: $S_i < S_j \Rightarrow \psi_i < \psi_j$.

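In contrast to NMDS, the MLDS method uses a parametric model. For a quadruplet of stimulus levels $S_i, S_j, S_k, S_l$, for simplicity denoted by $(i, j; k, l)$, a decision random variable is defined as

$$D(i, j; k, l) = |\psi_l - \psi_k| - |\psi_j - \psi_i| + \varepsilon,$$

where $\varepsilon$ is zero-mean Gaussian noise with standard deviation $\sigma$. If $D(i, j; k, l) > 0$, then the observer would respond that the pair $(k, l)$ has a larger difference than the pair $(i, j)$. In this case the response to the quadruplet is set to $R = 1$, otherwise the response is $R = 0$. The goal of MLDS is now to estimate the perceptual scale values $\psi_1, \ldots, \psi_n$ that maximize the likelihood of the observed quadruplet answers. We first set $\psi_1 = 0$ and $\psi_n = 1$ to remove degenerate solutions. Now, assuming that $R_1, \ldots, R_m$ denote the independent responses to $m$ quadruplet questions, the likelihood of the perceptual scales given the quadruplet answers is

$$L(\psi, \sigma) = \prod_{t=1}^{m} \Phi_\sigma(\Delta_t)^{R_t} \left( 1 - \Phi_\sigma(\Delta_t) \right)^{1 - R_t},$$

where $\Phi_\sigma$ denotes the cumulative distribution function of $\mathcal{N}(0, \sigma^2)$, and $\Delta_t = |\psi_{l_t} - \psi_{k_t}| - |\psi_{j_t} - \psi_{i_t}|$ for the quadruplet $(i_t, j_t; k_t, l_t)$. This likelihood is not convex with respect to the perceptual scale values $\psi_1, \ldots, \psi_n$. Thus, the proposed numerical methods to maximize this likelihood might get stuck in a local maximum.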
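The MLDS likelihood can be evaluated with the standard library alone. The sketch below is an illustration under stated assumptions, not the reference implementation: the decision variable is taken to be the difference of absolute scale differences plus zero-mean Gaussian noise, responses are coded as 0/1, and the names `normal_cdf` and `mlds_log_likelihood` are hypothetical. Probabilities are clipped away from 0 and 1 purely for numerical safety.

```python
import math

def normal_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def mlds_log_likelihood(psi, sigma, quadruplets, responses):
    """Log-likelihood of scale values psi given answered quadruplets.

    psi         : list of scale values, one per stimulus level
    quadruplets : list of index tuples (i, j, k, l)
    responses   : list of 0/1; 1 means pair (k, l) was judged to differ
                  more than pair (i, j)
    """
    eps = 1e-12  # clipping constant (an implementation convenience)
    total = 0.0
    for (i, j, k, l), r in zip(quadruplets, responses):
        delta = abs(psi[l] - psi[k]) - abs(psi[j] - psi[i])
        # P(delta + noise > 0) equals the Gaussian CDF evaluated at delta.
        p = min(max(normal_cdf(delta, sigma), eps), 1.0 - eps)
        total += r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return total
```

In practice one would hand the negative of this function to a numerical optimizer while keeping the two end points of the scale fixed; because the objective is not convex, several random restarts are a common precaution.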

There are a number of advantages of the MLDS method: The maximum likelihood estimator is unbiased and has minimum variance among the unbiased estimators. As a practical advantage, it has been shown empirically that a small subset of quadruplets is enough for the convergence of the algorithm. Finally, it has been shown that the variance of the output behaves reasonably with respect to the input noise level $\sigma$ (Maloney and Yang, 2003).

However, MLDS also has some drawbacks. First, the algorithm only works for one-dimensional perceptual spaces. In some cases (see the examples of color and pitch perception in Figure 3), however, the scales need more than one dimension. Second, even in the one-dimensional case, the assumption of monotonicity of the scaling function is restrictive and may not hold for all psychophysical settings. Finally, the nice theoretical properties (unbiasedness, minimum variance solution) only hold for the global optimum of the MLDS likelihood function, but not for the local optima that are realistically obtained by the optimization algorithm.

2.3 Ordinal embedding

2.3.1 General setup

The comparison-based setting has recently become popular in the machine learning literature (Schultz and Joachims, 2003, Agarwal et al., 2007, Van Der Maaten and Weinberger, 2012, Amid and Ukkonen, 2015, Ukkonen et al., 2015, Balcan et al., 2016). Instead of stimulus levels, in machine learning we deal with a set of abstract items, say $x_1, \ldots, x_n$, that come from some abstract space $\mathcal{X}$. Furthermore, we assume that there exists a dissimilarity function $\delta: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{+}$ that describes the dissimilarity of the items. Often, in machine learning we assume that $\delta$ is symmetric, but not necessarily a metric. In our current setting in psychophysics, we now assume that the function $\delta$ is not available, yet we have access to an oracle which responds to a triplet question $(i, j, k)$ based on the unknown dissimilarity. The triplet question will be “Is item $x_i$ more similar to item $x_j$ or to item $x_k$?” We denote this triplet question by $t = (i, j, k)$. The response to the triplet is denoted by $R_t$ and stored as follows:

$$R_t = \begin{cases} 1 & \text{if } \delta(x_i, x_j) < \delta(x_i, x_k), \\ -1 & \text{otherwise.} \end{cases} \tag{2}$$
Assume that the answers to a subset $T$ of triplet questions are collected from the oracle. Given an embedding dimension $d$ and the answers to the triplet questions in $T$, ordinal embedding aims to find points $y_1, \ldots, y_n$ in a $d$-dimensional Euclidean space whose Euclidean distances are consistent with the answers to the queried triplet questions. An embedding is consistent with the answer to a triplet $t = (i, j, k)$ if

$$R_t \cdot \operatorname{sgn}\left( \|y_i - y_k\| - \|y_i - y_j\| \right) = 1,$$

where the function $\operatorname{sgn}$ returns the sign of a real value. The goal of ordinal embedding is to find an embedding that maximizes the number of consistent triplets. Intuitively, we would like to solve the following optimization problem:

$$\max_{y_1, \ldots, y_n \in \mathbb{R}^d} \; \sum_{t = (i, j, k) \in T} R_t \cdot \operatorname{sgn}\left( \|y_i - y_k\| - \|y_i - y_j\| \right). \tag{3}$$
However, there are major algorithmic obstacles. It is not always possible to find a perfect $d$-dimensional embedding for an arbitrary dissimilarity function $\delta$. Moreover, in a practical setting the answers to the triplets might be noisy. Therefore, the optimal solution is not necessarily consistent with the full set of triplets $T$. And finally, as written above, the objective function is discrete-valued, which makes it even harder to optimize. Hence, various adaptations of the stress function and optimization heuristics are used to address these problems. For the purpose of this exposition, we want to keep it at this intuitive level; below we describe one particular algorithm in more detail.
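The discrete objective is easy to evaluate for any candidate embedding, even though it is hard to optimize. The following sketch counts the fraction of answered triplets that an embedding satisfies. It assumes that a triplet is stored as an index tuple `(i, j, k)` with response `+1` when item `i` was judged more similar to `j` than to `k` and `-1` otherwise; the function name `triplet_consistency` is hypothetical.

```python
import math

def triplet_consistency(points, triplets, responses):
    """Fraction of answered triplets whose answer agrees with the embedding.

    points    : list of coordinate tuples, one per item
    triplets  : list of index tuples (i, j, k)
    responses : list of +1 / -1 answers, aligned with `triplets`
    """
    satisfied = 0
    for (i, j, k), r in zip(triplets, responses):
        d_ij = math.dist(points[i], points[j])
        d_ik = math.dist(points[i], points[k])
        predicted = 1 if d_ij < d_ik else -1
        satisfied += (predicted == r)
    return satisfied / len(triplets)

# Three items on a line; the second answer contradicts the embedding.
print(triplet_consistency([(0.0,), (1.0,), (3.0,)],
                          [(0, 1, 2), (2, 0, 1)],
                          [1, 1]))
```

A value of 1.0 means that the embedding reproduces every collected answer; with noisy answers this value is usually not attainable, which is one reason why smooth surrogate objectives are optimized instead.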

2.3.2 Connection to the scaling problem

One can see that ordinal embedding solves the scaling problem of psychophysics in the following way: the different stimuli $S_i$ play the same role as the abstract items $x_i$ in the ordinal embedding problem, and the perception values $\psi_i$ correspond to the embeddings $y_i$. Concretely, given a standard scaling function as in Figure 2 (bottom right), the ordinal embedding output corresponds to the positions of the perception values on the y-axis (yellow points in Figure 2, bottom right). Thus, given the ordinal embedding output (y-values) and the values of the physical stimuli, we can reconstruct the scaling function.

To make this concrete once more for the specific example of the slant-from-texture problem in Figure 2: Given the slant stimuli $S_1, \ldots, S_8$, participants were asked a number of triplet questions involving the stimuli $S_i, S_j, S_k$. Then we fed the answers to these triplet questions to an ordinal embedding algorithm and asked the algorithm to construct a 1-dimensional embedding. This resulted in the yellow points on the y-axis. We only depicted three stimuli out of eight with yellow points, in order to keep the plot neat. These points can now be identified as the perception values $\psi_1, \ldots, \psi_8$, so we can finally draw the scaling function by connecting the points $(S_i, \psi_i)$. More details on this experiment are provided in the Experiments section.

While in the example of the slant experiment we used a one-dimensional embedding, ordinal embedding methods can also construct a multi-dimensional embedding that describes the perceptual space of humans. Let us discuss two examples that demonstrate why this might be important. One almost “famous” example is color perception. Figure 3 (Left) shows the two-dimensional color circle proposed by Shepard and Ekman (Shepard, 1962, Ekman, 1954). The figure has been constructed with the NMDS algorithm based on a 14×14 similarity judgment matrix. The wavelength of each color is written at the right side of each colored dot. In our context the important observation is that human observers perceive the violet colors with low wavelengths as similar to the red colors with high wavelengths. This suggests a circular internal perceptual space, which can only be realized in at least two dimensions. A second example is pitch perception of sounds. Even though auditory frequency is again one-dimensional, pitch is perceived along a three-dimensional helix (Shepard, 1982, Houtsma, 1995). Figure 3 (Right) shows the perceptual space proposed by Shepard. In both cases, pitch and color, multi-dimensional ordinal embedding can enable the researcher to find perceived values in a higher-dimensional Euclidean space that properly capture the similarity structure of perception.

Figure 3: Left: The two-dimensional circle of color perception obtained from similarity measurements between 14 colors (Shepard, 1962). The wavelength of each color is written on the right side of the colored dot. Note that we have reconstructed the color circle with the non-metric MDS method based on the original dissimilarity data. Right: The helix proposed by Shepard for pitch perception. The physical stimulus, i.e. pitch, varies along the spiral path of the curve and the three-dimensional space describes the perception (Shepard, 1982). Note that the visualization is done in three dimensions; however, two parameters, called chroma and height by Shepard, are enough to describe the perception.

2.3.3 Stochastic triplet embedding

In recent years, there has been a surge of methods to address the ordinal embedding problem in the machine learning community, for example generalized non-metric multidimensional scaling (GNMDS) (Agarwal et al., 2007), the crowd-median kernel (Tamuz et al., 2011), stochastic triplet embedding (STE) (Van Der Maaten and Weinberger, 2012) and local ordinal embedding (LOE) (Terada and von Luxburg, 2014). In general, the focus of the machine learning community is to build methods that require only a small number of triplets to embed a large number of items, make as few assumptions as possible, and are robust towards noise in the data.

In the following we focus on one particular class of methods, stochastic triplet embedding (STE) and its variant t-distributed stochastic triplet embedding (t-STE), because in our experience they work very well and are based on a simple model that is also plausible in a psychophysics setting. The STE method introduces the probabilistic model defined in Equation (4) to solve the ordinal embedding problem. Assume that $y_1, \ldots, y_n \in \mathbb{R}^d$ were the correct representations of our objects. The model assumes that if a participant is asked whether $y_i$ is closer to $y_j$ or to $y_k$, then they give a positive answer with probability

$$p_{ijk} = \frac{\exp\left( -\|y_i - y_j\|^2 \right)}{\exp\left( -\|y_i - y_j\|^2 \right) + \exp\left( -\|y_i - y_k\|^2 \right)}. \tag{4}$$

Intuitively, “easy” triplet questions (where the distances $\|y_i - y_j\|$ and $\|y_i - y_k\|$ are very different) will be answered correctly in most cases, whereas difficult triplet questions (where $\|y_i - y_j\|$ is about as large as $\|y_i - y_k\|$) can easily be mixed up. Given the answers to a set of triplets, the STE algorithm attempts to maximize the likelihood of the embedding point configuration with respect to the answered triplets. If the answer to a triplet question is given according to Equation (2), and if we assume that triplet questions are answered independently, the likelihood of an embedding given the answers to a set of triplets $T$ is given as

$$L(y_1, \ldots, y_n) = \prod_{t = (i, j, k) \in T} p_{ijk}^{\mathbb{1}[R_t = 1]} \left( 1 - p_{ijk} \right)^{\mathbb{1}[R_t = -1]}. \tag{5}$$

The log-likelihood is maximized to find the solution of ordinal embedding. In the above formulation, the probability of satisfying a triplet goes rapidly to zero when the difficulty of a triplet question increases. As a result, a severe and a slight violation of a triplet are penalized almost the same. To make the statistic more robust, the authors propose to replace the Gaussian kernel with a heavier-tailed Student-t kernel (Van Der Maaten and Weinberger, 2012). The modified method is called t-distributed STE (t-STE).
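The STE and t-STE response probabilities, and the resulting log-likelihood, can be sketched as follows. The Gaussian form follows the model described above; the Student-t kernel $(1 + \|a - b\|^2 / \alpha)^{-(\alpha + 1)/2}$ with $\alpha$ degrees of freedom is the form used by Van Der Maaten and Weinberger (2012). All function names, the default $\alpha = 1$ and the $\pm 1$ response coding are assumptions made for illustration, and no optimizer is included.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _ratio(log_a, log_b):
    """Return a / (a + b) from log a and log b without overflow."""
    diff = log_b - log_a
    if diff > 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))

def ste_prob(y_i, y_j, y_k):
    """STE: probability of answering 'i is closer to j than to k'."""
    return _ratio(-sq_dist(y_i, y_j), -sq_dist(y_i, y_k))

def tste_prob(y_i, y_j, y_k, alpha=1.0):
    """t-STE: same ratio with a heavy-tailed Student-t kernel."""
    power = -(alpha + 1.0) / 2.0
    log_a = power * math.log1p(sq_dist(y_i, y_j) / alpha)
    log_b = power * math.log1p(sq_dist(y_i, y_k) / alpha)
    return _ratio(log_a, log_b)

def log_likelihood(points, triplets, responses, prob=ste_prob):
    """Sum of log-probabilities of the observed +1 / -1 answers."""
    total = 0.0
    for (i, j, k), r in zip(triplets, responses):
        p = prob(points[i], points[j], points[k])
        total += math.log(p if r == 1 else 1.0 - p)
    return total
```

For an easy triplet the Gaussian kernel yields a probability much closer to 1 than the Student-t kernel does, so a contradicting answer costs far more log-likelihood under STE than under t-STE; this is the robustness argument made above. An embedding would be obtained by maximizing `log_likelihood` over `points`, for example with gradient ascent from several random initializations.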

This algorithm can deal with a large number of items (stimulus levels) and a reasonable number of triplets, and it is robust to noise, which is an important characteristic when dealing with psychophysics data. Unlike in MLDS, one-dimensional functional relations (embeddings) are not restricted to monotonic functions, and the algorithm is capable of embedding in higher-dimensional Euclidean spaces. However, as for all other ordinal embedding methods, the proposed optimization problem is not convex, which makes it vulnerable to inappropriate local optima.

2.4 Summary of embedding methods

In Table 1, we summarize the properties of the different embedding methods. The ordinal embedding methods can produce high quality results with a small set of triplet answers. This property makes them superior to traditional NMDS that requires the full order of distances. On the other hand, the embedding methods are not limited to the case of one-dimensional monotonic functions, as it is the case for MLDS.

As the number of items (and consequently the number of triplets) grows, the ordinal embedding algorithms become drastically slow. This is however, more of a concern for machine learning purposes which deal with thousands of items and hundreds of thousands of triplets. The algorithms (particularly STE and t-STE) have a quite acceptable running time for standard psychophysics experiments. Our experiments are performed on an iMac 18.3 (2017) with a 3.4 GHz i5 quad-core processor. On this machine, the (t)-STE algorithm, implemented in MATLAB, requires about 30 minutes to embed 100 items in two dimensions using 2000 triplet answers.

| Method | Data required | Statistical noise model | Multi-dimensional | Restrictions |
|---|---|---|---|---|
| NMDS | Complete order of distances | No | Yes | - |
| MLDS | Partial set of quadruplets | Yes | No | Monotonic functions |
| t-STE | Partial set of triplets | Yes | Yes | - |

Table 1: The comparison of ordinal embedding methods. Each row corresponds to one method, while the properties are listed in the columns.

3 Simulations

In this section, we compare ordinal embedding with the traditional embedding approaches in psychophysics (NMDS and MLDS) with diverse simulations. We consider one-dimensional and two-dimensional perceptual spaces.

3.1 Simulation setup

Stimulus and perceptual scale: We assume that the stimulus and perception are measured on a scale from 0 to 1, and the true relation between the physical stimulus and the perception is encoded by a function $f: [0, 1] \to [0, 1]^d$, where the dimension $d$ of the perceptual space will be 1 or 2. We consider $n$ uniformly chosen steps for the stimulus levels, denoted by $S_1, \ldots, S_n$. In our simulations we assume that a true perceptual scale exists, which for the stimulus $S_i$ is denoted by $y_i = f(S_i)$. We will choose different functions $f$ for our different simulations below.

Generating subsets of triplet questions: Our goal will be to feed the same number of triplet questions to each of our algorithms. However, we need to be a bit careful regarding which comparisons are “valid” for each of the algorithms. The MLDS method assumes that the perceptual function is monotonic. If we consider three stimuli such that $S_i < S_j < S_k$, then MLDS always assumes that $|\psi_i - \psi_j| \le |\psi_i - \psi_k|$ and $|\psi_k - \psi_j| \le |\psi_k - \psi_i|$. From the MLDS point of view, it would therefore be useless to ask a participant for the comparison of $|\psi_i - \psi_j|$ to $|\psi_i - \psi_k|$. The only useful triplet question for MLDS is whether $|\psi_j - \psi_i|$ is smaller or larger than the distance $|\psi_k - \psi_j|$. Thus, for any set of three stimuli, there is only one “valid” triplet question, leading to a total of $\binom{n}{3}$ valid triplet questions for MLDS. All other algorithms under consideration do not make any monotonicity assumption. Here, for any set of three different stimuli we can ask three useful triplet questions, leading to a total of $3\binom{n}{3}$ valid triplet questions.
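The two sets of valid triplets can be enumerated directly. In the sketch below a triplet is encoded as `(anchor, a, b)`, meaning "is the anchor more similar to `a` or to `b`?"; this encoding and both function names are assumptions made for illustration. For MLDS the single valid question per stimulus triple is anchored at the middle stimulus, whereas without a monotonicity assumption each of the three stimuli can serve as the anchor.

```python
from itertools import combinations
from math import comb

def mlds_valid_triplets(n):
    """One triplet per stimulus triple i < j < k, anchored at the middle one."""
    return [(j, i, k) for i, j, k in combinations(range(n), 3)]

def general_valid_triplets(n):
    """Three triplets per stimulus triple: each stimulus is the anchor once."""
    triplets = []
    for i, j, k in combinations(range(n), 3):
        triplets += [(i, j, k), (j, i, k), (k, i, j)]
    return triplets

n = 10
print(len(mlds_valid_triplets(n)), comb(n, 3))         # 120 120
print(len(general_valid_triplets(n)), 3 * comb(n, 3))  # 360 360
```

A random subset of a given size can then be drawn without replacement from either list, for instance with `random.sample`.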

In all our simulations, we feed the same number of triplets to all embedding algorithms. A random subset of triplets are chosen without replacement from the set of all valid triplets for each algorithm, where this set of valid triplets is different for MLDS and the other algorithms, as described above. In our simulations, the size of the random subset of triplets will be chosen in the range with . The value is equivalent to choosing the whole set of valid triplets for the MLDS method and a third of the set of valid triplets for the other methods.
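The two sets of valid triplet questions described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: the tuple layout `(anchor, a, b)`, meaning "is the anchor closer to a or to b?", and all function names are assumptions made here.

```python
from itertools import combinations
import random

def mlds_valid_triplets(n):
    """One question per 3-subset: for i < j < k the middle item j is the anchor."""
    return [(j, i, k) for i, j, k in combinations(range(n), 3)]

def all_valid_triplets(n):
    """Three questions per 3-subset: each item serves once as the anchor."""
    triplets = []
    for i, j, k in combinations(range(n), 3):
        triplets += [(i, j, k), (j, i, k), (k, i, j)]
    return triplets

def sample_triplets(triplets, size, rng):
    """Draw `size` triplets uniformly at random, without replacement."""
    return rng.sample(triplets, size)

# Usage: the same number of triplets is drawn for MLDS and for the other methods.
rng = random.Random(0)
n = 8
mlds_set = mlds_valid_triplets(n)     # C(8, 3) = 56 questions
full_set = all_valid_triplets(n)      # 3 * C(8, 3) = 168 questions
mlds_sample = sample_triplets(mlds_set, len(mlds_set), rng)
other_sample = sample_triplets(full_set, len(mlds_set), rng)
```

Drawing `len(mlds_set)` triplets from both sets corresponds to the case where MLDS sees all of its valid triplets and the other methods see a third of theirs.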

Note that MLDS and ordinal embedding methods get triplet answers as input. However, NMDS needs a dissimilarity matrix. Therefore, we design a fair procedure, explained later in this section, to construct the required dissimilarity values.

Underlying model to generate triplet answers: In order to generate answers to the triplet questions, we construct a model that resembles a typical observer of a psychophysical experiment. Given a fixed perceptual scale function , we assume that the simulated observer answers the triplet questions based on a noisy version of this function, denoted by , where is a zero-mean Gaussian noise with unit covariance matrix and standard deviation in d dimensions. In our simulations we use in the range of . The simulated observer produces the answer to the queried triplet question by

Note again that the embedding values play the same role as the perceptual scale values in the psychophysics notation. We sometimes use a different notation to emphasize that the embedding values can be multi-dimensional, and to make a clear distinction to scalar values of . The scalar is depicted in Figure 1 and also in the MLDS method.
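A simulated observer of this kind can be sketched in a few lines. The sketch assumes that fresh Gaussian noise is drawn for every item each time a triplet is queried, and that the observer picks the item whose noisy percept is closer to the noisy anchor; the sigmoid steepness and all names are hypothetical choices for illustration.

```python
import math
import random

def sigmoid_scale(s):
    """Hypothetical 1-D perceptual function on [0, 1]; the steepness 10 is arbitrary."""
    return (1.0 / (1.0 + math.exp(-10.0 * (s - 0.5))),)

def noisy_percept(f, s, sigma, rng):
    """True perceptual value f(s) plus independent N(0, sigma^2) noise per coordinate."""
    return tuple(v + rng.gauss(0.0, sigma) for v in f(s))

def answer_triplet(f, stimuli, triplet, sigma, rng):
    """Return True if the noisy anchor is perceived as closer to item a than to item b.

    `triplet` holds indices (anchor, a, b) into `stimuli`.
    """
    anchor, a, b = (noisy_percept(f, stimuli[idx], sigma, rng) for idx in triplet)
    return math.dist(anchor, a) <= math.dist(anchor, b)

# Usage: 10 uniformly spaced stimulus levels and a moderately noisy observer.
rng = random.Random(1)
stimuli = [i / 9.0 for i in range(10)]
answer = answer_triplet(sigmoid_scale, stimuli, (5, 4, 9), sigma=0.1, rng=rng)
```

Because the perceptual values are tuples, the same functions work unchanged for a two-dimensional perceptual function.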

Feeding triplet answers to the algorithms. The above-mentioned model produces answers to the triplet questions. In case of (t)-STE, these triplet answers directly serve as the input to the algorithm. The same is true for MLDS, because we took care to only sample valid triplets. For NMDS, however, we need to proceed differently. NMDS requires dissimilarities between pairs of items (but in the end only makes use of the order between these values due to the monotonic transformation function used in the definition of stress in Equation (1)). For our simulations, we generate a set of noisy perceptual values for stimuli levels as before, , and then explicitly compute all dissimilarity values . These are then the values that we give to the NMDS algorithm. Strictly speaking, the NMDS algorithm thus gets to see more information about the data, but it does not access it because in the end it only considers the ordering between the dissimilarities. Also note that because NMDS requires the full matrix of dissimilarities, we only apply and compare the NMDS algorithm with MLDS and the embedding methods when , that is all triplet questions are being asked. This procedure makes sure that all three algorithms get the same amount of information.
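The construction of the NMDS input can be illustrated as below. This is a sketch under the assumption that one noisy perceptual value is drawn per stimulus level and that dissimilarities are Euclidean distances between these noisy values; the function names are introduced here for illustration only.

```python
import math
import random

def noisy_values(true_values, sigma, rng):
    """Add independent N(0, sigma^2) noise to every coordinate of every perceptual value."""
    return [tuple(v + rng.gauss(0.0, sigma) for v in point) for point in true_values]

def dissimilarity_matrix(points):
    """Full symmetric matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Usage with a hypothetical linear 1-D scale sampled at 5 stimulus levels.
rng = random.Random(0)
true_values = [(s / 4.0,) for s in range(5)]
D = dissimilarity_matrix(noisy_values(true_values, sigma=0.05, rng=rng))
```

Only the rank order of the entries of `D` matters to NMDS, which is why passing the numerical distances does not give it an advantage over the triplet-based methods.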

Embedding methods: We now apply various algorithms to generate embeddings or perceptual scales. For STE and t-STE we use the MATLAB implementation by Van Der Maaten and Weinberger (2012). We use the default optimization parameters for both methods. The degree of freedom for the t-Student kernel is set to for the t-STE method. We also use the R implementation of a second algorithm from the machine learning community, local ordinal embedding (LOE), with the default parameter settings. For MLDS, we use the R package available on the CRAN repository, again with the default optimization parameter settings. For the NMDS algorithm, we use the MATLAB implementation, which is available by calling the function “mdscale”. The implementation optimizes the stress function defined by Kruskal (1964b); see Equation (1).

In all cases, we set the embedding dimension to the dimension of true perceptual function. In the section on real experiments we also consider cases where the embedding dimension is not known.

All embedding methods solve a non-convex optimization problem and thus are prone to find inaccurate local optima. To reduce this effect, we run all the algorithms 10 times with random initializations. Among the 10 embedding outputs we choose the one which has the least triplet error (see next subsection for a definition).
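The restart strategy can be written generically. In the sketch below, `embed` and `error_fn` are hypothetical callables standing in for any of the embedding algorithms and for the triplet error on the input triplets; neither is part of the implementations cited above.

```python
import random
from typing import Callable, Sequence, Tuple

def best_of_restarts(
    embed: Callable[[random.Random], Sequence],
    error_fn: Callable[[Sequence], float],
    n_restarts: int = 10,
    seed: int = 0,
) -> Tuple[Sequence, float]:
    """Run `embed` from several random initializations and keep the lowest-error result."""
    best_embedding, best_error = None, float("inf")
    for restart in range(n_restarts):
        rng = random.Random(seed + restart)  # fresh, reproducible initialization
        embedding = embed(rng)
        error = error_fn(embedding)          # e.g. triplet error on the input triplets
        if error < best_error:
            best_embedding, best_error = embedding, error
    return best_embedding, best_error
```

Selecting by the error on the input triplets only addresses bad local optima; it says nothing about generalization, which is what the separate repetition over independent data draws is for.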

Independent of the above repetition, which is done to remove the effect of local minima, each embedding method is executed 10 times, on 10 independent draws of the random input data. This repetition is meant to analyze the average behaviour and the variances of the algorithms. We finally report the average values of the evaluations over the 10 repetitions. The standard deviations are reported in the supplementary material.

Figure 4: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional embedding methods in psychophysics (MLDS and NMDS) for a monotonic one-dimensional perceptual function (Sigmoid). (a) The true perceptual function (). (b) Ten embedding results () of the MLDS method for a fixed value of standard deviation and triplet fraction . (c) Ten embedding results () of the STE method for a fixed value of standard deviation and triplet fraction . (d) The average MSE of embedding methods. (e) The average triplet error of embedding methods.

Evaluating the results: We consider two approaches to evaluate the performance of the various methods:

  1. Mean-squared-error (MSE): For one-dimensional perceptual spaces, we can compute the mean-squared-error (MSE) between the estimated scales and the true perceptual function values . Since the embedding result is unique only up to similarity transformations (scaling, rotation and translation), we need two steps of normalization before computing the MSE. First, we transform the output of the embedding to be in the range of $[0, 1]$, as our scaling functions are defined in this range. More precisely, we shift (translate) the minimum value to zero and divide all the values by the maximum. Second, if we get the output as a result of embedding, this answer is not unique. More precisely, can also be considered as an answer without violating any triplet of the input set. Therefore, we choose between and , the output which shows a better performance with respect to the MSE. In this way we choose the best rotation of the output.

  2. Triplet error: The MSE criterion is cumbersome to compute in multivariate scenarios, because we have to take into account all possible rotations of the embeddings. Moreover, in real-world scenarios, MSE cannot be computed anyway because the underlying ground truth is unknown. As an alternative, we propose to evaluate the quality of an embedding by its ability to predict the answers to (potentially new) triplet questions. To this end, we compute a quantity called the triplet error. Given an embedding and a validation set of triplets, the triplet error of the embedding with respect to is defined as


    where the characteristic function

    takes the value 1 if the expression in the curly parenthesis is true (that is, if the estimated embedding is not consistent with the new triplet ), and it takes the value 0 otherwise.

    In words, the triplet error counts how many of the triplets in are not consistently represented by the given embedding. In practice, we are typically provided with only one set of answered triplets; this set then has to be used both for constructing the embedding and for evaluating its quality.

    The first way is to set , meaning that we use the same set of triplets to construct the embedding and to measure its quality. In a second way, we perform -fold cross-validation to avoid overfitting: We partition the set of input triplets into non-intersecting subsets (“folds”). We perform the embedding and the evaluation times. In each iteration we pick one of the folds as the validation set () and the rest of the folds as the training set (the input to the embedding algorithm). The final triplet error is the average over the triplet errors of the validation sets. Throughout the rest of the paper, we refer to the latter approach as cross-validated triplet error, while the first approach is simply called the triplet error.
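The triplet error and its cross-validated variant can be sketched as follows. The sketch assumes that the error is reported as a fraction of the validation triplets, that a triplet is stored as indices `(anchor, a, b)` with a Boolean answer meaning "the anchor is closer to a", and that `embed` is a hypothetical callable mapping training triplets and answers to a list of coordinate tuples.

```python
import math
import random

def triplet_error(embedding, triplets, answers):
    """Fraction of triplets whose recorded answer disagrees with the embedding."""
    wrong = 0
    for (anchor, a, b), closer_to_a in zip(triplets, answers):
        d_a = math.dist(embedding[anchor], embedding[a])
        d_b = math.dist(embedding[anchor], embedding[b])
        wrong += (d_a <= d_b) != closer_to_a
    return wrong / len(triplets)

def cross_validated_triplet_error(embed, triplets, answers, k, rng):
    """Average validation triplet error over k folds (requires k <= len(triplets))."""
    order = list(range(len(triplets)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint index sets
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        embedding = embed([triplets[i] for i in train], [answers[i] for i in train])
        errors.append(
            triplet_error(embedding, [triplets[i] for i in fold], [answers[i] for i in fold])
        )
    return sum(errors) / k
```

Calling `triplet_error` with the same triplets that were used for fitting gives the plain triplet error; the cross-validated version never scores an embedding on triplets it was fitted to.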

3.2 One-dimensional perceptual space

3.2.1 Simulations with monotonic scales

Our first simulation involves a typical monotonic function as it occurs in many psychophysics experiments. The true perceptual function (a Sigmoid function) is shown in Figure 4 (a). Figure 4 (b) and (c) show the output embedding of the MLDS and STE algorithms for 10 iterations, respectively. The other ordinal embedding methods have a similar performance and the output embeddings are reported in the supplementary material. The average (over 10 runs) MSE and triplet errors of the various embedding algorithms are depicted in Figure 4 (d) and (e), respectively.

In both error measures the MLDS method performs better than the ordinal embedding algorithms. The obvious reason is that MLDS makes a strong model assumption, namely the monotonicity of the scaling function, and this assumption is satisfied in this example. Hence it has an inductive advantage that pays off. The ordinal embedding algorithms also show an acceptable performance, however. In particular, when we provide more triplet answers () the average errors of MLDS and the ordinal embedding algorithms tend to be the same. More detailed results regarding this simulation, including the four ordinal embedding outputs and the performance of the algorithms with other values of , can be found in the supplementary material, see Figure 10. We also examine another monotonic function in Figure 11 of the supplementary material, and the results are consistent with the ones presented here.

Figure 5: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional embedding methods in psychophysics (MLDS and NMDS), for a non-monotonic one-dimensional perceptual function (second degree polynomial). (a) The true perceptual function (). (b) Ten embedding results () of the MLDS method for a fixed value of the standard deviation and triplet fraction . (c) Ten embedding results () of the STE method for a fixed value of standard deviation and triplet fraction . (d) The average MSE of embedding methods. (e) The average triplet error of embedding methods.

3.2.2 Simulations with non-monotonic scales

We now perform the same experiment with a non-monotonic function: a second-degree polynomial function is chosen as the true perceptual function ; see Figure 5 (a). Figure 5 (b) and (c) show the output embedding of the MLDS and STE algorithms for 10 iterations, respectively (the embeddings produced by LOE and t-STE are quite similar to those of STE, see supplementary material). The average (over 10 runs) MSE and triplet error of the various embedding algorithms are depicted in Figure 5 (d) and (e), respectively.

The function shapes depicted in Figure 5 (b) show the poor performance of the MLDS method for the non-monotonic function: MLDS tries to fit the most consistent monotonic function to the input triplets; but as the ground truth is far from being monotonic, the MLDS result is far off. The average MSE and triplet errors are also significantly larger for the MLDS. In contrast, the ordinal embedding algorithms can correctly estimate the true function shapes. Note that the ordinal embedding methods are capable of discovering a much broader range of scaling functions (non-monotonic scales) with the same number of triplets as we used for the monotonic scales.

As for the monotonic functions, we report the full details of the simulation in the supplementary material; see Figure 13. We also perform the simulation on a sinusoid function. The results are quite similar to those for the second-degree polynomial function and are shown in Figure 12 of the supplementary material.

Figure 6: Comparison of the ordinal embedding methods (MLDS, STE and t-STE) against the traditional NMDS method of psychophysics for the two-dimensional color perception function. (a) The true perceptual function in two dimensions. The stimulus value, color wavelength, is written beside each color. The two-dimensional vector space represents the perceptual space. (b) The embedding result of the NMDS method depicted in two dimensions for a fixed value of standard deviation and triplet fraction . (c) The embedding result of the STE method depicted in two dimensions for a fixed value of standard deviation and triplet fraction . (d) The average triplet error of various ordinal embedding methods in comparison with the NMDS method.

3.3 Multi-dimensional perceptual space

So far, we considered simulations in which the perception could be represented in a one-dimensional Euclidean space. However, in some cases such as the examples of color and pitch perception in Figure 3, more than one dimension is required to represent the perception. Here, we perform a simulation with a function mapping from one-dimensional stimulus space into a two-dimensional perceptual space.

In order to construct a realistic psychometric function , we use the color similarity data presented in Ekman (1954). We first construct a two-dimensional embedding using NMDS; see Figure 6 (a). In the following, this embedding will be considered our ground truth, which will then be used to generate further data (let us stress: we do not argue that this embedding is “correct” in any way; we just use it as a ground truth to generate further simulations). The wavelength of each color is denoted beside the color. The X-Y axes of the plot correspond to the two perceptual dimensions of the color.

To generate noisy triplets from our ground truth, we essentially proceed as before: we rescale the stimulus sizes (wavelengths) to the range of $[0, 1]$ to be consistent with the underlying model we defined earlier. We define the ground truth function by our ground truth embedding; that is, the true two-dimensional representation of a stimulus is given by the values in Figure 6 (a). Now we generate noisy versions of the perceptual scale functions, random subsets of triplets and noisy answers to triplet questions as described in the beginning of this section, and use the various algorithms to compute an estimated embedding of all the stimuli. We fix the embedding dimension to $d = 2$ for the following embedding methods: NMDS, LOE, STE and t-STE. However, MLDS is only capable of embedding in one dimension. Thus, we perform MLDS with $d = 1$. Figure 6 (b) and (c) show the two-dimensional embedding output of the NMDS and STE algorithms, respectively. The embeddings are shown for the parameter values and . The average triplet error of various embedding methods is shown in Figure 6 (d) for the parameter value .

The comparison of Figure 6 (b) and (c) reveals the different performances of NMDS and ordinal embedding methods in the presence of noise. NMDS is known to be quite vulnerable to noise, and this can be seen from the figures. While STE produces a circle of colors fairly similar to the true perceptual function, the colors are somewhat mixed up in the NMDS embedding. The triplet error also shows that ordinal embedding algorithms outperform the NMDS method significantly, even if only half or fewer of the triplets are available. Finally, and as expected by design, MLDS cannot produce an embedding in two dimensions. When evaluated on its one-dimensional embedding, it unsurprisingly produces a triplet error that is much larger than that of the two-dimensional embedding methods. More details regarding this experiment can be found in the supplementary material; see Figure 14.

Figure 7: (Top) Average and standard deviation of cross-validated triplet error for 8 subjects of the slant-from-texture experiment. Each group of bars shows the error for one subject, and each bar in the group corresponds to one of the embedding methods, shown with different colors. (Bottom) The embedding outputs for 8 subjects with two embedding methods: MLDS and t-STE. The MLDS method is depicted in the top row while t-STE is shown in the bottom row.

4 Experiments

In this section, we apply the comparison-based approach and ordinal embedding methods to two real experiments in visual perception: the slant-from-texture experiment that we have already mentioned above, and a more complex “Eidolon” experiment.

4.0.1 Slant-from-texture experiment

This experiment intends to find the functional relation between the perceived angle of the slant with a dotted surface and the actual physical degree of slant. The dataset was originally generated by Aguilar et al. (2017) (see the paper for more information on the experimental settings). Figure 2 (top) shows the eight stimuli used in this experiment. The degree of slant is varied from 0 to 70 degrees in steps of 10 degrees, making 8 stimulus levels. Participants then had to answer triplet comparisons. As the experiment was initially performed under the assumption of a monotonic relation between slant degree and perception, for each combination of three stimuli (three degrees of tilting) only one triplet question has been asked: whether . With 8 levels of the stimulus, this results in $\binom{8}{3} = 56$ possible triplet questions. Eight subjects participated in the study. Each subject has answered all 56 triplet questions many times, in order to reduce the effect of noisy responses. Subjects have answered 420 triplet questions in total, while the other subjects answered 840.

Since the ground truth embedding is unknown, we can only rely on the triplet error for evaluation of the embeddings. To avoid overfitting we use 10-fold cross-validation to compute the cross-validated triplet error (see the definition in the simulation setup). Figure 7 (top) shows the average and standard deviation of the cross-validated triplet error for 8 subjects and four embedding methods: MLDS, STE, t-STE and LOE. All ordinal embedding algorithms have very similar performance to MLDS. Thus, in this real-world example the advantage of MLDS over the ordinal embedding algorithms seen in Figure 4 appears to have vanished.

In addition to the triplet error, we also show the embedding outputs of MLDS and t-STE for 8 subjects in Figure 7 (bottom). Note that these plots are generated with the full set of triplets, not only the training folds that are used to evaluate the triplet error. The resulting functions are similar, both across the two methods and across the participants. However, it is curious to see that the t-STE embeddings produced for observers 1 and 2 are not monotonic. This is an effect that could not happen for MLDS, because MLDS always outputs monotonic functions. On the other hand, the triplet errors in both cases are comparable. It would be interesting to further investigate the perceptions of observers 1 and 2 in more detail; however, this would require more lab experiments involving the two observers, which is beyond the scope of this paper. At this point we can only stress that ordinal embedding methods at least have the potential to discover interesting, non-standard patterns in perceptual data that might be overlooked by MLDS.

4.1 The Eidolon experiment

Figure 8: Left: The original image from our Eidolon experiment. Right: An example triplet question — “Which of the bottom two images is more similar to the top image?”

Our final setup concerns the comparison of images. To generate images we use the Eidolon Factory by Koenderink et al. (2017); more specifically, its partially_coherent_disarray() function. In this toolbox, a given basis image can be distorted systematically using three different parameters called reach, grain and coherence. An Eidolon of a basis image corresponds to a parametrically altered version of this image. Reach controls the strength of a distortion (the higher the value, the stronger the amplification), grain controls how fine-grained the distortion is (low values correspond to ‘highly fine-grained’), whereas a parameter value close to 1.0 for coherence indicates that “local image structure [is retained] even when the global image structure is destroyed” (Koenderink et al., 2017). From a perceptual point of view we want to know which image modifications influence the percept, and to what degree. Starting with a black and white image of a natural landscape as basis image (see Figure 8, left), we generate 100 altered images, using reach and grain in and coherence in . All possible combinations of these parameter values result in different images.

Lab experiment setup:

In our lab, we asked three participants aged 19 to 25 to answer triplet questions, see Figure 8 (right) for an example question. For this purpose, participants used a standard computer mouse to click on the one of the two bottom images that they deemed more similar to the top image. Stimuli were presented on a pixels ( mm) VIEWPixx LCD monitor (VPixx Technologies, Saint-Bruno, Canada) at a refresh rate of 120 Hz in an otherwise dark chamber. Viewing distance was 100 cm, corresponding to degrees of visual angle for a single pixels image. The surround of the screen was set to a grey value of in the range, the mean value of all experimental images. The experiment was programmed in MATLAB (Release 2016a, The MathWorks, Inc., Natick, Massachusetts, United States) using the Psychophysics Toolbox extensions version 3.0.12 (Brainard, 1997, Kleiner et al., 2007) along with the iShow library of the Wichmann-lab.

Answers had to be given within seconds after a triplet presentation onset, otherwise the triplet was registered as unanswered and the experiment proceeded to the next triplet (this occurred in only % of all cases and can thus be safely ignored). The experiment was self-paced: once a participant had answered a question, the next one appeared directly after a short fixation time of seconds during which only a white pixels fixation rectangle at the center of the screen was shown. Before the experiment started all test subjects were given instructions by a lab assistant and performed 100 practice trials to gain familiarity with the task. The set of practice triplets was disjoint from the set of experimental triplets. Participants were free to take a break every 200 triplet questions. They gave their written informed consent prior to the experiment and were either compensated €10 per hour for their time or gained course credit towards their degree. All test subjects were students and reported normal or corrected-to-normal vision. The experiments were carried out in accordance with the guidelines of the Deutsche Gesellschaft für Psychologie (DGPs).

Experiment design:

The first step is to design a plausible subset of triplet questions. There exist 100 stimuli, giving rise to around possible triplet questions! In contrast to the previous slant-from-texture experiment it is infeasible to ask the whole set of triplet questions; NMDS would thus not be possible. Here the machine learning theory literature comes to our aid: it has been proven theoretically that if the embedding dimension is , then of the order of triplet questions are sufficient to reconstruct the Euclidean representation of items up to a small error (Jain et al., 2016). Even though this is just an asymptotic statement and constants are completely ignored, it gives a guideline. If we assume that the perceptual embedding dimension is not more than 3 (because three parameters are involved in modifying the images), then triplet questions. In order to have sufficient triplets, and considering the training and test split, we hence decided to ask each participant of the experiment 6000 triplet questions. The triplets are chosen uniformly at random from the set of possible triplets. The triplets have been asked in three sessions consisting of 2000 triplets each.

Figure 9: Cross-validated triplet error of three embedding methods for three subjects of the Eidolon experiment. Each plot corresponds to one subject and each curve denotes the cross-validated triplet error of one method. The x-axis is the dimension of the embedding.

Based on the triplet answers, we now run the ordinal embedding algorithms (STE, t-STE, LOE). As the best embedding dimension is not known, we test dimensions in the range . The MLDS method is again performed only in one dimension, as it is not applicable in multi-dimensional cases. We perform 10-fold cross-validation, and the cross-validated triplet error (see Equation 5) is reported as the evaluation criterion.

Figure 9 shows the cross-validated triplet error for three subjects with various dimensions and three embedding methods. Each plot corresponds to one subject, while each curve shows the error corresponding to an embedding method. We observe that t-STE consistently outperforms the other methods. The cross-validated triplet error for MLDS is larger than 0.25 for all three subjects. Thus, MLDS is not comparable to the best embedding methods and is omitted from the plots, as it would be off the scale in each of the panels. For all three subjects, increasing the embedding dimension from one to two clearly reduces the embedding error, so more than one dimension is needed to describe the perceptual space. Adding further dimensions in most cases does not help much. Further investigations, in particular more participants and a joint analysis over all participants, would be necessary to determine how the parameters of the Eidolon Factory are connected to perception.

The above results show that the best embedding method (t-STE) leads to a cross-validated triplet error around . How should we know if the error is acceptable, or whether it might be possible to reach a much lower error, for example by collecting more triplets? To answer this question, we would need to know what the error baseline of human participants is. In particular, there might be a proportion of ambiguous triplets for which no obviously “correct” answer exists. For example, if we knew that 80% of the triplet questions had an easy, obviously correct answer, and 20% of the questions were so hard that the answer was almost random, then the best error rate we could hope for would be around 10% (on 80% of the triplets we do not make any error, on 20% of the triplets we guess randomly, getting about 10% right and 10% wrong). In the case of the Eidolon experiment we do not have any external knowledge about the difficulty of triplets. Thus, we conducted a side experiment. We chose a set of 2000 random triplets and asked these questions three times to each subject (triplets were shuffled such that subjects did not realize that they were answering the same triplets repeatedly). We now estimate the “difficulty” of a triplet by how consistent the repeated answers were: if a subject answers the same triplet question with different answers, we consider it as “hard”, otherwise as “easy”. We performed this experiment on three subjects. They show the following percentage of hard triplets: , and . Having these answers, we would expect at least triplet error. The cross-validated triplet errors reported in our plots above are pretty close to this value, suggesting that our ordinal embeddings are close to what is achievable.
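The baseline argument above amounts to a short calculation, sketched here. The input layout (one tuple of repeated Boolean answers per triplet) and the function names are assumptions for illustration, and the toy answers are made up; they are not data from the experiment.

```python
def hard_fraction(repeated_answers):
    """Share of triplets that received inconsistent answers across repetitions."""
    hard = sum(1 for reps in repeated_answers if len(set(reps)) > 1)
    return hard / len(repeated_answers)

def error_floor(frac_hard):
    """Expected error if hard triplets are answered at random and easy ones perfectly."""
    return 0.5 * frac_hard

# Toy usage: 5 triplets, each asked three times (hypothetical answers).
repeated = [
    (True, True, True),
    (True, False, True),
    (False, False, False),
    (False, True, True),
    (True, True, True),
]
floor = error_floor(hard_fraction(repeated))
```

Note that a triplet answered at random can still receive three identical answers by chance, so the fraction of inconsistently answered triplets is, if anything, a conservative estimate of the share of hard triplets.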

5 How to apply ordinal embedding methods in psychophysics

In this section we would like to present some rules of thumb to make ordinal embedding methods more applicable for a researcher who is unfamiliar with the methods.

5.1 How many triplets?

For a set of stimuli, there exist many triplet questions; already for moderate this is by far too many to ask a participant of an experiment. However, the good news is that for the ordinal embedding methods a small subset of triplets already contains enough information to accurately reconstruct the true embedding. It has been proven that if the required embedding dimension is , then of the order triplets are sufficient to reconstruct the true embedding of items (stimulus levels) up to a small error (Jain et al., 2016). According to this result, we suggest starting with a subset of size or triplets and performing the ordinal embedding. If the time budget allows, one can still increase the number of triplets and see whether the error improves significantly, but should be a good baseline.
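To get a feeling for the numbers involved, the following sketch compares the total number of triplet questions with a budget of the order $d \, n \log n$, which is our reading of the bound in Jain et al. (2016). The natural logarithm and the constant factor `c` are assumptions, since the asymptotic statement fixes neither.

```python
import math

def n_possible_triplets(n):
    """Three questions for every set of three distinct stimuli."""
    return 3 * math.comb(n, 3)

def triplet_budget(n, d, c=1.0):
    """Guideline c * d * n * log(n); `c` is a hypothetical safety factor."""
    return math.ceil(c * d * n * math.log(n))

# Usage for a setting like the Eidolon experiment: n = 100 stimuli, d = 3.
total = n_possible_triplets(100)       # 485100
budget = triplet_budget(100, 3)        # 1382
doubled = triplet_budget(100, 3, c=2)  # 2764
```

Even with a generous constant, the budget is a small fraction of all possible questions, which is what makes the comparison-based approach practical for 100 stimuli.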

5.2 How to make the subset of triplets?

Consider a set of stimuli. As a first step, one needs to consider the whole set of possible triplets. As we mentioned earlier in the simulations, every combination of three items from the stimulus set gives rise to three questions. Therefore, the complete set of possible triplet questions contains $3\binom{n}{3}$ triplets, where $n$ is the number of stimuli. The set of all possible triplets might be very large, so a small subset of triplets needs to be sub-sampled. A natural question is: Which of the triplet questions among the whole set of possible questions should be chosen? Over the course of many years we have tried many subsampling strategies in our group (Luxburg-lab): based on landmarks, based on active learning, based on estimated confidence values, based on the difficulty of triplet questions, etc. However, in all our experiments the simple strategy of selecting triplets uniformly at random from the set of all possible triplets outperformed all other strategies in terms of triplet error. Hence our general rule of thumb is to apply the straightforward random sub-sampling method.

5.3 How to evaluate the quality of embedding?

We reported the MSE in our simulations; however, the true perceptual scale is not available in a real experiment. The general approach that we suggest for the evaluation of ordinal embedding is through the cross-validated triplet error (see Equation 5). Indeed, we suggest that this may be a good idea for MLDS and NMDS, too. The chosen subset of triplets needs to be partitioned into training and validation sets. The embedding method finds a Euclidean embedding for the perceptual scales, given the training set of triplets as input. We then calculate the cross-validated triplet error on the validation set. This procedure is preferable to the triplet error that is evaluated on the very same set that is used to construct the embedding; the latter can be highly biased and typically underestimates the true triplet error (overfitting).

5.4 How to choose the embedding dimension?

We suggest running the embedding algorithms in various dimensions, say from 1 to 10, and finally choosing the smallest dimension that shows an acceptable cross-validated triplet error. The formal problem is that increasing the dimension can always reduce the triplet error; in the extreme case, it is always possible to embed items in a space of dimensions without any error. In some cases, it might also be possible to estimate the dimension of the data based on particular distance comparisons (Kleindessner and von Luxburg, 2015).
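One way to make "smallest dimension with an acceptable error" concrete is sketched below. The tolerance rule is an assumption introduced here, not a recommendation from the simulations, and the error values in the usage example are made up.

```python
def smallest_adequate_dimension(cv_errors, tolerance=0.01):
    """Smallest dimension whose cross-validated error is within `tolerance` of the best.

    `cv_errors` maps embedding dimension -> cross-validated triplet error.
    """
    best = min(cv_errors.values())
    return min(d for d, err in cv_errors.items() if err <= best + tolerance)

# Usage with hypothetical cross-validated errors for dimensions 1 to 4.
cv_errors = {1: 0.30, 2: 0.205, 3: 0.20, 4: 0.21}
chosen = smallest_adequate_dimension(cv_errors)
```

Because the selection uses the cross-validated error rather than the error on the training triplets, adding dimensions is not rewarded automatically.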

5.5 Which algorithm, which implementation?

Considering the results of the various algorithms on the many tasks, and our experience in running ordinal embedding algorithms for many years, we consider t-STE as our method of choice. The original implementation by its authors is available in MATLAB. We will also provide a general toolbox containing t-STE and MLDS in R upon the acceptance of the paper.

6 Discussion

In this paper, we introduced the ordinal embedding methods as a powerful approach to analyze the triplet comparisons gathered from the method of triads. As opposed to common belief, the ordinal embedding methods require a surprisingly small () subset of triplet comparisons to achieve acceptable results. This property makes them preferable to traditional NMDS, which needs the rank order of all pairwise distances. On the other hand, ordinal embedding methods are capable of embedding in multi-dimensional Euclidean spaces without restrictions on the scaling function. In these situations, they have an advantage over MLDS, whereas even in one-dimensional, monotonic scaling scenarios they are not much worse than MLDS. Hence ordinal embedding methods such as t-STE are promising candidates for a “default” psychophysical scaling algorithm.

6.1 Open issues

As is almost always the case, there are a few open issues regarding the use of ordinal embedding methods that we think need to be mentioned and/or addressed in the future.

Confidence intervals: There have been considerable efforts to propose algorithms for the ordinal embedding problem. However, there exists no study that provides confidence intervals for the estimated embeddings. Although this issue is not taken very seriously in machine learning, for psychophysics it is of high importance. Some first steps in this direction have been taken in Lohaus et al. (2019), but there is definitely much room for improvement.

Interpreting the embedding: A challenging yet important step is to interpret the embedding results. To make the point clear, consider the Eidolon experiment discussed in the previous section. After gathering a two-dimensional perceptual space and a mapping of stimuli in this space, there are a couple of natural questions arising. What does each perceptual dimension mean? How are the perceptual dimensions related to the parameters of the stimulus (in this case reach, coherence and grain)? These are essential questions which can lead to better understanding of human perception.

Conjoint measurement: In addition to the general scaling problem, we believe that ordinal embedding is a promising candidate to tackle conjoint measurement problems. In a conjoint measurement experiment the sensory stimulus consists of more than one modality. Again we could ask participants to compare triplets of items, and subsequently apply the ordinal embedding. The approach of using triplet comparisons and ordinal embedding would impose far fewer restrictions than many of the approaches in conjoint measurement, which often rely on independence or additivity assumptions on the modalities.

7 Acknowledgements

The authors would like to thank Robert Geirhos and Patricia Rubisch for the programming and running of the Eidolon experiment, Uli Wannek for the help with experimental setup, Guillermo Aguilar for providing the slant-from-texture dataset and fruitful discussions, and Silke Gramer for administrative support. This work has been supported by the German Research Foundation DFG via SFB 936/ Z3, the Institutional Strategy of the University of Tübingen (Deutsche Forschungsgemeinschaft, DFG, ZUK 63), and the DFG Cluster of Excellence “Machine Learning – New Perspectives for Science”, EXC 2064/1, project number 390727645. We also acknowledge support of the International Max Planck Research School for Intelligent Systems (IMPRS-IS).


  • Agarwal et al. (2007) Agarwal, S., Wills, J., Cayton, L., Lanckriet, G., Kriegman, D., and Belongie, S. Generalized non-metric multidimensional scaling. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 11–18, 2007.
  • Aguilar et al. (2017) Aguilar, G., Wichmann, F. A., and Maertens, M. Comparing sensitivity estimates from MLDS and forced-choice methods in a slant-from-texture experiment. Journal of Vision, 17(1):37, 1–18, 2017.
  • Ailon (2011) Ailon, N. Active learning ranking from pairwise preferences with almost optimal query complexity. In Advances in Neural Information Processing Systems (NIPS), pages 810–818, 2011.
  • Amid and Ukkonen (2015) Amid, E. and Ukkonen, A. Multiview triplet embedding: Learning attributes in multiple maps. In International Conference on Machine Learning (ICML), pages 1472–1480, 2015.
  • Arias-Castro (2015) Arias-Castro, E. Some theory for ordinal embedding. Preprint available at Arxiv, abs/1501.02861, 2015.
  • Balcan et al. (2016) Balcan, M., Vitercik, E., and White, C. Learning combinatorial functions from pairwise comparisons. In Conference on Learning Theory (COLT), 2016.
  • Barsalou (2014) Barsalou, L. W. Cognitive psychology: An overview for cognitive scientists. Psychology Press, 2014.
  • Brainard (1997) Brainard, D. H. The psychophysics toolbox. Spatial Vision, 10:433–436, 1997.
  • Coombs et al. (1970) Coombs, C. H., Dawes, R. M., and Tversky, A. Mathematical Psychology. Prentice-Hall, New Jersey, 1970.
  • de Beeck et al. (2001) de Beeck, H. O., Wagemans, J., and Vogels, R. Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4(12):1244, 2001.
  • Demiralp et al. (2014) Demiralp, Ç., Bernstein, M. S., and Heer, J. Learning perceptual kernels for visualization design. IEEE transactions on visualization and computer graphics, 20(12):1933–1942, 2014.
  • Ekman (1954) Ekman, G. Dimensions of color vision. The Journal of Psychology, 38(2):467–474, 1954.
  • Fechner (1860) Fechner, G. T. Elemente der Psychophysik. Breitkopf und Härtel, Leipzig, 1860.
  • Gescheider (1988) Gescheider, G. A. Psychophysical scaling. Annual review of psychology, 39(1):169–200, 1988.
  • Haghiri et al. (2017) Haghiri, S., Ghoshdastidar, D., and von Luxburg, U. Comparison-based nearest neighbor search. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 851–859, 2017.
  • Houtsma (1995) Houtsma, A. J. M. Pitch perception. Hearing, 6:262, 1995.
  • Jain et al. (2016) Jain, L., Jamieson, K. G., and Nowak, R. Finite sample prediction and recovery bounds for ordinal embedding. In Advances in Neural Information Processing Systems (NIPS), pages 2711–2719, 2016.
  • Jamieson and Nowak (2011) Jamieson, K. G. and Nowak, R. D. Low-dimensional embedding using adaptively selected ordinal data. In Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1077–1084, 2011.
  • Kaneshiro et al. (2015) Kaneshiro, B., Perreau Guimaraes, M., Kim, H.-S., Norcia, A. M., and Suppes, P. A representational similarity analysis of the dynamics of object processing using single-trial EEG classification. PLoS One, 10(8):e0135697, 2015.
  • Kayaert et al. (2005) Kayaert, G., Biederman, I., Op de Beeck, H. P., and Vogels, R. Tuning for shape dimensions in macaque inferior temporal cortex. European Journal of Neuroscience, 22(1):212–224, 2005.
  • Kleindessner and von Luxburg (2014) Kleindessner, M. and von Luxburg, U. Uniqueness of ordinal embedding. In Conference on Learning Theory (COLT), pages 40–67, 2014.
  • Kleindessner and von Luxburg (2015) Kleindessner, M. and von Luxburg, U. Dimensionality estimation without distances. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 471–479, 2015.
  • Kleiner et al. (2007) Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., and Broussard, C. What’s new in Psychtoolbox-3. Perception, 36(14):1, 2007.
  • Knoblauch et al. (1998) Knoblauch, K., Charrier, C., Cherifi, H., Yang, J., and Maloney, L. Difference scaling of image quality in compression-degraded images. Perception ECVP abstract, 27, 1998.
  • Knoblauch and Maloney (2010) Knoblauch, K. and Maloney, L. T. MLDS: Maximum likelihood difference scaling in R. Journal of Statistical Software, 25:1–26, 2010.
  • Koenderink et al. (2017) Koenderink, J., Valsecchi, M., van Doorn, A., Wagemans, J., and Gegenfurtner, K. Eidolons: Novel stimuli for vision research. Journal of Vision, 17(2):7, 2017.
  • Krantz (1972) Krantz, D. H. Visual scaling. In Visual psychophysics, pages 660–689. Springer, 1972.
  • Krantz et al. (2007) Krantz, D. H., Luce, R. D., Suppes, P., and Tversky, A. Foundations of measurement: Geometrical, threshold, and probabilistic representations, volume 1. Courier Corporation, 2007.
  • Kruskal (1964a) Kruskal, J. B. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29(2):115–129, 1964a.
  • Kruskal (1964b) Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964b.
  • Li et al. (2016) Li, L., Malave, V. L., Song, A., and Yu, A. Extracting human face similarity judgments: Pairs or triplets? In CogSci, 2016.
  • Liberti et al. (2014) Liberti, L., Lavor, C., Maculan, N., and Mucherino, A. Euclidean distance geometry and applications. SIAM Review, 56(1):3–69, 2014.
  • Lohaus et al. (2019) Lohaus, M., Hennig, P., and von Luxburg, U. Uncertainty estimates for ordinal embeddings. Preprint available at Arxiv, abs/1906.11655, 2019.
  • Luce and Edwards (1958) Luce, R. D. and Edwards, W. The derivation of subjective scales from just noticeable differences. Psychological review, 65(4):222, 1958.
  • Machado et al. (2015) Machado, J., Mata, M., and Lopes, A. Fractional state space analysis of economic systems. Entropy, 17(8):5402–5421, 2015.
  • Maloney and Yang (2003) Maloney, L. T. and Yang, J. N. Maximum likelihood difference scaling. Journal of Vision, 3(8):5, 2003.
  • Marks and Gescheider (2002) Marks, L. E. and Gescheider, G. A. Psychophysical scaling. In Stevens’ Handbook of Experimental Psychology, volume IV, Methodology in Experimental Psychology, chapter 3, pages 91–138. John Wiley and Sons, 2002.
  • Norris and Oliver (1898) Norris, W. F. and Oliver, C. A. System of Diseases of the Eye, volume 3. JB Lippincott, 1898.
  • Reed (1972) Reed, S. K. Pattern recognition and categorization. Cognitive psychology, 3(3):382–407, 1972.
  • Rosas et al. (2004) Rosas, P., Wichmann, F. A., and Wagemans, J. Some observations on the effects of slant and texture type on slant-from-texture. Vision Research, 44(13):1511–1535, 2004.
  • Rosas et al. (2005) Rosas, P., Ernst, M. O., Wagemans, J., and Wichmann, F. A. Texture and haptic cues in slant discrimination: Reliability-based cue weighting without statistically optimal cue combination. Journal of the Optical Society of America A, 22(5):801–809, 2005.
  • Rosas et al. (2007) Rosas, P., Wichmann, F. A., and Wagemans, J. Texture and object motion in slant discrimination: Failure of reliability-based weighting of cues may be evidence for strong fusion. Journal of Vision, 7(6:3):1–21, 2007.
  • Schultz and Joachims (2003) Schultz, M. and Joachims, T. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems (NIPS), pages 41–48, 2003.
  • Shepard (1962) Shepard, R. N. The analysis of proximities: multidimensional scaling with an unknown distance function. i. Psychometrika, 27(2):125–140, 1962.
  • Shepard (1981) Shepard, R. N. Psychological relations and psychophysical scales: On the status of “direct” psychophysical measurement. Journal of Mathematical Psychology, 24(1):21–57, 1981.
  • Shepard (1982) Shepard, R. N. Geometrical approximations to the structure of musical pitch. Psychological review, 89(4):305, 1982.
  • Smith and Ellsworth (1985) Smith, C. A. and Ellsworth, P. C. Patterns of cognitive appraisal in emotion. Journal of personality and social psychology, 48(4):813, 1985.
  • Stevens (1957) Stevens, S. S. On the psychophysical law. Psychological review, 64(3):153, 1957.
  • Stevens (1961) Stevens, S. S. To honor Fechner and repeal his law. Science, 133(3446):80–86, 1961.
  • Tamuz et al. (2011) Tamuz, O., Liu, C., Belongie, S., Shamir, O., and Kalai, A. Adaptively learning the crowd kernel. In International Conference on Machine Learning (ICML), pages 673–680, 2011.
  • Terada and von Luxburg (2014) Terada, Y. and von Luxburg, U. Local ordinal embedding. In International Conference on Machine Learning (ICML), pages 847–855, 2014.
  • Thurstone (1927) Thurstone, L. L. A law of comparative judgment. Psychological review, 34(4):273, 1927.
  • Torgerson (1958) Torgerson, W. S. Theory and Methods of Scaling. John Wiley and Sons, New York, 1958.
  • Ukkonen et al. (2015) Ukkonen, A., Derakhshan, B., and Heikinheimo, H. Crowdsourced nonparametric density estimation using relative distances. In Conference on Human Computation and Crowdsourcing (HCOMP), 2015.
  • Van Der Maaten and Weinberger (2012) Van Der Maaten, L. and Weinberger, K. Stochastic triplet embedding. In International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6, 2012.
  • Wichmann and Jäkel (2018) Wichmann, F. A. and Jäkel, F. Methods in psychophysics. In The Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience., volume V. Methodology. Wiley, 4th edition, 2018.
  • Wichmann et al. (2017) Wichmann, F. A., Janssen, D. H. J., Geirhos, R., Aguilar, G., Schütt, H. H., Maertens, M., and Bethge, M. Methods and measurements to compare men against machines. Electronic Imaging, Human Vision and Electronic Imaging 2017, pages 36–45, 2017.

8 Supplementary Material

8.1 Extended simulation results

Monotonic scaling functions: Here we present extended results of the simulations with the monotonic scaling functions. Figure 10 shows the extended results of the embeddings for a sigmoid function. Besides the MSE and triplet error, we also show the standard deviation of these two criteria. To keep the plots uncluttered, we report the standard deviations in separate plots instead of using error bars. We again see that MLDS works slightly better than the other methods, as the scaling function meets the assumptions of MLDS. We also report the results for the same experiment with a different scaling function, see Figure 11. The function is a conditional degree 3 polynomial. Again, MLDS outperforms all other embedding methods.

Non-monotonic scaling functions: As for the monotonic functions, we compare the performance of the embedding methods on two non-monotonic functions. Figure 13 shows the additional results for a degree-two polynomial scaling function. The MSE and triplet error both show poor performance of MLDS in comparison with the embedding methods. The second scaling function is a sinusoid. The results of the comparison are shown in Figure 12. The embedding methods work clearly better than MLDS; however, the difference is not as large as for the previous function. The reason might be that the sinusoid is closer to a monotonic function than the second-degree polynomial.
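To make the simulation setup concrete, the following Python sketch generates noisy triplet judgments from a one-dimensional scaling function. It is an illustration under stated assumptions, not the simulation code used for the figures: the noise model (Gaussian noise with standard deviation `noise_sd` added to the difference of perceptual distances), the specific sigmoid and sinusoid parameters, and the names `simulate_triplets`, `sigmoid_scale` and `sinusoid_scale` are all hypothetical choices.

```python
import math
import random


def simulate_triplets(stimuli, scale, n_triplets, noise_sd, seed=0):
    """Draw random triplets answered by a simulated noisy observer.

    A returned triplet (i, j, k) means the observer judged item i to be
    more similar to item j than to item k (assumed convention).
    """
    rng = random.Random(seed)
    y = [scale(s) for s in stimuli]  # perceptual values of the stimuli
    triplets = []
    for _ in range(n_triplets):
        i, j, k = rng.sample(range(len(y)), 3)
        # Assumed decision variable: noisy difference of perceptual distances.
        diff = abs(y[i] - y[j]) - abs(y[i] - y[k]) + rng.gauss(0.0, noise_sd)
        triplets.append((i, j, k) if diff < 0 else (i, k, j))
    return triplets


def sigmoid_scale(x):
    """A monotonic scaling function (illustrative parameters)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


def sinusoid_scale(x):
    """A non-monotonic scaling function (illustrative parameters)."""
    return math.sin(2.0 * math.pi * x)


stimuli = [t / 19.0 for t in range(20)]  # 20 equally spaced stimulus levels
monotonic_triplets = simulate_triplets(stimuli, sigmoid_scale, 200, noise_sd=0.1)
nonmonotonic_triplets = simulate_triplets(stimuli, sinusoid_scale, 200, noise_sd=0.1)
print(len(monotonic_triplets), len(nonmonotonic_triplets))
```

Varying `noise_sd` and `n_triplets` corresponds to the two experimental factors named in the figure captions, the noise standard deviation and the fraction of triplets used.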

Figure 10: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional methods in psychophysics (MLDS and NMDS) for a monotonic one-dimensional perceptual function. The true perceptual function is shown in the top left corner. Ten embedding results for a fixed value of the noise standard deviation and of the triplet fraction are shown in the top right corner. The second and third rows depict the average MSE and the standard deviation of the MSE over 10 runs of the algorithms. The fourth and fifth rows show the average triplet error and the standard deviation of the triplet error over 10 runs of the algorithms.
Figure 11: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional methods in psychophysics (MLDS and NMDS) for a monotonic one-dimensional perceptual function. The true perceptual function is shown in the top left corner. Ten embedding results for a fixed value of the noise standard deviation and of the triplet fraction are shown in the top right corner. The second and third rows depict the average MSE and the standard deviation of the MSE over 10 runs of the algorithms. The fourth and fifth rows show the average triplet error and the standard deviation of the triplet error over 10 runs of the algorithms.
Figure 12: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional methods in psychophysics (MLDS and NMDS) for a non-monotonic one-dimensional perceptual function. The true perceptual function is shown in the top left corner. Ten embedding results for a fixed value of the noise standard deviation and of the triplet fraction are shown in the top right corner. The second and third rows depict the average MSE and the standard deviation of the MSE over 10 runs of the algorithms. The fourth and fifth rows show the average triplet error and the standard deviation of the triplet error over 10 runs of the algorithms.
Figure 13: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional methods in psychophysics (MLDS and NMDS) for a non-monotonic one-dimensional perceptual function. The true perceptual function is shown in the top left corner. Ten embedding results for a fixed value of the noise standard deviation and of the triplet fraction are shown in the top right corner. The second and third rows depict the average MSE and the standard deviation of the MSE over 10 runs of the algorithms. The fourth and fifth rows show the average triplet error and the standard deviation of the triplet error over 10 runs of the algorithms.
Figure 14: Comparison of various ordinal embedding methods (LOE, STE, t-STE) against the traditional methods in psychophysics (MLDS and NMDS) for a two-dimensional perceptual function. The true perceptual function is shown in the top left corner. One of the embedding results for a fixed value of the noise standard deviation and of the triplet fraction is shown in the top right corner. The second row shows the average triplet error over 10 repetitions of the algorithms.