For human motor performance modeling, researchers have sought to develop new models and modify existing ones to improve their prediction accuracy (i.e., model fitness). The model we focus on here, the Finger-Fitts law (a.k.a. FFitts law) proposed by Bi et al. (Bi et al., 2013), is a modified version of Fitts’ law (Fitts, 1954) for predicting operational times in target pointing on touchscreens. FFitts law is based on the effective width method (Crossman, 1956), which adjusts the target size (or width, $W$) from the nominal value drawn on the screen to an effective width that takes the actual touch-point distributions into account. Bi et al. modified this effective width method to deal with finger touch ambiguity and empirically showed that FFitts law is superior to Fitts’ law in terms of model fitness (Bi et al., 2013). As touchscreen devices have become common in our daily lives, deriving a model with high prediction accuracy will contribute directly to human-computer interaction (HCI), e.g., when designers create user interfaces for webpages and apps.
As another type of contribution in performance modeling, standardizing a model’s methodology is important for future researchers in terms of replicability (MacKenzie, 1992; Soukoreff and MacKenzie, 2004; Wobbrock et al., 2011). Unfortunately, while several research groups have examined FFitts law (Bi et al., 2013; Ko et al., 2020; van Noort, 2015; Yamanaka, 2018b, a; Woodward et al., 2020), there is no consensus on a standard methodology, which is an obstacle to future research on finger-touch pointing. The methodological inconsistencies include the computation method for touch ambiguity (running an independent finger calibration task, computing the intercept for a linear regression of the target sizes and touch-point distributions, or performing parameter optimization from the regression expression of Fitts’ law); the instruction for the finger calibration task (balancing speed and accuracy or concentrating on accuracy); and the target size for the calibration task (a 1-pixel target or the smallest size used in the main Fitts’ law task). There are two other issues of inconsistency: in contrast to Bi et al.’s finding (Bi et al., 2013), the model fitness of FFitts law was also found to be inferior to that of Fitts’ law (Woodward et al., 2020); and FFitts law sometimes cannot be used because of a mathematical error that results when the value inside a square root is negative (Yamanaka, 2018a, b).
Leaving these inconsistent methodologies and issues unsolved could be harmful to HCI studies such as evaluating novel pointing techniques and comparing different user groups. This point was previously mentioned by Soukoreff and MacKenzie in regard to Fitts’ law (Soukoreff and MacKenzie, 2004), and rethinking the finger-touch model (FFitts law) is a timely notion given the recent trend of widespread smartphone and tablet use. In this paper, taking a step toward a standard for measuring touch-pointing performance, we explain the concept of FFitts law, survey the inconsistencies of its methodologies in the literature, and discuss the related problems. Then, we empirically examine how the inconsistent methodologies change the results of FFitts law for both 1D and 2D target-pointing tasks, through eight sub-tasks in total. Our contributions are twofold.
Surveying related work on FFitts law to explore inconsistencies in its methodologies (Section 3), and reanalyzing previous FFitts law studies with modern methods (Section 7). These sections promote a better understanding of the principle of FFitts law, the reasons why previous researchers have used different procedures for a single model, and the relative advantages and disadvantages of each approach. For example, parameter optimization can always be applied (i.e., it avoids the error due to a negative value inside the square root), but it induces the risk of overfitting the data.
Conducting eight sub-tasks in total, including two main Fitts’ law tasks with 1D and 2D targets. The results show that (1) using the baseline Fitts’ law model is the best and safest choice for predicting the movement time ($MT$) under a single task condition; and (2) for comparing different input devices or user groups, using parameter optimization is the best choice for 2D circular targets, while using the conventional effective width method is the best choice for 1D targets.
In conclusion, by integrating our results and reanalyses of previous studies’ data, we found that there is no single best methodology for FFitts law. We stress the importance of conducting more user experiments under different task conditions, such as different input devices or user groups, to achieve a better understanding of touch-pointing behaviors and to derive more accurate models.
2. Related Work
2.1. Fitts’ Law and the Effective Width Method
According to Fitts’ law, the movement time $MT$ to point to a target is linearly related to the index of difficulty $ID$ (Fitts, 1954):

$$MT = a + b \cdot ID, \quad (1)$$

where $a$ and $b$ are empirical constants. In the HCI field, the Shannon formulation is widely used for the $ID$ value (MacKenzie, 1992):

$$ID = \log_2\left(\frac{A}{W} + 1\right), \quad (2)$$

where $A$ is the distance to the target and $W$ is its width. Here, $A$ and $W$ are nominal values shown on the display.
In typical pointing experiments, participants are instructed to “point to a target as rapidly and accurately as possible,” which emphasizes balancing the speed and accuracy (Soukoreff and MacKenzie, 2004). However, it is common that some participants tend to show short $MT$ values and high error rates, while others show long $MT$ values and low error rates (Zhai et al., 2004). To normalize such biases when comparing those participants’ performance, or to compare performance with different input devices (e.g., finger vs. stylus), using Crossman’s post-hoc correction for calculating the effective target width $W_e$ (Crossman, 1956) is recommended (MacKenzie, 1992; Soukoreff and MacKenzie, 2004; Wobbrock et al., 2011):

$$W_e = 4.133 \cdot \sigma, \quad (3)$$

where $\sigma$ is the standard deviation of the observed endpoints. The basis of this adjustment is that the spread of hits follows a normal distribution. By using this method, the $W$ is adjusted so that 96% of hits fall inside the target. The effective $ID$ using the $W_e$ is defined as follows:

$$ID_e = \log_2\left(\frac{A}{W_e} + 1\right). \quad (4)$$
While the theoretical justification of the effective width method has been questioned recently (Gori et al., 2018), several advantages have been shown empirically (e.g., (MacKenzie and Isokoski, 2008; Wright and Lee, 2013; Zhai et al., 2004)). FFitts law is also based on this effective width method.
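To make the computation concrete, the effective width correction can be sketched in a few lines of Python. The function names and sample values below are ours, for illustration only:

```python
import math

def effective_width(sigma_mm):
    # Crossman's correction: W_e = 4.133 * sigma covers ~96% of a
    # normally distributed spread of hits
    return 4.133 * sigma_mm

def effective_id(distance_mm, sigma_mm):
    # Shannon formulation with W_e substituted for the nominal W
    return math.log2(distance_mm / effective_width(sigma_mm) + 1)

# Example: A = 20 mm, endpoint SD = 1.0 mm
id_e = effective_id(20.0, 1.0)
```

Regressing the observed $MT$ values against such $ID_e$ values then yields the empirical constants $a$ and $b$.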
2.2. Overview of Finger-Fitts Law
Bi, Li, and Zhai hypothesized that the observed spread of hits ($\sigma$) includes relative and absolute components; the former follows the speed-accuracy tradeoff rule, while the latter solely depends on the finger touch precision (Bi et al., 2013). The tapped point is considered a random variable $X$ following a normal distribution ($X \sim N(\mu, \sigma^2)$). Then, $X$ is the sum of two independent random variables for the relative and absolute components, both of which follow normal distributions: $X_r \sim N(\mu_r, \sigma_r^2)$ and $X_a \sim N(\mu_a, \sigma_a^2)$, respectively. Bi et al. named this “the dual Gaussian distribution hypothesis.” Although the relative spread of hits, $\sigma_r$, decreases as the movement speed and target width decrease, the absolute finger precision $\sigma_a$ cannot be controlled via a user’s speed-accuracy priority. The means of both components ($\mu_r$ and $\mu_a$) are assumed to be close to the target center: $\mu_r \approx \mu_a \approx 0$.
Here, $\sigma_r$ is what the effective width method models. Thus, from Equation 3, we have

$$W_e = 4.133 \cdot \sigma_r. \quad (5)$$
Then, because Bi et al. assumed that $X$ is the sum of the independent random variables $X_r$ and $X_a$, we obtain

$$\sigma^2 = \sigma_r^2 + \sigma_a^2, \quad (6)$$

and thus

$$W_e = 4.133 \cdot \sqrt{\sigma^2 - \sigma_a^2}. \quad (7)$$
By applying Equation 7 to $W_e$ in the Fitts’ law expression (Shannon formulation), we obtain the $ID$ for finger touching:

$$ID_f = \log_2\left(\frac{A}{4.133\sqrt{\sigma^2 - \sigma_a^2}} + 1\right). \quad (8)$$
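A minimal sketch of this computation (our own illustrative code, not Bi et al.’s implementation) makes the square-root constraint explicit:

```python
import math

SCALE = 4.133  # maps a standard deviation to a ~96% width

def ffitts_id(distance_mm, sigma_mm, sigma_a_mm):
    # ID under the dual Gaussian hypothesis; the guard below corresponds
    # to the mathematical error reported in the literature when the
    # observed spread does not exceed the absolute finger precision
    rel_variance = sigma_mm ** 2 - sigma_a_mm ** 2
    if rel_variance <= 0:
        raise ValueError("sigma must exceed sigma_a; FFitts law is undefined")
    w_e = SCALE * math.sqrt(rel_variance)
    return math.log2(distance_mm / w_e + 1)
```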
2.3. Measurement of the Touch Ambiguity Factor
2.3.1. Finger Calibration Task with “Rapid and Accurate” Instruction
Bi et al. also obtained $\sigma_a$ via 1D and 2D finger calibration tasks conducted independently of the Fitts’ law task (Bi et al., 2013). In the 1D task, participants repeatedly tapped as closely to a 2.4-mm-high horizontal bar target as possible, and the standard deviation of the signed biases from the target was computed as $\sigma_a$. For the 2D condition, a 2.4-mm-diameter circle was used as the target, and the bivariate standard deviation was taken as $\sigma_a$. In both tasks, the participants were instructed to tap the target as rapidly and accurately as possible. After the screen was tapped, the next target appeared after a 1-sec break. Because this task does not require a movement to a target from a specific position, Bi et al. stated that the speed-accuracy tradeoff rule has a negligible effect on $\sigma_a$.
Woodward et al. conducted FFitts law tasks with children and circular targets (Woodward et al., 2020). Overall, they followed the procedure of Bi et al. For the calibration task, they used a target with $W$ = 4.8 mm, which was also the smallest size in the main Fitts’ law task. Their paper does not explicitly state whether the participants were instructed to balance speed and accuracy or to concentrate on accuracy.
2.3.2. Finger Calibration Task with “Concentrate on Accuracy” Instruction
Luo and Vogel tested the applicability of FFitts law to touch-based goal-crossing tasks (Luo and Vogel, 2014). They drew a 2-pixel line for the finger calibration task and instructed the participants “not to rush and focus on accuracy,” because “measuring $\sigma_a$ is not about speed.” Hence, in contrast to Bi et al.’s instruction, Luo and Vogel removed the instruction of “operating as rapidly as possible.” They reported somewhat negative results: the data fit ($R^2$) for the discrete crossing condition was lower for FFitts law than for conventional Fitts’ law. After removing the data point with the highest $ID$, the FFitts law fitness improved, but this was likely due to an arbitrary choice of data-point removal to increase $R^2$.
Yamanaka tested Fitts’ and FFitts laws for touch-pointing tasks with unwanted target items (called distractors) (Yamanaka, 2018a, b). In the finger calibration tasks, 1-pixel targets were used (a bar for 1D and a crosshair for 2D). As in Luo and Vogel’s study, Yamanaka instructed the participants to “tap as close to the target as possible” and emphasized that the “participants were instructed to concentrate on spatial precision and not on time.” He reported that FFitts law could not be used, because in some task conditions, in particular for the smallest target, the $\sigma$ values were smaller than $\sigma_a$, in both the 1D and 2D tasks, resulting in a negative value inside the square root in FFitts law (Equation 7). This was partially because the participants in Yamanaka’s studies had to pay attention to avoiding the distractors, in addition to accurately aiming for the target, but this mathematical error occurred even in no-distractor conditions.
2.3.3. Intercept of Regression Between the Squares of $\sigma$ and $W$
In Bi and Zhai’s 2D touch-pointing task, at the beginning of each trial, a circular target appeared on the screen, and the participants tapped it as rapidly and accurately as possible (Bi and Zhai, 2013). Bi and Zhai assumed that the endpoints when using a fine probe like a mouse cursor are proportionally related to $W$:

$$\sigma_r^2 = \alpha \cdot W^2, \quad (9)$$

thus giving

$$\sigma^2 = \alpha \cdot W^2 + \sigma_a^2. \quad (10)$$
Figure 1 shows this relationship. They used five circular target diameters ($W$ = 2, 4, 6, 8, and 10 mm) and ran a linear regression of the $\sigma^2$ values (y-axis) against the corresponding $W^2$ values (x-axis). From the intercept of this regression, $\sigma_a$ (in mm) was computed as the square root of $\sigma_a^2$.
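The intercept method reduces to an ordinary least-squares regression of $\sigma^2$ on $W^2$; a self-contained sketch (our own code, not Bi and Zhai’s implementation) is:

```python
import math

def sigma_a_from_intercept(widths_mm, sigmas_mm):
    # Fit sigma^2 = alpha * W^2 + intercept; sqrt(intercept) estimates sigma_a
    xs = [w * w for w in widths_mm]
    ys = [s * s for s in sigmas_mm]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    if intercept < 0:
        # A negative intercept can occur with noisy data; sigma_a is then
        # not estimable by this method
        raise ValueError("negative intercept")
    return math.sqrt(intercept)
```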
2.3.4. Parameter Optimization
Recently, Ko et al. proposed to obtain the finger tremor factor $c$ used in FFitts law by parameter optimization (Ko et al., 2020):¹

$$MT = a + b \cdot \log_2\left(\frac{A}{\sqrt{W^2 - c^2}} + 1\right), \quad (11)$$

¹Ko et al.’s model is for rectangular targets whose width and height are defined as $W$ and $H$, respectively. We invoke their method instead for circular targets whose size is solely defined by $W$.
where $c$ is a free parameter that represents finger tremors ($c \geq 0$). They empirically confirmed that using the nominal $W$ instead of $W_e$ gave a higher model fitness, which is consistent with previous studies on the effective width method (e.g., (Wright and Lee, 2013; Zhai et al., 2004)).
In fact, Equation 11 was proposed by Welford in 1968 (p. 156, l. 30 in (Welford, 1968)) with the “+0.5” version of Fitts’ law instead of “+1”. His aim was the same: $c$ represents hand tremors in stylus-tapping tasks. Also, he empirically confirmed that the following “no square root, no power” formulation was even better in terms of model fitness:

$$MT = a + b \cdot \log_2\left(\frac{A}{W - c} + 1\right). \quad (12)$$
Because $\sigma$ and $\sigma_a$ are standard deviations, it is more mathematically sound to subtract them after squaring and then take the square root; however, the dimensions of $W$ and $c$ are the same (both in mm), and Equation 12 is thus valid. This model’s superiority with respect to the baseline (Equation 2) for small targets was confirmed by Chapuis and Dragicevic (Chapuis and Dragicevic, 2011).
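Parameter optimization of $c$ can be done with any one-dimensional optimizer. Below is a dependency-free grid-search sketch over the “no square root” form (Equation 12); this is our own illustration, not Ko et al.’s implementation, and a library optimizer could replace the loop:

```python
import math

def r2(xs, ys):
    # Coefficient of determination for a simple linear regression of ys on xs
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def optimize_c(amplitudes, widths, times, step=0.001):
    # Search c in [0, min(W)) so that W - c stays positive, maximizing
    # the R^2 of MT vs. log2(A / (W - c) + 1)
    best_c, best_fit = 0.0, -1.0
    c = 0.0
    while c < min(widths):
        ids = [math.log2(a / (w - c) + 1) for a, w in zip(amplitudes, widths)]
        fit = r2(ids, times)
        if fit > best_fit:
            best_c, best_fit = c, fit
        c += step
    return best_c, best_fit
```

The “sqrt” variant only changes the $ID$ expression to $\log_2(A/\sqrt{W^2 - c^2} + 1)$.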
3. Discussion on Inconsistencies and Problems of FFitts Law
3.1. Target Size in Calibration Task
There are two kinds of approaches: using the smallest $W$ used in a Fitts’ law task (a 2.4-mm (Bi et al., 2013) or 4.8-mm target (Woodward et al., 2020)) or the minimum visible target (1 pixel (Yamanaka, 2018a, b) or 2 pixels (Luo and Vogel, 2014)). In pointing tasks with a fine probe, $\sigma$ is assumed to be proportional to $W$ when users can spend sufficient time. In this case, users can accurately point to a small target even if the width is quite narrow: e.g., $W$ = 1 pixel. In contrast, in touch-pointing tasks, there is an unavoidable lower bound $\sigma_a$ on the finger precision. Hence, even if users can spend a long time, there is a slight bias from the intended target position to the actual tapped position sensed by the system (Bi et al., 2013; Holz and Baudisch, 2011). The aim of a finger calibration task is to measure this lower bound of precision in the Fitts’ law paradigm. For this purpose, pointing to a 1-pixel target with the instruction to operate as rapidly and accurately as possible is a straightforward method.
There is, however, an issue related to using the smallest $W$ in the main Fitts’ law task: we may observe a mathematical error in the square root in Equation 7 ($\sigma < \sigma_a$). For example, Woodward et al. used a target with $W$ = 4.8 mm for the calibration, and $\sigma_a$ was 1.590148 mm (Woodward et al., 2020). The smallest $\sigma$ measured in the main Fitts’ law task was 1.591275 mm, for the $W$ = 4.8 mm condition; the difference was only 0.001127 mm. Because $\sigma$ and $\sigma_a$ are variability values (i.e., standard deviations), it is possible to observe a $\sigma_a$ greater than $\sigma$ by chance. According to Equation 10 ($\sigma^2 = \alpha \cdot W^2 + \sigma_a^2$), using a 1-pixel target should reduce this risk as compared with using $W$ = 4.8 mm.
For this reason, using a 1-pixel target is a more theoretically sound approach. This solution was noted by Bi, Li, and Zhai (“Alternatively, single pixel wide lines and cross hairs could be used in lieu of bars and circles.”) (Bi et al., 2013) (p. 1366), but a 1-pixel target is not an exact “alternate” for a 2.4-mm target according to Equation 10. More rigorously, $\sigma_a$ should not be affected by the speed-accuracy rule, so it must be the value for the $W$ = 0 (either mm or pixels) condition. Practically, however, the finest target must be visible to users; thus, $W$ = 1 pixel is a reasonable approximation of $W$ = 0.
3.2. Instruction in Calibration Task
There have been two instruction choices: balancing the speed and accuracy (Bi et al., 2013) or concentrating on accuracy (Luo and Vogel, 2014; Yamanaka, 2018a, b). We assume that both instructions are valid for measuring $\sigma_a$. Regarding the “rapid and accurate” instruction by Bi et al. in Fitts’ law tasks, as the $W$ becomes smaller, participants have to be more careful to avoid missing the target, which causes them to spend a longer time. Therefore, even if the participants were instructed to tap the target “as rapidly (and accurately) as possible,” the operational time for a 1-pixel (or smallest-$W$) target would be quite long, and the difference from the instruction to “concentrate on accuracy” becomes almost negligible. Still, the effect of this instruction difference on the FFitts law fitness has been neither discussed nor empirically compared. Hence, we empirically assess this difference in our data analyses.
3.3. Computation of $\sigma_a$: Calibration Task, Intercept of Regression, or Parameter Optimization
As discussed in Section 3.1, the use of a finger calibration task induces a risk of a mathematical error in the square-root calculation. Another method to obtain $\sigma_a$ is to use the intercept of the regression expression for $\sigma^2$ vs. $W^2$ (Equation 10). Bi and Zhai (Bi and Zhai, 2013) and Yamanaka and Usuba (Yamanaka and Usuba, 2020) conducted target-pointing tasks in which a new target appeared at a random position (i.e., $A$ was not controlled by the researchers), with several $W$ values; they then obtained regression expressions. Yamanaka and Usuba also ran regressions for Fitts’ law tasks in which four $A$ values were preset to use this method.
If we apply the $\sigma_a$ computed by this intercept method to FFitts law, it is possible to obtain a $\sigma_a$ greater than $\sigma$, which causes the mathematical error. Figure 2 illustrates this problem with hypothetical data observed in a Fitts’ law task. In this case, for the $\sigma_a$ values computed from both the random-$A$ and preset-$A$ conditions, several $\sigma$ values at the lowest $W$ condition in the main Fitts’ law task (Figure 2b) are smaller than the intercept.
To avoid this issue, a possible choice is to use large target width $W$ values for the main Fitts’ law task. For example, if we had not used the narrowest $W$ condition in Figure 2b, all the $\sigma$ values would be greater than $\sigma_a$. Using only wide $W$ values also lowers the risk of the mathematical error when using the $\sigma_a$ measured by a finger calibration task. Yet, this approach has a clear limitation: it prevents researchers from using a small target, and the threshold for the smallest target that avoids the error is unclear and would vary among participant groups. In addition, the effectiveness of FFitts law lies in small targets; when targets are large, FFitts law approximates the original effective width method without $\sigma_a$ (Bi et al., 2013).
The state-of-the-art method to obtain the finger tremor factor is parameter optimization (Ko et al., 2020). The method’s drawback is that it uses an additional free parameter $c$, which is adjusted to maximize $R^2$ for the regression of $MT$ vs. $ID$. Depending on the measured $MT$ values and the number of task conditions, introducing additional free parameters could lead to overfitting. In contrast, using a $\sigma_a$ value computed from a calibration task or the intercept method has no such problem, because $\sigma_a$ is then independent of the $MT$ values measured in a Fitts’ law task.
Regarding the model fitness in terms of $R^2$, using parameter optimization would theoretically give the best fit among the candidates. Also, it does not require an independent finger calibration task and is thus less time-consuming for researchers and participants. However, if other model-fit metrics that consider the model complexity show a worse result due to the free parameter $c$, then using $\sigma_a$ instead of $c$ is recommended. To assess this issue, we also compare the model fitness by using the Akaike Information Criterion and Bayesian Information Criterion in our data analyses.
4. Experiment

We conducted touch-pointing experiments with a smartphone, as shown in Figure 3a. The experiments were conducted on two separate days: Day 1 for 1D horizontal bar-shaped targets, and Day 2 for 2D circular targets. The procedures for the two days were the same. Under both the 1D and 2D conditions, we conducted four sub-tasks. The main one was a Fitts’ law task with preset $A \times W$ conditions, and the remaining three sub-tasks were used to compute $\sigma_a$ values: one was for the intercept-based method with five $W$ values and random $A$ values, and the other two were finger calibration tasks. The order of the four sub-tasks was balanced using a Latin square pattern among 12 participants for both days. Each participant took 40 to 50 min for the experiment on each day.
For both the 1D and 2D conditions, our data computed by the intercept method for the Fitts’ law and random-$A$ tasks were reported before (Yamanaka and Usuba, 2020). The data for the two finger calibration tasks are newly reported here. Because our novel contribution in this paper is the evaluation of the model fitness for $MT$, we repeat only the minimum necessary explanation of the experiments (e.g., the mean error rate) to make this paper self-contained, while taking care to avoid self-plagiarism. For example, we could have reported all the pairwise test results for the error rate, but those data would not relate to this paper’s main contribution. Thus, we mainly report the $MT$ and $\sigma$ results, and readers who are interested in the detailed error-rate results and error-rate prediction models are directed to (Yamanaka and Usuba, 2020).
4.1.1. Finger Calibration Task with “Rapid and Accurate” Instruction
The participants were instructed to tap as rapidly and accurately as possible on a 1-pixel horizontal bar target in the 1D condition or a 25-pixel-wide crosshair target in the 2D condition. For the 2D condition, we emphasized that the intersection of the crosshair was the point to aim for. A 1-sec break was enforced before the next target appeared so that the participants did not have to aim for the targets one after another extremely rapidly. Each participant repeated this procedure 50 times, comprising five practice trials followed by 45 data-collection trials. The signed biases of the tap point from the target were used to compute the standard deviation (i.e., $\sigma_a$) on the y-axis for the 1D case and the bivariate standard deviation on the x- and y-axes for the 2D case.
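For concreteness, the standard deviations can be computed as follows. The paper does not spell out its exact bivariate formula, so pooling the x- and y-variances here is our assumption:

```python
import math

def sample_sd(offsets_mm):
    # Sample SD of the signed biases from the target (1D sigma_a)
    n = len(offsets_mm)
    mean = sum(offsets_mm) / n
    return math.sqrt(sum((v - mean) ** 2 for v in offsets_mm) / (n - 1))

def bivariate_sd(x_offsets_mm, y_offsets_mm):
    # One plausible 2D spread value: pool the x- and y-axis variances
    # (an assumption; other definitions of a bivariate SD exist)
    return math.sqrt((sample_sd(x_offsets_mm) ** 2
                      + sample_sd(y_offsets_mm) ** 2) / 2)
```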
4.1.2. Finger Calibration Task with “Concentrate on Accuracy” Instruction
For this sub-task, only the instruction was different from the previously explained sub-task. That is, the participants were instructed to tap as closely as possible to the target without paying attention to the operational time.
4.1.3. Fitts’ Law Task
This was a discrete pointing task with preset $A$ and $W$ values. For the 1D task, a 6-mm-wide blue start bar was displayed at the top of the screen, and a green target bar was at the bottom, as shown in Figure 3b. The midway point between the start bar and the target was at the screen center. The movement direction was always downwards. When participants tapped the start bar, it disappeared and a click sound played. Then, if they successfully tapped the target, a pleasant bell sounded, and the next set of start and target bars appeared. If the tap point fell outside the target, they had to aim for the target again until they succeeded; the trial was not restarted from tapping the start bar. The participants were instructed to tap the target as rapidly and accurately as possible. For the 2D task, circles were used instead of horizontal bars, and the start and target circles’ positions were randomized while keeping a distance $A$ between them.
This sub-task used a within-subjects design with the following independent variables and levels. We included four target distances ($A$ = 20, 30, 45, and 60 mm) and five target widths ($W$ = 2, 4, 6, 8, and 10 mm). Each $A \times W$ combination entailed a single practice repetition followed by 16 data-collection repetitions. The order of the $A \times W$ conditions was randomized. Thus, we recorded 4 × 5 × 16 × 12 = 3840 data points in total. The dependent variables were the $MT$, the standard deviation of the endpoints ($\sigma$), and the error rate.
4.1.4. Pointing Task with Random Target Distance
For the 1D case, a 6-mm-high start bar was initially displayed at a random position. When the participants tapped it, the first target bar appeared at a random position, and then they successively tapped new targets. If a target was missed, a beep sounded, and the participants re-aimed for the target. A successful tap resulted in a bell sound. To reduce the negative effect of the screen edges, the random target position was at least 11 mm away from the top and bottom edges (Avrahami, 2015). For the 2D case, circular targets were used.
This sub-task used a single-factor, within-subjects design with $W$ as the independent variable: 2, 4, 6, 8, and 10 mm. The dependent variable was the observed touch-point distribution, $\sigma$. First, the participants performed 20 trials as practice, comprising 4 repetitions of the 5 $W$ values appearing in random order. In each session, each $W$ value appeared 10 times in a random order. The participants were instructed to successively tap the targets as rapidly and accurately as possible in a session. They each completed four sessions as data-collection trials. In total, we recorded 5 × 10 × 4 × 12 = 2400 trials.
On Day 1, 12 university students participated in this study (2 female, 10 male; ages 20 to 25 years). On Day 2, 12 university students again participated (3 female, 9 male; ages 19 to 25 years), including nine new participants. For both days, all the participants had normal or corrected-to-normal vision. All were right-handed and daily smartphone users. Each participant received JPY 5000 (approx. US$45) in compensation for each day. The participants were instructed to hold the smartphone in the non-dominant (left) hand and perform tapping operations with the dominant (right) index finger, as shown in Figure 3a. They were instructed to sit on an office chair and not to rest their hands or elbows on the table or their lap.
5. Results of 1D Experiment
As in previous studies, data points for which the distance between the tap point and the target center was greater than 15 mm were removed as outliers before we analyzed the $MT$, $\sigma$, $\sigma_a$, and error rate (Bi and Zhai, 2013; Yamanaka and Usuba, 2020).
5.1. Finger Calibration Task with “Rapid and Accurate” Instruction
Among the 540 trials (45 repetitions × 12 participants), we observed no outliers. Two participants’ data did not pass the normality test (Shapiro-Wilk test with alpha = 0.05). The standard deviation of the tap positions (i.e., $\sigma_a$) for each participant ranged from 0.5448 to 1.325 mm, and the mean was 0.8837 mm.
5.2. Finger Calibration Task with “Concentrate on Accuracy” Instruction
We again observed no outliers, while two participants’ data did not pass the normality test. The $\sigma_a$ values ranged from 0.4569 to 1.296 mm among the participants, and the mean was 0.7362 mm.
5.3. Fitts’ Law Task
Among the 3840 trials, four data points were removed as outliers (0.10%). According to the experimenter’s observation, the outliers resulted mainly from participants accidentally touching the screen with the thumb or little finger. Two or more taps were observed in 347 trials, and the mean error rate was thus 9.04% (= 347/3840). We found that 218 of the 240 ($A \times W \times$ participant) conditions passed the normality test, or 90.8%.
Throughout this paper, we use repeated-measures ANOVA with Bonferroni’s $p$-value adjustment method for pairwise comparisons. Although our results showed that the dependent variables did not pass the Shapiro-Wilk test (alpha = 0.05) in some cases, it is known that ANOVA is robust against violations of the normality assumption (Dixon, 2008; Mena et al., 2017); thus, we consistently use repeated-measures ANOVA. For the $F$ statistic, the degrees of freedom for the main effects of $A$ and $W$, as well as their interactions, were corrected using the Greenhouse-Geisser method when Mauchly’s sphericity assumption was violated (alpha = 0.05).
For the endpoint variability $\sigma$, we found significant main effects of $A$ and $W$, but no significant $A \times W$ interaction. For the $MT$, we found significant main effects of $A$ and $W$, and the $A \times W$ interaction was also significant.
Figure 4a shows the result of the $\sigma^2$ vs. $W^2$ regression. The regression line clearly passes above the four data points at the smallest $W$ value: the intercept ($\sigma_a^2$) was thus greater than some $\sigma^2$ values, causing the mathematical error in FFitts law.
5.4. Pointing Task with Random Target Distance
We removed 13 outlier trials (0.54%). The Shapiro-Wilk test showed that the touch points followed a normal distribution under 47 of the 60 ($W \times$ participant) conditions, or 78.3%. $W$ had a significant main effect on $\sigma$. Figure 4b shows the regression result. The $\sigma_a$ value was 1.0123 mm, and some $\sigma$ values in the Fitts’ law task (Figure 4a) were smaller than 1.0123 mm, which caused the mathematical error when we applied the $\sigma_a$ measured with this intercept method to the Fitts’ law data.
5.5. Model Fitting Results for 1D Task
Before analyzing the FFitts law fitness, we found that we could use only the parameter optimization method. As listed in Table 1, among the data points for fitting, every method using $\sigma_a$ produced one or more mathematical errors (due to a negative value inside the square root in Equation 7). This result shows the low robustness of FFitts law when using $\sigma_a$, regardless of whether the value is directly measured by a finger calibration task or calculated by the intercept method.
For model fitness comparison, we use both the absolute and adjusted $R^2$; the latter takes the number of coefficients into account. We also compare models through the Akaike Information Criterion ($AIC$) (Akaike, 1974). This statistical method balances the number of free parameters and the fitness to identify a comparatively best model. As a brief guideline, (a) a model with a lower $AIC$ value is a better one; (b) a model with a small $AIC$ difference from the best model ($\Delta AIC \leq 2$) is probably comparable with better models; and (c) a model with a large difference ($\Delta AIC \geq 10$) should be rejected. We also use the Bayesian Information Criterion ($BIC$) (Kass and Raftery, 1995) for comparison. For this metric, differences of 0–2 are not significant, those of 2–6 are positive, and those of 6–10 are strong; differences greater than 10 are very strong (Kass and Raftery, 1995). Hence, a model with a higher $R^2$ and adjusted $R^2$ is better, while one with a lower $AIC$ and $BIC$ is also better. The adjusted $R^2$ penalizes using additional free parameters the least, while the $BIC$ penalizes it the most.
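For least-squares regressions, both criteria can be computed directly from the residual sum of squares. The sketch below uses the common Gaussian log-likelihood form, up to an additive constant; the paper’s exact variant (e.g., a small-sample correction) may differ:

```python
import math

def aic_bic(rss, n_points, n_free_params):
    # AIC = n*ln(RSS/n) + 2k and BIC = n*ln(RSS/n) + k*ln(n),
    # where k is the number of free parameters
    base = n_points * math.log(rss / n_points)
    aic = base + 2 * n_free_params
    bic = base + n_free_params * math.log(n_points)
    return aic, bic
```

With $n = 20$ data points, $\ln n \approx 3.0 > 2$, so each extra parameter is penalized more heavily by the $BIC$ than by the $AIC$.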
Table 2 lists the model fitness results. Overall, the baseline model of Fitts’ law showed the best model fitness in terms of the adjusted $R^2$, $AIC$, and $BIC$. While Model #6 showed the highest $R^2$, this was due to the additional free parameter; thus, its adjusted $R^2$ was slightly lower than that of Model #1. According to the $AIC$, Models #5 and #6 should not be rejected as worse models than Model #1. For the $BIC$, however, Models #5 and #6 are worse than #1 (“positive”). Lastly, the models using $W_e$ (#2, #3, and #4) are significantly worse than the other models and can be safely rejected. These lower fits are also shown in Figure 5. As a result, we empirically confirmed that the baseline model is the best, and we did not find any advantage of introducing another free parameter or using $W_e$.
| Model | $R^2$ | adj. $R^2$ | $AIC$ | $BIC$ | $a$ | $b$ | $c$ (mm) |
|---|---|---|---|---|---|---|---|
| #3 Param. Opt. ($W_e$, no sqrt) | 0.9133 | 0.9031 | 189.2 | 192.2 | 119.8 | 101.6 | 0.5067 |
| #4 Param. Opt. ($W_e$, sqrt) | 0.9141 | 0.9040 | 189.1 | 192.0 | 121.6 | 103.0 | 1.512 |
| #5 Param. Opt. ($W$, no sqrt) | 0.9814 | 0.9792 | 158.5 | 161.5 | 134.9 | 88.58 | 0.08178 |
| #6 Param. Opt. ($W$, sqrt) | 0.9815 | 0.9793 | 158.3 | 161.3 | 136.4 | 88.33 | 0.5806 |
6. Results of 2D Experiment
6.1. Finger Calibration Task with “Rapid and Accurate” Instruction
Again, data points for which the distance between the tap point and the target center was longer than 15 mm were removed as outliers. Among the 540 trials for this sub-task, we observed no outliers. Two participants’ data did not pass the normality test. The standard deviation of the tap positions (i.e., $\sigma_a$) for each participant ranged from 0.8717 to 2.148 mm, and the mean was 1.372 mm.
6.2. Finger Calibration Task with “Concentrate on Accuracy” Instruction
We again observed no outliers, while three participants’ data did not pass the normality test. The $\sigma_a$ values ranged from 0.7107 to 1.752 mm among the participants, and the mean was 1.163 mm.
6.3. Fitts’ Law Task
Among the 3840 trials, nine outlier trials were removed (0.23%). The mean error rate was 17.91%. Under 184 of the 240 conditions (76.7%), the touch points followed a bivariate normal distribution.
For the tap-point distribution $\sigma$, we found a significant main effect of $W$, but not of $A$. The $A \times W$ interaction was also not significant. For the $MT$, we found significant main effects of $A$ and $W$, and the $A \times W$ interaction was significant.
Figure 6a shows the result of the $\sigma^2$ vs. $W^2$ regression. The regression line clearly passes above the four data points at the smallest $W$ value: the intercept ($\sigma_a^2$) was thus greater than some $\sigma^2$ values, causing the mathematical error in FFitts law.
6.4. Pointing Task with Random Target Distance
We removed 33 outlier trials (1.375%). Under 41 of the 60 conditions (68.3%), the touch points followed a bivariate normal distribution. $W$ had a significant main effect on $\sigma$. Figure 6b shows the regression result. The $\sigma_a$ value obtained from the intercept was greater than some $\sigma$ values in the Fitts’ law task (Figure 6a), which caused the mathematical error.
6.5. Model Fitting Results for 2D Task
In contrast to the results for the 1D task, we can use the σ_a value obtained from the finger calibration task with the “concentrate on accuracy” instruction. As listed in Table 3, the data points used for fitting had no negative values inside the square root in FFitts law. Thus, in Table 4, we add Model #7, which is the original FFitts law model. The fitting results are also shown in Figure 7.
Overall, for Models #1 to #6, the results were similar to those for the 1D tasks. The baseline model of Fitts’ law was the best in terms of the adjusted R², AIC, and BIC values. According to the AIC, Models #5 and #6 showed model fitness similar to that of Model #1, while in terms of the BIC, Models #5 and #6 were worse than #1 (“positive” evidence). Moreover, the models using W_e (#2, #3, and #4) were significantly worse. Model #7, which also uses the σ_a factor, showed significantly worse fits than Models #5 and #6 but an improved fit compared with the original effective width method (#2). The conclusion from the 2D tasks is equivalent to that from the 1D tasks: we empirically confirmed that the baseline model is the best, and we did not find any advantage in introducing another free parameter or using W_e.
|#3 Param. Opt. (W_e, no sqrt)|0.9400|0.9330|185.7|188.7|-1.399|108.0|4.026|
|#4 Param. Opt. (W_e, sqrt)|0.9341|0.9263|187.6|190.6|35.29|119.1|4.825|
|#5 Param. Opt. (W, no sqrt)|0.9905|0.9893|149.0|151.9|110.5|99.08|0.02535|
|#6 Param. Opt. (W, sqrt)|0.9905|0.9893|148.9|151.9|110.7|99.12|0.2850|
|#7 Calib. (Acc) (given σ_a)|0.9340|0.9303|185.6|187.6|33.14|120.1|
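The fitness metrics used throughout these comparisons come from an ordinary least-squares fit of MT = a + b·ID. The sketch below uses the common least-squares forms of AIC and BIC; the ID and MT values are hypothetical and the function name `fit_metrics` is ours:

```python
import numpy as np

def fit_metrics(ID, MT, k=2):
    """Fit MT = a + b*ID by least squares and report the fitness
    metrics used in the tables: R^2, adjusted R^2, and the
    least-squares forms AIC = n*ln(SSE/n) + 2k and
    BIC = n*ln(SSE/n) + k*ln(n), with k free parameters."""
    ID = np.asarray(ID, dtype=float)
    MT = np.asarray(MT, dtype=float)
    n = len(MT)
    b, a = np.polyfit(ID, MT, 1)
    resid = MT - (a + b * ID)
    sse = float(resid @ resid)
    sst = float(((MT - MT.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    return {
        "a": a, "b": b, "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k),
        "aic": n * np.log(sse / n) + 2 * k,
        "bic": n * np.log(sse / n) + k * np.log(n),
    }

# Hypothetical (ID [bits], MT [ms]) points.
metrics = fit_metrics([1.0, 2.0, 3.0, 4.0], [200.0, 300.0, 420.0, 500.0])
```

A BIC difference of 2 to 6 between two models is conventionally read as “positive” evidence (Kass and Raftery, 1995), which is the grading used in the discussion of the tables.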
7. Reanalyses of Previous Studies
|Woodward et al. (Woodward et al., 2020)|Bi et al. (Bi et al., 2013), 1D|Bi et al. (Bi et al., 2013), 2D|
|#3 Param. Opt. (W_e, no sqrt)|0.215|0.122|69.0|68.4|0.967|0.963|46.5|45.9|0.981|0.979|41.7|41.1|
|#4 Param. Opt. (W_e, sqrt)|0.214|0.122|69.0|68.4|0.960|0.955|47.6|47.0|0.978|0.975|42.7|42.1|
|#5 Param. Opt. (W, no sqrt)|0.975|0.972|48.5|47.8|0.956|0.950|48.2|47.6|0.849|0.831|54.2|53.6|
|#6 Param. Opt. (W, sqrt)|0.971|0.967|49.3|48.7|0.956|0.950|48.2|47.6|0.849|0.831|54.2|53.6|
|#7 Calib. (R&A) (given σ_a)|0.213|0.169|67.1|66.6|0.958|0.955|45.9|45.5|0.968|0.966|42.9|42.5|
Here, we reanalyze three sets of data reported in previous studies: Woodward et al.’s study using circular targets (Woodward et al., 2020) and Bi et al.’s 1D and 2D targets (Bi et al., 2013). The results are summarized in Table 5. We also examined σ_a obtained by the intercept method, but the mathematical error occurred for all three data sets. Thus, we report the fits for Models #1 to #7 used in our 2D data analysis. Note that Model #7 is an exception: our σ_a was obtained from the finger calibration task with the “concentrate on accuracy” instruction, whereas Bi et al. used the “rapid and accurate” instruction, and Woodward et al.’s instruction is unclear from their paper.
For Woodward et al.’s data, the model fitness of the baseline (Model #1) was higher than that of the effective width model (Model #2), and Model #3, using the additional free parameter, partially improved the fit (the adjusted R² increased from 0.0253 to 0.122); these results are consistent with ours. In contrast to our results, however, according to the AIC and BIC, the model fitness was slightly degraded by using Models #2 and #3. In addition, when we used the nominal W with the additional parameter (Models #5 and #6), the fitness improved over the baseline. Lastly, using the given σ_a value (Model #7) gave the best fit among the W_e-based candidates (Models #2, #3, #4, and #7). Still, the AIC and BIC differences were both less than 2 and thus insignificant; this point was not discussed by Woodward et al. (Woodward et al., 2020).
For Bi et al.’s 1D task results, Model #7 was the best in terms of the AIC and BIC, which is a unique outcome among all the analyses in this paper. The second-best model using W_e was #3, whose AIC and BIC differences from #7 were trivial. For Models #5 and #6, the additional parameter was determined as 0 to maximize the R²; thus, while the R² values were the same as for Model #1, the adjusted R², AIC, and BIC were worse than the baseline because of the additional free parameter. For Bi et al.’s 2D task results, the best fit among all models was shown by Model #3, and it was significantly better than the baseline (Model #1). Among all the data analyses in this paper, only this case showed that a model using W_e gave a better fit than those using the nominal W.
Through these reanalyses, we found a benefit of introducing an additional free parameter regardless of whether we used the nominal W (to analyze a single task condition) or W_e (to compare different conditions). However, this conclusion did not always hold: for Woodward et al.’s data, using the additional parameter with W_e slightly degraded the fitness (Model #2 was a bit better than #3 and #4 according to the AIC and BIC), and for Bi et al.’s 1D data, using the given σ_a gave the best AIC and BIC results.
8. General Discussion
8.1. Consistency of Results from Previous FFitts Law Studies
The first inconsistency with the previous studies is that we sometimes could not use FFitts law with σ_a because of the mathematical error. For the 1D task, we could not use it with any of the σ_a derivations (finger calibrations with two different instructions; the square root of the intercept in the σ² vs. W² regression for the Fitts’ law task and the random-A task; see Table 1). For the 2D task, only the σ_a computed from the finger calibration task with the “concentrate on accuracy” instruction could be used (Table 3). This clearly shows a limitation of the conventional FFitts law: because it depends on both the σ and σ_a values, we often cannot use this methodology.
Even when we applied FFitts law with σ_a to the 2D results, the model fitness was significantly degraded as compared with the baseline model (Model #7 vs. #1 in Table 4). This is inconsistent with the finding of Bi et al. (Bi et al., 2013) but consistent with that of Woodward et al. on FFitts law for children aged 5 to 10 years (Woodward et al., 2020). While Woodward et al. attributed this result to the children’s motor development (e.g., not precisely following the known speed-accuracy tradeoff behavior), we observed that their finding of lower model fitness also held for adults in their twenties.
As for introducing an additional free parameter, we found a limited benefit. For the 2D task, while Model #2 (W_e) showed a lower adjusted R², using the additional parameter improved the fitness: Models #3 and #4 showed higher adjusted R² values, and the AIC and BIC differences were significant (Table 4). Because comparing different user groups or devices requires the effective width method to normalize speed-accuracy biases, this additional parameter for finger tremor helps to improve the prediction accuracy. For the 1D condition, however, we observed the opposite: the adjusted R² of Model #2 (W_e) was 0.91, and those of Models #3 and #4 were slightly lower (Table 2). According to the BIC comparison, the use of Model #2 is positively supported, while there were no significant differences in terms of the AIC.
8.2. Recommendations on Model Selection
For FFitts law and its alternatives (Models #3–#6), using the nominal W was always superior in both the 1D and 2D conditions. If we use the nominal W, however, the baseline (Model #1) is the best. Thus, when researchers seek to predict movement times under untested conditions for a single user group or a single device, the baseline model is recommended; this is consistent with previous studies comparing the nominal and effective width methods (Wright and Lee, 2013; Zhai et al., 2004).
When researchers try to compare several conditions with 2D circular targets, models using W_e are required. Rather than the original effective width method (Model #2), models with the additional free parameter are recommended: #3 and #4 are significantly better. For the 1D case, the plain W_e model (#2) is still a better choice than those using the additional parameter.
Among all the 1D and 2D conditions, we recommend not using σ_a, because it often causes the mathematical error in FFitts law. The parameter optimization method is convenient for both researchers and participants. In addition, by avoiding the finger calibration task with a 1-pixel target, we can compare Fitts’ and FFitts laws by conducting only Fitts’ law tasks with reasonably sized targets. This enables testing of the model fitness using data measured from (e.g.) a gamified task of tapping bubbles on the screen, as Woodward et al. did (Woodward et al., 2020).
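A minimal sketch of the parameter optimization method, which avoids the calibration task entirely: a free parameter is chosen so as to maximize the fit of the regression. For illustration only, we assume an index of difficulty of the form ID = log2(A / sqrt(W² − θ²) + 1) with free parameter θ; the exact model forms of Models #3–#6 are not reproduced here, and all data below are synthetic:

```python
import numpy as np

def best_theta(A, W, MT, thetas):
    """Grid-search sketch of the parameter-optimization method: for each
    candidate theta, form the (assumed) index of difficulty
    ID = log2(A / sqrt(W^2 - theta^2) + 1), fit MT = a + b*ID by least
    squares, and keep the theta that maximizes R^2."""
    A, W, MT = (np.asarray(v, dtype=float) for v in (A, W, MT))
    best_r2, best_th = -np.inf, None
    for th in thetas:
        w_eff2 = W ** 2 - th ** 2
        if np.any(w_eff2 <= 0):  # this theta would hit the mathematical error
            continue
        ID = np.log2(A / np.sqrt(w_eff2) + 1)
        b, a = np.polyfit(ID, MT, 1)
        resid = MT - (a + b * ID)
        r2 = 1.0 - float(resid @ resid) / float(((MT - MT.mean()) ** 2).sum())
        if r2 > best_r2:
            best_r2, best_th = r2, th
    return best_r2, best_th

# Hypothetical conditions; MT generated exactly from theta = 2 mm.
A = np.array([20.0, 40.0, 20.0, 40.0])
W = np.array([4.0, 4.0, 8.0, 8.0])
MT = 100.0 + 150.0 * np.log2(A / np.sqrt(W ** 2 - 2.0 ** 2) + 1)
r2, theta = best_theta(A, W, MT, thetas=[0.0, 1.0, 2.0, 3.0])
```

Note that the search simply skips candidates that would make the value inside the square root non-positive, so the mathematical error cannot occur by construction.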
8.3. Limitations and Future Work
Our conclusions are limited by the task conditions that we used. It is unclear whether our findings, e.g., on the best model and on when the mathematical error occurs, would hold under other conditions, such as operating a smartphone with a thumb in a one-handed posture or using much longer target distances. For the model-fitting results, we sometimes did not observe a great difference in the AIC and BIC values. For example, we sought to state more clearly whether Model #2 or #4 is better in the 1D condition, but the AIC difference was 1.3 (not significant), while the BIC difference was 2.2 (positive evidence). This prevented us from concluding that using the additional free parameter is better, because the results could easily change depending on the user group and the task parameters A and W. Much more data is needed to understand this point, which will inform our future work.
Another unresolved point is the timing of when to compute the model fitness. Following previous studies on FFitts law (Bi et al., 2013; Luo and Vogel, 2014; Woodward et al., 2020), we examined the fit across the task conditions. For the effective width method, however, Soukoreff and MacKenzie stated that the values should be calculated for each task condition for each participant, and the participants’ data should then be averaged last to compute the throughput (i.e., a unified performance metric) (Soukoreff and MacKenzie, 2004). By that methodology, we would have calculated Equation 7 for the 20 data points of each of the 12 participants. This would have increased the chance of observing the mathematical error, because it would have required checking for it 240 times. This indirectly supports the recommendation that researchers avoid using σ_a when they seek to apply FFitts law robustly. According to Olafsdottir et al., there are at least 20 approaches to computing the throughput, depending on the order of aggregating the data (Olafsdottir et al., 2012). We did not explore this point deeply and simply followed the previous FFitts law studies, but it will be worth revisiting in the future.
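The aggregation order that Soukoreff and MacKenzie advocate can be sketched as a mean of per-participant means; the participant data below are hypothetical, and this is only one of the at-least-20 orderings Olafsdottir et al. enumerate:

```python
import numpy as np

def throughput(data):
    """Mean-of-means throughput (Soukoreff and MacKenzie, 2004):
    average ID_e/MT over each participant's conditions first, then
    average across participants.  `data` maps a participant id to a
    list of (ID_e [bits], MT [s]) pairs."""
    per_participant = [np.mean([ide / mt for ide, mt in pairs])
                       for pairs in data.values()]
    return float(np.mean(per_participant))

# Hypothetical: two participants, two conditions each.
tp = throughput({
    "p01": [(2.0, 0.5), (3.0, 0.6)],
    "p02": [(2.0, 0.4), (3.0, 0.5)],
})
```

Because each participant-condition cell requires its own effective ID_e, every cell is a separate opportunity for the square root in FFitts law to go negative — hence the 240 checks mentioned above.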
We have revisited FFitts law and the inconsistencies in its methodology. The parameter optimization method showed some relative advantages over measuring the finger tremor factor σ_a, which often causes a negative value inside a square root in both our data and the data from previous studies. While we discussed the best and suboptimal models in light of the research goal, such as a device comparison or prediction under a single condition, our conclusions could change for different user groups and task conditions in future experiments. To better understand touch-pointing performance and derive better models, we hope that researchers will report more data from touch-pointing experiments, even if the data shows that a novel model exhibits a lower fitness than the baseline or cannot be fitted because of mathematical errors.
References
- H. Akaike. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6), pp. 716–723.
- D. Avrahami. 2015. The effect of edge targets on touch performance. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, New York, NY, USA, pp. 1837–1846.
- X. Bi, Y. Li, and S. Zhai. 2013. FFitts law: modeling finger touch with Fitts’ law. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, New York, NY, USA, pp. 1363–1372.
- X. Bi and S. Zhai. 2013. Bayesian touch: a statistical criterion of target selection with finger touch. In Proceedings of the ACM Symposium on User Interface Software and Technology, UIST ’13, pp. 51–60.
- X. Bi and S. Zhai. 2016. Predicting finger-touch accuracy based on the dual Gaussian distribution model. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST ’16, New York, NY, USA, pp. 313–319.
- O. Chapuis and P. Dragicevic. 2011. Effects of motor scale, visual scale, and quantization on small target acquisition difficulty. ACM Trans. Comput.-Hum. Interact. 18 (3), pp. 13:1–13:32.
- E. R. F. W. Crossman. 1956. The speed and accuracy of simple hand movements. Ph.D. Thesis, University of Birmingham.
- P. Dixon. 2008. Models of accuracy in repeated-measures designs. Journal of Memory and Language 59 (4), pp. 447–456.
- P. M. Fitts. 1954. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47 (6), pp. 381–391.
- J. Gori, O. Rioul, and Y. Guiard. 2018. Speed-accuracy tradeoff: a formal information-theoretic transmission scheme (Fitts). ACM Trans. Comput.-Hum. Interact. 25 (5), pp. 27:1–27:33.
- C. Holz and P. Baudisch. 2010. The generalized perceived input point model and how to double touch accuracy by extracting fingerprints. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, New York, NY, USA, pp. 581–590.
- C. Holz and P. Baudisch. 2011. Understanding touch. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, New York, NY, USA, pp. 2501–2510.
- R. E. Kass and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90 (430), pp. 773–795.
- Y.-J. Ko et al. 2020. Modeling two dimensional touch pointing. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST ’20, New York, NY, USA, pp. 858–868.
- W. Luo and D. Vogel. 2014. Crossing-based selection with direct touch input. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, CHI ’14, New York, NY, USA, pp. 2627–2636.
- I. S. MacKenzie and P. Isokoski. 2008. Fitts’ throughput and the speed-accuracy tradeoff. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, New York, NY, USA, pp. 1633–1636.
- I. S. MacKenzie. 1992. Fitts’ law as a research and design tool in human-computer interaction. Human-Computer Interaction 7 (1), pp. 91–139.
- M. J. Blanca, R. Alarcón, J. Arnau, R. Bono, and R. Bendayan. 2017. Non-normal data: is ANOVA still a valid option? Psicothema 29 (4), pp. 552–557.
- H. B. Olafsdottir, Y. Guiard, O. Rioul, and S. T. Perrault. 2012. A new test of throughput invariance in Fitts’ law: role of the intercept and of Jensen’s inequality. In Proceedings of the 26th Annual BCS Interaction Specialist Group Conference on People and Computers, pp. 119–126.
- R. W. Soukoreff and I. S. MacKenzie. 2004. Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts’ law research in HCI. International Journal of Human-Computer Studies 61 (6), pp. 751–789.
- van Noort. 2015. Effect of running on throughput in pointing tasks: a Fitts’ law experiment. Master’s Thesis, Utrecht University.
- A. T. Welford. 1968. Fundamentals of skill. London: Methuen.
- J. O. Wobbrock, K. Shinohara, and A. Jylhä. 2011. The effects of task dimensionality, endpoint deviation, throughput calculation, and experiment design on pointing measures and models. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, New York, NY, USA, pp. 1639–1648.
- J. Woodward et al. 2020. Examining Fitts’ and FFitts’ law models for children’s pointing tasks on touchscreens. In Proceedings of the Working Conference on Advanced Visual Interfaces, AVI ’20, New York, NY, USA.
- Wright and Lee. 2013. Issues related to HCI application of Fitts’s law. Human-Computer Interaction 28 (6), pp. 548–578.
- S. Yamanaka and H. Usuba. 2020. Rethinking the dual Gaussian distribution model for predicting touch accuracy in on-screen-start pointing tasks. Proc. ACM Hum.-Comput. Interact. 4 (ISS).
- S. Yamanaka. 2018. Effect of gaps with penal distractors imposing time penalty in touch-pointing tasks. In Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’18, New York, NY, USA.
- S. Yamanaka. 2018. Risk effects of surrounding distractors imposing time penalty in touch-pointing tasks. In Proceedings of the 2018 ACM International Conference on Interactive Surfaces and Spaces, ISS ’18, New York, NY, USA, pp. 129–135.
- S. Zhai, J. Kong, and X. Ren. 2004. Speed-accuracy tradeoff in Fitts’ law tasks: on the equivalency of actual and nominal pointing precision. International Journal of Human-Computer Studies 61 (6), pp. 823–856.