Modelling Noise-Resilient Single-Switch Scanning Systems

12/28/2017
by Emli-Mari Nel, et al.
University of Cambridge

Single-switch scanning systems allow non-speaking individuals with motor disabilities to communicate by triggering a single switch (e.g., raising an eyebrow). A problem with current single-switch scanning systems is that, while they result in reasonable performance in noiseless conditions (for instance, in simulation or in tests with able-bodied users), they fail to accurately model the noise sources that are introduced when a non-speaking individual with motor disabilities triggers the switch in a realistic use context. To assist the development of more noise-resilient single-switch scanning systems, we have developed a mathematical model of scanning systems that incorporates extensive noise modelling. Our model includes an improvement to the standard scanning method, which we call fast-scan, and we show via simulation that it can be more suitable for certain users of scanning systems.

I Introduction

Single-switch scanning systems are a class of augmentative and alternative communication (AAC) devices. A single-switch user is someone whose primary means of communication relies on toggling a switch. Single-switch users include non-speaking individuals with motor disabilities who are mostly confined to wheelchairs, such as users with cerebral palsy or locked-in syndrome. Examples of switch triggers include blinking, raising an eyebrow, or thinking of an activity such as tennis [1, 2, 3].

Scanning systems are the most prevalent text entry methods for single-switch users. They are an active area of research, with, for instance, a recent study published in [4] and new approaches being explored, such as Huffman scanning [5]. Grid2 is a typical commercial scanning system that is widely used in practice. Figure 1 shows an example of how to select a letter with such a scanning system.

Fig. 1: A typical scanning interface. To select a letter, at least two clicks are necessary. In the first phase, all rows are scanned in sequence. The first click selects the desired row “e f g h .” (a). Thereafter, the individual letter keys of the selected row are scanned in sequence. The second click selects the desired letter “h” (b). (c)–(d) Examples of switches that can be used to interface with a computer. An eyebrow raise in (d) corresponds to a switch event, as detected by the Impulse EMG Bluetooth access switch shown. (e) An example image of Nomon, a text entry method that can be controlled by a single switch.

The following three metrics are typically used to measure performance and are used throughout this article. First, the text entry rate is measured in words per minute (wpm), with a word defined as five characters, including space. Second, the click rate is measured as the number of clicks per character (cpc). Third, the character error rate (cer) is defined as the minimum edit distance between the output word and the intended (ground-truth) word, divided by the number of characters in the intended word.
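The three metrics can be computed directly from logged sessions. The following is a minimal Python sketch, assuming that output and intended words are plain strings (with “_” standing for a space) and that timings are in seconds; the function names are illustrative and not taken from any particular toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(output: str, intended: str) -> float:
    """cer: edit distance divided by the number of characters in the intended word."""
    return levenshtein(output, intended) / len(intended)


def words_per_minute(num_chars: int, seconds: float) -> float:
    """wpm, with a word defined as five characters (including space)."""
    return (num_chars / 5.0) / (seconds / 60.0)


def clicks_per_character(num_clicks: int, num_chars: int) -> float:
    """cpc: total switch activations divided by characters produced."""
    return num_clicks / num_chars
```

For example, writing “helo_” when “hello_” was intended gives a cer of 1/6, and producing 30 characters in 60 seconds corresponds to 6 wpm.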

Although many error correction modes are possible, literature that takes noise into account as part of scanning-system design is rare. Text entry methods are usually tested only on able-bodied users, seeking the fastest text entry rate with a reasonable word or character error rate. Although this is useful for determining performance boundaries, it is not always clear how such empirical results would change in the presence of the inevitable noise of a realistic use context, in which a non-speaking individual with motor disabilities triggers a switch that is prone to false activations and drift. In noisy situations, many impaired users are currently left with no automatic means of communication, even when they have full cognitive capacity. It is noted in [4] that at least 6 wpm can be expected for a non-impaired user, whereas much lower rates are the norm for non-speaking individuals with motor disabilities (1 wpm or lower is common).

In this article we present the design of a plausible model of single-switch scanning systems within a probabilistic framework. Each noise source is represented by a probability distribution, which can be easily estimated from a few measurements before performing the simulation. A Markov chain is used to model all possible user interactions with the scanning system. A particularly useful and critical aspect of our work is the model’s ability to reflect how the user would interact with the system while writing a word, highlighting potential shortcomings in a way that can be easily interpreted by a non-expert. This is especially useful in noisy situations, as it can reduce the immense effort of evaluating an interface with a large representative sample of impaired users. Indeed, since the performance experienced by end users of single-switch scanning systems varies due to many factors, such as the type of disability, the noise characteristics of the switch, motor control ability, cognitive capacity and level of literacy, it is often impossible to evaluate such methods reliably via A/B testing in controlled experiments. We instead argue that it is possible to gain design insight by combining probabilistic modelling with model validation involving non-speaking individuals with motor disabilities.

We conjecture that unaccounted-for noise sources are the main reason why text entry rates drop substantially when single-switch scanning systems are used by non-speaking individuals with motor disabilities. Rigorous noise models that accurately reflect realistic use contexts are the first step towards improving the robustness of the underlying design of scanning systems. By modelling the noise sources, one can potentially reduce the number of error corrections substantially, which can help users obtain text entry rates closer to those of able-bodied users.

Our simulation results indicate that scanning systems are indeed very sensitive to false positives, which are difficult to resolve through manual error-correction actions. We also show that the best way to deal with click-timing noise is to increase the scanning delay, which reduces text entry rates.

II Noise Sources

We define the most important noise sources as follows. Switch noise comprises false acceptances and false rejections. False acceptances, also referred to as false positives, are spurious detections. False rejections, also referred to as false negatives, are switch events that are erroneously ignored. Click-timing noise causes the observed click time of a switch event to be earlier or later than intended. This can be due to the user’s response time or to the delay introduced by the device that captures a switch event. In practice, the latency can easily be as much as 2 seconds, and sometimes much longer, depending on the complexity of the switch and the user’s disability. Able-bodied adults typically exhibit a reaction delay of about 50 ms when using a standard keyboard.

An important practical consideration regarding click-timing noise is a latency that is large but consistent (small variance). This problem is exacerbated in an audio-based system (as opposed to a visually based system), or when the user has a severe impairment. Increasing the scanning delay according to the average click-time latency can be detrimental to the text entry rate.

In scanning systems, sensitivity to click-timing noise is typically reduced by increasing the scanning delay. False positives are typically dealt with in three ways [6]. First, the duration between accepted clicks is restricted, effectively modelling the recovery time of the user: if, for example, a whole series of clicks is received in rapid succession, only one click within the recovery-time window is accepted. This compensates for situations where, e.g., a faulty cable generates a series of clicks when only one click was intended. Second, an undo time window can be used to undo a false positive during a row scan: if a set number of column group scans pass, the last row selection is cancelled and the system resumes by scanning the rows, where a group scan refers to all the scans associated with a specific row/column. Third, a “delete” symbol can be included in the layout to remove the last letter selection.
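The recovery-time restriction can be sketched as a simple filter over click timestamps. This is a minimal sketch that assumes the first click of a burst is the one kept and that the window is measured from the last accepted click; real systems such as Grid2 may implement this differently, and the function name is hypothetical.

```python
def filter_recovery_window(click_times, recovery_time):
    """Accept a click only if at least `recovery_time` seconds have passed since the
    last accepted click; clicks that fall inside the window are discarded."""
    accepted = []
    for t in sorted(click_times):
        if not accepted or t - accepted[-1] >= recovery_time:
            accepted.append(t)
    return accepted
```

With a 0.5 s window, a burst of clicks at 1.0, 1.05, 1.1 and 1.2 s collapses to a single click at 1.0 s, while a later click at 3.0 s is accepted.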

III Background

Shannon’s noisy-channel coding theorem [7] is the foundation for an information-theoretic analysis of reliable communication over a noisy channel, in this case single-switch scanning systems [8, 9, 10]. The theorem states that, for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data nearly error-free up to a computable maximum rate through the channel. This maximum rate is called the channel capacity. For our application, the information rate $R$ is measured in bits per second:

$$R = \frac{I(X;Y)}{\bar{T}} = \frac{H(X) - H(X \mid Y)}{\bar{T}} \qquad (1)$$

where $X$ is the input set (a list of words that the user intends to write), $Y$ is the output set (the list of words the user writes), $I(X;Y)$ is the mutual information, $H$ refers to the entropy function, e.g., $H(X) = -\sum_{x} P(x) \log_2 P(x)$, and $\bar{T}$ is the average time it takes to produce an element in $Y$. $H(X \mid Y)$ measures the degree of uncertainty in $X$ after observing $Y$. If all elements in $X$ can always be inferred from $Y$, the probability of error is zero and the conditional entropy term vanishes, so that $R = H(X)/\bar{T}$.

If the information rate is maximised with respect to the probability distribution over the input set, the maximum equals the channel capacity. In some problems, the channel capacity can even be reached with zero probability of error by using only some of the input symbols, i.e., by setting the probabilities of the other input symbols to zero [11].
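The information rate of Equation 1 can be evaluated numerically for a small channel. The sketch below assumes that the joint distribution of intended and produced words is given as a table (rows indexed by the intended word, columns by the produced word) and that the average selection time is known; it computes the mutual information as H(X) + H(Y) − H(X, Y), which equals H(X) − H(X | Y). The function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) for a joint table joint[x][y] whose entries sum to 1."""
    px = [sum(row) for row in joint]            # marginal over intended words
    py = [sum(col) for col in zip(*joint)]      # marginal over produced words
    h_xy = entropy([p for row in joint for p in row])
    # H(X|Y) = H(X,Y) - H(Y), so I(X;Y) = H(X) - H(X|Y)
    return entropy(px) - (h_xy - entropy(py))


def information_rate(joint, mean_time):
    """Bits per second: mutual information divided by the average selection time."""
    return mutual_information(joint) / mean_time
```

For four equiprobable words that are always written correctly, taking 10 seconds on average, the rate is 2 bits / 10 s = 0.2 bits per second; if the produced word were independent of the intended one, the rate would be zero.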

MacKay et al. [8] model a single-switch user as follows: a selection (letter, word or sentence) is made from one click to the next, and the probability that the user waits a given time (in seconds) from one click to the next is

(2)

resulting in an information rate of

(3)

where the model is parameterised by the user’s reaction delay and a timing tolerance, i.e., it is assumed that the user clicks within a fixed number of seconds around each intended click time. The capacity is then computed by optimising Equation 3 with respect to the waiting-time distribution of Equation 2. If the switch is unreliable, yielding a false positive or false negative a fraction $f$ of the time, each such event constitutes a bit error, so that the capacity drops by a factor $1 - H_2(f)$, where $H_2(f) = -f \log_2 f - (1 - f) \log_2 (1 - f)$ is the binary entropy function (see the second part of the noisy-channel coding theorem described in [7, 11]).
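The effect of an unreliable switch on capacity can be illustrated with the binary entropy function. This minimal sketch idealises every erroneous click as a bit flip in a binary symmetric channel, which is the assumption behind the factor above; the function names are hypothetical.

```python
import math


def binary_entropy(f):
    """H2(f) in bits, defined as 0 at f = 0 and f = 1."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * math.log2(f) - (1.0 - f) * math.log2(1.0 - f)


def capacity_factor(f):
    """Fraction of the noiseless capacity that survives when a fraction f of
    clicks are bit errors (binary symmetric channel)."""
    return 1.0 - binary_entropy(f)
```

An error fraction of 0.11 already roughly halves the capacity, since H2(0.11) ≈ 0.5, and at an error fraction of 0.5 no information gets through.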

A low error rate is the most important requirement for our application, even more so than the speed at which the user can write. Reducing the error rate typically requires longer output codes (elements of the output set), which increases the average time per selection and therefore reduces the information rate. This does not have to be the case, as shown with an example in Section VI.

One way to lengthen the output codes to accommodate noise (reducing the error rate) is to allow multiple clicks per selection. After each click, the previous click information can be used to update the way the input symbols are presented to the user (referred to as dynamic updating), which can be of huge benefit where it can be applied. It is, however, crucial for a single-switch system to use as few clicks as possible per selection: one can imagine how exhausting an impaired user might find it to use more than two facial gestures to select a single letter. Selecting parts of sentences, instead of one letter at a time, is therefore a highly desirable characteristic.

Equation 1 is used as a basis to discuss existing methods further. However, we follow the literature when making direct performance measurements, namely the text entry rate (wpm), click rate (cpc) and character error rate (cer), as described in Section I. Many papers do not report the last two quantities. Following the existing literature, we therefore compare the text entry rates of some techniques and, where possible, also mention their click and error rates.

IV Scanning Systems

Scanning systems are widely used, and research on them remains active, with a recent study published by Koester and Simpson [4] in 2014. That paper focuses on case studies of impaired users and also summarises previous work in the field. Some of their findings are summarised below.

Koester and Simpson [4] note that a very fast user may achieve 7–8 wpm using a single-switch scanning system, but rates of 1 wpm and lower are common. They note that at least 6 wpm can be expected for an able-bodied user, whereas much lower rates are the norm for non-speaking individuals with motor disabilities. They focused on calculating the scanning-system settings that best suit a user’s abilities. Their study included nine participants whose text entry rates improved from 0.3–2.9 wpm to 1.1–6.5 wpm, with the most important determining parameters being the scanning delay, the presence of word predictions, and the alphabet layout. Most users were diagnosed with cerebral palsy. It was not clear whether a large variety of switches was used, which matters because some switches may be more error-prone than others.

If the scanning delay is too short, error corrections will have a detrimental effect on the text entry rate, whereas if it is too long, unnecessary time is wasted on each scan, which also lowers the text entry rate. Koester and Simpson [4] found that a large improvement was achieved by measuring the user’s average latency and setting the scanning delay accordingly. Each user was allowed to use his/her system of choice, which included a variety of commercially available systems. Their results were in line with previously published results, ranging from 0.5–4 wpm for motor-impaired users, even though they were not using their own systems.

Word predictions can increase the text entry rate of a scanning system in some circumstances, and can also reduce the average number of clicks per character significantly [12, 13]. Stephen Hawking’s text entry rate doubled from 1 wpm when word predictions from a customised language model were used [14].

A study by Koester and Levine in 1996 [12] compared the text entry rate of eight able-bodied users to that of six users who had high-level spinal cord injuries. The able-bodied users typed using their usual method of keyboard access, whereas the injured users used mouthstick typing. They found that word predictions had a negative impact in this study: the use of word prediction caused the text entry rate to decrease for the spinal-cord-injured users and only modestly enhanced it for the able-bodied users. A more recent study by Koester and Simpson [4] (2014) reported an average text entry rate of 2.7 wpm with word predictions and 1.5 wpm without. They note, however, that the word-prediction settings generally had to be tailored to the user’s needs, which may explain the discrepancy in the literature, where word predictions are often reported to have a negative or no impact on the text entry rate.

For the scanning system Grid2 [6], it is recommended that the word-prediction functionality be disabled when the system is used in audio mode or when the scanning delay tends to be long. From Koester and Simpson’s work [4], we can expect an average of 1.5 wpm (or much less) when a scanning system is used by a non-speaking individual with motor disabilities and/or in audio mode. Also, without word predictions, at least two clicks are always necessary to select a character.

As noted in Section II, sensitivity to click-timing noise is typically reduced by increasing the scanning delay. Increasing the scanning delay of, for example, a single row lengthens the time needed for any selection that involves scanning that row, which reduces the capacity for that selection by a corresponding factor.
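To make the delay trade-off concrete, the sketch below estimates the error-free text entry rate of a row-column scanning layout as a function of the scanning delay. It assumes that every cell is equally likely, that selecting the cell in row r and column c takes r row scans plus c column scans (optionally plus one “tick” cell per group scan), and it ignores latency and errors entirely; these are simplifying assumptions rather than the model developed in this article, and the function names are hypothetical.

```python
def error_free_char_time(row, col, scan_delay, tick=True):
    """Seconds to select the cell at (row, col), both 1-based, counting every scan
    window up to and including the target, plus one optional tick cell per group."""
    extra = 2 if tick else 0
    return (row + col + extra) * scan_delay


def expected_wpm(layout, scan_delay, tick=True):
    """Error-free wpm for a layout (a list of rows), with all non-empty cells equally likely."""
    times = [error_free_char_time(r, c, scan_delay, tick)
             for r, row in enumerate(layout, start=1)
             for c, cell in enumerate(row, start=1) if cell]
    mean_time = sum(times) / len(times)
    return 60.0 / (5.0 * mean_time)   # five characters per word
```

For a hypothetical 2 × 2 layout such as `[["a", "_"], ["t", "DEL"]]` without tick cells, a 1 s delay gives 4 wpm and a 2 s delay gives 2 wpm: under these assumptions the error-free rate is inversely proportional to the scanning delay.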

The model for scanning systems defined by Koester et al. [15] is similar in spirit to ours, but does not allow sequences of actions to be modelled. Quoting them, they assume that: “the user never makes two mistakes in the same selection attempt (i.e., the user never selects the wrong row and then selects the wrong column; the user never selects two incorrect rows). The model could be expanded to accommodate this by adding in additional probabilities for two error sequences, but it would get very complicated.” On a theoretical level, this assumption is only valid for a very small probability of error. Our model allows a large number of user actions to be modelled, up to a configurable limit.

IV-A Dasher

One of the single-switch algorithms that theoretically performs best (according to Equation 3) is Dasher [8, 9, 10]. Dasher generally depends on good vision. An untested audio version in two-button mode is available, but it has not been adapted to cope with unreliable switches [8]. Dasher was originally designed as a hands-free text entry method, controlled by any continuous motion such as eye gaze or finger movements [16]. The problem is divided into two independent parts:

  1. Efficient information capture by the system, i.e., optimising Equation 3.

  2. Efficient language compression. The language model used by Dasher typically compresses English to about 2 bits per character.

Unlike in most other techniques, a click in Dasher hardly ever maps to a single character; it maps to parts of sentences instead. This allows fewer gestures to be used than with existing techniques (a major advantage for physically impaired users). More probable letters occupy more screen space and are therefore easier to find and quicker to select. This allocation of space is adapted dynamically, as determined by the language model and the user’s previous selections. However, the layout remains fixed (letters are always listed in alphabetical order), so the dynamic updating places no additional cognitive burden on the user.
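The proportional allocation of screen space can be sketched as splitting a unit interval (the available screen height) among symbols in a fixed alphabetical order, with sizes proportional to the language model’s probabilities. This shows only the core idea; Dasher itself recursively subdivides each region for the following letters, and the probabilities in the example are made up.

```python
def allocate_intervals(probs):
    """Split [0, 1) among symbols in alphabetical order, giving each symbol a share
    proportional to its probability. Returns {symbol: (start, end)}."""
    total = sum(probs.values())
    intervals, start = {}, 0.0
    for symbol in sorted(probs):            # fixed layout: order never changes
        size = probs[symbol] / total        # more probable -> more screen space
        intervals[symbol] = (start, start + size)
        start += size
    return intervals
```

With hypothetical probabilities `{"b": 0.25, "a": 0.5, "c": 0.25}`, “a” receives the top half of the screen, followed by “b” and “c” with a quarter each; only the sizes change as the model updates, never the order.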

A text entry rate of about 34 wpm is possible for an expert user controlling Dasher in continuous mode with a computer mouse. This result is impressive considering the typical ten-finger typing rate of 40–60 wpm [16]. Extensive tests have shown that 14 wpm is possible with relatively little practice when gaze is used as the controlling device [17]. Depending on the individual, an expert user has been shown to reach up to 25 wpm using gaze as the controlling device [18].

In one-button mode, an expert user has been shown to communicate at 10 wpm using only 0.4 gestures per character [9]. Finding a way to use more screen space to display the selection options in 2D, or improving the language model, could each lead to a possible speedup by a factor of two.

IV-B Nomon

Nomon [19, 20] is a single-switch system that applies Shannon’s noisy-channel coding theorem in its design; the capacity of its channel is computed in [20]. Several clocks are presented on a display screen: one clock for each letter of the alphabet, plus additional clocks for words that are likely at the time. To make a selection, the user presses a button when the rotating hand of the corresponding clock reaches noon. Words and letters that the system considers probable are shown in yellow (e.g., “not_”), whereas improbable ones are shown in white (e.g., “now_”), where “_” is used throughout this paper to indicate a space. Several clicks may be necessary per selection, but measurements show that, on average, two clicks are usually enough. Figure 1(e) depicts some of the clocks.

After each click, all possible random codes are decoded and mapped back to words. If the most probable word is above a certain threshold, it is selected. Nomon may require more than 2 cpc in the presence of noise. Dynamic updating is done by updating the most probable word-selection list after each letter selection. This noise-coping strategy can be compared to a scanning system where the layout changes dynamically, but without the additional cognitive load that the scanning system will impose.
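A heavily simplified sketch of this kind of decoding is shown below: each option is a clock whose hand reaches noon at a known time, the click’s offset from each option’s noon time is scored with a Gaussian timing model, and the posterior over options is updated with Bayes’ rule until one option exceeds a threshold. The common rotation period, the zero-mean Gaussian noise, the uniform prior and the threshold of 0.9 are all assumptions for illustration, not Nomon’s actual parameters or implementation.

```python
import math


def wrapped_offset(t, noon_time, period):
    """Signed offset of click time t from the nearest moment the clock is at noon."""
    d = (t - noon_time) % period
    return d - period if d > period / 2 else d


def gaussian_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def update_posterior(prior, noon_times, click_time, period, sigma):
    """One Bayesian update: posterior(option) is proportional to prior x p(offset | option)."""
    post = {k: prior[k] * gaussian_pdf(wrapped_offset(click_time, noon_times[k], period), sigma)
            for k in prior}
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}


def decide(posterior, threshold=0.9):
    """Return the most probable option if it clears the threshold, else None
    (meaning another click is needed)."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else None
```

With three hypothetical clocks reaching noon at 0.0, 0.5 and 1.0 s of a 2 s period and a timing noise of 0.1 s, a click at 0.02 s selects the first option immediately, whereas a click at 0.25 s, halfway between the first two clocks, leaves the decision pending until a further click.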

Nomon is sensitive to the latency of the user and system, as all the available click-time information is needed after each click before the program can proceed. Nomon is, however, less sensitive to this latency than scanning systems, where the scanning delay has to be adjusted according to the latency for every scan. Nomon does not account for false acceptances and rejections. As with Dasher and standard scanning systems, these errors therefore reduce the capacity by the binary-entropy factor described in Section III.

An average of 1.5–2 cpc is typically necessary to select a word, as measured empirically on able-bodied users. Using Nomon, an expert able-bodied single-switch user can write at approximately 10 wpm with a low probability of error. An extensive experiment on able-bodied novice users has shown Nomon’s superiority to standard hierarchical scanning systems: at the end of the experiment, the average text entry rate was 5.8 wpm for Nomon and 4.3 wpm for the hierarchical scanning system.

Nomon was tested informally for our research purposes, using a customised webcam-based switch controlled by smiling and winking. Eight able-bodied participants were asked to write ten phrases. None of the participants were familiar with using facial gestures to communicate. After five phrases they were required to take a break (or after one hour if they had written fewer than five phrases). Each user trial took between 1.5 and 2 hours. The users were allowed to increase the speed at any time, but the speed was not increased to breaking point. The participants were asked to write as accurately as possible at a comfortable speed, rather than as fast as possible. The main goal was to test whether our customised gesture switch (which has a low error rate) could be used to communicate. A secondary goal was to estimate performance when Nomon is controlled with a gesture switch instead of a joystick button.

All participants were able to write comfortably at 1.5–2 wpm with a very low probability of error, averaging approximately 2 cpc. Note that Nomon’s scanning delay had to be increased by as much as 800 ms in some cases (some users can easily take up to one second to make a smiling gesture). When the user can click extremely precisely (i.e., when the click-timing distribution is close to a delta function), as little as one click per selection may be necessary. Although this was an informal experiment, it gives a strong indication that, as expected, Nomon is sensitive to response latencies as large as 800 ms.

V A Model of Noise-Resilient Scanning

To analyse performance, we compute expectations of the text entry rate, the number of clicks, and the number of character errors, where a count of scans is multiplied by the appropriate scan delay, where necessary, to convert it to time. As performance metrics, we compare the first- and second-order statistics of these quantities. Their probability mass functions are derived numerically, so that expectations can be computed.

In all simulations, a phrase is processed one word at a time. Each processed word is either correct or contains spurious characters, unless a time-out failure occurs because the user takes excessively long. Expectations are computed by averaging results over all processed words.

We use the recommended settings of Grid2 [6], a popular and representative hierarchical scanning system. The settings are tailor-made for impaired users (who have latencies greater than 1 second), so a simulation based on them accurately reflects real-world use for this user group, which probably needs it most. We therefore do not include all error correction modes; e.g., we do not allow reverse scanning or word predictions. More latent states can be added to the Markov chain (which models user actions) to include more modes.

Spurious row selections are assumed to be corrected through an undo time window, whereas spurious letter selections are assumed to be corrected with a delete symbol in the layout. We do not model the effect of limiting the duration between accepted clicks, although this could easily be simulated if necessary. The user is assumed to act immediately to correct any error; this action can either succeed or produce a new sequence of errors that must in turn be corrected.

Modelling all possible outcomes of interacting with a scanning system is not straightforward, as there are potentially infinitely many possibilities. We therefore define system-failure conditions to limit the number of possibilities. System failure represents a user who gives up on the system because it has become unusable.

For validation purposes, we tested the free trial version of the Grid2 software and encountered some problems, specifically when using it in audio mode. First, we found it difficult to select the first cell of a group scan at high speed, as it is difficult to anticipate when to click. In such cases, it is recommended to increase the scanning delay of only the first scan. In practice, however, this makes it more difficult to time the other scans. We also found the software difficult to use in audio mode when the scanning delay is shorter than 1 second (probably due to software implementation issues).

For these reasons, we implemented our own scanning system to facilitate usage in audio mode. The code can be downloaded and may also help the reader envision the usage that the simulations represent. It was also used to validate our simulation results in a pilot study.

In our software, all sound files were overlaid with a soft “tick” sound marking the beginning of each cell. An additional “tick” sound was also included before the first letter of each group scan. We used our own sound files, which are faster recordings of the alphabet than the default sound files provided with the Grid2 software. With these small modifications, the scanning delay can be reduced to a mere 300 ms, comparable to a fast scanning delay in visual mode (see, e.g., [19]). In audio mode, the “tick” sound provides a rhythm that can greatly help the user anticipate when to click, especially if the user has memorised the layout. Although an extra “tick” cell is included per row/column group, the overall speed increases. Our software also counts the number of scans used to write a word, which is useful for performance comparisons.

V-A Markov Chain

Let the maximum number of scans that constitutes a group scan be the larger of the visible number of rows and the visible number of columns in the layout. In audio mode, the sound file of the first visible cell is treated as a continuation of the “tick” sound that precedes it. Every cell is scanned for the same scanning delay (in seconds), except for the first cell, which has its own scanning delay.

The user’s intentions are modelled with a binary random variable: one value indicates that the user intends to make a selection, and the other that the user is waiting for the undo time window to pass.

Initially, it is assumed that the user’s click timing is well represented by a Gaussian distribution. In later sections, however, we show that the derivations that follow apply equally well to any other distribution that is continuous in time.

If the user intends to select a given cell, the click time is modelled by a Gaussian distribution whose mean is the centre of that cell’s scanning window plus the user’s latency, where cell starting times are measured from the beginning of the group scan (after the “tick” sound). The Gaussian mean is therefore aligned with the centre of each cell’s scanning window if the latency is zero. The latency and the standard deviation of the Gaussian distribution are assumed to be the same for all cells. Note that, although the Gaussian-noise assumption simplifies the problem somewhat, the latency and standard deviation together provide much more information about the click-timing noise than the average response time alone, which was used in [4] to adapt the user settings, and they can be measured just as easily.
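The click-timing model can be sketched by integrating the Gaussian over each cell’s scanning window. This minimal sketch assumes that cell c (counting from 0) occupies the window [cτ, (c + 1)τ) from the start of the group scan, ignores the separate delay of the first cell, and uses the standard-library error function for the normal CDF; the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Gaussian cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def window_probabilities(target, num_cells, scan_delay, latency, sd):
    """P(the intended click falls in each cell's window) when aiming at `target`.

    Cell c is assumed to occupy [c * scan_delay, (c + 1) * scan_delay); the Gaussian
    mean is the centre of the target window shifted by `latency`.
    """
    mean = target * scan_delay + scan_delay / 2.0 + latency
    probs = []
    for c in range(num_cells):
        start, end = c * scan_delay, (c + 1) * scan_delay
        probs.append(normal_cdf(end, mean, sd) - normal_cdf(start, mean, sd))
    return probs
```

With a 1 s scanning delay and a 0.2 s standard deviation, aiming at cell 2 lands in the correct window about 99% of the time when the latency is zero, but a latency of 0.6 s moves most of the probability mass (about 69%) into the following cell.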

An important consideration is how to model false positives. Consider, for example, an Impulse EMG switch that generates false positives when the user’s body temperature rises. If the scanning system is used for 5 minutes, one would expect more false positives than when it is used for 30 seconds. A noise model that takes the false positive rate into account is the homogeneous Poisson process [21]. Under this model, the number of events (spurious clicks) $N$ in a finite time interval of length $T$ always has a Poisson distribution, i.e.,

$$P(N = n) = \frac{(\lambda T)^n e^{-\lambda T}}{n!}, \qquad n = 0, 1, 2, \ldots$$

where $\lambda$ is the average number of false positives per unit time. Hence, the longer one waits, the smaller the probability $P(N = 0) = e^{-\lambda T}$ that no false positive occurs, which is an accurate and important reflection of reality for our application.
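The Poisson false-positive model is straightforward to evaluate and to simulate. The sketch below computes the probability of a given number of spurious clicks in an interval and draws spurious click times using exponential inter-arrival times, which is the standard construction of a homogeneous Poisson process; the rate used in the example is hypothetical.

```python
import math
import random


def prob_no_false_positive(rate, duration):
    """P(N = 0) for a homogeneous Poisson process with `rate` events per second."""
    return math.exp(-rate * duration)


def poisson_pmf(n, rate, duration):
    """P(N = n) spurious clicks in an interval of length `duration` seconds."""
    mu = rate * duration
    return mu ** n * math.exp(-mu) / math.factorial(n)


def sample_false_positives(rate, duration, rng):
    """Draw spurious click times in [0, duration) from exponential inter-arrival
    times (requires rate > 0)."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```

With a hypothetical rate of 0.01 false positives per second, the probability of no spurious click is e^(−0.3) ≈ 0.74 over 30 seconds but only e^(−3) ≈ 0.05 over 5 minutes, in line with the Impulse EMG example above.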

False negatives are represented by a Bernoulli distribution: each switch event is erroneously ignored with a fixed probability. The switch noise parameters, namely the false positive rate and the false negative probability, can be measured in practice and are, in many cases, part of the switch specifications. All noise sources are assumed to operate independently of each other.

It follows that the probability of receiving no click while aiming for a target cell and scanning a given cell is:

(4)
(5)

where

(6)

is the indicator function.

Since a scan either produces a click or it does not, the probability of receiving a click in the same scenario is one minus the probability above:

(7)
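The following is only a simplified sketch of how the three independent noise sources combine during one scan, not a transcription of Equations 4–7. It assumes that no click occurs only if the Poisson process produces no false positive in the window and the intended click (present only when the user intends to select) does not register in the window; the intended click registers with probability (1 − false negative probability) × (Gaussian mass in the window). Unlike the Markov chain, it does not condition on what happened in earlier windows, and all names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Gaussian cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def scan_click_probabilities(window, intended_mean, sd, fn_prob, fp_rate, selecting=True):
    """Return (P(no click), P(click)) while one cell is scanned.

    window        -- (start, end) of the cell's scanning window in seconds
    intended_mean -- mean of the Gaussian intended click time (seconds)
    sd            -- standard deviation of the intended click time
    fn_prob       -- probability that a genuine switch event is ignored
    fp_rate       -- Poisson rate of false positives (per second)
    selecting     -- plays the role of an indicator: False while the user waits
    """
    start, end = window
    # No false positive during the window (homogeneous Poisson process).
    p_no_fp = math.exp(-fp_rate * (end - start))
    # Gaussian mass of the intended click inside the window (zero when waiting).
    q = normal_cdf(end, intended_mean, sd) - normal_cdf(start, intended_mean, sd) if selecting else 0.0
    # The intended click registers only if it lands here and is not a false negative.
    p_registers = (1.0 - fn_prob) * q
    p_no_click = p_no_fp * (1.0 - p_registers)
    return p_no_click, 1.0 - p_no_click
```

For example, while the user is waiting and the switch produces no false positives, the probability of a click is exactly zero; with a 10% false negative probability and precise timing, a click is registered in the target window about 90% of the time.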

The distributions for false positives and false negatives are generic and will probably suit most switches (switch error rates are typically specified in their user manuals). The click-timing distribution, however, is expected to vary between users. For example, the Gaussian can be replaced with a mixture model, where the second component represents an unusual lag in response (e.g., to represent a state of fatigue). Apart from requiring statistical independence, the simulation program is invariant to the choice of noise distributions: replacing the Gaussian requires only recalculating Equation 4 and Equation 6. With the Gaussian example, it is easy to see that the scanning delay should probably be increased linearly with the latency and the standard deviation; otherwise the user can make unintentional selections with a high probability, which must be corrected afterwards.
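The point that only the click-timing terms change when the Gaussian is replaced can be illustrated by computing the window probability from an arbitrary cumulative distribution function. The two-component mixture below, with a hypothetical 20% chance of an extra 1 s lag, is one possible fatigue model; its parameters and the function names are illustrative assumptions.

```python
import math
from typing import Callable


def normal_cdf(x, mean, sd):
    """Gaussian cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(mean, sd, lag, lag_sd, weight_lag) -> Callable[[float], float]:
    """CDF of a two-component Gaussian mixture; the second component models an
    occasional extra lag in response (e.g. fatigue)."""
    def cdf(x):
        return ((1.0 - weight_lag) * normal_cdf(x, mean, sd)
                + weight_lag * normal_cdf(x, mean + lag, lag_sd))
    return cdf


def window_probability(cdf: Callable[[float], float], start, end):
    """P(click time in [start, end)) for any continuous click-timing CDF."""
    return cdf(end) - cdf(start)
```

Aiming at the window [2, 3) s with a mean of 2.5 s and a standard deviation of 0.2 s, the plain Gaussian lands in the target window with probability of about 0.99, whereas the mixture lowers this to about 0.80 and moves roughly 0.19 of the mass into the following window.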

A Markov chain is constructed for each intended word to represent the user actions defined in the manual (for the target group), up to the point where it becomes infeasible to use the scanning system. A word is assumed to be selected once the space (“_”) or full stop (“.”) is selected. The dynamic process terminates when a word is selected or the system fails, so three terminating states are defined. If the system fails, the terminating failure state is reached, as the user is unable to complete the word. The correct state is reached if the selected word matches the intended word, whereas the error state is reached if it does not. The simulation is run over a finite duration (in seconds) per word: if the user takes longer than this to write a word, system failure is assumed (as a user might want to give up trying to write the word). This type of failure is called a time-out failure.

After an erroneous row selection, the user is assumed to wait for a fixed number of column-group scans to undo it. The undo time window is represented by a random variable. During a column scan, it records the number of column-group scans that have passed. After an erroneous letter selection, the user is assumed to immediately attempt to delete it. The number of spurious letter selections is also tracked; if it exceeds the allowed maximum, the system fails.
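The undo and failure rules above can be sketched as two small update functions. The names `after_group_scan` and `after_spurious_letter` are hypothetical. Using `None` for the undo counter during a row scan is also an assumption. `n_undo` and `n_spurious` stand for the undo and spurious-selection parameters in the parameter list that follows.

```python
from typing import Optional, Tuple


def after_group_scan(v: Optional[int], n_undo: int) -> Tuple[str, Optional[int]]:
    """Advance the undo counter when a column-group scan ends without a click.

    v counts the column-group scans that have passed; it is None during a row
    scan. After n_undo group scans the erroneous row selection is undone and
    scanning returns to the rows.
    """
    if v is None:
        raise ValueError("the undo counter is only defined during a column scan")
    v += 1
    if v >= n_undo:
        return "row_scan", None
    return "column_scan", v


def after_spurious_letter(e: int, n_spurious: int) -> Tuple[str, int]:
    """Record one more spurious letter; fail once more than n_spurious are written."""
    e += 1
    if e > n_spurious:
        return "failure", e
    # Otherwise the user immediately tries to delete the spurious letter.
    return "delete_next", e
```

Take `n_undo=2` and `n_spurious=1`, matching the “a_” example discussed later in this section. Two group scans then undo a wrong row selection, and a second spurious letter returns `("failure", 2)`.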

The simulation parameters are summarised by

(8)

where

  • Gaussian click-timing parameters (mean and standard deviation of the click-timing delay).

  • False-negative probability.

  • False-positive rate (per second).

  • Scanning delay (seconds).

  • Number of column scans required to trigger an undo.

  • Number of spurious letter selections before system failure.

  • Maximum time that can be spent on a word before system failure is assumed.
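The parameter set above can be held in a small validated container. The sketch is hypothetical: the class name `ScanParams`, the field names, and the mapping from the maximum word time to a number of time steps (one step per scan) are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanParams:
    """Hypothetical container for the simulation parameters summarised in Equation 8."""
    mu: float             # mean click-timing delay (s)
    sigma: float          # standard deviation of the click-timing delay (s)
    p_false_neg: float    # false-negative probability
    fp_rate: float        # false-positive rate (per second)
    scan_delay: float     # scanning delay (s)
    n_undo: int           # column-group scans required to trigger an undo
    n_spurious: int       # spurious letter selections before system failure
    max_word_time: float  # maximum time spent on a word before failure (s)

    def __post_init__(self) -> None:
        if not 0.0 <= self.p_false_neg <= 1.0:
            raise ValueError("p_false_neg must lie in [0, 1]")
        if self.sigma < 0.0 or self.fp_rate < 0.0:
            raise ValueError("sigma and fp_rate must be non-negative")
        if self.scan_delay <= 0.0 or self.max_word_time <= 0.0:
            raise ValueError("scan_delay and max_word_time must be positive")
        if self.n_undo < 1 or self.n_spurious < 0:
            raise ValueError("n_undo must be >= 1 and n_spurious >= 0")

    def max_time_steps(self) -> int:
        """Scans that fit into max_word_time, assuming one time step per scan."""
        return math.floor(self.max_word_time / self.scan_delay)


if __name__ == "__main__":
    # Illustrative values; fp_rate = 1/120 is one false positive every two minutes.
    params = ScanParams(mu=1.0, sigma=0.1, p_false_neg=0.05, fp_rate=1 / 120,
                        scan_delay=0.5, n_undo=2, n_spurious=1, max_word_time=60.0)
    print(params.max_time_steps())  # 120
```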

A simple layout is shown in Figure 2. The corresponding Markov chain for the word “a_” is shown in Figure 3. Each state is associated with a unique ensemble:

(9)

where , and . The letter associated with cell  in the layout is represented by . A specific intentional letter is written as (the ’th letter in ) and denotes the set of letters to of the input word. For a row scan, and, by definition, . For a column scan and .

Fig. 2: (a) A simple Grid2 layout, where the delete symbol removes the previous output symbol.

State labels of Figure 3 (R/C = row/column scan; i = index of the intended letter in “a_”; DEL denotes the delete symbol):

State  R/C  i  Cell  Intended letter
1      R    1  a     a
2      R    1  t     a
3      C    1  a     a
4      C    1  _     a
5      C    1  t     a
6      C    1  DEL   a
7      C    1  a     a
8      C    1  _     a
9      C    1  t     a
10     C    1  DEL   a
11     R    1  a     a
12     R    1  t     a
13     C    1  a     a
14     C    1  _     a
15     C    1  t     a
16     C    1  DEL   a
17     C    1  a     a
18     C    1  _     a
19     C    1  t     a
20     C    1  DEL   a
21     R    2  a     _
22     R    2  t     _
23     C    2  a     _
24     C    2  _     _
25     C    2  t     _
26     C    2  DEL   _
27     C    2  a     _
28     C    2  _     _
29     C    2  t     _
30     C    2  DEL   _
31     R    2  a     _
32     R    2  t     _
33     C    2  a     _
34     C    2  _     _
35     C    2  t     _
36     C    2  DEL   _
37     C    2  a     _
38     C    2  _     _
39     C    2  t     _
40     C    2  DEL   _
41     Error (terminating)
42     Failure (terminating)
43     Correct (terminating)

Fig. 3: A state diagram visualising the latent states of the layout in Figure 2 for the word “a_”. Each state (rectangle) is associated with an ensemble (Equation 9), where, for readability, R/C indicates a row/column scan. The transitions to the terminating correct, error, and failure states are rendered with green, pink, and blue arrows, whereas all other transitions are rendered as red (click) and black (miss) arrows, with the corresponding click and miss transition probabilities (see Equation 10).

For the example in Figure 2 and Figure 3, the input word is “a_”, an undo requires two column-group scans, and a second spurious letter causes system failure. The intentional letter can be deduced from the letter index and the input word. For example, state 17 represents a column scan while the intentional letter is “a”, and the system is busy scanning cell “a”. One erroneous letter has been written, and one column-group scan has passed. Because an erroneous letter has been written, the user intends to select the delete symbol, but the wrong column group is currently being scanned. The user must therefore first wait for the undo time window to pass, i.e., proceed to state 18 and then to state 11, to undo the erroneous row selection. If the user accidentally selects “a” when at state 17, two erroneous letters will be written and a transition to the system failure state 42 will be made.
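The state numbering of Figure 3 follows a regular pattern. For each letter index and each spurious-letter count, there are two row-scan states followed by one column-scan state per cell for each value of the undo counter. The sketch below enumerates the non-terminating states under that reading. The 2x2 layout `ROWS`, the `"DEL"` label for the delete symbol, and the `Ensemble` field names are assumptions inferred from the figure labels, not the authors' Equation 9.

```python
from typing import List, NamedTuple, Optional


class Ensemble(NamedTuple):
    scan: str         # "R" for a row scan, "C" for a column scan
    i: int            # index of the intended letter in the word (1-based)
    cell: str         # label of the cell currently being scanned
    e: int            # number of spurious letters written so far
    v: Optional[int]  # column-group scans that have passed (None during a row scan)


# Assumed 2x2 layout read off the state labels of Figure 3; "DEL" is the delete symbol.
ROWS = [["a", "_"], ["t", "DEL"]]


def enumerate_states(word: str, n_spurious: int, n_undo: int) -> List[Ensemble]:
    """List the non-terminating states in the order used by Figure 3 (state k at index k-1).

    The terminating states (Error, Failure, Correct) follow the last one; in the
    "a_" example they are states 41 to 43.
    """
    states: List[Ensemble] = []
    for i in range(1, len(word) + 1):
        for e in range(n_spurious + 1):
            # Row scan: one state per row, labelled by the row's first cell.
            states += [Ensemble("R", i, row[0], e, None) for row in ROWS]
            # Column scans: every cell, once for each value of the undo counter.
            for v in range(n_undo):
                states += [Ensemble("C", i, cell, e, v) for row in ROWS for cell in row]
    return states


if __name__ == "__main__":
    word = "a_"
    states = enumerate_states(word, n_spurious=1, n_undo=2)
    print(len(states))  # 40
    s17 = states[17 - 1]
    print(s17, "intended:", word[s17.i - 1])
```

State 17 comes out as a column scan of cell “a” with one spurious letter and one completed group scan, which matches the description above.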

The last three states in the chain represent the terminating states (which have only self-loops with transition probabilities of 1.0). All other (non-terminating) states have only two possible transitions, namely one associated with a click (Equation 7), and the other with a miss (Equation 4). After determining the Markov chain topology, all non-zero transition probabilities can be computed from:

(10)

where the hidden state at each time step is one of the numbered states. A click/miss indicator labels each transition that can result in a state change (applicable to transitions from non-terminating states), whereas a separate value indicates that clicking is not possible (applicable to transitions from terminating states). Initially, the chain is in state 1.

The click and miss probabilities associated with non-terminating states can be computed from Equations 4, 7 and 9 (if there exists a link between the two states). More specifically, the ensemble of the current state is first determined from Equation 9, so that the appropriate click or miss probability can be evaluated. Transition probabilities from terminating states are 1.0. At the last time step, every non-terminating state transitions to the failure state with probability 1.0.
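Once the topology is known, the rules above fill a row-stochastic transition matrix. In the minimal sketch below, every non-terminating state has exactly one click successor and one miss successor, and terminating states carry self-loops with probability 1.0. A separate matrix sends all non-terminating states to failure at the last time step. The function names, the 0-based state indices, and the toy chain are hypothetical. Every state is assumed to be either linked or terminal; otherwise its row would be all zeros.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def build_transition_matrix(n_states: int,
                            links: Dict[int, Tuple[int, int, float]],
                            terminal: Sequence[int]) -> Matrix:
    """Row-stochastic transition matrix of an absorbing chain (0-based states).

    links maps each non-terminating state to (click_next, miss_next, p_click);
    p_click plays the role of Equation 7 and 1 - p_click that of Equation 4.
    Terminating states only have self-loops with probability 1.0.
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for s, (click_next, miss_next, p_click) in links.items():
        P[s][click_next] += p_click
        P[s][miss_next] += 1.0 - p_click
    for s in terminal:
        P[s][s] = 1.0
    return P


def last_step_matrix(n_states: int, terminal: Sequence[int], failure: int) -> Matrix:
    """At the last time step every non-terminating state moves to the failure state."""
    P = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        P[s][s if s in terminal else failure] = 1.0
    return P


if __name__ == "__main__":
    # Toy chain (hypothetical): 0 = row scan, 1 = column scan,
    # 2 = error, 3 = failure, 4 = correct.
    P = build_transition_matrix(5, {0: (1, 0, 0.3), 1: (4, 0, 0.6)}, terminal=[2, 3, 4])
    for row in P:
        print(row)
```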

The probability of any state sequence can be computed from:

(11)

A possible sequence for Figure 3 is as follows. The user selects row “a” too late, thereby accidentally selecting row “t”, waits for the undo time window to pass, but selects column “t” unintentionally, resulting in a spurious output symbol “t”. When trying to delete the spurious “t”, the user accidentally clicks on row “a” and then waits for two column scans to undo the erroneous row selection. The user then manages to delete the spurious “t” and writes “a_” faultlessly. The corresponding state sequence is shown in the second column of Table I.

m v’ v
0 1 - - - - - - - - 1.0
1 1 1 1 a 1 a 0 - 1
2 2 1 1 a 2 t 0 - 1
3 5 0 1 a 1 t 0 0 2
4 11 1 1 a 1 t 1 - 2
5 13 0 1 a 1 a 1 0
6 14 0 1 a 2 _ 1 0
7 17 0 1 a 1 a 1 1
8 18 0 1 a 2 _ 1 1
9 11 1 1 a 1 a 1 - 2
10 12 1 1 a 2 _ 1 - 2
11 15 0 1 a 1 _ 1 0 2
12 16 0 1 a 2 1 0 2
13 1 1 1 a 1 a 0 - 1
14 3 0 1 a 1 a 0 0 1
15 21 1 2 _ 1 a 0 - 1
16 23 0 2 _ 1 a 0 0 2
17 24 0 2 _ 2 _ 0 0 2
18 42 - - - - - - - - 1.0
TABLE I: An example state sequence from the Markov chain in Figure 3. The state-sequence probability is the product of all the probabilities shown in the last column (Equation 11).
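The product in Equation 11 can be evaluated directly from a transition matrix. This minimal sketch sums log-probabilities instead of multiplying, which avoids numerical underflow for the long sequences a full sentence produces. The function name and the toy matrix are illustrative, and the chain is assumed to start in a known initial state with probability 1.0.

```python
import math
from typing import List, Sequence


def sequence_log_prob(P: List[List[float]], states: Sequence[int], initial: int = 0) -> float:
    """Log-probability of a state sequence (s_0, s_1, ..., s_T).

    The chain starts in `initial` with probability 1.0, and each step contributes
    one transition probability; working in logs avoids underflow for long words.
    """
    if not states or states[0] != initial:
        return -math.inf
    logp = 0.0
    for prev, nxt in zip(states, states[1:]):
        p = P[prev][nxt]
        if p == 0.0:
            return -math.inf
        logp += math.log(p)
    return logp


if __name__ == "__main__":
    # Toy two-state chain: state 1 is absorbing.
    P = [[0.7, 0.3], [0.0, 1.0]]
    print(math.exp(sequence_log_prob(P, [0, 0, 1, 1])))  # about 0.21
```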

Computing the probability of reaching a given state requires a summation over all possible state sequences:

(12)

where , and .

There is not a direct mapping between  and , as the user can sometimes miss a click. Thus, all possible have to be considered at each time step. Note that can not change once a terminating state is reached. As a first step towards computing a probability mass function for , let

(13)

where and . Finally,

(14)
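Enumerating every state sequence grows exponentially with the number of time steps. Propagating the state distribution one step at a time (a forward recursion) computes the same marginal probabilities in time linear in the number of steps. The sketch below does this and then assigns any mass that has not terminated to the failure state, mirroring the enforced time-out at the last time step. The function names, the outcome labels, and the toy chain are assumptions.

```python
from typing import Dict, List

Matrix = List[List[float]]


def forward_step(dist: List[float], P: Matrix) -> List[float]:
    """One step of the forward recursion: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def terminal_pmf(P: Matrix, initial: int, terminal: Dict[str, int], n_steps: int) -> Dict[str, float]:
    """Probability of each terminating outcome after n_steps time steps.

    Propagating the state distribution sums over all state sequences implicitly.
    Mass still in a non-terminating state at the end is assigned to "failure"
    (a time-out), so `terminal` must contain a "failure" entry.
    """
    dist = [0.0] * len(P)
    dist[initial] = 1.0
    for _ in range(n_steps):
        dist = forward_step(dist, P)
    pmf = {name: dist[s] for name, s in terminal.items()}
    absorbing = set(terminal.values())
    pmf["failure"] += sum(p for s, p in enumerate(dist) if s not in absorbing)
    return pmf


if __name__ == "__main__":
    # Toy chain: state 0 scans; each step it either completes the word (state 1)
    # or keeps scanning. State 2 is the failure state.
    P = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(terminal_pmf(P, 0, {"correct": 1, "failure": 2}, n_steps=3))
```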

The probability mass function for is computed in a similar way:

(15)

where represents the number of scans associated with state . That is, if is associated with the first cell, otherwise . As an initial condition, . Finally,

(16)

If (i.e., ignoring the extra “tick” sound), then Equation 16 simplifies to:

(17)
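The distribution of the number of scans can be obtained with the same forward pass, run over pairs of (state, scans so far). Each time step in a non-terminating state adds that state's scan count, and the count is frozen once a terminating state is reached. The per-state counts are inputs here; for example, one might count an extra scan for the first cell of a group when the "tick" sound is included, but that mapping is an assumption. The sketch also returns the mean and standard deviation, the summaries drawn in Figure 4. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def scans_pmf(P: Matrix, initial: int, terminal: Sequence[int],
              scans_per_state: Sequence[int], n_steps: int) -> Dict[int, float]:
    """P(total number of scans = k), via a forward pass over (state, scans so far).

    Each time step spent in a non-terminating state s adds scans_per_state[s]
    scans; once a terminating state is reached the count is frozen. Mass that
    has not terminated after n_steps keeps its count (it becomes a time-out).
    """
    absorbing = set(terminal)
    joint: Dict[Tuple[int, int], float] = {(initial, 0): 1.0}
    for _ in range(n_steps):
        nxt: Dict[Tuple[int, int], float] = defaultdict(float)
        for (s, k), p in joint.items():
            if s in absorbing:
                nxt[(s, k)] += p
                continue
            k_new = k + scans_per_state[s]
            for j, p_sj in enumerate(P[s]):
                if p_sj > 0.0:
                    nxt[(j, k_new)] += p * p_sj
        joint = nxt
    pmf: Dict[int, float] = defaultdict(float)
    for (_, k), p in joint.items():
        pmf[k] += p
    return dict(pmf)


def mean_and_std(pmf: Dict[int, float]) -> Tuple[float, float]:
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, math.sqrt(var)


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.0, 1.0]]  # toy chain: state 1 is terminating
    pmf = scans_pmf(P, 0, terminal=[1], scans_per_state=[1, 0], n_steps=3)
    print(pmf, mean_and_std(pmf))
```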

To compute the number of erroneous characters, one can first consider the case where :

(18)

In all other cases () the hidden number of correct letter selections and the hidden number of spurious clicks at each time step  are considered in conjunction with . Let

(19)

where , and . When , a terminating state () can only be reached from a non-terminating state () if the user clicked at the previous time step (). To represent termination with a spurious click, let:

(20)

where and represent the third and fourth elements of the ensemble given by Equation 9.

Once any terminating state has been reached, the user cannot click any more, and the numbers of correct and erroneous letter selections remain fixed if the simulation is continued. To model the latter scenario, let

(21)

where

(22)

If the system has not terminated at the last time step , a transition to the failure state is enforced, so that:

(23)

Finally,

(24)

where .

An example distribution of the number of scans is given in Figure 4.

Fig. 4: Example result for “standing_” using the configuration in Figure 1. The distribution defined by Equation 16 is shown (black). The parameters (see Equation 8) are s, s, , /s (one false positive every two minutes on average), , s. The red line (at 77 scans) indicates the best possible outcome, which can be verified by manually counting the number of scans needed to write the word, including the extra “tick” sound. The blue lines indicate the mean and standard deviation of the number of scans.

V-B Model Parameters

This section motivates the parameter values that were tested. According to an expert field analyst [22], it is rare that a non-speaking individual with motor disabilities can use a hierarchical scanning system with s, mainly due to the latency associated with detecting a switch event such as an eyebrow raise. It is assumed that a novice user who can use a facial-gesture type of switch well would start at an average of about s (0.57 wpm) [6] and progress to an average of about s (1.7 wpm) in noiseless conditions.

We carried out a pilot study with a single participant to determine approximate values for in different conditions, and to validate our simulation model. The participant used our custom-made scanning system, controlled by the space bar on a standard keyboard. Over several hours of practice, the participant was gradually trained to use the system blindfolded. Synthetic noise was added to the software at the end of the pilot study. Since the participant reached expert-level performance before the noise was added, the effect of the synthetic noise could be isolated. This allowed us to compare the pilot results directly against our simulation results for validation purposes.

Synthetic noise was added by sampling from the click-time and switch-noise distributions. For example, after the user clicked, a time delay was sampled from the click-timing distribution and added to the user’s actual click time. Likewise, false positives were generated from a Poisson process. Each true click was rejected with the false-negative probability and accepted otherwise.
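The noise-injection procedure described above can be sketched as follows. The sketch is illustrative, not the pilot-study software. True clicks are dropped with the false-negative probability, and surviving clicks are shifted by a Gaussian delay. False positives come from a Poisson process, simulated with exponential inter-arrival times. The function name, its arguments, and the example values are hypothetical.

```python
import random
from typing import List, Sequence


def add_synthetic_noise(true_clicks: Sequence[float], mu: float, sigma: float,
                        p_false_neg: float, fp_rate: float, duration: float,
                        rng: random.Random) -> List[float]:
    """Return noisy switch events (seconds) for a list of true click times."""
    events: List[float] = []
    for t in true_clicks:
        # Each true click is rejected with the false-negative probability.
        if rng.random() < p_false_neg:
            continue
        # Accepted clicks are delayed by a sample from the click-timing distribution.
        events.append(t + rng.gauss(mu, sigma))
    # False positives: a Poisson process, i.e. exponential inter-arrival times.
    if fp_rate > 0.0:
        t = rng.expovariate(fp_rate)
        while t < duration:
            events.append(t)
            t += rng.expovariate(fp_rate)
    return sorted(events)


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_synthetic_noise([1.0, 3.0, 5.0], mu=0.5, sigma=0.05,
                              p_false_neg=0.1, fp_rate=1 / 120, duration=600.0, rng=rng))
```

Passing a seeded `random.Random` instance keeps a noisy session reproducible, which helps when comparing pilot results against simulation.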

At the beginning of the pilot study, the participant represented an able-bodied novice user, i.e., the user could use the system error-free with ms. Before the synthetic noise was added, the participant had enough training to represent an expert able-bodied user (with ms). The results of the pilot study are presented in Figure 5.

Fig. 5: Box-and-whisker plots of the audio pilot study, indicating the 25th, 50th, and 75th percentiles. Results for two sessions are shown in (a)-(b). Each session was 15 minutes long. Each session is numbered (x-axis). (a) Results for a trained participant (with at least two hours of practice) communicating in an environment with little noise, where was varied. The first session was recorded when the user had at least one hour of practice. The last session was recorded when the user could comfortably use the system blindfolded. (b) Results for the same participant as in (a), but simulating a non-speaking individual with motor disabilities (by including synthetic noise and using the system blindfolded). Session 1 represents a user who can click precisely, but with some latency; s, ms, , , and s. The latency during Session 2 was increased and some false positives were randomly generated with s, ms, and /s, and s. Results from (b) can be compared to results from Session 2 in (a), as this was the result when the user had the most experience in a noiseless environment.

We focus on simulations that can be used for a variety of facial gestures, where the user may have a long latency, but is able to click precisely (with small variance). Motivated by our pilot study, the Grid2 reference manual [6] and measurements from existing literature (e.g., [19]), the click-timing parameter ranges were set to seconds, and ms during our simulations.

V-C Modelling Results

During all simulations we computed results for the pangram “the quick brown fox jumps over the lazy dog .”, where “_” is used to enter a space. This sentence is used to test algorithms in many papers, as it contains all the letters of the alphabet and all its words occur frequently in English. In all simulations .

The first simulation investigates the robustness to variations in latency . Results are shown in Figure 6.

Fig. 6: The click-timing delay () is varied for ms, , /s (on average, one false positive every 17 minutes). The black lines represent results for a fixed s and variable . The green lines represent the results when is varied according to , where seconds (i.e., avoiding error corrections due to a click-timing delay).

The green graph indicates that it is significantly better to increase the scanning delay linearly with the click-timing delay than to correct the errors that occur when the click-timing delay is too large compared to the scanning delay. This result is consistent with previous work in the literature [4].

For validation purposes, note that s, , and for s, , which is consistent with our pilot study text entry rates for the same , as shown in Figure 5(b).

The second simulation tested the influence of varying the scanning delay. The delay was set to a large value and gradually decreased to a small value while the other parameters were kept fixed. Results are shown in Figure 7.

     
Fig. 7: The effect of varying the scanning delay is investigated for s, ms, , /s (one false positive every 17 minutes). Average output results are shown in the direction of increasing (starting at ). The direction is indicated by the thin arrow line.
     
Fig. 8: The effect of varying the false-positive rate is investigated for ms, ms, , and ms. Average output results are shown in the direction of increasing (starting at ). The direction is indicated by the thin arrow line.

Figure 7 indicates that many erroneous characters eventually lead to system failure, as the text entry rate and error rate are both high. There is a small working range for the program, where a reasonable accuracy () and click rate () can be expected. This performance range was measured for . A practical example would be ms and ms when used by an expert able-bodied user. This is easily achievable with our (and other) typical scanning software, especially when used in visual mode. When ms (used by a novice able-bodied user), ms is required, and when s (used by an expert non-speaking individual with motor disabilities), ms is required. We noted that the standard deviation increases as a user experiences fatigue (when using a facial gesture such as blinking to communicate). This highlights the potential benefit of taking the full click-time distribution into account to determine the scanning delay dynamically.

The third simulation tested the effect of false positives. Results are shown in Figure 8. The scanning system started to degrade substantially at /s. It is unlikely that more error-correction actions will increase robustness to a high false-positive rate. If the scanning system is able to take the full noise distributions into account, it might be able to ignore certain clicks and thereby avoid having to undo them afterwards.

VI Improving Scanning using a Noise Model

In this section we present a novel modification to a standard scanning system that could benefit users with long response times. We call the proposed method the fast-scan method, and the standard scanning method the slow-scan method. As mentioned before, the standard scanning system is based on the most basic scanning system recommended by Grid2.

We use our model of scanning systems to measure the efficacy of incorporating an explicit noise model (in the form of a probability distribution) into the text entry interface. For illustrative purposes, false positives are ignored: computing their probabilities for the fast-scan method involves more work and is the topic of a future study.

Assuming that the average click-timing delay is known (it can be estimated by measurement), the idea is as follows. Instead of generating a switch event as soon as a click is received, defer the decision until the end of a group scan, and then use Bayesian inference to infer the user’s intentions. As a start, choose the most probable cell according to the well-known maximum a posteriori (MAP) decision rule:

(25)

which is obtained by maximising the posterior probability over the cells in the group, assuming a uniform prior. Waiting until the end of a group scan makes it possible to subtract the average response time before doing inference.
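As an illustration of the decision rule, the sketch below computes the posterior over the cells of one group from a single click time, assuming a uniform prior and a Gaussian click-timing delay. With a uniform prior, the MAP cell is the maximum-likelihood cell. It also assumes that cell k (0-based) starts k × `fast_delay` seconds into the group. Because every likelihood uses the same spread, subtracting the mean delay and picking the nearest cell start gives the same answer. For an interior cell, the decision is then correct whenever the delay deviates from its mean by less than half a cell. The function names and the timing convention are hypothetical, and the longer final cell is not modelled beyond its start time.

```python
import math
from typing import List


def cell_posterior(click_time: float, n_cells: int, fast_delay: float,
                   mu: float, sigma: float) -> List[float]:
    """Posterior over the cells of one group, given a click time measured from
    the start of the group scan, a uniform prior and a Gaussian delay N(mu, sigma^2).

    Assumes cell k (0-based) starts k * fast_delay seconds into the group, so a
    user aiming at cell k clicks at k * fast_delay + delay.
    """
    # Log-likelihoods; the Gaussian normalising constant is shared and cancels.
    logs = [-0.5 * ((click_time - mu - k * fast_delay) / sigma) ** 2 for k in range(n_cells)]
    m = max(logs)
    weights = [math.exp(x - m) for x in logs]  # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def map_cell(click_time: float, n_cells: int, fast_delay: float,
             mu: float, sigma: float) -> int:
    """MAP decision: with a uniform prior this is the maximum-likelihood cell."""
    post = cell_posterior(click_time, n_cells, fast_delay, mu, sigma)
    return max(range(n_cells), key=lambda k: post[k])


def p_correct_interior(fast_delay: float, sigma: float) -> float:
    """P(MAP picks the intended interior cell) = P(|delay - mu| < fast_delay / 2)."""
    return math.erf(fast_delay / (2.0 * sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative: 1.0 s mean latency, 0.1 s spread, 0.3 s per fast-scan cell.
    print(map_cell(1.6, n_cells=4, fast_delay=0.3, mu=1.0, sigma=0.1))  # 2
    print(round(p_correct_interior(0.3, 0.1), 4))
```

The example shows the decoupling the section relies on. A one-second mean latency does not force one-second cells: only the spread of the delay, relative to the cell duration, determines the error probability.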

From a simulation point of view, the states in our state diagram are the same as for the slow-scan method, but the transition probabilities are different. More specifically, the miss probability is one for all cells except the last one in the group. At the end of the last scan in a group, the click probability from Equation 7 is computed for all cells as before, but with the scanning delay associated with each cell. However, after computing as before, compute the final .

As before, represents the probability of error associated with the click-timing distribution (the probability to click while scanning cell , with the intention to select cell ). To compute use Equation 6, and set .

The click-timing distribution, which is used to specify this probability of error, can be any distribution that is continuous in time, but for comparative purposes it is initially assumed to be Gaussian. Using a Gaussian measures the difference in performance when both methods know the range within which the received click times will fall.

Since false positives are ignored, the only other addition to the algorithm that is necessary, is that (see Equation V-A), where .

Figure 9 depicts the reduced set of transition links (compared to Figure 3).

The simulation of the fast-scan method requires an extra parameter:

(26)

where is the scanning delay of all the cells except the last cell in the group. If an additional “tick” sound is included when using the system in audio mode, the scanning delay of the first cell will be seconds. The scanning delay of the last cell in a group is seconds. In general, we let , which should ensure a small probability of error, where is the scanning delay of the slow-scan method. To summarize:

The goal is to get an indication of the performance in the case where both the old and new interface know exactly what the noise distribution is.

VI-A Modelling the Average Response Time

In the first set of simulations, the effects of false positives and of the standard deviation of the click-timing response are ignored for illustrative purposes. As a first example, assume that the false negative probability , the false positive rate , and that , so that . In this case, the number of scans can be counted manually for both methods and compared to the results from our simulation model for verification purposes. The expected text entry rate (number of scans) from the simulation should converge to the same value with a probability close to 1.0.
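The manual count mentioned above can be sketched for the noiseless case. The sketch makes three assumptions. In the slow-scan method, the user clicks as soon as the target row and then the target cell are highlighted, with an optional extra scan per group for the "tick" sound. In the fast-scan method, every row group and every column group is scanned in full, because the decision is deferred to the end of the group. The 2x2 layout is an assumed reading of Figure 2. The counts alone do not decide which method is faster, because the non-final fast-scan cells are shorter. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Layout = List[List[str]]


def positions(rows: Layout) -> Dict[str, Tuple[int, int]]:
    """Map each symbol to its 1-based (row, column) position."""
    return {sym: (r + 1, c + 1) for r, row in enumerate(rows) for c, sym in enumerate(row)}


def slow_scan_scans(word: str, rows: Layout, tick_per_group: bool = False) -> int:
    """Noiseless slow-scan count: scan rows until the target row, then columns
    until the target cell; optionally one extra scan per group for a "tick"."""
    pos = positions(rows)
    total = 0
    for ch in word:
        r, c = pos[ch]
        total += r + c + (2 if tick_per_group else 0)
    return total


def fast_scan_scans(word: str, rows: Layout) -> int:
    """Noiseless fast-scan count: the decision is deferred to the end of each
    group, so every row group and every column group is scanned in full."""
    pos = positions(rows)
    return sum(len(rows) + len(rows[pos[ch][0] - 1]) for ch in word)


if __name__ == "__main__":
    layout = [["a", "_"], ["t", "DEL"]]  # assumed reading of Figure 2
    print(slow_scan_scans("a_", layout), fast_scan_scans("a_", layout))  # 5 8
```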

By deferring the decision to the end of a group scan and trusting the noise model, the scanning delay is decoupled from the click-timing delay at all cells except the last one in the group. The last scanning delay in the group must be at least as long as the average click-timing delay (when ignoring false positives and the standard deviation of the click-timing delay).

[Figure 9(a): state diagram for the fast-scan method, with the same 43 states as Figure 3 (40 non-terminating states plus the Error (41), Failure (42) and Correct (43) terminating states) but a reduced set of transition links. Panels (b)-(e): comparisons of the slow-scan and fast-scan methods.]
Fig. 9: (a) The state diagram for the fast-scan method. (b)-(e) Comparing the slow-scan (black) and fast-scan (red) methods. In all cases