Using Behavioral Interactions from a Mobile Device to Classify the Reader's Prior Familiarity and Goal Conditions

04/24/2020 · Sungjin Nam, et al. · Adobe

A student reads a textbook to learn a new topic; an attorney leafs through familiar legal documents. Each reader may have a different goal for, and prior knowledge of, their reading. A mobile context, which captures interaction behavior, can provide insights about these reading conditions. In this paper, we focus on understanding the different reading conditions of mobile readers, as such an understanding can facilitate the design of effective personalized features for supporting mobile reading. With this motivation in mind, we analyzed the reading behaviors of 285 Mechanical Turk participants who read articles on mobile devices with different familiarity and reading goal conditions. The data was collected non-invasively, only including behavioral interactions recorded from a mobile phone in a non-laboratory setting. Our findings suggest that features based on touch locations can be used to distinguish among familiarity conditions, while scroll-based features and reading time features can be used to differentiate between reading goal conditions. Using the collected data, we built a model that can predict the reading goal condition (67.5%) significantly more accurately than a baseline model. Our model also predicted the familiarity level (56.2%) marginally more accurately than the baseline. These findings can contribute to developing an evidence-based design of reading support features for mobile reading applications. Furthermore, our study methodology can be easily expanded to different real-world reading environments, leaving much potential for future investigations.


1. Introduction

Readers interact with documents in distinct ways based on various levels of expertise or prior familiarity with a document’s content. For example, a person may read to learn about a new topic or may reread a document to refresh their memory. The reading behaviors exhibited in these two scenarios can be quite different, even for the same reader. Readers may also differ in their reading goals. Some may be looking for specific information, while others read to develop a deeper understanding of the topic. These different reading goals can lead to different reading strategies and behaviors. For information-finding, skimming the document can be more efficient. On the other hand, obtaining an in-depth understanding may require more careful reading and integrating information across the text.

Understanding different types of reading is a prerequisite for building personalized reading tools that can help users be more effective at their tasks by adapting to the content and the readers’ goals. Prior studies have suggested support features for mobile reading, including varying font sizes by reading goal (Wang et al., 2018), identifying difficult sentences that have longer dwell times (Oh et al., 2014), and highlighting the text to restore the reader’s attention and improve comprehension and engagement (Mariakakis et al., 2015). However, these support features were limited to identifying the reader’s state (e.g., struggling, distracted) when interacting with the reading material. In this study, we suggest methods that can identify the reader’s initial condition going into the reading, specifically the reading goal and familiarity with the topic.

Though reading occurs in many contexts (e.g., books, desktop displays, E-readers), our study focuses on reading in the mobile context. Compared to other devices, such as desktop, people prefer mobile devices for particular types of reading, such as reading news articles (Mitchell and Weisel, 2017). Compared to paper-based media, reading on mobile devices has advantages in portability and easy access, but disadvantages in reduced readability and ease of navigation (Shimray et al., 2015). Recognizing the reader’s prior familiarity or reading goal conditions would be crucial for the mobile reading system to determine more effective personalized support features and improve the mobile reading experience.

In this paper, we introduce methods to capture two reading conditions: reader familiarity with the text content (familiar vs. unfamiliar) and reading goal (literal vs. contextual reading). We run our studies in a non-laboratory setting with hundreds of participants recruited through Amazon's Mechanical Turk (MTurk) to approximate reading behaviors in-the-wild on a general population pool. Our statistical analyses of user interactions during reading on mobile devices show that the different reading conditions lead to differences in touch and scroll interaction patterns (see Fig. 1). We also develop machine learning models that can automatically predict the reading conditions from user interactions, achieving 56% accuracy in predicting familiarity level and 68% accuracy in predicting reading goals. Our work opens up exciting possibilities for future reading tools to incorporate reading condition prediction to automatically adjust reading formats (Baudisch et al., 2004; Wallace et al., 2019; Chi et al., 2005) and personalize supporting materials (Kumbruck, 1998; DeStefano and LeFevre, 2007).


(a) Touch Features (b) Mobile Device (c) Scroll Features

Figure 1. From the mobile device (b), we collected interaction signals related to reading time, touch features (a), and scroll features (c). Touch features are represented as X/Y screen coordinates. Scroll features include scroll distance, speed, direction (regular downward scrolling, or regression scrolling backward in the document), and time spent between scrolls. The colors in the graphs indicate example interactions captured from two participants (P001 and P002) who had different assigned reading goals (blue for C: contextual and orange for L: literal) but read the same (U: unfamiliar) article. The contextual reader (P001_UC) scrolled in bursts, while the literal reader (P002_UL) used continuous scrolling, touching the screen more consistently in the bottom right region.

2. Related Work

2.1. Distinguishing Different Types of Readers

Readers may have different prior familiarity with the article topic they are to read about. In previous studies, familiarity with the article’s topic was considered a significant factor for the reader’s engagement level (Pedro et al., 2013) and comprehension level (McNamara and Kintsch, 1996). Other studies (DeStefano and LeFevre, 2007; Ozuru et al., 2009) have suggested that more coherent text can improve reading comprehension for readers with low prior knowledge levels, while readers with more prior knowledge understand less structured and less consistent documents better. The reader’s prior knowledge can also affect how the reader consumes and interacts with the article (O’Brien et al., 2016). In this paper, we chose the familiarity level as the first of our reading conditions.

One way to control a participant’s familiarity is to exploit their domain expertise. For example, studies have used the degree program of undergraduate students to determine their familiarity with an article (McNamara and Kintsch, 1996; Ozuru et al., 2009). However, this experimental setting relies on each participant’s background experience, which limits the articles and domains that can be used in the experiment. In contrast, in our study, we categorized familiarity into two conditions, familiar and unfamiliar, and manipulated the condition by controlling a participant’s prior exposure to the article’s contents. In this way, we could control the familiarity condition more easily with diverse article topics and participants.

Readers may also have different reading goals based on their circumstances. Studies of taxonomies in education (Bloom et al., 1956) and reading (Barrett, 1968) suggest there is a hierarchy of comprehension goals. From literal comprehension to synthesizing and applying new knowledge, different reading goals can significantly affect reading behavior. For example, if the reader's goal is to extract relevant information within a limited time frame, the reader might skim the article until she is satisfied with the findings (Duggan and Payne, 2011). If the reader's goal is rather to build comprehensive knowledge about a topic, she may explore the article more exhaustively and try to mentally connect relevant parts together (Sullivan and Puntambekar, 2015). Understanding the reader's goal would be necessary for a reading system to suggest different supporting features that can facilitate a more literal or contextual comprehension of the article. In this study, we simplified reading goals into two levels: literal (read to recognize or identify explicit information) and contextual (read to provoke deeper thinking, or to make a critical judgment). We controlled the reading goal condition by the instruction given to participants before they read an article.

Overall, for this paper, we use four different types of reading conditions, combining familiarity and reading goals into a 2x2 framework that can cover a wide variety of reading scenarios (Table 1). More details about the experiment settings can be found in Section 4.1.

| Familiar | Unfamiliar
Literal Goal | A paralegal reads a contract while searching for specific legal phrases. | A student reads a textbook chapter to answer fill-in-the-blank questions on a new topic.
Contextual Goal | A researcher reads an academic paper to get new ideas related to a topic that she has expertise in. | A government official reads a report on a new proposal in order to make budgeting decisions.
Table 1. Example reading scenarios for different familiarity and reading goal conditions.

2.2. Using Implicit Signals to Understand Reading Behaviors

Previous studies used various methods to record behavioral signals to understand the reader’s cognitive state. For instance, attention signals, such as eye movements, have often been considered a proxy of different cognitive states (Mak and Willems, 2019; León et al., 2019). Mak and Willems (2019) showed that different types of mental simulations during reading, such as simulating motor movements vs. perceptual recognition of the paragraph just read, are related to the duration of gaze. León et al. (2019) suggested that different types of instructions provided before reading (i.e., reading goal) can elicit different fixation and saccade patterns. These results show how low-level attention features are related to various cognitive states. However, these studies require the use of external sensors (e.g., eye trackers) in controlled lab settings to collect data from participants. In our study, we focus on behavioral features that can be collected from any mobile device and environmental setting.

Other studies in human-computer interaction show more convenient ways to predict different user states. Implicit signals, such as dwell time (Lagun and Lalmas, 2016) or mouse movements (Guo and Agichtein, 2008), were used to determine the reader’s attention location or intention in desktop web browsing. Reading on a mobile device is very different from reading physical media or reading on a desktop display because the screen size is smaller, and the interaction methods are different (Oh et al., 2014). Prior studies have used interactions unique to mobile devices, such as scrolling behavior and touch locations, to understand reading behaviors on mobile devices for inferring different levels of comprehension and engagement (Guo and Wang, 2018), predicting satisfaction with visual aspects of the document (Wang et al., 2018), and identifying important sentences from the article (Oh et al., 2014).

Mobile devices are making reading more accessible to different parts of the world, thereby increasing the total time people spend reading (West and Ei, 2014). Although previous studies have investigated different cognitive states of readers on mobile devices, including engagement (Guo and Wang, 2018) or satisfaction (Wang et al., 2018), there were no studies that examined the relationship between implicit behavioral interactions and the reader’s initial condition going into the reading. Understanding the reader’s prior condition, such as familiarity and reading goals, can be crucial for designing a reading system with personalized features. In our study, we suggest experimental methods to control different reading conditions and collect behavioral interaction features from a mobile device in a non-laboratory setting. We also investigate if we can predict the reader’s condition based on behavioral interaction signals that are collected non-invasively.

3. Research Questions

Motivated by previous studies, we put forth three research questions to address in this paper. These research questions investigate the relationship between behavioral interaction signals that are recorded from mobile devices across reading conditions (i.e., familiar or unfamiliar reading, and literal or contextual goals).

  • RQ1: Are there differences in behavioral signals recorded from mobile devices across reading conditions?

  • RQ2: Can we use behavioral signals to predict a person’s reading condition?

  • RQ3: Can we use nested experimental conditions and individual differences to boost prediction performance?

The first research question explores how each behavioral interaction variable that we measure during reading can be distinguished by the reading conditions. Previous studies showed that behavioral interactions, such as scroll patterns or touch locations, can be used to infer various cognitive states, including satisfaction (Wang et al., 2018) or engagement (O’Brien et al., 2016) with the content. In our study, we show how each behavioral signal recorded from a mobile phone can be used to differentiate the reading conditions that are related to the reader’s goal and familiarity with the topic.

The second and third research questions investigate if we can build computational models that can effectively predict different reading conditions. For the second research question, we develop a lasso logistic regression model with behavioral interaction predictors. The results from our feature importance analysis show which features are more important for predicting each reading condition type.

Based on a preliminary study (Appendix C), we observed different individual scrolling styles: frequent vs. occasional scrollers. Taking into account behavioral interactions at an individual level (i.e., incorporating individual reading styles) may therefore lead to extra insights. Motivated by this observation, the third research question focuses on developing a more sophisticated mixed-effect model that additionally includes variables for nested experimental conditions and individual differences in reading.

4. Reading Conditions and Behavioral Interactions

For our study, we defined different reading conditions to represent the reader’s prior familiarity and reading goals. We then extracted various behavioral features based on interaction signals to analyze and predict these reading conditions. This section describes how all these variables are operationalized and recorded by the application used in our experiments.

4.1. Reading Conditions

In this study, we used a 2x2 framework to capture the reader’s familiarity levels (familiar vs. unfamiliar) and reading goals (literal vs. contextual). Figure 2 shows how we manipulated the familiarity levels and reading goal conditions in our experiments. All participants completed a practice session at the beginning of the experiment, whereby they read a short reading passage, corresponding to an article summary. We used this practice session to manipulate the familiarity condition for a given study. Specifically, we either had the summary correspond to an article the participant would read in the main study (familiar condition) or be about a different article altogether (unfamiliar condition).


(a) Familiarity Condition (b) Reading Goal Condition

Figure 2. The familiarity condition (a) was controlled by exposing participants, during the practice phase, to a summary of the article they would read later. In the unfamiliar condition, a summary of an unrelated article would be presented instead. The reading goal condition (b) was controlled by changing the wording of the instructions and sample questions provided before the article.


To control the reading goal condition, we presented different instructions before the article. The instructions informed participants that two comprehension questions would be presented after reading. Each question was multiple-choice with three options. For the literal reading condition, participants were told that the answers to the comprehension questions could be found “directly and explicitly from the article”. Also, the exact question phrasings were provided to encourage the literal search process during reading. On the other hand, for the contextual reading condition, participants were told that answering the questions would “involve combining an understanding of the article with your knowledge and intuitions”. Sample question phrasings were provided (as guidance), but were not identical to the questions asked at the end of the reading.

4.2. Experiment Application and Behavioral Interactions

We used a web-based application to show different articles and comprehension questions to participants. While a participant read articles inside the application on a mobile device, the application also collected data, including meta-data about the accessing device and behavioral signals based on scroll and touch interactions during reading. The data features collected were used in different ways, including for filtering participant data and for analyzing the results of the preliminary and main studies.

4.2.1. Browser Information

As the participant’s mobile phone loaded the web-page for reading, our application recorded the browser’s metadata, including the browser’s name, the screen size (width and height) of the device, and each article’s width and height as rendered on the device. As described later in Section 5.3, this data was useful to confirm the authenticity of data collected from mobile phones. It also included information about how each article was rendered on different screens, which was essential to track which paragraphs were displayed on the screen and compare recorded features from different devices.

4.2.2. Raw Interaction Signals

A benefit of using interaction signals as features is that they can be collected non-invasively from a mobile phone, without using external sensors like eye tracking devices. Our application recorded various signals during reading, including timestamps, the vertical location of the scroll bar, and X/Y touch locations. Data was recorded every time the vertical location of the scroll bar changed by more than 1% of the accessing device’s screen height. When a participant finished reading an article, data was saved as a JSON file in the application’s server.
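
To make the recording rule concrete, the sketch below replays it over a saved session log. This is a minimal sketch, assuming a hypothetical JSON schema (field names like "t" and "scroll_y" are illustrative; the paper does not specify its log format):

```python
import json

# Hypothetical log format: a JSON array of events, each like
# {"t": <ms timestamp>, "scroll_y": <scroll bar position in px>,
#  "touch_x": <px or null>, "touch_y": <px or null>}.
# Field names are illustrative, not the paper's actual schema.

def replay_recording_rule(path, screen_height_px):
    """Keep an event only when the scroll bar has moved by more than 1% of
    the device's screen height since the last kept event (the recording
    threshold described above, applied here as a post-hoc filter)."""
    with open(path) as f:
        events = json.load(f)
    kept, last_y = [], None
    for ev in events:
        if last_y is None or abs(ev["scroll_y"] - last_y) > 0.01 * screen_height_px:
            kept.append(ev)
            last_y = ev["scroll_y"]
    return kept
```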

4.2.3. Extracting Predictive Features

Based on the raw interaction signals recorded, we extracted features that capture meaningful information about reading behavior. Specifically, we extracted four types of behavioral interaction features: Reading Time, Scroll, Regression Scroll, and Touch Location. All extracted features were log-transformed and standardized for the analysis. The full list of features is given in Table 2.

Reading Time features record how much time a participant spent reading an article. For the analysis, we calculated the total time spent on the article and the (average and standard deviation of) time spent per paragraph, excluding the instructions. All reading time features were normalized by the heights of article paragraphs as rendered on each participant's device.

Scroll features include the frequency, travel distance, and speed of each scroll sequence. We considered a set of vertical scroll bar locations as a single scroll sequence if they shared the same touch location that initiated the scrolling. Scroll information can be interpreted as a proxy for the participant's attention, as it indicates the participant's viewport per unit time as the reading progresses (Oh et al., 2014).

Regression Scroll features are similar to Scroll, but only consider scroll sequences in the opposite direction to the text flow. This corresponds to points during reading where a participant revisits earlier parts of the article.

Touch Location features measure the relative distribution of X/Y locations for touch inputs. We arbitrarily divided the device’s screen into three horizontal (left, mid, and right) and three vertical (low, mid, and high) locations, and counted the frequencies of touch interactions in each location block. To disregard the differences between participants’ dominant hands, we grouped off-center (left and right) inputs together and compared them in frequency to central (mid-horizontal) touch locations. Frequency information per location was recorded as a fraction of the total number of touch interactions.
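
As a worked example of the feature definitions above, the sketch below derives scroll-sequence and touch-location features from parsed logs. Data structures and bin boundaries follow the descriptions in this section, but all names are illustrative rather than the authors' code, and the subsequent log transform and standardization are omitted:

```python
import numpy as np

def scroll_sequence_features(sequences):
    """`sequences`: one list per scroll sequence of (timestamp_s, scroll_y_px)
    samples that share the initiating touch. Returns Scroll and Regression
    Scroll summaries (frequency, travel distance, speed)."""
    dists, speeds = [], []
    for seq in sequences:
        travel = seq[-1][1] - seq[0][1]               # signed px moved
        duration = max(seq[-1][0] - seq[0][0], 1e-6)  # avoid divide-by-zero
        dists.append(abs(travel))
        speeds.append(abs(travel) / duration)
    # Regression scrolls move against the text flow (back up the page).
    n_regression = sum(1 for seq in sequences if seq[-1][1] < seq[0][1])
    return {
        "scroll_freq": len(sequences),
        "dist_avg": float(np.mean(dists)), "dist_std": float(np.std(dists)),
        "speed_avg": float(np.mean(speeds)), "speed_std": float(np.std(speeds)),
        "regression_freq": n_regression,
    }

def touch_location_features(touches, screen_w, screen_h):
    """`touches`: (x, y) coordinates of touch initiations. Screen y grows
    downward, so the 'low' region has the largest y values. Left and right
    are grouped together to disregard the participant's dominant hand."""
    xs = np.array([t[0] for t in touches], dtype=float)
    ys = np.array([t[1] for t in touches], dtype=float)
    y_low = float(np.mean(ys > 2 * screen_h / 3))
    y_mid = float(np.mean((ys > screen_h / 3) & (ys <= 2 * screen_h / 3)))
    y_high = float(np.mean(ys <= screen_h / 3))
    x_off_center = float(np.mean((xs <= screen_w / 3) | (xs > 2 * screen_w / 3)))
    return {
        "y_low": y_low, "y_mid": y_mid, "y_high": y_high,
        "y_std": float(np.std([y_low, y_mid, y_high])),
        "x_off_center": x_off_center,
    }
```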

Feature Type | Feature Name | F vs. U | C vs. L
Reading Time | Total article read time | | C < L
Reading Time | Read time per paragraph (Avg., Std.) | F < U (Std.) | interaction (Avg.)
Scroll | Scroll frequency | |
Scroll | Time between scroll sequences (Avg., Std.) | |
Scroll | Travel distance per scroll sequence (Avg., Std.) | | C < L (Avg.), (Std.)
Scroll | Scrolling speed per sequence (Avg., Std.) | | C < L (Avg.), (Std.)
Regression Scroll | Scroll frequency | | C < L, interaction
Regression Scroll | Travel distance per scroll sequence (Avg., Std.) | | C < L (Avg.)
Regression Scroll | Scrolling speed per sequence (Avg., Std.) | | C < L (Avg.), (Std.)
Regression Scroll | Revisiting the instruction paragraph | | C < L
Touch Location | X-Axis: Ratio of touch initialization from the left or right sides | |
Touch Location | X-Axis: Std. of left/mid/right touch ratios | |
Touch Location | Y-Axis: Ratio of touch initialization from the low screen | F > U |
Touch Location | Y-Axis: Ratio of touch initialization from the mid screen | F < U |
Touch Location | Y-Axis: Ratio of touch initialization from the high screen | |
Touch Location | Y-Axis: Std. of low/mid/high touch ratios | F > U |
Table 2. The list of extracted behavioral interaction features. Based on a two-way ANOVA analysis, Reading Time and Scroll features were significantly different across reading goal conditions (contextual vs. literal reading). Touch features were significantly different across familiarity level conditions (familiar vs. unfamiliar).

5. Main Study Design

Based on our initial investigations, we sought to launch large-scale crowdsourcing studies to investigate how participant reading behaviors, measured by reading time, scroll, and touch features, differ by familiarity and reading goal conditions.

5.1. Pre-screening Questionnaire

For the main study, we recruited participants from Amazon's Mechanical Turk (MTurk). First, we developed a pre-screening questionnaire that served as a qualification task for the main study. The questionnaire collected information about a participant's native language, demographic information (e.g., age group, education level), and daily reading-related behaviors (e.g., frequency of reading for work and leisure, types of reading done, reading device usage). Participants' geographic locations were limited to native English-speaking countries. We collected 1,129 pre-screening responses. All participants were paid $0.30 for their responses. More details on the questionnaire responses can be found in Appendix E.

5.2. Experiment flow

We conducted two experiments (Experiment 1 and Experiment 2). Both experiments shared the same experimental settings while utilizing different articles and comprehension questions. In total, we collected data on four different articles. An English reading specialist curated and shortened articles from Project Gutenberg (https://www.gutenberg.org) and composed comprehension questions, norming them at an eighth-grade level. We used four history articles for our experiments. Experiment 1: Manhattan in the Year 1609 (533 words), The Beginning of American Railroads (456 words); Experiment 2: Water Vessels in the Pueblo Region (453 words), Navaho Houses (465 words).

An experiment consisted of three parts: an initial survey, a practice session, and a test session. The survey asked participants about the device they were using, the time of participation, and for a brief description of their current environment (time of day, type of light, and name of the mobile device). Participants were instructed not to use other plug-ins for reading or the browser's back button during the task, to prevent navigation between the article and comprehension questions (the application was also explicitly programmed to prevent this with a warning). Participants were told to take as much time as needed for reading.

After the survey, participants were guided to the practice session. In the practice session, participants became familiar with the task by reading a summary paragraph (about 65 words long) and answering comprehension questions. The practice session provided two sets of summary paragraphs and comprehension questions. In the test session, participants were also presented with two sets of articles and comprehension questions. The test articles were about 500 words long. At the end of the study, participants received a code to enter on the original study page to receive compensation. The study took approximately 11 minutes. Participants received $3.00 USD, and a bonus of $0.50 if they successfully followed the instructions.

5.3. Participants and Data Filtering

Based on the pre-screening questionnaire, we selected MTurk workers whose native language was English and who resided in English-speaking countries. We recruited workers with HIT approval rates above 98% and more than 1,000 successfully completed HITs. Participants who fit these criteria and accepted our HIT were provided with a URL to open on a mobile device.

Since our study involved a relatively complex reading task on a mobile device, we carefully selected data for analysis (Appendix B). Initially, 372 MTurk workers (Experiment 1: 181, Experiment 2: 191) began our task, and 342 of them (Experiment 1: 170, Experiment 2: 172) completed it. Among those who completed the task, we filtered out participants who did not properly follow the instructions, as detailed below.

The first filtering criterion was reading time (R. Time). We filtered out participants who spent too little (less than 30 seconds) or too much (more than 300 seconds) time on any article. Traditional reading speed studies suggest that the average reading speed for native English speakers is around 250 words per minute (Fry, 1963; Nation, 2009). As our experimental conditions may differ somewhat from those of previous studies, we set more generous bounds that admit reading speeds two times faster or slower than regular reading.

The second criterion was the accessing device’s name (Dev. Name). We wanted to keep this study strictly on mobile. Based on the browser metadata that we collected (Section 4.2.1), we filtered out participants’ data if the browser agent information did not contain the string “iPhone” or “Android”.

The third criterion was the window size ratio (Wndw. Size). We filtered out data if the article was read in landscape orientation, or with other irregular screen size settings.

The fourth criterion was if behavioral interactions were successfully recorded (Input Rec.). Missing records happened if the participant’s device was too old to be compatible with our experiment application, or the participant read the article without touch interactions (e.g., using a desktop browser with a mobile browser profile).
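
Taken together, the four criteria amount to a simple predicate per participant. A minimal sketch follows, with hypothetical field names for the session record:

```python
def passes_filters(session):
    """Apply the four filtering criteria from Section 5.3 to one participant's
    session record (a dict; field names are hypothetical)."""
    ok_time = all(30 <= t <= 300 for t in session["article_read_times_s"])  # R. Time
    agent = session["user_agent"]
    ok_device = ("iPhone" in agent) or ("Android" in agent)                 # Dev. Name
    ok_window = session["window_w"] < session["window_h"]  # portrait only (Wndw. Size)
    ok_inputs = bool(session["events"])                    # interactions recorded (Input Rec.)
    return ok_time and ok_device and ok_window and ok_inputs
```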

As a result, we ended up using 285 participants’ data (Experiment 1: 141 participants, 77.90% of recruited participants; Experiment 2: 144 participants, 75.39%) for the analysis. Note that we did not use the comprehension questions as a filtering criterion (more details can be found in Appendix B.1).

6. Main Study Results

6.1. Post-Experiment Survey

For a subset of participants (52 participants from Experiment 1, 60 from Experiment 2), we asked some follow-up questions right after the experiment. The questionnaire asked participants to reflect on the different reading conditions that they were assigned to, as well as whether and how they read differently per condition.

The first two questions asked about the summaries provided in the practice session. Of the participants who read at least one familiar-condition article (i.e., read a summary in the practice session that corresponded to an article in the test session), many (74%) reported that the summary gave them a quick idea of the longer article, made the main article easier to skim, and helped them remember the details better. These results indicate that participants made use of the fact that the summary familiarized them with the topic of the article.

The next two survey questions asked about the reading goals that were assigned via the instruction wording. For these questions, a majority of participants (58%) reported that the instructions did not affect their reading (i.e., that they read normally) and that they did not use any special strategies given the relatively short article lengths. These results might imply that our instructions were not effective in guiding participants toward a particular goal, or that introspective self-assessment of reading behavior is difficult. The last survey question asked about the perceived difficulty of the comprehension questions. Most participants reported that the difficulty was OK (89%), and only a few found the questions too easy (11%). No participants reported that the questions were too hard.

6.2. Comparing Observed Features between Reading Conditions

Recall RQ1: Are there differences in behavioral signals recorded from mobile devices across reading conditions?

To answer this research question, we conducted a two-way ANOVA comparing which behavioral features (including interactions) differ by either familiarity level or reading goal (Table 2). For the familiarity level (F vs. U), we found that participants reading familiar articles tended to touch the lower part of the screen more, and were less consistent (i.e., higher standard deviation) in their vertical touch locations. We did not find any significant interactions between the familiarity and reading goal conditions.

Across reading goal conditions (C vs. L), contextual reading corresponded to more linear reading behaviors. With the contextual reading goal, scrolling distance was more consistent (i.e., lower standard deviation), and scrolling speed was slower and more consistent than for the literal reading goal. The contextual reading goal also correlated with less reading time, fewer regression scrolls (for revisiting previous paragraphs and instructions), longer regression scrolling distance, and slower and more consistent regression scrolling speed. These results can be interpreted to mean that the contextual reading goal elicited more linear navigation during reading: regression scrolls were less frequent (but longer when needed), eventually leading to a shorter overall reading time.

We also conducted a one-way mixed ANOVA for each familiarity condition and reading goal separately. This analysis incorporates the repeated measures in our study design, such as each participant's fixed condition, to more faithfully reflect how conditions were assigned. However, we found no features to be significantly different across conditions in this analysis (Appendix D).

In summary, we found that touch location-based features can effectively distinguish between the familiarity conditions. Features based on scrolling and reading time were also effective for differentiating the reading goal conditions.
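
For reference, the per-feature comparison above corresponds to a standard two-way ANOVA. Below is a minimal sketch using statsmodels, assuming a long-format DataFrame with one row per participant-article reading and illustrative column names:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# df columns (illustrative): the log-transformed, standardized feature value,
# plus factors familiarity in {"F", "U"} and goal in {"C", "L"}.
def two_way_anova(df, feature):
    """Main effects of familiarity and reading goal, plus their interaction,
    for one behavioral feature."""
    model = smf.ols(f"{feature} ~ C(familiarity) * C(goal)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)  # Type-II ANOVA table with p-values

# e.g., two_way_anova(df, "speed_avg") yields rows for C(familiarity),
# C(goal), and C(familiarity):C(goal).
```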

6.3. Predicting Reading Conditions

In the previous section we identified differences in behavioral interactions across reading conditions. To answer the second and the third research questions, we developed prediction models to investigate if these behavioral interactions can be used together to predict each article’s familiarity level or reading goal conditions.

Recall RQ2: Can we use behavioral signals to predict a person’s reading condition?

6.3.1. Non-Mixed-Effects Modeling

We trained Logistic Regression models with an L1 penalty (Lasso-LR) to predict either the familiarity level or the reading goal individually. The penalty term discourages over-fitting by driving multiple feature coefficients to zero. We selected the L1 penalty of each Lasso-LR model through a nested cross-validation process. First, we divided the data into ten outer folds. Second, within each outer fold, we randomly divided the training set into ten inner folds and selected the L1 penalty that produced the best prediction accuracy across the ten validation sets. Finally, we selected the L1 penalty that performed the best across the ten outer folds. After deciding on the L1 penalty, we conducted a regular 10-fold cross-validation on the same dataset to obtain prediction results on each fold's test set.
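
A minimal scikit-learn sketch of this kind of procedure is shown below. It uses standard nested cross-validation (an inner grid search per outer fold) rather than the paper's exact two-stage selection of a single penalty, and the `C` grid is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

def nested_cv_lasso_lr(X, y, seed=0):
    """X: standardized behavioral features; y: binary condition labels
    (e.g., contextual vs. literal). Returns test accuracy per outer fold."""
    lasso_lr = LogisticRegression(penalty="l1", solver="liblinear")
    inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(
        lasso_lr,
        param_grid={"C": np.logspace(-2, 2, 10)},  # C is the inverse L1 strength
        cv=inner,
    )
    return cross_val_score(search, X, y, cv=outer)
```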

Condition | Metric | Baseline | Lasso-LR | GLMER
F vs. U | Accuracy | 0.511 (0.472, 0.549) | 0.562 (0.505, 0.618) | 0.556 (0.530, 0.582)
F vs. U | Recall (N+P) | - | 0.560 (0.505, 0.616) | 0.476 (0.431, 0.520)
F vs. U | Precision (N+P) | - | 0.561 (0.503, 0.618) | 0.462 (0.358, 0.567)
F vs. U | F1 (N+P) | - | 0.553 (0.494, 0.611) | 0.390 (0.348, 0.432)
C vs. L | Accuracy | 0.519 (0.501, 0.538) | 0.675 (0.629, 0.721) | 0.672 (0.646, 0.699)
C vs. L | Recall (N+P) | - | 0.673 (0.625, 0.721) | 0.663 (0.633, 0.694)
C vs. L | Precision (N+P) | - | 0.680 (0.631, 0.728) | 0.709 (0.661, 0.757)
C vs. L | F1 (N+P) | - | 0.670 (0.622, 0.718) | 0.651 (0.620, 0.681)
Table 3. Average performance of predicting the familiarity level (F vs. U) and reading goal (C vs. L) conditions with Lasso-LR and GLMER models. Recall, precision, and F1 scores are reported as the average of negative (N) and positive (P) labels. These scores are omitted for the baseline models, since those models always predicted the same label. The numbers in parentheses are 95% confidence intervals. The best prediction score, if significantly better than the baseline (or another model), is marked in bold. Scores only marginally better than the baseline (or another model) are italicized.

Table 3 shows the average prediction performance of the baseline and the Lasso-LR models across 10-folds. The baseline models represent the prediction performance without using any behavioral interaction features (i.e., predicting the majority label). For predicting the familiarity level (F vs. U), our Lasso-LR model performed marginally better than the baseline accuracy. For predicting the reading goal (C vs. L), the Lasso-LR model performed significantly better than the baseline accuracy.

Lasso-LR | Reading Time | Scroll | Regression Scroll | Touch Location
F vs. U Accuracy | - | - | - | -0.068 (-0.116, -0.020)
Recall (N+P) | - | - | - | -0.073 (-0.117, -0.030)
Prec. (N+P) | - | - | - | -0.246 (-0.353, -0.140)
F1 (N+P) | - | - | - | -0.182 (-0.262, -0.101)
C vs. L Accuracy | -0.109 (-0.154, -0.064) | -0.029 (-0.076, 0.017) | -0.000 (-0.035, 0.035) | 0.002 (-0.036, 0.040)
Recall (N+P) | -0.111 (-0.157, -0.066) | -0.028 (-0.077, 0.020) | -0.000 (-0.036, 0.035) | 0.001 (-0.036, 0.038)
Prec. (N+P) | -0.111 (-0.160, -0.062) | -0.024 (-0.071, 0.023) | -0.000 (-0.035, 0.035) | 0.003 (-0.037, 0.044)
F1 (N+P) | -0.119 (-0.165, -0.074) | -0.031 (-0.082, 0.020) | -0.000 (-0.036, 0.035) | 0.001 (-0.036, 0.038)

GLMER | Reading Time | Scroll | Regression Scroll | Touch Location
F vs. U Accuracy | -0.007 (-0.031, 0.016) | -0.018 (-0.031, -0.004) | 0.011 (-0.014, 0.035) | -0.011 (-0.035, 0.013)
Recall (N+P) | 0.004 (-0.022, 0.030) | -0.002 (-0.028, 0.024) | -0.015 (-0.057, 0.026) | 0.020 (-0.020, 0.059)
Prec. (N+P) | -0.008 (-0.063, 0.047) | -0.012 (-0.080, 0.056) | -0.064 (-0.142, 0.014) | 0.007 (-0.091, 0.105)
F1 (N+P) | -0.009 (-0.038, 0.020) | -0.012 (-0.044, 0.019) | -0.028 (-0.070, 0.014) | -0.006 (-0.044, 0.032)

Table 4. Lasso-LR and GLMER model ablation experiments: average differences in prediction performance when each feature type is removed from the corresponding model. The smaller the value, the more the feature contributed to the full model. Recall, precision, and F1 scores are reported as the average of negative (N) and positive (P) labels. The numbers in parentheses are 95% confidence intervals. Contributions significantly different from 0 are marked in bold. If no significant contributions were found, the best marginal contributions per score type are italicized. Some scores for the Lasso-LR model are omitted where coefficients were estimated as zero with the selected penalty term.

Table 4 includes a more detailed analysis of each feature’s contribution to the corresponding Lasso-LR model. For predicting the familiarity level (F vs. U), touch location features were more important than other features in contributing to the Lasso-LR model’s accuracy, precision, and F1 scores. These results match the ANOVA results from Section 6.2, showing that touch location features are closely related to a reader’s familiarity levels with the reading topic.

For predicting the reading goals (C vs. L), reading time was the most important feature type, significantly contributing to the model's accuracy, precision, and F1 scores. Scroll-based features also marginally contributed to the model's recall score. The results for predicting the reading goal conditions were slightly different from the ANOVA results in Section 6.2, which showed that many scroll-based features were significantly different between the contextual and literal reading goal conditions. Touch location features were consistently less important in both the ANOVA and Lasso-LR analyses.

Recall RQ3: Can we use nested experimental conditions and individual differences to boost the prediction performance?

6.3.2. Mixed-Effects Modeling

From the pilot study, we observed that individual readers might have distinctive scrolling styles when reading on a mobile device. For example, some engage in frequent scrolling, while others only occasionally scroll during reading (Figure 1, Appendix C). Motivated by these observations, we developed a mixed-effects logistic regression model (GLMER) to predict reading conditions and address the third research question. The GLMER models are expected to capture biases that may be caused by these repeated measures from the experimental settings, and to account for individual scrolling styles when using scroll-based features for prediction. We used the lme4 package (Bates et al., 2015) to build the GLMER models (we set the parameter nAGQ=0 to help with convergence).
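
The paper fits these GLMER models with lme4 in R. As a rough Python analogue only, and not the authors' specification, the sketch below uses statsmodels' Bayesian binomial mixed GLM with a variance component for the participant's scrolling style; all column names are illustrative:

```python
import pandas as pd
import statsmodels.api as sm

def fit_mixed_logit(df: pd.DataFrame):
    """df columns (illustrative): is_literal (0/1), standardized behavioral
    features, and scroll_style in {"frequent", "infrequent"}."""
    model = sm.BinomialBayesMixedGLM.from_formula(
        "is_literal ~ read_time_avg + speed_avg",      # fixed effects
        vc_formulas={"style": "0 + C(scroll_style)"},  # variance component by scroll style
        data=df,
    )
    return model.fit_vb()  # variational Bayes estimation
```

In R, the equivalent analysis would typically call glmer() from lme4 with a random-effects term for the grouping factor.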

Figure 3 shows how the GLMER model interpreted behavioral signals of readers with different scrolling styles, even when they had the same reading goal condition. We selected the features from each feature type with at least marginal contributions to predicting the reading goal condition (Table 4). For example, the plot illustrates that for a frequent scroller (i.e., one who generates more scroll sequences than the median), the literal reading goal may evoke more consistent reading time per paragraph, a longer and more inconsistent regression scroll distance, and faster but less consistent scrolling speed. For an infrequent scroller, the same literal reading goal may lead to the opposite behavioral patterns.

Figure 3. Each plot shows how the random slope coefficients for the behavioral features from the GLMER model are negatively (red) or positively (blue) related to the literal reading condition, given the participant's reading style (frequent or infrequent scroller). The numbers on the x-axis are log odds.

For the GLMER model, scroll and regression scroll features were the most important features for predicting the familiarity level. For predicting the reading goal, reading time and scroll features were the most important (Table 4). However, Table 3 shows that the GLMER model did not improve upon the Lasso-LR model in predicting the familiarity and reading goal conditions. Although the model can capture behavioral patterns that vary by individual scrolling style, seeing prediction benefits may require further investigation of how behavioral features interact within the different reading conditions, as well as a more thorough feature selection process for better model convergence.

7. Discussion

From our model analysis results, we can identify areas for future study. First, based on the study results, we can suggest a few personalized reading support features that may help mobile readers. From the ANOVA analysis (Section 6.2), we found that features based on scroll behaviors are closely related to the reading goal conditions. Providing highlights or summaries along with an article can reduce regression scrolling and non-linear navigation behaviors in mobile reading; we hypothesize that with an overview of an article, participants do not need to go back and forth in the text as much to integrate the content. Considering touch locations and the limited screen size of a mobile phone, we noted that when touching the upper portions of the screen, a participant's hand might obscure more of the visible text. Reminding readers to keep a larger viewing area unobstructed could help, especially when reading less familiar articles on mobile devices, by providing more visible context at a time.

Second, more differentiated experimental settings may expose larger differences in behavioral interactions across reading conditions. We used different instructions to control the reading goal conditions. However, based on the post-survey results, many participants did not report reading the article differently depending on the instructions. One factor may be the short length of our articles (about 500 words); some participants noted that they did not read the articles differently based on instructions because the articles were not long (Section 6.1). Similarly, the relatively low model performance for predicting familiarity conditions might be related to the nature of the articles. The articles we used were explicitly designed to be short and to cover general topics, which means participants may not have required specific prior knowledge to understand them. In future studies, we can modify the experimental setting, for instance limiting the reading time or using more cognitively demanding follow-up tasks, to see whether reading behaviors can be better differentiated by reading condition under more realistic constraints.

Third, our experimental method can be expanded to different reading scenarios. For this study, we used historical articles as stimuli. In the future, we are interested in investigating reading behaviors on specialized articles, such as scientific reports or legal contracts, under more goal-driven, time-constrained, and other professional work situations. On the other hand, leisurely reading can evoke very different reading behaviors because the reader self-selects the material (e.g., fiction, blog articles, social network feeds) and may not have immediate reading comprehension goals in mind. Also, non-linear reading across multiple documents is a popular use case for information gathering, research, and literature search. It would be interesting to investigate how to facilitate such reading tasks in a mobile environment.

Lastly, there are some limitations to this study. First, we used a 2x2 framework to model possible reading conditions, but reading behavior outside the experiment can be affected by many other factors. For example, engagement and motivation with the specific topic (O'Brien et al., 2016), or environmental (e.g., distractions) and physiological factors (e.g., sleepiness, hunger) (Schunk and Mullen, 2012; Skinner and Pitzer, 2012), can affect comprehension and learning levels. Identifying more comprehensive factors relating to reading conditions would be a promising area for future work. Second, our models for predicting the familiarity condition did not perform significantly better than the baseline. More exogenous predictive features might yield better prediction results. Also, we only tried a limited number of machine-learning algorithms; other models, such as tree-based models, SVMs, or deep neural networks, might be promising candidates.

This paper provides an initial step to understand how different behavioral features are related to various prior conditions of mobile reading. Future applications would gain the largest benefits from models that can make early predictions based on a small set of initial interactions, in order to efficiently customize the reading experience for the user, and help the user more quickly accomplish their reading goals.

8. Conclusion

As more people read on mobile devices, accurately predicting different reading conditions for mobile readers will be important for designing personalized support tools. In this paper, we presented user studies that used behavioral interaction signals to understand different reading conditions for mobile reading. We analyzed behavioral interaction data, collected from 285 MTurk participants non-invasively, and predicted different familiarity and reading goal conditions. Our findings suggest that features based on touch locations can be useful for distinguishing a user’s familiarity with the topic. Scroll-based features and reading time are also helpful for distinguishing the reading goal (whether participants read content contextually or literally). We designed computational models that can predict the familiarity level conditions (56.2%) and reading goal conditions (67.5%) more accurately than a baseline model. The findings from our study can contribute to future evidence-based designs of reading support features for mobile reading applications.


References

  • Barrett (1968) Thomas C Barrett. 1968. Taxonomy of Cognitive and Affective Dimensions of Reading Comprehension. What is Reading (1968).
  • Bates et al. (2015) Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67, 1 (2015), 1–48.
  • Baudisch et al. (2004) Patrick Baudisch, Bongshin Lee, and Libby Hanna. 2004. Fishnet, a fisheye web browser with search term popouts: a comparative evaluation with overview and linear view. In Proceedings of the working conference on Advanced visual interfaces, AVI 2004, Gallipoli, Italy, May 25-28, 2004. 133–140.
  • Bloom et al. (1956) Benjamin S Bloom et al. 1956. Taxonomy of Educational Objectives: Handbook 1. Cognitive Domain. New York: McKay (1956), 20–24.
  • Chi et al. (2005) Ed Huai-hsin Chi, Lichan Hong, Michelle Gumbrecht, and Stuart K. Card. 2005. ScentHighlights: highlighting conceptually-related sentences during reading. In Proceedings of the 10th International Conference on Intelligent User Interfaces, IUI 2005, San Diego, California, USA, January 10-13, 2005. 272–274.
  • DeStefano and LeFevre (2007) Diana DeStefano and Jo-Anne LeFevre. 2007. Cognitive load in hypertext reading: A review. Computers in Human Behavior 23, 3 (2007), 1616–1641. https://doi.org/10.1016/j.chb.2005.08.012
  • Duggan and Payne (2011) Geoffrey B. Duggan and Stephen J. Payne. 2011. Skim reading by satisficing: evidence from eye tracking. In Proceedings of the International Conference on Human Factors in Computing Systems, CHI 2011, Vancouver, BC, Canada, May 7-12, 2011. 1141–1150.
  • Fry (1963) Edward Fry. 1963. Teaching faster reading: a manual. University Press.
  • Guo and Agichtein (2008) Qi Guo and Eugene Agichtein. 2008. Exploring mouse movements for inferring query intent. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20-24, 2008. 707–708.
  • Guo and Wang (2018) Wei Guo and Jingtao Wang. 2018. Understanding Mobile Reading via Camera Based Gaze Tracking and Kinematic Touch Modeling. In Proceedings of the 2018 on International Conference on Multimodal Interaction, ICMI 2018, Boulder, CO, USA, October 16-20, 2018. 288–297.
  • Kumbruck (1998) Christel Kumbruck. 1998. Hypertext reading: novice vs. expert reading. Journal of Research in Reading 21, 2 (1998), 160–172.
  • Lagun and Lalmas (2016) Dmitry Lagun and Mounia Lalmas. 2016. Understanding User Attention and Engagement in Online News Reading. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, February 22-25, 2016. 113–122.
  • León et al. (2019) José A. León, José David Moreno, Inmaculada Escudero, Ricardo Olmos, Marcos Ruiz, and Robert F. Lorch Jr. 2019. Specific relevance instructions promote selective reading strategies: evidences from eye tracking and oral summaries. Journal of Research in Reading 42, 2 (2019), 432–453. https://doi.org/10.1111/1467-9817.12276
  • Mak and Willems (2019) Marloes Mak and Roel M Willems. 2019. Mental simulation during literary reading: Individual differences revealed with eye-tracking. Language, Cognition and Neuroscience 34, 4 (2019), 511–535.
  • Mariakakis et al. (2015) Alexander Mariakakis, Mayank Goel, Md Tanvir Islam Aumi, Shwetak N. Patel, and Jacob O. Wobbrock. 2015. SwitchBack: Using Focus and Saccade Tracking to Guide Users’ Attention for Mobile Task Resumption. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI 2015, Seoul, Republic of Korea, April 18-23, 2015. 2953–2962.
  • McNamara and Kintsch (1996) Danielle S McNamara and Walter Kintsch. 1996. Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes 22, 3 (1996), 247–288.
  • Mitchell and Weisel (2017) Amy Mitchell and Rachel Weisel. 2017. Americans’ Attitudes About the News Media Deeply Divided Along Partisan Lines. Technical Report. Pew Research Center.
  • Nation (2009) Paul Nation. 2009. Reading Faster. International Journal of English Studies 9, 2 (2009).
  • O’Brien et al. (2016) Heather L. O’Brien, Luanne Freund, and Richard W. Kopak. 2016. Investigating the Role of User Engagement in Digital Reading Environments. In Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval, CHIIR 2016, Carrboro, North Carolina, USA, March 13-17, 2016. 71–80.
  • Oh et al. (2014) JongHwan Oh, SungJin Nam, and Joonhwan Lee. 2014. Generating highlights automatically from text-reading behaviors on mobile devices. In CHI Conference on Human Factors in Computing Systems, CHI’14, Toronto, ON, Canada - April 26 - May 01, 2014, Extended Abstracts. 2317–2322.
  • Ozuru et al. (2009) Yasuhiro Ozuru, Kyle Dempsey, and Danielle S McNamara. 2009. Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learning and instruction 19, 3 (2009), 228–242.
  • Pedro et al. (2013) Maria Ofelia Clarissa Z. San Pedro, Ryan Shaun Joazeiro de Baker, Sujith M. Gowda, and Neil T. Heffernan. 2013. Towards an Understanding of Affect and Knowledge from Student Interaction with an Intelligent Tutoring System. In Artificial Intelligence in Education - 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013. Proceedings. 41–50.
  • Schunk and Mullen (2012) Dale H Schunk and Carol A Mullen. 2012. Self-Efficacy as an Engaged Learner. In Handbook of Research on Student Engagement. Springer, 219–235.
  • Schwartz et al. (2019) Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. 2019. Green AI. arXiv preprint arXiv:1907.10597 (2019).
  • Shimray et al. (2015) Somipam R Shimray, Chennupati Keerti, and Chennupati K Ramaiah. 2015. An overview of mobile reading habits. DESIDOC Journal of Library & Information Technology 35, 5 (2015).
  • Simsarian (2019) Kristian T. Simsarian. 2019. Design education can change the world. Interactions 26, 2 (2019), 36–43. https://doi.org/10.1145/3305362
  • Skinner and Pitzer (2012) Ellen A Skinner and Jennifer R Pitzer. 2012. Developmental Dynamics of Student Engagement, Coping, and Everyday Resilience. In Handbook of Research on Student Engagement. Springer, 21–44.
  • Sullivan and Puntambekar (2015) Sarah A. Sullivan and Sadhana Puntambekar. 2015. Learning with digital texts: Exploring the impact of prior domain knowledge and reading comprehension ability on navigation and learning outcomes. Computers in Human Behavior 50 (2015), 299–313. https://doi.org/10.1016/j.chb.2015.04.016
  • Vajjala and Lucic (2019) Sowmya Vajjala and Ivana Lucic. 2019. On Understanding the Relation between Expert Annotations of Text Readability and Target Reader Comprehension. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, BEA@ACL 2019, Florence, Italy, August 2, 2019. 349–359.
  • Wallace et al. (2019) Shaun Wallace, Rick Treitman, Nirmal Kumawat, Kathleen Arpin, Jeff Huang, Ben Sawyer, and Zoya Bylinskii. 2019. Fonts for Interlude Reading: Improving Readability in the Digital Age. https://shaunwallace.org/readability/. (Accessed on 09/20/2019).
  • Wang et al. (2018) Junxiang Wang, Jianwei Yin, Shuiguang Deng, Ying Li, Calton Pu, Yan Tang, and Zhiling Luo. 2018. Evaluating User Satisfaction with Typography Designs via Mining Touch Interaction Data in Mobile Reading. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April 21-26, 2018. 113.
  • West and Ei (2014) Mark West and Chew Han Ei. 2014. Reading in the mobile era: A study of mobile reading in developing countries. UNESCO.

Appendix A Assigning Experimental Conditions

To keep the task of a reasonable length, each participant only read two articles in the experiment session. We could not test all four possible conditions of familiarity levels and reading goals with a single participant. Instead, we randomly chose each participant to be tested on one of the dimensions (manipulating either familiarity or reading goal) while keeping the other dimension fixed.

Fixed Cond. | Within Cond. (practice) | Within Cond. (experiment)
Table 5. Example assignments of reading conditions (Experiment 1, using articles 1 and 2). Each participant was assigned to one of the conditions in a round-robin way. A single fixed condition was shared during the entire experiment, while within conditions alternated between the articles in the session. The order of articles was counterbalanced. The same design was applied to Experiment 2, except using articles 3 and 4 for the experiment session. (Contextual and literal reading goals are noted as C and L. Familiar and unfamiliar conditions are noted as F and U. Numbers represent the article number.)

To reduce the number of possible combinations of articles, we preassigned the summary-article pairs for the unfamiliar condition. For example, for all participants who read Article 1 in the experiment session with the unfamiliar condition, the summary used in the practice session corresponded to Article 3. Similarly, Article 2 was paired with Article 4. When a participant began our study and submitted the initial survey questions, one of the conditions from Table 5 was assigned in a round-robin way. For example, if the participant was assigned the familiar condition (F) as a fixed condition, the reading goal conditions (literal (L) or contextual (C)) were selected for each article as a within condition. The order of articles was counterbalanced to reduce any ordering effects.
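
To illustrate, round-robin assignment can be implemented with a cycling iterator. The condition tuples below are a hypothetical reconstruction following the description above (fixed condition plus counterbalanced within-condition order), not Table 5's exact rows:

```python
from itertools import cycle

# Hypothetical reconstruction: fix one dimension (F/U or C/L) and alternate
# the other across the two experiment articles, counterbalancing order.
_CONDITIONS = cycle([
    ("F", ("C", "L")), ("F", ("L", "C")),
    ("U", ("C", "L")), ("U", ("L", "C")),
    ("C", ("F", "U")), ("C", ("U", "F")),
    ("L", ("F", "U")), ("L", ("U", "F")),
])

def assign_condition():
    """Return the next participant's fixed condition and the per-article
    order of the within condition."""
    fixed, within_order = next(_CONDITIONS)
    return {"fixed": fixed, "within_order": within_order}
```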

Appendix B Filtering Participants

We ended up using 285 participants' data (Experiment 1: 141 participants, 77.90% of the recruited participants; Experiment 2: 144 participants, 75.39%) for the analysis. Table 6 breaks down the number of participants that entered and completed the study, as well as the final number used for analysis. We list the number of participants filtered out due to various criteria (reading time, device name, mobile window size, and whether behavioral interactions were successfully recorded; see Sec. 5.3). Rows of the table further break these numbers up by experimental condition to validate that the filtering did not significantly skew the number of participants left for analysis in each condition.

Fixed Cond.               No. Subjects                   No. Filtered
                          Entered  Completed  Analyzed   R. Time  Dev. Name  Wndw. Size  Input Rec.
Exp 1  Familiarity   F    44       43         40         2        2          3           3
                     U    44       42         33         4        4          5           3
       Goal          C    46       44         36         4        4          4           2
                     L    47       41         32         5        2          4           4
       (Total)            181      170        141        15       12         16          12
Exp 2  Familiarity   F    47       45         37         5        2          3           1
                     U    49       42         38         3        3          3           3
       Goal          C    48       45         38         7        5          1           1
                     L    47       40         31         2        1          6           6
       (Total)            191      171        144        17       11         13          11
Table 6. The number of recruited and filtered-out participants in Experiment 1 and Experiment 2. The four filtering criteria under No. Filtered are not mutually exclusive.
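As a rough illustration of this filtering step, the criteria could be applied to a participant-level table as below. The column names and thresholds here are hypothetical placeholders, not the study's actual values (those are given in Sec. 5.3 of the paper).

    import pandas as pd

    def filter_participants(df: pd.DataFrame) -> pd.DataFrame:
        # All column names and thresholds below are illustrative
        # placeholders for the four criteria described above.
        keep = (
            df["total_read_time_s"].between(60, 3600)           # plausible reading time
            & df["device_name"].str.contains("iPhone|Android")  # known mobile device
            & (df["window_width_px"] <= 500)                    # mobile-sized viewport
            & df["interactions_recorded"]                       # events logged successfully
        )
        return df[keep]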

b.1. Comprehension Question Results

An English reading specialist carefully normed the comprehension questions used for our studies. Based on the post-experiment survey responses, participants did not find the comprehension questions too difficult. Despite this, we noticed significant differences in response quality across reading conditions. Across both familiarity conditions, participants had significantly lower response accuracy scores on the literal reading questions. Specifically, for the familiar articles, the average response accuracy for the contextual questions was 84.18%, compared to 44.08% for the literal questions (Kolmogorov-Smirnov test). Similar results were found for the unfamiliar article questions (contextual: 83.72% vs. literal: 43.29%, Kolmogorov-Smirnov test). These results indicate that the difficulty of the multiple-choice questions was not equal across the two question types. Regardless, since the level of comprehension was not within the scope of our study, we did not include the comprehension question results in further analysis, nor use them for data filtering (to avoid skewing the final participant numbers per condition).
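A two-sample Kolmogorov-Smirnov test of this kind can be reproduced with SciPy; in the sketch below, each list would hold one accuracy score per participant, and the values shown are made up for illustration.

    from scipy import stats

    # Illustrative per-participant response accuracies (made-up values).
    contextual_acc = [0.9, 1.0, 0.8, 0.7, 1.0, 0.9]
    literal_acc = [0.5, 0.4, 0.6, 0.3, 0.5, 0.4]

    # Two-sample KS test comparing the two accuracy distributions.
    stat, p_value = stats.ks_2samp(contextual_acc, literal_acc)
    print(f"KS statistic = {stat:.3f}, p = {p_value:.4f}")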

Appendix C Initial Investigations

c.1. Pilot Studies

We ran two pilot studies to explore the possibility of using implicit interaction signals for distinguishing different types of reading. The goal of the first study was to investigate whether eye gaze and scroll behaviors could be used as features to differentiate types of reading. We recruited two participants in our lab and asked them to read articles and answer comprehension questions on a mobile phone (iPhone XR). Both participants were user-experience researchers. We controlled for familiarity by having them read a design article (Simsarian, 2019) (familiar condition) and a machine-learning article (Schwartz et al., 2019) (unfamiliar condition). Both articles were about 2000 words. During the reading, we collected participant gaze data (using a Pupil Core eye-tracker, https://pupil-labs.com/products/core/) and behavioral interaction data (using a preliminary version of our web interface).

In the second pilot study, we tested our experimental web application in a non-laboratory setting. We recruited 33 participants from MTurk. Each participant was assigned to one of the four conditions (combinations of familiar/unfamiliar and contextual/literal) and read two articles. We used four shorter news articles from The Guardian (about 500 words each) (Vajjala and Lucic, 2019). Participants spent an average of 12 minutes on the entire task and were paid $1.50. For the second pilot study and the main study, we used jQuery 3.4.1, Flask 1.0.3, and SQLite to build the experiment application, which was hosted on Amazon Web Services.
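While the application code itself is not part of the paper, the logging pipeline implied by this stack can be sketched as a Flask endpoint that receives interaction events from the jQuery client and writes them to SQLite. The route, table schema, and field names below are hypothetical.

    import sqlite3
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    DB_PATH = "interactions.db"  # hypothetical database file

    def init_db():
        # One row per touch/scroll event sent by the client.
        with sqlite3.connect(DB_PATH) as con:
            con.execute(
                """CREATE TABLE IF NOT EXISTS events (
                       participant_id TEXT, event_type TEXT,
                       x REAL, y REAL, scroll_y REAL, ts REAL)"""
            )

    @app.route("/log", methods=["POST"])
    def log_event():
        # The jQuery client would POST one JSON record per event.
        e = request.get_json()
        with sqlite3.connect(DB_PATH) as con:
            con.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
                (e["participant_id"], e["event_type"],
                 e.get("x"), e.get("y"), e.get("scroll_y"), e["ts"]),
            )
        return jsonify(status="ok")

    if __name__ == "__main__":
        init_db()
        app.run()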

c.2. Preliminary Observations

From the first pilot study, we found that when reading articles on a familiar topic, participants read less linearly (and less exhaustively). Compared to the unfamiliar condition, reading in the familiar condition showed fewer fixations and more saccades (χ² test of independence), and faster average saccade movement speeds (Kolmogorov-Smirnov test). The scrolling data from one of the participants also showed less linear reading behavior in the familiar condition, as evidenced by faster scrolling speeds and larger scrolling distances (Kolmogorov-Smirnov test). These initial results showed promise for using implicit signals to distinguish among different reading conditions.
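For reference, a χ² test of independence on such event counts can be run as follows; the counts here are fabricated purely to demonstrate the test.

    from scipy.stats import chi2_contingency

    # Rows: familiar / unfamiliar condition; columns: fixations / saccades.
    # These counts are made up for illustration.
    counts = [[420, 180],
              [510, 120]]
    chi2, p, dof, expected = chi2_contingency(counts)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")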

From the second pilot study, we found differences in touch locations between the literal and contextual reading conditions: readers in the literal condition tended to initiate scrolling from the mid-horizontal and lower parts of the screen (Kolmogorov-Smirnov test). This result suggested that touch locations could be promising features for distinguishing reading goal conditions. We also noticed that different participants exhibited different scrolling styles. For example, some participants scrolled very frequently, while others scrolled only occasionally, touching the screen less often and pausing more during reading. These differences in reading behaviors motivated a within-subject design for the main experiment, in order to collect data on different reading conditions while controlling for each participant's scrolling style.
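The touch-location features referenced here (and in Table 7) can be sketched as the fraction of scroll initiations that start in each vertical third of the screen. The column name and the even three-way split are our assumptions for illustration; note that browser coordinates grow downward, so small y values correspond to the top of the screen.

    import pandas as pd

    def touch_location_ratios(events: pd.DataFrame, screen_height: float) -> pd.Series:
        # 'touch_start_y' is a hypothetical column holding the y coordinate
        # at which each scroll gesture was initiated.
        bins = [0, screen_height / 3, 2 * screen_height / 3, screen_height]
        zones = pd.cut(events["touch_start_y"], bins=bins,
                       labels=["high", "mid", "low"], include_lowest=True)
        # Fraction of touches initiated in each vertical third of the screen.
        return zones.value_counts(normalize=True)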

Appendix D 1-way Mixed ANOVA Results

We conducted a one-way mixed ANOVA for each familiarity condition and reading goal condition (Table 7). Each ANOVA was conducted on the subset of data that shared the same fixed condition (i.e., using familiarity level or reading goal as the fixed condition). In this way, we could examine how well individual features distinguish the within condition given the fixed condition. As a result, we found only marginal differences between the reading familiarity conditions.

Feature Type Feature Name F vs. U C vs. L
Reading Time Total article read time
Read time per paragraph (Avg., Std.) (Std.)
Scroll Scroll frequency
Time between scroll sequences (Avg., Std.)
Travel distance per scroll sequence (Avg., Std.)
Scrolling speed per sequence (Avg., Std.)
Regression Scroll Scroll frequency
Travel distance per scroll sequence (Avg., Std.)
Scrolling speed per sequence (Avg., Std.)
Revisiting the instruction paragraph
Touch Location X-Axis: Ratio of touch initialization from the left or right sides
X-Axis: Std. of left/mid/right touch ratios
Y-Axis: Ratio of touch initialization from the low screen
Y-Axis: Ratio of touch initialization from the mid screen
Y-Axis: Ratio of touch initialization from the high screen
Y-Axis: Std. of low/mid/high touch ratios
Table 7. One-way mixed ANOVA results. We used repeated experimental conditions as mixed-effect factors in the ANOVA.
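A per-feature mixed ANOVA of this form can be expressed with the pingouin package, as in the minimal sketch below. The data frame, its column names, and the values are hypothetical; one such model would be fit per feature.

    import pandas as pd
    import pingouin as pg

    # Hypothetical long-format data: one row per participant x article.
    df = pd.DataFrame({
        "participant_id": [1, 1, 2, 2, 3, 3, 4, 4],
        "fixed_condition": ["F", "F", "F", "F", "U", "U", "U", "U"],
        "within_condition": ["C", "L"] * 4,
        "scroll_frequency": [12.0, 30.0, 15.0, 28.0, 22.0, 25.0, 18.0, 27.0],
    })

    # Mixed ANOVA: within factor = the condition that varied across the
    # two articles; between factor = the participant's fixed condition.
    aov = pg.mixed_anova(data=df, dv="scroll_frequency",
                         within="within_condition", subject="participant_id",
                         between="fixed_condition")
    print(aov.round(4))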

Appendix E Pre-screening Questionnaire Responses

We used a pre-screening questionnaire to pre-select native English-speaking MTurk workers for the main study. Workers' geographic locations were limited to predominantly English-speaking countries: the United States, Canada, the United Kingdom, Ireland, Australia, and New Zealand. Overall, 1053 of 1129 respondents (93.27%) reported being native English speakers. More than 98% of them (1110 of 1129) also reported being comfortable reading English articles. The charts below summarize the other question responses from MTurk workers.

(a) What is your age group?
(b) What is your highest education level?
(c) What do you read for leisure or personal interest? (Top 10 key words)
(d) What do you read for work or study? (Top 10 key words)
(e) What device do you read on for leisure or personal interest?
(f) What device do you read on for work or study?
(g) How often do you read English written articles for leisure or personal interest?
(h) How often do you read English written articles for work or study?

Appendix F Materials

Here we provide the full text of all the articles, summaries, and comprehension questions that were presented to participants in our studies. As these were developed by an education expert for our studies (by modifying copyright-free articles from https://www.gutenberg.org/), we make them freely available to the research community for reuse.

f.1. Article 1: Manhattan in the Year 1609

f.1.1. Practice Summary

In 1609, Manhattan was a wild spot with beautiful trees, sand hills, and grasslands. One day, a ship from the Netherlands, called the Half Moon, arrived on the shore. The ship was owned by a company of Dutch merchants. The company hired an Englishman named Henry Hudson to find a shorter course to the East Indies than the dangerous route around the Cape of Good Hope. However, Hudson did not find the passage to India. Instead, he arrived at the river shore near present-day Albany, New York.

f.1.2. Test Article

The long and narrow Island of Manhattan was a wild and beautiful spot in the year 1609. There were no tall houses with white walls glistening in the sunlight, no church-spires, no noisy hum of running trains, and no smoke to blot out the blue sky. Instead, there were beautiful trees with spreading branches, stretches of sand-hills, and green patches of grass. Beautiful birds and wild animals lived there in coexistence with the native people who made their houses from trees and vines. One day, the native people who lived here gathered on the shore of their island and looked on with wonder at a boat that was approaching. It was vastly different from anything they had ever seen.

The ship was called the Half Moon, and it had come all the way from Amsterdam in the Dutch Netherlands. The Netherlands was quite a small country in the northern part of Europe, not nearly as large as the State of New York. Little did they know that the Dutch people owned many lands across the globe. They had islands in the Indian Ocean, the East Indies, that were rich in spices of every kind which other European countries needed. The company of Dutch merchants who did most of the business with them was called the East India Company. They had many ships to carry out this business, and the Half Moon was one of them.

It was a long way to the East India Islands from Holland. At this time, there was no Suez Canal to separate Asia and Africa, and the ships had to go around Africa by way of the Cape of Good Hope. Besides being a long distance, it was a dangerous passage. From its name, one might think the Cape of Good Hope is a very pleasant place, but it was actually quite dangerous due to high winds and treacherous waves that were so strong they could smash ships to pieces. Therefore, the merchants of Holland, and of other countries for that matter, were constantly thinking of a shorter course to the East Indies. They knew very little about North or South America, which they believed were simply islands, and that it was possible for a passage to exist between them as a safer and more convenient shortcut to the East Indies. So, the East India Company built the ship, Half Moon, and hired an Englishman named Henry Hudson to take charge of it. Hudson was chosen because he had already made two voyages for an English company where he tried to find that same short passage between the Americas; therefore, he was supposed to know much more about it than anyone else.

When the Half Moon sailed up the river, Hudson was sure that he had found the passage to the Indies, but when the ship got as far as where Albany, NY is now, the water became shallow, and the river-banks were so close together that Hudson gave up. He declared that he had not found the passage to India, but only a river. Soon after, he returned to Holland to share his discoveries in America including the river which he called “The River of the Mountains.”

f.1.3. Comprehension Questions

For Practice Summary
  • Literal questions

    1. In 1609, the island of Manhattan was covered in:

      • Buildings

      • Farms

      • Trees

    2. The Dutch merchants owned a:

      • Trading company

      • Gold company

      • Tea company

  • Contextual questions

    1. The company wanted to find a shorter route to East Indies to:

      • Build a new city at Manhattan

      • Develop a better trade route

      • Explore the wild area

For Test Article
  • Literal questions

    1. The ship, Half Moon, came from:

      • Germany

      • The Netherlands

      • Ireland

    2. The Dutch merchants came to the Americas to find:

      • Spices to trade

      • Furs to bring home

      • A shortcut to the East Indies

  • Contextual questions

    1. On their travels around the Cape of Good Hope, explorers tended to feel:

      • Excited about the journey

      • Nervous about the rough seas

      • Indifferent about the travel

    2. After he did not find the ocean, Hudson returned to The Netherlands feeling:

      • Disappointed

      • Angry

      • Ecstatic

f.2. Article 2: The Beginning of American Railroads

f.2.1. Practice Summary

The United States today is largely the result of mechanical inventions, like the railroad. Before this invention, Americans crossed mountains and sailed on rivers to trade. The development of the railroad was made possible by the use of mechanics for speed and the use of a smooth iron surface to eliminate friction. Its development became a solution for difficult travel and transportation in the United States.

f.2.2. Test Article

The United States, as we know it today, is largely the result of mechanical inventions, and in particular of agricultural machinery and the railroad. One transformed millions of acres of uncultivated land into fertile farms, while the other allowed for transportation of crops to distant markets. Before these inventions appeared, Americans had crossed mountains, traversed valleys, and sailed on rivers to trade.

During this time, it seemed that American tradesmen would have to rely on the internal waterways or horse and buggy to transport goods. This was an old, time-consuming system. However, transportation changed drastically in the nineteenth century when the railroads were built. The railroads revolutionized America’s transportation system, making it more efficient for the American people to travel and to receive and distribute goods. The railroad spread from coast to coast and was unlike anything that this nation had seen before.

The development of the American railroad came from two fundamental ideas: the first idea was to develop speed with the use of mechanics, and the other idea was the use of a long, smooth surface to eliminate friction. These principles grew from the ancestors of the railroad track. Three hundred years before this, people used wooden rails where little “cars” were pulled to transport coal in mines. Over time, the large coal wagons drove along the public highway and made deep ruts in the road. Eventually, someone began repairing the damage by laying wooden planks in these holes. The coal wagons drove over this new roadbed so well, that certain builders started constructing special planked roadways from the mines to the river. Logs, forming what we now call “ties,” were placed across at intervals of three or four feet. Then, they placed thin “rails” of wood lengthwise. This design reduced friction and allowed cars to carry the amount of coal that two or three teams of horses had difficulty carrying. In order to preserve the road, a thin sheet of iron was laid on top of the wooden rail.

Next, the overall strength and durability of the wagons needed to improve. One change was making the wagon wheels out of iron. It was not, however, until 1767, when the first rails were cast entirely of iron with a flange at one side to keep the wheel steadily in place, that the modern roadbed as it appears today came to be.

The development of the railroads came as a stepping stone in technology for improving transportation. While the railroad was a familiar sight in the mining districts of England, the development of the railroad in the United States became a solution for difficult travel and transportation. It was the answer that Americans had been looking for, which stretched from sea to shining sea.

f.2.3. Comprehension Questions

For Practice Summary
  • Literal questions

    1. Agricultural machinery and the railroad were seen as:

      • Expensive inventions

      • Electrical inventions

      • Mechanical inventions

    2. The development of the railroad came from two ideas, to develop speed and to:

      • Eliminate friction on a rough surface

      • Eliminate friction on a smooth surface

      • Create friction for better grip

  • Contextual questions

    1. Due to the development of railroads, travel most likely:

      • Increased

      • Stayed the same

      • Became too demanding

For Test Article
  • Literal questions

    1. Before railroads, American tradesmen had to rely on:

      • Walking on foot

      • Internal waterways

      • Early locomotives

    2. One change that was made to the wheels was constructing them out of:

      • Aluminum

      • Metal

      • Iron

  • Contextual questions

    1. Due to the development of railroads, trade most likely:

      • Increased

      • Stayed the same

      • Became too demanding

    2. Railroad beds are mostly:

      • The same design as its first development

      • Somewhat different from its first creation

      • Developing as technology advances

f.3. Article 3: Water Vessels in the Pueblo Region

f.3.1. Practice Summary

In the Pueblo region, water is scarce, and historically the transportation and preservation of water were important for people in the region. Early people tried animal skins. Materials like gourds were used as more effective vessels, but they were fragile. Later, people developed the water-tight basket, which was much more durable. Today, these vessels are considered a tangible source of regional history.

f.3.2. Test Article

The Pueblo region of the United States is historically known as an area where pottery was practiced and worked to perfection. The architecture and arts of this region originated from the indigenous people of the desert regions of North America. Understanding the history and geography of the region gives an insight into the development of the art of ceramics and pottery.

The Pueblos’ first necessity of life is the transportation and preservation of water. Water is scarce in these regions and can be found only in small quantities, or at specific points that are extremely far away from one another. In an effort to transport water, the people once tried using the skins of animals, but this was unsuccessful, as water sitting in animal skins in a hot climate would become contaminated. Later, a more successful water vessel was created: tubes of wood or sections of canes. These sections of canes were said to have been used by priests who filled them with sacred water from the ocean of the “cave wombs” of earth where men were born. Therefore, they were considered important vessels. Although these canes grew in abundance in this area, especially along the rivers, another, more effective water vessel came into play. Gourds, which also grew in abundance in these places, were better shaped and held a larger volume of water. The name of the gourd as a vessel is shop tom me, from shó e (canes), pó pon nai e (bladder-shaped), and tóm me (a wooden tube). The gourd itself is called mó thlâ â, “hard fruit.”

While the gourd was large and convenient in its form, it was difficult to transport because of how fragile the vessel was. To help protect and transport the gourd, it was encased in a net of coarse wicker like rope which was made out of yucca leaves or of flexible splints.

The use of this wicker with water-vessels points toward the development of the water-tight basketry of the southwest. This explains the resemblance of many types of basketry to the shapes of gourd vessels. Eventually, these water-tight baskets would inevitably replace gourd vessels. Although the baskets were difficult to manufacture, the gourds grew only in specific areas, while the materials for basketry were everywhere. In addition, the basket vessels were much stronger and more durable, so transporting them full of water over long distances was less of a danger or a hassle. Finally, because of their rough surfaces, any leakage was instantly stopped by a dab of mineral asphaltum, coated externally with sand or coarse clay to harden it. Today, these vessels are not only considered antiques, but also a tangible source of history showing the development of a pivotal tool over time.

f.3.3. Comprehension Questions

For Practice Summary
  • Literal questions

    1. Water sources tend to be described as ____ in the Pueblo region

      • Scarce

      • Plentiful

      • Contaminated

    2. A more effective vessel, which held a larger volume of water, was the:

      • Gourd

      • Animal Skin

      • Tube

  • Contextual questions

    1. Developing a good water vessel was important for:

      • A few people

      • Some people

      • Many people

For Test Article
  • Literal questions

    1. The first vessel used to transport water was made out of:

      • Canvas

      • Animal skin

      • Leaves

    2. The development of baskets resulted in the water vessel being:

      • Unbreakable

      • Water-tight

      • Temperature regulated

  • Contextual questions

    1. The use of baskets as a water vessel was most likely:

      • Used by many people

      • Only used by some

      • Rarely used

    2. We can infer that the development of water vessels after the creation of water-tight baskets:

      • Continued to develop

      • Stopped developing

      • Reverted back to use of tubes and canes

f.4. Article 4: Navaho Houses

f.4.1. Practice Summary

The land on the Navajo reservation is barren and void of life, and this condition has affected the people’s houses. Building winter huts follows a set of rules, and these huts are ritually referred to as “beautiful.” However, these rules do not apply to summer huts, which are considered makeshift shelters.

f.4.2. Test Article

The Navajo reservation is a large area of land in the northeastern part of Arizona and the northwestern corner of New Mexico. The total area is over 11,000 square miles. About 650 square miles are in New Mexico. Unfortunately, a large part of this region consists of land that is barren and void of life. The condition of this land has had an important effect on the people, their arts, and especially their houses.

The Navaho recognize two distinct classes of hogáns: the keqaí, or winter place, and the kejĭ´n, or summer place; in other words, winter huts and summer shelters. Winter huts are a staple of Navajo culture. On the outside, they resemble a mound of earth hollowed out; however, they are warm and comfortable. Their construction follows a set of rules and is considered a ritual. For example, there are ceremonies that dedicate a home to a family before they occupy it.

Decoration on the inside or outside of the houses is uncommon, yet the hogáns are usually referred to as “beautiful.” To build this structure, strong forked timbers of the correct length and flexibility are thrust together so that their ends properly interlock to form a cone-like frame. Stout poles lean against the apex to form the sides, and the outside is covered with bark and heaped with thick dirt, forming a roomy, warm interior with a level floor. To the Navajo, the house is beautiful when it is well constructed and adheres to the ancient model.

The rules for building a regular hogán or winter house do not apply to the summer huts or shelters. The level of detail in these huts varies, but the work is done by hand and follows a specific process. This is one of the most primitive and simple shelters the Navajo builds. It starts with a center circle of greenery, generally pine or cedar, and it takes two men with axes half an hour to erect one of the central circles.

In order to start this process, a site for the hut is selected, a tree is chopped down, and the branches are trimmed from the trunk. Branches are piled 4 to 5 feet high on three sides of a circle 15 or 20 feet in diameter. A fire is built in the center, and blankets are thrown over outstanding branches here and there, creating a great amount of shade during the hot summer days. Although it is a makeshift shelter, it is effective for escaping the brutal heat. It is important to note that these shelters are only possible in a wooded area and are built only to meet an emergency, such as when someone is away from home and there are no hogáns in the vicinity where they can stop.

f.4.3. Comprehension Questions

For Practice Summary
  • Literal questions

    1. The land on the Navajo reservation can be characterized as:

      • Plentiful

      • Rocky

      • Sandy plains

    2. Winter huts are described as:

      • Rough

      • Beautiful

      • Earthy

  • Contextual questions

    1. For Navajos, the type of house to build mostly relied on:

      • Landscape

      • Seasons

      • Community

For Test Article
  • Literal questions

    1. Winter huts are built following:

      • A set of strict rules

      • The person who will live in the hut

      • No rules or rituals

    2. Summer huts are usually:

      • Permanent homes

      • Beautiful

      • Makeshift

  • Contextual questions

    1. The rituals following the construction of winter huts are most likely:

      • Important to the people

      • A nuisance

      • Dependent on the family

    2. These rituals and ways of building winter huts and summer huts are probably:

      • Forgotten in modern times

      • Used only when necessary

      • A highly regarded process in the Navajo culture