As automated driving technology has developed, it has been used in a variety of scenarios such as traffic systems, goods distribution, and hospital logistics. As a result, interaction between people and automated vehicles (AVs) is expected to increase. At the current stage of popularization within society, this new technology is often not accepted by the public due to a lack of trust in AVs. The essential reason for this distrust is the fear of the unknown, specifically a lack of knowledge regarding the intended actions of the AV while driving, i.e., the current and subsequent actions that would be taken by the AV. Therefore, many studies claim that providing pedestrians with information on the driving intention of the AV helps to improve their understanding and their perception of safety in interactions [5, 14]. This improved understanding and perception of safety are considered effective in increasing the popularity of AVs in society.
Thus, in this work the following two problems for pedestrian–vehicle interaction are studied:
When should the AV make the pedestrian understand its driving intentions after it is noticed?
When should the AV make the pedestrian feel safe after it is noticed?
For the above questions, we formulate and propose a hypothesis based on a decision making process for pedestrians, including the situation model and the theory of risk homeostasis. Based on this hypothetical model, we design an experiment of pedestrian–vehicle interaction. The participants’ gaze information, and their subjective evaluations of the understanding of driving intention and their perception of safety, are collected. We analyze when pedestrians do not understand the intention of the vehicle, as well as when pedestrians feel danger, by analyzing the participants’ gaze duration on the vehicle with their subjective evaluations.
2 Related Works
In most studies of pedestrian–vehicle interaction, the availability of different information transmission methods has been subjectively evaluated. Stefanie et al. evaluated the communication efficacy of external human–machine interfaces (eHMIs) using various light signals by means of questionnaires and interviews. Clercq et al. asked participants to continuously evaluate their feeling of safety by pressing a button during a pedestrian–AV interaction.
The pedestrians’ gaze behavior is also used as an objective factor to analyze pedestrian–vehicle interactions. The rationale is that a pedestrian's observation of the vehicle can be considered to reflect his/her desire to obtain information from the vehicle, e.g., to determine the intent of the vehicle and to predict whether the interaction is dangerous. For example, Dey et al. found that the gaze points of pedestrians gradually gathered on the driver’s position behind the windshield when a manually driven vehicle (MV) was approaching. In our previous study, we found a correlation between pedestrians’ gaze durations and their understanding of the driving intention of the AV. In addition, we considered that pedestrians’ gaze durations on the AV could represent a request for information about the AV. Therefore, we suggested that the AV should send information about its driving intentions to the pedestrian when they interact with each other. However, when AVs should send information to pedestrians is still an unsolved issue. To address this issue, in this paper we analyze how pedestrians' understanding of the driving intention of AVs and their perception of safety change with the gaze duration.
3 Decision Making Model of Pedestrian
To improve the perception of safety among pedestrians during interactions with AVs, the generation process for the perception of safety should be clarified. We focus on the decision-making process of a pedestrian. A hypothesis is proposed in Fig. 1 which shows the decision making process of a pedestrian who is interacting with a vehicle. This hypothesis includes three parts: situation awareness, hazard perception, and decision-making based on risk homeostasis.
Situation awareness can be represented by the situation model. Firstly, situation awareness relies on the perception of things in the surrounding environment, e.g., the AV’s relative position, relative distance, and relative speed. Secondly, comprehension is taken as the understanding of the current state of the AV in a given situation, such as the driving intention of the AV. Thirdly, based on the result of comprehension, the pedestrian predicts the driving behavior and moving trajectory of the AV.
After establishing situation awareness, the pedestrian realizes hazards, such as anomaly detection by comparing the predicted driving behavior of the AV with his/her experience. The subjective risk is generated by evaluating the perceived hazards. Subsequently, the subjective risk could be seen as the degree to which the pedestrian feels threatened, e.g., the perception of safety or the perception of danger. The pedestrian decides his/her behavior by comparing the perceived risk with his/her acceptable risk level according to the risk homeostasis theory. If the perceived risk is lower than the acceptable risk level, the behavior of the pedestrian will become riskier. In the opposite case, the pedestrian will become more careful.
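The last step of this hypothesis, the comparison of perceived risk with the acceptable risk level, can be sketched as follows. This is only an illustrative sketch: the function name, the use of numeric risk values, and the handling of the case where both values are equal are assumptions, since the text does not specify them.

```python
def pedestrian_adjustment(perceived_risk: float, acceptable_risk: float) -> str:
    """Return the direction in which the pedestrian's behavior is expected
    to shift under the risk homeostasis comparison described above.

    Numeric risk values are an assumption made for illustration only.
    """
    if perceived_risk < acceptable_risk:
        # Perceived risk is below the accepted level: behavior becomes riskier.
        return "riskier"
    if perceived_risk > acceptable_risk:
        # Perceived risk is above the accepted level: behavior becomes more careful.
        return "more careful"
    # Equal values are not covered by the text; "unchanged" is an assumption.
    return "unchanged"


if __name__ == "__main__":
    print(pedestrian_adjustment(0.2, 0.5))
    print(pedestrian_adjustment(0.8, 0.5))
```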
According to the above hypothesis, the gaze duration of the pedestrian on the AV will increase during the interaction if he/she does not clearly understand the driving intention of the AV. In addition, the pedestrian may perceive further danger due to this lack of understanding of the driving intention. Thus, the gaze duration could be used to objectively evaluate the pedestrian’s understanding of the driving intention of the vehicle and his/her perception of safety in the interaction.
4 Experiment Design
We recruited 13 experimental participants within an age range of as pedestrians. They had different educational backgrounds because they came from various disciplines of our university. None of them had prior experience of interacting with the AV. They were requested to walk from the start point to the goal point as shown in Fig. 3. They were told that a vehicle would interact with them during their walk. Additionally, the participants were informed that they would interact either with an MV under the control of a driver or with an AV driving automatically. A wearable eye tracker (Tobii Pro Glasses 2) was used to measure the participants’ gaze behavior during this experiment.
A robotic wheelchair (WHILL Model CR, Fig. 3) was used as the experimental vehicle to interact with the participants during their walk. Two modes (i.e., manual and automated) were used to drive the vehicle with a maximum speed of 1 [m/s]. In the manual driving mode, an experimenter rode on the vehicle and operated it using the available joystick. The experimenter did not actively send information about their driving intention to the participants, but eye contact could not be ruled out. In the automated driving mode, the vehicle was automatically controlled without a rider using a multi-layered LiDAR (Velodyne VLP–16) and wheel encoders. Importantly, the AV could not recognize participants, so it could not automatically interact with them. To achieve a smooth interaction between the AV and the participants, a wireless remote controller was secretly used by the experimenter to control whether the AV would give the right-of-way to the participant. In other words, when the AV automatically moved along the designed route, it stopped if the experimenter pressed a button on the remote control. If the experimenter released the button on the remote control, the AV resumed its automatic movement. The participants did not know that the AV was being manipulated by the experimenter. For both the MV and the AV, the experimenter adjusted the driving behavior according to the distance to the participant, as well as the speed and walking direction of the participant, so as to realize interaction with the participant.
After the completion of each trial of interaction, the participants were required to complete a questionnaire on their subjective evaluations. The questionnaire contained two evaluation items, each answered on a 5 point Likert scale. The first question was used to evaluate the participants’ understanding or confusion regarding the driving intention of the vehicle during the interaction according to the following scales: 1. Completely did not understand, 2. Did not understand much, 3. Neutral, 4. Mostly understood, and 5. Fully understood. The second question was used to evaluate the participants’ perception of safety during the interaction according to the following scales: 1. Very dangerous, 2. Slightly dangerous, 3. Neutral, 4. Fairly safe, and 5. Very safe.
Three routes were designed for the movement of the vehicle, as shown in Fig. 3. To simulate the scenario of a pedestrian interacting with the vehicle when crossing the street, route 1 was designed to cross the path of the participant. To simulate the scenario of a pedestrian avoiding the vehicle, route 2 was designed so that the vehicle and the participant moved toward each other on a straight road. Route 3 was designed as a contrast, with no behavioral interaction between them. Overall, each participant interacted with the MV and the AV 20 times each. The routes of those 40 interactions were chosen randomly.
This experiment was permitted by an ethics review committee of Institutes of Innovation for Future Society, Nagoya University.
5 Experiment Results
5.1 Data preprocessing
Data of route 3 were excluded from the analysis because this study focuses on the gaze behavior and psychological states of participants when interacting with the vehicle. The observed gaze data of participant was also excluded because it contained a lot of noise. In addition, three trials of participant ’s gaze data for interacting with the MV were excluded due to equipment problems. In total, data from 198 trials (route 1: 114 trials, route 2: 84 trials) of interacting with the MV and data from 204 trials (route 1: 120 trials, route 2: 84 trials) of interacting with the AV were used.
Tobii Pro Glasses 2 measured the foreground video and sequence of gaze points of each participant. The size of the measured foreground video is pixels. In each frame, the gaze point of the participant was recorded as a two-dimensional coordinate value on the plane of the foreground image. In this experiment, the central visual field of participants was defined as a circle. The gaze point was the center of a circle with a diameter of 108 pixels. If any part of the vehicle area overlapped with part of the circle, then it was determined that the participant was gazing at the vehicle. Under the abovementioned conditions, the total time of the gazes on the vehicle was calculated as the gaze duration in each trial.
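The gaze-on-vehicle rule above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the vehicle area is assumed to be available per frame as an axis-aligned bounding box (`Box`), and the video frame rate `fps` is a hypothetical parameter, since neither the annotation format nor the frame rate is given in the text. Only the 108-pixel circle diameter comes from the description.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

GAZE_RADIUS_PX = 108 / 2  # central visual field: circle with a diameter of 108 pixels


@dataclass
class Box:
    """Axis-aligned vehicle area in image coordinates (assumed representation)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def gaze_hits_vehicle(gx: float, gy: float, box: Box,
                      radius: float = GAZE_RADIUS_PX) -> bool:
    """Return True if the gaze circle overlaps any part of the vehicle box."""
    # Closest point of the box to the gaze point (the gaze point itself if inside).
    cx = min(max(gx, box.x_min), box.x_max)
    cy = min(max(gy, box.y_min), box.y_max)
    return (gx - cx) ** 2 + (gy - cy) ** 2 <= radius ** 2


def gaze_duration(frames: Iterable[Tuple[float, float, Optional[Box]]],
                  fps: float) -> float:
    """Total time [s] of all frames in which the participant gazes at the vehicle.

    Each frame is (gaze_x, gaze_y, vehicle_box), with vehicle_box set to None
    when the vehicle is not visible. `fps` is a hypothetical frame rate.
    """
    hits = sum(1 for gx, gy, box in frames
               if box is not None and gaze_hits_vehicle(gx, gy, box))
    return hits / fps
```

Counting frames and dividing by the frame rate gives the accumulated gaze time over the trial, which matches the definition of gaze duration as the total time of all gazes on the vehicle.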
The difference between this experiment and our previous experiment is that each participant’s gaze duration data was not standardized in this study. The reason is that, in practical applications, the AV cannot perceive the individual differences in the gaze duration of each pedestrian.
5.2 Subjective evaluation results
Referring to the selection ratio of each subjective evaluation in Table 1, the most frequent evaluation for the MV was 5. Fully understood (42.4%), whereas 4. Mostly understood was the most frequent evaluation for the AV (44.1%). The selection ratio for the MV was lower than that for the AV when the participants chose 1. Completely did not understand (MV: 1%, AV: 3.4%) and 2. Did not understand much (MV: 4.5%, AV: 19.1%). These results indicate that it was more difficult for the participants to understand the driving intention of the vehicle when interacting with the AV than when interacting with the MV.
Meanwhile, the evaluation results for the perception of safety were similar to those for the understanding of driving intention. Table 1 shows that the most selected scale was 5. Very safe for the MV (43.4%) and 4. Fairly safe for the AV (42.2%). 0.5% of the trials of interaction with the MV and 2.5% of the trials of interaction with the AV were evaluated as 1. Very dangerous. The ratio of selecting 2. Slightly dangerous when interacting with the AV (10.3%) was also higher than that when interacting with the MV (8.1%). The above results show that the participants felt situations to be more dangerous when interacting with the AV than when interacting with the MV.
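Table 1: Selection ratio of each subjective evaluation scale, median and interquartile range (IQR) of gaze durations [s] for the MV and the AV, and difference between the AV and MV medians [s].
|Evaluation|Scale|MV ratio|MV median [s]|MV IQR [s]|AV ratio|AV median [s]|AV IQR [s]|Median difference (AV - MV) [s]|
|Understanding of driving intention|1. Completely did not understand|1%|3.249|0.929|3.4%|6.156|2.329|2.907|
|Understanding of driving intention|2. Did not understand much|4.5%|3.058|2.459|19.1%|4.478|4.058|1.420|
|Understanding of driving intention|4. Mostly understood|36.9%|2.479|1.719|44.1%|3.885|2.867|1.406|
|Understanding of driving intention|5. Fully understood|42.4%|1.650|2.149|12.3%|2.179|2.318|0.529|
|Perception of safety|1. Very dangerous|0.5%|2.798|0.000|2.5%|5.037|1.899|2.239|
|Perception of safety|2. Slightly dangerous|8.1%|3.069|1.023|10.3%|5.479|3.498|2.410|
|Perception of safety|4. Fairly safe|32.8%|2.799|2.119|42.2%|3.703|2.773|0.904|
|Perception of safety|5. Very safe|43.4%|1.535|2.019|19.1%|2.219|2.379|0.684|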
5.3 Gaze durations for each subjective evaluation scale
Figs. 5 and 5 show the probability density of gaze durations for the two subjective evaluations. Fig. 5 is the result for the evaluation of the understanding of driving intention and Fig. 5 is the result for the evaluation of the perception of safety. The vertical axis and the horizontal axis indicate the probability and the gaze duration, respectively. Note that the horizontal axis signifies the integration time because the gaze duration is the total time of all gazes on the vehicle in the interaction. In those graphs, the green color indicates interaction with the MV and the red color indicates interaction with the AV. Samples of each gaze duration are represented by short vertical lines on the horizontal axis. The median values are represented by long dotted lines. To infer their probability density, kernel density estimation with a Gaussian kernel was used in order to account for individual differences that are potentially included in the gaze durations. The inferred probability densities of gaze durations for the MV and the AV are represented by green and red curves in Figs. 5 and 5.
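As an illustration of the density estimate described above, the following sketch implements kernel density estimation with a Gaussian kernel using only the Python standard library. The bandwidth selection is not stated in the text; Silverman's rule of thumb is used here purely as an assumption, and the function names and sample values are hypothetical (they are not data from the experiment).

```python
import math
import statistics
from typing import Callable, Optional, Sequence


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth (an assumed choice, needs >= 2 samples)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(samples: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a function t -> estimated probability density at gaze duration t [s]."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(t: float) -> float:
        # Sum of Gaussian kernels centered on each observed gaze duration.
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in samples)

    return density


if __name__ == "__main__":
    # Hypothetical gaze durations [s], for illustration only.
    durations = [1.2, 2.0, 2.4, 3.1, 4.8, 6.0]
    pdf = gaussian_kde(durations)
    for t in (1.0, 3.0, 5.0):
        print(f"density at {t:.1f} s: {pdf(t):.3f}")
```

Evaluating the returned function on a grid of gaze durations gives a curve of the kind drawn in the figures, with one curve for the MV samples and one for the AV samples.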
According to Fig. 5 and Table 1, the gaze durations for the AV were longer than those for the MV in terms of the median value of gaze durations corresponding to each scale of the evaluation for the understanding of driving intention. This supports the hypothesis that if the participants did not understand the driving intention of the vehicle, then their gaze durations increased because they needed more time to observe the state of the vehicle and obtain information that could be used to infer the driving intention.
Regarding each scale of the evaluation for the perception of safety, the median values of gaze durations on the AV were higher than on the MV, as shown in Fig. 5 and Table 1. This shows that the gaze durations on the vehicle increased as the participants’ perception of safety in interactions decreased. It also implies that the participants watched the vehicle more attentively when they felt that it was dangerous.
Combining the results of these two subjective evaluations, we consider that the observation time (gaze duration) of the participants on the AV increased in order to prevent the AV from becoming a danger to themselves when they did not understand the driving intention of the AV.
Meanwhile, for both the evaluation of the understanding of driving intention and that of the perception of safety, the interquartile ranges (IQRs) of gaze durations on the AV were also higher than those on the MV, as shown in Table 1. This indicates that the participants had greater individual differences in their observation strategy for the AV than for the MV because they did not have much experience interacting with the AV, especially in real-world situations.
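The per-scale statistics discussed above (median, IQR, and the difference between the AV and MV medians) can be computed as in the sketch below. It is a minimal illustration: the quartile convention is not stated in the text, so the `"inclusive"` method of `statistics.quantiles` is an assumption, as are the function names. A single-trial group is given an IQR of 0, which is consistent with the 0.000 entry for the scale that was chosen in only 0.5% of the MV trials.

```python
import statistics
from typing import Sequence, Tuple


def median_and_iqr(durations: Sequence[float]) -> Tuple[float, float]:
    """Median and interquartile range [s] of the gaze durations for one scale."""
    if len(durations) == 1:
        # With a single trial there is no spread.
        return durations[0], 0.0
    q1, _, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    return statistics.median(durations), q3 - q1


def median_difference(av_durations: Sequence[float],
                      mv_durations: Sequence[float]) -> float:
    """Difference between the AV and MV median gaze durations [s]."""
    return statistics.median(av_durations) - statistics.median(mv_durations)


if __name__ == "__main__":
    # Hypothetical gaze durations [s], for illustration only.
    mv = [1.0, 2.0, 3.0, 4.0, 5.0]
    av = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(median_and_iqr(mv), median_and_iqr(av), median_difference(av, mv))
```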
5.4 When should the AV make the pedestrian understand its driving intentions after it is noticed?
The ratios of the 5 point Likert scales for the understanding of driving intention were calculated for every 0.5 [s] interval of gaze durations, as shown in Figs. 9 and 9. For the interaction between the participants and the MV, the ratios of 4. Mostly understood and 5. Fully understood were higher than the others for most of the intervals, as shown in Fig. 9. In contrast, for the interaction with the AV, the ratios of 1. Completely did not understand and 2. Did not understand much increased, and the ratios of 4. Mostly understood and 5. Fully understood decreased, as the gaze duration increased, as shown in Fig. 9.
5.4.1 Lower bound of the timing
To determine the lower bound of the timing at which the AV should make the pedestrian understand its driving intentions after it is noticed, we investigated the range of gaze durations in which the participants found it difficult to understand the driving intention. When the participants evaluated the AV’s driving intention as 2. Did not understand much, the shortest gaze duration was more than [s]. Although this was a small ratio, it shows that the AV should begin to make the pedestrian understand its driving intentions [s] after it is noticed, for example by sending information about the driving intentions to the pedestrian.
5.4.2 Upper bound of the timing
Each scale in the 5 point Likert scale has a chance level of 20% of being chosen. Thus, there is a 40% chance that the participants will choose the scales for “did not understand” (i.e., 1. Completely did not understand and 2. Did not understand much) or the scales for “understood” (i.e., 4. Mostly understood and 5. Fully understood). Therefore, the threshold for the evaluation ratio in each 0.5 [s] interval of gaze durations was set to 40%. Figure 9 shows that, only for interactions with the AV, the shortest gaze duration among the intervals in which the participants did not understand the driving intentions in more than 40% of the trials was over 6.5 [s]. Therefore, we recommend that the AV make the pedestrian understand its driving intentions accurately within the first 6.5 [s] while the pedestrian is gazing at it.
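The two bounds discussed in this subsection can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the trial representation `(gaze_duration_s, rating)`, the function names, and the half-open bin convention `[start, start + 0.5)` are assumptions. The 0.5 [s] interval and the 40% threshold are taken from the text, and the sample trials are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

BIN_WIDTH_S = 0.5   # interval width of gaze durations [s]
THRESHOLD = 0.40    # chance level of choosing one of two scales out of five

Trial = Tuple[float, int]  # (gaze duration [s], Likert rating 1..5)


def shortest_gaze_duration(trials: Iterable[Trial],
                           scales: Set[int]) -> Optional[float]:
    """Lower bound: shortest gaze duration among trials rated in `scales`."""
    durations = [d for d, rating in trials if rating in scales]
    return min(durations) if durations else None


def ratio_by_bin(trials: Iterable[Trial], scales: Set[int],
                 bin_width: float = BIN_WIDTH_S) -> Dict[float, float]:
    """Map each bin start [s] to the fraction of its trials rated in `scales`."""
    total: Dict[int, int] = defaultdict(int)
    chosen: Dict[int, int] = defaultdict(int)
    for duration, rating in trials:
        idx = math.floor(duration / bin_width)  # assumed bin: [start, start + width)
        total[idx] += 1
        if rating in scales:
            chosen[idx] += 1
    return {idx * bin_width: chosen[idx] / total[idx] for idx in sorted(total)}


def first_bin_over(ratios: Dict[float, float],
                   threshold: float = THRESHOLD) -> Optional[float]:
    """Upper bound: start of the earliest bin whose ratio exceeds the threshold."""
    for start in sorted(ratios):
        if ratios[start] > threshold:
            return start
    return None


if __name__ == "__main__":
    # Hypothetical trials, for illustration only.
    trials = [(0.3, 5), (0.4, 4), (1.2, 4), (1.4, 2), (1.3, 5),
              (6.6, 2), (6.7, 1), (6.9, 4)]
    not_understood = {1, 2}
    print(shortest_gaze_duration(trials, not_understood))
    print(first_bin_over(ratio_by_bin(trials, not_understood)))
```

The same functions apply to the perception of safety by passing the ratings for the "dangerous" scales instead of the "did not understand" scales.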
5.5 When should the AV make the pedestrian feel safe after it is noticed?
We calculated the ratios of the 5 point Likert scales for the perception of safety for every 0.5 [s] interval of gaze durations, as shown in Figs. 9 and 9. As shown in Fig. 9, for the evaluation result of interacting with the MV, 4. Fairly safe and 5. Very safe were chosen in most of the intervals. For the interaction with the AV, the ratios of 1. Very dangerous and 2. Slightly dangerous increased, and the ratios of 4. Fairly safe and 5. Very safe decreased, as the gaze duration increased, as shown in Fig. 9.
5.5.1 Lower bound of the timing
To determine the lower bound of the timing at which the AV should make the pedestrian feel safe after it is noticed, we also investigated the shortest gaze duration at which the pedestrians felt danger. Fig. 9 shows that the shortest gaze duration when the participants chose 1. Very dangerous was in the interval [s]. Thus, the AV should start to make the pedestrian feel safe 0.5 [s] after it is noticed.
5.5.2 Upper bound of the timing
Similarly, the total of 4. Fairly safe and 5. Very safe has a chance level of 40% of being chosen. Thus, the threshold for the evaluation ratio of the perception of safety was set to 40%. Referring to Fig. 9, the ratios of trials in which the participants felt danger when interacting with the AV were high for gaze durations of over 8.0 [s]. In particular, if the gaze duration was more than 8.0 [s], no cases were evaluated as 4. Fairly safe or 5. Very safe. In other words, the AV is more likely to enable the pedestrian to feel safe within the first 8.0 [s] while the pedestrian is gazing at it, e.g., by sending related information to the pedestrian.
Fig. 5 shows a trend: the less the participants understood the driving intention of the vehicle, the more the gaze durations on the AV exceeded those on the MV. For example, there was no clear difference between the probability densities of the gaze durations on the AV and the MV when the participants chose 5. Fully understood, but the difference for 1. Completely did not understand was large, as shown in Fig. 5. The results for the perception of safety in Fig. 5 show the same trend. These trends also appear in Table 1 as the differences between the median values of gaze durations on the AV and the MV for each evaluation scale. We speculate that the reason for this trend is that the participants sufficiently trusted both the AV and the MV when they understood the vehicle’s driving intentions and felt safe. As their understanding of the driving intentions becomes ambiguous and their perception of danger increases, their trust in the vehicle may decline, and they increase the observation time of the vehicle to gain more information to protect themselves from danger. An important factor to be considered here is the participants’ potential trust in the driver. We speculate that this is also one of the reasons that the participants’ gaze duration on the MV was shorter than that on the AV when they understood the vehicle’s driving intentions and felt safe.
In addition, in Figs. 9 and 9, we found that when the participants interacted with the MV, the earliest occurrences of 4. Mostly understood and 5. Fully understood were earlier than when they interacted with the AV. When the participants only observed the MV for [s] during the interaction, there were cases in which the MV’s driving intention was evaluated as 2. Did not understand much. The same difference occurred for 1. Completely did not understand, where the earliest occurrences were at [s] for the MV and [s] for the AV. Based on the above results, we consider that the participants trusted the MV more than the AV because they successfully performed the trials even with a brief observation of the MV or an unclear understanding of the driving intention of the MV.
On the basis of the above discussion, we assume that the underlying cause may be that the pedestrians had less trust in the AV than in the human driver of the MV. This may be the key factor that makes it difficult for AVs to achieve popularity in society.
We designed an experiment of pedestrian–vehicle interaction to find the time range within which an AV should make a pedestrian understand its driving intentions and feel safe in an interaction. Thirteen participants were invited to interact with the MV and the AV. The participants’ gaze information and their subjective evaluations of the understanding of driving intention and of their perception of safety were collected. By analyzing the participants’ gaze duration on the vehicle together with their subjective evaluations, we found that 1) the AV should enable the pedestrian to understand its driving intentions accurately within [s] while the pedestrian is gazing at it, and 2) the AV should enable the pedestrian to feel safe within [s] while the pedestrian is gazing at it.
A limitation of this study is that it does not consider the dependence of the participants' gaze duration on the speed of the vehicle. In the experiment, an electric wheelchair was used as the experimental vehicle with a maximum speed of 1 [m/s] (3.6 [km/h]). Therefore, the experimental results of this study can be applied to personal mobility vehicles or mobile robots running at low speed, but it may be difficult to apply them to an automated car running at high speed.
In the future, we will use the same experimental method to study the interaction between pedestrians and an automated car. We will also focus on establishing an eHMI for AVs that can quickly, clearly, and kindly convey driving intention to pedestrians, thus improving the acceptability of AVs. Additionally, the results of this study indicate that the gaze behavior of pedestrians toward the AV may depend on the trust that the pedestrians have in it. Some related work has also focused on users' feeling of safety and trust in using AV systems [11, 9, 7]. User trust has to be taken into account in our future study.
This work was supported by JST-Mirai Program Grant Number JPMJMI17C6, and JSPS KAKENHI Grant Numbers JP19K12080, Japan.
- [1] (2019) External human-machine interfaces on automated vehicles: effects on pedestrian crossing decisions. Human Factors 61 (8), pp. 1353–1370. PMID: 30912985.
- [2] (2019) Gaze patterns in pedestrian interaction with vehicles: towards effective design of external human-machine interfaces for automated vehicles. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 369–378.
- [3] (2017) Toward a theory of situation awareness in dynamic systems. In Situational Awareness, pp. 9–42.
- [4] (2020) Yielding light signal evaluation for self-driving vehicle and pedestrian interaction. In Human Systems Engineering and Design II, T. Ahram, W. Karwowski, S. Pickl, and R. Taiar (Eds.), Cham, pp. 189–194.
- [5] (2018) Communicating intent of automated vehicles to pedestrians. Frontiers in Psychology 9, pp. 1336.
- [6] (2019) On the future of transportation in an era of automated and autonomous vehicles. Proceedings of the National Academy of Sciences 116 (16), pp. 7684–7691.
- [7] (2019) Overtrust in external cues of automated vehicles: an experimental investigation. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 211–221.
- [8] (2019) Study of sidewalk autonomous delivery robots and their potential impacts on freight efficiency and travel. Transportation Research Record 2673 (6), pp. 317–326.
- [9] (2019) Driving behavior model considering driver’s over-trust in driving automation system. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications: Adjunct Proceedings, pp. 115–119.
- [10] (2020) What gaze behavior do pedestrians take in interactions when they do not understand the intention of an automated vehicle?. arXiv preprint arXiv:2001.01340.
- [11] Concrete problems for autonomous vehicle safety: advantages of bayesian deep learning.
- [12] (2008) Robot based logistics system for hospitals-survey. In IDT Workshop on interesting results in computer science and engineering.
- [13] (2019) Dimensions of attitudes to autonomous vehicles. Urban, Planning and Transport Research 7 (1), pp. 19–33.
- [14] (2019) Autonomous vehicles that interact with pedestrians: a survey of theory and practice. IEEE Transactions on Intelligent Transportation Systems, pp. 1–19.
- [15] (1982) The theory of risk homeostasis: implications for safety and health. Risk analysis 2 (4), pp. 209–225.