Spear phishing is a deceptive attack that uses social engineering to obtain confidential information through targeted victimization. It is distinguished by its use of social cues and personalized information to target specific victims. Previous work on resilience to spear phishing has relied on convenience samples, with a disproportionate focus on university students. In contrast, we report on an evaluation of a high school community. We engaged high school students and faculty members ( high school students, staff members) as participants in research utilizing signal detection theory (SDT). Through scenario-based analysis, participants were tasked with distinguishing phishing emails from authentic emails. The results revealed an overconfidence bias in participants’ self-assessed detection, regardless of their technical background. These findings are critical for evaluating the decision-making of underrepresented populations and protecting people from potential spear phishing attacks by examining human susceptibility.
Keywords: Phishing, Spear Phishing, High School, User Study, Usable Security
Phishing is used to obtain confidential information, install malware, obtain funds, or steal resources. Targeted phishing is a critical component of that threat; for example, phishing attacks on Zoom increased by four orders of magnitude between March and April 2020, and COVID-19-related phishing included misinformation as well as attacks on benefits for the newly unemployed. The most targeted form of phishing attack is spear phishing. As spear phishing is a challenge essentially grounded in human behavior and decision-making, solutions should be informed by human subject evaluations as well.
However, studies on phishing show a bias toward machine learning and purely technical solutions, with only of published papers on phishing in the ACM Digital Library utilizing human participants or user-centered methodologies. Even when research does involve human subjects, it often studies convenience samples, specifically university students. Investigating high school students is particularly important, as previous research has shown that age is a critical factor in predicting susceptibility to phishing attacks [22, 23, 26]. Improved understanding of participants’ mindsets when they click on a malicious email link can enable robust defensive and offensive techniques against spear phishing attacks. To contribute to this understanding, we combined phishing detection with signal detection theory (SDT) to explore how spear phishing cues impact this population. SDT is often used to measure how well people differentiate meaningful patterns (the signal) from distracting noise.
Specifically, we conducted a user study focusing on high school students and staff members to explore the less-observed correlation between participant mentalities and email spear phishing attacks. Our goal was to address the following research questions:
RQ1: How confident are participants in distinguishing between legitimate and non-legitimate spear phishing content over email?
RQ2: How does age affect a user’s ability to distinguish between legitimate and non-legitimate spear phishing content over email?
2 Related Work
The U.S. Department of Homeland Security identified the sequence of actions taken to craft a spear phishing attack: (1) identify the target, (2) meticulously craft the message with the intent of the recipient taking immediate action, and (3) deliver the message from a counterfeit email address [31]. Rajivan et al. found that phishing emails with “specific attack strategies (e.g., sending notifications, use of authoritative tone, or expressing shared interest)” were more successful [32]. The use of social engineering through psychological manipulation can establish trust and, as a result, lure in victims.
Previous research on phishing has focused on software- or hardware-based solutions, such as toolbars, machine learning models, and warning indicators. Although significant advances in technology-based tools have emerged [30, 34, 35], less research has focused on end users. Yet, the need for such research has long been recognized; in 2008, Friedrichs et al. argued that humans must be studied to stop web-based identity theft, including phishing attacks [15]. Such insights become even more important in light of Karakasiliotis et al.’s findings that only 36% of their study’s participants could identify legitimate websites and only 45% could correctly identify malicious websites [21]. Dhamija et al. found that visual deception can fool even sophisticated users; a good phishing website fooled 90% of the participants in their study [13]. Fewer studies have focused on more vulnerable populations, such as younger students. In our background research, we did not find any studies focused on high school students or staff. Thus, we specifically selected a high school environment for our study.
In 2016, Canfield et al. performed two experiments comparing detection and performance using SDT. They found that “Greater sensitivity was positively correlated with confidence. Greater willingness to treat emails as legitimate was negatively correlated with perceived consequences from their actions and positively correlated with confidence” [2]. We implemented SDT in our research by analyzing the ‘stimulus,’ which triggers the decision-making in users. To evaluate the efficacy of the stimulus, we measured hits, misses, false alarms, and correct rejections (i.e., true positives, false negatives, false positives, and true negatives). We analyzed how users chose to click or not click links sent via email. The use of SDT enabled us to evaluate which sections of a phishing email arouse suspicion when they are present.
To explore the relationship between the phishing susceptibility of high school students and their educators, we wanted to see what email cues both groups notice when deciding to click (or not click) on a malicious link. We conducted a non-experimental, quantitative correlation analysis by collecting data through a descriptive survey to check phishing susceptibility outcomes, age differences, and confidence levels. We primarily collected data from high school students and staff at a suburban high school in the United States. We obtained approval from the Ethical Review Board before beginning this experiment.
To begin, we instituted a collaboration with a suburban high school in the Midwestern United States. As most high school students were under the age of 18, parental permission was required on a paper version of an informed consent document. We only allowed people to participate after their form was signed and approved by the staff and the students’ parents. During the recruitment phase, we engaged with English language arts classrooms to find willing research participants. These classes were chosen because all students were required to enroll in them to graduate. The study was also advertised to every student in the building during the morning school announcements, and we distributed flyers advertising the study. Students who turned in the paper consent forms then received emails that contained an electronic form of the survey. To recruit teachers and faculty members, we sent out emails containing the link to the consent form and questionnaire. Because the study was announced beforehand, teachers and faculty were expecting this recruitment email. As an incentive, participants could choose to enter a drawing for Starbucks gift cards at the end of the survey. Our power analysis showed that we required a sample size of more than participants. We obtained a complete response set from participants in our final data set.
3.2 Survey Instrument and Study Design
The survey consisted of three parts: the informed consent information, the demographic questionnaire, and the actual phishing susceptibility assessment. We utilized Google Forms as the tool to provide the survey questionnaire because it was easily accessible to both students and teachers. The first author anonymized the data so that personally identifiable information would not be shared with anyone else, including other researchers. Participants began by opening a Google Forms link from their email and confirming their status as a student or a staff member of the high school. The staff needed to confirm their consent to the study, while students would move on to the next step due to their parents having already agreed via the consent form. Next, participants answered a set of demographic questions regarding their age group (and not their specific date of birth to reduce the risk of disclosure of identifiable information). Afterward, the participants were presented with ten questions to assess their spear phishing susceptibility through the use of images of phishing emails. We selected images instead of asking them to go through actual emails to mitigate any concern that they may respond to malicious messages. The participants classified the images as “regular email” or “phishing email”. For each question, the participants rated their confidence in their decision, from least to most confident using a five-point Likert scale.
Spear Phishing Susceptibility: Based on prior phishing research, there are three main factors identified in most phishing emails: anonymous senders, suspicious URLs or installations, and a sense of urgency. Figure 2 is an example that shows the signs of a harmful phishing email, such as an anonymous sender (e.g., “is outside your organization”), a sense of urgency (e.g., “URGENT! CLICK THE LINK”), a suspicious URL (e.g., “http://baoonhd.vn/api/get.php?…”), and a risky action (e.g., clicking on “Open in Docs”). In contrast, Figure 2 shows an authentic email from Google, as seen by the trustworthy email address, the accurate website link, and the valid email format. Non-phishing examples were adopted from personal school emails that the high school staff and students had received earlier and that at least one individual had reported as suspicious. This data was obtained from the high school staff and IT support, who anonymized the email samples.
Phishing examples were adopted from the Berkeley Phishing Examples Archive (PEA; https://security.berkeley.edu/education-awareness/phishing/phishing-examples-archive). The adopted phishing emails were modified to include the name of the school and actual school activities, including grades and exams. The images were edited to address the participants by their real names and roles (teacher or student). Google documents referenced school-specific information to test the participants’ susceptibility to spear phishing emails. The signals that were used in the phishing emails were (a) the greeting, (b) suspicious URLs with a deceptive name or IP address, (c) content that did not match the ostensible sender and subject, (d) requests for urgent action, and (e) grammatical or typographical errors. We selected this set of signals based on a 2016 study by Canfield et al. that similarly focused on signal detection theory, albeit using an online survey of people aged 19–59 [2].
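To make some of the cues listed above concrete, the following is a minimal Python sketch that flags three of them programmatically: an external sender, an urgent request, and a URL whose host is a raw IP address. It is purely illustrative and is not part of the study, in which participants judged email images by eye. The function names, the urgency word list, and the example domains and address are all hypothetical.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Illustrative word list only; a real filter would need a far richer model.
URGENCY_WORDS = ("urgent", "immediately", "right away", "asap")


def url_host_is_ip(url: str) -> bool:
    """Return True when the URL's host is a literal IP address, not a name."""
    try:
        host = urlparse(url).hostname
        ipaddress.ip_address(host or "")
    except ValueError:
        # Raised for malformed URLs and for hosts that are not IP addresses.
        return False
    return True


def spot_cues(sender_domain: str, org_domain: str, body: str) -> dict:
    """Flag three simple cues in one email (hypothetical helper)."""
    urls = re.findall(r"https?://\S+", body)
    lowered = body.lower()
    return {
        "external_sender": sender_domain.lower() != org_domain.lower(),
        "urgent_request": any(word in lowered for word in URGENCY_WORDS),
        "ip_address_url": any(url_host_is_ip(u) for u in urls),
    }


# Hypothetical message; 192.0.2.10 is a reserved documentation address.
print(spot_cues("mail.example.net", "school.example.org",
                "URGENT! Open http://192.0.2.10/login now"))
```

Cues such as a mismatch between content and sender, or subtle grammatical errors, are much harder to capture with rules like these, which is one reason human judgment remains the object of study here.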
3.3 Analysis: Method
Once the data collection was complete, we analyzed the data using RStudio and SPSS Statistics. Using SDT, participants’ answers were categorized into four possible outcomes: hit, miss, false alarm, and correct rejection. Table 1 shows the signal detection theory outcomes as adapted for this study. The outcomes from the phishing assessment were analyzed with a one-way analysis of variance (ANOVA) to explore the relationship between the independent variable (age group) and the dependent variables (the number of each outcome and the average confidence levels). A one-way ANOVA determines whether there are any statistically significant differences between the means of two or more independent (unrelated) groups; in practice, it is usually used to compare three or more groups. For this study, we divided the data set into seven groups.
| | Respond “Regular Email” | Respond “Phishing Email” |
|---|---|---|
| Authentic Email | Correct Rejection | False Alarm |
| Phishing Email | Miss | Hit |
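The outcome mapping described in this section can be sketched in a few lines of Python. The sketch treats a phishing email as the signal and tallies the four outcomes. It also derives the standard SDT sensitivity (d') and criterion (c) from the hit and false-alarm rates. Those two measures and the log-linear correction are assumptions added for illustration; the text above only states that outcome counts were analyzed. The function names and the example participant are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def classify(is_phishing: bool, said_phishing: bool) -> str:
    """Map one response to its SDT outcome (phishing is the signal)."""
    if is_phishing:
        return "hit" if said_phishing else "miss"
    return "false_alarm" if said_phishing else "correct_rejection"


def sdt_summary(responses):
    """Count outcomes and derive sensitivity (d') and criterion (c).

    `responses` is an iterable of (is_phishing, said_phishing) pairs.
    A log-linear correction (add 0.5 to counts, 1 to totals) keeps the
    rates away from 0 and 1 so the z-transform stays finite.
    """
    counts = Counter(classify(truth, answer) for truth, answer in responses)
    n_signal = counts["hit"] + counts["miss"]
    n_noise = counts["false_alarm"] + counts["correct_rejection"]
    hit_rate = (counts["hit"] + 0.5) / (n_signal + 1)
    fa_rate = (counts["false_alarm"] + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return counts, d_prime, criterion


# Hypothetical participant: ten items, assumed five phishing, five authentic.
example = (
    [(True, True)] * 4 + [(True, False)] * 1
    + [(False, True)] * 2 + [(False, False)] * 3
)
counts, d_prime, criterion = sdt_summary(example)
print(dict(counts), round(d_prime, 2), round(criterion, 2))
```

A larger d' means better discrimination between phishing and authentic emails, while a negative criterion indicates a tendency to answer “phishing.”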
4 Findings and Discussions
Our data collection took place over a period of two months. We collected a complete data set of subjects, who provided their consent and participated in the study. Of these participants, were students, and were staff members of the high school. Eight participants were from 12 to 17 years old; four participants were from 18 to 24 years old; participants were from 25 to 34 years old; participants were from 35 to 44 years old; were from 45 to 54 years old; seven were from 55 to 64 years old. Thus, the participants’ ages ranged from 12 to 64 years old. We used a ten-item test to determine whether the email outcomes (hit, miss, correct rejection, false alarm) and the confidence levels (ratings of one through five on a Likert scale) differed significantly across the age groups (12–17, 18–24, 25–34, 35–44, 45–54, and 55–64 years old). Results of the ANOVA test are shown in Table 2. A significant difference was noted for the hit or miss email outcomes (F(5, 51) = 2.614, p < .035). The correct rejection and false alarm outcomes and all of the confidence levels showed no significant difference between the groups.
|Sum of Sq||df||Mean Square||F||Sig.|
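For readers unfamiliar with how the F statistic and its degrees of freedom arise, the following is a minimal sketch of a one-way ANOVA in plain Python. It is not the authors' analysis, which was run in RStudio and SPSS, and the per-group hit counts are invented for illustration. The between-groups degrees of freedom equal the number of groups minus one, and the within-groups degrees of freedom equal the total number of observations minus the number of groups, so the F(5, 51) reported above is consistent with six groups that contain data.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical hit counts for three age groups (not the study's data).
hits_by_group = [[2, 3, 3], [3, 4, 4], [4, 5, 5]]
f_stat, df_b, df_w = one_way_anova(hits_by_group)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Turning F into a p-value requires the F distribution, which the Python standard library does not provide; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.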
The results show a significant age-group difference for the hit or miss outcomes, but not for correct rejections, false alarms, or any of the confidence levels. The ANOVA results for the participants’ confidence levels can be seen in Table 3. Here, we can say that age plays a significant role in responding to a stimulus, as evidenced by whether the participants responded with “Regular Email” or “Phishing Email.” A potential reason for the lack of significance in the confidence levels could be that they were not precisely represented and that participants’ perceived confidence was subjective. One participant’s response of 5 (most confident) could be the same as another participant’s 3 (average confidence). Their perceived confidence could also shift throughout the survey; a response of 1 (least confident) could change to a 2 (lower confidence) or 3 later on, depending on whether the participants believed that the questions were more or less difficult than those at the beginning of the survey.
Figure 4 shows that correct results (yellow, blue) increase with age. It also shows confidence in false alarms increasing with age (green), with confidence about correct identification and misidentification higher for younger age groups. Our data revealed that the highest mean for the hit outcome was from age group six (45–54 years old), and the second-highest was from age group five (35–44 years old). Groups five and six also had the lowest mean for the miss outcome. In Figure 4, the mean outcomes for hit and correct rejection have an increasing slope with age and are negatively correlated with miss and false alarm. Therefore, there is strong evidence that older groups are less susceptible to spear phishing than the younger groups in a high school setting. Figure 4 also shows that the other variables were not significant. This result is quite different from that hypothesized under the ‘digital native’ rubric, which argues that younger cohorts’ lifetime exposure results in improved decision-making (e.g., [27]).
| Measure | Age Group | N | Mean | SD | SE | 95% CI Lower | 95% CI Upper | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| False Alarm Conf | 2 | 6 | 3.1944 | 0.62731 | 0.2561 | 2.5361 | 3.8528 | 2 | 3.67 |
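As a worked check of how the columns in the row above relate to one another, the short Python sketch below recomputes the standard error and the 95% confidence interval from the reported mean, standard deviation, and group size. It assumes the row reads as age group 2 with N = 6, and it hard-codes the two-sided 95% critical value of Student's t for 5 degrees of freedom (about 2.5706), since the standard library has no t quantile function.

```python
from math import sqrt

# Values as reported in the row above (False Alarm Conf), read as N = 6.
n, mean, sd = 6, 3.1944, 0.62731

# Two-sided 95% critical value of Student's t with n - 1 = 5 degrees of
# freedom; hard-coded because the standard library has no t quantile.
T_CRIT_DF5 = 2.5706

se = sd / sqrt(n)                  # standard error of the mean
ci_low = mean - T_CRIT_DF5 * se    # lower bound of the 95% CI
ci_high = mean + T_CRIT_DF5 * se   # upper bound of the 95% CI
print(f"SE={se:.4f}  95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

The recomputed values agree with the reported SE and interval bounds to within rounding of the inputs, which supports reading the interval as a t-based confidence interval for the group mean.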
Spear phishing is an effective form of attack because attackers manipulate their targets, either by luring them in with promises of specific benefits or by coercing them with specific threats. These techniques are designed to lead to impulsive or quick decision-making by end users. In our findings (Section 4), we leveraged SDT to understand participant decision-making with spear phishing stimuli. When the mean of the outcomes was graphed, the results revealed a positive slope for the hit and correct rejection outcomes, meaning that the older participants tended to be less susceptible to spear phishing. These relationships can contribute to a better understanding of how people respond to online fraud. Here we offer recommendations, based on our findings, for increasing resilience against spear phishing attacks.
Align Anti-Phishing Training with Self-perceived Expertise: Our work found that older participants were less susceptible to spear phishing than younger participants, as age group six had the highest average number of hits (i.e., correct detections) throughout the experiment. This is aligned with previous research from Sheng et al. [33]. One reason for this gap may be students’ lack of exposure to training geared towards them. For this reason, we recommend introducing phishing training to students at a younger age and aligning it with their self-perceived expertise. Our results show both a high level of incorrect responses and a high level of confidence. This indicates that younger participants may be unaware that they have been the victim of a successful phishing attack.
Targeted Risk Communication: In addition to providing anti-phishing training, organizations should consider providing clear risk communication, especially for younger adults or children. Students may lack an understanding of the technical threats that may be present in their email inbox, believing that they will not be targeted. Thus, the context-aware risk communication that has been identified as necessary for older adults [5, 6, 16] is similarly required for high school student populations.
Enable Multi-Factor Authentication: To create more robust defensive techniques against spear phishing attacks, we need to reduce the risk of compromised credentials, which can be used to steal sensitive information. Because of this, schools that provide laptops (or require them for online instruction) should consider adopting multi-factor authentication (MFA) for students and staff [4, 8, 28]. The introduction of MFA (like other training) should be aligned with user risk mental models [9, 10, 11]. The overconfidence discussed above also motivates adding another authentication factor (e.g., a hardware token) to the password, which would mitigate the harm of phishing.
6 Limitations and Future Work
This work, with its focus on confidence as well as correctness, opens more questions than it answers. Other factors besides age and confidence levels should be studied to gain a holistic understanding of susceptibility to spear phishing. The suburban high school we engaged with has relatively high socio-economic homogeneity, and the study should be repeated with other high schools. To improve diversity, future work should begin with more diverse schools, and then study specific underrepresented populations, such as students with physical or learning disabilities. Interviewing the participants to collect more qualitative data and better understand user decision-making is a needed expansion of this work.
With the current rise in spear phishing, especially among vulnerable populations, it is critical to develop tools and educational approaches that train users to differentiate between authentic and malicious emails. To understand spear phishing attack resilience, we studied a population in a high school environment. We found that age and confidence play a critical role in the identification of spear phishing attacks. Our study concludes by providing recommendations for developing anti-phishing training tools and communicating risks and benefits.
We would like to thank the participants from the high school for their valuable contribution, and Stephanie Davis for encouraging the first author throughout the entire data collection process. We would also like to thank Kevin Gingerich from Eli Lilly for his expert advice on phishing and for guiding the first author. This research was supported in part by the National Science Foundation under CNS 1565375, Cisco Research Support, and the Comcast Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s). They do not necessarily reflect the views of the U.S. Government, NSF, Cisco, Comcast, Indiana U, or the University of Denver.
[1] (2020; accessed June 29, 2020) Phishing Activity Trends Report. https://docs.apwg.org/reports/apwg_trends_report_q1_2020.pdf
[2] (2016) Quantifying Phishing Susceptibility for Detection and Behavior Decisions. Human Factors 58 (8), pp. 1158–1172.
[3] (2020) User-Centered Risk Communication for Safer Browsing. In First Asia USEC-Workshop on Usable Security, in conjunction with the Twenty-Fourth International Conference on Financial Cryptography and Data Security.
[4] (2018) Why Johnny Doesn’t Use Two Factor: A Two-Phase Usability Study of the FIDO U2F Security Key. In International Conference on Financial Cryptography and Data Security, pp. 160–179.
[5] (2019) Towards Implementing Inclusive Authentication Technologies for Older Adults. Who Are You.
[6] Das, S., Kim, A., Jelen, B., Streiff, J., Camp, L. J., & Huber, L. (2020) Why Don’t Older Adults Adopt Two-Factor Authentication?
[7] (2019) All About Phishing: Exploring User Research through a Systematic Literature Review. In 13th International Symposium on Human Aspects of Information Security & Assurance.
[8] (2018) A Qualitative Study on Usability and Acceptability of Yubico Security Key. In 7th Workshop on Socio-Technical Aspects in Security and Trust, pp. 28–39.
[9] (2019) MFA is a Waste of Time! Understanding Negative Connotation Towards MFA Applications via User Generated Content. In Thirteenth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2019).
[10] (2020) MFA is a Necessary Chore!: Exploring User Mental Models of Multi-Factor Authentication Technologies. In 53rd Hawaii International Conference on System Sciences.
[11] (2019) Evaluating User Perception of Multi-Factor Authentication: A Systematic Review. arXiv preprint arXiv:1908.05901.
[12] (2020) A Risk-Reduction-Based Incentivization Model for Human-Centered Multi-Factor Authentication. Ph.D. Thesis, Indiana University.
[13] (2006) Why Phishing Works. In SIGCHI Conference on Human Factors in Computing Systems, pp. 581–590.
[14] (2007) Learning to Detect Phishing Emails. In 16th International Conference on World Wide Web, pp. 649–656.
[15] (2008) The Threat of Political Phishing. In 2nd International Symposium on Human Aspects of Information Security & Assurance.
[16] (2012) Risk Communication Design for Older Adults. In ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction, Vol. 29, pp. 1.
[17] (1992) ANOVA: Repeated Measures. Sage Publications, Los Angeles, CA.
[18] (2010) Social Engineering: The Art of Human Hacking. John Wiley & Sons.
[19] (2014) Using Personal Examples to Improve Risk Communication for Security & Privacy Decisions. In SIGCHI Conference on Human Factors in Computing Systems, pp. 2647–2656.
[20] (2018) Social Engineering in Cybersecurity: The Evolution of a Concept. Computers & Security 73, pp. 102–113.
[21] (2006) Assessing End-User Awareness of Social Engineering and Phishing. In 7th Australian Information Warfare and Security Conference, pp. 60–72.
[22] (2009) School of Phish: A Real-World Evaluation of Anti-Phishing Training. In 5th Symposium on Usable Privacy and Security (SOUPS), pp. 1–12.
[23] (2017) How Effective is Anti-Phishing Training for Children? In Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017), pp. 229–239.
[24] (2018) Signal Detection Theory (SDT) Is Effective for Modeling User Behavior Toward Phishing and Spear-Phishing Attacks. Human Factors 60 (8), pp. 1179–1191.
[25] (2011) Using Data Type Based Security Alert Dialogs to Raise Online Security Awareness. In 7th Symposium on Usable Privacy and Security (SOUPS), pp. 1–13.
[26] (2020) Investigating Teenagers’ Ability to Detect Phishing Messages. In EuroUSEC 2020: The 5th European Workshop on Usable Security.
[27] (2019) The Impact of Digitalization on Literacy: Digital Immigrants vs. Digital Natives. In 27th European Conference on Information Systems, pp. 1–15.
[28] (2018) Multi-Factor Authentication: A Survey. Cryptography 2 (1), pp. 1–31.
[29] (2012) Why do Some People Manage Phishing E-Mails Better than Others? Information Management & Computer Security 20 (1), pp. 18–28.
[30] (2010) PhishNet: Predictive Blacklisting to Detect Phishing Attacks. In 29th IEEE Conference on Computer Communications, pp. 1–5.
[31] (2018; accessed June 29, 2020) Phishing: Don’t Be Phooled! https://www.dhs.gov/sites/default/files/publications/2018_AEP_Vulnerabilities_of_Healthcare_IT_Systems.pdf
[32] (2018) Creative Persuasion: A Study on Adversarial Behaviors and Strategies in Phishing Attacks. Frontiers in Psychology 9.
[33] (2010) Who Falls for Phish? A Demographic Analysis of Phishing Susceptibility and Effectiveness of Interventions. In SIGCHI Conference on Human Factors in Computing Systems, pp. 373–382.
[34] (2006) Do Security Toolbars Actually Prevent Phishing Attacks? In SIGCHI Conference on Human Factors in Computing Systems, pp. 601–610.
[35] (2011) Cantina+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites. ACM Transactions on Information and System Security (TISSEC) 14 (2), pp. 1–28.