The homograph attack was first described by E. Gabrilovich et al. using an example targeting the brand domain microsoft.com. It did not attract broad attention until 2017, when the brand domain apple.com was attacked by homographs such as the one that appears in Punycode form as xn--pple-43d.com, which uses the Cyrillic ‘а’ (U+0430) instead of the ASCII ‘a’ (U+0061). (International Domain Names (IDNs) contain non-ASCII characters, e.g., from the Arabic, Chinese, or Cyrillic scripts; they are therefore encoded to ASCII strings using the Punycode transcription, known as IDNA encoding, and appear as ASCII strings starting with “xn--". For example, the domain xn--ggle-0qaa.com is displayed as ggle.com.) Thereafter, many brand domains were found to be attacked by homographs, such as Adobe, Lloyds Bank, and Google Analytics. According to a large-scale analysis of IDNs in 2018, over 1,516 registered homograph domains already target just the top 1,000 brand domains in the Alexa ranking. Many defenses have been proposed on the application side (e.g., detecting homographs using machine learning on visual similarity metrics, HTML content, or optical character recognition (OCR)). However, to the best of our knowledge, no previous work focuses on a proactive defense that can control the attack rather than just respond to it after it has happened. In this paper, we devise a system that predicts whether human demographics, brand familiarity, and security backgrounds influence participants’ ability to identify homographs. This, in turn, allows security training courses to be tailored to the appropriate participants.
To do so, we design a survey and administer it to 2,067 participants who are Internet users in Japan. We subsequently build a regression model to study which factors affect participants’ ability to recognize homographs. We find that participants exhibit different abilities for different levels of visual similarity. For instance, female participants tend to be able to recognize homographs, while male participants tend to be able to recognize non-homographs. Moreover, 13.94% of participants can recognize non-homographs. Meanwhile, 16.59% of participants can recognize homographs whose visual similarity with the target brand domains is under 99.9%; when the similarity increases to 99.9%, the share of participants who can recognize homographs drops significantly to merely 0.19%; and for homographs with 100% visual similarity, participants have no way to recognize them. Furthermore, we find that people who work or were educated in computer science or computer engineering tend to exhibit the best ability in recognizing all kinds of homographs and non-homographs. Surprisingly, we also find that brand familiarity does not influence the ability to recognize either homographs or non-homographs. Stated differently, people who frequently use the brand domains but lack sufficient security knowledge are still easily deceived. We believe that this opens avenues to help users reduce their overconfidence and improve their knowledge of, and carefulness about, security threats.
The rest of this paper is organized as follows. The related work is described in Section 2. The procedure for preparing the survey is presented in Section 3. The methodology is given in Section 4. The experiment is analyzed in Section 5. The discussion is presented in Section 6. Finally, the conclusion is drawn in Section 7.
2 Related Work
A number of previous efforts are related to our work. We first present work proposing defenses against the homograph attack. We then discuss work analyzing brand familiarity and security backgrounds in security-related issues.
2.1 Homograph Attack
The countermeasures for this attack can be categorized into the following three approaches: (i) disabling the automatic IDN conversion performed by web browsers, (ii) detecting (then blocking) homographs, and (iii) user training.
Disabling the Automatic IDN Conversion
For the first approach, instead of showing a converted form like ggle.com, browsers now display the domain’s plain form like xn--ggle-0qaa.com in the address bar; examples include Chrome and Firefox, Safari, Internet Explorer, and Opera. However, there is a significant trade-off when web browsers stop supporting automatic IDN conversion, because a large number of Internet users use non-English languages with non-Latin alphabets, with over 7.5 million IDNs registered worldwide (as of December 2017). Furthermore, the homograph attack exploits not only look-alike Punycode characters in IDNs but also look-alike Latin characters in non-IDNs, such as bl0gsp0t.com, which targets the brand domain blogspot.com. Also, if the homographs deceive users before appearing in the address bar (e.g., the homographs are delivered as hyperlinks in an email or a document) and the users pay no attention to the browser, disabling IDN conversion does not prevent users from accessing the homographs.
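The conversion between the displayed Unicode form and the plain Punycode form can be reproduced with Python's built-in `idna` codec. The following is a minimal sketch using the apple.com homograph mentioned in the introduction; the helper names `to_punycode` and `to_unicode` are ours and purely illustrative.

```python
def to_punycode(domain):
    """Encode a (possibly internationalized) domain to its ASCII 'xn--' form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Decode an ASCII 'xn--' domain back to the form a browser may display."""
    return domain.encode("ascii").decode("idna")


# Homograph of apple.com whose first letter is the Cyrillic 'а' (U+0430).
homograph = "\u0430pple.com"
brand = "apple.com"

print(to_punycode(homograph))                        # plain ASCII form of the homograph
print(to_unicode("xn--pple-43d.com") == homograph)   # round trip back to Unicode
print(homograph == brand)                            # the two strings differ in code points
```

A browser that disables automatic conversion effectively shows the output of `to_punycode` instead of the Unicode string.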
Detecting (then Blocking) Homographs
For the second approach, several methods have been proposed, such as applying machine learning to HTML content and visual screenshots, using visual similarity measures such as the Structural Similarity Index (SSIM) or Optical Character Recognition (OCR), extracting homographs from a large-scale analysis of the entire database of registered domains with Whois information, and utilizing the Confusable Unicode table defined by Unicode Inc. Several tools [31, 11, 21, 34, 33, 24, 7] generate permutations of homographs from a defined subset of look-alike characters, then look up Whois and DNS to check whether the homographs are registered and active. Although this approach is the most attractive to the research community, none of these methods has so far proven 100% effective in stopping the attack, which continues to grow more sophisticated.
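The permutation step used by such tools can be sketched as follows. This is an illustrative example only, not the implementation of any cited tool: the substitution table `LOOKALIKES` is a deliberately tiny, hypothetical subset, and the subsequent Whois and DNS lookups are omitted.

```python
from itertools import product

# Hypothetical, deliberately small table of look-alike substitutions.
# Real tools rely on much larger tables of confusable characters.
LOOKALIKES = {
    "o": ["o", "0", "\u043e"],   # Latin o, digit zero, Cyrillic о
    "a": ["a", "\u0430"],        # Latin a, Cyrillic а
    "l": ["l", "1"],             # Latin l, digit one
}


def generate_homographs(label, tld="com"):
    """Return candidate homograph domains for `label`, excluding the original."""
    choices = [LOOKALIKES.get(ch, [ch]) for ch in label]
    candidates = {"".join(combo) for combo in product(*choices)}
    candidates.discard(label)  # the brand label itself is not a homograph
    return sorted(c + "." + tld for c in candidates)


candidates = generate_homographs("blogspot")
print(len(candidates))
print("bl0gsp0t.com" in candidates)
```

Each generated candidate would then be checked against Whois and DNS to see whether it is registered and active.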
For the third approach, as far as we know, no previous work analyzes human factors for homographs; however, several works analyze human factors for other security threats such as phishing identification. S. Sheng et al. analyze the relationship between demographics and phishing susceptibility. They find that women are more susceptible than men to phishing and that participants between the ages of 18 and 25 are more susceptible to phishing than other age groups. J. Wang et al. conduct a real-stimulus survey on spear phishing email to examine how users’ attention to visual triggers, phishing deception indicators, and phishing knowledge influences their decision-making processes and consequently their decisions. E. Lin et al. evaluate the effectiveness of domain highlighting as an anti-phishing measure and conclude that domain highlighting, while providing some benefit, cannot be relied on to prevent phishing attacks. M. Blythe et al. conduct a survey to find whether participants who are well educated about phishing can identify phishing. They find that phishing is beginning to look more convincing, with better spelling, grammar, and visual appeals like logos, and that detection rates for phishing with logos were significantly lower than for those without logos.
2.2 Brand Familiarity/ Security Backgrounds in Computer Security
In this section, we present work which studies whether users’ brand (or web) familiarity and security backgrounds (including security warnings, security knowledge, security behavior, and security self-confidence) affect their decisions on security threats.
T. Kelley et al. simulate several secure non-spoof and insecure spoof domains with different authentication levels such as extended validation, standard validation, or partial encryption. A logistic model is then applied to participants’ responses to compare how encryption level, web familiarity, security knowledge, and mouse tracking influence participant accuracy in identifying spoof and non-spoof websites. Their result shows that user behavior derived from mouse tracking recordings leads to higher accuracy in identifying spoof and non-spoof websites than the other factors. Y. Sawaya et al. apply the Security Behavior Intentions Scale (SeBIS) to participants from seven countries and build a regression model to study which factors affect participants’ security behavior using a cross-cultural survey. The work shows the interesting result that self-confidence in computer security has a larger positive effect on security behavior than actual knowledge about computer security. I. Kirlappos et al. show that users do not focus on security warnings (or do not understand what they are) but rather look for signs to confirm whether a site is trustworthy. The study reveals that the advice given in some current user education about phishing is largely ignored. It therefore suggests that, rather than flooding users with information, user education needs to consider how users make decisions in both business and personal settings. M. Sharif et al. design a survey of security warnings, users’ behavior, knowledge, and self-confidence about security to evaluate the utility of self-reported questionnaires for predicting exposure to malicious content. Their result confirms that self-reported data can help forecast exposure risk over long periods of time but is not as crucial as behavioral measurements for accurately predicting exposure. S. Das et al. find that social processes play a major role in security behavior.
Furthermore, conversations about security are often driven by the desire to warn or protect others from immediate novel threats observed or experienced. C. Erika et al. study user confidence in security and privacy for smartphones and find that participants are apprehensive about running privacy- and financially-sensitive tasks on their phones because of four factors: fear of theft and data loss, misconceptions about the security of their network communications, worries about accidentally touching or clicking, and mistrust of smartphone applications. I. Iulia et al. compare security behaviors between experts and non-experts and find that while experts frequently report installing software updates, using two-factor authentication, and using a password manager, non-experts report using antivirus software, visiting only known websites, and changing passwords frequently. A. Felt et al. examine whether security warnings from the Android permission system are effective for users. Their result shows that only 17% of participants paid attention to permissions during installation, and only 3% of Internet survey respondents could correctly answer all permission comprehension questions. This indicates that current Android security warnings do not help most users make correct security decisions.
In this section, we present how the survey is designed and distributed to the participants.
The survey is prepared in Japanese. Through one of the large survey companies, the survey is administered to 2,067 participants who are Internet users in Japan. The web interface for the survey is designed so that participants cannot go back to previous questions from the current question. Furthermore, the survey cannot be submitted if any question remains unanswered.
We designed a questionnaire that has three parts related to human factors (demographics, brand familiarity, and security backgrounds) and one part related to participants’ ability to identify homographs.
The questions about participant demographics cover gender (male: 0 and female: 1), age (integers from 15 to 70), having a job (have a job: 1, freelancer or part-time job: 0.5, and no job: 0), whether the participant works in or graduated from computer science or computer engineering (yes: 1, no: 0), and which languages the participant has studied so far, including English, Chinese, Hindi, Spanish, French, Arabic, Russian, Portuguese, Bengali, German, Korean, Vietnamese, Turkish, Italian, Greek, and Dutch, plus an option for never having studied anything other than Japanese (for each language, yes: 1 and no: 0). The languages chosen are the most spoken languages in the world and the most popular languages using Punycode.
3.2.2 Brand Familiarity
The survey includes 9 famous brands: Amazon, Google, Coinbase, Wiki, Booking.com, Expedia, Paypal, Sex.com, and Facebook. For each brand, the participants have to answer two questions: whether they know the brand (yes: 1 and no: 0), and how frequently they use the brand (never use: 2, occasionally use: 3, and often use: 4). Over all the brands, we compute two more features by summation: the number of brands that the participant knows and the total frequency of usage.
3.2.3 Security Backgrounds
This part includes five questions. The first question is whether the participant has installed anti-virus software on PCs or mobile devices (yes: 1 and no: 0). The second question is what action the participant takes when a browser or anti-virus software issues a warning while browsing a website (continue browsing without being worried: 1, continue browsing if the participant absolutely wants to see the site: 2, continue browsing if the site is famous: 3, do not browse: 4, and never see warnings: 5). The third question is about security behavior and consists of 16 sub-questions as described in Appendix A; for each of them, the participants have five answer options (not at all: 1, not much: 2, sometimes: 3, often: 4, and always: 5). For this question, we do not use the answer to each of the 16 sub-questions but compute the sum of the participant’s answers over all 16 sub-questions. The fourth question is about security knowledge and consists of 18 sub-questions as described in Appendix B. For each sub-question, the participants have two answer options (true: 1 or false: 0). Then, based on the correct answers given at the end of Appendix B, we count the number of correct answers of each participant. The fifth question is about security self-confidence and consists of 6 sub-questions as described in Appendix C. The participants have five answer options (not at all: 1, not applicable: 2, neither agree nor disagree: 3, applicable: 4, and very applicable: 5). Similar to the third question, we take the sum of the participant’s answers over all 6 sub-questions. For the last 3 questions (security behaviors, security knowledge, and security self-confidence), we use the designs from the paper .
While that paper aims to analyze factors affecting security behavior (and thus uses it as the target function), in this paper we aim at a different target function (i.e., the ability to distinguish homographs) and thus use security behavior as one of the features, not as the target function.
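The aggregation of the three multi-item questions into three features can be sketched as follows. This is a minimal illustration: the function name, the feature names, the sample answers, and in particular the answer key are made up (the real key is given in Appendix B).

```python
def security_features(behavior, knowledge, knowledge_key, confidence):
    """Aggregate the three multi-item security questions into three features."""
    if len(behavior) != 16 or len(confidence) != 6:
        raise ValueError("expected 16 behavior and 6 self-confidence answers")
    if len(knowledge) != 18 or len(knowledge_key) != 18:
        raise ValueError("expected 18 knowledge answers and an 18-item key")
    return {
        # Sum of the 16 Likert answers (each 1..5), so the range is 16..80.
        "sec_behavior": sum(behavior),
        # Number of true/false answers that match the key, range 0..18.
        "sec_knowledge": sum(int(a == k) for a, k in zip(knowledge, knowledge_key)),
        # Sum of the 6 Likert answers (each 1..5), so the range is 6..30.
        "sec_confidence": sum(confidence),
    }


# Made-up participant and made-up answer key, for illustration only.
hypothetical_key = [1, 0] * 9
features = security_features(
    behavior=[3] * 16,
    knowledge=[1] * 18,
    knowledge_key=hypothetical_key,
    confidence=[4] * 6,
)
print(features)
```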
3.2.4 Ability of Distinguishing Homographs
This part is designed for the target function. We prepare 18 domains mixing homographs and non-homographs, as shown in Figure 1 and explained in Appendix D. The homographs target the 9 brands mentioned in the design of brand familiarity. The 18 domains are chosen for different purposes. For example, looking at Figure 1, domain #2 (amazonaws.com) is chosen because participants probably know only amazon.com and think amazonaws.com is a homograph, although it is not; another example is domain #16 (sex.com), which is a pornographic domain, so participants probably think it is a homograph (unsafe), although it is not. For each of the 18 domains, the participants answer whether it is safe or not. The correct answers for the 18 domains are given in the final paragraph of Appendix D. Based on these, we determine whether each participant answers correctly for each domain. Correct answers are labelled 1 (true) and incorrect answers are labelled 0 (false). Of course, using more domains would make our analysis more accurate. However, a survey should not have too many questions because participants tend to stop answering carefully, and around 20 is a good limit for our survey.
In this section, we describe the pre-process on the raw data (users’ responses), determine the target functions and define our model.
4.1 Domain Grouping
Since the 18 domains used for measuring the participants’ ability in distinguishing homographs have different characteristics, we group them into different categories. One of the criteria we use to group the domains is the visual similarity between the images of the domains and the images of their corresponding brand domains. In this paper, we use the Structural Similarity Index (SSIM) as the visual similarity metric. SSIM is a modern and common measure of visual similarity based on visible structures in the images. SSIM outperforms traditional methods such as Peak Signal-to-Noise Ratio (PSNR) and Mean Squared Error (MSE), which can estimate absolute errors only. The SSIM between two images $x$ and $y$ of the same size is calculated as:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{1}$$

$\mu_x$ and $\mu_y$ represent the averages of $x$ and $y$, respectively. $\sigma_x^2$ and $\sigma_y^2$ represent the variances of $x$ and $y$, respectively, and $\sigma_{xy}$ is their covariance. $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ represent the variables that stabilize the division with a weak denominator, where $L$ is the dynamic range of the pixel values, and $k_1$ and $k_2$ are typically set to $0.01$ and $0.03$ by default. SSIM values are at most 1, where 1 indicates perfect similarity.
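To make the SSIM formula concrete, the following minimal sketch evaluates it once over the whole image using global statistics. This is a simplifying assumption on our side: library implementations such as the one in scikit-image compute the index over local windows and average the results, so their values generally differ from this sketch. The function name `global_ssim` and the flat pixel lists are illustrative.

```python
def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equal-length sequences of pixel values, using global statistics."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of the same size")
    n = len(x)
    c1 = (k1 * dynamic_range) ** 2   # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2   # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Two tiny hypothetical grayscale "images", flattened to lists of pixel values.
image_a = [0, 0, 255, 255]
image_b = [0, 255, 0, 255]

print(global_ssim(image_a, image_a))   # identical images
print(global_ssim(image_a, image_b))   # same mean and variance, uncorrelated structure
```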
We define the following three groups:
Group 1: Homographs with SSIM >= 0.999. This group consists of four homographs, namely domains #3, #4, #10, and #15 as depicted in Figure 1. The domains #3, #4, and #15 have SSIM = 1 and the domain #10 has SSIM = 0.999 with their corresponding brand domains. The domain #10 has SSIM = 0.999 rather than 1 because of the look-alike letter ‘g’, which a human can recognize visually, but only with great difficulty.
Group 2: Homographs with SSIM < 0.999. This group consists of seven homographs, namely domains #1, #6, #7, #9, #12, #14, and #17 as depicted in Figure 1. This group covers the homographs whose SSIM scores are lower than those in Group 1 but still high, i.e., ranging from 0.838 to 0.996. We do not consider homographs with lower SSIM because they are trivial for the participants to recognize.
Group 3: Non-homographs. In this group, we consider two cases. The first case (namely group 3.1) is the normal non-homographs, including the domains #2, #5, #8, #11, #13, and #18 in Figure 1. The other case (namely group 3.2) is the sensitive non-homograph, such as the pornographic domain #16 (sex.com). We consider group 3.2 separately because we want to know how participants balance their decisions between a domain that is famous and actually safe and a domain that is notorious for its content category (e.g., pornographic, darknet, terrorism).
For each group, we then apply the model with different target functions in the experiment. The domains for each group with the corresponding SSIM are summarized in Table 1.
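The grouping rule can be written down as a small decision function. This is only a sketch of the rule as described above; the function name, the argument names, and the `sensitive` flag are our own illustrative choices.

```python
def assign_group(is_homograph, ssim=None, sensitive=False):
    """Return the group label of a survey domain.

    is_homograph: whether the domain is a homograph of a brand domain.
    ssim: SSIM with the corresponding brand domain (needed for homographs only).
    sensitive: whether a non-homograph belongs to a sensitive content category.
    """
    if is_homograph:
        if ssim is None:
            raise ValueError("SSIM is required to group a homograph")
        return "Group 1" if ssim >= 0.999 else "Group 2"
    return "Group 3.2" if sensitive else "Group 3.1"


print(assign_group(True, ssim=0.999))        # e.g., domain #10
print(assign_group(True, ssim=0.996))        # upper end of the Group 2 range
print(assign_group(False))                   # a normal non-homograph
print(assign_group(False, sensitive=True))   # e.g., domain #16 (sex.com)
```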
Let $y$ denote the target of our model, i.e., the ability in distinguishing homographs. As mentioned in the design, we use the following model:

$$y \sim \text{demographics} + \text{brand familiarity} + \text{security backgrounds} \tag{2}$$

The explanatory variables related to demographics are gender, age, job, work/education in computer science or computer engineering, and languages. The explanatory variables related to brand familiarity are the number of brands that the participants know and the frequency with which the participants use the brands. The explanatory variables related to security backgrounds are security warnings, security behaviors, security knowledge, and security self-confidence.

We now define concrete target functions for $y$ as follows. For each group in Section 4.1, we use three target functions. The first target is defined as follows:

$$T_1 = \sum_{d_i \in G} a_i \tag{3}$$

where $a_i \in \{0, 1\}$ denotes the value indicating whether the final decision of the participant for distinguishing homographs is correct or not, and $d_i$ denotes each domain belonging to the group $G$. The second target is defined as follows:

$$T_2 = \sum_{d_i \in G} a_i \, f_i \tag{4}$$

where $f_i$ denotes the difficulty of the domain $d_i$ and is defined as $(1 - c_i / n)$, in which $c_i$ is the number of participants who have correct decisions for the domain $d_i$ and $n$ is the total number of participants. For example, if there are 10 participants of whom 7 answer correctly, the difficulty of the question is $1 - 7/10 = 0.3$. The third target is defined as follows:

$$T_3 = \sum_{d_i \in G} a_i \, f_i \, s_i \tag{5}$$

where $s_i$ denotes the SSIM between the domain $d_i$ and its corresponding brand domain. We use a multiple (linear) regression model for each target function. The factors affecting the target functions that we aim to find are the ones that have $p < 0.05$ in the result of the $t$-test; they are marked as ‘*’ in the experiment result. The factors that have $p < 0.001$ are considered significant factors, i.e., they significantly affect the target functions, and are marked as ‘***’ in the experiment result. The factors that have $p < 0.01$ are also considered and are marked as ‘**’ in the experiment result.

Note that we do not use the SSIM and the difficulty as features but as components of the target functions, for two reasons. First, the SSIM and the difficulty are not human characteristics but domain characteristics, and our goal in this paper is to analyze the human factors. Second, the SSIM and the difficulty are the same for all 2,067 participants, and thus the regression model would always show that the SSIM and the difficulty have $p < 0.05$, indicating that they are always factors affecting the target functions. Furthermore, even though $T_1$, $T_2$, and $T_3$ resemble a geometric progression (each term after the first is obtained by multiplying the previous one by a non-zero factor, i.e., the difficulty and the SSIM), they always give the same factors but not always the same $p$-values for each of these factors. The only case in which they give both the same factors and the same $p$-values is when there is only one domain in the group (there is no summation when computing the targets). In addition, if multiple factors are found, they do not conflict with each other even when some of them seem to be opposed, because they are defined as independent variables in the linear regression model. For example, suppose two factors found for a certain target function are: (i) having a job related to computers, and (ii) not frequently using the Internet. This means that the people affected are those who have a job related to computers OR (not AND) those who do not frequently use the Internet. Finally, for a group that has a single domain whose SSIM with its brand domain is 1, $T_2$ and $T_3$ give the same result.
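The three target functions can be sketched for one participant as follows. The sketch reads the difficulty of a domain as the fraction of participants who answered it incorrectly, which is our interpretation of the definition above; the function names and all numeric values are hypothetical.

```python
def difficulty(num_correct, num_participants):
    """Difficulty of a domain, read here as the share of incorrect decisions."""
    return 1 - num_correct / num_participants


def target_functions(correct, difficulties, ssims):
    """Return the three targets for one participant over the domains of one group.

    correct: 0/1 per domain (1 if the participant's decision is correct).
    difficulties: difficulty per domain.
    ssims: SSIM per domain with respect to its brand domain.
    """
    t1 = sum(correct)
    t2 = sum(a * f for a, f in zip(correct, difficulties))
    t3 = sum(a * f * s for a, f, s in zip(correct, difficulties, ssims))
    return t1, t2, t3


# Difficulty example from the text: 7 of 10 participants answer correctly.
print(difficulty(7, 10))

# Hypothetical group of three domains; the participant is right on the 1st and 3rd.
t1, t2, t3 = target_functions(
    correct=[1, 0, 1],
    difficulties=[0.3, 0.5, 0.2],
    ssims=[1.0, 0.9, 0.95],
)
print(t1, t2, t3)
```

With a single domain whose SSIM is 1, the second and third targets coincide, which matches the remark above.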
4.3 Handling of Noises
Before fitting the regression model, we need to deal with the noise in the raw dataset.
4.3.1 Lucky Answers
A lucky answer is a correct decision given for an inappropriate reason. We planned for this step when designing our survey. Therefore, in the final part of the survey, after deciding on the 18 domains, the participants also have to state the reason why they think the domains are homographs. We manually check each reason and, in each group, flip the participant’s decision from true to false if a correct decision on a homograph has an inappropriate reason. For example, participants who judged goole.co.jp to be a homograph and gave a correct reason such as “the letter g is missing" are not considered lucky answers; but participants who gave unclear reasons such as “I have a feeling that" or incorrect reasons such as “Google only has .co.jp as top-level domain, and thus google.com is unsafe" are considered lucky answers.
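Once each stated reason has been labelled manually, the flipping step itself is mechanical. The following sketch assumes a hypothetical labelling scheme in which each reason is tagged either `"appropriate"` or with some other category; the function and label names are ours.

```python
APPROPRIATE = "appropriate"


def flip_lucky_answers(decisions, reason_labels):
    """Flip correct decisions to False when the stated reason is not appropriate.

    decisions: list of bool, True if the participant's decision is correct.
    reason_labels: manual label for the reason given with each decision.
    """
    return [
        decision and label == APPROPRIATE
        for decision, label in zip(decisions, reason_labels)
    ]


# Hypothetical decisions of four participants on one homograph.
decisions = [True, True, False, True]
labels = ["appropriate", "feeling", "feeling", "incorrect reason"]

cleaned = flip_lucky_answers(decisions, labels)
print(cleaned)
```

Incorrect decisions stay incorrect regardless of the label, so only lucky answers change.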
4.3.2 Weak Features
For the features with binary variables, we exclude those for which fewer than 10% of the participants give a positive answer. For example, only 2 out of 2,067 participants (0.097%) know Vietnamese, and thus the feature “knowing Vietnamese" is removed. For the features with non-binary variables (e.g., age or security knowledge scores), even those with a non-Gaussian distribution are not considered weak features and are not removed. This is because linear regression does not require the raw data to be Gaussian; the Gaussian assumption applies only to the residuals, which can be estimated by linear least squares.
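The 10% rule for binary features can be sketched as follows. The data layout (one dictionary per participant) and the feature names are hypothetical; only the Vietnamese count of 2 out of 2,067 comes from the text, while the English count is made up.

```python
def weak_binary_features(rows, threshold=0.10):
    """Return the binary features whose share of positive answers is below threshold.

    rows: list of dicts, one per participant, mapping feature name to 0 or 1.
    """
    n = len(rows)
    names = list(rows[0].keys())
    return sorted(
        name for name in names
        if sum(row[name] for row in rows) / n < threshold
    )


# 2,067 participants; 2 know Vietnamese (from the text), 1,500 know English (made up).
rows = [
    {"know_vietnamese": int(i < 2), "know_english": int(i < 1500)}
    for i in range(2067)
]

print(round(100 * 2 / 2067, 3))      # share of participants who know Vietnamese, in %
print(weak_binary_features(rows))    # features to remove
```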
The programs were written in Python 2.7.11 on a computer with an Intel(R) Core i7 CPU, 16.0 GB of RAM, and 64-bit Windows 10. The multiple (linear) regression model is fitted using scikit-learn version 0.18. The SSIM is computed using skimage version 0.15.dev0. The $t$-test is applied using statsmodels version 0.8.
5.1 Experiment Plans
For each group, we first count the correct answers and the lucky answers. We then apply the regression model using two experiment plans as follows:
Plan 1 (Separation): Each domain is considered separately. For a given domain, the three target functions (Equations 3, 4, and 5) are simply applied to only one column, namely the binary values indicating whether the participant decisions for the given domain are correct or not. We then find the factors affecting the target functions for each domain by selecting the ones that have $p < 0.05$. Finally, the common factors are extracted over all the domains in the group. The common factors are defined as the ones that affect over 50% of the domains in the group. If the group has only a single domain, there are no common factors but only the ones affecting the single domain.
Plan 2 (Integration): The three target functions (Equations 3, 4, and 5) are applied to all domains in the group by using the summation. For example, suppose a group has 3 domains: $d_1$ (non-homograph), $d_2$ (homograph), and $d_3$ (homograph). The decisions of a participant for the domains are: $d_1$ (non-homograph), $d_2$ (non-homograph), and $d_3$ (homograph). The accuracy of the participant decisions is thus $a_1 = 1$ (true), $a_2 = 0$ (false), and $a_3 = 1$ (true). Writing $f_i$ for the difficulty and $s_i$ for the SSIM of domain $d_i$, the target functions are determined as $T_1 = a_1 + a_2 + a_3 = 2$, $T_2 = f_1 + f_3$, and $T_3 = f_1 s_1 + f_3 s_3$. Finally, the factors which have $p < 0.05$ are extracted for each target function.
For the groups that have only a single domain, only plan 1 is performed. For the groups that have multiple domains, both plans 1 and 2 are performed.
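The extraction of common factors in plan 1 amounts to a simple count over the per-domain results. The sketch below assumes the per-domain regressions have already produced, for each domain, the set of factors passing the significance threshold; the function name, the domain identifiers, and the factor sets are made up.

```python
def common_factors(factors_per_domain, min_share=0.5):
    """Return the factors that affect more than `min_share` of the domains.

    factors_per_domain: dict mapping a domain id to the set of factor names
    that were selected for that domain.
    """
    num_domains = len(factors_per_domain)
    counts = {}
    for factors in factors_per_domain.values():
        for name in factors:
            counts[name] = counts.get(name, 0) + 1
    # "over 50%" is a strict inequality: exactly half of the domains is not enough.
    return sorted(name for name, c in counts.items() if c > min_share * num_domains)


# Hypothetical group with four domains.
selected = {
    "d1": {"age", "gender"},
    "d2": {"age", "gender"},
    "d3": {"age", "security knowledge"},
    "d4": set(),
}

print(common_factors(selected))
```

Here `age` appears in 3 of 4 domains and is kept, while `gender` appears in exactly half of the domains and is therefore not a common factor.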
In this section, we present the experiment result and its analysis for each group. We should first mention that the weak features removed (as explained in Section 4.3) are: knowing several languages (including Hindi, Spanish, French, Arabic, Russian, Portuguese, Bengali, Korean, Vietnamese, Turkish, Italian, Greek, Dutch), continuing to browse without being worried when a warning is issued, and never seeing warnings. Furthermore, we recruited 2,067 participants so that the demographics (gender and age) of our sample statistically match those of Japanese Internet users as reported in the 2015 census by a reliable source (ComScore Inc.). The survey company administered the survey online on our behalf. The statistics for gender and age in our survey and in the actual population are as follows:
Gender: the number of male participants is 1,034 (50.02%) and the number of female participants is 1,033 (49.98%); the actual percentage of men within the population of Internet users is approximately 50%.
The distribution between gender and age is given in Table 2.
5.2.1 Group 1: Homographs with SSIM >= 0.999
When designing the survey, we expected that nobody could identify these domains as homographs because of their extremely high visual similarity to the brand domains. Surprisingly, however, the numbers of correct answers for the four domains in this group are high (over 30% of the 2,067 participants for each domain, and up to 63.5% for the domain #10). We then analyzed the reasons the participants gave for thinking that the domains are homographs. We find that large portions are lucky answers with inappropriate reasons. For the domains #3, #4, and #15, 100% of the correct answers are lucky answers. For the domain #10, about 99.7% of the correct answers are lucky answers, and the number of genuinely correct answers is merely 4 out of 2,067 (0.19%). We categorized the lucky answers into three types: (i) unclear reasons like “I have feeling so", “I never use the domain so it is better to answer it is homograph", “just random answer", etc.; (ii) reasons caused by the local nature of the survey like “Just know that Amazon has only .co.jp, not .com"; and (iii) other incorrect reasons like “the first letter should be capital". The summarized statistics are given in Table 3. Due to the large portion of lucky answers, we only report the statistics and do not apply the regression model. This confirms that humans have no practical way to distinguish such extremely high-SSIM homographs, which underlines the seriousness of the homograph attack.
| Domain | Answers | Breakdown of correct answers |
|---|---|---|
| #3 | incorrect: 1411; correct: 656 | lucky: 656 (100%), of which feeling: 616, incorrect reason: 20 |
| #4 | incorrect: 1432; correct: 635 | lucky: 635 (100%), of which feeling: 587, incorrect reason: 20 |
| #10 | incorrect: 755; correct: 1312 | correct reason: 4; lucky: 1308 (99.7%), of which incorrect reason: 27 |
| #15 | incorrect: 756; correct: 1311 | lucky: 1311 (100%), of which feeling: 1263, incorrect reason: 40 |
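The percentages quoted for this group follow directly from the counts in Table 3. The short sketch below recomputes them as a sanity check; the dictionary layout and the function name are ours, while the counts are taken from the table.

```python
TOTAL_PARTICIPANTS = 2067

# Counts taken from Table 3.
table3 = {
    "#3": {"incorrect": 1411, "correct": 656, "lucky": 656},
    "#4": {"incorrect": 1432, "correct": 635, "lucky": 635},
    "#10": {"incorrect": 755, "correct": 1312, "lucky": 1308},
    "#15": {"incorrect": 756, "correct": 1311, "lucky": 1311},
}


def summarize(row, total=TOTAL_PARTICIPANTS):
    """Return (lucky share of correct answers in %, genuinely correct share in %)."""
    lucky_share = 100 * row["lucky"] / row["correct"]
    genuine_share = 100 * (row["correct"] - row["lucky"]) / total
    return round(lucky_share, 1), round(genuine_share, 2)


for domain, row in table3.items():
    # Every participant answered every question, so the counts must add up.
    assert row["incorrect"] + row["correct"] == TOTAL_PARTICIPANTS
    print(domain, summarize(row))
```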
5.2.2 Group 2: Homographs with SSIM < 0.999
The number of correct answers including lucky answers for this group is much higher than that of the other groups. Table 4 shows that the average share of correct answers is 84.01%, and that it is 16.59% when lucky answers are excluded, which is still far higher than in Group 1. We do not remove the lucky answers but flip the participant answers from true to false for the corresponding domains.
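| | Correct answers (including lucky) | Lucky answers | Correct answers (excluding lucky) |
|---|---|---|---|
| Average over 2067 | 84.01% | 67.41% | 16.59% |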
In experiment plan 1 (separation), the common factors found ($p < 0.05$) across the 7 domains in this group are as follows (the detailed results for each domain are provided separately, anonymized as required):
gender: appears in 5/7 domains (#7, #9, #12, #14, #17), all 5 with a positive coefficient (female); it is a significant factor for domain #12 ($p < 0.001$).
age: appears in 7/7 domains (#1, #6, #7, #9, #12, #14, #17), all 7 with a negative coefficient (younger); it is a significant factor for domains #1, #7, #12, #14, #17 ($p < 0.001$).
work/educated in computer science or computer engineering: appears in 4/7 domains (#1, #7, #12, #14), all 4 with a positive coefficient (do work/edu in computer science or computer engineering); it is a significant factor for domain #1 ($p < 0.001$).
security knowledge: appears in 7/7 domains (#1, #6, #7, #9, #12, #14, #17), all 7 with a positive coefficient (more security knowledge); it is a significant factor for all 7 domains ($p < 0.001$).
For experiment plan 2 (integration), the three target functions result in the same factors but different $p$-values, coefficients, and 95% CIs, since there is more than one domain in this group. The concrete result is shown in Table 5. The final 4 factors found with $p < 0.05$ are: female (since the coefficient is positive), younger (since the coefficient is negative for older), working/educated in computer science or computer engineering (since the coefficient is positive), and more security knowledge (since the coefficient is positive).
|(Intercept)||-1.27||0.006||[-2.174, -0.365]||-0.191||0.006||[-0.326, -0.055]||-0.182||0.005||[-0.311, -0.054]|
|Female||0.263||0.002||[0.098, 0.427]||0.033||0.008||[0.009, 0.058]||0.032||0.007||[0.009, 0.055]|
|Older||-0.019||<0.001||[-0.024, -0.014]||-0.003||<0.001||[-0.003, -0.002]||-0.003||<0.001||[-0.003, -0.002]|
|Having a job||0.001||0.990||[-0.176, 0.178]||-0.002||0.867||[-0.029, 0.024]||-0.003||0.842||[-0.028, 0.023]|
|Know English||0.346||0.106||[-0.074, 0.766]||0.061||0.056||[-0.002, 0.124]||0.058||0.058||[-0.002, 0.117]|
|Know Chinese||-0.117||0.494||[-0.453, 0.219]||-0.034||0.189||[-0.084, 0.017]||-0.032||0.195||[-0.079, 0.016]|
|Know German||-0.043||0.800||[-0.375, 0.290]||-0.01||0.706||[-0.059, 0.040]||-0.008||0.731||[-0.056, 0.039]|
|Know Japanese only||-0.091||0.676||[-0.515, 0.334]||-0.002||0.942||[-0.066, 0.061]||-0.003||0.911||[-0.064, 0.057]|
|#Known languages||0.095||0.506||[-0.186, 0.377]||0.016||0.466||[-0.026, 0.058]||0.014||0.481||[-0.026, 0.054]|
|Installed anti-virus||0.108||0.270||[-0.084, 0.301]||0.019||0.185||[-0.009, 0.048]||0.018||0.186||[-0.009, 0.046]|
|Warning, browse if want to see||-0.074||0.622||[-0.367, 0.220]||-0.014||0.525||[-0.058, 0.030]||-0.013||0.533||[-0.055, 0.028]|
|Warning, browse if site is famous||0.137||0.351||[-0.151, 0.425]||0.021||0.338||[-0.022, 0.064]||0.02||0.338||[-0.021, 0.061]|
|Warning, not browse||0.031||0.778||[-0.184, 0.245]||0.005||0.779||[-0.028, 0.037]||0.005||0.768||[-0.026, 0.035]|
|Know brands||0.09||0.642||[-0.289, 0.468]||0.023||0.418||[-0.033, 0.080]||0.02||0.469||[-0.034, 0.074]|
|Frequently use brands||-0.029||0.631||[-0.147, 0.089]||-0.007||0.414||[-0.025, 0.010]||-0.006||0.460||[-0.023, 0.010]|
|Work/edu in CS/CE||0.484||0.002||[0.179, 0.789]||0.074||<0.001||[0.028, 0.120]||0.072||<0.001||[0.028, 0.115]|
|More sec. behavior||0.004||0.297||[-0.003, 0.011]||0.001||0.293||[-0.001, 0.002]||0.001||0.264||[-0.000, 0.002]|
|More sec. knowledge||0.163||<0.001||[0.133, 0.194]||0.024||<0.001||[0.020, 0.029]||0.023||<0.001||[0.019, 0.027]|
|More sec. confidence||0.013||0.171||[-0.006, 0.032]||0.002||0.244||[-0.001, 0.004]||0.002||0.241||[-0.001, 0.004]|
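The way the final factors are read off Table 5 can be sketched as follows. This is only an illustration, not the analysis code used in the study: the rows are copied from the first target function in Table 5, the significance threshold `ALPHA = 0.05` is an assumption, and `significant_factors` is a hypothetical helper name.

```python
# Rows copied from the first target function in Table 5: (factor, coefficient, p-value).
# Entries reported as "<0.001" are stored as 0.001, i.e. an upper bound.
ROWS = [
    ("Female", 0.263, 0.002),
    ("Older", -0.019, 0.001),
    ("Having a job", 0.001, 0.990),
    ("Know English", 0.346, 0.106),
    ("Know Chinese", -0.117, 0.494),
    ("Installed anti-virus", 0.108, 0.270),
    ("Know brands", 0.09, 0.642),
    ("Frequently use brands", -0.029, 0.631),
    ("Work/edu in CS/CE", 0.484, 0.002),
    ("More sec. behavior", 0.004, 0.297),
    ("More sec. knowledge", 0.163, 0.001),
    ("More sec. confidence", 0.013, 0.171),
]

ALPHA = 0.05  # assumed significance threshold


def significant_factors(rows, alpha=ALPHA):
    """Return (factor, direction) for every row whose p-value is below alpha."""
    return [
        (name, "positive" if coef > 0 else "negative")
        for name, coef, p_value in rows
        if p_value < alpha
    ]
```

Under this assumed threshold, the sketch keeps the same four factors that are listed above, and the sign of each coefficient gives the direction (for example, a negative coefficient for "Older" is read as "younger").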
5.2.3 Group 3: Non-homograph
For this group, we do not need to remove lucky answers: if the participants answer correctly (i.e., that the domains are non-homographs), there is nothing to do; and if they answer incorrectly (i.e., that the domains are homographs), their decisions are wrong regardless of the reason. According to the statistics given in Table 6, the number of correct answers for each domain in this group is the lowest among all groups. This indicates that the domains in this group are more difficult. We now consider each case (i.e., groups 3.1 and 3.2) as follows.
Group 3.1: Normal Non-homograph
This group has 6 domains: #2, #5, #8, #11, #13, #18. For experiment plan 1 (separation), the common factors that appear in more than 3 domains (50% of the 6 domains in this group) are the following (the detailed results for each domain are given in the anonymized “Experiment results in details” entry of the reference list):
gender: appears in 5/6 domains (#2, #5, #8, #11, #18), all with negative coefficients (male); it is a significant factor for domain #2 ().
age: appears in 4/6 domains (#5, #8, #11, #13), of which 1 domain has a positive coefficient (older) and 3 domains have negative coefficients (younger); it is a significant factor for domains #11 and #13 ().
having a job: appears in 4/6 domains (#2, #8, #11, #13), all with negative coefficients (not having a job).
warning and not browse: appears in 4/6 domains (#2, #5, #11, #18), all with negative coefficients (warning, still browse); it is a significant factor for domain #5 ().
security knowledge: appears in 5/6 domains (#2, #5, #8, #11, #17), all with negative coefficients (less security knowledge); it is a significant factor for all 5 domains ().
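The selection rule of plan 1 (keep a factor if it appears in more than half of the domains of the group) can be sketched as below. The appearance counts are taken from the list above; `common_factors` and the variable names are hypothetical, and the sketch is not the code used in the study.

```python
# Number of domains in Group 3.1 and, per factor, the number of per-domain
# models in which the factor appears (counts taken from the list above).
N_DOMAINS = 6
APPEARANCES = {
    "gender": 5,
    "age": 4,
    "having a job": 4,
    "warning and not browse": 4,
    "security knowledge": 5,
}


def common_factors(appearances, n_domains):
    """Keep the factors that appear in strictly more than 50% of the domains."""
    return [name for name, count in appearances.items() if count > n_domains / 2]
```

With the counts above, all five factors pass the rule, whereas a factor found in only 3 of the 6 domains would be dropped.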
For experiment plan 2 (integration), similar to Group 2, the three target functions yield the same factors but different p-values, coefficients, and 95% CIs, since there is more than one domain in this group. The concrete results are shown in Table 7. The final 7 factors found are: male (the coefficients for female are negative), younger (the coefficients for older are negative), not having a job (the coefficients for having a job are negative), warning but still browse (the coefficients for warning and not browse are negative), knowing the brands (the coefficients are positive), working or educated in computer science or computer engineering (the coefficients are positive), and less security knowledge (the coefficients for more security knowledge are negative).
|Factor||Coef.||p-value||95% CI||Coef.||p-value||95% CI||Coef.||p-value||95% CI|
|(Intercept)||1.610||<0.001||[1.057, 2.164]||2.204||<0.001||[1.433, 2.974]||3.796||<0.001||[3.026, 4.567]|
|Female||-0.153||<0.001||[-0.246, -0.061]||-0.206||0.002||[-0.335, -0.078]||0.206||0.002||[0.078, 0.335]|
|Older||-0.004||0.007||[-0.007, -0.001]||-0.006||0.003||[-0.010, -0.002]||0.006||0.003||[0.002, 0.010]|
|Having a job||-0.166||<0.001||[-0.266, -0.067]||-0.234||<0.001||[-0.372, -0.095]||0.234||<0.001||[0.095, 0.372]|
|Know English||-0.022||0.910||[-0.402, 0.358]||-0.041||0.878||[-0.570, 0.488]||0.041||0.878||[-0.488, 0.570]|
|Know Chinese||0.032||0.709||[-0.136, 0.200]||0.04||0.740||[-0.194, 0.273]||-0.04||0.740||[-0.273, 0.194]|
|Know German||-0.115||0.138||[-0.267, 0.037]||-0.163||0.131||[-0.374, 0.048]||0.163||0.131||[-0.048, 0.374]|
|Know Japanese only||0.074||0.707||[-0.313, 0.462]||0.085||0.757||[-0.454, 0.624]||-0.085||0.757||[-0.624, 0.454]|
|#Known languages||0.045||0.199||[-0.024, 0.115]||0.069||0.160||[-0.027, 0.166]||-0.069||0.160||[-0.166, 0.027]|
|Installed anti-virus||-0.042||0.446||[-0.151, 0.066]||-0.06||0.432||[-0.211, 0.090]||0.06||0.432||[-0.090, 0.211]|
|Warning, browse if want to see||-0.021||0.805||[-0.185, 0.144]||-0.03||0.799||[-0.259, 0.199]||0.03||0.799||[-0.199, 0.259]|
|Warning, browse if site is famous||-0.151||0.066||[-0.313, 0.010]||-0.209||0.068||[-0.434, 0.016]||0.209||0.068||[-0.016, 0.434]|
|Warning, not browse||-0.209||<0.001||[-0.329, -0.089]||-0.286||<0.001||[-0.454, -0.119]||0.286||<0.001||[0.119, 0.454]|
|Know brands||0.077||0.014||[0.016, 0.139]||0.114||0.010||[0.028, 0.200]||-0.114||0.010||[-0.200, -0.028]|
|Frequently use brands||-0.011||0.407||[-0.038, 0.015]||-0.015||0.431||[-0.051, 0.022]||0.015||0.431||[-0.022, 0.051]|
|Work/edu in CS/CE||0.273||0.002||[0.102, 0.444]||0.386||<0.001||[0.148, 0.623]||-0.386||<0.001||[-0.623, -0.148]|
|More sec. behavior||-0.002||0.288||[-0.006, 0.002]||-0.003||0.313||[-0.009, 0.003]||0.003||0.313||[-0.003, 0.009]|
|More sec. knowledge||-0.051||<0.001||[-0.068, -0.034]||-0.069||<0.001||[-0.093, -0.045]||0.069||<0.001||[0.045, 0.093]|
|More sec. confidence||0.008||0.148||[-0.003, 0.018]||0.011||0.143||[-0.004, 0.025]||-0.011||0.143||[-0.025, 0.004]|
Group 3.2: Sensitive Non-homograph
This group has a single domain (#16), so only plan 1 (separation) is performed, as mentioned in Section 5.1. All three target functions yield the same factors and the same p-values but different coefficients and 95% CIs. The results are shown in Table 8. The final 3 factors found are: age (younger, since the coefficient for older is negative), having a job (not having a job, since the coefficient for having a job is negative), and working or educated in computer science or computer engineering (the coefficient is positive). Since domain #16 is a non-homograph and is the brand domain itself, SSIM = 1. Therefore, two of the target functions give the same result for all values.
|Factor||p-value||Coef.||95% CI||Coef.||95% CI|
|(Intercept)||<0.001 ***||0.324||[0.132, 0.517]||0.289||[0.118, 0.460]|
|Female||0.271||-0.016||[-0.045, 0.013]||-0.014||[-0.040, 0.011]|
|Older||<0.001 ***||-0.001||[-0.002, -0.001]||-0.001||[-0.002, -0.000]|
|Having a job||0.009 **||-0.041||[-0.072, -0.010]||-0.036||[-0.064, -0.009]|
|Know English||0.969||-0.002||[-0.121, 0.116]||-0.002||[-0.108, 0.104]|
|Know Chinese||0.231||-0.032||[-0.085, 0.020]||-0.029||[-0.075, 0.018]|
|Know German||0.052||-0.047||[-0.095, 0.000]||-0.042||[-0.084, 0.000]|
|Know Japanese only||0.854||-0.011||[-0.132, 0.109]||-0.01||[-0.118, 0.097]|
|#Known languages||0.212||0.014||[-0.008, 0.035]||0.012||[-0.007, 0.032]|
|Installed anti-virus||0.424||-0.014||[-0.047, 0.020]||-0.012||[-0.042, 0.018]|
|Warning, browse if want to see||0.650||-0.012||[-0.063, 0.039]||-0.011||[-0.056, 0.035]|
|Warning, browse if site is famous||0.897||-0.003||[-0.054, 0.047]||-0.003||[-0.048, 0.042]|
|Warning, not browse||0.109||-0.031||[-0.068, 0.007]||-0.027||[-0.061, 0.006]|
|Know brands||0.124||0.125||[-0.034, 0.284]||0.111||[-0.031, 0.253]|
|Frequently use brands||0.341||-0.051||[-0.157, 0.054]||-0.046||[-0.140, 0.048]|
|Work/edu in CS/CE||0.022 *||0.062||[0.009, 0.116]||0.056||[0.008, 0.103]|
|More sec. behavior||0.724||0.0002||[-0.001, 0.002]||0.0||[-0.001, 0.001]|
|More sec. knowledge||0.104||-0.004||[-0.010, 0.001]||-0.004||[-0.009, 0.001]|
|More sec. confidence||0.886||0.0002||[-0.003, 0.003]||0.0||[-0.003, 0.003]|
In this section, we discuss our current limitations and several challenges that we leave for future work.
6.1 Improvement of Survey
First, the responses would be more objective if the survey were distributed to global rather than local (Japanese, in this work) participants. In that case, the survey must be translated carefully for each country while preserving its reliability and structural validity. Second, we plan to create a survey in which the participants answer twice for the same domains, before and after reading a description of the homograph attack. We can then learn whether and how their decision-making changes after training, and whether the factors themselves change or only the coefficients. Third, some features that may affect participants’ ability to recognize homographs are not covered in the current survey, such as the number of hours spent on the Internet per day and factors related to participant psychology (e.g., emotional state, demands, and environment when answering the questionnaire). Fourth, it would be better to present the domains to the participants in an actual simulation rather than in a self-report questionnaire. In this case, we could not only reduce bias but also extract other information about the participants (e.g., time of accessing domains, scenario of accessing domains, mouse movement).
6.2 Improvement of Model and Data
The current target functions only consider the SSIM and the difficulty of the domains. Other values should also be considered, such as:
Alexa ranking: Some domains are very famous (e.g., amazon.com or google.com), so the participants are more familiar with them than with less popular ones (e.g., coinbase.com). The Alexa ranking can be global in scope (if the survey is applied to multiple countries) or local in scope, such as Japan in this survey (if the survey is applied to a single country).
Order of the domains in the questionnaire: A bias arises when the participants answer the first domains carefully but gradually start answering randomly; therefore, the domain order in the questions should be added as a component of the target function. However, another bias arises when the participants answer that all domains are homographs, perhaps because they think the domains in such a security survey are likely to be homographs, or because they consider a false positive better than a false negative when they are not sure.
Furthermore, even though we noted in the non-homograph experiment (Section 5.2.3) that there is no need to change participants’ decisions for lucky answers on non-homographs, we want to know whether the factors change when lucky answers are flipped.
We designed and ran an online study to explore how user demographics, brand familiarity, and security backgrounds affect the ability to recognize homographs. We collected 2,067 responses to our survey from participants located in Japan and analyzed them using linear regression. Our results shed light on how this ability differs across kinds of homographs. While female participants tend to be able to recognize homographs, male participants tend to be able to recognize non-homographs. For homographs with SSIM under 0.999, 16.59% of participants can recognize them. However, when SSIM increases slightly to 0.999 (or over), the share of participants who can recognize homographs drops significantly to merely 0.19%, and for homographs with SSIM = 1, there is no way for the participants to recognize them. This underlines the seriousness of the homograph attack. We also find that people working or educated in computer science or computer engineering tend to exhibit the best ability to recognize all kinds of homographs and non-homographs. In particular, we find that brand familiarity does not influence the ability to recognize either homographs or non-homographs. We therefore recommend looking into directions beyond user education to improve homograph recognition.
-  Anonymous. Experiement results in details. February 2019. Available: https://drive.google.com/file/d/1n-rt_BKlFmt-ZhTcJBqqQ6ZQgHur0sH1/view?usp=sharing.
-  AppleInc. About safari international domain name support. October 2016. Available: https://support.apple.com/kb/TA22996?locale=en_US.
-  M. Blythe, H. Petrie, and J. A. Clark. F for fake: Four studies on how we fall for phish. SIGCHI Conference on Human Factors in Computing Systems (CHI’11), pages 3469–3478, May 2011.
-  E. Chin, A. P. Felt, V. Sekar, and D. Wagner. Measuring user confidence in smartphone security and privacy. Eighth Symposium on Usable Privacy and Security (SOUPS’12), July 2012.
-  G. Cluley. Lloydsbank, iioydsbank - researcher highlights the homographic phishing problem. June 2015. Available: https://www.grahamcluley.com/lloydsbank-homographic-phishing-problem/.
-  ComScore-Inc. 2015 japan digital audience report. May 2016. Available: https://www.comscore.com/layout/set/popup/Request/Presentations/2015/2015-Japan-Digital-Audience-Report?req=slides&pre=2015+Japan+Digital+Audience+Report.
-  A. Crenshaw. Homoglyph attack generator. Available: http://www.irongeek.com/homoglyph-attack-generator.php.
-  S. Das, T. H.-J. Kim, L. A. Dabbish, and J. I. Hong. The effect of social influence on security sensitivity. 10th USENIX Conference on Usable Privacy and Security (SOUPS’14), pages 143–157, July 2014.
-  S. Egelman and E. Peer. Scaling the security wall: Developing a security behavior intentions scale (sebis). 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI’15):2873–2882, April 2015. DOI: http://dx.doi.org/10.1145/2702123.2702249.
-  A. P. Felt, E. Ha, and S. Egelman. Android permissions: User attention, comprehension, and behavior. Eighth Symposium on Usable Privacy and Security (SOUPS’12), July 2012.
-  T. Furrer. Idn homograph attack. May 2017. Available: https://github.com/timofurrer/idn-homograph-attack.
-  E. Gabrilovich and A. Gontmakher. The homograph attack. Communications of the ACM, 45(2):128–129, February 2002.
-  I. Ion, R. Reeder, and S. Consolvo. …no one can hack my mind: Comparing expert and non-expert security practices. 11th USENIX Conference on Usable Privacy and Security (SOUPS’15), pages 327–346, July 2015.
-  T. Kelley, M. J. Amon, and B. I. Bertenthal. Statistical models for predicting threat detection from human behavior. Frontiers in Psychology, 9(466), April 2018.
-  I. Kirlappos and A. Sasse. Security education against phishing: A modest proposal for a major rethink. IEEE Security and Privacy, 10(2):24–32, March 2012.
-  E. Lin, S. Greenberg, E. Trotter, D. Ma, and J. Aycock. Does domain highlighting help people identify phishing sites? SIGCHI Conference on Human Factors in Computing Systems (CHI’11), pages 2075–2084, May 2011.
-  B. Liu, C. Lu, Z. Li, Y. Liu, H. Duan, S. Hao, and Z. Zhang. A reexamination of internationalized domain names: The good, the bad and the ugly. 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’18), June 2018.
-  M. Maunder. Chrome and firefox phishing attack uses domains identical to known safe sites. April 2017. Available: https://www.wordfence.com/blog/2017/04/chrome-firefox-unicode-phishing/.
-  Microsoft. Changes to idn in ie7 to now allow mixing of scripts. July 2006. Available: https://blogs.msdn.microsoft.com/ie/2006/07/31/changes-to-idn-in-ie7-to-now-allow-mixing-of-scripts/.
-  M. Mimoso. Idn homograph attack spreading betabot backdoor. September 2017. Available: https://threatpost.com/idn-homograph-attack-spreading-betabot-backdoor/127839/.
-  A. Moretto and V. Augusto. Evilurl: Generate unicode evil domains for idn homograph attack and detect them. February 2018. Available: https://github.com/UndeadSec/EvilURL.
-  NTT-Security. Idn homograph attacks. January 2017. Available: https://www.solutionary.com/resource-center/blog/2017/01/idn-homograph-attacks/.
-  Opera. Advisory: Internationalized domain names (idn) can be used for spoofing. February 2007. Available: https://web.archive.org/web/20070219070826/http://www.opera.com/support/search/view/788/.
-  D. Pedia. Search domain zones. Available: https://dnpedia.com/tlds/search.php.
-  I. W. Report. Internationalised domains show negative growth in 2017. December 2017. Available: https://idnworldreport.eu/.
-  Y. Sawabe, D. Chiba, M. Akiyama, and S. Goto. Detecting homograph idns using ocr. 46th Asia Pacific Advanced Network (APAN), August 2018.
-  Y. Sawaya, M. Sharif, N. Christin, A. Kubota, A. Nakarai, and A. Yamada. Self-confidence trumps knowledge: A cross-cultural study of security behavior. Conference on Human Factors in Computing Systems, pages 2202–2214, April 2017.
-  M. Sharif, J. Urakawa, N. Christin, A. Kubota, and A. Yamada. Predicting impending exposure to malicious content from user behavior. 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS’18), pages 1487–1501, October 2018.
-  S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, and J. Downs. Who falls for phish?: a demographic analysis of phishing susceptibility and effectiveness of interventions. SIGCHI Conference on Human Factors in Computing Systems (CHI’10), pages 373–382, April 2010.
-  K. Tian, S. Jan, H. Hu, D. Yao, and G. Wang. Needle in a haystack: Tracking down elite phishing domains in the wild. Internet Measurement Conference (IMC’18), pages 429–442, October 2018.
-  M. Ulikowski. Dnstwist: Domain name permutation engine for detecting typo squatting, phishing and corporate espionage. November 2018. Available: https://github.com/elceef/dnstwist.
-  Unicode-Inc. Unicode security mechanisms for uts #39. 2018. Available: http://www.unicode.org/Public/security/11.0.0/confusables.txt.
-  R. Verhoef. Domain name generator. Available: https://instantdomainsearch.com/domain/generator/.
-  R. Verhoef. Homographs: brutefind homographs within a font. April 2017. Available: https://github.com/dutchcoders/homographs.
-  J. Wang, T. Herath, R. Chen, A. Vishwanath, and R. Rao. Phishing susceptibility: An investigation into the processing of a targeted spear phishing email. IEEE Transactions on Professional Communication, 55(4):345–362, December 2012.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
-  X. Zheng. Phishing with unicode domains. April 2017. Available: https://www.xudongz.com/blog/2017/idn-phishing/?_ga=2.53371112.1302505681.1542677803-1987638994.1542677803.
Appendix A Security Behaviors
(not at all/ not much/ sometimes/ often/ always):
I set my computer screen to automatically lock if I don’t use it for a prolonged period of time.
I use a password/passcode to unlock my laptop or tablet.
I manually lock my computer screen when I step away from it.
I use a PIN or passcode to unlock my mobile phone.
I change my passwords even if it is not needed.
I use different passwords for different accounts that I have.
When I create a new online account, I try to use a password that goes beyond the site’s minimum requirements.
I include special characters in my password even if it’s not required.
When someone sends me a link, I open it only after verifying where it goes.
I know what website I’m visiting by looking at the URL bar, rather than by the website’s look and feel.
I verify that information will be sent securely (e.g., SSL, “https://", a lock icon) before I submit it to websites.
When browsing websites, I mouseover links to see where they go, before clicking them.
If I discover a security problem, I fix or report it rather than assuming somebody else will.
When I’m prompted about a software update, I install it right away.
I try to make sure that the programs I use are up-to-date.
I verify that my anti-virus software has been regularly updating itself.
Appendix B Security Knowledge
My Internet provider and location can be disclosed from my IP address.
My telephone number can be disclosed from my IP address.
The web browser information of my device can be disclosed to the operators of websites.
Since Wi-Fi networks in coffee shops are secured by the coffee shop owners, I can use them to send sensitive data such as credit card information.
Passwords comprised of random characters are harder for attackers to guess than passwords comprised of common words and phrases.
If I receive an email that tells me to change my password, and links me to the web page, I should change my password immediately.
My devices are safe from being infected while browsing the web because web browsers only display information.
It is impossible to confirm whether secure communication is being used between my device and a website.
My information can be stolen if a website that I visit masquerades as a famous website (e.g., amazon.com).
I may suffer from monetary loss if a website that I visit masquerades as a famous website.
My devices and accounts may be put at risk if I make a typing mistake while entering the address of a website.
My IP address is secret and it is unsafe to share it with anyone.
If my web browser does not show a green lock when I visit a website, then I can deduce that the website is malicious.
It is safe to open links that appear in emails in my inbox.
It is safe to open attachments received via email.
I use private browsing mode to protect my machine from being infected.
It is safe to use anti-virus software downloaded through P2P file sharing services.
Machines are safe from infections unless participants actively download malware.
The actual correct answers for the 18 sub-questions are: true for sub-questions 1, 3, 5, 9, 10, 11, and false for the others.
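As an illustration, the answer key above can be turned into a knowledge score as sketched below. The scoring rule (one point per correctly answered sub-question) is an assumption made for this example, and `knowledge_score` is a hypothetical helper, not the scoring code of the study.

```python
# Sub-questions whose correct answer is "true" (1-based numbering, from the key above);
# the correct answer for all other sub-questions is "false".
TRUE_ITEMS = {1, 3, 5, 9, 10, 11}
N_ITEMS = 18


def knowledge_score(answers):
    """Count correct answers; `answers` holds 18 booleans, sub-question 1 first.

    Assumption: each correct answer is worth one point.
    """
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    return sum(
        answer == (number in TRUE_ITEMS)
        for number, answer in enumerate(answers, start=1)
    )
```

Under this assumption, a participant who answers "true" to every sub-question scores 6, and one who answers "false" to every sub-question scores 12.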
Appendix C Security Self-Confidence
(not at all/ not applicable/ neither agree nor disagree/ applicable/ very applicable)
I know about countermeasures for keeping the data on my device from being exploited.
I know about countermeasures to protect myself from monetary loss when using the Internet.
I know about countermeasures to prevent my IDs or passwords from being stolen.
I know about countermeasures to prevent my devices from being compromised.
I know about countermeasures to protect me from being deceived by fake web sites.
I know about countermeasures to prevent my data from being stolen during web browsing.
Appendix D Ability of Homograph Recognition
Domain #1: xn--mazon-zjc.com (displayed as the sample 1 in Figure 1).
Domain #2: amazonaws.com.
Domain #3: xn--mazon-3ve.com (displayed as the sample 3 in Figure 1).
Domain #4: xn--gogle-m29a.com (displayed as the sample 4 in Figure 1).
Domain #5: google.com.vn.
Domain #6: goole.co.jp.
Domain #7: xn--coinbas-z8a.com (displayed as the sample 7 in Figure 1).
Domain #8: wikimedia.org.
Domain #9: xn--wikipdia-f1a.org (displayed as the sample 9 in Figure 1).
Domain #10: xn--bookin-n0c.com (displayed as the sample 10 in Figure 1).
Domain #11: jbooking.jp.
Domain #12: xn--expeda-fwa.com (displayed as the sample 12 in Figure 1).
Domain #13: expedia.co.jp.
Domain #14: xn--paypl-6qa.com (displayed as the sample 14 in Figure 1).
Domain #15: xn--pypal-4ve.com (displayed as the sample 15 in Figure 1).
Domain #16: sex.com.
Domain #17: faeceb0ok.com.
Domain #18: vi-vn.facebook.com.
The 18 domains are displayed respectively in Figure 1. The correct answers for the 18 domains are as follows:
Homographs: the domains #1, #3, #4, #6, #7, #9, #10, #12, #14, #15, #17.
Non-homographs: the other domains.
The homographs #1 and #3 target the brand Amazon. The homographs #4 and #6 target the brand Google. The homograph #7 targets the brand Coinbase. The homograph #9 targets the brand Wikipedia. The homograph #10 targets the brand Booking. The homograph #12 targets the brand Expedia. The homographs #14 and #15 target the brand Paypal. The homograph #17 targets the brand Facebook.
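For readers who wish to inspect the Punycode domains listed above, the following minimal sketch decodes the "xn--" labels with the `punycode` codec from Python’s standard library. The helper names `to_unicode` and `has_non_ascii` are hypothetical, and the sketch performs no IDNA validation or normalization.

```python
def to_unicode(domain: str) -> str:
    """Decode every Punycode ("xn--") label of a domain; other labels are kept as-is."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Drop the "xn--" prefix and decode the rest with the stdlib codec.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


def has_non_ascii(domain: str) -> bool:
    """Return True if the decoded domain contains at least one non-ASCII character."""
    return any(ord(ch) > 127 for ch in to_unicode(domain))
```

For example, `has_non_ascii("xn--mazon-zjc.com")` is true, while it is false for the homographs #6 (goole.co.jp) and #17 (faeceb0ok.com), which consist of ASCII characters only. A check of this kind therefore cannot separate homographs from non-homographs on its own.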