How Do Socio-Demographic Patterns Define Digital Privacy Divide?

01/20/2022
by   Hamoud Alhazmi, et al.
University of Canberra

Digital privacy has become an essential component of information and communications technology (ICT) systems. There are many existing methods for digital privacy protection, including network security, cryptography, and access control. However, there is still a gap in the digital privacy protection levels available for users. This paper studies the digital privacy divide (DPD) problem in ICT systems. First, we introduce an online DPD study for understanding the DPD problem by collecting responses from 776 ICT users using crowdsourcing task assignments. Second, we propose a factor analysis-based statistical method for generating the DPD index from a set of observable DPD question variables. In particular, the DPD index provides one scaled measure for the DPD gap by exploring the dimensionality of the eight questions in the DPD survey. Third, we introduce a DPD proportional odds model for analyzing the relationship between the DPD status and the socio-demographic patterns of the users. Our results show that the DPD survey meets the internal consistency reliability with rigorous statistical measures, e.g., Cronbach's α=0.92. Furthermore, the DPD index is shown to capture the underlying communality of all DPD variables. Finally, the DPD proportional odds model indicates a strong statistical correlation between the DPD status and the age groups of the ICT users. For example, we find that young users (15-32 years) are generally more concerned about their digital privacy than senior ones (33 years and over).


I Introduction

Recent years have witnessed much progress in defining digital privacy as a functional requirement in information and communications technology (ICT) systems [28, 16, 22, 12]. For the first time in human history, digital privacy is well-defined in regulations and policies, including the general data protection regulation (GDPR) [11]. Digital privacy can now be measured using statistical tools, e.g., differential privacy [9]. Previous works [28, 16, 22, 12] have developed privacy-preserving algorithms that protect the digital privacy of individuals. The literature reflects extensive efforts and attention in digital privacy from the research communities, industries, and governments. Nevertheless, is there still a gap in the levels of digital privacy provided to individuals?

One significant problem of ICT systems is their digital divide and inequality [23, 15, 10, 5, 25, 17]. In particular, ICT users receive various service levels and access qualities based on their socio-demographic patterns, e.g., age, gender, geographical location, occupation, and education. This paper shows that digital privacy is a recent form of the digital divide in ICT systems. In particular, the digital privacy divide (DPD) describes the various levels of digital privacy protection provided to users based on their socio-demographic patterns. This paper provides an in-depth statistical analysis of the effects of socio-demographic patterns on the DPD gap, which is a critical initial step for addressing the DPD problem and privacy protection inequalities.

We conducted an online survey study between May and October 2021 on how people perceive the DPD problem in their countries of residence. We collected responses from 776 ICT users. The study was created using Qualtrics survey software [24], and the ICT users were mainly recruited using manual referrals and Amazon Mechanical Turk (MTurk) [2] for crowdsourcing task assignments. Previous work shows that crowdsourcing in survey research accurately reflects the general population [27, 3]. Then, we applied rigorous statistical analysis to the collected DPD data. First, we show that the DPD survey meets the requirements for internal consistency reliability, e.g., Cronbach’s α = 0.92 and McDonald’s ω = 0.94. Second, our DPD data visualization shows that the users’ geolocations cover most parts of Bangladesh, Germany, India, and the United States. Third, the distribution of responses to the DPD questions indicates a similar response pattern among the users in Bangladesh and Germany and those in India and the United States.

We describe how to create the DPD index from a set of eight question variables (for the rest of this paper, we use “DPD question variables” and “DPD variables” interchangeably). The DPD index is a latent construct, which enables the study of the relationship between the DPD problem and the socio-demographic patterns of users. Our results show that the DPD status, i.e., DPD class, can be defined based on the DPD index. Furthermore, we propose a DPD proportional odds model for analyzing the statistical relationship between the DPD problem and the socio-demographic patterns of the individuals. For example, our results show that young users (15-32 years) are generally more concerned about their digital privacy compared to the more senior ones (33 years and over).

I-A Paper organization

The rest of this paper is organized as follows. Section II presents related works. Section III provides an introduction to privacy and data protection in the digital age, discusses the DPD problem, and introduces the online survey study. Section IV introduces the DPD statistical analysis of the DPD index generation and DPD proportional odds model. Then, numerical results are given in Section V. Section VI presents recommendations for closing the DPD gap. Finally, conclusions and future works are highlighted in Section VII.

II Related Works

Related works fall into three areas. We first review the digital divide problem in ICT systems. Then, we discuss digital privacy. Finally, we review related works in survey research with crowdsourcing.

II-A Digital divide

The digital divide problem broadly refers to the uneven distribution, access, and usage of ICT, resulting in opportunity gaps between individuals. Recent years have witnessed much interest in the digital divide, e.g., in education [15], digital health [10], and the COVID-19 pandemic impact on rural communities [20]. Chaoub et al. [5] discussed rural wireless connectivity and suggested solutions to narrow the digital divide gap for people living in remote areas, including affordability, accessibility, spectrum, power, and maintenance solutions. Reddick et al. [25] conducted a survey to study affordability and broadband access of people living in San Antonio, the United States. They showed that the digital divide is not limited to only regional locations but can exist within metropolitan cities.

There are various studies conducted during the COVID-19 pandemic. For example, Mathrani et al. [17] discussed the digital gender divide in India and conducted a survey to determine some of the learning challenges that female students encounter during the COVID-19 pandemic. They found that the digital divide gap exists among users who live in metropolitan, semi-metropolitan, and rural areas. Also, the surveyed users tend to agree that e-learning has affected their productivity and interaction with face-to-face discussions.

II-B Digital privacy

Protecting the digital privacy of individuals is a critical requirement in any modern system. Winegar et al. [32] concluded that more than 70% of users are concerned about their digital privacy. More concerning, some users do not know that their data is being collected in ICT systems. Jacobson et al. [14] conducted a survey on the privacy concerns of social media users. They found that users are worried about how their social media data is used in targeted advertising.

Redmiles et al. [26] conducted a telephone survey in the United States to study the correlation between the socio-economic status of users and their self-reported data breaches and privacy incidents. They found that advice resources are strongly related to the expected privacy incidents. Also, the authors reported that people from different socio-economic backgrounds might have different views about privacy-related problems.

Digital privacy has several legal and ethical aspects. Minin et al. [8] investigated the legal basis for personal data processing and using social media data in studying the ecology of human-nature interactions. They addressed the digital privacy problem based on the GDPR privacy rights and suggested applying data anonymization and secure data management to reduce the risk of data exposure. Solove [30] argued that the self-management of privacy by users is not practical in digital privacy protection, e.g., users are widely requested to provide personal data to access an online service in a “take it or leave it” setup. Instead, privacy laws and regulations, such as the GDPR, should be enforced on institutions to manage data collection, transmission, and processing.

Fig. 1: The DPD gap exists between protected and exposed users.

II-C Crowdsourcing in survey research

Well-designed surveys are an efficient method for collecting responses [29]. There are various methods for creating effective online surveys. For example, Story and Tait [31] suggested applying reliability measures, explaining the importance of the study to increase the response rate, and assuring that the collected responses are confidentially kept.

Using crowdsourcing, e.g., Amazon MTurk, for completing survey questionnaires is a well-studied area [27, 34, 35]. Redmiles et al. [27] showed that using MTurk to recruit users for online privacy and security surveys is robust and generalizes to the general population. Yigzaw et al. [34] showed that privacy-preserving statistical analysis could be performed on crowdsourcing surveys using secure multi-party computation. Yin et al. [35] proposed a task recommendation method that assigns MTurk crowdsourcing tasks to the most fitting users while protecting their digital privacy.

This paper is fundamentally different from all previous works that studied the digital divide and privacy. Our work is the first to address the relationship between the DPD problem and the socio-demographic patterns of ICT users. We used an online survey and recruited 776 users with MTurk crowdsourcing and email invitations. MTurk responses generalize to the general population [27, 3]. Moreover, our DPD survey meets the internal consistency reliability with rigorous statistical measures, e.g., Cronbach’s α = 0.92 and McDonald’s ω = 0.94. We conclude this paper with a roadmap discussion on closing the DPD gap among ICT users.

III Problem formulation and research methodology

This section introduces the reader to the DPD problem and the conducted survey research. First, we provide an introduction to privacy and data protection in the digital age. Second, we describe the DPD problem as a form of digital inequality. Third, we present the procedure of defining the DPD problem using observable questions in the survey study.

III-A Primer

III-A1 Privacy and data protection in the digital age

Modern ICT systems collect massive amounts of data with various ubiquitous and pervasive sensing technologies, such as the Internet of things and crowdsensing [21]. The plethora of collected data raises genuine concerns about the privacy violation of ICT users. Therefore, data protection regulations are essential for dictating how, when, why, and what data is collected. The GDPR [11] is a fundamental privacy regulation that governs the data collection from residents of the European Union. The GDPR defines personal data in Chapter 1, Article 4 as “any information relating to an identified or identifiable natural person.” For example, personal data includes browsing cookies, biometric records, and email addresses of users. Digital privacy is a concept that describes the right to control how any personal data about users is collected, transmitted, stored, and processed.

III-A2 GDPR privacy rights

The GDPR [11] defines eight privacy rights for ICT users.

  • Right to be informed: The users must know who collects and obtains their data.

  • Right of access: The users have the right to obtain copies of their data.

  • Right to rectification: The users have the right to request correcting inaccurate records of their data.

  • Right to erasure: The users have the right to be forgotten by deleting their data and preventing future data collection without a new consent.

  • Right to restrict processing: The users have the right to restrict the processing of their data.

  • Right to data portability: The users have the right to transfer their data to selected recipients.

  • Right to object: The users have the right to grant and withdraw consent on processing and collecting their data.

  • Rights about automated decision-making and profiling: The users have the right to opt-out from using their data in automated systems, including machine learning and artificial intelligence (AI).

III-B Digital privacy divide (DPD)

Digital privacy divide (DPD) is a concept utilized in this work to describe the gap in digital privacy protection between ICT users who are protected and those who are exposed to privacy attacks. Figure 1 depicts the DPD problem in ICT systems. The DPD problem exists when the ICT users receive distinct levels of digital privacy protection. ICT systems enable linking most modern infrastructures, including transportation, industries, and smart homes. ICT systems can be classified into privacy-preserving and exposed systems in terms of digital privacy protection. Privacy-preserving ICT systems apply rigorous privacy tools [28, 16, 22, 12], mitigating the risk of data exposure. Exposed ICT systems do not include well-defined privacy policies, and privacy tools are not properly implemented. Accordingly, data exposure is more likely to occur in exposed systems, where an adversary could obtain private data about exposed users.

III-B1 How does the DPD gap influence our digital life?

The DPD problem produces severe psychological, financial, and social impacts on the exposed users. Aïmeur and Schönfeld [1] discussed identity theft, which can occur due to privacy breaches. They presented several crimes related to identity theft, including financial losses, medical insurance frauds, loan and banking frauds, and criminal impersonation.

III-B2 How does the DPD gap connect to other forms of digital inequalities?

Digital inequalities take various forms, including physical Internet access and digital literacy. Nevertheless, remarkable progress has been made in recent years to close the gap in physical Internet access and digital literacy. For example, a recent report by Cisco Systems [6] estimates that there will be 5.3 billion active Internet users (66% of the world’s population), 5.7 billion mobile subscribers (71% of the world’s population), and 29.3 billion networked devices (3.6 times the world’s population) by 2023. Given this rapid increase in access to online services, more users will be affected by the DPD problem over time.

This paper analyzes the DPD problem and its correlation to the socio-demographic patterns of the users. Next, we introduce the design of the online survey study.

III-C DPD survey study

III-C1 Choice of operationalization

Operationalization is the process of representing concepts using observable and measurable questions [29]. We developed the DPD survey based on the digital privacy rights in the GDPR [11]. In particular, the DPD survey includes the following eight questions:

  • Question 1: I receive clear information on how my government collects my personal data, including who is accessing and processing the data and the data collection purposes.

  • Question 2: I can access copies of my personal data, which my government has collected.

  • Question 3: I can transfer my personal data, which my government has collected, to third-party recipients, e.g., organizations, of my choice.

  • Question 4: I can correct my personal data, which my government has collected, when it contains inaccurate, invalid, or misleading data.

  • Question 5: I can request deleting specific records of my personal data, which has been collected by my government when the data is no longer needed for the original purpose.

  • Question 6: I have the option and control to restrict the processing of specific categories of my personal data, which my government has already collected.

  • Question 7: I have the control and ability to grant or withdraw consent on collecting and processing my personal data by my government at any time.

  • Question 8: I have the option and control to opt-out from using my personal data, which my government has collected in making decisions and profiling, based solely on automated processing.

Questions 1-8 measure all aspects of the GDPR privacy rights. Utilizing the least possible number of questions in survey research is crucial for improving the response rate and study integrity [31]. The ICT users give their responses on a Likert scale [29] of five agreement levels (strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, and strongly disagree). Furthermore, we asked the ICT users to provide their socio-demographic patterns, including their age, gender, ethnicity, highest level of education, occupation, and country of residency. The survey responses are presented and analyzed in Section V.
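Before any statistical analysis, the five agreement levels have to be mapped to numbers. The following sketch shows one possible coding. The direction of the scale (1 for "strongly agree" up to 5 for "strongly disagree") and the names `LIKERT_CODES` and `encode_response` are assumptions made for illustration; the paper does not state which coding was used.

```python
# Hypothetical numeric coding of the five agreement levels.
# The direction of the scale is an assumption, not taken from the paper.
LIKERT_CODES = {
    "strongly agree": 1,
    "somewhat agree": 2,
    "neither agree nor disagree": 3,
    "somewhat disagree": 4,
    "strongly disagree": 5,
}


def encode_response(answers):
    """Map the eight textual answers of one respondent to integer codes."""
    if len(answers) != 8:
        raise ValueError("expected one answer for each of Questions 1-8")
    return [LIKERT_CODES[a.strip().lower()] for a in answers]


# Made-up answers of a single respondent, for illustration only.
example = [
    "Strongly agree", "Somewhat agree", "Neither agree nor disagree",
    "Somewhat agree", "Somewhat disagree", "Strongly disagree",
    "Somewhat agree", "Strongly agree",
]
codes = encode_response(example)
print(codes)
```

With this coding, each respondent becomes a vector of eight integers, which is the form assumed by reliability and dimensionality analyses.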

III-C2 Crowdsourcing for survey research

We created the survey study using Qualtrics survey software [24], then we recruited ICT users using Amazon Mechanical Turk (MTurk) [2] and manual referrals. The use of survey research with crowdsourcing is a proven method, and the responses generalize to the general population [27, 3]. We collected 776 responses from ICT users residing in Bangladesh, Germany, India, and the United States. We could not recruit many ICT users from Bangladesh with MTurk; therefore, most of the responses from Bangladesh were collected using manual referrals through email invitations.

IV DPD statistical analysis

This section presents an in-depth statistical analysis of the DPD problem. First, we present statistical methods for defining the DPD index using the observable variables, i.e., Questions 1-8. Second, we present a proportional odds model for determining the probability of a DPD status, given the socio-demographic patterns of the ICT users.

IV-A DPD index generation

Next, we describe the generation of the DPD index, which provides a single measure of the underlying DPD gap. The DPD index is a latent variable, i.e., the DPD index is not directly observable through responses, which articulates the underlying gap in the privacy protection of the ICT users. The DPD questions presented in Section III are unidimensional, i.e., they measure the DPD gap as a single construct.

Principal component analysis (PCA) can be used to reduce the dimensionality of data. Xu et al. [33] proposed PCA-guided clustering for finding the optimal solution of a clustering problem in the PCA subspace. The DPD status can be generated by applying the PCA-guided clustering as follows:

  • The dimensionality of the DPD variables is first reduced using the PCA technique.

  • The resulting data in the PCA subspace is clustered into different DPD classes using the k-means algorithm, representing varying levels of the DPD gap. The DPD status (amount of DPD measurement) can be defined from a category ordering of $K$ classes $c_1, \dots, c_K$, where $c_k$ is the $k$-th DPD class. We arrange the DPD classes such that $c_1 < c_2 < \dots < c_K$.

In Section V, we show that the first PCA component (PC1) captures most of the variance of the DPD variables. Furthermore, we show that PC1 is sufficient for defining the clustering class of the DPD responses, i.e., the DPD status can be found using PC1. Accordingly, we use PC1 as the DPD index for providing one scaled measure of the DPD gap.
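The two steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: PC1 is obtained by power iteration on the sample covariance matrix, k-means is run on the PC1 scores alone, and the data are synthetic toy responses rather than the survey data. The function names `pc1_scores`, `nearest`, and `kmeans_1d` are hypothetical, and in practice a numerical library would normally be used instead.

```python
import random


def pc1_scores(rows, iters=500):
    """Project centered rows onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # start vector; the sign of PC1 is arbitrary
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def nearest(x, centers):
    """Index of the center closest to the scalar x."""
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def kmeans_1d(values, k, iters=100):
    """k-means on scalars; labels 0..k-1 follow the order of the cluster centers."""
    ordered = sorted(values)
    # Deterministic start: centers at evenly spaced quantiles.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        labels = [nearest(x, centers) for x in values]
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    centers = sorted(centers)
    return [nearest(x, centers) for x in values], centers


# Synthetic toy data: 200 respondents, 8 items on a 1-5 scale, one latent level.
random.seed(0)
rows = []
for _ in range(200):
    level = random.choice([1, 2, 4, 5])
    rows.append([min(5, max(1, level + random.choice([-1, 0, 0, 1])))
                 for _ in range(8)])

scores = pc1_scores(rows)                 # plays the role of the DPD index
labels, centers = kmeans_1d(scores, k=4)  # plays the role of the ordered DPD classes
```

Because the clusters are formed on a single axis and labeled in the order of their centers, a larger PC1 score never maps to a lower class, which is the ordinal property the later regression step relies on.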

Fig. 2: The distribution of responses of Questions 1-8.

IV-B DPD proportional odds model

We next present the DPD proportional odds model for analyzing the DPD survey data. In particular, the DPD proportional odds model enables determining the statistical correlation between a DPD status and the socio-demographic patterns of the ICT users. In addition, proportional odds regression can capture the dependency of an ordinal response on discrete or continuous variables [7, 4].

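Our objective is to define the DPD regression models which can compute the probability of each DPD class, given the socio-demographic patterns of an ICT user. Let $Y_i$ be the DPD response random variable for the $N$ collected responses, $i = 1, \dots, N$. The probability of DPD class $c_k$, where $k = 1, \dots, K$, is $\pi_k = P(Y_i = c_k)$. The probabilities of all DPD classes sum to one, i.e., $\sum_{k=1}^{K} \pi_k = 1$. To compute $\pi_k$, we must first define the cumulative probability of a DPD class. The cumulative probability of DPD class $c_k$ is defined as follows:

$$P(Y_i \le c_k) = \pi_1 + \dots + \pi_k \qquad (1)$$

The log-odds of the cumulative probability can be computed using the inverse of the logistic function, such that $\mathrm{logit}(p) = \log \frac{p}{1 - p}$, where $p \in (0, 1)$. Accordingly, the log-odds of the cumulative probability can be computed as follows:

$$\mathrm{logit}\big(P(Y_i \le c_k)\big) = \log \frac{P(Y_i \le c_k)}{1 - P(Y_i \le c_k)} \qquad (2)$$
$$= \log \frac{P(Y_i \le c_k)}{P(Y_i > c_k)} \qquad (3)$$
$$= \log \frac{\pi_1 + \dots + \pi_k}{\pi_{k+1} + \dots + \pi_K} \qquad (4)$$

In statistics, the proportional odds model can be defined as a linear combination of the explanatory variables [4]. Mathematically, the log-odds of a DPD class can be computed as a linear combination of socio-demographic variables as follows:

$$\mathrm{logit}\big(P(Y_i \le c_k)\big) = \beta_{0k} + \beta_1 x_{i1} + \dots + \beta_M x_{iM} \qquad (5)$$

where $M$ is the number of socio-demographic variables $x_{i1}, \dots, x_{iM}$ collected from users, and $\beta_1, \dots, \beta_M$ are the regression parameters. The intercept $\beta_{0k}$ depends on the DPD class $c_k$.

Then, the cumulative probability is defined as follows:

$$P(Y_i \le c_k) = \frac{\exp\big(\beta_{0k} + \sum_{m=1}^{M} \beta_m x_{im}\big)}{1 + \exp\big(\beta_{0k} + \sum_{m=1}^{M} \beta_m x_{im}\big)} \qquad (6)$$

Using (6), the probability of a DPD class is computed as follows:

$$P(Y_i = c_k) = P(Y_i \le c_k) - P(Y_i \le c_{k-1}) \qquad (7)$$

with the conventions $P(Y_i \le c_0) = 0$ and $P(Y_i \le c_K) = 1$.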
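The chain from log-odds to class probabilities can be illustrated with a short sketch. It assumes the common convention in which the log-odds of each cumulative probability equal a class-specific intercept plus a shared linear term; other texts and software use a minus sign in front of the linear term. The function name `class_probabilities` and all numeric values are made up for illustration and are not estimates from the DPD data.

```python
import math


def class_probabilities(thresholds, betas, x):
    """Return P(Y = c_k) for k = 1..K under a proportional odds model.

    thresholds: K-1 increasing class-specific intercepts.
    betas, x:   shared regression parameters and covariate values.
    Assumed convention: logit P(Y <= c_k) = threshold_k + sum(beta * x).
    """
    eta = sum(b * v for b, v in zip(betas, x))
    # Cumulative probabilities P(Y <= c_k); the last class closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t + eta))) for t in thresholds] + [1.0]
    # Class probabilities are differences of adjacent cumulative probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


# Hypothetical parameters: 4 classes, one binary covariate (e.g., an age-group flag).
probs = class_probabilities(thresholds=[-1.0, 0.0, 1.5], betas=[0.8], x=[1.0])
print([round(p, 3) for p in probs])
```

Because the slope terms are shared across classes and only the intercepts differ, increasing thresholds guarantee increasing cumulative probabilities, so every class probability stays non-negative.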

V Numerical and statistical analysis

This section presents a numerical and statistical analysis of the DPD survey. First, we provide visualizations of the responses collected using the DPD survey. Second, we present a reliability analysis of the collected responses. Third, we analyze the DPD index computation. Fourth, we present a socio-demographic analysis of the DPD problem. Finally, we provide numerical results of the DPD proportional odds model.

V-A Visualizing and exploring DPD survey data

Next, we present key insights of the DPD survey data using visualization charts.

V-A1 Distribution of responses

Figure 2 shows the distributions of responses for Questions 1-8 for all surveyed ICT users, i.e., all responses regardless of the socio-demographic patterns of users. Several results can be noted. First, most ICT users agree (“strongly agree” or “somewhat agree”) with the statements in Questions 1-8: 70.2%, 65.6%, 57.2%, 68.9%, 59.5%, 56.9%, 56.4%, and 55.7% of the users provided agree responses to Questions 1-8, respectively. This indicates that most ICT users are satisfied with their privacy protection. Second, a relatively high percentage of users provide a “neither agree nor disagree” response (11.2%, 13.7%, 20.7%, 14.9%, 18.9%, 18%, 18.7%, and 18.4% for Questions 1-8, respectively). A possible explanation is that many ICT users do not have sufficient information on their privacy protection.

Fig. 3: The distribution of responses of Questions 1-8, divided based on the countries of residency of the ICT users.

Figure 3 shows the distribution of responses for Questions 1-8 for each country (Bangladesh, Germany, India, and the United States). The responses vary across countries. However, the response percentages of India and the United States are similar for Questions 1-8. Furthermore, the response percentages of Bangladesh and Germany look similar for Questions 2-6 and 8.

V-A2 Population sample and geolocation

Fig. 4: The geographical locations of the ICT users in Bangladesh (113 users), Germany (58 users), India (314 users), and the United States (291 users).

Figure 4 shows the geographical locations (longitude and latitude values) of the participating ICT users in Bangladesh, Germany, India, and the United States. The numbers of collected responses are 113, 58, 314, and 291 from Bangladesh, Germany, India, and the United States, respectively. The users are located in various parts of the surveyed countries. Accordingly, the population sample of the users provides a holistic view of the countries and represents people at different geographical locations.

V-B Reliability of DPD survey

We next present statistical measures for evaluating the reliability of the data collected using the online survey study. In particular, we provide the Cronbach’s α, McDonald’s ω, and other alternative reliability measurements [36] of the DPD data.

V-B1 Cronbach’s α

A widely-used rule of thumb indicates that adequate internal reliability can be concluded when α is greater than 0.70 [19]. Our DPD study is reliable with internal consistency reliability of α = 0.92. This value indicates that Questions 1-8 correlate and measure the DPD problem as one construct.
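For readers who want to check such a value on their own data, Cronbach’s α can be computed as k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. The sketch below uses a small made-up response matrix, not the DPD responses, and the function name `cronbach_alpha` is chosen here for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the per-respondent total score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Made-up toy matrix: 5 respondents, 4 items on a 1-5 scale.
toy = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.3f}")
```

Items that move together inflate the variance of the total score relative to the sum of the item variances, which pushes the coefficient toward 1.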

V-B2 Other reliability measures

Some previous studies reported limitations of Cronbach’s α as a measure of reliability, e.g., see [19, 18]. Therefore, we report alternative measures of reliability on the DPD data in Table I. We refer the reader to [18] for an overview and mathematical definitions of these reliability measures. In summary, the DPD survey meets all of these reliability measures, which supports the internal consistency of the DPD survey.

| Measure | Reliability score |
|---|---|
| Cronbach’s α | 0.92 |
| McDonald’s ω | 0.94 |
| ω (hierarchical) | 0.86 |
| Revelle’s ω (total) | 0.94 |
| Greatest Lower Bound (GLB) | 0.93 |
| Coefficient H | 0.93 |
| Coefficient α | 0.92 |

TABLE I: Measures of internal consistency reliability.

V-C DPD index generation

Next, we analyze the collected survey responses to extract the DPD index, which provides a unidimensional scale and captures the underlying communality of Questions 1-8. Subsequently, the following analysis steps are applied:

  • Question (variable) analysis: We apply factor analysis, i.e., aggregating the questions linearly, to capture the underlying communality of Questions 1-8. The key objective of this step is aggregating Questions 1-8 into two unobserved underlying variables called DPD factors. We will show that Questions 1-8 measure and reflect the same underlying factor.

  • DPD index analysis: We compute the portions of explained variance in each PCA component. Then, we analyze the influence of Questions 1-8 on the PCA components. Our analysis shows that the first principal component (PC1) captures 64.75% of the data variation. Accordingly, PC1 is used as the DPD index, which provides one scaled measure for the DPD gap by exploring the dimensionality of Questions 1-8 in the DPD survey.

  • Response clustering: It is more convenient to analyze the DPD responses using unified DPD classes, e.g., our DPD proportional odds model requires an ordinal variable to represent the DPD classes. Therefore, we cluster the responses into DPD classes of 4 levels using k-means clustering. Furthermore, we show that PC1 is sufficient for defining the DPD class of any response.

V-C1 Question (variable) analysis

Fig. 5: Factor analysis of Questions 1-8.

Figure 5 shows the factor analysis of Questions 1-8. The main objective of the factor analysis is capturing the underlying communality of the questions by using a linear combination of factors [13], i.e., factor analysis enables understanding the underlying DPD concept by aggregating the questions. The contributions of the questions to the factors are shown as points in Figure 5. It can be noted that Questions 1-8 are located near each other. This shows that Questions 1-8 capture the underlying DPD gap as a single construct.

V-C2 DPD index analysis

Fig. 6: Variance captured by each PCA component.

Figure 6 shows the portions of explained variance and eigenvalues in each of the DPD components using PCA. The x-axis shows the index of each DPD principal component, and the y-axis shows the variance explained by each principal component. For example, the variance explained by the first principal component is 5.18. It can be noted that all PCA components, except the first one, have eigenvalues of less than 1. Given that there is only one PCA component with an eigenvalue of 1.0 or higher, it can be concluded that the questions form a unidimensional scale, and they are internally consistent.
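The eigenvalue and the percentage figures reported in this section can be related by simple arithmetic, assuming the PCA was run on standardized variables so that the eigenvalues sum to the number of questions (eight). The helper names below are hypothetical, and the snippet only converts numbers already reported in the text.

```python
def explained_share(eigenvalue, n_variables):
    """Share of total variance, assuming PCA on standardized variables
    (the eigenvalues then sum to n_variables)."""
    return eigenvalue / n_variables


def eigenvalue_from_share(share, n_variables):
    """Inverse conversion under the same assumption."""
    return share * n_variables


# PC1 eigenvalue reported for Figure 6, with eight DPD questions.
share_pc1 = explained_share(5.18, 8)
# PC2 share of 8.64% is reported later in this section.
eig_pc2 = eigenvalue_from_share(0.0864, 8)
print(f"PC1 share: {share_pc1:.4f}, implied PC2 eigenvalue: {eig_pc2:.4f}")
```

Under this assumption, the PC1 eigenvalue corresponds to the 64.75% share quoted for PC1, and the implied PC2 eigenvalue falls below 1, in line with the observation that only the first component reaches the threshold.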

Fig. 7: Exploratory graph showing information on both response samples and question variables of the DPD survey data.

Figure 7 shows both the PCA scores (data points) and loading values (red vectors pinned from the origin of PC1 and PC2). Several results can be observed. First, the first principal component (PC1) captures 64.75% of the variation of the data, i.e., PC1 explains the majority of the variance in the DPD variables. In comparison, the second principal component (PC2) captures only 8.64% of the data variation. Second, a loading vector reflects the degree to which each question influences the computation of PCA components. For example, Question 4 has a strong influence when computing PC2. All questions have a strong influence on the computation of PC1. Third, the score points show the projections of the responses into the PCA subspace. The DPD gap, i.e., various levels of privacy protection for users, can be observed.

V-C3 Response clustering

Fig. 8: Clustering the responses into 4 DPD classes.

Figure 8 shows the clustering of the observable variables (Questions 1-8) into 4 DPD classes. Each DPD class represents a different level of the DPD gap. It can be noted that the DPD classes are defined based on the first principal component (PC1) only, i.e., the clustering ellipses are vertical. The major axes of the ellipses are parallel to PC2, and PC2 is not used to define the DPD status. Therefore, PC1 is used to represent the DPD index in the next socio-demographic analysis.

V-D Socio-demographic analysis of DPD

Next, we provide an in-depth experimental discussion of the socio-demographic analysis in the DPD problem.

Fig. 9: The DPD index based on the age groups of the ICT users.

Figure 9 shows the distribution of responses based on the ages, genders, and countries of residency of the ICT users. Of the surveyed ICT users, 67.7% are male and 32.3% are female. The percentages of collected responses are 14.6%, 7.5%, 40.5%, and 37.5% from Bangladesh, Germany, India, and the United States, respectively. Several results can be drawn. First, the median DPD index is significantly higher in Bangladesh and Germany than in India and the United States (see the third row). This reflects significant concerns about privacy protection among the ICT users in Bangladesh and Germany. Second, there is no significant difference in the median DPD index among the ICT users based on their genders (see the first and second rows). Third, there are high variations among Bangladesh’s ICT users (see the median, first quartile, and third quartile in the first column).

Fig. 10: The DPD index based on the ethnic groups of the ICT users.
Fig. 11: The DPD index based on the highest levels of education of the ICT users.
Fig. 12: The DPD index based on the occupations of the ICT users.

Figure 10 shows the distribution of responses based on the users’ ethnic backgrounds, genders, and countries of residency. The surveyed ICT users come from various ethnic backgrounds, including South Asians (51.7%), Whites (34.4%), African Americans (4.5%), and East Asians (3.7%). Figure 11 shows the distribution of responses based on the users’ highest levels of education, genders, and countries of residency. The surveyed ICT users have various levels of education, such as bachelor’s degrees (51.8%), master’s degrees (32.9%), college diplomas (7.1%), and high school graduates (5.7%). Figure 12 shows the distribution of responses based on the users’ occupations, genders, and countries of residency. The surveyed ICT users work in all occupation sectors, including the private sector (42.4%), self-employment (34.9%), and the government sector (9.7%).

V-E DPD proportional odds model

Fig. 13: Ordinal regression of the DPD index using the socio-demographic patterns of the ICT users (country of origin, gender, and age group).

Finally, we analyze the responses based on the DPD proportional odds model proposed in Section IV-B. Figure 13 shows the predicted probability of reporting a DPD index based on the age group of an ICT user. The likelihood of providing a high DPD index increases rather dramatically with age. This result reflects more concern about privacy protection among young users (15-32 years) than among senior users (33 years and over). The only exception is the 25-32-year-old users in Bangladesh, who tend to indicate less concern about their digital privacy.

We also applied the DPD proportional odds model on the ethnic backgrounds, occupations, and highest education levels of the ICT users. We find that the ethnic backgrounds, occupations, and highest education levels have a minimal statistical impact on the DPD gap perceived by the ICT users.
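To make the predicted probabilities of a proportional odds model concrete, the sketch below evaluates a generic cumulative logit model, logit P(Y ≤ j | x) = θ_j − βᵀx, and differences the cumulative probabilities to obtain one probability per ordinal class. The sign convention, the thresholds, and the coefficient values are assumptions chosen for illustration; they are not the fitted values of the paper, and the parameterisation in Section IV-B may differ.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(thresholds, beta, x):
    """Class probabilities under logit P(Y <= j | x) = thresholds[j] - beta . x.

    `thresholds` must be increasing; with J thresholds there are J + 1
    ordinal classes. The same `beta` applies to every threshold, which is
    the proportional odds assumption.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical example: 4 ordinal DPD classes and one binary covariate.
thresholds = [-1.0, 0.0, 1.0]
beta = [0.8]
for group in (0.0, 1.0):
    probs = category_probabilities(thresholds, beta, [group])
    print(group, [round(p, 3) for p in probs])
```

Under this convention, a positive coefficient shifts probability mass toward the higher classes for every threshold at once, which is what allows a single coefficient per covariate to summarise its effect on the ordinal outcome.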

VI Roadmap discussion

The DPD problem is a recent form of the digital divide. The DPD gap results in severe financial, social, and psychological difficulties for the exposed users. It can take several years for the exposed victims to recover from a digital privacy breach. The stolen data is generally used in criminal activities [1]. Therefore, all organizations and institutions must invest in protecting the privacy of their ICT users by applying state-of-the-art privacy tools [28, 16, 22, 12].

We recommend the following roadmap for addressing the DPD problem.

VI-1 Digital privacy regulations must be introduced in all countries

Legislative bodies can play the most significant role in closing the DPD gap. Therefore, we recommend that all countries introduce data protection laws comparable to the GDPR. Furthermore, privacy protection must be enabled as the default preference for all ICT users, regardless of their socio-demographic patterns or country of residency.

VI-2 Researchers and educators are obligated to increase the awareness of the DPD problem

Teaching curriculums must include significant components covering digital privacy. In addition, the media can provide a valuable medium for reaching the general audience. We must increase the awareness of digital privacy as a fundamental human right.

The majority of the research in cybersecurity focuses on data and network security (preventing unauthorized access to data by third parties). In contrast, digital privacy (regulating how to collect, process, and share data) does not receive equal attention within the research community. Thus, more awareness within the research community will encourage future works on digital privacy.

VII Conclusion and future work

In this paper, we have presented a survey study for understanding the DPD problem. We used crowdsourcing task assignments to collect responses from 776 ICT users on the DPD problem. The DPD survey is shown to meet the internal consistency reliability, as measured by Cronbach’s α (0.92) and McDonald’s ω. Furthermore, the DPD index is shown to capture the underlying DPD construct using the PCA-guided clustering method. Finally, we have explored the statistical relationship between the DPD problem and the socio-demographic patterns of the ICT users using the DPD proportional odds model.
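For readers who wish to reproduce an internal consistency check on their own survey data, Cronbach's α can be computed from the standard formula α = k/(k−1) · (1 − Σ σ²_item / σ²_total), where k is the number of questions. The sketch below is a minimal version under that formula; the function name and the toy responses are assumptions for illustration and are not the survey data of this study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances of each item and of the per-respondent total score.
    """
    k = len(responses[0])
    item_variances = [variance([r[j] for r in responses]) for j in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five respondents to three Likert-scale questions.
toy_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Values close to 1 indicate that the questions vary together across respondents, which is the sense in which the survey questions measure one common construct.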

Future studies and relevant industries can pursue several important directions based on the results of this paper.

  • There is an urgent need to build third-party privacy impact analysis and compliance checks in ICT systems. For example, privacy star ratings can be issued to ICT systems based on their compliance with the privacy provisions.

  • Open-source privacy tools and analytics opt-out extensions are still underdeveloped. At the same time, there is a necessity to study the privacy protection achieved through existing privacy tools, such as the Google Analytics opt-out browser add-on.

  • Future research is warranted to understand privacy breaches’ social, economic, and cultural impacts on individuals and institutions.

References

  • [1] E. Aïmeur and D. Schönfeld (2011) The ultimate invasion of privacy: Identity theft. In Proceedings of the Annual International Conference on Privacy, Security and Trust, pp. 24–31. Cited by: §III-B1, §VI.
  • [2] Amazon Mechanical Turk, Inc. (2005) Amazon mechanical turk (MTurk). Note: https://www.mturk.com/ Online; accessed 16 January 2022. Cited by: §I, §III-C2.
  • [3] T. S. Behrend, D. J. Sharek, A. W. Meade, and E. N. Wiebe (2011-03) The viability of crowdsourcing for survey research. Behavior Research Methods 43 (3), pp. 800–813. Cited by: §I, §II-C, §III-C2.
  • [4] C. R. Bilder and T. M. Loughin (2014) Analysis of categorical data with R. CRC Press. Cited by: §IV-B, §IV-B.
  • [5] A. Chaoub, M. Giordani, B. Lall, V. Bhatia, A. Kliks, L. Mendes, K. Rabie, H. Saarnisaari, A. Singhal, N. Zhang, et al. (2021-07) 6G for bridging the digital divide: Wireless connectivity to remote areas. IEEE Wireless Communications. Cited by: §I, §II-A.
  • [6] Cisco Systems, Inc. (2020) Cisco annual Internet report (2018–2023) white paper. Note: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html Online; accessed 16 January 2022. Cited by: §III-B2.
  • [7] D. R. Cox and D. Oakes (2018) Analysis of survival data. CRC Press. Cited by: §IV-B.
  • [8] E. Di Minin, C. Fink, A. Hausmann, J. Kremer, and R. Kulkarni (2021-03) How to address data privacy concerns when using social media data in conservation science. Conservation Biology 35 (2), pp. 437–446. Cited by: §II-B.
  • [9] C. Dwork, A. Roth, et al. (2014-08) The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (3-4), pp. 211–407. Cited by: §I.
  • [10] E. V. Estacio, R. Whittle, and J. Protheroe (2019-10) The digital divide: Examining socio-demographic factors associated with health literacy, access and use of Internet to seek health information. Journal of Health Psychology 24 (12), pp. 1668–1675. Cited by: §I, §II-A.
  • [11] European Parliament and Council of the European Union (2016) General data protection regulation (GDPR). Note: https://gdpr-info.eu/ Online; accessed 16 January 2022. Cited by: §I, §III-A1, §III-A2, §III-C1.
  • [12] M. Gupta, M. Abdelsalam, S. Khorsandroo, and S. Mittal (2020-02) Security and privacy in smart farming: Challenges and opportunities. IEEE Access 8, pp. 34564–34584. Cited by: §I, §III-B, §VI.
  • [13] K. Hartmann, J. Krois, and B. Waske (2018) E-learning project SOGA: Statistics and geospatial data analysis. Technical report Department of Earth Sciences, Freie Universitaet Berlin. Cited by: §V-C1.
  • [14] J. Jacobson, A. Gruzd, and Á. Hernández-García (2020-03) Social media marketing: Who is watching the watchers?. Journal of Retailing and Consumer Services 53, pp. 101774. Cited by: §II-B.
  • [15] J. Lai and N. O. Widmar (2021-10) Revisiting the digital divide in the COVID-19 era. Applied Economic Perspectives and Policy 43 (1), pp. 458–464. Cited by: §I, §II-A.
  • [16] B. Liu, M. Ding, S. Shaham, W. Rahayu, F. Farokhi, and Z. Lin (2021-03) When machine learning meets privacy: A survey and outlook. ACM Computing Surveys 54 (2), pp. 1–36. Cited by: §I, §III-B, §VI.
  • [17] A. Mathrani, T. Sarvesh, and S. Mathrani (2020) Digital gender divide in online education during COVID-19 lockdown in India. In Proceedings of the IEEE International Asia-Pacific Conference on Computer Science and Data Engineering, pp. 1–6. Cited by: §I, §II-A.
  • [18] D. McNeish (2018-09) Thanks coefficient alpha, we’ll take it from here. Psychological Methods 23 (3), pp. 412. Cited by: §V-B2.
  • [19] O. F. Morera and S. M. Stokes (2016-02) Coefficient α as a measure of test score reliability: Review of 3 popular misconceptions. American Journal of Public Health 106 (3), pp. 458–461. Cited by: §V-B1, §V-B2.
  • [20] J. T. Mueller, K. McConnell, P. B. Burow, K. Pofahl, A. A. Merdjanoff, and J. Farrell (2021-11) Impacts of the COVID-19 pandemic on rural America. Proceedings of the National Academy of Sciences 118 (1). Cited by: §II-A.
  • [21] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, D. Niyato, O. Dobre, and H. V. Poor (2022-01) 6G Internet of things: A comprehensive survey. IEEE Internet of Things Journal 9 (1), pp. 359–383. Cited by: §III-A1.
  • [22] V. Nguyen, P. Lin, B. Cheng, R. Hwang, and Y. Lin (2021-08) Security and privacy for 6G: A survey on prospective technologies and challenges. IEEE Communications Surveys & Tutorials 23 (4). Cited by: §I, §III-B, §VI.
  • [23] J. B. Pick and A. Sarkar (2015) The global digital divides: Explaining change. Springer. Cited by: §I.
  • [24] Qualtrics (2017) Qualtrics experience management software. Note: https://www.qualtrics.com/ Online; accessed 16 January 2022. Cited by: §I, §III-C2.
  • [25] C. G. Reddick, R. Enriquez, R. J. Harris, and B. Sharma (2020-11) Determinants of broadband access and affordability: An analysis of a community survey on the digital divide. Cities 106, pp. 102904. Cited by: §I, §II-A.
  • [26] E. M. Redmiles, S. Kross, and M. L. Mazurek (2017) Where is the digital divide? A survey of security, privacy, and socioeconomics. In Proceedings of the International Conference on Human Factors in Computing Systems, pp. 931–936. Cited by: §II-B.
  • [27] E. M. Redmiles, S. Kross, and M. L. Mazurek (2019) How well do my results generalize? Comparing security and privacy survey results from MTurk, web, and telephone samples. In Proceedings of the International Symposium on Security and Privacy, pp. 1326–1343. Cited by: §I, §II-C, §II-C, §III-C2.
  • [28] M. A. Sahi, H. Abbas, K. Saleem, X. Yang, A. Derhab, M. A. Orgun, W. Iqbal, I. Rashid, and A. Yaseen (2017-10) Privacy preservation in e-healthcare environments: State of the art and future directions. IEEE Access 6, pp. 464–478. Cited by: §I, §III-B, §VI.
  • [29] W. E. Saris and I. N. Gallhofer (2014) Design, evaluation, and analysis of questionnaires for survey research. John Wiley & Sons. Cited by: §II-C, §III-C1, §III-C1.
  • [30] D. J. Solove (2021-01) The myth of the privacy paradox. George Washington Law Review 89, pp. 1–51. Cited by: §II-B.
  • [31] D. A. Story and A. R. Tait (2019-02) Survey research. Anesthesiology 130 (2), pp. 192–202. Cited by: §II-C, §III-C1.
  • [32] A. G. Winegar and C. R. Sunstein (2019-07) How much is data privacy worth? A preliminary investigation. Journal of Consumer Policy 42 (3), pp. 425–440. Cited by: §II-B.
  • [33] Q. Xu, C. Ding, J. Liu, and B. Luo (2015-03) PCA-guided search for k-means. Pattern Recognition Letters 54, pp. 50–55. Cited by: §IV-A.
  • [34] K. Y. Yigzaw, A. Michalas, and J. G. Bellika (2016-09) Secure and scalable statistical computation of questionnaire data in R. IEEE Access 4, pp. 4635–4645. Cited by: §II-C.
  • [35] H. Yin, Y. Xiong, T. Deng, H. Deng, and P. Zhu (2019-09) A privacy-preserving and identity-based personalized recommendation scheme for encrypted tasks in crowdsourcing. IEEE Access 7, pp. 138857–138871. Cited by: §II-C.
  • [36] R. E. Zinbarg, W. Revelle, I. Yovel, and W. Li (2005-04) Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika 70 (1), pp. 123–133. Cited by: §V-B.

Biographies

Hamoud Alhazmi (S’21) is working as a Research Assistant at the University of Canberra, ACT, Australia. He graduated with a Master’s degree in Cybersecurity and a B.Eng. degree in Network & Software Engineering (with first-class honors) from the University of Canberra in 2021 and 2020, respectively. His current research interests are in computer vision, machine learning, and cybersecurity. He worked as an Assistant IT Manager during his studies in Canberra.

Ahmed Imran (ahmed.imran@canberra.edu.au) is an Information Systems researcher at the University of Canberra with special interests in the strategic use of IT, eGovernment, and socio-cultural impacts of ICT. His vast experience as an IT manager as well as his work in developing countries became invaluable for research and in understanding and providing a rich insight into the socio-cultural context through multiple lenses, resulting in interdisciplinary research opportunities. His research has proven to bring real-world applications to the table, something that cemented its importance and relevance in the eyes of the research community. This recognition was further reflected through the award of the prestigious Australian National University Vice Chancellor’s award in 2010, followed by numerous invitations to international and national forums/universities.

Mohammad Abu Alsheikh (S’14–M’17) is an Associate Professor and ARC DECRA Fellow at the University of Canberra (UC), ACT, Australia. He designs and creates novel privacy-preserving Internet of things systems that leverage both machine learning and convex optimization with applications in people-centric sensing, human activity recognition, and smart cities. Previously, he was a Postdoctoral Researcher at the Massachusetts Institute of Technology (MIT), USA. His doctoral research at Nanyang Technological University (NTU), Singapore, focused on optimizing wireless sensor networks’ data collection. After graduating with a B.Eng. degree in computer systems from Birzeit University, Palestine, he worked as a software engineer at a digital advertising start-up and Cisco.