Computer Users Have Unique Yet Temporally Inconsistent Computer Usage Profiles

05/20/2021
by Luiz Giovanini, et al.
University of Florida

This paper investigates whether computer usage profiles comprised of process-, network-, mouse-, and keystroke-related events are unique and temporally consistent in a naturalistic setting, discussing challenges and opportunities of using such profiles in applications of continuous authentication. We collected ecologically-valid computer usage profiles from 28 MS Windows 10 computer users over 8 weeks and submitted this data to a comprehensive machine learning analysis involving a diverse set of online and offline classifiers. We found that (i) computer usage profiles have the potential to uniquely characterize computer users (with a maximum F-score of 99.94%); (ii) network-related features were the most relevant for recognizing profiles; (iii) user profiles were mostly inconsistent over the 8-week data collection period, with 92.86% of the users exhibiting changes in their usage habits; and (iv) online models are better suited to handle computer usage profiles compared to offline models (maximum F-score of 95.99% for offline models vs. 99.94% for online models).


1 Introduction

Computer user profiling is the procedure of constructing a behavior-based digital identity of a person by leveraging their computer usage data, such as network traffic, process activity, and mouse and keyboard dynamics [Frank2010AGR, Kim2010MAT, Yang2015Character]. This procedure has the potential to uniquely characterize computer users by their usage patterns in terms of activity (e.g., process and network events) and temporal consistency (e.g., events repeated weekly) [Tossell2012CWU, Fridman2017AAMDS, Payne2013SEMAA]. For example, a middle-aged CEO of a Fortune 500 company uses his computer very differently from a young software developer at a tech startup in California. While the former mostly uses an email client, office software, a web browser, and customized company management software, the latter mostly uses an IDE, CAD, and software versioning tools alongside a web browser and email client, and regularly accesses .com.br websites given their Brazilian descent.

Several factors make computer usage profiles (referred to as simply profiles for the remainder of this paper) well-suited for applications in continuous authentication (CA). The automatic recording of profiles can be implemented in a transparent fashion without requiring user intervention, thus facilitating usability and acceptability [Chuang2013Ithinktherefore]. Furthermore, especially with regards to corporate employees, although some might perceive the recording of computer usage as potentially privacy-invasive, most already have a very limited expectation of privacy. In fact, some employee activities (e.g., email and computer usage) are often recorded by their employers in both the public and private sectors [privacy].

This paper does not propose a CA solution targeting personal or corporate computer users. Instead, based on naturalistic data collected from an ecologically-valid user study, we seek to provide initial evidence on whether profiles constitute a useful and feasible source of data to uniquely characterize computer users, discussing challenges and opportunities for applications in CA. We thus focused on investigating temporal changes in profiles (i.e., changes in computer usage habits taking place over time), which impose additional challenges to CA. More specifically, we focused on the following research questions:

  • RQ1: Do computer users have unique computer usage profiles? In other words, are computer usage profiles distinguishable from one another?

  • RQ2: What features (e.g., network events, processes) are most important for determining a unique computer usage profile?

  • RQ3: Are computer usage profiles consistent over time (i.e., do computer usage habits repeat periodically)?

To address these questions, we conducted an Institutional Review Board (IRB) approved user study to collect profiles from 28 MS Windows 10 users over an 8-week period (from September 2020 to March 2021) in an ecologically-valid setting, i.e., wherein users interacted with their computers naturally without any interference or probing from the research team. All artifacts created throughout the course of this study, including our dataset of ecologically-valid computer usage data and our profile extractor module, are publicly available at https://github.com/danielaoliveira/Naturalistic-Computer-Usage-Profiles. Our definition of profiles encompasses process-, network-, mouse-, and keyboard-related events on the users' computers. The profile data was then submitted to a comprehensive machine learning analysis involving a diverse set of offline and online classifiers in multi-class, one-class, and binary settings aimed at recognizing computer users (RQ1) and identifying the most important features for this purpose (RQ2). We also used Self-Organizing Maps to investigate behavioral changes in the profiles over the study period (RQ3).

The main takeaways of our analyses are: (1) computer usage profiling has the potential to uniquely characterize computer users; (2) network-related features were the most useful to accurately recognize users; (3) profiles were not temporally consistent, with users exhibiting drifts in usage patterns; (4) online learning models are better suited to distinguish profiles than offline models; and (5) binary learning models are more feasible for handling profiles than one-class and multi-class models.

The remainder of the paper is organized as follows. Section 2 discusses the threat model faced by CA approaches and the assumptions of our analysis. Section 3 reviews related work. Section 4 presents our user study methodology. Section 5 describes the design and implementation of the profile extractor used to gather users' computer usage data. Section 6 describes the methodology of our machine learning analyses and presents the experimental results. Section 7 discusses the study's findings, limitations, and suggestions for future work. Lastly, Section 8 concludes this paper.

2 Threat Model and Assumptions

This section discusses the main assumptions of this work and the threat model faced by CA approaches. First, we assume that CA methods are better suited to be employed in corporate environments, which are targeted by both outside and inside adversaries. Second, although our discussion is entirely focused on CA, it is important to highlight that this paper does not propose a new CA approach. Third, we consider a reasonable expectation of privacy for corporate employees. Development of a CA solution leveraging profiles involves the recording of system-level events, and we do not advocate the development of CA systems based on privacy-invasive recording of user activity at the level of file contents, keystroke logging, email content, etc. Nonetheless, it is important to point out that in many organizations (public and private sectors), employees already have limited expectations of privacy [privacy]. For example, emails and network traffic can be monitored, and the devices and applications used by employees can be restricted.

Threat model. As discussed in Sec. 1, the main appeal of a CA system is not to replace traditional, point-of-entry authentication schemes (e.g., passwords or security keys), but to complement them and address their limitations—mainly that, after a user is authenticated into the system, their identity is not subsequently verified. Adversaries can exploit this vulnerability in a variety of ways. Outside attacks are represented by all types of standalone malware that can enter the organization's perimeter via drive-by downloads, malicious links and attachments opened by employees, etc. For these types of attacks—and particularly for slow-operating, stealthy malware (e.g., APTs, advanced persistent threats)—a CA approach can potentially flag events that do not fit a learned user profile, e.g., connections to certain IP addresses or subnets, or patterns of information exfiltration. Similarly, insider attacks could potentially be mitigated via CA approaches. For example, it is plausible that employees working on the same project will have similarities in their profiles (e.g., same applications, files, schedules). Thus, by constructing a group profile, a malicious employee's outlier behavior (i.e., an insider attack) could be flagged as unusual activity.

3 Related Work

There is a vast literature on leveraging user profiling for CA using a variety of data, such as network traffic [Yang2015Character] and usage of popular applications [Fridman2017AAMDS]. There is also a plethora of proposals involving keystroke dynamics [Ahmed2014BRBFKD, KANG201572] and mouse dynamics [sayed2013biometric, Jorgensen2011MDB]. One line of research that closely resembles ours is that of Payne et al. [Payne2013SEMAA] from the DARPA Active Authentication projects. In this work, the authors collected system events from seven volunteers in a controlled environment using Sysinternals [sysinternal] tools over a period of two hours. However, unlike our user study, time information was not associated with user activities, and their experiment was not ecologically valid because the users were neither using their own devices nor using the computers naturalistically. Similar to our work, López et al. [URUENALOPEZ201938] also used Self-Organizing Maps (SOM) to enhance insight into and interpretability of user behavior data by untangling hidden relationships between variables. The authors generated SOM visualizations from survey results covering user behavior, security incidents, and fraud, as well as data from a malware scanning tool, to contrast Internet users' digital confidence with their level of malware infection. Their approach focused on drawing qualitative conclusions from a self-reported dataset, whereas we go beyond this by exploring a naturalistic dataset of user behavior using multiple machine learning models in a quantitative fashion.

In our review of related work on user profiling for CA, we observed that none of the existing methods evaluated the feasibility of computer usage profiling using ecologically-valid data, as proposed in this study. Instead, many data collection procedures restricted user behaviors to specific tasks [zhao2013CMAUNGTGF, Zhang2015TGBAUAUD] or devices [Ribeiro2015OSPCVUI]. Furthermore, none of the proposals involving profiles investigated whether such profiles were temporally consistent, which may substantially affect the effectiveness of a CA solution. Some CA proposals operate by building profiles based on short-term user activity records [Payne2013SEMAA], while others do not explore changes in user behavior over time [Huang2017APEFKD, Zhang2015TGBAUAUD]. Thus, though these studies provide valuable insight into what technologies can be employed for computer usage profiling, the feasibility and temporal robustness of this procedure remain understudied. Our findings shed light on such aspects, providing actionable recommendations for future research and the design of effective, micro-longitudinal, behavior-based CA systems.

4 User Study Methodology

This IRB-approved study asked participants to install our extractor module on their personal computers and use their devices naturally for 8 weeks. The entire study lasted from September 2020 to March 2021, during which we successfully captured 8 weeks of computer usage data from 28 of the enrolled participants. In this section, we detail the methods followed in this user study.

Participants. The study originally comprised 60 participants who were recruited via SONA (an online scheduling platform used to recruit participants, manage studies, and provide a study database for university students to sign up in exchange for course credits), flyers, Internet advertising (e.g., the UF Facebook page that advertises studies), and word-of-mouth. Individuals interested in participating were guided to an online survey to determine eligibility which, if they were cleared to participate, would ask the participant for informed consent along with demographic information. After study completion, the participants recruited via SONA were compensated with two course credits, and the remainder with a $50 Amazon Gift Card. For inclusion in the data analysis, participants were required to complete the entire 8-week study period. This excluded 32 of the original 60 enrolled participants (due to attrition and the technical issues discussed below), reducing our sample size to 28 participants ranging from 18 to 53 years of age. Table I summarizes the demographics of our participants. On average, we recorded 272 hours of computer usage data from each participant (range: 31–859 hours), which corresponds to approximately 4.9 hours per study day (range: 0.6–15.3 hr/day; see Appendix A). We found some instances of high computer usage (e.g., 15.3 hr/day) that we hypothesize may be due to the participant leaving their device turned on for extended periods of time.

Category Metric
Total
(N = 28)
Gender Female 18 (64.29%)
Male 10 (35.71%)
Age 18–25 years 13 (46.43%)
26–35 years 11 (39.28%)
36–45 years 2 (7.14%)
46+ years 2 (7.14%)
Highest
Formal
Degree
Associate 1 (3.57%)
Bachelor’s 10 (35.71%)
Master’s 0 (0.00%)
PhD/Doctorate 5 (17.86%)
Other 12 (42.86%)
Marital
Status
Single 6 (21.43%)
Married 11 (39.29%)
Divorced 3 (10.71%)
In a relationship 8 (28.57%)
Living
Condition
Alone 5 (17.86%)
With spouse/S.O. 12 (42.86%)
With child(ren) 3 (10.71%)
Assisted living 0 (0.00%)
Other 9 (32.14%)
Employment
Status
Employed 15 (53.57%)
Unemployed 13 (46.43%)
Retired 0 (0.00%)
Household
Yearly
Income
Under $10,000 9 (32.14%)
$10,000 to $50,000 8 (28.57%)
$50,000 to $100,000 6 (21.43%)
Over $100,000 5 (17.86%)
Hispanic/Latino
Ethnicity
Not Hispanic/Latino 21 (75.00%)
Hispanic/Latino 7 (25.00%)
Primary
Language
English 21 (75.00%)
Portuguese 4 (14.29%)
Sinhala 2 (7.14%)
Chinese 1 (3.57%)
S.O. = significant other
TABLE I: Summary of study participants’ demographics.

Procedure. At the beginning of the study, enrolled participants were first asked to complete an online survey to determine their eligibility: (i) be above 18 years of age, (ii) have a personal computer (desktop or laptop) that is not shared and (iii) is used regularly, (iv) have regular access to the Internet, (v) have the Windows 10 operating system installed, and (vi) reside in the United States (as per UF IRB regulations for compensation). Once participants were deemed eligible, they received a consent form fully disclosing the study procedures, the minimal study risks, and data protection measures; they were informed that the research purpose was to increase understanding of computer usage profiles. Participants were informed that keyboard strokes, file contents, and any data sent via the Internet would not be recorded, but that the following would be collected throughout the duration of the study: timestamps of when a software application was opened and closed, time intervals when software is active, timestamps of when network connections are made, and timestamps of when keystrokes and mouse movements are made. Having read and electronically signed the informed consent form, participants were asked a series of demographic questions (e.g., age, ethnicity) and to install our extractor on their personal computer. This extractor records logs of system-level events and uploads them to our lab servers in a secure fashion. For each participant, the 8-week study period began on the day successful installation could be verified within our systems. Participants were instructed to use their computers naturally and were reminded not to share their personal computers with anyone while in the study. Only IRB-trained research assistants who were part of the project interacted with participants to assist them with any questions or concerns they might have. Upon completion of the study and after uninstalling the extractor, participants were asked to complete a final debriefing questionnaire comprised of seven questions pertaining to the consistency of their computer usage during the study period. After completing the debriefing, participants were compensated.

Data attrition. A few issues occurred during the initial run of the study. First, because international participants were ineligible to participate per UF IRB regulations, we discarded three participants who were located outside the U.S. We compensated them nonetheless (with UF IRB approval) and discarded their collected data. Second, our extractor initially collected only the destination IP address, so we were unable to resolve which domains participants accessed. To remedy this situation, we began capturing DNS queries to resolve each destination IP address to its respective domain. All participants who encountered this issue were invited to restart the study, receiving extra compensation for their extended time. The participants who accepted were asked to sign an addendum and promptly restarted the study once the new software update was available; the remaining participants who declined were compensated upon study completion and had their data discarded from the study. All of these issues were properly reported to the UF IRB, and approval was received to restart the study. Lastly, after study completion, we noticed that seven participants had technical issues with the extractor; we therefore opted to discard them from our sample. These issues, coupled with participant attrition, decreased our sample from 60 to 28 participants.

5 Computer Usage Profile Extraction

In this section, we describe the design and implementation of the MS Windows 10 profile extractor our team developed to collect computer usage data from the study participants. We first considered using existing user-level and system event extractor tools (e.g., [tanium, procmon, msetw, sysmon]), but decided against these options because: (1) the tool was intended for diagnostic tasks, thus incurring prohibitive performance overheads for the continuous usage required by our user study [procmon]; (2) the tool required the nodes where data was collected to be on the same network, which would make our user study infeasible [tanium]; or (3) the tool did not provide fine-grained process activities, such as the creation or termination of processes/threads, the suspension/resumption of processes/threads, and process memory access [msetw, sysmon], which we deemed crucial for the construction of computer usage-based profiles.

Therefore, we developed our own extractor as a kernel driver for MS Windows 10, due to the popularity of this OS in organizations and among computer users in general (see Appendix B for more details). In our design, a user’s profile is comprised of fine-grained information about process and network activities the user engages with, as well as timestamps of mouse clicks and keystrokes.

Process activity logs were comprised of process ID, executable path, event time, and event type (i.e., process creation, mouse click, or keyboard keystroke). A process was considered active at a certain time if it generated a system-level event (the equivalent of a Linux system call); note that Windows does not offer capabilities to record fine-grained system calls as Linux does. We therefore approximated system events generated by a process, leveraging both the creation of and human interaction with the process to estimate the application's foreground activity. For example, consider a sample process information log for the Slack application (PID: 1020). At Time: 10-15-2020 11:44:20, the Slack process is created (Event Type: Process Create, Path: C:\...\slack.exe), followed by a mouse click event (Event Type: Mouse, Time: 10-15-2020 11:44:42) and a keyboard event (Event Type: Keyboard, Time: 10-15-2020 11:45:27), both in the same Slack application (i.e., same Path and PID).

Network activity was logged as a series of connection sessions associated with the processes that initiated them, comprised of connection information: IP address and port number (source and destination), average bytes uploaded and downloaded, connection start and end timestamps, and subsequent traffic information (traffic direction, i.e., inbound or outbound, and the timestamp of when the traffic was intercepted). Activities belonging to the same connection session share the same source and destination IP addresses and port numbers. We also logged a sequence of DNS sessions associated with the processes that initiated them (web URL and its IP addresses); these DNS sessions allowed us to map domains to the destination IP addresses of the network connections. Appendix B provides examples of network logs.
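To make the log structure concrete, the minimal sketch below models one process event and one connection session as Python dataclasses. The field names are hypothetical illustrations of the information described above, not the extractor's actual on-disk format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProcessEvent:
    # One row of the process activity log: creation, mouse, or keyboard event
    pid: int
    path: str             # executable path, e.g. r"C:\...\slack.exe"
    event_time: datetime
    event_type: str       # "Process Create", "Mouse", or "Keyboard"


@dataclass
class ConnectionSession:
    # One network connection session attributed to the process that opened it
    pid: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    avg_bytes_up: float
    avg_bytes_down: float
    start_time: datetime
    end_time: Optional[datetime]
    domain: Optional[str]  # resolved from the associated DNS session, if any


# Example record mirroring the Slack scenario described above
slack_create = ProcessEvent(1020, r"C:\...\slack.exe",
                            datetime(2020, 10, 15, 11, 44, 20), "Process Create")
```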

6 Data Analysis and Results

This section goes over our data pre-processing steps and the machine learning analysis methodology, which involved a diverse set of methods including online and offline learning models, as well as Self-Organizing Maps.

6.1 Data Pre-processing

We pre-processed the raw data of each participant to generate a matrix where each line represents one minute of computer activity on a given study day (the number of lines therefore varied among users according to their computer usage). The columns contained: (i) a timestamp, (ii) a list of all active processes, (iii) a list of all domains accessed, (iv) the number of clicks associated with the timestamp, (v) the number of keystrokes associated with the timestamp, and (vi) a binary indicator of background process activity, detected via the occurrence of network traffic coinciding with a lack of keyboard/mouse events, i.e., network activity generated by a process independently and without user interaction (e.g., automatic updates running without the user's knowledge, or the user listening to music on YouTube without interacting with the browser for a long period of time).
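As a rough illustration of this pre-processing step, the sketch below collapses raw events into the per-minute matrix described above. It assumes a hypothetical raw-event DataFrame with timestamp, process, domain, and event_type columns; the paper does not specify its implementation at this level of detail.

```python
import pandas as pd


def to_minute_matrix(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw per-event logs into one row per minute of activity.

    `events` is assumed to have columns: timestamp (datetime), process,
    domain, and event_type ("mouse", "keyboard", or "network").
    """
    events = events.copy()
    events["minute"] = events["timestamp"].dt.floor("min")

    def summarize(group: pd.DataFrame) -> pd.Series:
        clicks = int((group["event_type"] == "mouse").sum())
        keys = int((group["event_type"] == "keyboard").sum())
        has_net = bool((group["event_type"] == "network").any())
        return pd.Series({
            "processes": " ".join(sorted(group["process"].dropna().unique())),
            "domains": " ".join(sorted(group["domain"].dropna().unique())),
            "clicks": clicks,
            "keystrokes": keys,
            # background activity: network traffic with no mouse/keyboard events
            "background": int(has_net and clicks == 0 and keys == 0),
        })

    return events.groupby("minute").apply(summarize).reset_index()
```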

6.2 Feature Extraction

Fig. 1: Example of the sliding window technique for a given window size (in minutes).

An important aspect to be considered when generating profiles is the window of profile data needed to decide whether the current behavior actually belongs to a certain user. We define this period as a sliding window of time. Since our logs were collected on a per-minute basis, the window size can be any integer number of minutes. Every minute, the most recent window of activity is analyzed by the ML classifier, which predicts whether that window of activity belongs to a certain user. As illustrated in Fig. 1, we summed the number of clicks, keystrokes, and background traffic events (all integers), concatenated the strings of the processes and domains used within each window, and kept the last timestamp of the window, generating a new matrix that groups computer usage into windows. A disadvantage of this technique is that larger window sizes are more susceptible to the cold-start problem (i.e., a lack of initial information to start the recognition task, common in recommendation systems [jianbo2016]). In our experiments, we tested several window sizes (1, 2, 5, 10, 30, and 60 minutes) to analyze the impact of small and large windows on predicting user identity from the profiles.
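Continuing the previous sketch, the window aggregation could look like the following. This is a simplified illustration under the same assumed column names; `w` denotes the window size in minutes.

```python
import pandas as pd


def make_windows(minute_df: pd.DataFrame, w: int) -> pd.DataFrame:
    """Aggregate the per-minute matrix into overlapping w-minute windows,
    advancing one minute at a time (sliding window)."""
    rows = []
    for start in range(0, len(minute_df) - w + 1):
        chunk = minute_df.iloc[start:start + w]
        rows.append({
            "timestamp": chunk["minute"].iloc[-1],      # keep the last timestamp
            "processes": " ".join(chunk["processes"]),  # concatenate text fields
            "domains": " ".join(chunk["domains"]),
            "clicks": chunk["clicks"].sum(),            # sum numeric counts
            "keystrokes": chunk["keystrokes"].sum(),
            "background": chunk["background"].sum(),
        })
    return pd.DataFrame(rows)
```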

Given that the numerical features (number of clicks, keystrokes, and background traffic events) only need to be normalized before being used by a classifier, our feature extraction process focused on the textual attributes of the matrix (the lists of processes and domains). For this purpose, we opted to use TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure that evaluates how important a word (in our case, a process or website) is to a text in relation to a collection of texts [10.5555/1394399]. Each text is represented by a sparse array that contains its TF-IDF value for each word in the vocabulary. We envisioned that TF-IDF brings many advantages for applications in CA, given that (i) the more a word (e.g., a process) appears in an instance, the larger its feature weight is, and (ii) the less a word appears across all instances, the higher its importance is for distinguishing instances.

By combining the sliding window technique and TF-IDF features, the classifier receives as input an array that represents the window, containing the numerical features (sums of the numbers of clicks, keystrokes, and background traffic events) and the TF-IDF values for the processes and domains in the window, all normalized using a maximum absolute scaler [scikit-learn]. Thus, the textual attributes are used as a "document" that represents the user at a given moment.
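A minimal sketch of this feature construction with scikit-learn is shown below. The window column names follow the earlier sketches, and using two separate TF-IDF vocabularies (one for processes, one for domains) and whitespace tokenization are assumptions; the paper does not specify these details.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MaxAbsScaler

# Treat each whitespace-separated process path / domain as one token (an assumption)
proc_tfidf = TfidfVectorizer(token_pattern=r"\S+")
dom_tfidf = TfidfVectorizer(token_pattern=r"\S+")
scaler = MaxAbsScaler()


def fit_features(train_windows):
    """Fit the TF-IDF vocabularies and the scaler on the training windows."""
    X = hstack([
        proc_tfidf.fit_transform(train_windows["processes"]),
        dom_tfidf.fit_transform(train_windows["domains"]),
        csr_matrix(train_windows[["clicks", "keystrokes", "background"]]
                   .astype(float).values),
    ]).tocsr()
    return scaler.fit_transform(X)   # MaxAbsScaler keeps the matrix sparse


def transform_features(windows):
    """Apply the fitted vocabularies and scaler to new (e.g., test) windows."""
    X = hstack([
        proc_tfidf.transform(windows["processes"]),
        dom_tfidf.transform(windows["domains"]),
        csr_matrix(windows[["clicks", "keystrokes", "background"]]
                   .astype(float).values),
    ]).tocsr()
    return scaler.transform(X)
```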

6.3 Machine Learning Analysis

Our machine learning experiments aimed to test the uniqueness of our profiles (RQ1), identify the top features for determining unique profiles (RQ2), and investigate the profiles' temporal consistency (RQ3).

6.3.1 RQ1: Do computer users have unique computer usage profiles?

The goal of the first set of experiments was to examine whether the profiles are unique (RQ1). For this purpose, we considered two types of machine learning models: (i) offline, where the classifier is trained only once using an initial portion of the data and then used to perform predictions for the remaining data without retraining; and (ii) online, where the classifier is periodically updated with more recent data. Offline classifiers are more popular in the ML field (mainly in stationary distribution problems where data does not change over time, such as object recognition) while online classifiers are conceptually more suitable to handle behavioral changes in profiles that may take place over time, given that they belong to a non-stationary distribution (i.e., they may change over time) [MOA-Book-2018], so we included both types in our experimental design.

In both cases (offline and online learning), we conducted multi-class classification experiments, where a single predictive model was created based on the data of all users and then used to distinguish them from one another, and one-class classification experiments, where one predictive model was created for each user based on their data only (i.e., regardless of other users' data) [han2011data]. We decided to evaluate both scenarios because each comes with strengths and pitfalls in the context of CA applications. On one hand, one-class classification is conceptually better suited for applications in CA systems because it does not require retraining every time a new user is added to or removed from the group. One-class methods can also handle temporal changes in profiles on a per-user basis, which is a desirable capability given that such changes may be user-specific, depending on factors like travel, role, etc. On the other hand, multi-class approaches are likely to yield more accurate predictions because they have access to all classes when building the model, thus creating better decision boundaries. In addition, multi-class classification yields better results than one-class classification when a large number of non-target classes (i.e., the data from other users) is available [10.1007/978-3-540-89378-3_32]. Finally, multi-class classifiers can be used as binary discriminant functions, where the data from a given class (a given user) is considered positive, and the data from all the other classes (the remaining users) is considered negative [10.5555/1162264].

In total, our experiments involved training and testing models across all variations of classifiers, parameters, numbers of runs, and window sizes.

Offline Classification. For the offline experiments, we chose a total of six well-known classifiers: four multi-class (Random Forest, Stochastic Gradient Descent (SGD), Multi-Layer Perceptron (MLP), and LinearSVC [scikit-learn]) and two one-class (Isolation Forest and One-Class SVM, the latter with both RBF and linear kernels).

For the multi-class models, we used the default parameters in Scikit-Learn [scikit-learn] for Random Forest, Stochastic Gradient Descent (hinge loss function), Multi-Layer Perceptron (ReLU activation function), and LinearSVC (a Support Vector Machine with linear kernel). To evaluate these models, we split the data (ordered by timestamps) into two sets: the first seven days of data of each user were used for training and the remaining data for testing, as shown in Fig. 2. In addition to the standard multi-class classifiers, where a single model is built with the labels of all participants in the training data, we also created binary models (one for each user, where the task is to detect whether the current behavior window belongs to a given user) with each multi-class classifier cited before.

For the one-class models, we also adopted the default settings in Scikit-Learn [scikit-learn] (Isolation Forest and two One-Class SVMs, the first with RBF kernel and the second with linear kernel). For each user, we created an outlier set containing the instances of all the other 27 users, excluding their first seven days—that is, the same instances taken as the test set in the multi-class experiments, for a fairer comparison between one-class and multi-class results.
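The offline protocol can be sketched as follows. This is a simplified illustration with scikit-learn defaults: `day_of_study` is a hypothetical helper array giving each window's per-user study day, Random Forest stands in for the other multi-class/binary classifiers, and the F-score averaging choice for the multi-class case is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.svm import OneClassSVM


def split_first_week(X, y, day_of_study):
    """Use each user's first seven study days for training, the rest for testing."""
    train_idx = np.flatnonzero(day_of_study <= 7)
    test_idx = np.flatnonzero(day_of_study > 7)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


def multiclass_offline(X_tr, y_tr, X_te, y_te, seed=0):
    """One model that distinguishes all users at once."""
    clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="weighted")


def binary_offline(X_tr, y_tr, X_te, y_te, user, seed=0):
    """One model per user: 'this user' vs. everyone else."""
    clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr == user)
    return f1_score(y_te == user, clf.predict(X_te))


def oneclass_offline(X_tr, y_tr, X_te, y_te, user):
    """One-class model trained only on the target user's windows;
    all other users' test windows act as outliers."""
    clf = OneClassSVM(kernel="linear").fit(X_tr[np.flatnonzero(y_tr == user)])
    pred = clf.predict(X_te) == 1    # +1 = inlier (target user), -1 = outlier
    return f1_score(y_te == user, pred)
```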

We used the F-score (i.e., the harmonic mean of precision and recall) as our evaluation metric, as it provides a more realistic measure of a model's performance on unbalanced datasets than more popular metrics like accuracy [han2011data]. For all classifiers, the F-scores were obtained by averaging ten runs with different random states [Breiman2001RF].

Fig. 2: Construction of the training and test datasets: the first seven days of each user's data form the training set, and the remaining data form the test set.

Online Classification. For the online experiments, we selected four classifiers that support online learning (given that most classifiers are implemented for batch learning only): three multi-class (Adaptive Random Forest; Stochastic Gradient Descent, SGD; and Perceptron [scikit-learn]) and one one-class (Half-Space Trees [10.5555/2283516.2283647]). We leveraged the default parameters in Scikit-Multiflow [10.5555/3291125.3309634] and River [2020river] for both Adaptive Random Forest and Half-Space Trees (the latter being the only online one-class classifier available at the time of this writing), and the default parameters in Scikit-Learn for SGD and Perceptron. The same training and test sets from the offline experiments were used. The training set was used to train the initial model, and the test set was used to create a data stream evaluated in a test-then-train fashion, where each sample is first tested by the model (generating a prediction used to compute the F-score) and then used to update it [MOA-Book-2018]. Again, the F-scores were obtained by averaging ten runs with different random states.
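A minimal sketch of the test-then-train loop using scikit-learn's online Perceptron is shown below; the scikit-multiflow/River models follow the same pattern, and the F-score averaging choice is an assumption.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import f1_score


def online_multiclass(X_tr, y_tr, X_te, y_te, seed=0):
    """Prequential (test-then-train) evaluation with an online Perceptron:
    each test window is first predicted, then used to update the model."""
    classes = np.unique(np.concatenate([y_tr, y_te]))
    clf = Perceptron(random_state=seed)
    clf.partial_fit(X_tr, y_tr, classes=classes)    # initial model

    preds = []
    for i in range(X_te.shape[0]):
        x_i, y_i = X_te[i], y_te[i:i + 1]
        preds.append(clf.predict(x_i)[0])           # test first...
        clf.partial_fit(x_i, y_i)                   # ...then train on the sample
    return f1_score(y_te, preds, average="weighted")
```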

Results. Figure 3 exhibits the F-scores obtained from the offline classifiers, while Fig. 4 shows the F-scores obtained from the online models used to distinguish among the profiles. For the offline models, the top F-scores were 95.99% in the multi-class setting with Random Forest, 94.08% in the binary setting with LinearSVC, and 66.23% in the one-class setting with the linear-kernel SVM. For the online classifiers, we observed top F-scores of 99.94% in the multi-class setting with the Perceptron model, 99.77% in the binary setting (again with Perceptron), and only 0.03% in the one-class setting with Half-Space Trees; the reason behind such a poor result is that Half-Space Trees work better when anomalous data are rare, which is the opposite of our scenario [10.5555/2283516.2283647]. In summary, all tested offline classifiers except the one-class ones exhibited an F-score greater than 90% in distinguishing among profiles for at least one of the tested window sizes. The online models yielded even better results; all of them (again except the one-class model) reached an F-score greater than 99% for at least one window size. It was possible to distinguish among computer usage profiles with a maximum F-score of 95.99% for offline classifiers and 99.94% for online models.

Moreover, for both offline and online models, we observed that the F-scores improved as the window size increased. In general, optimal results were reached for a window size of 10 min, given that increasing it to 30 min and 60 min did not improve the F-score substantially. For example, the average F-score computed among all tested offline multi-class classifiers increased by 16.37 pp (percentage points) when moving from 1 min to 2 min (58.43% vs. 74.80%), 13.70 pp when moving from 2 min to 5 min (74.80% vs. 88.49%), and 3.58 pp when moving from 5 min to 10 min (88.49% vs. 92.08%), but only 0.31 pp when moving from 10 min to 30 min (92.08% vs. 92.39%) and 0.48 pp when moving from 30 min to 60 min (92.39% vs. 92.87%). In other words, offline and online classifiers exhibited better F-scores with an increase in window size, with optimal results achieved for a window size of 10 minutes.

When comparing online vs. offline classifiers in distinguishing among profiles, the former yielded better top F-scores than the latter in the multi-class (99.94% vs. 95.99%) and binary (99.77% vs. 94.08%) settings. The opposite was observed in the one-class setting, with a higher top accuracy for the offline models compared to the online one (66.23% vs. 0.03%). Moreover, in both offline and online learning experiments, poorer F-scores were observed for the one-class models compared to the multi-class and binary ones.

(a) Multi-Class.
(b) Binary.
(c) One-Class.
Fig. 3: Offline Classification Results.
(a) Multi-Class.
(b) Binary.
Fig. 4: Online Classification Results.

6.3.2 RQ2: What are the most important features to recognize computer usage profiles?

To analyze the most important features for each user, we first divided their data into weeks, so that we could observe which features are most important over time. We trained a binary Random Forest classifier for each week of each user with a time window of 10 minutes (the optimal value found in RQ1), resulting in 224 trained models (28 users x 8 weeks). We then used LIME [lime], an algorithm that can explain the predictions of any classifier by approximating it locally with an interpretable model, to explain our models' predictions for each instance of the corresponding week and user. Given that our models are binary, LIME explanations are composed of positive values for the features that are more important for detecting the corresponding user, and negative values for the features that are more important for the remaining users. Finally, we averaged these values for each feature and selected the top 10 features for each user.
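A sketch of this per-user-week LIME aggregation is shown below. It assumes dense feature arrays and a `feature_names` list derived from the TF-IDF vocabularies plus the numeric columns; the exact LIME settings are not reported in the paper.

```python
from lime.lime_tabular import LimeTabularExplainer


def top_features_for_user(clf, X_week, feature_names, num_features=10):
    """Average LIME attributions over all windows of one user-week and return
    the features that most strongly indicate the target user.

    `clf` is the binary Random Forest for that user and week; `X_week` is a
    dense 2D numpy array with that week's windows (a hypothetical helper).
    """
    explainer = LimeTabularExplainer(X_week,
                                     feature_names=feature_names,
                                     class_names=["other", "user"],
                                     mode="classification",
                                     discretize_continuous=False)
    totals = {}
    for x in X_week:
        exp = explainer.explain_instance(x, clf.predict_proba,
                                         num_features=num_features)
        for name, weight in exp.as_list():
            totals[name] = totals.get(name, 0.0) + weight

    averages = {name: total / len(X_week) for name, total in totals.items()}
    # Positive averages point toward the target user; keep the 10 largest
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:10]
```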

Results. After unifying the top 10 features from each of the 28 participants and removing duplicates (i.e., features common to two or more users), we obtained a set of unique features, the large majority of which were websites, followed by processes. The three remaining features refer to background traffic (which appeared for three users) and the numbers of mouse clicks and keystrokes (each appearing for a single user). In other words, the top 10 features of each user were mostly comprised of websites. Some of these top features were observed for two or more users (e.g., google.com was considered a top feature for two different users). In summary, the top features for recognizing computer usage profiles were mostly websites; the remainder comprised processes, background traffic, and the numbers of mouse clicks and keystrokes.

6.3.3 RQ3: Are computer usage profiles temporally consistent?

We aimed to investigate whether the profiles changed over time. We decided to examine the profiles over the study weeks under the assumption that computer users usually repeat their behavior on a weekly basis; however, we acknowledge that there may be exceptions where computer usage recurs at different frequencies, depending on, for example, the user's occupation (see Sec. 7.1).

Towards this goal, we first checked for significant changes in the number of hours that each computer user used their device over the weeks. To do so, we calculated the coefficient of variation (CV) for each user, defined as the ratio of the standard deviation to the mean. The CV measures the level of dispersion around the mean, where higher coefficients indicate greater dispersion [brown2012applied]. However, it is possible for a computer user to use their device regularly in terms of the number of hours per week but change their behavior in terms of the applications used and websites visited. For example, a computer scientist may use a diverse set of applications to run machine learning experiments for a given research project (e.g., Python scripts to process the data, Microsoft Excel to store important results, R Studio to analyze the results) and then switch to a simpler profile to write a report or paper with their findings.
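For reference, the CV computation is simple; the minimal sketch below assumes a hypothetical users x weeks table of usage hours, as in Table III.

```python
import pandas as pd


def weekly_cv(weekly_hours: pd.DataFrame) -> pd.Series:
    """Coefficient of variation of weekly usage hours, one value per user.

    `weekly_hours` has one row per user and one column per study week.
    pandas uses the sample standard deviation (ddof=1) by default.
    """
    mean = weekly_hours.mean(axis=1)
    std = weekly_hours.std(axis=1)
    return (std / mean) * 100.0   # CV expressed as a percentage, as in Table II
```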

Therefore, we decided not to rely solely on the CV. As a second step, we checked for behavioral changes taking place over time in terms of processes and websites used. For this purpose, we used visualizations generated by Self-Organizing Maps (SOMs), an unsupervised Artificial Neural Network architecture that maps high-dimensional data into a low-dimensional space using fully connected neurons [SOM]. During training, a SOM groups similar neurons, creating similarity clusters that reflect attribute relationships in the input data. We used the Unified Distance Matrix (U-Matrix), which visualizes the distances between network neurons; clusters are represented as light areas of the image [umatrix]. In our experiments, for each user, we incrementally trained a SOM on each week of data and generated the corresponding visualizations, creating 224 images and models (8 weeks x 28 users). After that, for each user, we computed the similarity between the grayscale SOM images of consecutive weeks (i.e., weeks 1 vs. 2, 2 vs. 3, etc.) using the Mean Squared Error (MSE), which quantifies image differences based on the squared pixel-wise differences between the two images. Higher MSE values indicate less similarity between two images (an MSE of 0 denotes perfect similarity). We used these MSE values as "internal" measures to help us quantify the differences between the SOM images, thus helping guide our focus during the analysis of the results.
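A sketch of the weekly U-Matrix generation and MSE comparison is shown below. The paper does not name its SOM implementation or settings, so MiniSom, the grid size, and the iteration count are illustrative assumptions; the paper also trains SOMs incrementally across weeks, whereas this sketch trains one map per week for simplicity.

```python
import numpy as np
from minisom import MiniSom


def umatrix_image(week_data: np.ndarray, size: int = 20, seed: int = 0) -> np.ndarray:
    """Train a SOM on one week of (dense) feature vectors and return its
    U-Matrix as a grayscale image with values in [0, 1]."""
    som = MiniSom(size, size, week_data.shape[1], sigma=1.0,
                  learning_rate=0.5, random_seed=seed)
    som.train(week_data, num_iteration=5000)
    return som.distance_map()          # normalized inter-neuron distances (U-Matrix)


def mse(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Mean squared error between two equally sized U-Matrix images;
    0 means the two weekly maps are identical."""
    return float(np.mean((img_a - img_b) ** 2))
```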

Results. Table II summarizes each user's weekly computer usage hours, including the mean, standard deviation, range, and CV. Fig. 5 displays a heat map with the MSE results from the analysis of the SOM images for all participants.

User (De-Identified)   Mean   Std   Min   Max   CV
1 17.77 5.00 9.53 27.40 28.16%
2 11.43 27.97 1.05 80.65 244.64%
3 107.42 22.21 72.52 129.18 20.67%
4 40.61 20.99 8.88 77.20 51.70%
5 35.39 5.88 28.92 46.98 16.62%
6 17.47 9.77 7.45 32.17 55.91%
7 49.01 19.09 29.43 76.60 38.95%
8 4.23 4.78 0.00 14.40 112.99%
9 47.37 3.84 38.30 50.45 8.10%
10 21.98 6.48 15.25 36.47 29.47%
11 31.85 17.56 15.80 59.02 55.13%
12 25.40 34.90 0.00 82.15 137.43%
13 33.68 21.04 4.47 59.43 62.47%
14 28.30 11.68 10.07 43.97 41.27%
15 23.58 16.54 3.42 46.63 70.13%
16 27.82 23.70 0.00 53.52 85.20%
17 10.77 12.68 0.00 27.32 117.75%
18 3.93 2.39 0.35 7.90 60.90%
19 76.62 49.73 14.20 152.33 64.90%
20 41.04 16.00 19.97 61.87 38.98%
21 8.23 7.63 0.00 20.57 92.71%
22 31.26 27.82 0.00 63.08 88.99%
23 30.79 11.71 14.00 52.10 38.03%
24 76.30 14.36 56.52 95.60 18.82%
25 11.06 4.39 4.57 16.12 39.69%
26 34.24 7.38 18.00 40.75 21.54%
27 52.07 13.29 25.47 67.88 25.53%
28 53.93 9.11 39.15 64.58 16.90%
TABLE II: Summary of users' weekly computer usage time (in hours).
Fig. 5: Heat map of MSE results.

As one can observe in Table II, some users used their computers less regularly than others in terms of weekly active time over the 8-week study period, thus presenting larger CV results. The largest variability was observed for User 2, who used their device for 80.65 hours in the first week (88.17% of their total active time while in the study) but for no more than 2.15 hours in any of the following weeks. Interestingly, six users (i.e., 21.43% of the sample) completely stopped using their devices for one or more weeks, including User 8 (no activity for weeks seven and eight), User 12 (no activity for weeks five, seven, and eight), User 16 (no activity for weeks seven and eight), User 17 (no activity for weeks six to eight), User 21 (no activity for weeks six and seven), and User 22 (no activity for weeks six to eight).

Nine users (i.e., 32.14% of the sample) presented more uniform computer usage time over the weeks (i.e., lower CV scores): Users 1, 3, 5, 9, 10, 24, 26, 27, and 28. However, when observing Fig. 5, large MSE results (i.e., indicating less similarity across the weeks) were obtained for most of these nine users; only two of them showed consistently similar SOM images. Figures 6 and 7 exhibit the SOM images per week for User 9 (relatively regular behavior over the weeks) and User 24 (irregular behavior, especially over the first three weeks), respectively. In other words, among the nine users who presented more uniform behavior in terms of weekly usage time, only two (including User 9) exhibited similar SOM images over the weeks, which suggests more consistent behavior in terms of computer usage patterns (i.e., processes used, websites visited, and numbers of mouse clicks and keystrokes performed). In summary, only two participants exhibited computer usage profiles that were relatively consistent over our study's eight weeks, while the other 26 users (92.86% of our sample) exhibited relatively significant changes in their profiles.

(a) Week 1
(b) Week 2
(c) Week 3
(d) Week 4
(e) Week 5
(f) Week 6
(g) Week 7
(h) Week 8
Fig. 6: SOM images over weeks for User 09 (regular behavior).
(a) Week 1
(b) Week 2
(c) Week 3
(d) Week 4
(e) Week 5
(f) Week 6
(g) Week 7
(h) Week 8
Fig. 7: SOM images over weeks for User 24 (irregular behavior).

7 Discussion

Computer Usage Profiles Uniqueness. Our results for RQ1 partially supported the hypothesis that computer usage profiles comprised of process-, network-, mouse-, and keystroke-related events constitute a useful and feasible source of data to uniquely characterize computer users. While most of the tested machine learning classifiers yielded promising results (F-scores greater than 94% with the offline models and greater than 99% with the online models), the one-class models achieved poor results in both the offline (maximum F-score of 66.23%) and online (maximum F-score of 0.03%) settings. The reason behind such a poor result for the online one-class model is that it works better when anomalous data are rare, which is the opposite of our scenario [10.5555/2283516.2283647]. In other words, our profiles can be considered unique depending on the learning model used for the recognition task. Importantly, our top feature analysis (RQ2) revealed that the set of domains accessed by the computer users was more relevant for distinguishing them from one another than processes and/or mouse/keystroke activity.

Computer usage profiles encompassing process-, network-, mouse-, and keystroke-related events have the potential to uniquely characterize computer users when analyzed with offline and online classifiers in both multi-class and binary settings, with network events constituting the most useful source of information to distinguish among user profiles.

Computer Usage Profiles Consistency. Our analysis of the temporal consistency of the computer usage profiles (RQ3) revealed that most of the computer users enrolled in our study (26 out of 28 participants, i.e., 92.86% of our sample) did exhibit relatively significant changes in the way they used their device over the 8-week study period. Several users presented a significant variation in their computer usage time over the weeks, which may be caused by a myriad of factors. For instance, it is plausible to assume that a student may use their computer for longer periods of time over certain weeks while others may not (e.g., when working on assignments or projects vs. when studying for tests). Therefore, computer usage behavior may be related to many variables, including user’s employment status and field/occupation. Even two professionals from the same field may present different profiles depending on their sub-area and personal preferences (e.g., one also uses the computer to play games online and listen to music while the other only uses the computer to work).

Interestingly, some users exhibited relatively regular behavior in terms of weekly computer usage time (9 out of 28 participants, i.e., 32.14% of our sample), but most of them (7 out of those 9, i.e., 77.78%) changed their usage patterns over the weeks in terms of websites visited, software used, and/or mouse clicks and keystrokes performed. This likely reflects the behavior of individuals who use their computer on a regular basis (e.g., approximately eight hours per day every day) but perform a diverse set of tasks involving different software and websites. For example, a college student may use different sets of software and websites when working on a project or report compared to when studying for a test.

Moreover, it is important to highlight that the COVID-19 pandemic has greatly altered how users in general work, learn, socialize, and entertain themselves using their personal devices [Koeze2020-zm], and our data collection happened during the pandemic. Several activities of daily life are now performed virtually due to safety concerns, such as grocery shopping, bank services, classes in general (e.g., schools, universities, gyms), and watching movies or concerts, and many of these activities may not happen on a regular basis (e.g., someone might buy groceries twice a week for two weeks and then spend three weeks without buying anything). This is very likely to affect computer usage profiles in an irregular fashion, causing users to use the same computer for personal and work tasks and to exhibit drifts in computer usage time and activity (e.g., software used and websites visited).

The computer usage profiles were mostly not temporally consistent over the study period.

Challenges and Opportunities of Using Computer Usage Profiling for Applications in CA. In terms of the window of data required to properly recognize profiles, our experiments revealed that the longer the time window used by the classifiers, the more accurate the predictions. We found that a time window of 10 minutes was the optimal trade-off for our dataset, but this can potentially vary depending on the setting. This may be a challenge for CA systems based on computer usage profiles given that the longer the time window needed to make predictions, the more susceptible the system is to a cold-start problem; i.e., more minutes would be needed to start recognizing the user, causing the device to be initially vulnerable for a longer period of time. Importantly, once the recognition task starts, this is no longer a problem because a new window can be obtained every minute. In other words, the cold-start problem would not necessarily make the system slower at flagging anomalies once classification starts. Moreover, the longer it takes for the system to recognize an anomaly, the less value the CA system might have for corporate security, because the anomaly could correspond to a malicious activity whose effects persist after detection.

Regarding the feasibility of using computer usage profiles for CA applications, our data suggests that online classifiers are better suited for behavior-based CA tools than offline models. While offline models are trained only once, online models can be periodically updated using recent data, thus capturing changes in the user's behavior taking place over time. Indeed, most tested online models slightly outperformed the offline ones in this study. Moreover, temporal changes in profiles might be even more pronounced in larger and more diverse groups of computer users, which could further increase the suitability of online models compared to offline ones. It is important to highlight that online learning models are not standard in many machine learning libraries. For instance, Scikit-Learn, one of the most popular machine learning frameworks for Python, implements only six online learning classifiers, all of them multi-class. To the best of our knowledge, there is a single one-class online classifier available so far, Half-Space Trees [10.5555/2283516.2283647], implemented in Scikit-Multiflow [10.5555/3291125.3309634] and River [2020river]. Aside from the one-class classifiers' poorer classification results, this was one of the reasons we focused on creating binary classifiers instead. Thus, we advocate for more online learning implementations, given that they are essential in practice for non-stationary problems (i.e., problems subject to temporal changes) such as CA.

7.1 Limitations and Future Work

One limitation of our study lies in our sample, which is small (N = 28) and comprised mainly of young adults from a large university. Therefore, our sample is not representative of all possible corporate environments. Although a large university of nearly 50K students is an organization, corporate users likely present different patterns of computer usage compared to university students. These factors thus limit the generalizability of our findings; nonetheless, our study's focus is on providing initial evidence on the feasibility, challenges, and opportunities of computer usage profiles for applications in CA, cognizant of the need for future studies with more diverse and representative samples.

Importantly, we collected our study's data during the COVID-19 pandemic, which might have affected user behavior and, consequently, influenced our conclusions about the profiles' uniqueness and temporal consistency. How society may shift once the pandemic is resolved, and how such shifts may impact the user profiling analyses presented in this paper, will require investigation in future work (e.g., will personal computer usage become sparser as users return to work in offices and classrooms?).

Another limitation of our study concerns the integrity of the systems in which computer usage data was recorded: we assumed that the users’ systems were not compromised by malware during the study period. The presence of malware could have adversely affected the learning models, as malicious behaviors would be considered part of the usage profile. In addition, although we debriefed all participants upon study completion to confirm that they did not share their computers while in study, some self-reported responses might not be accurate.

Our analysis of temporal consistency in computer usage profiles also has limitations. First, under the assumption that computer users repeat their behavior every week, we checked for temporal consistency in profiles over weeks. However, depending on the user's occupation, the usage profile can be consistent at different cadences (e.g., bi-weekly or even monthly); thus, for some participants, what we detected may not have been real behavioral changes, but normal variations of a usage pattern that would appear consistent if observed over a period longer than one week. Therefore, future analysis of temporal changes over longer periods is warranted. Furthermore, since we analyzed user behaviors using a time window, in some cases, when the behavior is about to change, the classifier may have difficulty detecting the shift because the window will change only gradually over time, especially when the window size is large. Thus, future work addressing this issue is needed to improve classification performance.

Moreover, to evaluate the performance of the tested learning models, we only considered traditional metrics such as the F-score and recall, which might not be ideal for CA systems [sugrim2019robust]. Future work is advised to consider more robust metrics, such as the frequency count of scores along with the receiver operating characteristic (ROC) curve [sugrim2019robust].

8 Conclusions

We conducted an ecologically-valid user study with 28 participants to systematically investigate whether computer usage profiles comprised of process-, network-, mouse-click-, and keystroke-related events are unique and temporally consistent. Additionally, we investigated challenges and opportunities of using such profiles in applications of CA. In our comprehensive set of experiments, offline and online machine learning models were able to accurately recognize the computer users, indicating that computer usage profiles can potentially represent a feasible source of data to uniquely characterize users. After assessing the features considered most relevant by the learning models, our data indicated that the websites accessed by users were more relevant for recognizing them than the software they used or the mouse clicks and keystrokes they performed. Moreover, we found no evidence of temporal consistency in computer usage profiles, with 92.86% of our sample exhibiting behavioral changes over the 8-week study period. Accordingly, our online classifiers (retrained periodically) outperformed the offline ones (trained only once) in recognizing computer users. We therefore concluded that online models should be preferred over offline ones for behavior-based CA systems to better capture behavioral changes in computer usage profiles that may take place over time. Lastly, we concluded that binary classifiers are better suited than multi-class and one-class ones for handling computer usage profiles because they reached promising results (unlike the one-class models) while allowing for training on a per-user basis (unlike the multi-class models). Findings from this work have the potential to inform important aspects of future behavior-based CA research and development. While computer-based CA can potentially be promising for protecting common users or corporations against intrusions happening after a user is authenticated (e.g., stealthy malware whose behavior conflicts with the user's typical computer usage, or an intrusion due to an attacker having discovered a user's password), more research is warranted before such methods can become the state of the practice.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. 1815557. This material is also based upon work supported by (while serving at) the National Science Foundation.

References

Appendix A Dataset Overview

Table III shows the number of active days, total usage hours, and average number of hours of computer usage per study day for each study participant during the 8-week study period.

User (De-Identified)   Week 1   Week 2   Week 3   Week 4   Week 5   Week 6   Week 7   Week 8   Total Active Days   Total Active Hours   Average Hours/Day
1 16.17 19.63 15.37 17.40 19.37 17.32 9.53 27.40 48 (86%) 142 2.5
2 80.65 1.93 1.45 1.08 1.27 1.05 1.88 2.15 54 (96%) 91 1.6
3 72.52 112.08 129.18 74.80 106.45 125.35 125.02 113.98 56 (100%) 859 15.3
4 51.10 32.47 38.27 77.20 39.88 8.88 21.97 55.08 56 (100%) 325 5.8
5 34.37 28.92 35.45 30.98 46.98 37.98 38.43 29.98 56 (100%) 283 5.1
6 10.22 32.17 7.85 25.42 17.65 11.30 7.45 27.72 37 (66%) 140 2.5
7 72.83 76.60 35.53 36.85 64.20 33.25 43.42 29.43 53 (95%) 392 7.0
8 6.67 1.57 14.40 5.52 1.87 3.80 0.00 0.00 29 (52%) 34 0.6
9 47.72 49.37 46.90 49.43 48.97 38.30 50.45 47.85 56 (100%) 379 6.8
10 23.88 20.37 16.92 21.02 36.47 22.38 15.25 19.58 54 (96%) 176 3.1
11 21.13 59.00 59.02 32.52 19.37 20.15 15.80 27.78 55 (98%) 255 4.5
12 82.15 56.00 61.60 0.92 0.00 2.50 0.00 0.00 24 (43%) 203 3.6
13 37.82 44.12 55.37 42.98 15.08 4.47 59.43 10.17 55 (98%) 269 4.8
14 42.28 43.97 27.77 21.78 35.77 19.87 10.07 24.88 53 (95%) 226 4.0
15 10.27 45.17 22.83 46.63 15.25 33.95 11.12 3.42 48 (86%) 189 3.4
16 41.20 29.72 50.62 53.52 45.98 1.52 0.00 0.00 37 (66%) 223 4.0
17 9.32 27.32 23.23 26.08 0.20 0.00 0.00 0.00 24 (43%) 86 1.5
18 4.03 0.35 2.52 3.63 4.33 2.22 7.90 6.43 47 (84%) 31 0.6
19 133.73 93.62 39.02 48.67 94.85 152.33 36.57 14.20 49 (88%) 613 10.9
20 49.85 61.87 61.07 46.88 30.45 19.97 30.75 27.52 56 (100%) 328 5.9
21 20.57 14.05 14.52 8.55 1.65 0.00 0.00 6.50 34 (61%) 66 1.2
22 63.08 46.70 56.87 55.15 28.30 0.00 0.00 0.00 31 (55%) 250 4.5
23 52.10 30.67 27.93 19.10 36.07 37.67 14.00 28.82 50 (89%) 246 4.4
24 95.60 56.52 74.42 79.62 94.02 78.57 57.87 73.80 56 (100%) 610 10.9
25 12.58 16.12 9.50 14.55 7.67 4.57 7.42 16.12 50 (89%) 89 1.6
26 31.48 18.00 32.70 37.12 40.75 34.87 39.95 39.08 55 (98%) 274 4.9
27 67.88 60.33 49.43 62.18 54.35 54.38 42.53 25.47 56 (100%) 417 7.4
28 39.15 59.95 40.60 55.80 58.00 56.23 64.58 57.12 56 (100%) 431 7.7
Average 47.7 (85.2%) 272 4.9
TABLE III: Overview of active time over the 8-week study period per participant.

Appendix B Extractor Module Details

The architecture of our extractor comprises two modules: a profile extractor and a log uploader (Fig. 8). The former relies on Windows kernel notifications about system events and contains two sub-modules: a process monitor and a network monitor. The latter is a user-space Python application that compresses and securely uploads the log files to our server using the https protocol every five minutes.

Fig. 8: Architecture of the computer usage profile extractor containing the Profile Extractor and the Log Uploader.
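For illustration, a user-space uploader with this behavior could be sketched as follows. The endpoint URL, log directory, and file handling are hypothetical; the paper only specifies compression and secure upload over https every five minutes.

```python
import glob
import gzip
import os
import shutil
import time

import requests

UPLOAD_URL = "https://example-lab-server.example.edu/upload"   # hypothetical endpoint
LOG_DIR = r"C:\ProgramData\ProfileExtractor\logs"               # hypothetical log path


def upload_pending_logs() -> None:
    """Compress every pending log file and POST it over https."""
    for path in glob.glob(os.path.join(LOG_DIR, "*.log")):
        gz_path = path + ".gz"
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        with open(gz_path, "rb") as payload:
            resp = requests.post(UPLOAD_URL, files={"log": payload}, timeout=30)
        if resp.ok:                     # only delete after a confirmed upload
            os.remove(path)
            os.remove(gz_path)


if __name__ == "__main__":
    while True:
        upload_pending_logs()
        time.sleep(5 * 60)              # the uploader runs every five minutes
```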

Table IV exhibits a sample network recording for the Chrome web browser at 3:28 PM on a given day. The extractor recorded the establishment of a connection between the user’s computer and 35.51.247.37 (sina.com.cn). Then, at 3:29 PM, the extractor recorded traffic from the server to the user’s machine. At 3:30 PM, traffic was recorded from the user’s device to the server. Finally, the connection was closed at 3:30 PM. Table V displays a sample DNS query made by Chrome in the aforementioned connection to get IP addresses of sina.com.cn and a sample response from DNS with IPv6 and IPv4 addresses.

Connection information (PID 3844, Chrome): connection handle 125-288; IP protocol V4; local IP address (hex) 10.255.48.22; local port 55321; remote IP address (hex) 35.51.247.37; remote port 443; connection established at 3:28; connection closed at 3:30.
Traffic information: direction 0 at 3:30; direction 1 at 3:29.
TABLE IV: Example of network information recording.
PID 1408; event time 09-23-2020 18:23; URL sina.com.cn; DNS result for URL 35.51.247.37; DNS query status 0.
TABLE V: Example of a successful DNS session recording.