ABBA: A quasi-deterministic Intrusion Detection System for the Internet of Things

08/09/2021
by   Raoul Guiazon, et al.

An increasing number of processes are being automated to improve efficiency and safety. Common examples are found in automotive systems, industrial control systems and healthcare. Automation usually relies on a network of sensors to provide key data to control systems. One potential risk to these automated processes comes from fraudulent data injected into the network by malicious actors. In this article we propose a new mechanism for detecting data tampering that does not depend on secret cryptographic keys, which can be lost or stolen, or on accurate modelling of the network, as is the case with existing machine learning based techniques. We define and analyse the mathematical structure of the proposed technique, called ABBA, and propose an algorithm for its implementation.


Code Repositories

ABBA_Concept

An intrusion detection mechanism for the internet of things based on pseudo random pulses.



I Introduction

A vast amount of our modern infrastructure relies on a network of sensors and actuators to automatically perform various tasks. The purpose of this automation is generally to increase efficiency and decrease waste and human risk. This has typically been the case for industrial control systems (ICS) that often form the core of critical national infrastructures (CNI). In recent years the topology of the networks underpinning these infrastructures has changed, moving from somewhat isolated networks using dedicated and often proprietary technologies to transport and process data to more standard internet connected devices that often rely on cloud processing to make critical decisions (figure 1).
Although this new topology increases the capabilities and flexibility of these networks by allowing more interoperability between systems and a better understanding and monitoring of assets and processes, it also opens up more vulnerabilities for cyber criminals to enter networks.
Cyber attacks on CNIs are not theoretical. In 2015 the world witnessed the first known power outage caused by a malicious cyber attack, when utility companies in Ukraine were hit by the BlackEnergy malware. In February 2021 a water treatment facility in Florida was attacked: the attacker remotely increased the level of sodium hydroxide from 100 parts per million to 11,100 ppm, putting at risk the 15,000 people relying on this plant for clean water. More recently, on May 7 2021, the US issued emergency legislation after Colonial Pipeline, which carries almost half of the East Coast supply of diesel, petrol and jet fuel, was hit by a ransomware cyber attack.
In the Siemens report “Caught in the Crosshairs: Are Utilities Keeping Up with the Industrial Cyber Threat?” it is found that 30% of attacks on OT (Operational Technologies) are not detected. Given the complexity of the systems used to automate our infrastructures, cars and cities, often with millions of lines of code running on top of various hardware, it is impossible to guarantee that no vulnerabilities will ever be found and exploited by adversaries.
Cyber criminals exploit multiple routes into their target systems, from phishing attacks to software vulnerabilities. Bad practices from a supplier can also have important repercussions further down the chain. For example, hard-coded admin credentials on a device or a key server breach at a device manufacturer can enable an attack on the end user’s network. The weakest link will be the point of entry into the network.
The purpose of this work is to help secure the fleet of small devices that often relay sensing data to a network controller, where it is processed and relied upon for critical decision making. This could be for the advanced driver-assistance system (ADAS) of a vehicle or for an ICS. We consider that the aim of the attacker is to modify the behaviour of the automated system under attack by feeding fraudulent data to the decision making unit. In essence, we are interested in developing a method to detect such an attack even in the case where the attacker has a copy of the secret key used by a legitimate device to authenticate with the network.

Fig. 1: Modern network topology

Multiple approaches are often used to protect digital networks from cyber threats, and all contribute to making those infrastructures safer. The most common layer of protection is based on cryptographic techniques that ensure confidentiality, integrity or authenticity of the traffic [7172449]. This layer requires the generation and distribution of secret keys and the management of these keys over the life of a device. Often enough this first layer is non-existent, poorly implemented [fi12030055] or becomes obsolete due to new vulnerabilities found in key protocols [5601275]. Other methods focus on the physical layer of the communication stack, using channel state estimation to generate secret keys [9270035] or using jamming to reduce the signal quality of an eavesdropper [8627099, 7524448]. These methods are not suited to situations where channel variations are limited or where jamming could affect neighbouring networks.
To complement the techniques mentioned above, some systems also implement an intrusion detection system (IDS). An IDS often functions at a higher level of abstraction, monitoring access permissions and traffic patterns which it compares to a baseline behaviour considered normal for a specific network. Current IDS techniques are often built using machine learning or rule-based methods [8365277]. With machine learning, the IDS needs reliable training data to learn the “normal patterns” on the sensor network, and defining what “normal” means is a challenge. Any bias in the training data will increase the risk of false positives and false negatives. With rule-based methods, attack signatures are listed in the IDS memory to enable it to identify similar attacks in the future. This method fails against zero-day attacks and requires the IDS to be kept up to date when new attacks are discovered. Moreover, an IDS often checks the traffic from the internet into the internal network and less so the traffic coming from the sensors.
The technique devised in this paper does not rely on training a model, maintaining an attack signature list or managing secret keys, although these can be complementary. We call this method Artificial Behaviour Based Authentication (ABBA). It is a mechanism by which one or more devices generate a pattern in their network to facilitate the detection of anomalies and intruders on the network. The pattern created is the artificial behaviour of the network, which is built using a time code defined in this document.

In the following sections we build the theoretical framework for a robust Intrusion Detection System (IDS) that monitors the communication link from the sensors to the core network for signs of cyber attacks and enables early detection of cyber incidents. We put in place the framework to assess the detection probability of such an IDS and provide an algorithm to implement this technique on a physical system. In this work we purposely do not dive into the exact implementation of the technique, as this depends on multiple parameters that can be optimised for each individual application. One such implementation will be described in our GitHub repository, https://github.com/abbaiot. Here we focus on describing ABBA and demonstrating its ability to reliably detect anomalies such as data losses, data injections and tampering by a third party.
The structure of this paper is as follows. Section II lays out the structure of ABBA and the key elements that are required to make it work. Section III describes the theory behind the time encoder that generates the signatures used to authenticate a device. Section IV introduces the encoding used for ABBA and describes its properties. In section V we put together all the pieces of the IDS and describe its behaviour under different types of attacks. Section VI presents the practical algorithm behind ABBA and, finally, section VII concludes this paper.

Notations
$\mathbb{N}$ is the set of natural numbers.
$\mathbb{R}^+$ is the set of positive real numbers.
$\{x\}$ represents a set.
$(x_i)$ represents a sequence with elements $x_i$ with $i \in \mathbb{N}$.
$(t, x)$ is a pair.
$((t, x))$ is a sequence of pairs.

II System model

We consider an information source that produces events from a finite alphabet according to a given probability distribution, and two parties, Alice and Bob, where Alice observes the events produced by the information source and communicates that information to Bob over an untrusted communication channel. As is conventional in security research, we refer to Eve as the eavesdropper or attacker on the network.
We are not concerned with distortions or losses during transmissions from Alice to Bob, so we consider the communication channel to be perfectly error corrected. Only the actions of Eve are of concern to us.
We devise a mechanism with either of the following two desired properties,

  1. Bob can detect malicious actions by Eve with a computable detection probability.

  2. Alice and Bob can trap Eve in the network for a given duration during which she has to spend computing resources to remain undetected whilst increasing chances of her being detected using other techniques.

We want the detection probability and the trapping duration to depend only on the specific parameters chosen by Alice and Bob as they implement the intrusion detection scheme.

Fig. 2: Top level principle

The basic principle of the proposed IDS is described in figure 2, where Alice sends messages to Bob over two channels. Channel 1 is the “high bandwidth” communication channel that carries the payload and Channel 2 is the “low bandwidth” intrusion detection channel (IDC). The reason for this architecture is to decouple the message from Alice to Bob from the signatures that allow Bob to verify the source of the message. Although the analogy is not completely accurate, this could be understood as sending a message authentication code for a payload over a different channel than the payload. The aim is for Bob to detect Eve’s actions by matching the signature received on the IDC with the payload received.
The model described above cannot, in itself, prevent an attacker from infiltrating the network undetected, as they can tamper with both channels in a way that remains consistent for the receiver.
To solve this issue, we design an IDS that relies on only two assumptions for security that we argue are simple to guarantee or verify in any specific application.

  1. Eve cannot prevent messages on the IDC from reaching Bob.

  2. Eve cannot delay or speed-up messages transiting on the IDC.

For example, assumption 1 can be easily checked in a wireless link by monitoring the noise levels for jamming in the intrusion detection channel between Alice and Bob. A higher level of noise would decrease confidence in the IDS, in a way reminiscent of what is done in quantum key distribution systems. Assumption 2 depends on the topology of the network. For example, a wireless link between Alice and Bob does not allow Eve to manipulate the flight time of signals. This is also applicable to some wired networks if the threat model discounts a man-in-the-middle type attack.

Fig. 3: Internal architecture of the IDS.

Figure 3 illustrates the internal structure of the system at both Alice and Bob. Transmitter and receiver both rely on a local clock to implement the intrusion detection algorithm. We assume that both clocks measure time at the same rate; compensating for local drifts would be required in practice but is not the focus of this paper. The key component of interest to us is the “Time encoder” that we design in the following sections of this article. The time encoder produces pulses that are sent to Bob using a modulation as simple as ON/OFF keying (OOK) [6612714, 7399931], which requires low bandwidth and is accessible to all devices.
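To make the role of OOK on the intrusion detection channel concrete, the following minimal sketch maps a list of pulse times to ON/OFF samples and back. It assumes, purely for illustration, that time is divided into integer slots and that a pulse occupies exactly one slot; the function names `ook_modulate` and `ook_detect` are hypothetical and not taken from the paper.

```python
def ook_modulate(pulse_times, n_slots):
    """Return one sample per time slot: 1 (carrier ON) where a pulse falls, else 0."""
    on_slots = set(pulse_times)
    return [1 if slot in on_slots else 0 for slot in range(n_slots)]


def ook_detect(samples):
    """Recover the pulse times as the indices of the ON slots."""
    return [slot for slot, level in enumerate(samples) if level == 1]


# Illustrative pulse times only; in ABBA they would come from the time encoder.
pulse_times = [3, 4, 8]
samples = ook_modulate(pulse_times, n_slots=10)
recovered = ook_detect(samples)
```

Only the instants of the pulses carry information here, which is why such a channel needs very little bandwidth.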

III Time encoder

To describe how the time encoder used in the IDS works, we need an appropriate formalism which we develop in this section. We will start by defining how we represent information sources, then describe how one could devise a way to communicate the entire information content of an information source using an encoding solely based on the passage of time. We will then use our new formalism to define the time encoder that is core to the intrusion detection system proposed.

III-A Complete description of Information Sources

In information theory, an information source is usually described using a random variable with values in a given set of the same name. In this work we limit ourselves to discrete sets, meaning that there exists a one-to-one mapping between this set and the set of natural numbers.
In a typical experiment with an information source, a sequence of events $(x_i)$ from the set of source values is produced, where the index of each element represents the position of the event in the sequence. Each event could be the outcome of a coin toss.
Because all our experiments are done with time evolving in the background, that sequence can always be implicitly redefined as $((t_i, x_i))$, where the $t_i$ are the time instants at which these events occurred, with an origin of time that can be set arbitrarily before the experiment. Often this information about time is not useful. When writing a document, for example, the time at which a key was pressed does not matter; only the order and value of the keys do. To provide a complete physical description of the source it is however important to include time.
Once observed, the sequence $(t_i)$ is an increasing function from $\mathbb{N}$ to the positive real number line $\mathbb{R}^+$, and the sequence $(x_i)$ is a function from $\mathbb{N}$ to the set of source values with no particular properties, besides that the frequencies of elements in the sequence converge to their probabilities, as defined by the random variable, as the length of the sequence increases.
The sequence $((t_i, x_i))$ is thus a function from $\mathbb{N}$ to the set of pairs made of a positive real number and a source value.

If we were able to observe the entire sequence of events produced by our information source, we would build a potentially infinite sequence $((t_i, x_i))$, at which point the information source would have been described entirely. In this document we define the information source to be that sequence.

III-B A time-based encoding

The foundation of modern digital communications is the encoding of information as binary digits. Hence, before information from our source is transmitted, a data encoder translates it into a sequence of bits, which is then used to modulate a carrier signal sent onto the communication channel.
Our objective is to build an intrusion detection mechanism that requires minimum bandwidth utilisation. To achieve this goal we define a new encoding based on the passage of time at both transmitter and receiver. The code generated using that encoding can be used to modulate a carrier using ON-OFF keying to transmit data to Bob.
The main idea behind our time encoding is to define characters as time intervals of a given duration starting from a fixed origin in time. As an analogy, suppose we could represent time intervals as pieces of string of various lengths. We could then use a substitution mechanism as described in figure 4 to encode text written in English. However, this simple substitution mechanism would not work with time intervals if a single origin of time was defined for all letters, because repetitions such as the letter “l” in the word “Hello” would be impossible to represent, and information about the order of the letters would be lost too. In the example shown in figure 4 we represent every letter of the word “Hello” independently, using a piece of string of a different length for each letter. Using a single dimension and marking the different lengths on one string, the same word would look like figure 5, where information about the order of the strings and their count is lost.

Fig. 4: This figure shows how the word “Hello” can be written using pieces of string of various lengths
Fig. 5: Projection of the different words onto a single dimension
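The loss of information in the string analogy can be reproduced in a few lines. The sketch below uses a hypothetical substitution table (the lengths are illustrative and not taken from figure 4) and shows that projecting every letter onto a single origin keeps only the distinct lengths.

```python
# Hypothetical substitution table: each letter maps to a string length
# (equivalently, a time-interval duration). The values are illustrative only.
LENGTHS = {"H": 8, "e": 5, "l": 12, "o": 15}

word = "Hello"

# Figure 4 style: one piece of string per letter, order and repetitions kept.
per_letter = [LENGTHS[c] for c in word]

# Figure 5 style: all lengths marked on one string from a common origin.
# Marks at the same position coincide, so only the distinct lengths survive.
projected = sorted(set(per_letter))

# A different arrangement of the same letters gives the same projection.
other_projection = sorted(set(LENGTHS[c] for c in "oleH"))

print(per_letter)
print(projected)
```

The repeated “l” and the letter order are both absent from `projected`, which is the reason a naive single-origin substitution cannot serve as a time code.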

In the next section we show the existence of a time-code that preserves all information about the message transmitted.

III-C Information preserving time-code

Before we take a look at our information source, let’s first work out how we could define a natural time encoding of any positive real number as a time interval.
If we denote the set of all time intervals in time units. The encoding of any positive real number is given by the one to one mapping . We will denote elements of with the greek letter .
This means that for example, the number 10.234 is represented by a segment of time of a fixed duration in time units (whatever unit is chosen).
With this encoding one can represent numbers using chunks of time, for example, numbers from 1 to 5 are .
Now let’s go back to the complete description of our information source by the sequence , and discuss why this sequence can be mapped to a unique sequence of time intervals .
The space can be mapped one to one with and we have shown a one to one mapping between and .
For example the following map defined by

(1)

With the inverse being

(2)

We denote the one to one mapping . In which case

(3)

Now we realise that the sequence doesn’t need to be written as an ordered list as the time information is already contained within each element. Instead, it can be represented as a set containing every element. Similarly, the time sequence can be represented by the set . The set is a codeword of time intervals that uniquely describes the information source .
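As an illustration of why such a one-to-one mapping exists, here is a minimal sketch for a finite alphabet. It is not the map of equations (1) to (3), which are not reproduced here; it is one possible construction, assumed for illustration, that interleaves the symbol index with the integer part of the event time. The names `ALPHABET`, `encode_pair` and `decode_interval` are hypothetical, and exact rationals are used to avoid floating-point rounding.

```python
from fractions import Fraction
from math import floor

ALPHABET = ["a", "b", "c"]  # hypothetical finite alphabet
K = len(ALPHABET)


def encode_pair(t: Fraction, x: str) -> Fraction:
    """Map an event (t, x) to a single duration, one-to-one for t >= 0."""
    whole = floor(t)
    return Fraction(K * whole + ALPHABET.index(x)) + (t - whole)


def decode_interval(tau: Fraction):
    """Inverse map: recover the event (t, x) from its duration."""
    whole = floor(tau)
    t = Fraction(whole // K) + (tau - whole)
    return t, ALPHABET[whole % K]


events = [(Fraction(1, 2), "c"), (Fraction(7, 2), "b"), (Fraction(9), "a")]

# The codeword is an unordered set: each duration already carries its own time.
codeword = {encode_pair(t, x) for t, x in events}
recovered = sorted(decode_interval(tau) for tau in codeword)
```

Sorting the decoded pairs by time restores the original sequence, which is why the codeword can be stored as a set rather than an ordered list.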

Note that we can define a superset of sets like where each element represents a possible sequence of events with elements in . is a source of information sources and generates all possible sequences of events. can then be used to map each element of to elements of the superset of time sequences.
The mapping is the dictionary used to describe elements of in terms of codewords in .
We can now describe an information source as an element of or equivalently as a codeword in .

IV Communication using time-code

Communicating using the dictionary described in previous sections is not practical because it requires Alice to know the entire history of the information source to devise a codeword that she can transmit to Bob. If she was able to do that then, a simple thing for Alice to do would be to use OOK modulation to transmit a pulse at the end of every time interval in the set - counting from a time when Alice and Bob synchronised their clocks. If Bob recorded the time of detection of pulses from time then he would be able to reconstruct after a possibly infinite amount of time. Then using he would be able to translate from back to .
Because Alice and Bob can only move forward in time and usually have a limited lifetime, they need a coding scheme where the instants at which pulses are produced only depend on events that happened in their past and decoding can be done on the fly.
In the next section, we build a causal time-code that Alice can use to convert her information source output into time intervals in a way that does not require her to know the future states of the source. The cost is Bob’s uncertainty about the information sent by Alice. It is because of that uncertainty that the IDS we propose is probabilistic in nature, although with a detection probability that approaches 1 in some cases as time increases. This will be discussed in section V.

IV-A A causal time-code

The aim of this section is to build a practical encoding that both Alice and Bob can use to communicate enough information about the source to detect tampering using the intrusion detection channel.
Let’s define a family of sequences (with values in and ) and the family of functions with .

The encoding starts with two initial parameters, a number and time origin . Then the set is created based on as follows.

  1. Define the sequence with first term and .

  2. Define the sequences , s.t and .

  3. Now the set is the codeword describing the source represented by .

It is easy to check that with this encoding, any two different codewords and would necessarily come from different sources and . This property is critical for our intrusion detection system.
It is also true that two different sources could be mapped to the same output codeword. However, the parameters used in the encoding can be tuned to reduce the likelihood of such collisions and mitigate the risk of this being used by a potential attacker. The properties needed for these parameters are defined in appendix A.
In summary the properties of this encoding are:

  1. Unlike any general encoding over the space , events produced by a source at time do not influence any letter of the corresponding codeword such that .This means that this encoding can be generated by Alice and Bob.

  2. Neither Alice, Bob nor Eve can predict the next letters of the codeword sequence that encodes the source within a big enough time window if the entropy of the source is non zero.

  3. Transmitting or receiving the sequence only requires a simple physical device that can measure time and generate or measure a pulse.

  4. The pulses produced by this encoding will never collide in time with an event generated by the original source.

  5. This encoding is not one to one in both directions. Two different sources can be translated into the same codeword.

V Intrusion detection system

In this section we combine the time encoding developed in the previous section with the system model of section II to define the protocol of our intrusion detection system. We also describe the different types of attacks that can be perpetrated by Eve and discuss how they can be detected by Bob using the IDS, based on the proofs and analyses given in appendices A and B.

V-A IDS protocol

The intrusion detection mechanism that Alice and Bob use works as follows:

  1. Alice and Bob agree on an initial and . In the general case, these do not need to be kept secret.

  2. Alice sends messages to Bob using the communication channel and the corresponding through the intrusion detection channel.

  3. Bob receives and assumed to be from Alice. He then computes and compares it to .

  4. If then he knows there is an anomaly in the stream coming from Alice.
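The four steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: time is measured in integer ticks, and the interval function `gap` and seed update `new_seed` are hypothetical choices with deliberately tiny parameters so that the behaviour can be checked by hand. Messages are assumed to be integers in 0..99, for which `new_seed` has no fixed point.

```python
def gap(seed: int, j: int) -> int:
    """j-th interval (j = 1, 2, ...) of the hypothetical pulse sequence for a seed."""
    return 1 + (seed + 3 * j) % 5


def new_seed(seed: int, message: int) -> int:
    """Hypothetical seed update; it has no fixed point for messages in 0..99."""
    return (seed + message + 1) % 101


def time_code(events, s0, t0, horizon):
    """Pulse times implied by the timed messages [(t, x), ...] up to `horizon`."""
    pulses, seed, origin = [], s0, t0
    for t_next, message in list(events) + [(horizon, None)]:
        tau, j = origin, 1
        while tau + gap(seed, j) < t_next:  # a pulse never coincides with a message
            tau += gap(seed, j)
            pulses.append(tau)
            j += 1
        if message is not None:             # each message re-seeds the encoder
            seed, origin = new_seed(seed, message), t_next
    return pulses


def bob_accepts(received_events, received_pulses, s0, t0, horizon):
    """Steps 3 and 4: recompute the time code and compare it with the IDC pulses."""
    return time_code(received_events, s0, t0, horizon) == list(received_pulses)


S0, T0, HORIZON = 4, 0, 50                # public initial parameters (step 1)
sent = [(10, 3), (25, 7), (40, 1)]        # Alice's timed messages (step 2)
idc = time_code(sent, S0, T0, HORIZON)    # pulses Alice emits on the IDC

untouched = bob_accepts(sent, idc, S0, T0, HORIZON)
tampered = bob_accepts([(10, 3), (25, 8), (40, 1)], idc, S0, T0, HORIZON)
deleted = bob_accepts([(10, 3), (40, 1)], idc, S0, T0, HORIZON)
colliding = bob_accepts([(10, 3), (25, 12), (40, 1)], idc, S0, T0, HORIZON)
```

With these toy parameters Bob accepts the untouched stream and rejects both the stream where 7 was changed to 8 and the stream where the second message was deleted. Changing 7 to 12 goes unnoticed, because the two resulting seeds produce the same intervals under this weak `gap` function. This is the kind of collision that the encoding parameters must be chosen to make unlikely.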

In the following sections we look at different types of attacks that Eve can perform on the communication channel and discuss their implications.

V-B Message injection by Eve

In this section, we consider the case where Eve injects messages and pulses on both the communication channel and the IDC.
Corollary 1 in appendix A shows that, with well-chosen parameters for the encoder, Eve can only remain undetected after this attack if she continues to send messages after the latest transmission of Alice, for a duration that depends solely on the encoding. This traps Eve in a position where she needs to correct her influence on the data stream by adjusting for future messages that Alice could transmit on both channels; otherwise Eve will be detected.
The consequence is that, after her first data injection, Eve does not have full control over the messages she sends to Bob to conceal her presence, as she needs to account for past and future messages from Alice. Because Eve has to send additional payload to conceal her presence in the network, the sequence of messages received by Bob could look abnormal based on some “normal” metric defined for his expected messages. For example, if Bob was expecting a sentence, the perturbations of Eve could generate nonsensical sentences at Bob’s end. Additionally, in this case the attacker is required to continuously spend energy to keep their presence on the network hidden from the IDS. This is an additional constraint put on them compared to other IDS.

V-C Message tampering and deletion by Eve

Eve could attempt to change the value of a message on its way to Bob. In this case, Proposition 3 shows us that she will not be detected only if the new value that she wants to transmit is so that given the current value of used by Alice and Bob, . If the information source of Alice has non zero entropy then Eve cannot know ahead of time which to send to match the value sent by Alice. This means that the detection of Eve depends on the probability that she selects a compatible at random. This probability is also influenced by the functions and sequences used to build the time-encoder and can be computed given the specific choice of parameters for the encoder.
Eve could also avoid detection if her tampering with the data is randomly covered by further data sent by Alice. We study this case in Appendix B and show that the parameters of the encoding can be chosen so that the probability of Eve avoiding detection becomes vanishingly small as time goes on. This means that the actions of Eve will be detected given enough time.
Another possible attack by Eve is to remove a message sent by Alice from the communication channel. If this happens, Corollary 2 shows that this would also be detected if the functions are chosen not to have a fixed point. As with the previous attack, further transmissions by Alice might delay the detection of Eve, but with a probability that can be made to vanish with time (see appendix B).

VI Practical implementation

To implement the IDS proposed in section V, we need an algorithm to build the codewords out of the sequence of messages produced by the source. We create such an algorithm that we refer to as Time Encoder in the following section.

VI-A Time Encoder Algorithm

  1. Initialisation

    • Random value

    • Create an empty list

    • Create a variable to store a source message

    • Set a timer counting up from 0

    • start function ”Generate interrupt”

    • Go to next step

  2. Generate random time

    • # Generating a candidate value.

    • Go to next step

  3. Selecting

    • While ( AND ){Wait}

    • IF
      THEN
      # Adding to the list .
      Go to step 2
      ELSE
      Go to next step

  4. Generating s (new seed).

    • #Note that depends on the message .

    • # Resetting .

    • # Resetting interrupt flag.

    • Go to step 2

Function : Generate interrupt
While (1) {

  • Listen to the Source

  • IF a new message is detected
    THEN
    # Changing interrupt flag.
    # Storing the message.

}
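A runnable sketch of this loop is given below, under several assumptions that are not in the original: the clock is simulated with integer ticks instead of a hardware timer, the interrupt function is replaced by a lookup in a dictionary of scheduled messages, the initial value is passed in rather than drawn at random, and `gap` and `new_seed` are hypothetical stand-ins for the functions a real deployment would choose.

```python
def gap(seed: int, j: int) -> int:
    """Hypothetical j-th interval (j = 1, 2, ...) generated from a seed."""
    return 1 + (seed + 3 * j) % 5


def new_seed(seed: int, message: int) -> int:
    """Hypothetical seed update that depends on the received message."""
    return (seed + message + 1) % 101


def run_time_encoder(source, s0, horizon):
    """Simulate the Time Encoder on clock ticks 1 .. horizon - 1.

    `source` maps a tick to the message observed at that tick.
    """
    seed = s0                        # step 1: initial value
    pulses = []                      # step 1: empty list of pulse times
    last, j = 0, 1                   # timer origin and index in the interval sequence
    candidate = last + gap(seed, j)  # step 2: candidate pulse time
    for now in range(1, horizon):
        if now in source:                       # interrupt flag raised by the source
            seed = new_seed(seed, source[now])  # step 4: new seed from the message
            last, j = now, 1                    # step 4: reset the timer
            candidate = last + gap(seed, j)     # back to step 2
        elif now == candidate:                  # step 3: timer expired, no interrupt
            pulses.append(now)                  # record (and emit) the pulse
            last, j = now, j + 1
            candidate = last + gap(seed, j)     # back to step 2
    return pulses


pulses = run_time_encoder({10: 3, 25: 7, 40: 1}, s0=4, horizon=50)
print(pulses)
```

When a message arrives on the same tick as a pending candidate, the message takes priority and the candidate is dropped, so that a pulse never coincides with a source event.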

VI-B Remarks

The implementation described above is a generic algorithm that can be specialised by choosing functions adapted to the device and the information source.
Moreover, different applications might have different requirements, such as the speed of detection of an intrusion or the energy consumption of the encoding, which will affect the choice of functions and parameters for the encoding.

VII Conclusion

In this work we have developed a new theoretical framework for an intrusion detection system that detects data tampering and data injection on a communication channel. This framework makes it possible to build a detection mechanism that does not rely on prior knowledge of the network behaviour or of possible attack signatures. It is also advantageous compared to message authentication codes in that it does not require the prior exchange or management of secret keys for the receiver to detect tampering with the received data. We also detail an algorithm to adapt the IDS defined in this paper to any device.

Appendix A Characterisation of the time encoding

Proposition 1.

Let’s consider 2 sequences of pairs and of finite length and respectively. Let’s denote the last terms of the respective sequences .
Assuming that , we show that and

Proof.

such that which means

Which we rewrite as

From consecutive pairs of equations listed above, we get that

This means with .
For the second part
and
Knowing that leads to

(4)

Proposition 2.

Let’s consider 2 sequences and of length and respectively. Let’s assume that the last terms of the respective sequences are equal .
We show that if then is a periodic sequence with period and .

Proof.

The proof is similar to that of proposition 1, with the difference that the equivalent set of equation yields

We also show below that .
By construction,




Because and we know that and
Similarly, and . This means that the shift between those sequences is strictly greater than 1.
This completes the proof that is periodic with period .
And finally, from proposition 1 we get the result

(5)

Definition 1 (Non-maximally correlated).

A family of infinite sequence is said to be non-maximally correlated if .

Corollary 1.

If the family of sequence chosen for the encoding is non-maximally correlated then two sequences and of length and respectively such that , can be mapped to the same sequence iff (with and the last terms of and ). In this case (assuming ).

Proof.

This result derives from proposition 1 and 2.

Corollary 2.

Consider that the functions in the family do not have a fixed point such that and the family of sequences () are non-maximally correlated.
Let’s denote () the time code corresponding to the sequence constructed by deleting the last term of and () the time code corresponding to . We show that .

Proof.

Let’s consider the last 2 elements and of the sequence corresponding to with its last element ().
By construction and because doesn’t have a fixed point we find that . Corollary 1 then tells us that in this circumstance .

Proposition 3.

Consider that the family of sequences () are non-maximally correlated.
Let’s denote () the time code corresponding to the sequence constructed by changing the last term of from () to () and () the time sequence corresponding to . implies .

Proof.

We can easily show the negation of that implication ( and ) is false.
is and means that and due to the property of the family of sequence () we have that .

Appendix B Detection probability of Eve

Characterisation of the detection probability in the case of an infinite transmission sequence.
Consider an infinite sequence ((t,x)). Let’s assume that the element has been changed from to or (deleted). We want to characterise the probability of this event not being detected by the receiver. We’ll focus on the case where is changed to but the case of deletion could be treated similarly adjusting indices accordingly.

We have shown previously that if we truncate the original and the modified sequences after the element, then they would both be mapped onto different time codes .
Even though it is possible that such that and . Meaning that both time codes still match beyond the time when the message was tampered with. If a new message is received from Alice before then it is possible that the newly received message could erase the influence of Eve on the time code. In the following we look at the probability of that happening.

Given the structure of the encoding, the change made by Eve would not be detected by the time when is received if

(6)

This means that the corresponding elements of and with values between and are equal.
Similarly, the change isn’t detected by the time if it wasn’t detected beforehand and

(7)
(8)

The events described by the statements 6, 7 and 8 are probabilistic by nature as they depend not only on the coding parameters but also on the information received from the information source. We define the list of probabilities () that statements of type (6, 8,…) are true and () that statements of type 7 are true.

We can build the following probability tree to determine the probability of Eve avoiding detection.

[Probability tree: branches labelled “not detected by” the arrival time of each successive message alternate with branches labelled “undetectable after” that time.]


Note that if the functions are invertible, then , . In this case if then the probability of Eve avoiding detection will become vanishingly small with time.

References