Securing Interactive Sessions Using Mobile Device through Visual Channel and Visual Inspection

03/03/2010 ∙ by Chengfang Fang, et al. ∙ 0

A communication channel established from a display to a device's camera is known as a visual channel, and it is helpful in securing key exchange protocols. In this paper, we study how a visual channel can be exploited by a network terminal and a mobile device to jointly verify information in an interactive session, and how such information can be jointly presented in a user-friendly manner, taking into account that the mobile device can only capture and display a small region, and that the user may only want to authenticate selected regions of interest. Motivated by applications in Kiosk computing and multi-factor authentication, we consider three security models: (1) the mobile device is trusted, (2) at most one of the terminal or the mobile device is dishonest, and (3) both the terminal and the device are dishonest but they do not collude or communicate. We give two protocols and investigate them under these models. We point out a form of replay attack that renders some other straightforward implementations cumbersome to use. To enhance user-friendliness, we propose a solution that embeds visual cues into the 2D barcodes and incorporates the framework of "augmented reality" for easy verification through visual inspection. We give a proof-of-concept implementation to show that our scheme is feasible in practice.




1 Introduction

Securing a connection to a server through an untrusted network terminal is challenging even if the user has an additional authentication factor such as a one-time-password token, a smartcard, or a mobile phone. One of the hurdles is the difficulty of securely passing information from the terminal to the device and presenting the jointly verified information to the user in a user-friendly manner. Traditional channels between the device and the terminal, such as wireless or plug-and-play connections, are subject to various man-in-the-middle attacks. Even if a secure channel can be established, it is still not clear how the additional device can help in authenticating subsequent messages rendered on the untrusted terminal’s display.

A number of recent works utilize the cameras in mobile devices to provide an alternative realtime communication channel from a display unit to a mobile device: messages are rendered on the display unit in the form of, say, 2D barcodes, which are then captured and decoded by the mobile device via its camera. Although such a visual channel could be eavesdropped by “over-the-shoulder” attacks, it is arguably impossible to modify or insert messages, and the channel is thus secure against man-in-the-middle attacks. The visual channel has been exploited in a few works to verify a session key exchanged over an unsecured channel, for instance seeing-is-believing proposed by McCune et al. [18]. There are also proposals on verifying an untrusted display; for example, Clarke et al. propose verifying the display screen using a stabilized camera device [7]. In this paper, we take a step further by investigating the authentication of interactive sessions, with the consideration that many cameras are unable to cover the whole screen in a single view with sufficient precision. An example of an interactive session is an online banking application where a user can browse and selectively view previous transactions and carry out new transactions. A typical screenshot would contain important information like the user’s account information, and less sensitive information like advertisements, help information, and navigation information, as shown in Figure 1(a).

During an interactive session, after a session key has been securely established between the server and the mobile device (for example, using seeing-is-believing [18]), there could be many subsequent messages that require protection by the session key. These messages may need to be rendered over different pages, or in a scrolling webpage where not all of them are visible at the same time, and it is not obvious how to protect them. For instance, one may render the messages as 2D barcodes, each protected by the same key. To view the message in a 2D barcode, the user moves the mobile device over the barcode, and the device captures, authenticates and displays the message on its display panel. However, as there are many barcodes associated with the same key, it is possible for a dishonest terminal to perform a “rearrangement” attack: it replays barcodes or shows barcodes in the wrong order.
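The rearrangement attack can be illustrated with a minimal sketch. It is not part of our implementation: the barcode is modelled as a (message, tag) pair, HMAC-SHA256 stands in for the authentication scheme, and the key and transaction rows are hypothetical.

```python
import hashlib
import hmac


def make_block(key: bytes, message: bytes) -> tuple:
    """Return (message, tag), a stand-in for one authenticated barcode."""
    return message, hmac.new(key, message, hashlib.sha256).digest()


def verify_block(key: bytes, block: tuple) -> bool:
    """Check a single block in isolation, as Mobile would after capturing it."""
    message, tag = block
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


session_key = b"demo session key"  # hypothetical key, for illustration only
rows = [b"01 Jan deposit +500", b"02 Jan transfer -200", b"03 Jan transfer -900"]
page = [make_block(session_key, row) for row in rows]

# A dishonest terminal swaps the first two blocks and replays a block
# in place of the last one.
tampered = [page[1], page[0], page[0]]

# Every block still verifies on its own, so per-block authentication
# under a single key does not reveal the rearrangement.
all_valid = all(verify_block(session_key, block) for block in tampered)
```

In this sketch `all_valid` is true even though `tampered` differs from `page`, which is the gap that binding location information to each barcode is meant to close.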

The above attack arises from the limitation that the camera is unable to capture the whole screen with sufficient precision, and that not all messages can be rendered together in a single screenshot. We treat the problem as the authentication of messages rendered in a sequence of large 2D regions, where only the region within a small rectangular window can be captured at one time. There are a few straightforward methods to overcome the rearrangement attack. For instance, one may prevent the attack by requiring the user to scan all the barcodes with his mobile device, so that all the messages are authenticated and rendered by the mobile device. However, it is troublesome for the user to scan all the barcodes, and there are situations where the user only wants to view some, but not all, of the messages. In addition, it is inconvenient to navigate and browse the messages (e.g. a large table of transactions) within the relatively small display panel. In Section 6, we will discuss a few other straightforward methods and their limitations.

Our solution is to use a barcode scheme that, given a message and a visual cue, is able to produce a barcode image that not only carries the message as its payload but also visually appears as the cue (see examples in Figure 1(b) and Figure 1(c)). Our paper realizes such a barcode scheme using a technique borrowed from fragile image watermarking [12]. To embed a long message into several barcodes, our main idea is to have a visual cue on each barcode indicating its position. By visually inspecting the visual cues, the user can readily verify that the barcodes are in the correct arrangement. For example, in Figure 1(b), the visual cues are numbers increasing by 1 from left to right, top to bottom. The black dot beside the number “2” indicates that the barcode is at the end of a row, and the black block beside the number “10” indicates that it is the last (i.e. bottom-right) barcode. With the arrangement of barcodes verified, the user can then browse selected barcodes independently with his mobile device.

In our security analysis, we consider the four-party setting where a user, who has a mobile device, wants to interact with a server via a network terminal. We focus on three security models. In the first model, the Internet terminal, including its CPU, keyboard and display unit, is untrusted by the user, whereas the mobile device is trusted. This model is motivated by the challenging problem of securing Kiosks [13, 16], where Kiosks are untrusted public network terminals such as workstations in an Internet café.

In the second model, motivated by two-factor authentication, we consider scenarios where at least one of the terminal or mobile device is honest. We found that under the first model, it is possible to provide both confidentiality and authenticity; whereas under the second model, although authenticity can be achieved, it is not clear how to achieve confidentiality.

In the third model, we take one more step beyond two-factor authentication and consider a tricky setting where both the terminal and the mobile device could be dishonest, but they do not collude in the sense that they do not know how to communicate with each other. This model is motivated by scenarios where the terminal and the mobile device are compromised, but independently by two different adversaries: for instance, a dishonest mobile device that always says “authentic” for whatever authentication it is supposed to carry out, and a network terminal that is tasked with deceiving the user into accepting a message given to the terminal. To detect such a dishonest mobile device, our proposed method requires the mobile device to extract and produce a human-readable proof from the authentication tag. A corresponding proof is also shown on the terminal’s display, and hence the user can visually verify whether they are consistent, as shown in Figure 1(c).

(a) A bank transaction webpage.
(b) Method 1: mobile device is trusted.
(c) Method 2: mobile device could be dishonest.
Figure 1: Illustration of our schemes. Figure 1(a) is the bank transaction screenshot, which contains a sensitive transaction table to be protected. Figure 1(b) illustrates method 1, where the sensitive table is replaced by barcodes, and the mobile device captures, verifies and decodes part of the table. Figure 1(c) illustrates method 2, where the sensitive table is displayed together with barcodes, and the table is rendered on both the terminal and the mobile device to be compared by the user. The decoded tables are generated by our proof-of-concept implementation and then cut and pasted to produce the illustration. The green boxes show the captured region and the red dots are for image registration.

In addition to security requirements, user experience is also important. Requiring the user to take a snapshot of the screen is rather disruptive from the user’s point of view. We employ augmented reality to provide a better user experience in verification. The design of our 2D barcode and the subregion authentication takes usability into consideration and fits nicely in the framework of augmented reality. One example is shown in Figure 1(b). The screenshot displayed by the terminal is a combination of sensitive data and non-sensitive data like advertisements and menus. The sensitive data are replaced by 2D barcodes with visual cues as described before. The user treats the mobile device as an inspection device and places it over the region to be inspected. In realtime, the mobile device captures and verifies the 2D barcode. If it is authentic, the decrypted message is displayed. The non-sensitive portion of the screenshot is also displayed as it is, to help the user navigate. We give a proof-of-concept system in which a laptop equipped with a webcam simulates the mobile device to show the feasibility of our methods.


We formally define our problem and three adversary models in Section 2. Assuming the existence of a barcode scheme that is secure against rearrangement attack, we propose two protocols and analyze them under the three adversary models in Section 3. We give a construction for the required barcode scheme using visual cues in Section 4 and discuss the design of visual cue symbols in Section 5. We compare our solutions with possible alternative methods in Section 6. We describe our proof-of-concept implementation in Section 7 and measure its performance in Section 8. A discussion of existing work is given in Section 9. Section 10 gives a conclusion of our paper.

2 Models and Formulation

There are four parties involved in our problem: the user, the server, the mobile device and the network terminal. Let us call them User, Server, Mobile, and Terminal respectively. In our framework, the term “user” literally refers to a person, and the mobile device is equipped with a camera, an input device, a small display unit and a chip that can perform decoding of barcodes.

The communication channels among the four parties are shown in Figure 2. Note that there is no direct communication link between Mobile and Server. With 3G mobile networks and WiFi connections widely available, one may argue that the model should consider such a link. Nevertheless, there are situations where the connection is not available due to cost or other constraints. In addition, there are also security concerns if the mobile device has an Internet connection during the transactions: if Mobile can directly send messages to Terminal, they may collude and conduct a coordinated attack. Table 1 gives a summary of our notations.

We consider the following security models for the channel between Server and User:

  1. Model 1: Terminal is not trusted by User, but Mobile is trusted and we want to protect both confidentiality and authenticity.

  2. Model 2: At least one of Terminal and Mobile is honest and we want to protect authenticity.

  3. Model 3: Both Terminal and Mobile could be dishonest but they do not collude and we want to protect authenticity.

In Model 3, we treat the dishonest Terminal and the dishonest Mobile as two different adversaries with two different goals. The first adversary is the dishonest terminal, whose intention is to trick the user into believing that a given message is authentic. The actual value of this message is not determined prior to the connection; we can view it as a randomly chosen message that is passed to the dishonest terminal. The second adversary is the dishonest mobile, which has an easier goal: it is free to construct any message and trick the user into wrongly believing that it is authentic. An example is a mobile that always accepts whatever verification it is tasked to do. To capture the notion that they do not collude, we impose the restriction that the two adversaries do not know how to communicate with each other, and that the forged message is randomly chosen and held by one party. Hence, we exclude the attack where the dishonest terminal covertly sends the message to the dishonest mobile through the visual channel.

Figure 2: The communication channels among the four parties.
The message from User to Server.
The message from Server to User.
The key used in message authentication scheme.
The key used in encryption.
The key used in embedding visual cue.
The session key containing tuple .
A visual cue symbol that carries the location information.
A barcode image encoding a message and visual cue with key .
An authentication tag of a message with key .
An encryption of a message with key .
An error correcting encoding of a message .
The entity A sends a message to another entity B.
The entity A sends a message to B using C as a relay point.
Table 1: Summary of Notations.

3 Protocols

We now give our proposed protocols for securing the communication between Server and User assuming we have a barcode embedding technique that can protect the integrity and confidentiality of its payload, and visible visual cue can be rendered onto the barcode to indicate the barcode location as in Figure 1(b). Given a message , a visual cue , and a session key , let us write the barcode (represented as images) as . For clarity in presentation, we first consider the case where the message can be embedded into one barcode block whose size is small enough to be entirely captured by Mobile’s camera with sufficient precision. Thus, we take the visual cue as a single dot, indicating to the user that there is only a single barcode to be read. We will later study the case for multiple messages in Section 4 and Section 5.

We assume that Server has already established a long-term shared key with Mobile when the user registers an account with the server. In addition, for Models 2 and 3, we assume that User has established a password with Server. Before each interactive session, Server authenticates User and Mobile to obtain a session key. A secure key exchange can be derived from a modified seeing-is-believing [18] combined with the methods proposed in this section. Due to space constraints, we do not include the details in this paper.

3.1 Server to User

Consider the case where Server wants to send a message to User. We propose two methods, denoted MS1 and MS2 (message from server), where method MS1 is more user-friendly compared to MS2, but it requires that Mobile is trusted.


In method MS1, the following steps are carried out to send a message to User. (1) Server generates a barcode image carrying the message and sends the barcode to Terminal. (2) Terminal displays the barcode. (3) User inspects the visual cue and verifies that it is valid. (4) Mobile captures the barcode and rejects if the barcode is not authentic. (5) Mobile renders the message on its display. (6) User reads and accepts the message from Mobile’s display panel. Below is a summary for MS1:


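  1. Server → Terminal: the barcode;

  2. Terminal → User: the barcode (displayed);

  3. User verifies the visual cue;

  4. Terminal → Mobile: the barcode (captured by Mobile’s camera);

  5. Mobile → User: the message;

  6. User accepts the message.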



The main difference between this method, MS2, and the previous MS1 is that the message is displayed by both Terminal and Mobile for User to verify, and thus User is able to detect if one of them is dishonest. (1) Server first generates a barcode image carrying the message, then sends both the barcode image and the message to Terminal. (2) Terminal displays the barcode side by side with the message. (3) User inspects the visual cue and verifies that it is valid. (4) Mobile captures the barcode and rejects if the barcode is not authentic; otherwise, it displays the decoded message. (5) User reads the message from both Mobile’s display panel and Terminal’s display. (6) User accepts if the message shown in step (2) is consistent with the message shown in step (4). Below is a summary for MS2:


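  1. Server → Terminal: the barcode, the message;

  2. Terminal → User: the barcode, the message (both displayed);

  3. User verifies the visual cue;

  4. Terminal → Mobile: the barcode (captured by Mobile’s camera);

  5. Mobile → User: the decoded message;

  6. User accepts if the two displayed messages are consistent.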
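The comparison step of MS2 can be sketched as a small simulation. This is an illustration under stated assumptions, not our implementation: the barcode is modelled as a dictionary holding the message and an HMAC-SHA256 tag, and the function names, the key and the balance strings are hypothetical.

```python
import hashlib
import hmac

KEY = b"demo session key"  # hypothetical key shared by Server and Mobile only


def server_barcode(message: str) -> dict:
    """Stand-in for the barcode: the message plus its authentication tag."""
    tag = hmac.new(KEY, message.encode(), hashlib.sha256).digest()
    return {"message": message, "tag": tag}


def honest_terminal(barcode: dict, message: str):
    """Display the barcode and the message exactly as received."""
    return barcode, message


def honest_mobile(barcode: dict):
    """Return the decoded message, or None if the tag does not verify."""
    expected = hmac.new(KEY, barcode["message"].encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(barcode["tag"], expected):
        return None
    return barcode["message"]


def ms2(message: str, terminal=honest_terminal, mobile=honest_mobile):
    """Return the message User accepts, or None if User rejects."""
    shown_barcode, terminal_text = terminal(server_barcode(message), message)
    mobile_text = mobile(shown_barcode)
    if mobile_text is None or mobile_text != terminal_text:
        return None  # Mobile rejected, or the two displays disagree
    return terminal_text


def lying_terminal(barcode: dict, message: str):
    """Show the genuine barcode next to an altered message."""
    return barcode, "Balance: 9,999"


def forging_terminal(barcode: dict, message: str):
    """Try to forge a barcode without knowing the key."""
    forged = "Balance: 9,999"
    return {"message": forged, "tag": bytes(32)}, forged


def lying_mobile(barcode: dict):
    """Always claim authenticity and show a message of its own choosing."""
    return "Balance: 9,999"
```

With these stand-ins, `ms2("Balance: 120")` returns the genuine message, while substituting any single dishonest party makes the call return `None`. Passing both `lying_terminal` and `lying_mobile` yields an accepted forgery only because the two happen to show the same string, which corresponds to the collusion that Model 3 excludes.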


3.2 User to Server

Now we consider the following methods MU1 and MU2 (message from user) for sending the message to Server. Method MU1 protects both confidentiality and authenticity of the message, whereas method MU2 protects only the authenticity but involves less user operation.


MU1 consists of the following steps to send a message to Server. (1) User enters the message into Mobile. (2) Mobile computes the encryption of the message together with its authentication tag and shows User the result in readable characters (e.g. using uuencode). (3) User sends the displayed string to Server through Terminal’s input device. (4) Server accepts if the tag is valid. Below is a summary for MU1:


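  1. User → Mobile: the message;

  2. Mobile → User: the encrypted message and its tag, in readable characters;

  3. User → Server (relayed by Terminal): the displayed string;

  4. Server accepts if the tag is valid.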



In scenarios where the confidentiality of the message is not required, we can employ a more user-friendly protocol MU2 as follows: (1) User enters the message through Terminal’s input device, and Terminal forwards it to Server. (2) Server generates a barcode carrying the concatenation of the message and a randomly generated nonce. Server sends the barcode to Terminal. (3) Terminal displays the barcode, and User visually verifies that the visual cue is correct. (4) Mobile captures the barcode and rejects if the barcode is not authentic. (5) Mobile renders the message and the nonce on its display. (6) If the displayed message is consistent with the message User entered in step (1), User enters the nonce into Terminal, and Terminal forwards it to Server. (7) Server rejects if the nonce is wrong.

Although it involves more steps, MU2 is less tedious from the User’s point of view, since User does not need to enter the message using Mobile’s input device. The corresponding steps for MU2 are summarized below:


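  1. User → Server (relayed by Terminal): the message;

  2. Server → Terminal: the barcode carrying the message and a nonce;

  3. Terminal → User: the barcode (displayed);

  4. Terminal → Mobile: the barcode (captured by Mobile’s camera);

  5. Mobile → User: the message and the nonce;

  6. User → Server (relayed by Terminal): the nonce;

  7. Server accepts if the nonce is consistent, rejects otherwise.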
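The nonce confirmation in MU2 can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 as the authentication scheme and a (payload, tag) pair as a stand-in for the barcode; the class and function names, the 4-byte nonce length and the newline separator are all hypothetical choices.

```python
import hashlib
import hmac
import secrets


class Server:
    def __init__(self, key: bytes):
        self.key = key
        self.pending = None  # (message, nonce) awaiting confirmation

    def request(self, message: str):
        """Step 2: bind the received message to a fresh nonce in a barcode."""
        nonce = secrets.token_hex(4)
        self.pending = (message, nonce)
        payload = f"{message}\n{nonce}".encode()
        return payload, hmac.new(self.key, payload, hashlib.sha256).digest()

    def confirm(self, nonce: str):
        """Step 7: accept the pending message only if the nonce matches."""
        if self.pending is None:
            return None
        message, expected = self.pending
        self.pending = None  # allow one attempt per request
        if hmac.compare_digest(nonce.encode(), expected.encode()):
            return message
        return None


def mobile_read(key: bytes, barcode):
    """Steps 4-5: verify the barcode, then show the message and the nonce."""
    payload, tag = barcode
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    message, _, nonce = payload.decode().rpartition("\n")
    return message, nonce


def user_session(server: Server, key: bytes, intended: str, forwarded: str):
    """Run MU2 where Terminal forwards `forwarded` instead of `intended`."""
    shown = mobile_read(key, server.request(forwarded))
    if shown is None or shown[0] != intended:
        return None  # step 6: User sees a mismatch and withholds the nonce
    return server.confirm(shown[1])
```

In this sketch a Terminal that forwards a different message never obtains the nonce from User, so it can only guess it; the server accepts a single attempt per request.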


3.3 Analysis

In this section, we analyze our methods under different adversary models.

Model 1 (Mobile is trusted)

In Model 1, we use MU1 for sending messages to Server and MS1 for Server to send messages to User, achieving both confidentiality and authenticity of the communication channel.

For both methods, Terminal plays the role of a relay point for passing messages, and thus a malicious Terminal is a man-in-the-middle. Hence, this is the classical setting where two end points (Server and Mobile) that share a key want to communicate over a public channel. Standard cryptographic techniques (encryption and message authentication codes) can secure the channel and provide both confidentiality and authenticity.

MU2 and MS2 cannot protect confidentiality under this model, as the messages are sent in the clear through Terminal, and thus they are not suitable here.

Model 2 (At least one is honest)

In Model 2, we use MU2 to send messages to Server, and MS2 for Server to send messages to User. The goal is authenticity of the message; we do not address confidentiality here. Since we do not know which of Mobile and Terminal is dishonest, it is not clear whether confidentiality can be achieved under this model, and investigating this is interesting future work.

Suppose Terminal is dishonest. In both directions of the communication, we can treat the barcode as the MAC of the respective message, and Terminal does not have the key. As in the analysis for Model 1, this is a classical setting, and the authenticity of the message is inherited from the MAC used in the barcode construction.

On the other hand, consider the case where Mobile is dishonest. In MU2, Terminal is honest and will forward the message to Server as it is; thus, it is impossible for Mobile to modify it without Server noticing. Similarly, in MS2, since the actual message is displayed by the honest Terminal, User can compare the two displayed messages, and thus any modification can be detected.

Note that MU1 and MS1 are not secure in this model: if Mobile is dishonest and changes the message, there is no way for User or Server to detect it.

Model 3 (No collusion)

It turns out that the protocol we used in method 2, i.e. MU2 and MS2, can achieve authenticity in this model as well.

Let us first analyze MU2. Recall that the goal of a dishonest Terminal is to trick Server into accepting a given message. To do so, Terminal must send Server that message and obtain a barcode containing the message and a nonce. Server accepts only if the nonce is presented as the verification code. Since the nonce is randomly chosen, Terminal is unlikely to succeed in guessing it. Therefore, it needs to get the nonce from the user. Without any hint from Terminal, Mobile is not able to display the message that the user is expecting, so the user will not release the nonce.

Now let us analyze MS2. In this case the dishonest Terminal wants to trick User into accepting a given message. To achieve this goal, it must display that message side by side with the barcode. As Terminal does not know the key, it is unable to forge the barcode. Now consider the dishonest Mobile. Since Terminal cannot communicate the forged message to Mobile, Mobile is unable to display the message that would be required to trick User into accepting it.

| | MU1 | MU2 | MS1 | MS2 |
|---|---|---|---|---|
| Model 1 | C, A, U1 | A, U1, U2 | C, A, U1, U2 | A, U1, U2 |
| Model 2 | N | A, U1, U2 | N | A, U2 |
| Model 3 | N | A, U1, U2 | N | A, U2 |

Note: C, A, N are related to security goals and U1, U2 are related to usability.
C: confidentiality is achieved; A: authenticity is achieved; N: neither C nor A can be achieved.
U1: no user comparison of messages is required; U2: no user input via Mobile’s input device is required.

Table 2: Summary of Methods.

Table 2 summarizes the security and user friendliness of our methods under different models.

4 Visual Channel

A main component in building our visual channel is the construction of a 2D barcode with visual cues: given a secret key, a message, and a visual cue symbol, we want to produce a 2D barcode such that the cue is clearly visible and the message can be extracted under noise. In addition, there are security requirements on the confidentiality of the message and on the integrity of both the message and the visual cue: any modification of either must be detected.

4.1 Construction Overview

There are a number of stages of the visual channel construction:

  1. (Encrypt-then-MAC): The message is first encrypted with the encryption key, and an authentication tag is then computed over the ciphertext with the authentication key. The output of this stage is the ciphertext together with its tag.

  2. (Error correcting): An error correcting code is then applied to the output of the previous stage.

  3. (Embedding visual cue): Given the error-corrected message, the embedding key, and a visual cue represented as a 2D array of bits, the message is embedded into a larger 2D array of bits that visually appears as the visual cue. Section 4.2 gives details on the embedding process.

  4. (Adding control points and rendering): A set of control points (red dots in Figure 1(b)) is then added around the array for image registration purposes.

Thus, our barcode is a black and white image with red pixels.
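The first two stages can be sketched as below. This is only an illustration under stated assumptions: the keystream built from SHA-256 in counter mode is a toy stand-in for a real cipher, HMAC-SHA256 stands in for the message authentication scheme, and a 3-fold repetition code stands in for the error correcting code. None of these are claimed to be the primitives or parameters of our implementation, and all function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); a stand-in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def encrypt_then_mac(k_enc: bytes, k_mac: bytes, nonce: bytes, message: bytes) -> bytes:
    """Stage 1: encrypt, then authenticate the nonce and ciphertext."""
    ct = bytes(a ^ b for a, b in zip(message, keystream(k_enc, nonce, len(message))))
    tag = hmac.new(k_mac, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def verify_then_decrypt(k_enc: bytes, k_mac: bytes, blob: bytes, nonce_len: int = 8):
    """Check the tag first; decrypt only if it is valid, else return None."""
    if len(blob) < nonce_len + TAG_LEN:
        return None
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(k_mac, body, hashlib.sha256).digest()):
        return None
    nonce, ct = body[:nonce_len], body[nonce_len:]
    return bytes(a ^ b for a, b in zip(ct, keystream(k_enc, nonce, len(ct))))


def ecc_encode(data: bytes, r: int = 3) -> list:
    """Stage 2: repeat every bit r times (repetition code as a stand-in)."""
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    return [b for b in bits for _ in range(r)]


def ecc_decode(bits: list, r: int = 3) -> bytes:
    """Majority-vote each group of r bits and repack the result into bytes."""
    voted = [1 if 2 * sum(bits[i:i + r]) > r else 0 for i in range(0, len(bits), r)]
    return bytes(
        int("".join(map(str, voted[i:i + 8])), 2) for i in range(0, len(voted), 8)
    )
```

A receiver in this sketch calls `ecc_decode` on the captured bits and then `verify_then_decrypt`, mirroring the reverse order of the stages.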

4.2 Encoding with Visual Cue

When a message is too large, multiple barcodes are required to encode it. As mentioned in the introduction, multiple barcodes protected by a single session key are subjected to “rearrangement” attack. To detect the attack, we propose binding location information to the barcode using visual cue. This section gives a method in embedding the visual cue. Note that the process of embedding a visual cue to a barcode is essentially digital watermarking, where the visual cue is the host, and the barcode is a message to be “watermarked” to the host.

Given a message, let us arrange its bits as a binary matrix, and let us assume that the given visual cue is a black-and-white image in which each pixel is either 0 (representing a black pixel) or 1 (representing a white pixel). The dimensions are chosen so that every two bits of the message are associated with one pixel of the visual cue, and together they can be represented with three black-and-white pixels in the final barcode. The three pixels are arranged in an “L” shape as shown in Figure 3(a). Let us call the three pixels an L-block. The possible combinations of pixel values in an L-block are divided into two groups, a white group and a black group. The L-blocks in the white group have more white pixels and thus appear as “white”. Conversely, the L-blocks in the black group appear as “black”.

(a) Two groups of L-blocks.
(b) Tile up with L-blocks.
Figure 3: L-blocks for constructing visual cues

Given the binary value of a pixel of the visual cue image, we want to encode two bits into a three-pixel L-block such that the brightness of the L-block can be adjusted according to that value. For instance, if the cue pixel is white, the encoding outputs only elements of the white group. Since there are four elements in each group, it is possible to encode the two bits. Besides the value of the cue pixel, there is no further constraint on how the two bits are mapped to the elements of the group. In order to prevent the adversary from modifying the appearance of the visual cue, the mapping from the bits to the three pixels of the associated L-block has to be kept secret. Hence, the key space for encoding a bit pair is .

To decode a barcode, Mobile applies the decoding and decryption functions in reverse order and ignores the visual cue bit. That is, it first extracts the bit pairs from every L-block to obtain the encoded message. Next, error correction is applied, and the authenticity of the message can then be verified.
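The L-block encoding can be sketched as follows. This is a minimal illustration, assuming that the white group consists of the three-pixel patterns with at least two white pixels and that a single secret table is shared by all blocks; the table is derived here with Python's non-cryptographic `random` module purely for demonstration, and `make_key`, `encode_block` and `decode_block` are hypothetical names.

```python
import itertools
import random

# All 3-pixel patterns, with 1 = white and 0 = black.
PATTERNS = list(itertools.product((0, 1), repeat=3))
WHITE = [p for p in PATTERNS if sum(p) >= 2]  # four patterns that look white
BLACK = [p for p in PATTERNS if sum(p) <= 1]  # four patterns that look black


def make_key(seed: int) -> dict:
    """Build a secret table mapping (cue bit, 2-bit value) to an L-block.

    random.Random is NOT cryptographically secure; it only illustrates
    that the assignment within each group is secret and arbitrary.
    """
    rng = random.Random(seed)
    table = {}
    for cue_bit, group in ((1, WHITE), (0, BLACK)):
        shuffled = group[:]
        rng.shuffle(shuffled)
        for value, pattern in enumerate(shuffled):
            table[(cue_bit, value)] = pattern
    return table


def encode_block(key: dict, cue_bit: int, b0: int, b1: int) -> tuple:
    """Encode two payload bits into an L-block whose brightness follows cue_bit."""
    return key[(cue_bit, 2 * b0 + b1)]


def decode_block(key: dict, pattern: tuple) -> tuple:
    """Recover (cue_bit, b0, b1); the decoder then discards cue_bit."""
    for (cue_bit, value), candidate in key.items():
        if candidate == pattern:
            return cue_bit, value >> 1, value & 1
    raise ValueError("not a valid L-block")
```

Since each group contains exactly four patterns, every two-bit value has a unique L-block for either cue value, and decoding is a table lookup.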

4.3 Security Analysis

We would like our barcode scheme to achieve the following properties: (1) authenticity and confidentiality of the message, and (2) integrity of the visual cue.

Authenticity and confidentiality of message

The authenticity and confidentiality of the message embedded in our barcode scheme rely on the security of the underlying encryption and message authentication schemes. Bellare et al. [3] show that when the encryption achieves indistinguishability under chosen-plaintext attack (IND-CPA), and the message authentication scheme is strongly unforgeable (SUF-CMA), the Encrypt-then-MAC composition achieves IND-CPA, INT-CTXT (integrity of ciphertexts) and IND-CCA (indistinguishability under adaptive chosen-ciphertext attack).

Integrity of visual cue

An adversary may try to modify some L-shape blocks such that the visual cues on two barcode blocks are swapped, so that he can rearrange the two blocks without being detected. As discussed in Section 4.2, any modification of an L-shape block’s brightness will have chance of not being detected. Suppose at least number of L-shape blocks have to be modified in order to deceive the user, then the chances of not being detected will be , where depends on the size of a barcode block, and the visual cue design.

However, the above analysis does not hold when we consider the whole process of decoding, where error correction is included. Recall that, due to inevitable noise, we need to apply error correction before extracting the message. Therefore, when a small number of L-shape blocks are corrupted, the payload can still be correctly decoded. Hence, the choice of error correction and the design of the cues cannot be made separately. Furthermore, some error-correcting codes can correct more errors than their guaranteed level in some situations. Due to the concern of forgery, it is important not to correct those errors.

To prevent an adversary from making small changes that can deceive the user and yet get verified, one design consideration of the visual cue is to choose symbols with large mutual Hamming distance. In our implementation, to be described in Section 7, we use numerical digits as visual cues, where the minimum Hamming distance for two symbols is “L-blocks” (for example, the numbers “1” and “7”, or “0” and “8”). We choose parameters of the error correcting code so that it is able to tolerate bits noise for every bits. Note that modifying an L-block may result in two bits being flipped; thus, the probability that an attacker can modify the visual cue of a barcode to another is less than , where is the cumulative distribution function of the binomial distribution.

Modifying control points

The adversary may try to modify the control points and this may cause failure in decoding, giving a string of “random” bits which is unlikely to pass the MAC authentication check. Hence, modifications of control points at most amount to a denial of service attack, which is not our main concern.

5 Visual Cues for Verification of Multiple Barcodes

In this section, we discuss a few designs of visual cues, in particular for barcodes appearing in a linear sequence and barcodes rendered as a table. Recall that the main purpose of the visual cue is to bind location information to the barcodes, so that User can visually verify that the barcodes are in the correct arrangement.

Linear Sequential Barcodes.

Consider a sequence of barcodes appearing in a given order. The order of appearance gives implicit structure to the encoded message. For instance, the message could be a string divided into substrings, where each substring is encoded in a single barcode. Hence, it is important to protect the order of appearance, even if the user may not be interested in viewing all of them. A natural visual cue would be a counter starting from 1; that is, the visual cue of the i-th block is the number i. To indicate the end of the sequence, the last block contains a special symbol, say “.” in our example.

Barcodes in Table Structure.

Consider a table of messages where each message is encoded in a barcode. The barcodes are depicted in the natural table arrangement: for any 2 messages in the same row, the corresponding barcodes are also in the same row, and likewise for columns. To protect such correspondence, we propose the following rules of assigning the visual cue:

  1. The numerical value of the visual cue symbol on the top row, leftmost block is 1. The value increments by 1 from left to right. At the end of the row, the increment process continues at the leftmost block of the row below if any.

  2. The rightmost block in each row has the additional cue which is a black dot indicating this is the end of row.

  3. The rightmost block in the bottom row has an additional large black rectangle indicating this is the last block.


Figure 1(b) shows an example of such a barcode table. To verify that a table of barcodes is in the correct arrangement, User simply needs to verify the continuity of the counter, that every row but the last ends with a small dot, and that the last barcode ends with the large black block. It is easy to verify that, by imposing the above rules, any insertion, deletion or rearrangement of the barcodes can be detected by visual inspection.
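The inspection that User performs by eye can be written out as a small check. This sketch assumes one reading of the rules above, namely that the last block carries only the large block and every other row-ending block carries only the dot; each cue is modelled as a (number, mark) pair, and `assign_cues` and `arrangement_ok` are hypothetical names.

```python
def expected_mark(at_row_end: bool, at_table_end: bool) -> str:
    """Mark required at a position: "block", "dot", or "" for none."""
    if at_table_end:
        return "block"
    return "dot" if at_row_end else ""


def assign_cues(rows: int, cols: int) -> list:
    """Assign (counter, mark) cues row by row, as the server would."""
    grid, counter = [], 1
    for r in range(rows):
        row = []
        for c in range(cols):
            at_row_end = c == cols - 1
            at_table_end = at_row_end and r == rows - 1
            row.append((counter, expected_mark(at_row_end, at_table_end)))
            counter += 1
        grid.append(row)
    return grid


def arrangement_ok(grid: list) -> bool:
    """Check the displayed cues: continuous counter and correct end marks."""
    if not grid:
        return False
    expected = 1
    for r, row in enumerate(grid):
        if not row:
            return False
        for c, (number, mark) in enumerate(row):
            at_row_end = c == len(row) - 1
            at_table_end = at_row_end and r == len(grid) - 1
            if number != expected or mark != expected_mark(at_row_end, at_table_end):
                return False
            expected += 1
    return True
```

For a table with five rows and two columns, as in Figure 1(b), `assign_cues(5, 2)` places the dot beside “2” and the block beside “10”; swapping two barcodes, dropping a row, or repeating a row makes `arrangement_ok` return false.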

6 Alternative Methods

Besides using visual cues, there are other techniques to ensure that the barcodes are in correct order. This section compares our scheme with a few alternatives. In general, our scheme uses more pixels to carry the visual cue symbols. On the other hand, it has the following advantage: (1) It does not disrupt the user by requiring the user to scan all the barcodes. (2) It does not require the user to count the blocks on the terminal’s display unit to verify the current block sequence on the mobile. (3) It allows the placement of barcodes to spread across different positions in a scrolling page, or even in different pages. A brief illustration of the alternative methods is given in Figure 4.

(a) Mobile captures every block, then verifies and renders the whole message.
(b) Mobile displays the location (row/column) information encoded in the barcodes.
Figure 4: Illustration of alternative methods (for simplicity, only the barcodes and mobile device are shown here).

Embedding a HMAC of all blocks.

In this method, given a long message , Server computes an HMAC for the whole and embeds and its tag into a few barcodes. During authentication, the user first scans across all the barcodes, then Mobile reports whether the HMAC agrees with the content in the barcodes (Figure 4(a)). If so, Mobile renders the long message and the user navigates to obtain the required information. The advantages of this method are (1) the user does not need to verify the visual cue, and (2) the barcode is more efficient in the sense that it does not need to embed the visual cue.
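A minimal sketch of this alternative is given below, assuming a key already shared between Server and Mobile and omitting encryption and error correction. The helper names and the block size are hypothetical; HMAC-SHA1 is used only because it matches the choice of parameters in our implementation.

```python
import hashlib
import hmac


def split_blocks(message: bytes, block_size: int) -> list:
    """Server side: cut the long message into per-barcode payloads."""
    return [message[i:i + block_size] for i in range(0, len(message), block_size)]


def tag_message(key: bytes, message: bytes) -> bytes:
    """Server side: one HMAC-SHA1 tag over the whole message."""
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_scanned(key: bytes, scanned_blocks: list, tag: bytes) -> bool:
    """Mobile side: the check is only possible after every block is scanned."""
    message = b"".join(scanned_blocks)
    return hmac.compare_digest(tag_message(key, message), tag)
```

Because the tag covers the whole message, a missing or reordered block makes `verify_scanned` fail, but Mobile cannot say anything about a single block until all of them have been captured.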

However, this method has a few disadvantages. Firstly, the scanning process is undesirable when the user only wants to browse a subset of the message (e.g. a user who wants to check a particular record from a list of transactions). Secondly, it is not easy to navigate using the relatively small display panel of the mobile device. Furthermore, it is not clear how to extend this method to the models where the mobile device is not trusted.

Encoding location hints in barcode.

When the message can be represented as a form of table, one may try to secure the authenticity by using the row and column attributes as location information: Given a table , Server first divides it into sub-tables, then it encodes each sub-table together with the corresponding row and column attributes into barcodes. When Mobile decodes the barcode, it shows the corresponding attributes of the sub-table as shown in Figure 4(b).

The advantage of this method is that it does not require the user to scan all the barcodes or verify visual cues, and the user can readily browse a sub-table of interest. While rearrangement attacks can be prevented because the row and column information is encoded in the barcode, this method is subject to deletion attacks: the adversary may remove or duplicate an entire row of barcodes without being detected. Although this could be patched by encoding more information (e.g. the total number of barcodes), the verification cost will increase (the user needs to count the barcode blocks).
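The weakness can be illustrated with a small sketch. It models the check that the decoded row and column attributes allow, namely that all blocks displayed in one row share a row attribute and all blocks in one column share a column attribute. The function name and the sample attributes are hypothetical.

```python
def placement_consistent(grid):
    """grid[r][c] is the (row_attr, col_attr) pair decoded at display position (r, c)."""
    if not grid or any(len(row) != len(grid[0]) for row in grid):
        return False
    for row in grid:
        # Every block shown in this row must carry the same row attribute.
        if len({row_attr for row_attr, _ in row}) != 1:
            return False
    for column in zip(*grid):
        # Every block shown in this column must carry the same column attribute.
        if len({col_attr for _, col_attr in column}) != 1:
            return False
    return True


honest = [[(r, c) for c in ("payee", "amount")] for r in ("txn-1", "txn-2", "txn-3")]
row_deleted = [honest[0], honest[2]]  # adversary drops the middle row
```

Here `placement_consistent(row_deleted)` still returns `True`: every remaining block sits under the right attributes, so the deletion goes unnoticed, whereas exchanging two blocks from different rows is caught.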

7 Implementation

The usability of our proposed method can be improved using “augmented reality” as described in the introduction. We implemented a proof-of-concept system using a webcam and a laptop to simulate the mobile device.

Machines and Software.

We simulate the mobile device and its camera using a Thinkpad X200 notebook (Intel Core 2 Duo GHz CPU and GB memory) equipped with an inexpensive USB webcam. To simulate the computing power of a typical mobile device, we allocate only CPU time and M memory for our program. We use a Dell desktop machine with an Intel Core 2 Duo GHz CPU, GB of memory and Windows XP SP3 to simulate the network terminal. The resolution of the webcam is pixels with a maximum frame rate of frames per second. We tested the system on three different display units: (1) a 19 inch flat TFT monitor in Dell model Optiplex 755; (2) a 15 inch flat TFT Dell UltraSharp monitor; and (3) a 15 inch Dell CRT monitor. All settings of the display units, such as brightness and resolution, are reset to their defaults. In the following sections, we call these three display units monitor 1, monitor 2 and monitor 3 respectively.

We use OpenCV libraries [1] for basic image processing operation and interfaces to the camera.

Choice of Parameters.

We use AES with a 128-bit key as the encryption scheme, HMAC based on SHA1 as the message authentication code, and calculator fonts of numeric digits as visual cue symbols. We use a -BCH error correcting code [14, 4] to correct errors. That is, for every bits, we add bit of redundancy and we are able to correct error bits. However, to prevent modification of the visual cue, we refuse to decode if there are more than error bits.

Image Processing Issues.

We use an oversampling technique to reduce the noise of a captured image: one bit in the barcode is rendered using pixels. Let us call a group of pixels a “superpixel”. Such oversampling can reduce the noise due to misalignment and mitigate other artifacts, but it also reduces the channel capacity by a factor of .
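As an illustration of why oversampling tolerates isolated pixel errors, the sketch below renders each bit as a square superpixel and reads it back by majority vote. The majority rule, the side length `k` and the function names are assumptions made for this example and are not taken from our implementation.

```python
def render_superpixels(bits, k):
    """Render each bit as a k x k group of identical pixels."""
    return [[bit for bit in row for _ in range(k)] for row in bits for _ in range(k)]


def decode_superpixels(pixels, k):
    """Read each k x k superpixel back as one bit by majority vote.

    pixels is a 2D list of 0/1 values whose dimensions are multiples of k.
    """
    bits = []
    for r in range(0, len(pixels), k):
        bit_row = []
        for c in range(0, len(pixels[0]), k):
            ones = sum(pixels[r + dr][c + dc] for dr in range(k) for dc in range(k))
            bit_row.append(1 if 2 * ones > k * k else 0)
        bits.append(bit_row)
    return bits
```

With `k = 3`, up to four of the nine pixels in a superpixel can be misread without changing the decoded bit, at the cost of carrying one bit where nine pixels are displayed.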

We use the landmark-mapping [5] method for image registration. That is, after Server generates the barcodes, it superimposes a set of 2D points called control points, whose positions are known to Mobile, on the barcode image.

After Mobile captures a screenshot, it extracts the control points and finds the best geometric transformation that maps the extracted control points to their original locations. In our implementation, we find the best linear transformation that matches the points. The transformation is then applied to the barcode image.
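One way to realize such a fit is an ordinary least-squares estimate. The sketch below fits an affine map (a linear map plus a translation) from the captured control points to their known original positions by solving the normal equations; the affine model, the solver and all function names are assumptions of this example rather than a description of our implementation.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            raise ValueError("control points are degenerate (e.g. collinear)")
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, 3):
            factor = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= factor * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x


def fit_affine(captured, original):
    """Least-squares affine map taking captured points to original points."""
    # Normal equations (A^T A) p = A^T b, where each row of A is (x, y, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atb_u = [0.0] * 3
    atb_v = [0.0] * 3
    for (x, y), (u, v) in zip(captured, original):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atb_u[i] += row[i] * u
            atb_v[i] += row[i] * v
    return solve3(ata, atb_u), solve3(ata, atb_v)


def apply_affine(params, point):
    """Map one captured point back to original image coordinates."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)
```

At least three non-collinear control points are needed; with more points the fit averages out the error in locating each one. Perspective distortion is not captured by an affine model, which is one reason the registration could be refined further.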

8 Performance

In this section we measure the performance of our proof-of-concept implementation in terms of error rate, frame rate and channel capacity.

Image Registration Error.

To measure the accuracy of our image registration, we first generate an image of many blue points together with the red control points. The image is displayed on the three display units and captured by the camera. Image registration is then carried out and the displacement of the blue points is measured. Here we use the Euclidean distance to measure the amount of displacement.
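The measurement itself is simple; a sketch, with hypothetical function names, is:

```python
import math


def displacements(registered, reference):
    """Euclidean distance between each registered point and its reference position."""
    return [math.hypot(x - u, y - v) for (x, y), (u, v) in zip(registered, reference)]


def mean_displacement(registered, reference):
    """Average displacement over all test points, in pixels."""
    values = displacements(registered, reference)
    return sum(values) / len(values)
```

The list returned by `displacements` is what a histogram such as the one in Figure 5(a) would be built from.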

Our camera is able to capture a region of around blue points. Figure 5(a) shows the histogram of the displacement of all the blue points on monitor 1. Note that the average displacement is less than pixel. The image registration algorithm can be further refined by incorporating more effective and efficient known techniques.

(a) Histogram of the displacement.
(b) The error rate of the three monitors over different frames.
Figure 5: The performance of implementation.

Noise Level in Capturing Superpixels.

We now measure the probability of error in reading a superpixel. The camera is able to capture a block that consists of around superpixels at a time.

After registration, we count the mismatched superpixels between the registered image and the original image. measurements are taken for each of the three display units. Figure 5(b) shows the error for each measurement.

Frame Rate.

The frame rate of our implementation is over frames per second running on the laptop machine. Although the implementation has not been tested on a mobile device, we believe a typical mobile device with similar processing power could achieve more than frames per second, which is acceptable for most applications.

Capacity of Visual Channel.

We now calculate the size of the payload (the size of , the message Server sends to User) that can be embedded in a block that occupies pixels of Terminal’s display unit. Recall that we used pixels to encode bit of the barcode, employed a BCH error correcting code, and used an L-block to preserve the related location. Thus the payload is bits for such a block.
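The structure of this calculation can be sketched as a small function. All parameter names and the sample values in the note that follows are hypothetical placeholders chosen for illustration; they are not the parameters of our implementation, and the sketch assumes that the L-block and other non-data elements simply reserve a fixed number of superpixels.

```python
def payload_bits(block_width_px, block_height_px, superpixel_side_px,
                 code_n, code_k, reserved_superpixels):
    """Payload of one block under an (n, k) error correcting code.

    reserved_superpixels stands for superpixels not available for data
    (for example an L-block); its value is an assumption of this sketch.
    """
    superpixels = (block_width_px // superpixel_side_px) * (
        block_height_px // superpixel_side_px
    )
    usable = max(superpixels - reserved_superpixels, 0)
    codewords = usable // code_n  # only whole codewords carry data
    return codewords * code_k
```

For instance, a hypothetical 300 by 300 pixel block with 3 by 3 pixel superpixels, a (127, 92) code and 400 reserved superpixels would give `payload_bits(300, 300, 3, 127, 92, 400) == 6900`.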

9 Related Work

There is an extensive body of literature exploiting the camera as an additional visual channel for communication. Jacobs et al. [15] gave a method that establishes a channel from a controllable light source to a camera. McCune et al. proposed seeing-is-believing [18], which carries out authentication and key-exchange over a visual channel established between a device’s display and another device’s camera. Wong et al. [25] built a prototype on a Nokia Series 60 handphone that provides 46 bits for authentication over the visual channel.

Data can be transmitted to a camera effectively using 2D barcodes. There are many 2D barcode designs, for example, QR code [2] and the High Capacity Color Barcode (HCCB) [20] that uses colored triangles. Many barcodes are designed to encode data in printed copies. There are also proposals that use other types of sources in the visual channel. Collomosse et al. proposed “Screen codes” [8] for transferring data from a display to a camera-equipped mobile device, where the data are encoded as a grid of luminosity fluctuation within an arbitrary image. A challenging hurdle in using hand-held cameras to establish the channel is motion blur. A few stabilization algorithms are developed for handheld camera [22, 19], and for 2D barcodes [6].

Similar to our scheme, Costanza et al. [9] suggested a technique to embed designs into barcodes to increase their expressiveness and to bring visual meaning to them. These systems recognize the barcodes based on the topology, rather than the geometry, of the codes [10], and were initially developed for tracking objects in tangible user interfaces and augmented reality applications [11]. Augmented reality has been exploited to enhance user experience in many applications including education [17], gaming [23], and outdoor activities [24]. Rekimoto et al. [21] used 2D barcodes as the visual tags in an augmented reality environment, where a camera captures the barcode on a physical object and links it to the object’s information.

10 Conclusion

In this paper, we investigated how visual channel can be deployed to enhance security of the communication between server and user in various settings. We pointed out that although authentication of an individual barcode can be easily carried out, the interesting technical challenge is in the verification of the relationships among several barcodes. This leads us to look into the problem of “subregion authentication” where a user wants to verify selective small pieces of data within a large dataset. Although there are a few methods to overcome the problem, they introduce disruptions during the interactive session and are thus less user-friendly. To achieve seamless interactions, we proposed using visual cue to bind location information to the barcode, so as to aid the user in visually verifying the data.

Our protocols demonstrate that the visual channel “enhanced” with the visual cue, together with the mobile device’s input/output device, provides more flexibility in designing secure protocols. Viewed from another perspective, our investigation highlights limitations of the visual channel, for instance, the observation that confidentiality is difficult to achieve under the setting where either the mobile device or the terminal could be dishonest. Our solution serves as an interesting example where security is achieved by coupling a computer’s processing power with the human perceptual system. The design of our barcode also serves as an interesting application of fragile watermarking.

To demonstrate the concept, we give a system that simulates the mobile device using a webcam and a laptop. The performance of the system is promising. Although we have not yet implemented the framework on an actual mobile device, we believe that the processing power of many current mobile devices is sufficient to provide seamless interactions.


  • [1] OpenCV (Open Source Computer Vision Library).
  • [2] QR Code (2000). International Organization for Standardization: Information Technology-Automatic Identification and Data Capture Techniques-Bar Code Symbology-QR Code. 2000.
  • [3] M. Bellare and C. Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Journal of Cryptology, pages 469–491, 2008.
  • [4] R.C. Bose and D.K. Ray-Chaudhuri. On a class of error correcting binary group codes. Information and control, pages 68–79, 1960.
  • [5] L.G. Brown. A survey of image registration techniques. ACM computing surveys, (4):376, 1992.
  • [6] C.H. Chu, D.N. Yang, and M.S. Chen. Image stabilization for 2d barcode in handheld devices. In Proceedings of the 15th international conference on Multimedia, page 706, 2007.
  • [7] D.E. Clarke, B. Gassend, T. Kotwal, M. Burnside, M. Dijk, S. Devadas, and R.L. Rivest. The untrusted computer problem and camera-based authentication. In Proceedings of the First International Conference on Pervasive Computing, page 124, 2002.
  • [8] J.P. Collomosse and T. Kindberg. Screen codes: visual hyperlinks for displays. In workshop on Mobile computing systems and applications, pages 86–90, 2008.
  • [9] E. Costanza and J. Huang. Designable visual markers. In Proceedings of the 27th international conference on Human factors in computing systems, pages 1879–1888, 2009.
  • [10] E. Costanza and J. Robinson. A region adjacency tree approach to the detection and design of fiducials. Vision, Video and Graphics, pages 63–70, 2003.
  • [11] E. Costanza, S.B. Shelley, and J. Robinson. Introducing audio d-touch: A tangible user interface for music composition and performance. In Proceedings of the International Conference on Digital Audio Effects, pages 8–11, 2003.
  • [12] I.J. Cox, J. Kilian, F.T. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE transactions on image processing, pages 1673–1687, 1997.
  • [13] S. Garriss, R. Cáceres, S. Berger, R. Sailer, L. van Doorn, and X. Zhang. Towards trustworthy kiosk computing. In Workshop on Mobile Computing Systems and Applications, 2006.
  • [14] A. Hocquenghem. Codes correcteurs d’erreurs. Chiffres, page 4, 1959.
  • [15] M.A. Jacobs and M.A. Insero. Method and apparatus for downloading information from a controllable light source to a portable information device, 1996. US Patent 5,535,147.
  • [16] B. Kauer. OSLO: Improving the security of Trusted Computing. In Proceedings of the USENIX Security Symposium, 2007.
  • [17] E. Klopfer and K. Squire. Environmental detectives the development of an augmented reality platform for environmental simulations. Educational Technology Research and Development, pages 203–228, 2008.
  • [18] J.M. McCune, A. Perrig, and M.K. Reiter. Seeing-is-believing: using camera phones for human-verifiable authentication. In IEEE Symposium on Security and Privacy, pages 110–124, 2005.
  • [19] E.M. Or and D. Pundik. Hand motion and image stabilization in hand-held devices. IEEE Transactions on Consumer Electronics, pages 1508–1512, 2007.
  • [20] D. Parikh and G. Jancke. Localization and segmentation of a 2d high capacity color barcode. In IEEE Workshop on Applications of Computer Vision, pages 1–6, 2008.
  • [21] J. Rekimoto and Y. Ayatsuka. Cybercode: designing augmented reality environments with visual tags. In Proceedings of DARE 2000 on Designing augmented reality environments, page 10, 2000.
  • [22] M. Sorel and J. Flusser. Blind restoration of images blurred by complex camera motion and simultaneous recovery of 3d scene structure. In Signal Processing and Information Technology, pages 737–742, 2005.
  • [23] K. Squire and M. Jan. Mad city mystery: Developing scientific argumentation skills with a place-based augmented reality game on handheld computers. Journal of Science Education and Technology, pages 5–29, 2007.
  • [24] G. Takacs, V. Chandrasekhar, N. Gelfand, Y. Xiong, W.C. Chen, T. Bismpigiannis, R. Grzeszczuk, K. Pulli, and B. Girod. Outdoors augmented reality on mobile phone using loxel-based visual feature organization. ACM International Conference on Multimedia Information Retrieval, 2008.
  • [25] F.L. Wong and F. Stajano. Multi-channel protocols. In Security protocols: 13th international workshop, page 128, 2007.