IlluminatedFocus: Vision Augmentation using Spatial Defocusing via Focal Sweep Eyeglasses and High-Speed Projector

02/06/2020 ∙ by Tatsuyuki Ueda, et al. ∙ Osaka University

Aiming at realizing novel vision augmentation experiences, this paper proposes the IlluminatedFocus technique, which spatially defocuses real-world appearances regardless of the distance from the user's eyes to observed real objects. With the proposed technique, a part of a real object in an image appears blurred, while the fine details of the other part at the same distance remain visible. We apply Electrically Focus-Tunable Lenses (ETL) as eyeglasses and a synchronized high-speed projector as illumination for a real scene. We periodically modulate the focal lengths of the glasses (focal sweep) at more than 60 Hz so that a wearer cannot perceive the modulation. A part of the scene to appear focused is illuminated by the projector when it is in focus for the user's eyes, while another part to appear blurred is illuminated when it is out of focus. As the basis of our spatial focus control, we build mathematical models to predict the range of distance from the ETL within which real objects become blurred on the retina of a user. Based on the blur range, we discuss a design guideline for effective illumination timing and focal sweep range. We also model how the focal length modulation alters the apparent size of a real scene, which produces an undesirable visible seam between focused and blurred areas. We solve this unique problem by gradually blending the two areas. Finally, we demonstrate the feasibility of our proposal by implementing various vision augmentation applications.




1 Related Work

Recognizing that synthetic blur is essential for AR applications, researchers have developed various interactive AR systems based on depth-independent spatial focus control of real-world appearances. Specifically, these AR systems fall into the following three categories: (1) visual guide, (2) F+C visualization, and (3) diminished reality (DR).

(1) Typical AR systems employ graphical widgets, such as virtual arrows, to draw the user’s attention to real-world objects [Biocca:2006:AFO:1124772.1124939, 5336486, 1544664, RUSCH2013127]. These systems potentially degrade the visual experience by superimposing distracting overlays on the real scene. On the other hand, HCI researchers have shown that subtle image modulations, such as luminance and color modulation [Bailey:2009:SGD:1559755.1559757] and synthetic blur [Veas:2011:DAI:1978942.1979158], can also effectively direct the user’s gaze. McNamara proposed applying a synthetic blur technique to a mobile AR system to direct the user’s gaze to specific areas of a real scene by blurring out unimportant areas [McNamara:2011:EAH:2087756.2087853]. This approach has the advantage of drawing a user’s attention without significantly interrupting the visual experience.

(2) F+C visualization allows a user to focus on a relevant subset of the data while retaining the context of surrounding elements. Kalkofen et al. proposed an interactive F+C visualization framework for AR applications [4538846, 4569839]. Their framework applies a blur effect to suppress visual clutter in context areas and helps a user comprehend the spatial relationships between virtual and real-world objects.

(3) Hiding real-world objects is useful in various AR application scenarios, and researchers have worked extensively on DR techniques [1240705, 5643572, 5643590, Iwai2011, Iwai:2006:LDS:1180495.1180519]. Typical DR methods replace or fill in an undesired object (e.g., AR markers [5643590]) with its background texture to make the object invisible. A blur effect has also been applied as a DR technique. Compared to typical DR techniques, this approach preserves the presence of a diminished object. In their X-ray vision AR system, Hayashi et al. proposed protecting the privacy of a person by blurring their face and body [HAYASHI2010125].

In all the above-mentioned previous systems, blurred real-world appearances are displayed on VST displays, such as HMDs and smartphones. Because the blur effect can be implemented by simple video signal processing, depth-independent and spatially varying blur can be synthesized and displayed on VST displays. On the other hand, as discussed in Section 1, there is no simple solution for realizing such flexible focus control in any other vision augmentation platform, including OST-AR and SAR. Because users see the real world directly in these systems, the blur needs to be controlled optically. Previous studies have proposed optical solutions for blurring a real object in a spatially varying manner. Himeno et al. placed a lens array plate in front of an object and spatially switched the state of the lens array to a flat transparent plate by filling it with a transparent liquid having the same refractive index as that of the lens array [Himeno:2018:FPS:3279778.3279784]. Blur Mirror is a media art installation consisting of a motorized mirror array [blur_mirror]. An object reflected by the mirrors is selectively blurred by spatially varying the vibration of the mirrors. Although these systems succeeded in spatially blurring a real object, a relatively large optical setup needs to be positioned between the user and the object. When either the blurring target or the user moves, the setup must be physically readjusted to keep their spatial relationship consistent. In this paper, we relax this physical constraint by applying wearable glasses and computational illumination, by which we can continuously provide a desired blur effect even when the object or user moves.

We apply ETLs to the eyeglasses used in the proposed system. Previous research integrated such focus tunable lenses in AR and VR displays [7014259, Konrad:2017:ACN:3072959.3073594, 8456852, Chang:2018:TMD:3272127.3275015, 4637321, Jo:2019:TPL:3355089.3356577]. The majority of these studies used the lenses to solve the vergence-accommodation conflict of typical HMDs. Among them, the work most related to our research realized multifocal displays by driving the lenses to sweep a range of focal lengths at a high frequency and switching displayed content of different focal planes in synchronization with the focal sweep using a high-speed display [8456852, Chang:2018:TMD:3272127.3275015, 4637321, Jo:2019:TPL:3355089.3356577]. The proposed system consists of devices similar to those used in previous studies and also applies fast focal sweep. However, we use a high-speed projector to illuminate real objects to control the blur intensities of their appearances in a spatially varying manner rather than displaying virtual objects at different focal planes. We demonstrated the first prototype at an academic conference [Ueda:2019:IVA:3355049.3360530]. In the current paper, we describe the technical details and evaluate the proposed method qualitatively and quantitatively.

2 Depth-Independent Spatial Focus Control of Real-World Appearances

Figure 1: Principle of the proposed technique. The ETL focal length is modulated periodically at 60 Hz. The blue cylinder is illuminated when it is defocused, while the green and orange boxes are illuminated when they are focused. Consequently, an observer perceives an image in which only the blue object placed in the middle appears blurred.

We realize depth-independent spatial focus control by the fast focal sweep of human eyes and synchronized high-speed illumination (Figure 1). Here, focal sweep refers to an optical technique that periodically modulates the optical power of an optical system (the observer’s eyes in our case) such that every part of an observed real scene is focused once in each sweep. If part of the scene is illuminated by a synchronized illuminator only when it is in the focal range of the observer’s eyes, it appears focused. On the other hand, if another part is illuminated only when it is out of focus, it appears blurred. When the periodic optical power modulation is performed at a frequency higher than the critical fusion frequency (CFF), the observer perceives neither the modulation nor the flicker of the illumination. We apply ETLs to periodically modulate the optical power of the observer’s eyes. Note that here we assume that the observer is wearing ETL glasses. A high-speed projector is used as the illuminator, which allows the focus of the observer’s eyes to be controlled on a per-pixel basis and thus at a high spatial resolution.

In the rest of this section, we model the image formation of real-world objects in a user’s eye with ETLs, as a mathematical basis of our technique (Section 2.1). Then, we discuss the blur range of the user with a given optical power of the ETL (Section 2.2). When real objects are within this range, they become blurred on the retina of the user. Based on the blur range, we describe a design guideline for effective illumination timings and the range of the optical power modulation in the focal sweep (Section 2.3). Finally, we discuss a method to alleviate visible seams between the focused and blurred areas (Section 2.4). Note that, in this paper for simplicity and without loss of generality, we consider only the positive optical powers of the ETL. The methods can be directly extended to negative optical powers.

2.1 Image formation of real-world objects in the human eye with ETL

As a mathematical basis of our technique, we model the image formation of real-world objects in a human eye with ETLs. Specifically, the model computes the size of the blur circle of a point in a real scene on the retina of a user’s eye. The human eye consists of several refracting bodies such as the cornea, aqueous humor, vitreous humor, and crystalline lens. We consider these together as a single lens and the retina as an image plane without loss of generality [8458263]. When a point in a real scene is defocused, it is imaged on the image plane as a spot (i.e., a blur circle), which is called a circle of confusion (CoC).

Figure 2: RTM analysis. (left) A ray passes through space. Note that the ray angle is unchanged by free-space propagation. (middle) A ray passes through a thin lens. (right) Two points are conjugate to each other for a given RTM.

We compute the size of blur circle on the image plane (i.e., retina) based on ray transfer matrix (RTM) analysis (Figure 2). RTM analysis is a mathematical tool used to perform ray tracing calculations under paraxial approximation. The calculation requires that all ray directions are at small angles relative to the optical axis of a system such that the approximation remains valid. An RTM is represented as follows:


$$\begin{bmatrix} x_2 \\ \theta_2 \end{bmatrix} = M \begin{bmatrix} x_1 \\ \theta_1 \end{bmatrix}, \tag{1}$$

where $M$ is an RTM and a light ray enters an optical component of the system crossing its input plane at a distance $x_1$ from the optical axis and travels in a direction that makes an angle $\theta_1$ with the optical axis. After propagation to the output plane, that ray is found at a distance $x_2$ from the optical axis and at an angle $\theta_2$ with respect to it. If there is free space between two optical components, the RTM is given as follows:

$$M = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}, \tag{2}$$

where $d$ is the distance along the optical axis between the two components. Another simple example is that of a thin lens whose RTM is given by

$$M = \begin{bmatrix} 1 & 0 \\ -P & 1 \end{bmatrix}, \tag{3}$$

where $P$ is the optical power (inverse of focal length) of the lens.
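The three matrix relations above can be exercised numerically. The following is a minimal sketch in plain Python; the helper names (`free_space`, `thin_lens`, `matmul`, `apply_rtm`) are illustrative and not from the paper. It checks the textbook case of a ray parallel to the axis crossing the axis one focal length behind a thin lens.

```python
def free_space(d):
    """RTM for free-space propagation over distance d (metres)."""
    return ((1.0, d), (0.0, 1.0))


def thin_lens(power):
    """RTM for a thin lens of optical power `power` (dioptres, 1/m)."""
    return ((1.0, 0.0), (-power, 1.0))


def matmul(a, b):
    """2x2 matrix product a @ b (b is applied to the ray first)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply_rtm(m, ray):
    """Apply RTM m to a ray given as (height x, angle theta)."""
    x, theta = ray
    return (m[0][0] * x + m[0][1] * theta, m[1][0] * x + m[1][1] * theta)


# A ray parallel to the axis at height 1 mm passes a 2 D lens and then
# travels one focal length (0.5 m); it should arrive on the optical axis.
system = matmul(free_space(0.5), thin_lens(2.0))
x_out, theta_out = apply_rtm(system, (0.001, 0.0))
```

Matrices compose right to left, so the element the ray meets first is the rightmost factor.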

Figure 3: Parameters in the RTM analysis of the proposed system.

A user of our system wears ETLs as eyeglasses such that the eye and the ETL share the same optical axis (Figure 3). The size of the blur circle on the image plane is computed by tracing the marginal ray from a point on a real object. Assuming that the ETL is larger than the pupil of the eye, the marginal ray passes at the edge of the pupil. The angle of the marginal ray at the object can be computed by solving the following equation


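$$\begin{bmatrix} D_p/2 \\ \theta_e \end{bmatrix} = \begin{bmatrix} 1 & d_e \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -P_{\mathrm{ETL}} & 1 \end{bmatrix} \begin{bmatrix} 1 & d_o \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ \theta_o \end{bmatrix}, \tag{4}$$

where $D_p$, $\theta_e$, $d_e$, $P_{\mathrm{ETL}}$, and $d_o$ represent the diameter of the pupil, the angle of the ray at the eye, the distance between the ETL and the eye, the optical power of the ETL, and the distance between the object point and the ETL, respectively, and $\theta_o$ is the angle of the marginal ray at the object. Thus,

$$\theta_o = \frac{D_p}{2\,(d_o + d_e - d_o d_e P_{\mathrm{ETL}})}, \qquad \theta_e = (1 - d_o P_{\mathrm{ETL}})\,\theta_o. \tag{5}$$

Finally, the size of the blur circle $D_r$ on the image plane is computed by solving the following equation

$$\begin{bmatrix} D_r/2 \\ \theta_r \end{bmatrix} = \begin{bmatrix} 1 & d_r \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -P_{\mathrm{eye}} & 1 \end{bmatrix} \begin{bmatrix} D_p/2 \\ \theta_e \end{bmatrix}, \tag{6}$$

where $\theta_r$, $d_r$, and $P_{\mathrm{eye}}$ represent the angle of the ray at the image plane (i.e., retina), the distance between the lens of the eye and the retina, and the optical power of the lens of the eye, respectively. Substituting Equation 5 in Equation 6 gives the resultant size of the blur circle as follows:

$$D_r = D_p \left( 1 - d_r P_{\mathrm{eye}} + \frac{d_r\,(1 - d_o P_{\mathrm{ETL}})}{d_o + d_e - d_o d_e P_{\mathrm{ETL}}} \right). \tag{7}$$

Here $D_r$ is signed (negative when the marginal ray crosses the optical axis before reaching the retina); its magnitude is the diameter of the blur circle.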
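As a numerical illustration of this model, the sketch below traces the marginal ray from an on-axis object point through the ETL and the eye lens to the retina and returns the signed blur-circle diameter. The function name and all parameter values (4 mm pupil, 20 mm ETL-to-eye distance, 1/60 m lens-to-retina distance) are assumptions chosen for illustration, not values taken from the paper.

```python
def blur_circle_diameter(d_o, p_etl, p_eye, d_e, d_r, pupil):
    """Signed blur-circle diameter on the retina (metres).

    d_o   : object-to-ETL distance (m)
    p_etl : ETL optical power (D)
    p_eye : optical power of the eye lens (D)
    d_e   : ETL-to-eye distance (m)
    d_r   : eye-lens-to-retina distance (m)
    pupil : pupil diameter (m)
    """
    # Marginal ray: leaves the on-axis object point and hits the pupil edge.
    denom = d_o + d_e - d_o * d_e * p_etl
    theta_o = pupil / (2.0 * denom)           # ray angle at the object
    theta_e = (1.0 - d_o * p_etl) * theta_o   # ray angle arriving at the eye
    # Refraction by the eye lens, then propagation to the retina.
    x_r = pupil / 2.0 + d_r * (theta_e - p_eye * pupil / 2.0)
    return 2.0 * x_r


# Illustrative (assumed) parameters.
D_E, D_R, PUPIL = 0.02, 1.0 / 60.0, 0.004
# Eye accommodated to an object 1 m in front of the ETL (1.02 m from the eye).
P_EYE = 1.0 / D_R + 1.0 / 1.02

in_focus = blur_circle_diameter(1.0, 0.0, P_EYE, D_E, D_R, PUPIL)
defocused = blur_circle_diameter(1.0, 2.0, P_EYE, D_E, D_R, PUPIL)
```

With the ETL at 0 D the point is imaged sharply; raising the ETL to 2 D while the eye keeps the same accommodation moves the focus in front of the retina, so the signed diameter becomes negative and its magnitude grows.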


2.2 Blur range of human eye with ETL

Figure 4: Blur range based on Equation 8.

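For each optical power of the ETL $P_{\mathrm{ETL}}$, we can compute the range of the distance $d_o$ from the ETL to a point on the optical axis, within which real objects become blurred on the retina. The blur range is determined by two factors: accommodation and DOF. Accommodation refers to the process by which the human eye changes its optical power to maintain focus on an object as its distance varies from the far point (the maximum distance) to the near point (the minimum distance). We denote the optical powers of the eye for the far and near points as $P_{\mathrm{far}}$ and $P_{\mathrm{near}}$, respectively. For each optical power of the eye $P_{\mathrm{eye}}$, real objects are in acceptably sharp focus on the retina when they are within the DOF. If we denote the acceptable size of the CoC as $D_c$, the maximum and minimum distances of the DOF ($d_{\max}$ and $d_{\min}$, respectively) are computed by setting the signed blur circle size of Equation 7 to $-D_c$ and $+D_c$ as follows:

$$d_{\max} = \frac{d_r - K_{-} d_e}{K_{-}(1 - d_e P_{\mathrm{ETL}}) + d_r P_{\mathrm{ETL}}}, \qquad d_{\min} = \frac{d_r - K_{+} d_e}{K_{+}(1 - d_e P_{\mathrm{ETL}}) + d_r P_{\mathrm{ETL}}}, \qquad K_{\mp} = \mp\frac{D_c}{D_p} - 1 + d_r P_{\mathrm{eye}}, \tag{8}$$

where $D_p$, $d_e$, and $d_r$ are the pupil diameter, the distance between the ETL and the eye, and the distance between the lens of the eye and the retina, as in Section 2.1. When real objects are at points $d_{\max}$ and $d_{\min}$ distant from the ETL on the optical axis, the light from these objects hits the top or bottom endpoints of the acceptable CoC. Then, the blur range for an optical power of the ETL is determined by two borders illustrated in Figure 4 as:

  • $d_o$ is less than $d_{\min}$ subject to $P_{\mathrm{eye}} = P_{\mathrm{near}}$ (near border).

  • $d_o$ is larger than $d_{\max}$ subject to $P_{\mathrm{eye}} = P_{\mathrm{far}}$ (far border).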

Figure 4 shows the blur range for different ETL optical powers, assuming the reduced eye model for the human eye [reduced_eye]. We found that the near border is less than 80 mm for all positive optical powers. The proposed method requires a projector to illuminate real objects, and it is difficult for most commercially available projectors to provide such illumination when the objects are less than 80 mm away from a user’s face. Therefore, in the rest of this paper, we only consider the far border of the blur range. Figure 4 shows the ETL optical power at which a real object at a certain distance from the ETL is blurred on the retina.
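The far border can be explored numerically with a thin-lens version of the model. The sketch below solves for the object distance at which the signed blur-circle diameter reaches the acceptable CoC. Everything numeric here is an assumption for illustration (relaxed emmetropic eye of 60 D with the retina at 1/60 m, 4 mm pupil, 20 mm ETL-to-eye distance, 5 µm acceptable CoC); the paper's Figure 4 uses the reduced eye model, whose parameters are not reproduced here.

```python
import math


def signed_blur(d_o, p_etl, p_eye, d_e, d_r, pupil):
    """Signed blur-circle diameter on the retina (m) for an on-axis point."""
    denom = d_o + d_e - d_o * d_e * p_etl
    return pupil * (1.0 - d_r * p_eye + d_r * (1.0 - d_o * p_etl) / denom)


def dof_limit(sign, p_etl, p_eye, d_e, d_r, pupil, coc):
    """Object distance at which the signed blur equals sign * coc.

    sign = -1 gives the far DOF limit, sign = +1 the near DOF limit.
    Returns math.inf when no finite limit exists (e.g. far limit at 0 D).
    """
    k = sign * coc / pupil - 1.0 + d_r * p_eye
    den = k * (1.0 - d_e * p_etl) + d_r * p_etl
    if den <= 0.0:
        return math.inf
    return (d_r - k * d_e) / den


# Assumed, illustrative parameters (not from the paper).
D_E, D_R, PUPIL, COC = 0.02, 1.0 / 60.0, 0.004, 5e-6
P_FAR = 60.0  # relaxed eye focused at infinity

far_at_2d = dof_limit(-1, 2.0, P_FAR, D_E, D_R, PUPIL, COC)
far_at_0d = dof_limit(-1, 0.0, P_FAR, D_E, D_R, PUPIL, COC)
```

Under these assumptions an ETL power of 2 D blurs everything beyond roughly half a metre, while at 0 D no far border exists because the relaxed eye already focuses at infinity.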

2.3 Design guideline of illumination timing and focal range sweep

Figure 5: Illumination timings considering accommodation. The focal length of the ETL is modulated at 60 Hz. Objects to appear blurred are illuminated when the optical power of the ETL is the greatest in the modulation. The other objects to appear focused are simultaneously illuminated when the optical power is zero. These objects appear focused when the observer gazes at them. This is perceptually equal to a situation where these objects appear focused simultaneously.

Figure 4 provides an insight into human vision, i.e., a human with normal vision can accommodate to an object located anywhere from near to far distances when the optical power of the ETL is low. This means that it is sufficient to illuminate objects to appear focused only when the optical power of the ETL is zero (i.e., $P_{\mathrm{ETL}} = 0$). In this case, when a user looks at an object (and thus, focuses on it), only those objects located at the same distance are in focus at the same time. When the user looks at another object, the previously focused objects become out of focus. However, in most cases, our attention is only on the object that we are looking at; thus, a situation where the objects we look at are always in focus is perceptually equal to these objects being in focus simultaneously. Figure 4 provides another insight, i.e., an object placed at any distance can be blurred when the optical power is sufficiently large. Therefore, as a guideline for illumination timing, we illuminate objects to appear focused when the optical power of the ETL is zero and those to appear blurred when it is at its greatest value in the sweep (Figure 5).

A large optical power for blurring requires a wide focal sweep range. This leads to a long period in the optical power modulation because a typical ETL physically changes the lens thickness to modulate its optical power. Because the sweep frequency needs to be higher than the CFF in the proposed method, the sweep range cannot be increased excessively. In addition, the wider the focal sweep range is, the more the optical power changes within one frame of the projector. To illuminate real objects accurately at a desired optical power, the sweep range should not be wider than necessary.

Therefore, the focal sweep range needs to be as small as possible while remaining sufficiently large to make a target real object appear blurred. Assume a typical situation where (1) several real objects are located at different positions in a real scene, (2) the distances from these objects to the ETL are known, and (3) some of the objects are to appear blurred. We determine the sweep range in the following two steps. First, for each object to appear blurred, we compute the minimum optical power required to blur it. This value is on the “far border” of Figure 4. Then, we select the maximum optical power among these minimum values. Note that we refer to the selected optical power as $P_b$ in the rest of the paper. We determine the focal sweep range as 0 diopter (D) to $P_b + \Delta P$, where $\Delta P$ is a user-defined offset.
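The two-step selection and the per-frame power drift discussed above can be sketched as follows. This is a hedged illustration: `far_border_power` is a closed-form stand-in for the “far border” derived from a thin-lens eye with assumed parameters (relaxed emmetropic eye, 4 mm pupil, 20 mm ETL-to-eye distance, 5 µm acceptable CoC), and the drift estimate assumes an ideal sinusoidal sweep, which a real ETL may not follow exactly.

```python
import math


def far_border_power(d_o, d_e=0.02, d_r=1.0 / 60.0, pupil=0.004, coc=5e-6):
    """Minimum ETL power (D) that blurs an object d_o metres from the ETL.

    Stand-in for the far border, assuming a relaxed eye focused at infinity.
    """
    k = coc / pupil
    return 1.0 / d_o + k / (d_r + k * d_e)


def sweep_range(blur_distances, offset=0.2):
    """Step 1: minimum blur power per object. Step 2: take the maximum.

    Returns (lower, upper) limits of the focal sweep in dioptres.
    """
    p_b = max(far_border_power(d) for d in blur_distances)
    return 0.0, p_b + offset


def max_power_change_per_frame(p_max, sweep_hz=60.0, frame_s=0.001):
    """Largest power change within one projector frame for an ideal
    sinusoidal sweep between 0 and p_max (frame centred on the steepest
    part of the sinusoid)."""
    return p_max * math.sin(math.pi * sweep_hz * frame_s)


# Objects at 0.5 m and 1.3 m are to appear blurred.
p_lo, p_hi = sweep_range([0.5, 1.3], offset=0.2)
drift = max_power_change_per_frame(2.0)
```

The nearest object to be blurred dominates the result, and the drift estimate scales linearly with the sweep range, which is why a narrower sweep gives more accurate illumination timing.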

2.4 Alleviating visible seam caused by apparent scaling of real-world objects

Figure 6: Visible seam by apparent scaling of a blurred area. Left: the doughnut region (red lines) is illuminated when the ETL optical power is greater than zero. Consequently, the doughnut region appears blurred and becomes larger relative to the optical center. Right: the apparent scaling causes the gap and overlap between the doughnut and the other regions, resulting in dark and bright seams, respectively.

When we see a real-world object through typical eyeglasses that correct for either myopia or hyperopia, the apparent size of the object becomes smaller or larger. The same phenomenon occurs in the proposed system, where the ETLs are used as eyeglasses. A unique problem with the proposed system is that the apparent scaling of a real scene is not spatially uniform. More specifically, the apparent size of a real object differs spatially when different optical powers are applied to different parts of the object. For example, when the system makes only a certain area of the object appear blurred by illuminating this area when the ETL optical power is positive ($P_{\mathrm{ETL}} > 0$) and the other area when it is zero ($P_{\mathrm{ETL}} = 0$), only the blurred area becomes larger relative to the optical center. When the area illuminated at $P_{\mathrm{ETL}} = 0$ is located closer to the optical center than the area illuminated at $P_{\mathrm{ETL}} > 0$, a gap occurs between these areas and appears as a dark seam (Figure 6). On the other hand, when the two areas are in the reverse locations, these areas overlap and the overlapped area appears as a bright seam. This section describes a method to alleviate the seams.

Figure 7: Ray tracing of the composition of the ETL and eye.

First, we discuss the extent to which the apparent size of a real object is changed (i.e., scaling factor) by the ETL using a ray tracing technique. As shown in Figure 7, light rays from a real object are refracted at the ETL such that they form a real image. The human eye observes these refracted rays. The distance from the ETL to the image is obtained by the thin lens formula:


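$$\frac{1}{d_o} + \frac{1}{d_i} = P_{\mathrm{ETL}} \quad\Longrightarrow\quad d_i = \frac{d_o}{d_o P_{\mathrm{ETL}} - 1}, \tag{9}$$

where $d_o$ is the distance from the object to the ETL, $P_{\mathrm{ETL}}$ is the optical power of the ETL, and $d_i$ is the signed distance from the ETL to the image, measured toward the eye. Suppose the height of the object from the optical axis is $h_o$; that of the image, $h_i$, is obtained by the similarity of triangles and substituting Equation 9 as follows:

$$h_i = -\frac{d_i}{d_o}\,h_o = \frac{h_o}{1 - d_o P_{\mathrm{ETL}}}. \tag{10}$$

For the human eye, the visual angle of the real object is determined by a refracted ray passing through the center of the lens. Thus, the visual angle $\theta_v$ is obtained by the following equation

$$\tan\theta_v = \frac{h_i}{d_e - d_i}, \tag{11}$$

where $d_e$ is the distance between the ETL and the eye. Substituting Equations 9 and 10, the angle is computed as follows:

$$\tan\theta_v = \frac{h_o}{d_o + d_e - d_o d_e P_{\mathrm{ETL}}}. \tag{12}$$

Therefore, the scaling factor $s$ of the real object under $P_{\mathrm{ETL}} > 0$ compared to the object under $P_{\mathrm{ETL}} = 0$ can be computed as follows:

$$s = \frac{\tan\theta_v\,|_{P_{\mathrm{ETL}} > 0}}{\tan\theta_v\,|_{P_{\mathrm{ETL}} = 0}} = \frac{d_o + d_e}{d_o + d_e - d_o d_e P_{\mathrm{ETL}}}. \tag{13}$$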


Second, we describe the method to alleviate a seam caused by the apparent scaling of a part of a real object. We apply a simple feathering or blending technique. Suppose there are two areas next to each other, one without scaling (i.e., scaling factor equal to 1) and the other with scaling (i.e., scaling factor greater than 1), that are illuminated at $P_{\mathrm{ETL}} = 0$ and $P_{\mathrm{ETL}} > 0$, respectively. We can calculate the seam region as the difference between the scaled area before and after scaling using Equation 13. We illuminate the seam region using the high-speed projector at both $P_{\mathrm{ETL}} = 0$ and $P_{\mathrm{ETL}} > 0$. At $P_{\mathrm{ETL}} = 0$, the intensity of the illumination in the seam region is decreased linearly from the unscaled area (intensity=1.0) to the scaled area (intensity=0.0). At $P_{\mathrm{ETL}} > 0$, the intensity is decreased linearly from the scaled to the unscaled area to ensure that the sum of these contributions becomes 1.0 at any point in the seam region.
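The scaling factor and the complementary linear ramps can be sketched in a few lines. This is an illustrative sketch only: the scaling expression is the thin-lens result for an ETL placed a distance `d_e` in front of the eye, the 20 mm value for `d_e` is an assumption, and the 100-pixel boundary radius is a made-up example.

```python
def scaling_factor(d_o, d_e, p_etl):
    """Apparent magnification of an object d_o metres from the ETL when the
    ETL power is p_etl (D), relative to 0 D. d_e is the ETL-to-eye distance."""
    return (d_o + d_e) / (d_o + d_e - d_o * d_e * p_etl)


def seam_weights(t):
    """Illumination intensities inside the seam region.

    t = 0 at the edge next to the unscaled area, t = 1 at the edge next to
    the scaled area. Returns (intensity at 0 D, intensity at blur power).
    """
    t = min(1.0, max(0.0, t))
    return 1.0 - t, t


# Object 500 mm from the ETL, assumed 20 mm ETL-to-eye distance, 2 D blur.
s = scaling_factor(0.5, 0.02, 2.0)

# A boundary 100 px from the optical centre moves outward by this much,
# which gives the width of the seam region to be feathered.
seam_width_px = 100.0 * (s - 1.0)

ramp = [seam_weights(i / 10.0) for i in range(11)]
```

Because the two ramps are complementary, the time-integrated illumination stays constant across the seam, which is what hides both the dark gap and the bright overlap.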

3 System Evaluation

We evaluated the proposed technique using a prototype system. First, we investigated how accurately our mathematical models predicted the size of a blur circle and the blur range (Sections 3.2 and 3.3, respectively). Then, we determined if the proposed technique achieved spatial defocusing (Section 3.4). Finally, we evaluated how well visible seams between blurred and focused regions could be alleviated (Section 3.5).

3.1 Experimental setup

Figure 8: System configuration (red box: digital signal, blue box: analog signal).

We constructed a prototype system consisting of a pair of ETLs and a synchronized high-speed projector (teaser figure, panel (a)). We used polymer-based liquid lenses as the ETLs. The polymer-based ETLs achieve faster focal change than other types of ETL while maintaining a relatively large aperture size. Consequently, they have been exploited in a wide range of optical systems, from micro-scale systems, such as microscopes, to larger-scale systems, such as HMDs [Konrad:2017:ACN:3072959.3073594, 8456852, Chang:2018:TMD:3272127.3275015, 4637321] and projectors [7014259]. Specifically, we inserted two ETLs (16 mm aperture, Optotune AG, EL-16-40-TC) in an eyeglass frame fabricated with an FDM 3D printer to form a wearable (69×128×67 mm, 200 g) device (teaser figure, panel (a)). The optical power of the ETL was controlled from -10 D to 10 D by changing the electrical current. The digital signal generated by a workstation (CPU: Intel Xeon E3-1225 v5 @ 3.30 GHz, RAM: 32 GB) was input to a D/A converter (National Instruments, USB-6343) and converted to analog voltage. This voltage was then converted to an electric current by a custom amplifier circuit using an op-amp (LM675T). Finally, the current was fed to the ETL. According to the ETL’s data sheet, the input analog voltages in our system ranged from -0.07 to 0.07 V. We employed a consumer-grade high-speed projector (Inrevium, TB-UK-DYNAFLASH, 1024×768 pixels, 330 ANSI lumen) that can project 8-bit grayscale images at 1,000 frames per second. Projection images were generated by the workstation and sent to the projector via a PCI Express interface. The display timing of each projection image was adjusted by a 5 V trigger signal from the workstation via the D/A converter. The system configuration is depicted in Figure 8.
We assume that our system works in a dark environment.

Figure 9: Measured optical powers by input waves with different ranges.

The IlluminatedFocus system performed a periodic focal sweep by applying a sinusoidal wave as an input signal to the ETLs. The frequency of the wave was set to 60 Hz throughout the experiment. We applied waves of different offsets and amplitudes to the ETL and measured the resulting optical powers. We prepared 71 input voltage values (from -0.07 to 0.07 V at 0.002 V intervals) and used every combination of these values (2485 in total) as the maximum and minimum values of the input sinusoidal waves. The optical power was measured using a photodiode and a laser emitter (see our previous paper [izawa] for more details). Figure 9 shows two examples of the time series of the measured optical powers by input waves with different ranges. As shown in this figure, we found that the output values were periodic; however, they did not form clean sinusoidal waves. Therefore, we stored one period of each output wave along with the corresponding input wave in a database in order to look up the optical power at a given phase of the input wave. Once a target range of optical power was given, we determined the input wave by searching the database for the one having the narrowest range among those capable of generating the target range of the optical power.
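The database construction and lookup described above can be sketched as follows. This is a toy illustration: the record layout, the function names, and in particular the linear voltage-to-power response (`toy_response`) are hypothetical stand-ins for the measured data, and "narrowest range" is interpreted here as the narrowest input voltage range.

```python
from itertools import combinations

# 71 input voltages from -0.07 V to 0.07 V in 0.002 V steps, kept in
# millivolts as integers to avoid floating-point drift.
VOLTAGES_MV = list(range(-70, 71, 2))


def toy_response(v_mv):
    """Hypothetical stand-in for the measured ETL response (D per input mV).
    The real response was measured and is not a clean linear map."""
    return v_mv / 10.0


def build_database():
    """One record per (min, max) voltage pair of the input sinusoid."""
    return [
        {
            "v_min_mv": lo,
            "v_max_mv": hi,
            "p_min": toy_response(lo),
            "p_max": toy_response(hi),
        }
        for lo, hi in combinations(VOLTAGES_MV, 2)
    ]


def choose_input_wave(database, target_min, target_max):
    """Among waves whose output covers [target_min, target_max], return the
    one with the narrowest input range, or None if none qualifies."""
    candidates = [
        rec for rec in database
        if rec["p_min"] <= target_min and rec["p_max"] >= target_max
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["v_max_mv"] - r["v_min_mv"])


database = build_database()
best = choose_input_wave(database, 0.0, 2.0)
```

The pair count matches the 2485 combinations stated in the text, since choosing 2 of 71 values gives 71 × 70 / 2 pairs.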

To synchronize the ETLs and the high-speed projector, we used the same photodiode to measure the delay of the high-speed projector from a trigger signal of the workstation to the actual projection. As a result, we found that the delay was 0.46 ms. Using this delay information and the data in the database, we can use the projector to illuminate a real object exactly when the optical power of the ETLs reaches the target optical power. We conducted a preliminary test and determined 0.2 D as the offset value (Section 2.3).
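One way to turn the stored period and the measured delay into a trigger time is sketched below. The function name and the nearest-sample search are assumptions for illustration; the synthetic waveform is an ideal sinusoid, whereas the measured output was reported not to be a clean sinusoid.

```python
import math


def trigger_time(powers, period_s, target, delay_s=0.46e-3):
    """Time within one sweep period (s) at which to send the trigger so that
    the projection occurs when the ETL power is closest to `target`.

    powers   : optical power samples (D) over one period, evenly spaced
    period_s : sweep period (s), e.g. 1/60
    delay_s  : trigger-to-projection delay of the projector (s)
    """
    n = len(powers)
    dt = period_s / n
    # Index of the stored sample closest to the target optical power.
    i = min(range(n), key=lambda j: abs(powers[j] - target))
    # Fire early by the projector delay; wrap into the previous period.
    return (i * dt - delay_s) % period_s


# Synthetic one-period lookup table: ideal 0..2 D sweep, 1000 samples.
PERIOD = 1.0 / 60.0
N = 1000
table = [1.0 - math.cos(2.0 * math.pi * j / N) for j in range(N)]

t_blur = trigger_time(table, PERIOD, 2.0)   # projection at the sweep maximum
t_focus = trigger_time(table, PERIOD, 0.0)  # projection at 0 D
```

For the 0 D target the trigger lands near the end of the preceding period, which is why the result is wrapped with the modulo operation.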

3.2 Blur circle size

Figure 10: Blur size measurement. Top left: captured dot pattern on the surface (contrast and brightness adjusted). Top right: measurement setup. Bottom: measured radius of the blur circle. Note that the brightness of the dot pattern images is adjusted for better visibility.

We evaluated how a real scene appeared blurred using the proposed system. We measured a dot pattern on a planar surface using a camera (Sony α7S II, lens: Sony FE 24mm F1.4 GM) on which the ETL was mounted (Figure 10). We prepared five measurement distances between the surface and the camera ( mm). We measured the dot pattern with eleven different optical powers (0.0 to 5.0 D at 0.5 D intervals) under two conditions: normal and proposed. The normal condition was used as the baseline; in this condition, the dot pattern was observed with a fixed ETL optical power. In a preliminary test, we printed a black dot pattern on a sheet of white paper and observed it with different optical powers. However, due to the low contrast of the printed media, the blur circles of the dots were not observable at more than 1.0 D. Therefore, we used the high-speed projector to project a dot pattern onto a uniformly white surface in a dark room. In the normal condition, the camera’s exposure time was set to 1/60 s, the dot pattern was projected for 1 ms (i.e., one frame of the projector), and the optical power of the ETL was fixed. In the proposed condition, we applied the focal sweep to the ETL and projected the same dot pattern for 1 ms when the optical power was one of the eleven values. The input wave to the ETL was determined as discussed in Section 3.1. More specifically, we prepared ten input waves to generate target ranges of the output optical power by combining 0.0 D and the eleven optical powers (i.e., 0.0 to 0.0 D, 0.0 to 0.5 D, 0.0 to 1.0 D, …, 0.0 to 5.0 D). Note that the optical power of the ETL was actually not swept in the first target range (0.0 to 0.0 D). The camera exposure time was the same as in the normal condition.

Figure 10 shows the measured dot patterns. The appearance of the real scene (the projected dot pattern) is similar under the normal and proposed conditions. This result indicates that the proposed system successfully synchronized the ETL and the projector and achieved the same focus control of real-world appearances as a normal lens. More quantitatively, we binarized each captured image using Otsu’s method [4310076], picked the center blur circles, fit circles to the selected blur circles, and measured their radii. Figure 10 shows the measured radii under both normal and proposed conditions together with the blur sizes computed using our image formation model (Equation 7). In this result, the measured radii were slightly larger than, but very close to, the model values. The slight offset was due to the projected dots not being infinitesimal points.

Taking this into account, we consider that our image formation model well predicted the size of the blur circle in the proposed system. We computed the difference between the size of blur circles under normal and proposed conditions. The average difference was 1.4 pixels with a standard deviation (SD) of 1.8 pixels. This difference occurred because the optical power of the ETL changed within a single frame of the dot pattern projection (i.e., 1 ms); consequently, the blur circles of different sizes were accumulated in the capture process.
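The binarize-then-measure step can be illustrated with a small pure-Python sketch on a synthetic image. Note two simplifications that differ from the procedure in the text: the image is a synthetic disk rather than a captured photograph, and the radius is estimated from the foreground area (area-equivalent radius) instead of by fitting a circle. All names are illustrative.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Otsu's method: threshold maximising the between-class variance.
    Pixels strictly greater than the returned value count as foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def equivalent_radius(image, threshold):
    """Radius of the circle whose area equals the foreground pixel count."""
    area = sum(1 for row in image for p in row if p > threshold)
    return math.sqrt(area / math.pi)


# Synthetic 64x64 image: one bright disk of radius 10 px on a dark background.
SIZE, CX, CY, R = 64, 32, 32, 10
image = [
    [200 if (x - CX) ** 2 + (y - CY) ** 2 <= R * R else 10 for x in range(SIZE)]
    for y in range(SIZE)
]
threshold = otsu_threshold([p for row in image for p in row])
radius = equivalent_radius(image, threshold)
```

The area-based estimate is only reasonable for a single, isolated, roughly circular blob; fitting a circle, as done in the evaluation, is more robust when blur circles are clipped or touch each other.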

3.3 Blur range

We measured the blur range of users of our system based on a typical visual acuity test. When a participant could not distinguish the correct direction of a Landolt ring of the visual angle of one minute, we considered that the ring was located within the blur range of the participant’s eye. We printed an array of six Landolt rings in randomized directions on a piece of paper and placed it in front of the ETL. We prepared different sizes of the rings according to the distance from the ETL to maintain the visual angles of the rings as one minute. Each participant viewed the rings using their dominant eye through the ETL and reported the direction of the rings. If more than three directions were correct, we considered that the current distance was not within the blur range. The Landolt rings were placed at nine distances from the ETL ( mm). At less than 500 mm, the projected image region was too small to illuminate the Landolt rings in our setup. However, 2500 mm was a sufficiently far distance for our intended applications (Section 4). For each distance, we changed the optical power of the ETL from 0 to 2.4 D at 0.2 D intervals (i.e., 13 optical powers). Then, we recorded the maximum optical power at which each participant could distinguish the directions of more than three Landolt rings.
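The size scaling of the printed targets and the pass criterion can be written down directly. In this sketch the function names are illustrative, the viewing distance is taken as the distance to the ETL (the small ETL-to-eye distance is ignored), and the size returned is simply the length that subtends the given visual angle.

```python
import math


def size_for_visual_angle(distance_mm, arcmin=1.0):
    """Length (mm) that subtends `arcmin` minutes of arc at `distance_mm`."""
    half_angle = math.radians(arcmin / 60.0) / 2.0
    return 2.0 * distance_mm * math.tan(half_angle)


def outside_blur_range(n_correct):
    """Criterion from the text: the distance is not within the blur range
    if more than three of the six ring directions are reported correctly."""
    return n_correct > 3


size_500 = size_for_visual_angle(500.0)
size_2500 = size_for_visual_angle(2500.0)
```

Since the angle is tiny, the required size grows almost exactly linearly with distance, so the target at 2500 mm is five times larger than the one at 500 mm.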

Figure 11: Blur range measurement. Left: averages and SD of maximum optical powers at which each participant could distinguish the directions of more than three (out of six) Landolt rings. Right: plotted data with the computational model of “far border” by Equation 8 (Figure 4).

Eight participants were recruited from the local university (male: 7, female: 1, age: 22-32). Six participants were nearsighted. Their vision was corrected with the ETL by offsetting its optical power. The optical power values in the following results are adjusted to consider the offset. Throughout the experiment, each participant’s head was fixed using a chin rest. Figure 11(left) shows the average and SD of the maximum optical power values at each distance. We plotted the average values with the model “far border” (Figure 4) in Figure 11(right). This result indicates that the blur range can be accurately predicted by the proposed model (Equation 8). Therefore, in the subsequent experiments and applications, we used the model and the “far border” to compute the focal sweep range (Section 2.3).

3.4 Spatial defocusing

Figure 12: Evaluation of spatial defocusing: (a) Experimental setup. Captured results where (b) only object B appears blurred and (c) only object B appears focused.

We investigated whether or not depth-independent spatial blur control is possible in the proposed system. We placed four objects denoted A, B, C, and D at different locations as shown in Figure 12(a). The objects were 250, 500, 500, and 1300 mm, respectively, from the ETLs. According to the design guideline in Section 2.3, the focal sweep range was determined as 0 D to the upper limit defined there. Then, objects to appear focused were illuminated when the optical power of the ETLs was 0 D, and objects to appear blurred were illuminated when it was at its greatest value in the sweep.

We prepared two spatial defocusing conditions. In the first condition, only the center object appeared blurred and the other objects appeared focused. The selected optical power (Section 2.3) in this condition was the optical power on the “far border” of 500 mm (Figure 4). In the second condition, only the center object appeared focused and the other objects appeared blurred. The selected optical power in this condition was the optical power on the “far border” of 250 mm. We asked ten local participants to observe the objects through the ETLs under the two conditions. As before, the participants’ heads were fixed during the experiment. After observation of each condition, we asked the participants to identify which object appeared blurred. All the participants responded that only object B appeared blurred under the first condition and objects A, C, and D appeared blurred under the second condition.

Figure 12 shows the appearances of the objects under the two experimental conditions, captured using a camera (Ximea MQ013CG-ON) attached to one of the ETLs. To reproduce the perceived appearances, the illumination timings were different from those described above. Specifically, the projector illuminated each object that should appear focused when it was in focus for the camera with the ETL, and each object that should appear blurred when it was out of focus. For example, in the first condition, the projector illuminated objects A, B, C, and D when the focusing distances of the camera were 250, 500, 500, and 1300 mm, respectively. All participants agreed that the captured images showed appearances similar to those they had observed in the user study described above. Therefore, we confirmed that the proposed system achieved depth-independent spatial blur control. Notably, the appearances in Figure 12 cannot be produced optically with normal lens systems. These results verify the effectiveness of the proposed spatial defocusing technique.
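As a reading aid for the distances quoted in this section, a focusing distance d expressed in meters corresponds to a vergence of 1/d diopters. The helper below is only a unit-conversion sketch. Its name is hypothetical, and it does not reproduce the blur range model of the paper (Equation 8).

```python
def distance_mm_to_diopters(distance_mm):
    """Vergence in diopters (1/m) of a point located distance_mm millimeters away."""
    if distance_mm <= 0:
        raise ValueError("distance must be positive")
    return 1000.0 / distance_mm

# Object distances from the ETLs as stated in the experiment.
for name, d_mm in zip("ABCD", (250, 500, 500, 1300)):
    print(f"object {name}: {d_mm} mm -> {distance_mm_to_diopters(d_mm):.2f} D")
```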

3.5 Alleviating visible seam

We conducted an experiment to investigate the effectiveness of our method to alleviate the visible seam caused by the apparent scaling of observed real objects. We prepared eight experimental conditions (= 2 textures × 2 seams × 2 optical powers). Specifically, we used two textured surfaces (document and picture) as observed objects. Each paper was placed 500 mm away from the ETLs. As discussed in Section 2.4, there are two types of seams (gap and overlap) according to the spatial relationship of the blurred and focused areas. Therefore, two seam conditions were prepared for each texture. The width of a seam varies according to the optical power of the blurred area. To check if the method works for different seam widths, we prepared two optical powers (1 D and 2 D) for the experiment. For each condition, we compared the appearance of the surface with our method to that without our method.

The same participants (Section 3.4) observed the textured surfaces under the eight conditions. As before, each participant’s head was fixed using a chin rest throughout the experiment. After observing a pair of appearances with and without our method in each condition, the participants were asked whether the seam was alleviated by our method. All the participants answered that the proposed method alleviated the seam in all conditions.

Figure 13: Captured results of the visible seam alleviation.

Figure 13 shows captured appearances of the surfaces under all eight conditions using the same camera used in the previous experiment (Section 3.4). All participants agreed that the figure shows appearances similar to what they observed. From the captured results, we confirmed that the proposed method could successfully alleviate the visible seam caused by the apparent scaling of observed real objects.
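The idea of gradually blending the focused and blurred areas across the seam can be illustrated with a one-dimensional sketch. It assumes a linear cross-fade of the two illumination weights over a transition band. The linear ramp, the band width, and all names are illustrative assumptions rather than the blending function defined in Section 2.4.

```python
def blend_weights(n_pixels, seam_index, band):
    """Return (focused, blurred) illumination weights along one scanline.

    Pixels left of the seam belong to the focused area and pixels right of
    it to the blurred area. Inside a band of `band` pixels centered on the
    seam the weights cross-fade linearly; they sum to 1 at every pixel.
    """
    focused, blurred = [], []
    for x in range(n_pixels):
        if band > 0:
            # Position inside the band: 0 at its left end, 1 at its right end.
            u = (x - (seam_index - band / 2.0)) / band
        else:
            u = float(x >= seam_index)  # hard seam without blending
        u = min(1.0, max(0.0, u))
        blurred.append(u)
        focused.append(1.0 - u)
    return focused, blurred

focused, blurred = blend_weights(n_pixels=12, seam_index=6, band=4)
print([round(w, 2) for w in blurred])
```

In this sketch, the `focused` weights would scale the projector intensity in the time slot where the scene is in focus, and the `blurred` weights would scale it in the slot where the scene is out of focus.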

4 Applications

We developed four different vision augmentation application prototypes using the proposed IlluminatedFocus technique. In this section, we show how they worked. Please see the supplementary video for more details. Note that we used an ETL-mounted camera with a lens adapter for the recording. The adapter added a dark ring in the video images. In actual applications, a user does not see the ring.

4.1 Visual guide

Figure 14: Visual guide applications: (a) Part of a musical score sheet appears in focus to support a practice session. (b) Museum guidance where a curator explains the object by moving the focused area from its face to its body. (c) Tool selection support drawing a user’s attention to a tool by decreasing the saliency of other tools (i.e., blur and desaturation).

As the first application, we implemented three visual guidance systems that naturally direct the user’s gaze to a specific part of a real object by making that part appear focused and the other parts appear blurred. Figure 14(a) shows the first example, in which the part of a musical score to be played is made to appear focused. We assume that the musical score is scanned in advance and that the notes on the score are recognized. The focused area automatically moves over the notes so that a player can use the proposed system to practice the score while keeping a desired tempo. Figure 14(b) shows the second example. We assume a situation where a curator in a museum or a teacher in a class sequentially explains several parts of a target object to visitors or students. The curator (teacher) moves the focused area manually using a pointing device (e.g., a touch panel) according to the explanation to draw the visitors’ (students’) attention to the part being explained. Figure 14(c) shows the last example, in which a tool to be used in the next operation is made to appear focused. We assume a situation where an inexperienced person uses the proposed system to assemble a complicated electrical system. The person is not familiar with the tools and has no idea which one to use in each step. Our system supports such a situation by drawing the person’s attention to the right tool. In this application, we used a full-color high-speed projector (Texas Instruments, DLP LightCrafter 4500) and applied a radiometric compensation technique [10.1111:j.1467-8659.2008.01175.x] to make the right tool appear focused and to decrease the color saturation of the other tools. This example demonstrates the effectiveness of combining the proposed technique with SAR techniques. Overlaying a virtual arrow or characters is an alternative solution for visual guidance. However, the visibility of the overlaid information depends on the appearance of the background. Against a cluttered background, it is difficult for an observer to understand the information [6381407]. In such situations, we believe that our blur-based approach provides better visual guidance.

Figure 15: User study: (a) Experimental setup. (b) The appearances of the objects under the blur (b-1) and arrow (b-2) conditions, respectively. The leftmost object was indicated. (c) The box plots of the participants’ answers ().

We conducted a user study to investigate how users react to our artificial blur in the visual guide application. We placed five objects on a table (Figure 15(a)) and asked participants to gaze at the object indicated either by a projected arrow or by spatial defocusing (i.e., only the indicated object appears focused). Thus, there were two experimental conditions regarding the indication method (i.e., arrow condition and blur condition). The object to be gazed at was switched randomly at 2-second intervals. Each participant performed this task for 1 minute in each condition. Right after each task, the participant answered the following four questions on a 7-point Likert scale (1 = strongly yes, 7 = not at all):

Q1: Did you notice flickers?

Q2: Do you feel motion sickness?

Q3: Are you tired?

Q4: Were the indications difficult to understand?

The environment light was turned off in the experiment.

Ten participants were recruited from a local university. They saw the objects through the ETLs, which were fixed as shown in Figure 15(a). The appearances of the objects in both experimental conditions and the results are shown in Figure 15(b-1, b-2, c). For each question, we performed a paired t-test between the result in the arrow condition and that in the blur condition. We confirmed that there was a significant difference only in the fourth question (). From the results of Q1, Q2, and Q3, we confirm that the levels of the negative reactions to our artificial blur are as low as those to normal projection mapping. In addition, the result of Q4 shows that a visual guide interface based on our artificial blur is easier for a user to understand than the conventional GUI-based interface.
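For readers unfamiliar with the statistic behind a paired comparison of two conditions, the following minimal sketch computes a paired t statistic with the Python standard library only. The answer lists are placeholder values chosen for illustration and are NOT the data collected in this study. The function name is likewise hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return the t statistic and degrees of freedom of a paired t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the pairwise differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Placeholder 7-point Likert answers, NOT the data collected in the study.
arrow = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
blur = [5, 6, 4, 6, 5, 6, 4, 4, 6, 5]
t, dof = paired_t_statistic(arrow, blur)
print(f"t({dof}) = {t:.2f}")
```

The p-value would then be obtained from the t distribution with the returned degrees of freedom, for example with a statistics package.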

4.2 F+C visualization

Figure 16: F+C visualization application. See supplementary video for better understanding of the spatial layout on the desk.

As the second application, we implemented an F+C visualization system. This system helps a student studying at a desk to concentrate on reading textbooks and writing in notebooks. The system makes the textbooks and notebooks appear focused and the other areas on the desk appear blurred. Figure 16 shows some appearances of the desk in the system. From the result, we confirmed that the system successfully suppressed visual clutter on the desk and consequently provided a student with an effective learning environment where they could focus only on their studies.

4.3 Diminished reality

Figure 17: DR applications. (a) Diminishing undesirable signs of facial skin aging by blurring them out. The red and blue arrows indicate blotches and a wrinkle, respectively. (b) Blurring a dynamic object (a toy spider). The yellow arrow indicates a motion capture marker. (b-1) The spider on a black stick is not blurred. (b-2, b-3) The spider appears blurred by the system at different locations.

As the third application, we implemented a DR system. This system conceals undesirable signs of facial skin aging (e.g., blotches, pores, and wrinkles) by blurring them out. Figure 17(a) and the supplementary video show the results. We can also apply the proposed system to dynamic objects. Figure 17(b) shows an example in which the system blurs a moving toy spider. The position of the spider is measured by a motion capture system, and the system interactively moves the blurred area according to the position. The same system can be applied to other scenarios. For example, we can conceal private documents on a shared tabletop in a face-to-face collaboration. Typical DR methods can replace or fill in an undesired object with its background texture to make it completely invisible. Our system cannot completely diminish a target; however, it allows a user to see the DR result with the naked eye. Thus, it can provide a DR result that appears more real than those of typical DR methods, which only work on VST displays.

4.4 DOF enhancement

Figure 18: Enhanced DOF effect for 2D pictures. (a-1, b-1) The foreground object appears blurred. (a-2, b-2) The background object appears blurred.

For the fourth application, we explored a new vision augmentation scenario. Because focusing and defocusing affect the perceived 3D structure of a real scene, we believe that our method can affect the depth perception of a real object by blurring it. We applied our method to add an enhanced DOF effect to 2D pictures. More specifically, we made either the foreground or background of a picture appear blurred to enhance the perceived depth variation of the picture. Figure 18 shows a captured result of the application. We showed the portrait picture (Figure 18(top)) to 10 local participants in two conditions, i.e., with and without the DOF enhancement, and asked in which condition they perceived more depth variation. All participants answered that greater depth variation was perceived with the DOF enhancement. Therefore, we confirmed that the proposed method can support this vision augmentation application.

5 Discussion

Through the evaluations and application implementations described in Sections 3 and 4, we confirmed that depth-independent spatial defocusing is achieved by the proposed IlluminatedFocus technique. Although previous systems achieved the same technical goal, they applied VST-AR approaches, which prevented a user from understanding the context of a real scene and from communicating with others with good eye contact because the user’s eyes were blocked by a VST display. The proposed technique solves this problem by allowing the user to see the augmented scene with nearly naked eyes. Through the evaluation, we also confirmed that the optical characteristics of the proposed system can be well described by a mathematical model based on the typical thin lens model discussed in Section 2. Our design guideline discussed in Section 2.3 also worked well in the experiments. The proposed system is not limited to static objects. It also works for dynamic objects, as shown in Section 4. In theory, there are no geometric requirements among a viewer, projector, and objects thanks to our design guideline considering the viewer’s accommodation, and there is no constraint on the number of objects. We believe that an important contribution of this paper is establishing these valid and useful technical foundations, which will allow future researchers and developers to build their own vision augmentation applications on top of the IlluminatedFocus technique.

As discussed in previous computer graphics research [Held:2010:UBA:1731047.1731057], blur can strongly influence the perceived scale of a rendered scene: a full-size scene can look smaller, or a miniature can look bigger. Therefore, inappropriately designed blur might give a false impression of scale. In this paper, we focused on developing a spatial defocusing technology without careful consideration of human perceptual constraints. In the future, it will be crucial to investigate this issue and establish a design guideline for researchers and developers building vision augmentation applications with the IlluminatedFocus technique.

Figure 19: Feasibility test for environment light. Left: experimental setup. Right: captured appearance under a slightly dim environment light (40 lux). Only the middle object was intended to appear blurred.

A current limitation of the proposed system is its small lens aperture. The aperture of the current system is 16 mm, which limits the FOV of a user. However, the ETL industry is still emerging, and the characteristics of ETLs are improving rapidly. For example, ETLs have been getting thinner and lighter. Recently, a new ETL was announced with an aperture of 30 mm, almost twice that of the current one [Padmanabaneaav6187]. Therefore, we believe that the form factor of our system can become similar to that of typical eyeglasses in the near future, which would solve the FOV problem. Another limitation is that we assume the technique is used in a dark environment where only the projector illuminates the scene. We conducted a simple experiment to check how environment light affects the spatial defocusing result in our system. Figure 19 shows the result under slightly dim lighting (40 lux). From this result, we confirmed that the system appeared to work under environment lighting (see panel (b) of the teaser figure for the experimental setup). However, in theory, a user could see both the focused and defocused appearances of the scene. Investigating the visual perception of the user in this condition is interesting future work. Another, more technical, future direction is to build an environment in which multiple projectors cooperatively illuminate the real scene, as in a previous study [Fender:2017:MRO:3132272.3134117], so that the environment does not need to be dark.

Figure 20: Edges of a blurred area become visible when it moves fast.

The current system does not assume fast movement of a blurred area. Fast movement makes the edges of the blurred area visible. Consider a simple case where a square blurred area moves to the right (Figure 20), and assume that we illuminate the blurred area in the first half of each 1/60 s period and the focused area in the second half. In this situation, as the area moves, the right edge is illuminated for 1/60 s and the left edge is not illuminated for 1/60 s. Consequently, the edges are visible to a user. We apply blending to the edges to alleviate the visible seam as described in Section 3.5, which suppresses this effect when the movement is sufficiently slow [4538845]. However, if the movement is too fast, the edges become conspicuous to the viewer. This problem occurs in the DR application for human skin, as shown in the supplementary video. In the future, we intend to modify our design guideline to solve this problem.
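The timing argument above can be reproduced with a one-dimensional simulation. The sketch below is one possible reading of it, assuming that the blurred mask is updated once per 1/60 s period and that the window of interest spans the focused slot of one period and the blurred slot of the next. The scanline size, the shift, and the function names are illustrative assumptions.

```python
def blurred_mask(n_pixels, left, width):
    """True for pixels inside the blurred square on a 1D scanline."""
    return [left <= x < left + width for x in range(n_pixels)]

def lit_counts_across_periods(n_pixels, left, width, shift):
    """Count, per pixel, how often it is lit in a 1/60 s window that spans
    the focused slot of one period and the blurred slot of the next period,
    when the blurred area moves `shift` pixels to the right in between."""
    focused_slot = [not m for m in blurred_mask(n_pixels, left, width)]
    blurred_slot = blurred_mask(n_pixels, left + shift, width)
    return [int(f) + int(b) for f, b in zip(focused_slot, blurred_slot)]

counts = lit_counts_across_periods(n_pixels=12, left=3, width=4, shift=2)
print(counts)  # 0 = never lit in the window, 2 = lit in both slots
```

In this sketch, pixels at the trailing (left) edge receive no light during the window, and pixels at the leading (right) edge are lit in both slots. The width of each affected strip equals the shift per period, which is consistent with the edges becoming more conspicuous as the movement gets faster.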

The size of the blur circle in the proposed system is slightly larger than that in a normal lens system, as shown in Section 3.2. This is due to the integration of blur circles of different sizes within the illumination period (i.e., 1 ms in our experiment). To generate the optical power more accurately, we need to illuminate a scene for a shorter period. DLP projectors are capable of illuminating a real scene at a much higher frame rate ( 40k fps). However, the higher frame rate results in an appearance that is too dark for a user to perceive. Therefore, there is a tradeoff between the accurate optical power generation and the brightness of the scene. We can address the tradeoff by controlling the ETL more flexibly. Currently, we apply a simple sinusoidal wave as the input signal to the ETL; consequently, its optical power stays at the target power for a very short period. We can increase the period by applying a staircase modulation [izawa]. This allows a projector to illuminate the real scene for a longer period during which the size of the blur circle does not change.
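The tradeoff between accurate optical power and scene brightness can be made concrete with a small calculation. The following is a minimal sketch, assuming a sinusoidal sweep from 0 D to an upper bound, one illumination window per sweep period centered on the peak of the sweep, a 60 Hz sweep rate, and a 2 D upper bound. All of these values and the function names are illustrative assumptions. Only the 1 ms window length is taken from the text.

```python
import math

def power_drift(p_max, window_s, sweep_hz=60.0):
    """Change in optical power (D) during an illumination window centered on
    the peak of a sinusoidal sweep from 0 D to p_max."""
    half = window_s / 2.0
    return 0.5 * p_max * (1.0 - math.cos(2.0 * math.pi * sweep_hz * half))

def duty_cycle(window_s, sweep_hz=60.0):
    """Fraction of each sweep period during which the projector is on."""
    return window_s * sweep_hz

P_MAX = 2.0  # hypothetical upper bound of the sweep range, in D
for window_ms in (1.0, 0.5, 0.1):
    w = window_ms * 1e-3
    print(f"{window_ms} ms: drift {power_drift(P_MAX, w):.4f} D, "
          f"duty cycle {duty_cycle(w):.1%}")
```

Shortening the window reduces the drift of the optical power during illumination but also reduces the fraction of time the scene is lit. A staircase modulation would instead hold the power constant over the window, so the drift term would vanish without shortening the illumination.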

6 Conclusion

This paper presented the IlluminatedFocus technique, by which spatial defocusing of real-world appearances is achieved regardless of the distances from users’ eyes to observed real objects. The core of the technique is the combination of focal sweep eyeglasses using ETLs and a synchronized high-speed projector as illumination for a real scene. We described the technical details involved in achieving the spatial defocusing, including the mathematical model of the blur range, the design guideline for the illumination timing and the focal sweep range, and the technique to alleviate a visible seam between focused and blurred regions. Through the experiments, we confirmed the feasibility of our proposal for vision augmentation applications. In future work, we will investigate more robust and flexible techniques to use the method under more relaxed conditions, such as under environment lighting.

This work was supported by JSPS KAKENHI grant number JP17H04691 and JST, PRESTO Grant Number JPMJPR19J2, Japan.