Augmented Reality Oculus Rift

04/29/2016 ∙ by Markus Höll, et al. ∙ 0

This paper covers the whole process of developing an Augmented Reality Stereoscopic Render Engine for the Oculus Rift. To capture the real world in the form of a camera stream, two cameras with fish-eye lenses had to be installed on the Oculus Rift DK1 hardware. The idea was inspired by Steptoe steptoe2014presence. After the introduction, a theoretical part covers the elements necessary to build an AR system for the Oculus Rift, followed by the implementation part, where the code of the AR Stereo Engine is explained in more detail. A short conclusion section shows some results and reflects on our experiences, and the final chapter discusses future work. The project can be accessed via the git repository




1 Introduction

Augmented Reality (AR) is a modern approach to creating graphical holograms and placing them into reality. Creating virtual worlds and objects with computer graphics has been possible for a long time, but those virtual worlds were so far strictly separated from the real world, without any connection. Augmented Reality changes this co-existence by merging the virtual world and the real world into a common visualisation. It is basically an interface for creating the kind of holograms we already know from science fiction movies or games. Using this technology, we are able to extend the real world with digital information and visualisations of a virtual world.

Virtual Reality (VR), on the other hand, is a complete immersion into a virtual environment. It is achieved by completely blocking out the surrounding real world. Some key factors are very important to guarantee a deep immersion into a virtual world, such as a stereo 3D view, to which humans are accustomed from their own eyes, and proper tracking of the head rotation. Such a virtual 3D view can be achieved with special hardware, the so-called head-mounted displays (HMDs). Examples of these HMDs are the Oculus Rift, the HTC Vive and Sony’s VR.

There are already some AR prototype devices, such as Google Glass, Microsoft’s HoloLens and smartphones running AR applications. Their AR immersion is, however, limited by the hardware construction. We created an AR system which makes use of the superior immersion of a VR HMD. The Oculus Rift DK1 served as the basic hardware for our project. Mounting two fish-eye cameras (IDS uEye UI-122-1LE-C-HQ) on the front plate of the Oculus Rift allowed us to extend the VR hardware into an AR device.

In the theoretical section 2, we will have a look at the key elements of camera calibration with fish-eye lenses to capture the real world, of creating virtual holograms, and of merging both together. We will also discuss some theoretical basics of integrating the Oculus Rift into an existing graphics engine.

Section 3 covers the practical implementation of the AR stereo engine. We will discuss the implementation in more detail, explaining some important code parts like the main function, camera capturing, calibration, render loop, stereo 3D rendering and the Oculus Rift integration.

The modified AR hardware and some results are shown in chapter 4.

Lastly, chapter 5 discusses future work on which we will continue working.

2 Theory

2.1 Computer Vision - Capturing the real world

2.1.1 Camera Models, 3D-2D Mapping

Capturing real-world 3D space coordinates onto an undistorted 2D image is called a 3D-2D mapping [2]. In order to compute such an undistorted mapping, the cameras have to be calibrated beforehand to find the corresponding intrinsic camera parameters, which for a pinhole camera model are also called the camera matrix. Several different camera models are used to calibrate cameras. The pinhole camera model is the most traditional one; it assumes a perfect aperture. A proper camera calibration is therefore essential to achieve good results when computing the undistorted 3D-2D mapping.

2.1.2 Fish-Eye Lenses, Wide Field-Of-View

Traditional camera systems have a very small field-of-view (FOV) of about 45°. This limitation is problematic for capturing scene motion with a moving camera [3]. Cameras with fish-eye lenses have a very wide FOV, which makes them interesting for photography and computer vision. Figure 1 illustrates such a fish-eye lens.

Figure 1: Camera model with fish-eye lens. Image courtesy of Scaramuzza et al. [4].

Cameras with fish-eye lenses cause significantly higher image errors in the 3D-2D mapping due to the higher lens distortion. The distortion is considerably stronger at the edges than in the center. A traditional pinhole camera model is therefore not suitable for calibrating cameras with such a wide FOV.

2.1.3 Omnidirectional Camera Calibration

The omnidirectional camera model of Scaramuzza [4] finds the relation between a 3D vector and a 2D pixel for a camera that is combined with a mirror or a fish-eye lens.

The 3D-2D coordinate mapping between a 2D pixel and a 3D vector using Scaramuzza’s omnidirectional camera model is illustrated in Figure 2.

Figure 2: 3D-2D coordinate mapping using the omnidirectional camera model. Image courtesy of Scaramuzza et al. [4].

The projection can be achieved with the projection function f(p) in Equation 1, which is a polynomial function. The coefficients of the projection function f(p) are the calibration parameters, and p describes the distance from the omnidirectional image center. The degree of the polynomial can be chosen freely; according to Scaramuzza, however, the best results were obtained with a polynomial of degree 4.

$$ f(p) = a_0 + a_1 p + a_2 p^2 + \dots + a_N p^N \qquad (1) $$
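To make the role of the polynomial more concrete, the following minimal C++ sketch evaluates a projection function of the form f(p) = a_0 + a_1 p + ... + a_N p^N with Horner's scheme. The function name evalProjectionPolynomial and the coefficient ordering (ascending powers) are assumptions made for illustration; they are not taken from the OCamCalib toolbox or from the engine code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates f(p) = a[0] + a[1]*p + ... + a[N]*p^N using Horner's scheme.
// `coeffs` holds the calibration coefficients in ascending order of power,
// `p` is the distance from the image center.
double evalProjectionPolynomial(const std::vector<double>& coeffs, double p)
{
    assert(!coeffs.empty());
    double result = 0.0;
    // Walk from the highest power down to the constant term.
    for (std::size_t i = coeffs.size(); i-- > 0; )
        result = result * p + coeffs[i];
    return result;
}
```

Passing five coefficients yields a polynomial of degree 4, the degree for which the best results are reported above.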


After finding the calibration parameters, the lens distortion can be corrected by finding image point correspondences. The result is an undistorted image mapped from 3D to 2D, as illustrated in Figure 3.

Figure 3: Left: distorted camera image. Right: undistorted camera image. Image taken from [5].

2.2 Computer Graphics - Creating virtual 3D holograms

2.2.1 Architecture

Computer graphics is a key element for extending reality with holograms. Programming computer graphics differs from conventional programming because the code is executed by the GPU instead of the CPU, and the GPU’s architecture is fundamentally different from a CPU’s. CPU cores are designed to run single threads as fast as possible, with a large cache and a comparatively small arithmetic logic unit (ALU). GPUs, on the other hand, are designed for highly parallel execution. A graphics API is needed to communicate with the GPU, for example DirectX, OpenGL or Vulkan. An abstract architecture is shown in Figure 4.

Figure 4: Architecture of a graphic interface. Image taken from [6].

2.2.2 Render Pipeline

Programs running on a GPU are called shaders, which are written, for example, in HLSL when using the DirectX API or in GLSL when using the OpenGL API. Some years ago, the render pipeline was a fixed-function pipeline, which means its functionality was implemented in hardware without customization. Since the introduction of the unified shader model, many stages are individually programmable by developers using different kinds of shaders for different purposes. Each graphics engine has to implement at least one vertex shader and one fragment shader. Modern graphics engines also have additional geometry and tessellation shaders between the vertex and the fragment shader. Figure 5 shows a very rough abstraction of the render pipeline.

Figure 5: Abstraction of a basic render pipeline. Image taken from [7].

2.2.3 Application

The application stage takes care of user inputs and is executed on the CPU. Furthermore, the application stage feeds the geometry stage with geometric primitives (points, lines and triangles), as Akenine [7] points out. Virtual objects are constructed by defining vertices (points) within a space and computing polygons from them. Complex 3D models are designed beforehand using modeling programs like Blender, Maya or 3ds Max. The model files (.obj, .3ds, .fbx etc.), which contain the vertices, are then loaded through the application stage. Figure 6 shows how a simple 2D polygon would be defined on a Cartesian xy-plane.

Figure 6: Simple 2D polygon defined by 4 different vertices in the Cartesian xy-plane. Image taken from [8].

2.2.4 Geometry

The programmable vertex shader is part of the geometry stage. Some important computations in the geometry stage are the model-view transform, the plane projection and clipping. The model transform places the model’s vertex coordinates into the virtual world. The 3D scene is observed by a virtual camera, which is also placed in the world with a specific translation and rotation. The view transform moves the camera to the origin, and this transformation is then applied to all rendered models. This is done because the view computation is much easier with this approach. The view transformation is illustrated in Figure 7.

Figure 7: Model-view transformation: placing models into the virtual world and transforming the camera’s position to the origin. Image taken from [7].

The next step is projecting the observed 3D scene onto a 2D plane. According to Akenine [7], there are 2 commonly used projection methods depending on the viewing volume, namely orthographic and perspective projection. In our case, the viewing volume is a frustum, which causes objects that are farther away to appear smaller. This is achieved using perspective projection, which is shown in Figure 8.

Figure 8: Left: orthographic projection. Right: perspective projection. Image taken from [7].

This is the minimum functionality which a vertex shader has to perform per model vertex. The whole process can be done by multiplying each model’s homogeneous vertex with 3 matrices, which are called the model matrix, view matrix and projection matrix, as shown in Equation 2.
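The chain of the three matrices can be sketched in a few lines of C++. The sketch assumes the column-vector convention, in which a vertex v is transformed as projection * view * model * v; Direct3D code frequently uses the transposed row-vector convention instead, and which one the engine uses is not shown here. All type and function names (Vec4, Mat4, toClipSpace and the helpers) are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed as m[row][column]

// 4x4 identity matrix.
Mat4 identity()
{
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Translation matrix for column vectors (offset stored in the last column).
Mat4 translation(double tx, double ty, double tz)
{
    Mat4 m = identity();
    m[0][3] = tx; m[1][3] = ty; m[2][3] = tz;
    return m;
}

// Matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Matrix-vector product m * v.
Vec4 transform(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Model space -> clip space: projection * view * model * vertex.
Vec4 toClipSpace(const Mat4& model, const Mat4& view,
                 const Mat4& projection, const Vec4& vertex)
{
    assert(vertex[3] == 1.0);  // model vertices are expected as points with w = 1
    const Mat4 mvp = multiply(projection, multiply(view, model));
    return transform(mvp, vertex);
}
```

In a real engine the combined matrix would typically be computed once per model on the CPU and handed to the vertex shader, so that the shader performs only one matrix-vector multiplication per vertex.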


Clipping is used to render only objects which are (at least partially) inside the virtual camera’s frustum, which avoids unnecessary work and a loss of performance. With that, the most basic functionality of the geometry stage is roughly covered.

2.2.5 Rasterizer

The goal of this stage is to give the pixels their final color values. Post-processing computations, texture mapping, depth testing, illumination and a lot more can be done with a fragment shader. One pixel can be covered by more than one object; for each of those fragments, the computation takes place to color the pixel. Depth testing also takes place per fragment, using a z-value comparison in which each fragment’s z-value is derived from its vertices. An illustration of the polygon rasterization can be seen in Figure 9.

Figure 9: Left: Model in vertex stage. Right: Model in rasterization stage. Image taken from [9].

Texturing gives 3D polygons a more realistic appearance. The corresponding texture coordinates can be read for each visible fragment to compute a texture mapping on 3D models. Those texture coordinates are commonly defined in the model files as u and v coordinates in a 2D image space. The effect of texture mapping is shown in Figure 10.

Figure 10: Left: 3D model with and without texture mapping. Right: 2D texture. Image taken from [7].

The lighting is also computed within the fragment shader. Calculating lighting beforehand at the vertex stage makes little sense, because the computation would then also take place on vertices which are not even in the clipping space and therefore not visible to the observer. Instead, lighting is computed per fragment. The Bidirectional Reflectance Distribution Function (BRDF) of simple Lambertian shading with a perfectly diffuse reflector is given by

$$ f_r = \frac{c_d}{\pi} \qquad (3) $$

The diffuse reflectance is given by $c_d$, meaning the fragment’s color. When the surface point is illuminated by only a single light source, the radiance $L$ is given by Equation 4.

$$ L = \frac{c_d}{\pi} \, E_L \, (n \cdot s) \qquad (4) $$

Here, n is the surface normal vector of the fragment, s is the light direction from the surface point to the light source, and $E_L$ is the irradiance from the light source, meaning the light color. Figure 11 illustrates a surface illuminated by a single point light.
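As an illustration of Lambertian shading with a single light source, the following C++ sketch computes the outgoing radiance per color channel as (diffuse color / pi) * irradiance * (n · s). The clamp of the dot product to zero for surfaces facing away from the light is an addition that is common in practice but not stated in the text; the names Vec3, dot and lambertRadiance are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Radiance of a perfectly diffuse surface lit by one light source.
// diffuseColor:    diffuse reflectance (the fragment's color), per channel
// lightIrradiance: irradiance of the light (the light color), per channel
// n:               unit surface normal
// s:               unit direction from the surface point to the light
Vec3 lambertRadiance(const Vec3& diffuseColor, const Vec3& lightIrradiance,
                     const Vec3& n, const Vec3& s)
{
    const double kPi = 3.14159265358979323846;
    assert(std::abs(dot(n, n) - 1.0) < 1e-6);  // n has to be normalized
    assert(std::abs(dot(s, s) - 1.0) < 1e-6);  // s has to be normalized

    // Surfaces facing away from the light receive no light (assumed clamp).
    const double cosTheta = std::max(dot(n, s), 0.0);

    Vec3 radiance{};
    for (int i = 0; i < 3; ++i)
        radiance[i] = diffuseColor[i] / kPi * lightIrradiance[i] * cosTheta;
    return radiance;
}
```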

Figure 11: single point light illuminating a surface. Image taken from [8].

2.3 Augmented Reality - Merging both worlds together

With the real world captured and the virtual world fully created, it is now possible to merge both information streams into one visualization to achieve a first basic AR system. For this, it is necessary to render the camera stream into a dynamic 2D texture, which gets updated on each render cycle. The camera stream has to be rendered orthogonal to the screen.

To avoid a depth overlap, the camera stream is rendered without z-buffering, which means it is treated as a background plane from a computer graphics point of view. This guarantees that the created holograms are rendered visibly for the observer without colliding with the camera stream.
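The effect of rendering the camera stream without z-buffering can be imitated with a tiny software framebuffer. In this sketch the background is copied without a depth test and without writing depth, so every hologram fragment drawn afterwards passes the depth test against the still empty depth buffer. This only illustrates the idea and is not how the DirectX 11 engine is implemented; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Framebuffer {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint32_t> color;  // one packed color value per pixel
    std::vector<float> depth;          // one depth value per pixel

    Framebuffer(std::size_t w, std::size_t h)
        : width(w), height(h), color(w * h, 0u),
          depth(w * h, std::numeric_limits<float>::infinity()) {}
};

// Camera image: no depth test and no depth write, so it acts as a
// background plane that can never hide a hologram.
void drawBackground(Framebuffer& fb, const std::vector<std::uint32_t>& cameraImage)
{
    assert(cameraImage.size() == fb.color.size());
    fb.color = cameraImage;  // the depth buffer is left untouched
}

// Hologram fragment: regular "closer wins" depth test with depth write.
void drawHologramFragment(Framebuffer& fb, std::size_t x, std::size_t y,
                          float z, std::uint32_t rgba)
{
    assert(x < fb.width && y < fb.height);
    const std::size_t i = y * fb.width + x;
    if (z < fb.depth[i]) {
        fb.depth[i] = z;
        fb.color[i] = rgba;
    }
}
```

Holograms still occlude each other correctly, because the depth test stays active between hologram fragments.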

The virtual world, however, is just an empty space without a visible ground, sky or anything like that. It is just a space to place the virtual 3D models in. The merging process of a textured 3D cube hologram is shown in Figure 12.

(a) camera stream
(b) Augmented
(c) 3D hologram
Figure 12: Merging the holograms

2.4 Virtual Reality Immersion - Stereoscopic Rendering

2.4.1 3D View

With HMDs, it is possible to immerse fully into virtual worlds like computer games. HMDs are designed to give a stereoscopic view of the virtual scene, like we are used to with human eyes. This perception of depth is achieved by observing a scene from 2 different cameras, normally our eyes, which are slightly translated along the x-axis. The human eyes have an interpupillary distance (IPD) of approximately 65 mm, as stated in [10]. The Oculus Rift is designed in a way that the left eye sees the left half of the internal screen and the right eye sees the right half of the internal screen, as illustrated in Figure 13.

Figure 13: HMD’s eye view cones. Image taken from [10].

A natural 3D view of the real world is achieved by using 2 stereo cameras which are likewise separated by an IPD of about 65 mm. The camera stream, as a 2D texture, can later on be translated and adjusted accordingly to achieve a correct 3D view of the real world.

To integrate an HMD into the AR engine, it is therefore necessary to render the whole AR scene twice, alternating between the two real-world camera streams and translating the virtual camera according to the IPD of human eyes, so that the virtual scene is rendered from two different viewpoints.
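The two viewpoints can be derived from a single head position by shifting it by half the IPD along the camera's local x-axis. The following C++ sketch shows this computation; the function name eyePosition, the use of meters as the unit and the side convention (-1 for left, +1 for right) are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;

// Position of one eye, given the head center, the unit vector pointing to
// the viewer's right, the interpupillary distance and the eye side
// (-1 = left eye, +1 = right eye).
Vec3 eyePosition(const Vec3& headCenter, const Vec3& rightAxis,
                 double ipdMeters, int side)
{
    assert(side == -1 || side == +1);
    assert(ipdMeters > 0.0);
    const double offset = side * 0.5 * ipdMeters;  // half the IPD per eye
    return { headCenter[0] + offset * rightAxis[0],
             headCenter[1] + offset * rightAxis[1],
             headCenter[2] + offset * rightAxis[2] };
}
```

With an IPD of 0.064 m, for example, the left eye ends up 32 mm to the left of the head center and the right eye 32 mm to the right of it.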

2.4.2 Post-Processing, Image Distortion

However, since the Oculus Rift enhances the virtual reality immersion through a very wide FOV achieved by its lenses, the original images would show a pincushion distortion, as pointed out by Oculus [10]. To counteract that, the images have to be post-processed by applying a barrel distortion, shown in Figure 14.

Figure 14: Barrel distortion to counteract the lens-based pincushion distortion of the HMD. Image taken from [10].
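A radial distortion of this kind is often expressed as a polynomial in the squared distance from the lens center. The C++ sketch below scales a point by k0 + k1 r^2 + k2 r^4 + k3 r^6; this functional form and any coefficient values used with it are assumptions for illustration, and the distortion actually applied by the Oculus SDK may differ.

```cpp
#include <array>
#include <cassert>

struct Vec2 { double x; double y; };

// Scales a point, given relative to the lens center, by a radial polynomial
// in the squared radius: k[0] + k[1]*r^2 + k[2]*r^4 + k[3]*r^6.
Vec2 barrelDistort(const Vec2& p, const std::array<double, 4>& k)
{
    assert(k[0] > 0.0);  // a non-positive base scale would collapse or mirror the image
    const double r2 = p.x * p.x + p.y * p.y;
    const double scale = k[0] + r2 * (k[1] + r2 * (k[2] + r2 * k[3]));
    return { p.x * scale, p.y * scale };
}
```

In a post-processing shader such a function would typically be applied to the texture lookup coordinate of each output pixel. Sampling farther out towards the edges compresses the image towards the center, which gives the barrel-shaped appearance.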

2.4.3 Stereoscopic Projection Transformation

The previously used projection matrix based on a perspective projection can no longer be used for stereoscopic rendering. It is not sufficient to only translate the virtual cameras along the x-axis, because the cameras would then no longer project onto the same plane. Thus, the projection matrix has to be modified to compute a stereo projection transformation, as pointed out by Nvidia [11]. The stereo projection transformation is shown in Figure 15.

Figure 15: Stereo projection transformation for stereoscopic 3D projection. Image taken from [11].

According to Nvidia [11], a general computation of the stereo projection transformation can be achieved by modifying the normal perspective projection matrix.

Left handed row major matrix (D3D9 style):

Right handed column major matrix (OpenGL style):

where side is -1 for the left and +1 for the right eye, pij are the coefficients of the mono perspective projection, convergence is the plane where the left and right frustums intersect, and separation is the interaxial distance normalized by the virtual screen width.
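One way to see what these parameters do is the per-vertex form that Nvidia's stereo documentation commonly gives: clipPos.x += side * separation * (clipPos.w - convergence). The C++ sketch below folds this shift into a mono projection matrix. It assumes the row-vector convention (clip = v * P, as in Direct3D-style code) and input vertices with w = 1; the function names are hypothetical, and the resulting layout is not claimed to be identical to the matrices in [11].

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed as m[row][column]

// Row vector times matrix: clip = v * m.
Vec4 mulRowVector(const Vec4& v, const Mat4& m)
{
    Vec4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            r[col] += v[row] * m[row][col];
    return r;
}

// Builds a per-eye projection from a mono projection so that, for vertices
// with w = 1, clip.x is shifted by side * separation * (clip.w - convergence).
// side: -1 for the left eye, +1 for the right eye.
Mat4 stereoProjection(const Mat4& mono, int side, double separation, double convergence)
{
    assert(side == -1 || side == +1);
    Mat4 p = mono;
    // Add side * separation times the w column to the x column ...
    for (int row = 0; row < 4; ++row)
        p[row][0] += side * separation * mono[row][3];
    // ... and subtract the constant convergence term.
    p[3][0] -= side * separation * convergence;
    return p;
}
```

A vertex whose clip-space w equals the convergence value is not shifted at all, which matches the description of convergence as the plane where the left and right frustums intersect.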

2.4.4 Head Rotation, Hardware Sensor

Furthermore, the Oculus Rift’s integrated gyro sensor can be used to gain information about the rotation of the user’s head (yaw, pitch and roll). These values can be used to synchronize the head rotation with the virtual mono camera, so that the same view rotation is applied when observing the 3D holograms. The head rotation is illustrated in Figure 16.

Figure 16: Head rotation using gyro sensor from the Oculus Rift. Image taken from [10].
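How yaw and pitch change the viewing direction of the virtual camera can be sketched as follows. The sketch rotates the base direction (0, 0, 1) first by the pitch angle around the x-axis and then by the yaw angle around the y-axis. The axis assignment, rotation order and sign conventions are assumptions for illustration; the conventions of the Oculus SDK and of the engine may differ. Roll is omitted because it rotates the image around the viewing direction and does not change the direction itself.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Viewing direction after rotating (0, 0, 1) by `pitch` around the x-axis
// and then by `yaw` around the y-axis. Both angles are given in radians.
Vec3 forwardFromYawPitch(double yaw, double pitch)
{
    assert(std::isfinite(yaw) && std::isfinite(pitch));
    return { std::cos(pitch) * std::sin(yaw),
             -std::sin(pitch),
             std::cos(pitch) * std::cos(yaw) };
}
```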
Stereoscopic AR Rendering:
  1. Receive the head rotation from the gyro sensor integrated in the Oculus Rift hardware and apply the rotation to the virtual mono camera.

  2. Translate the virtual mono camera by about 32 mm along the negative x-axis and render the AR scene with the left camera stream into a 2D texture.

  3. Translate the virtual mono camera by about 32 mm along the positive x-axis and render the AR scene with the right camera stream into a 2D texture.

  4. Finally, set the swap chain’s RenderTargetView to the back buffer in order to present the upcoming stereo image on the application’s screen.

  5. Supply the Oculus SDK with the 2 stereo-rendered 2D textures and initiate the post-processing barrel distortion provided by the Oculus SDK.

3 Implementation

The code of the project is located at a public GIT repository which can be accessed via

3.1 Hardware Modification

To develop an AR system, it was of course necessary to mount both IDS uEye UI-122-1LE-C-HQ cameras in a way that allows capturing stereo images. The cameras are bolted onto a plastic plate which is wired on top of the front plate of the Oculus Rift DK1. The modified hardware is shown in Figure 17.

Figure 17: Hardware Modifications on Oculus Rift DK1

3.2 OCamCalib Matlab Toolbox

Using the OCamCalib Omnidirectional Camera Calibration Toolbox for Matlab, developed by Scaramuzza [12], it was possible to find the intrinsic camera parameters for both of the IDS uEye UI-122-1LE-C-HQ cameras used in this project.

The toolbox also implements an automatic corner detection on a checkerboard sample, shown in Figure 18.

Figure 18: Calibration - Automatic Corner Detection

A visualization integrated in the toolbox shows the extrinsic camera parameters computed from the sample images that have been taken. Figure 19 shows the OCamCalib visualization.

Figure 19: Calibration - Extrinsic Camera Parameters

3.3 Engine Design

The AR Stereo Engine has been written in C++ using DirectX 11. It basically has to handle 3 major tasks: capturing the images, creating the virtual scene, and finally merging both together. All classes are illustrated in Figure 20. The classes on the left are concerned with the camera stream, the graphics-related classes are on the right, and the Oculus integration class is in the middle. The size of each class in the figure reflects its importance, to give a better picture of the engine’s design. The core of the whole system is, however, the GraphicsAPI.

On the left side there is the ARiftControl class, which takes care of input handling and of getting images from the camera image buffers. All the low-level image capturing from the cameras had to be written manually: due to the bandwidth limitations of USB 2.0 cameras, it was not possible to stream both cameras in parallel using OpenCV over the same hub. This is what the IDSuEyeInputHandler is used for. The class directly uses IDS uEye driver functions to initialize the cameras, allocate memory for them and retrieve image frames.
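A rough estimate shows why bandwidth becomes an issue when two cameras share one USB 2.0 hub. The C++ sketch below computes the data rate of uncompressed video streams. The image size of 752 x 480 pixels is taken from the calibration parameters shown in 3.6.2; the frame rate and the number of bytes per pixel on the wire are pure assumptions, since the cameras may transfer a more compact raw format. USB 2.0 has a nominal signalling rate of 480 Mbit/s, of which only a part is usable for payload data.

```cpp
#include <cassert>

// Data rate in Mbit/s needed to stream uncompressed frames.
double requiredMbitPerSecond(int width, int height, int bytesPerPixel,
                             double framesPerSecond, int cameraCount)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    assert(framesPerSecond > 0.0 && cameraCount > 0);
    const double bitsPerFrame = 8.0 * width * height * bytesPerPixel;
    return bitsPerFrame * framesPerSecond * cameraCount / 1.0e6;
}

// True if the required data rate fits into the given bus budget (Mbit/s).
bool fitsOnBus(double requiredMbit, double busBudgetMbit)
{
    return requiredMbit <= busBudgetMbit;
}
```

Under the assumption of 4 bytes per pixel and 30 frames per second, two cameras would need roughly 693 Mbit/s, which exceeds even the nominal 480 Mbit/s of USB 2.0.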

On the right side, there are the graphics-related classes. The GraphicsAPI class is the core of the whole AR engine, which means it communicates with all other graphics classes.

The static OculusHMD class is basically the interface to the Oculus Rift. It handles the HMD’s initialization, the post-processing rendering and the head rotation synchronization.

EyeWindow is only used as additional functionality when the user is rendering with the stereo engine but without an HMD device. It takes care of placing the stereo images beside each other on the screen itself, without the post-processing barrel distortion.

RenderTexture saves the rendered scene into two 2D textures, which are later supplied to the OculusHMD instance.

Model represents the 3D computer graphics objects, which means the actual holograms.

Each Model has its own Texture object, which can be loaded to compute a texture mapping on it.

The BitMap is needed for rendering the camera stream orthogonal to the screen, which is handled differently from normal 2D textures.

The Camera class is a normal virtual camera.

HeadCamera is additionally used to transfer the rotations of the real head to the virtual camera. All the 3D view rendering is done with it.

Lighting handles the virtual illumination, which means the light direction, the color of the light as well as ambient lighting.

The Shader serves as an interface to the GPU. It holds all data necessary for the shader programs and initiates the rendering.

Figure 20: AR stereo engine code design. Image property of Markus Höll.

Since explaining the whole engine would be very complex and extensive, chapter 3.4 only gives a rough explanation of the initialization. The basic stereo render loop is then explained in chapter 3.5. If more code details are desired, chapter 3.6 gives a deeper look into the actual implementation. Even there, only some code details are shown, and they are strongly abstracted for the sake of readability. Section 3.7 lists some program configurations which can be enabled or disabled, and chapter 3.8 shows the libraries used.

3.4 Initialization

The main function basically initializes the GraphicsAPI, which is the core of the whole engine. The ARiftControl instance also gets initialized here, but only if an AR HMD device is plugged into the computer. This can be controlled via the define AR_HMD_ENABLED.

The ARiftControl instance is allocated and initiated by default. Details about ARiftControl can be found in 3.6.2.

Since the constructor of GraphicsAPI is called in line 4, all the members held inside it, like an instance of ARiftControl (to retrieve a camera image later on as a 2D texture), both of the DirectX 11 interfaces (RenderDevice, RenderContext), HeadCamera, BitMap, Model vector, Shader, Lighting and so on, are initialized with 0 in the beginning. They will be assigned correctly later on.

After that, ARiftControl is initialized properly in line 9, which is explained in more detail in 3.6.2.

 1  int main(int, char**)
 3    // DirectX Graphics and OculusHMD
 4    dx11 = new GraphicsAPI();
 5    HANDLE handle_render_thread = 0;
 6    ARiftControl cont;
 8    if (AR_HMD_ENABLED)
 9      cont.init(dx11);
11    // Activate the Graphics (DirectX11) Thread
12    handle_render_thread = CreateThread(NULL, 0,
13      directXHandling, &cont, 0, NULL);

The project was designed to be multi-threaded from the beginning, since it was clear that it would be developed further in future work. For this reason, the engine runs in a separate thread beside the main thread, which makes it easily extendable later on. This is what line 12 does: it starts the project’s actual core thread.

 1  DWORD WINAPI directXHandling(LPVOID lpArg)
 3    ARiftControl* arift_c = (ARiftControl*)lpArg;
 5    if (AR_HMD_ENABLED)
 6    {
 7      // install the Oculus Rift
 8      OculusHMD::initialization(dx11);
 9      OculusHMD::instance()->calculateFOV();
10    }
12    dx11->window_class_.cbSize = sizeof(WNDCLASSEX);
13    dx11->window_class_.style = CS_HREDRAW | CS_VREDRAW;
14    dx11->window_class_.lpfnWndProc = WindowProc;
15    dx11->window_class_.hInstance = dx11->hinstance_;
16    dx11->window_class_.hCursor = LoadCursor(NULL, IDC_ARROW);
17    dx11->window_class_.hbrBackground = (HBRUSH)COLOR_WINDOW;
18    dx11->window_class_.lpszClassName = dx11->applicationName_;
20    RegisterClassEx(&dx11->window_class_);
22    // application window
23    dx11->window_ = CreateWindowEx(NULL, dx11->applicationName_, L"DirectXRenderScene",
25      NULL, NULL, dx11->hinstance_, NULL);
29    ShowWindow(dx11->window_, SW_SHOW);
30    SetFocus(dx11->window_);
32    if (AR_HMD_ENABLED)
33      OculusHMD::instance()->configureStereoRendering();
35    // msg loop
36    while (TRUE)
37    {
38      // new msg?
39      while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
40      {
41        TranslateMessage(&msg);
42        DispatchMessage(&msg);
43      }
44      // quit
45      if (msg.message == WM_QUIT)
46        break;
48      if (msg.message == WM_CHAR)
49      {
50        // trigger ARiftControl input handling
51        arift_c->handleKey((char)msg.wParam);
52      }
53      // Run "engine" code here
54      // -----------------------
55      arift_c->camInput_->grabFrames();
56      frame_return = dx11->Frame();
57      // -----------------------
58    }
59    return msg.wParam;

Line 8 initializes the static Oculus Rift instance if the AR_HMD_ENABLED define is set, and line 9 calculates the FOV according to the hardware specification. The Oculus Rift configuration is explained in more detail in 3.6.4.

Next, the application’s main window, into which the DirectX scene will be rendered, is initialized. In line 27 the GraphicsAPI gets initialized properly and sets up all the graphics-related parts. It is recommended to look into the GraphicsAPI in section 3.6.1 for more details. Line 33 configures some parameters for the HMD stereo rendering.

The program then enters the main loop, which polls input messages and iterates the render loop. If a message other than WM_QUIT arrives in the form of a character, the input handling of ARiftControl (3.6.2) processes the corresponding input.

After that, the new camera frames are captured in line 55 and the next stereo render starts. Since the render loop is the core of the whole engine, it is explained in more detail in chapter 3.5.

3.5 Stereo Render Loop

bool GraphicsAPI::Frame()
{
  // request head rotation from the Oculus Rift hardware
  OculusMotion();

  headCamera_->SetPosition(ariftcontrol_->cameraPositionX_,
    ariftcontrol_->cameraPositionY_, ariftcontrol_->cameraPositionZ_);

  // Render the virtual scene.
  result = Render();

  return result;
}

First of all, the Oculus’ current head rotation is captured using the gyro sensor. Details can be found in chapter 3.6.4. The virtual camera’s rotation is synchronized with the Oculus rotation here. After that, the Render() method is called.

 1  bool GraphicsAPI::Render()
 5      OculusHMD::instance()->StartFrames();
 7    // [Left Eye] first pass of our rendering.
 8    result = RenderToTexture(renderTextureLeft_, 1);
10    BeginScene(0.0f, 0.0f, 0.0f, 1.0f);
12    if (!HMD_DISTORTION)
13    {
14      TurnZBufferOff();
15      RenderEyeWindow(eyeWindowLeft_, renderTextureLeft_);
16      TurnZBufferOn();
17    }
19    // [Right Eye] second pass of our rendering
20    result = RenderToTexture(renderTextureRight_, 2);
22    if (!HMD_DISTORTION)
23    {
24      TurnZBufferOff();
25      RenderEyeWindow(eyeWindowRight_, renderTextureRight_);
26      TurnZBufferOn();
27    }
30      OculusHMD::instance()->RenderDistortion();
31    else
32      EndScene();
34    return result;

In line 8 the scene is rendered for the first time, with the virtual HeadCamera translated by about 32 mm along the negative x-axis. The render target is set to the left RenderTexture instead of the back buffer. Line 14 is only reached when the post-processing barrel distortion from the Oculus SDK is not desired; in that case the rendering works differently, because presenting both eye images is handled by the engine itself, without barrel distortion. To also render the scene from the viewpoint of the right eye, line 20 calls the second render pass, where the virtual camera is translated by about 32 mm along the positive x-axis and the rendered image is stored in a RenderTexture for the right eye. How both stereo images are presented depends on the chosen engine configuration. Only if AR_HMD_ENABLED and HMD_DISTORTION are both set to 1 is the last render pass handled by the Oculus SDK, which applies the post-processing barrel distortion. If the engine is configured differently, EndScene() is called instead and the stereo image rendering is handled manually.

3.6 Code Details

3.6.1 GraphicsAPI

void GraphicsAPI::OculusMotion()
{
  float oculusMotionX, oculusMotionY, oculusMotionZ;
  OculusHMD::instance()->trackMotion(oculusMotionY, oculusMotionX, oculusMotionZ);

  headCamera_->SetRotation(-oculusMotionX, -oculusMotionY, oculusMotionZ);
}

This method receives the head rotation from the HMD’s gyro sensor and sets it as the HeadCamera’s rotation as well. Details about the implementation of the Oculus device can be found in chapter 3.6.4.

bool GraphicsAPI::RenderToTexture(RenderTexture* renderTexture, int cam_id)
{
  // set render target to RenderTexture
  renderTexture->SetRenderTarget(devicecontext_, GetDepthStencilView());
  // clear the buffer
  renderTexture->ClearRenderTarget(devicecontext_, GetDepthStencilView(), 0.0f, 0.0f, 1.0f, 1.0f);

  // render scene into RenderTexture
  result = RenderScene(cam_id);

  // set next render target to the BackBuffer
  SetBackBufferRenderTarget();

  return result;
}

This method sets the next render target to a given RenderTexture, clears the buffer and renders the scene into it. Afterwards, the render target is set back to the back buffer again.

bool GraphicsAPI::RenderScene(int cam_id)
{
  GetWorldMatrix(worldMatrix);
  GetOrthoMatrix(orthoMatrix);

  // ********** || 2D RENDERING || ***********
  TurnZBufferOff();
  result = shader_->Render(devicecontext_, bitmap_->GetIndexCount(), worldMatrix, cameraStreamMatrix, orthoMatrix,
    bitmap_->GetTexture(), undistBuffer, illumination_->GetDirection(), illumination_->GetDiffuseColor(),
    illumination_->GetAmbientColor());
  TurnZBufferOn();

  // ********** || 3D RENDERING || ***********
  // set head center to eye center offset
  headCamera_->headToEyeOffset_.position = ariftcontrol_->headToEyeOffset_;
  // Generate the view matrix
  headCamera_->RenderEye(cam_id == 1);
  // apply stereo projection transformation
  StereoProjectionTransformation(cam_id);
  // render all 3D models
  for (std::vector<Model*>::iterator model = models_.begin(); model != models_.end(); model++, i++)
  {
    result = shader_->Render(devicecontext_, (*model)->GetIndexCount(), worldMatrix, viewMatrix, stereoProjectionMatrix,
      model_tex, illumination_->GetDirection(), illumination_->GetDiffuseColor(), illumination_->GetAmbientColor());
  }
}

The method first renders the camera image as a 2D bitmap to the screen. After that, all the matrix computations are done and all 3D models are rendered.

void GraphicsAPI::StereoProjectionTransformation(int camID)
{
  Matrix4f proj = ovrMatrix4f_Projection(OculusHMD::instance()->eyeRenderDesc_[camID-1].Fov, screenNear_, screenDepth_, false);
  stereoprojectionmatrix_._11 = proj.M[0][0];
  stereoprojectionmatrix_._21 = proj.M[0][1];
  stereoprojectionmatrix_._31 = proj.M[0][2];
  stereoprojectionmatrix_._41 = proj.M[0][3];
  [...]
}

Due to stereo rendering, it is necessary to modify the normal perspective projection matrix by applying a stereoscopic projection transformation. To achieve the correct modification for the Oculus hardware, the Oculus SDK is used here to compute the stereo projection transformation, which is then copied to the internal stereo projection matrix for rendering.
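The listing assigns proj.M[0][1] to the element _21, proj.M[0][2] to _31 and so on, which suggests that the matrix delivered by the SDK is transposed while it is copied. A generic version of such a transposing copy could look like the following C++ sketch. It works on plain arrays of 16 floats in row-major order instead of the actual Oculus SDK and DirectX types, and the function name copyTransposed is hypothetical.

```cpp
#include <cassert>

// Copies a 4x4 matrix stored as 16 consecutive floats (row-major) from
// `src` to `dst` and transposes it on the way, so that the element at
// (row, col) in `src` ends up at (col, row) in `dst`.
void copyTransposed(const float* src, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    assert(src != dst);  // in-place use would overwrite values before they are read
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            dst[col * 4 + row] = src[row * 4 + col];
}
```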

3.6.2 ARiftControl

  rightCameraParameters_.Nxc = 79.4f;
  rightCameraParameters_.Nyc = 67.2f;
  rightCameraParameters_.z = -177.0f;
  rightCameraParameters_.p9 = 0.0f;
  [...]
  rightCameraParameters_.p0 = 712.870100f;
  rightCameraParameters_.c = 1.000052f;
  rightCameraParameters_.d = 0.000060f;
  rightCameraParameters_.e = -0.000032f;
  rightCameraParameters_.xc = 236.646089f;
  rightCameraParameters_.yc = 394.135741f;
  rightCameraParameters_.width = 752.0f;
  rightCameraParameters_.height = 480.0f;
  // same for leftCameraParameters
  [...]

The constructor’s primary task is to set all camera parameters obtained from the camera calibration for the left and the right camera.
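To give an idea of how such calibration parameters are typically used, the following sketch projects a 3D point into the image. It is modelled on the world-to-camera projection of the OCamCalib toolbox [12] and rests on several assumptions that this section does not confirm: that p0 to p9 are the coefficients of the inverse polynomial in ascending order, that c, d and e form the affine correction, and that xc and yc are the distortion centre. The struct and function names are hypothetical, and the parameters Nxc, Nyc and z are not used here.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical parameter struct; the field names mirror the listing above.
struct FisheyeParams {
    std::array<double, 10> p;  // assumed: inverse polynomial, p[0] is the constant term
    double c, d, e;            // assumed: affine (stretch and skew) coefficients
    double xc, yc;             // assumed: distortion centre in pixels
};

struct Pixel {
    double u;
    double v;
};

// Projects the 3D point (X, Y, Z), given in camera coordinates, into the image.
Pixel projectToImage(const FisheyeParams& cam, double X, double Y, double Z) {
    const double norm = std::sqrt(X * X + Y * Y);
    if (norm == 0.0) {
        return {cam.xc, cam.yc};  // a point on the optical axis hits the centre
    }
    const double theta = std::atan(Z / norm);
    double rho = 0.0;
    for (int i = 9; i >= 0; --i) {
        rho = rho * theta + cam.p[i];  // Horner evaluation of the polynomial
    }
    const double x = X / norm * rho;
    const double y = Y / norm * rho;
    return {x * cam.c + y * cam.d + cam.xc,
            x * cam.e + y + cam.yc};
}
```

In the OCamCalib convention the first image coordinate is the row index, which would be consistent with xc lying close to half the image height of 480 pixels in the listing above.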

void ARiftControl::handleKey(char key)
{
  lastKeyTime = std::chrono::system_clock::now();
  switch (inputMode_)
  {
    case InputMode::DEFAULT: {...}
    case InputMode::MODEL: {...}
    case InputMode::WORLD: {...}
    case InputMode::CAMERA: {...}
    case InputMode::MOVEMENT: {...}
  }
}

The input handling is organised into modes such as DEFAULT, MODEL, WORLD, CAMERA and MOVEMENT. In DEFAULT mode, all actions apply to the camera stream, for example translating both cameras or zooming in and out of the stream. MODEL mode is used to manipulate the virtual objects and translate them in the virtual 3D space, CAMERA mode does the same for the virtual mono camera, and so on.
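The mode-based dispatch can be sketched independently of the engine as follows. The SceneState struct, the key bindings 'a' and 'd' and the free function signature are hypothetical and chosen only for illustration; the actual bindings of ARiftControl are not shown in the listing.

```cpp
#include <cassert>

enum class InputMode { DEFAULT, MODEL, WORLD, CAMERA, MOVEMENT };

// Hypothetical state that the key handler modifies.
struct SceneState {
    float cameraStreamX = 0.0f;   // horizontal shift of the camera stream (DEFAULT)
    float modelX = 0.0f;          // position of the selected virtual object (MODEL)
    float virtualCameraX = 0.0f;  // position of the virtual mono camera (CAMERA)
};

// Routes the same key to a different target depending on the active mode.
// Returns true if the key was consumed.
bool handleKey(InputMode mode, char key, SceneState& state) {
    float step = 0.0f;
    if (key == 'a') {
        step = -1.0f;
    } else if (key == 'd') {
        step = 1.0f;
    } else {
        return false;  // key not bound in this sketch
    }

    switch (mode) {
        case InputMode::DEFAULT:
            state.cameraStreamX += step;
            return true;
        case InputMode::MODEL:
            state.modelX += step;
            return true;
        case InputMode::CAMERA:
            state.virtualCameraX += step;
            return true;
        case InputMode::WORLD:
        case InputMode::MOVEMENT:
            return false;  // not covered by this sketch
    }
    return false;
}
```

Keeping the mode in a single enum means that each key needs only one binding, and its effect is decided by the switch rather than by separate handlers.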

3.6.3 IDSuEyeInputHandler

bool IDSuEyeInputHandler::openCams(int left_cam, int right_cam)
{
  hcam_[0] = left_cam;
  is_InitCamera(&hcam_[0], NULL);
  is_SetColorMode(hcam_[0], IS_CM_RGBA8_PACKED);
  is_SetDisplayMode(hcam_[0], IS_SET_DM_DIB);
  is_SetExternalTrigger(hcam_[0], IS_SET_TRIGGER_SOFTWARE);
  // start capture and wait for first image to be in memory
  is_CaptureVideo(hcam_[0], IS_WAIT);
  switchAutoSensorGain(1);
  switchAutoSensorShutter(1);
  // same for right_cam
  [...]
  // add memory to cam
  is_AllocImageMem(hcam_[cam], CAMERA_WIDTH, CAMERA_HEIGHT, CAMERA_DEPTH*CAMERA_CHANNELS, &new_mem_addr, &new_mem_id);
  cam_img_mem_[cam].push_back(std::make_pair(new_mem_addr,new_mem_id));
  is_AddToSequence(hcam_[cam], cam_img_mem_[cam].back().first, cam_img_mem_[cam].back().second);
  is_SetImageMem(hcam_[cam], cam_img_mem_[cam].back().first, cam_img_mem_[cam].back().second);
  [...]
}

The IDSuEyeInputHandler::openCams() function opens the communication with the uEye vendor driver and configures the cameras, for example by setting the color mode and switching the automatic sensor gain and the automatic sensor shutter. It also allocates image memory and assigns it to the cameras.

bool IDSuEyeInputHandler::grabFrame(int cam)
{
  is_LockSeqBuf(hcam_[cam - 1], IS_IGNORE_PARAMETER, last_img_mem);
  memcpy(driver_data, last_img_mem, CAMERA_BUFFER_LENGTH);
  is_UnlockSeqBuf(hcam_[cam - 1], IS_IGNORE_PARAMETER, last_img_mem);
  if (cam == 1)  // no camera flip needed
  {
    WaitForSingleObject(cameraMutexLeft_, INFINITE);
    memcpy(cameraBufferLeft_, driver_data, CAMERA_BUFFER_LENGTH);
    ReleaseMutex(cameraMutexLeft_);
  }
  else  // camera flip needed
  {
    WaitForSingleObject(cameraMutexRight_, INFINITE);
    unsigned char *buffer = cameraBufferRight_;
    int byte_per_pixel = (CAMERA_CHANNELS * CAMERA_DEPTH) / 8;
    // start at the last pixel of the driver buffer (not one past its end) and walk backwards
    char *driver_buffer = driver_data + CAMERA_BUFFER_LENGTH - byte_per_pixel;
    for (int pixel_id = 0; pixel_id < CAMERA_WIDTH * CAMERA_HEIGHT; pixel_id++)
    {
      memcpy(buffer, driver_buffer, byte_per_pixel);
      buffer += byte_per_pixel;
      driver_buffer -= byte_per_pixel;
    }
    ReleaseMutex(cameraMutexRight_);
  }
  [...]
}

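Since one of the cameras had to be mounted upside down on the front plate of the Oculus Rift, its image has to be flipped. The loop does this by copying the pixels in reverse order, which corresponds to rotating the image by 180 degrees.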
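The pixel reversal can also be written as a small routine that does not depend on the driver or on the Windows mutex API. This is a minimal sketch assuming a tightly packed image without row padding; the function name and the use of std::vector are illustrative choices, not the engine's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Rotates a tightly packed image by 180 degrees by reversing the pixel order.
// The bytes inside each pixel keep their order, so the channels are preserved.
std::vector<unsigned char> rotate180(const std::vector<unsigned char>& src,
                                     std::size_t width, std::size_t height,
                                     std::size_t bytesPerPixel) {
    assert(src.size() == width * height * bytesPerPixel);
    std::vector<unsigned char> dst(src.size());
    const std::size_t pixelCount = width * height;
    for (std::size_t i = 0; i < pixelCount; ++i) {
        // Destination pixel i is read from source pixel (pixelCount - 1 - i).
        const std::size_t from = (pixelCount - 1 - i) * bytesPerPixel;
        std::memcpy(dst.data() + i * bytesPerPixel, src.data() + from, bytesPerPixel);
    }
    return dst;
}
```

The first destination pixel is read from the last source pixel, so the read position starts one pixel before the end of the buffer rather than at its end.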

3.6.4 OculusHMD

The OculusHMD instance communicates directly with the Oculus SDK framework and is essentially responsible for creating the device. The details of the SDK’s functions are described in the Oculus Developer Guide [10].

void OculusHMD::calculateFOV()
{
  for (int eye = 0; eye < 2; eye++)
  {
    eyeSize_[eye] = ovrHmd_GetFovTextureSize(hmd_, (ovrEyeType)eye,
      hmd_->DefaultEyeFov[eye], 1.0f);
  }
}

The method computes the texture size required for the FOV of each eye. Since the scene is rendered into two 2D textures for the Rift, these textures must have exactly this size to guarantee the desired FOV of the HMD hardware.

void OculusHMD::configureStereoRendering()
{
  ovrD3D11Config d3d11cfg;
  d3d11cfg.D3D11.Header.API = ovrRenderAPI_D3D11;
  d3d11cfg.D3D11.Header.BackBufferSize = Sizei(hmd_->Resolution.w, hmd_->Resolution.h);
  d3d11cfg.D3D11.Header.Multisample = 1;
  d3d11cfg.D3D11.pDevice = graphicsAPI_->GetDevice();
  d3d11cfg.D3D11.pDeviceContext = graphicsAPI_->GetDeviceContext();
  d3d11cfg.D3D11.pBackBufferRT = graphicsAPI_->rendertargetview_;
  d3d11cfg.D3D11.pSwapChain = graphicsAPI_->swapchain_;
  ovrHmd_ConfigureRendering(hmd_, &d3d11cfg.Config,
    ovrDistortionCap_Chromatic | ovrDistortionCap_Overdrive,
    hmd_->DefaultEyeFov, eyeRenderDesc_);
  useHmdToEyeViewOffset_[0] = eyeRenderDesc_[0].HmdToEyeViewOffset;
  useHmdToEyeViewOffset_[1] = eyeRenderDesc_[1].HmdToEyeViewOffset;
  ovrHmd_GetEyePoses(hmd_, 0, useHmdToEyeViewOffset_, eyeRenderPose_, NULL);
  ovrHmd_AttachToWindow(OculusHMD::instance()->hmd_, graphicsAPI_->window_, NULL, NULL);
  // dismiss the health and safety warning
  ovrHmd_DismissHSWDisplay(hmd_);
}

Here the graphics buffer sizes are configured for the hardware device, together with the swap chain, the render context and the render device.

void OculusHMD::StartFrames()
{
  ovrHmd_BeginFrame(hmd_, 0);
}

This calls an Oculus SDK function that plays the same role as the usual BeginScene() method of a render loop.

bool OculusHMD::RenderDistortion()
{
  ovrD3D11Texture eyeTexture[2];
  Sizei size;
  // Stereo Eye Render
  ovrRecti eyeRenderViewport[2];
  eyeRenderViewport[0].Pos = Vector2i(0, 0);
  eyeRenderViewport[0].Size = size;
  eyeTexture[0].D3D11.Header.API = ovrRenderAPI_D3D11;
  eyeTexture[0].D3D11.Header.TextureSize = size;
  eyeTexture[0].D3D11.Header.RenderViewport = eyeRenderViewport[0];
  eyeTexture[0].D3D11.pTexture = graphicsAPI_->renderTextureLeft_->renderTargetTexture_;
  eyeTexture[0].D3D11.pSRView = graphicsAPI_->renderTextureLeft_->GetShaderResourceView();
  // same for eyeRenderViewport[1] with renderTextureRight_
  [...]
  ovrHmd_EndFrame(hmd_, eyeRenderPose_, &eyeTexture[0].Texture);
  [...]
}

This method’s purpose is to supply the Oculus SDK with both rendered RenderTextures. During the call to ovrHmd_EndFrame(), the SDK also applies the barrel distortion to the image as a post-processing effect.

void OculusHMD::trackMotion(float& yaw, float& eyepitch, float& eyeroll)
{
  ovrTrackingState tracking_state = ovrHmd_GetTrackingState(hmd_, ovr_GetTimeInSeconds());
  if (tracking_state.StatusFlags & (ovrStatus_OrientationTracked | ovrStatus_PositionTracked))
  {
    OVR::Posef pose = tracking_state.HeadPose.ThePose;
    pose.Rotation.GetEulerAngles<Axis_Y, Axis_X, Axis_Z>(&yaw, &eyepitch, &eyeroll);
    yaw = RadToDegree(yaw);
    eyepitch = RadToDegree(eyepitch);
    eyeroll = RadToDegree(eyeroll);
  }
}

Using the Oculus SDK, it is straightforward to query the hardware’s current rotation state and to store it as yaw, pitch and roll, converted from radians to degrees.
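What such a decomposition involves can be sketched without the SDK. The example below extracts yaw, pitch and roll from a unit quaternion for the rotation order R = Ry(yaw) · Rx(pitch) · Rz(roll) in a right-handed coordinate frame and converts them to degrees. The Quat and EulerDeg types and the function names are hypothetical, and the sign and handedness conventions of the SDK's GetEulerAngles may differ from the ones assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Quat {
    double w, x, y, z;  // assumed to be normalised
};

struct EulerDeg {
    double yaw;    // rotation about the Y axis
    double pitch;  // rotation about the X axis
    double roll;   // rotation about the Z axis
};

constexpr double kPi = 3.14159265358979323846;

inline double radToDegree(double radians) {
    return radians * 180.0 / kPi;
}

// Decomposes q into R = Ry(yaw) * Rx(pitch) * Rz(roll).
// Near pitch = +/-90 degrees (gimbal lock) yaw and roll are not unique;
// this sketch does not treat that case specially.
EulerDeg toYawPitchRoll(const Quat& q) {
    // Only the rotation matrix entries needed for this axis order.
    const double r02 = 2.0 * (q.x * q.z + q.w * q.y);
    const double r22 = 1.0 - 2.0 * (q.x * q.x + q.y * q.y);
    const double r12 = 2.0 * (q.y * q.z - q.w * q.x);
    const double r10 = 2.0 * (q.x * q.y + q.w * q.z);
    const double r11 = 1.0 - 2.0 * (q.x * q.x + q.z * q.z);

    // Clamp against rounding errors before calling asin.
    const double sinPitch = std::max(-1.0, std::min(1.0, -r12));

    EulerDeg e;
    e.yaw = radToDegree(std::atan2(r02, r22));
    e.pitch = radToDegree(std::asin(sinPitch));
    e.roll = radToDegree(std::atan2(r10, r11));
    return e;
}
```

A pure rotation of 90 degrees about the Y axis, for example, is returned as a yaw of 90 degrees with pitch and roll equal to zero.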

3.6.5 Shader

bool Shader::InitializeShader(ID3D11Device* device, HWND hwnd, WCHAR* vsFilename, WCHAR* psFilename, WCHAR* undistShaderFilename)
{
  // Compile all 3 shader programs from file
  result = D3DCompileFromFile(vsFilename, NULL, NULL, "LightVertexShader", "vs_5_0",
    D3D10_SHADER_ENABLE_STRICTNESS, 0, &vertexShaderBuffer, &errorMessage);
  [...] // Fragment shader
  [...] // Undistortion shader
  // Fragment Shader (Texture Mapping, Illumination)
  result = device->CreatePixelShader(pixelShaderBuffer->GetBufferPointer(),
    pixelShaderBuffer->GetBufferSize(), NULL, &pixelshader_);
  [...] // Fragment Shader (Undistortion)
  [...] // Vertex Shader (Transf, Proj.)
  // 3D Vertices
  polygonLayout[0].SemanticName = "POSITION";
  [...]
  // 2D Texture Coordinates
  polygonLayout[1].SemanticName = "TEXCOORD";
  [...]
  // Normals
  polygonLayout[2].SemanticName = "NORMAL";
  [...]
  // assign the vertex layout.
  result = device->CreateInputLayout(polygonLayout, numElements, vertexShaderBuffer->GetBufferPointer(),
    vertexShaderBuffer->GetBufferSize(), &layout_);
  // set uniforms within the shader program
  result = device->CreateBuffer(&lightBufferDesc, NULL, &lightBuffer_);
  [...] // matrixBuffer
  [...] // undistortionBuffer
}

The method reads the shader code for the vertex, fragment and undistortion shaders from file, compiles it and sets the polygon layout of the 3D vertices, 2D texture coordinates and normals. It also creates the uniform buffers used by the shader programs, namely the matrixBuffer, the lightBuffer and the undistortionBuffer.

void Shader::RenderShader(ID3D11DeviceContext* deviceContext, int indexCount, bool undistort)
{
  // vertex layout
  deviceContext->IASetInputLayout(layout_);
  deviceContext->VSSetShader(vertexshader_, NULL, 0);
  if (undistort)
    deviceContext->PSSetShader(undistortionShader_, NULL, 0);
  else
    deviceContext->PSSetShader(pixelshader_, NULL, 0);
  deviceContext->PSSetSamplers(0, 1, &samplestate_);
  // render
  deviceContext->DrawIndexed(indexCount, 0, 0);
}

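The method selects the shaders to be used, choosing the undistortion shader or the regular pixel shader depending on the undistort flag, and then performs the actual rendering.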

3.7 Program Configuration

The program offers several configuration options, which are controlled by the following defines within the code:

  • OpenLabNight_DEMO - enables an animated virtual scene which has been shown at the OpenLabNight of the ICG institute

  • AR_HMD_ENABLED - configure whether an HMD is connected to the computer

  • HMD_DISTORTION - configure if the barrel distortion from the Oculus SDK is desired

  • SHOW_FPS - enable Frames-Per-Second visuals

  • FULL_SCREEN - start the program in full screen or window mode

  • RIFT_RESOLUTION_WIDTH - set the width of the HMD’s full display resolution

  • RIFT_RESOLUTION_HEIGHT - set the height of the HMD’s full display resolution

  • CAM1 - camera 1 device ID

  • CAM2 - camera 2 device ID

  • CAMERA_CHANNELS - set camera channels

  • CAMERA_WIDTH - set camera width

  • CAMERA_HEIGHT - set camera height

  • CAMERA_DEPTH - set camera bit depth

  • CAMERA_BUFFER_LENGTH - compute camera memory in bytes

  • SCREEN_DEPTH - configure the virtual far plane

  • SCREEN_NEAR - configure the virtual near plane
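How the camera defines relate to each other can be sketched as follows. The concrete values are assumptions derived from other parts of this chapter (a 752 x 480 image from the calibration parameters and four 8-bit channels from the RGBA8 color mode), and the function name is hypothetical; the actual defines in the code may be written differently.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one tightly packed camera image.
constexpr std::size_t cameraBufferLength(std::size_t width, std::size_t height,
                                         std::size_t channels,
                                         std::size_t bitsPerChannel) {
    return width * height * channels * bitsPerChannel / 8;
}

// Assumed values: 752 x 480 pixels, 4 channels (RGBA), 8 bits per channel.
static_assert(cameraBufferLength(752, 480, 4, 8) == 1443840,
              "one RGBA8 frame of 752 x 480 pixels occupies 1,443,840 bytes");
```

Deriving the buffer length from the other defines, rather than storing it as an independent constant, keeps the memory size consistent when the resolution or the color mode changes.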

3.8 Libraries

  • Windows SDK Win8.1

  • DirectX ToolKit

  • Oculus SDK 0.4.4

  • IDS uEye Driver

  • OCamCalib Matlab Toolbox

4 Conclusion

Other AR devices such as Google Glass and Microsoft’s HoloLens offer only a limited augmentation space because of their screen limitations, so the immersion achieved with the Oculus Rift is considerably more intense. Thanks to the stereoscopic view, the holograms appear as if they were really part of the real world. Since the HMD hardware completely masks out reality, the whole screen space can be augmented for both eyes, which is what produces the strong AR immersion.

Developing a new engine from scratch with DirectX, instead of using an existing graphics engine, was a good decision, since it gives full control over what the engine does and how it does it, all in C++ code. Working this close to the “hardware’s metal” provides a large degree of freedom. The decision was also made for the sake of extensibility, since it was clear from the beginning that the AR engine for the Oculus Rift would be extended with further features and algorithms.

Previously, the undistortion of the camera images was computed using the OpenCV library, but the strong distortion could not be handled fast enough on the CPU, which lowered the frame rate. That is why the undistortion is now computed in a fragment shader within the engine. The performance gain was very satisfying, with no fps drops.

Since the Oculus Rift integration is cleanly encapsulated in the code, it should be straightforward to integrate other HMDs as well if that is desired in the future.

However, there is one bottleneck with regard to extensibility, namely the basic camera capturing. Since both cameras are USB 2.0 cameras, it was not possible to stream both of them over the same USB hub using OpenCV, which would have been an easy way of streaming the cameras. The bus bandwidth was not sufficient to handle the data load of two USB 2.0 cameras streaming simultaneously. That is why it was necessary to implement the camera streaming manually, again “on the metal”, using the IDS uEye drivers. If other cameras are desired in the future, it would be advisable to choose them from the same manufacturer, since they will probably rely on the same hardware driver. Figures 21, 22, 23 and 24 show some final results from the AR stereo engine. The model files and textures shown in the sample results are taken from web sources such as TF3DM for educational use.

Figure 21: Results - Godzilla
Figure 22: Results - Starship
Figure 23: Results - Godzilla Tiny
Figure 24: Results - Multiple Holograms

5 Future Work

Since the project’s results are very satisfying, it is already clear how the engine will be extended and evolve in the future. One planned extension is the integration of the Large-Scale Direct Monocular Simultaneous Localization and Mapping (LSD-SLAM) algorithm [13]. LSD-SLAM surpasses the common augmentation method of placing patterns in the real world to determine the virtual viewpoint. The algorithm keeps track of the camera translation and rotation and internally computes a 3D point cloud of an unknown environment, which makes it possible to really move in both worlds without any pattern placement.

The second project will be the integration of a real-time hand tracking algorithm [14] for hand pose estimation. This will make it possible to interact directly with the 3D holograms using hand gestures. A hand segmentation could also be performed, which would make the interaction with the virtual objects more realistic. We are already assigned to both future projects.

Another plan is to design a 3D-printed hardware attachment that allows two cameras to be mounted on the front plate of the Oculus Rift at will. This would make it possible to share the AR attachment with other people around the world.


  • [1] W. Steptoe, S. Julier, and A. Steed, “Presence and discernability in conventional and non-photorealistic immersive augmented reality,” in Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on, pp. 213–218, IEEE, 2014.
  • [2] C. Bräuer-Burchardt and K. Voss, “A new algorithm to correct fish-eye-and strong wide-angle-lens-distortion from single images,” in Image Processing, 2001. Proceedings. 2001 International Conference on, vol. 1, pp. 225–228, IEEE, 2001.
  • [3] J. Gluckman and S. K. Nayar, “Ego-motion and omnidirectional cameras,” in Computer Vision, 1998. Sixth International Conference on, pp. 999–1005, IEEE, 1998.
  • [4] D. Scaramuzza, A. Martinelli, and R. Siegwart, “A flexible technique for accurate omnidirectional camera calibration and structure from motion,” in Computer Vision Systems, 2006 ICVS’06. IEEE International Conference on, pp. 45–45, IEEE, 2006.
  • [5] H. Li and R. Hartley, “A non-iterative method for correcting lens distortion from nine point correspondences,” OMNIVIS 2005, vol. 2, p. 7, 2005.
  • [6] E. Lengyel, Mathematics for 3D game programming and computer graphics. Cengage Learning, 2005.
  • [7] T. Akenine-Möller, E. Haines, and N. Hoffman, Real-time rendering. CRC Press, 2008.
  • [8] J. Vince, Mathematics for computer graphics. Springer Science & Business Media, 2013.
  • [9] K. Proudfoot, W. R. Mark, S. Tzvetkov, and P. Hanrahan, “A real-time procedural shading system for programmable graphics hardware,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 159–170, ACM, 2001.
  • [10] Oculus VR LLC, Oculus Developer Guide.
  • [11] Nvidia, “Stereoscopic 3d demystified: From theory to implementation in starcraft 2,” 2011.
  • [12] D. Scaramuzza, A. Martinelli, and R. Siegwart, “A toolbox for easily calibrating omnidirectional cameras,” in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pp. 5695–5701, IEEE, 2006.
  • [13] J. Engel, T. Schöps, and D. Cremers, “Lsd-slam: Large-scale direct monocular slam,” in Computer Vision–ECCV 2014, pp. 834–849, Springer, 2014.
  • [14] M. Oberweger, P. Wohlhart, and V. Lepetit, “Training a feedback loop for hand pose estimation,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 3316–3324, 2015.