The history of Computer Graphics has several examples of important hardware milestones. They changed the way real-time algorithms could be designed and implemented, and created vast opportunities for advances in research.
One aspect of graphics cards that has been advancing consistently in the last decades is shader flexibility. In the beginning, graphics libraries used a fixed rendering pipeline, which could only receive data and instructions from the CPU. No GPU-side programming could be done at that time. This changed later with the advent of programmable shaders. Vertex and Pixel Shaders were introduced, creating a revolution in the possibilities for real-time graphics. Later on, those capabilities were increased with the exposure of more programmable rendering stages. Applications could implement Tessellation and Geometry Shaders to gain access to customizable geometry resolution and primitive connectivity.
With the increase in shader flexibility, several algorithms were proposed to solve general problems using graphics hardware. The technique was to adapt the problem description to fit the rendering pipeline and the Single Instruction Multiple Data (SIMD) model that shaders use. The next step in graphics hardware was clear: a model generalization. The result was called General Purpose Graphics Processing Units (GPGPU), a unified way to perform general parallel computing using buffers in graphics memory and programming languages created specifically for that purpose [18, 24]. Examples of such languages include CUDA [25, 32, 8], OpenCL [23, 34, 22] and OpenACC.
However, GPGPU applications are inherently hard to develop. Unlike shaders in the rendering pipeline, they need explicit synchronization and memory control. This more generic model came at the cost of complexity. Nonetheless, it was explored in a vast set of problems, including but not limited to Collision Detection and Response [17, 9], Physically-based Simulation, Fluid Dynamics, Global Illumination, Image and Video Processing [39, 7], Segmentation [26, 30], High Performance Computing and Clusters [10, 14], Signal Processing [1, 15], Cryptography, Cryptocurrency Mining, Databases, Big Data and Data Science. All those applications proved the model robust.
Eventually, the time came to migrate from hard-to-develop GPGPU to application-specific platforms for the most interesting and important problems. Machine Learning was already experiencing a revolution at that time, with exciting results coming from the association of graphics hardware and deep neural networks. The first retail graphics cards with application-specific hardware had dedicated sections for deep neural network training and inference, called tensor cores. Additionally, frameworks created on top of GPGPU libraries provided a simpler API for development.
Recently, the same approach was used to create a solution for real-time Global Illumination. The so-called RTX platform can produce faithful images using Ray Tracing (RT), which historically has had prohibitive performance for real-time applications. This landmark creates interesting opportunities for new visualization applications. In particular, content makers for Virtual Reality (VR) can greatly benefit from the added realism to create immersive, meaningful experiences.
Thus, the demand for an integrated VR/RT solution is clear. However, realistic VR needs stereo images for the parallax sensation. The obvious consequence is a duplication of the performance hit caused by ray tracing. A good algorithm should balance performance and image quality, something that can be done using RTX Ray Tracing and a proper trace policy. The recent announcement of ray tracing support for older architectures emphasizes even more the necessity of a flexible algorithm for this task. Another point that must be taken into consideration is stereo camera registration. Depending on how the ray directions are calculated from the camera parameters, the stereo images may diverge when seen in a head-mounted display (HMD).
This paper discusses the problem of integrating VR and RT, proposing a flexible solution. Section II describes the technological background used. Section III contains the details of the components needed for the VR/RT integration, which is described in depth in Section IV. Section V contains the evaluation of the experiments. Finally, Section VI concludes the paper.
II-A RTX Ray Tracing
NVIDIA RTX is a hardware and software platform with support for real-time ray tracing. The ray tracing code of an application using this architecture consists of CPU host code, GPU device code, the memory used to transfer data between them, and the Acceleration Structures for fast geometry culling when intersecting rays with scene objects.
Specifically, the CPU host code manages the memory flow between devices, sets up, controls and spawns GPU shaders, and defines the Acceleration Structures. On one hand, the bottom-level Acceleration Structure contains the rendering primitives (triangles, for example). On the other hand, the top-level Acceleration Structure is a hierarchical grouping of bottom-level ones. Finally, the GPU role is to run instances of the ray tracing shaders in parallel. This is similar to the well-established rasterization rendering pipeline.
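The two-level organization can be sketched in a few lines. The following illustrative Python snippet (hypothetical data structures, not the RTX API) shows why the split pays off: an entire bottom-level structure is culled with a single ray/box test during top-level traversal.

```python
# Illustrative sketch of a two-level acceleration structure (not the
# RTX API): the top level groups bottom-level triangle lists behind
# axis-aligned bounding boxes used to cull ray/primitive tests.

def aabb(tris):
    """Axis-aligned bounding box of a list of triangles."""
    pts = [p for t in tris for p in t]
    lo = tuple(min(p[i] for p in pts) for i in range(3))
    hi = tuple(max(p[i] for p in pts) for i in range(3))
    return lo, hi

def ray_hits_aabb(origin, direction, box, tmax=1e9):
    # standard slab test; zero direction components handled via infinities
    t0, t1 = 0.0, tmax
    lo, hi = box
    for i in range(3):
        inv = 1.0 / direction[i] if direction[i] != 0.0 else float("inf")
        a, b = (lo[i] - origin[i]) * inv, (hi[i] - origin[i]) * inv
        t0, t1 = max(t0, min(a, b)), min(t1, max(a, b))
    return t0 <= t1

# bottom level: triangle lists; top level: (box, blas) pairs
blas_a = [((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))]
blas_b = [((10.0, 10.0, 5.0), (12.0, 10.0, 5.0), (11.0, 12.0, 5.0))]
tlas = [(aabb(b), b) for b in (blas_a, blas_b)]

def candidates(origin, direction):
    # traversal skips whole bottom-level structures whose box is missed
    return [b for box, b in tlas if ray_hits_aabb(origin, direction, box)]
```

A ray shot down the +z axis from the origin only reaches the triangles of the first bottom-level structure; the second is rejected at the top level.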
Historically, GPUs process data in a predefined rendering pipeline, which has several programmable and fixed processing stages. The main idea is to start with a group of stages that process the vertices, feeding a fixed Rasterizer, which in turn generates data for pixel processing in another stage group. Finally, the resulting image is output by the final fixed stage.
Currently, programmable shaders are highly flexible. The Vertex Shader works on the input vertices, using transformation matrices to map them to other spaces. The Hull, Tessellator and Domain Shaders subdivide geometry and add detail inside graphics memory, optimizing performance. The Geometry Shader processes primitives and mesh connectivity, possibly creating new primitives in the process. The Fragment Shader works on the pixels coming from the Rasterizer, and the Output stage outputs the resulting image. Figure 1 shows the rendering pipeline in detail.
The ray tracing GPU device code runs in a similar pipeline scheme. The differences are the stages taken. The goal of the first stages is to generate the rays. Afterwards, a fixed intersection stage calculates the intersection of the rays with the scene geometry. Then, the intersection points are reported to the group of shading stages. Notice that more rays can be created at this point, resulting in a recursion in the pipeline. The final fixed stage outputs the generated image.
The details of the pipeline are as follows. A Ray Generation Shader is responsible for creating the rays, which are defined by their origins, directions and payloads (custom user-defined data). A call to TraceRay() launches a ray. The next stage is a fixed traversal of the Acceleration Structure, which is defined by the CPU host code beforehand. The Acceleration Traversal uses an Intersection Shader to calculate the intersections. Every hit found is tested to verify whether it is the closest hit or whether it must be ignored because of a transparent material. In case a transparent material is detected, the Any-Hit Shader is called for all hits so shading can be accumulated, for example. After no additional hits are found, the Closest-Hit Shader is called for the closest intersection point. In case no hits are found, the Miss Shader is called as a fallback. It is important to note that additional rays can be launched in the Closest-Hit and Miss Shaders. Figure 2 shows an in-depth pipeline scheme. More detailed information about RTX Ray Tracing, as well as applications, can be found in the literature.
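The control flow just described can be mimicked in a short, illustrative Python sketch (hypothetical function names, not the DXR HLSL API): the fixed traversal gathers candidate hits, the closest one triggers the Closest-Hit stand-in, and an empty hit list falls through to the Miss stand-in.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    # the Intersection stage stand-in: smallest positive t, or None
    oc = [origin[i] - center[i] for i in range(3)]
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c          # direction assumed normalized
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None

def closest_hit(t):                 # Closest-Hit Shader stand-in
    return ("hit", t)

def miss():                         # Miss Shader stand-in (fallback)
    return ("miss", None)

def trace_ray(origin, direction, spheres):
    hits = []
    for center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None:
            hits.append(t)
    return closest_hit(min(hits)) if hits else miss()

# two spheres along +z; the nearer one should win the closest-hit test
scene = [((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 9.0), 1.0)]
```

A ray down +z reports the nearer sphere at t = 4; a ray down +y hits nothing and takes the miss path.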
RTX Ray Tracing can be accessed in four ways. On one hand, there are the low-level APIs Vulkan, DirectX 12 and OptiX. They provide more flexibility but less productivity. On the other hand, there is Falcor, an open-source real-time rendering framework designed specifically for rapid prototyping. It has support for ray tracing shaders and is the recommended way to use RTX Ray Tracing in a scientific environment.
Falcor code and installation instructions can be found at https://github.com/NVIDIAGameWorks/Falcor. The bundle comes with a Visual Studio solution structured in two main components: a library project called Falcor, with high-level components, and Sample projects, which use those components to perform computations, implement effects or provide tools for other supportive purposes.
Each Sample project consists of at least a main class inheriting from Renderer and a Data folder. The Renderer class defines several relevant callbacks which can be overridden as necessary, such as onLoad(), onFrameRender(), onGuiRender() and onMouseEvent(). The Data folder is where non-C++ files necessary for the Sample (usually HLSL Shaders) should be placed. Falcor automatically copies them at compilation time to the binary's folder so programs have no access problems.
III Ray Tracing VR
Our goal is to build a renderer for VR capable of both stereo rendering and ray tracing. For this purpose, we will exploit the functionalities of Falcor that provide support for Stereo Rendering, Simple Ray Tracing and Global Illumination Path Tracing.
Falcor is designed to abstract scene and Acceleration Structure setup so our focus will be on describing Shader code and the CPU host code to set it up. The next subsections explain the logic for three Falcor Samples with the objective of using their components later on as building blocks for our new renderer. We refer to code in the Falcor Samples, so it is advisable to access it in conjunction with this section for a better understanding.
III-A Simple Ray Tracer
HelloDXR is a simple ray tracer with support for mirror reflections and shadows only. As would be expected, the Sample specifies two ray types: primary and shadow.
The ray generation shader rayGen() is responsible for converting pixel coordinates to ray directions. This is done by a transformation to normalized device coordinates, followed by another transformation using the inverse view matrix and the camera field of view. The function TraceRay() is used to launch the rays. The ray type index and the payload are provided as parameters.
The Closest-Hit Shader primaryClosestHit() calculates the final color of the pixel. It has two components: an indirect reflection color and a direct color. The reflection color is calculated by getReflectionColor(), which reflects the ray direction using the surface normal and shoots an additional ray in that direction. The payload has a ray depth value used to limit recursion. The direct color is the sum of the contributions of each light source at the pixel, conditioned by the shadow check checkLightHit(). If the light source is not occluded, the contribution is calculated by evalMaterial(), a Falcor built-in function to shade pixels based on materials.
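The reflection logic can be sketched as follows (an illustrative Python rendition with hypothetical names; the actual HelloDXR shader is HLSL): the incoming direction is mirrored about the surface normal, and the depth value carried in the payload stops the recursion.

```python
MAX_DEPTH = 3   # assumed recursion limit; the Sample's actual value may differ

def reflect(d, n):
    # r = d - 2 (d . n) n, with n assumed to be unit length
    dn = sum(d[i] * n[i] for i in range(3))
    return tuple(d[i] - 2.0 * dn * n[i] for i in range(3))

def reflection_color(direction, normal, depth, shoot_ray):
    # the payload carries `depth`; past the limit, no more rays are shot
    if depth >= MAX_DEPTH:
        return (0.0, 0.0, 0.0)
    return shoot_ray(reflect(direction, normal), depth + 1)
```

For instance, a ray arriving head-on at a surface facing it bounces straight back, and a grazing ray has only its normal component flipped.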
III-B Path Tracer

The PathTracer Sample implements the Path Tracing algorithm. It uses a RenderGraph to chain four rendering steps: G-Buffer rasterization, global illumination, accumulation and tone mapping. The graph is defined in CPU host code, using addPass() and addEdge() to create the passes and link their inputs and outputs, respectively.
Each pass has its own Shaders. The G-Buffer pass uses the rasterization pipeline to output shading data to a set of textures. More specifically, the built-in function prepareShadingData() is used in a Pixel Shader to fetch and sample the material, whose data is output to the G-Buffers. Falcor's default fallback Vertex Shader is used in this step.
The global illumination pass calculates direct and indirect contributions as well as shadows. It uses a ray generation shader and two ray types: indirect and shadow. Each type is defined in host code by a hit group (a closest-hit and an any-hit shader) and a miss shader. The setup is done using a descriptor and calling addHitGroup() and addMiss() respectively. Method setRayGen() is called to define the ray generation shader as well. Each ray type must have a unique index, which is referenced in the shaders. Shadow rays have index 0 and indirect ones have index 1.
The ray generation shader controls the path tracing. Briefly, the coordinates of each pixel are used to compute random number generator seeds, which are used to calculate random directions for indirect rays. The indirect sample is chosen randomly from the diffuse hemisphere or the specular direction. An analogous idea is used for direct lighting as well, which is computed by randomly choosing a light source to check for visibility. As shown in the literature, this integration converges to the complete evaluation of the illumination of the scene.
Now, the Hit Group and Miss Shaders are described. The shadow ray Shaders are very simple. The Miss Shader sets the ray visibility factor to 1 from the default 0, which means that the ray origin is visible to the light and should be lit by it. The Any-Hit Shader just checks if the hit point has a transparent material using the built-in evalRtAlphaTest() function. If it is a transparent material, the hit is ignored so the ray can continue its path. There is no Closest-Hit Shader for shadows, since the visibility factor should change only if no objects are hit.
Analogously, indirect rays have their own Shaders. The main difference between the ray types is the Closest-Hit Shader, which calculates the direct light at the intersection point and shoots an additional ray in case the depth is below the global threshold. The Miss Shader samples a color from the environment map, indexed by the ray direction, and the Any-Hit Shader is equal to the shadow ray's.
The accumulation pass is also very simple. Its CPU code maintains a texture with the previous frame and ensures that accumulation is done only when the camera is static. This texture is accumulated with the current frame coming from the global illumination pass. The accumulation consists of an incremental average.
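The incremental average amounts to one update per pixel per frame. A minimal Python sketch (illustrative; Falcor's pass operates on textures on the GPU):

```python
def accumulate(prev_avg, new_frame, n):
    """Running average after folding the n-th frame (n >= 1) into prev_avg."""
    return [p + (x - p) / n for p, x in zip(prev_avg, new_frame)]

# three noisy "frames" of two pixels each, averaged incrementally
frames = [[0.0, 8.0], [2.0, 4.0], [4.0, 0.0]]
avg = frames[0]
for n, frame in enumerate(frames[1:], start=2):
    avg = accumulate(avg, frame, n)
```

The update is equivalent to the plain mean of all frames seen so far, because avg_n = avg_{n-1} + (x_n - avg_{n-1})/n, without storing any frame but the current average.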
Finally, Falcor built-in tone mapping is used to adjust the colors of the image. Class ToneMapping abstracts this pass.
III-C Stereo Rendering

The StereoRendering Sample is an application that renders stereo image pairs using rasterization. The CPU host code ensures the connection with the HMD (initVR()), issues the Shaders to generate the images and transfers them to the device (submitStereo()). Specifically, it maintains a struct containing the camera matrices and properties for both eyes. The geometry is drawn once, but it is duplicated inside the GPU by the Shaders. A frame buffer array with two elements is maintained for that purpose (mVrFbo). When a frame finishes, each array slice has the view of one eye.
The GPU code consists of a Vertex Shader, a Geometry Shader and a Pixel Shader. The Vertex Shader (StereoRendering.vs.hlsl) simply passes along the vertex positions in world coordinates and additional rendering info such as normals, colors, bitangents, and texture and light map coordinates, if available.
The projection is left to the Geometry Shader (StereoRendering.gs.hlsl), which is also responsible for duplicating the geometry. It receives as input three vertices of a triangle and outputs six vertices. Each input vertex is projected twice, once for each of the view-projection matrices available at the camera struct. The geometry for each eye is output into the related array slice by setting a render target index at struct GeometryOut.
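The duplication step can be sketched as follows (illustrative Python with plain matrices; the Sample does this in HLSL): three input vertices become six output vertices, each projected with the matrix of its eye and tagged with that eye's render target index.

```python
def project(m, v):
    # multiply a row-major 4x4 matrix by (x, y, z, 1) and divide by w
    x, y, z = v
    out = [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4)]
    return tuple(c / out[3] for c in out[:3])

def duplicate(triangle, view_proj_left, view_proj_right):
    vertices = []
    for eye, m in enumerate((view_proj_left, view_proj_right)):
        for v in triangle:
            vertices.append((project(m, v), eye))  # eye = render target index
    return vertices

IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# toy "right eye" matrix: the identity with a one-unit x translation
RIGHT = [row[:] for row in IDENTITY]
RIGHT[0][3] = 1.0

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
out = duplicate(tri, IDENTITY, RIGHT)
```

The render target index plays the role of the array-slice selector in GeometryOut, routing each projected copy to the frame buffer slice of its eye.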
Finally, the Pixel Shader (StereoRendering.ps.hlsl) is very simple. It just samples material data using the built-in function prepareShadingData() and accumulates the contributions of each light source using the built-in function evalMaterial().
Our goal is to develop a new renderer that combines the capabilities described in the previous section. It should be capable of stereo rendering and ray tracing in real time. This section describes the process and the possible choices and alternatives to address the problems encountered.
IV-A Stereo Convergence
One key problem of integrating VR and RT is the stereo image registration. Depending on how this process is done, the images may diverge and it can be impossible for the human vision to focus correctly on the scene objects. This phenomenon may result in viewer discomfort or sickness.
To understand the ray generation process, it is helpful to think about perspective projection and the several related spaces it involves. The literature contains an exceptional explanation of this topic, which is summarized here.
Conceptually, the process consists of a chain of transformations starting at the world space, passing through the camera space and the normalized device coordinate space and ending at the raster space. The camera space is the world space with a translated origin to the camera position. The normalized device coordinate space is the camera space with the near and far planes transformed. The near plane is at the square with top-left corner at (0,0,0) and bottom-right corner at (1,1,0) and the far plane is at the square with top-left corner at (0,0,1) and bottom-right corner at (1,1,1). Finally, the raster space is the normalized device coordinate space scaled by the image resolution. Figure 3 shows how the spaces relate to each other.
The transformation for a projection camera can be constructed in two steps. First, a canonical perspective matrix is built from the distance $n$ to the near plane and the distance $f$ to the far plane. The projected coordinates $x'$ and $y'$ are equal to the original ones divided by the $z$ coordinate. $z$ is remapped so the values in the near plane have $z' = 0$ and the values in the far plane have $z' = 1$:

$$x' = \frac{x}{z}, \qquad y' = \frac{y}{z}, \qquad z' = \frac{f(z - n)}{z(f - n)}$$
This operation can also be encoded as a matrix using homogeneous coordinates:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{f}{f - n} & -\frac{fn}{f - n} \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
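A small Python check of the canonical perspective transformation (illustrative near/far values, following the standard construction): depth is remapped so that $z = n$ maps to 0 and $z = f$ maps to 1, while $x$ and $y$ are divided by $z$.

```python
n, f = 1.0, 100.0   # assumed near/far distances for the check

def persp(x, y, z):
    # third row of the canonical matrix: z' = z * f/(f-n) - f*n/(f-n)
    zp = z * f / (f - n) - f * n / (f - n)
    w = z               # fourth row: w' = z, realizing the division by z
    return x / w, y / w, zp / w

near_depth = persp(0.0, 0.0, n)[2]   # a point on the near plane
far_depth = persp(0.0, 0.0, f)[2]    # a point on the far plane
```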
As a side note, the original position of the projection planes would be important for rasterization because the map from $z$ to $z'$ is not linear, which could possibly result in numerical issues at the depth test, for example. However, we are only interested in the projection directions for ray tracing, thus those distances can be totally arbitrary.
The second step is scaling the matrix so points inside the field of view map to coordinates between $-1$ and $1$ on the view plane. For square images, both $x'$ and $y'$ lie in the expected interval after projection. Otherwise, the direction in which the image is narrower maps correctly, and the wider direction maps to a proportionally larger range of screen space values. The scaling factor that maps the wider direction to the range $[-1, 1]$ can be computed using the tangent of half of the field of view angle. More precisely, it is equal to

$$\frac{1}{\tan\left(\frac{\mathrm{fov}}{2}\right)},$$

as can be seen in Figure 4.
To launch rays from pixels we use the inverse transformation chain. We start at the raster space, pass through the normalized device coordinate and camera spaces, and end at the world space. More specifically, to compute a ray direction we must convert the raster coordinates of its associated pixel to normalized device coordinates, scale by the reciprocal of the factor used to map the field of view to the range $[-1, 1]$, and use the inverse view transformation matrix to map the result to the world space. The conversion from raster coordinates $(x_r, y_r)$ to normalized device coordinates $(x_n, y_n)$, given the image dimensions $(w, h)$, is expressed by the following equation:

$$x_n = 2\,\frac{x_r}{w} - 1, \qquad y_n = 2\,\frac{y_r}{h} - 1,$$

which is composed of a normalization by the image dimensions and operations to map the resulting image space from the interval $[0, 1]$ to $[-1, 1]$.
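The raster-to-NDC conversion is a two-liner. An illustrative Python version (note that some conventions additionally flip the $y$ axis; that detail is omitted here):

```python
def raster_to_ndc(xr, yr, w, h):
    # normalize by the image dimensions ([0, w] x [0, h] -> [0, 1]^2),
    # then map [0, 1] to [-1, 1]
    return 2.0 * xr / w - 1.0, 2.0 * yr / h - 1.0
```

The image corners land on the corners of the $[-1, 1]^2$ square and the image center lands on the origin.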
The remainder of the transformation chain can be done in an optimized way, using a precomputed tangent of half the field of view angle (in dimension $y$), the aspect ratio $a = w/h$ and the basis vectors $\mathbf{r}$, $\mathbf{u}$ and $\mathbf{f}$ (right, up and forward) of the inverse view matrix. The operation is done by the expression:

$$\mathbf{d} = \mathrm{normalize}\!\left(x_n\, a \tan\!\left(\tfrac{\mathrm{fov}_y}{2}\right)\mathbf{r} + y_n \tan\!\left(\tfrac{\mathrm{fov}_y}{2}\right)\mathbf{u} + \mathbf{f}\right)$$

As can be seen, this expression transforms the normalized device coordinates using the parts of the inverse view matrix that would affect each of the coordinates and scales them using the tangent of half the field of view for dimension $y$. The scale value is corrected by the aspect ratio for the $x$ dimension.
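In code, the optimized expression reads as follows (an illustrative Python sketch; a real shader would take the basis vectors directly from the inverse view matrix):

```python
import math

def ray_direction(xn, yn, fov_y, aspect, right, up, forward):
    t = math.tan(fov_y / 2.0)   # precomputed once per frame in practice
    d = [xn * aspect * t * right[i] + yn * t * up[i] + forward[i]
         for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

# canonical camera basis for a quick check
R, U, F = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
center = ray_direction(0.0, 0.0, math.pi / 2.0, 1.0, R, U, F)
edge = ray_direction(1.0, 0.0, math.pi / 2.0, 1.0, R, U, F)
```

The center of the image looks straight along the forward vector, and for a 90-degree field of view the horizontal edge ray sits 45 degrees off axis.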
In our tests we could not generate correctly registered stereo images using this optimized expression, because it does not take stereo rendering into account. For this reason, we used two other approaches: the inverse of the projection matrix, and a rasterization G-Buffer pre-pass. Both options ensure correct stereo images, with different pros and cons. The first option does not need any additional rasterization pass or memory for the required texture. However, the G-Buffer provides more flexibility for the algorithm, as will be discussed in Section IV-B. It is important to note that the positions in the texture are equivalent to intersection points of rays launched from the camera position. This property comes from the fact that the camera position is equivalent to the projection center and each ray is equivalent to the projection line for the associated pixel.
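The first approach can be sketched as follows (illustrative Python; in practice the matrices come from the per-eye camera data): an NDC point is unprojected with the eye's inverse view-projection matrix, and the ray direction is taken from the eye position to the unprojected point.

```python
def mat_vec(m, v):
    # row-major 4x4 matrix times a 4-component column vector
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def ray_from_ndc(xn, yn, inv_view_proj, eye_pos):
    p = mat_vec(inv_view_proj, [xn, yn, 1.0, 1.0])   # a point on the far plane
    world = [p[i] / p[3] for i in range(3)]          # perspective divide
    d = [world[i] - eye_pos[i] for i in range(3)]
    norm = sum(c * c for c in d) ** 0.5
    return tuple(c / norm for c in d)

# trivial sanity check: with an identity matrix and the eye at the origin,
# the center of the image looks straight down +z
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
center_ray = ray_from_ndc(0.0, 0.0, IDENTITY, (0.0, 0.0, 0.0))
```

Because each eye uses its own matrix and position, the two ray sets stay consistent with the HMD camera model, which is what keeps the stereo pair registered.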
Iv-B Ray Tracing Overhead
The major drawback of usual stereo rendering is the overhead caused by the additional camera image. This problem is emphasized even more in ray tracing, which demands heavy computation to generate the images. Several techniques have been proposed to address this issue. They usually create the additional image by transforming the contents of the original or by temporal coherence using previous frames. However, artifacts not present when the scene is rendered twice can be introduced in the process.
The RTX platform opens new ways to explore this problem. Additionally, the extension of ray tracing support to older graphics card architectures encourages new algorithms based on smart ray usage. We benefit from Falcor's design to explore and evaluate the possibilities using a methodology based on fast cycles of research, prototyping, integration and evaluation. The result is a list of several possible approaches, generated by changing component routines of a ray tracing algorithm. In summary, those changes result from the following questions.
- How are the first intersections calculated?
  - Rays shot from the camera position.
- Which effects are applied?
  - Direct light and shadows only.
  - Perfect-mirror specular reflections and shadows only.
The different algorithms are created by combining the different functionalities described in Section III. We start by integrating Simple Ray Tracing and Stereo Rendering. On one hand, Stereo Rendering includes all the logic to connect with the HMD and to control the data flow between the ray tracing shaders and the device. On the other hand, Simple Ray Tracing features a ray tracing shader, which is modified to launch rays based on two view matrices or two position G-Buffers, one for each eye. In the G-Buffer case, a rasterization pre-pass is also performed, analogously to the PathTracer.
Next, PathTracer’s components are integrated, enabling better control of the effects applied. We benefit from Falcor’s RenderGraph, which is extremely useful for algorithms with multiple rendering passes. The changes needed are listed next.
- Adding an additional mirror ray type, equivalent to the primary ray type.
- Including a function to compute direct light with shadows only. If the G-Buffer is available, the direct contribution comes for free from it.
- Adding a branch in the Ray Generation Shader to choose between the effects: raster, direct light plus shadows, specular reflections and path tracing.
An interesting question arises when we analyze the current algorithm. A ray tracing procedure with a G-Buffer pre-pass is actually a hybrid algorithm, based on both rasterization and ray tracing. What if we extrapolate this hybrid paradigm and allow materials in the scene to be rasterized or ray-traced? This question generated an additional change in the integrated renderer. We introduced a material ID to enable support for per-material effect selection. With this feature, a user can control performance by changing the material IDs of objects in the scene from more complex to simpler effects and vice versa. The final algorithm is very flexible and suited for stereo ray tracing or for older graphics card architectures, environments where performance matters.
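The per-material selection amounts to a dispatch on the material ID. The following illustrative Python sketch uses hypothetical effect names (the renderer itself branches inside the Ray Generation Shader in HLSL):

```python
# hypothetical effect IDs, mirroring the four branches mentioned above
RASTER, DIRECT_SHADOWS, MIRROR, PATH_TRACE = range(4)

def shade(material_id, gbuffer_sample, effects):
    if material_id == RASTER:
        # raster materials are fully resolved by the G-Buffer pre-pass
        return gbuffer_sample
    return effects[material_id](gbuffer_sample)

effects = {
    DIRECT_SHADOWS: lambda s: ("direct+shadows", s),
    MIRROR:         lambda s: ("mirror+shadows", s),
    PATH_TRACE:     lambda s: ("path-traced", s),
}
```

Switching an object from MIRROR to RASTER then trades image quality for speed without touching the rest of the pipeline, which is exactly the control knob the hybrid algorithm exposes.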
In Section IV we pointed out component routines that could be changed in a ray tracing algorithm to achieve different levels of performance and image quality. In this section we are interested in measuring and evaluating those changes. Our goal is to find the best compromise between those two indices.
The evaluation methodology consists of several tests using three scenes: Falcor Arcade, Epic Games Unreal Engine 4 Sun Temple and Amazon Lumberyard Bistro. The Arcade is a very simple scene with minimal geometry, used as a toy example. The Temple and the Bistro are part of the Open Research Content Archive (ORCA) and are denser, with sizes comparable to scenes actually encountered in games and VR experiences.
All tests were done on a PC with an Intel(R) Core(TM) i7-8700 CPU, 16GB RAM, an RTX 2080 and an HTC Vive. The target performance is 90 fps, the HMD update frequency. Depending on how far the application is from this value, the device starts to reproject images and to lose frames, which can result in motion sickness for the user. Everything that could perform above 90 fps is also capped to that value. In our tests, users reported that performance near 45 fps is good, with minimal hiccups noticed due to reprojection. Timings below this value started to feel uncomfortable. All reported values include the image transfer to the HMD.
V-A G-Buffers and Stereo
The first test evaluates the impact of the G-Buffer pre-pass and stereo rendering. As discussed in Section IV, the hybrid algorithm depends on G-Buffers, so it is important to assess their viability early on. The methodology is:

- We have two controls in the experiment: a mono raster shader and a stereo raster shader.
- All ray tracing materials use the mirror-like reflection shader with one bounce.
- When stereo is not enabled, only the images for the left eye are generated.
As expected, the stereo rendering is the bottleneck and the G-Buffer overhead is negligible in comparison with it. Thus, our proposal is to use the hybrid algorithm to customize scenes and balance the indices. The next step is to measure how materials interact with them.
V-B Material Effects

The second test focuses on material effects. The idea is to use the hybrid algorithm and change the material IDs on-the-fly to balance quality and performance. The methodology is:
- Stereo is always enabled.
- The camera position is fixed.
- Materials are changed to balance image quality and performance. There are three possibilities:
  - Raster only.
  - Raster and ray-traced shadows.
  - Ray-traced mirror-like reflections and shadows.
The application supports an additional Path Tracing shader. However, as discussed in Section III-B, the algorithm needs to accumulate samples over frames from a static camera to eliminate noise. This restriction is hard to impose in a VR experience, where the camera is controlled by the user's head movement. Thus, we do not use this shader in the experiments.
The Arcade stays at 90 fps regardless of the effect choice, so we only show the best result in Figure 5, obtained when using the ray tracing shader with shadows and reflections. In Figure 6 we show the results for the Sun Temple, while Figure 7 contains the results for the Bistro. Moreover, Tables III and IV quantify the performance for each case.
Table III: Sun Temple performance per material setup.

| Material setup | fps |
| Raster + ray-traced shadows | 90 |
| Ray-traced reflections on statues and wall decorations | 75 |
| Ray-traced reflections on everything | 45 |
Table IV: Bistro performance per material setup.

| Material setup | fps |
| Raster + ray-traced shadows | 45 |
| Ray-traced reflections and shadows | 45 |
The use of effects drastically changes the mood and fidelity of the scenes, resulting in a much better immersive experience. It is important to remark that Tables III and IV only show a small subset of all possible material setups: the ones requiring minimal interaction to change the IDs. However, with proper material ID tweaking, the application can achieve even faster frame rates while maintaining image quality. One could conceive an automatic algorithm to set IDs based on importance values given by artists and the distance of objects from the camera. Non-important or very distant objects could be set as raster or raster plus shadows, instead of heavier materials.
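Such an automatic assignment could look like the following speculative sketch (all names and thresholds are hypothetical, not part of Ray-VR): a cheaper effect is picked for unimportant or distant objects.

```python
# hypothetical effect IDs, from cheapest to most expensive
RASTER, RASTER_SHADOWS, REFLECTIONS = range(3)

def pick_material_id(importance, distance, near=5.0, far=20.0):
    """Heuristic: importance in [0, 1] set by artists; distance in scene
    units from the camera. Thresholds are illustrative assumptions."""
    if importance < 0.2 or distance > far:
        return RASTER              # background objects: cheapest path
    if importance < 0.6 or distance > near:
        return RASTER_SHADOWS
    return REFLECTIONS             # hero assets close to the camera
```

A further refinement, as suggested above, would be to adjust the artist-given importance values automatically until a target fps budget is met.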
In this paper we presented Ray-VR, a very flexible algorithm to integrate real-time RT into VR. At the time of manuscript submission, even Epic Games and Unity Technologies, big players in the VR market, did not support real-time VR/RT in their game engine solutions. As far as we know, Ray-VR is the first algorithm to succeed at this task.
Ray-VR's performance is inherently flexible. It can adapt a VR experience to different hardware constraints. High-performance devices can benefit from high-quality ray-traced images, creating more immersive environments. However, other devices can still run the experience in real time, but with fewer effects.
The algorithm is also totally compatible with the current VR creation workflow. The user interaction needed to change material IDs is straightforward, suited for artists at asset creation time, for developers at development time and for designers at testing time.
The human interaction needed to change the material IDs can also be considered a limitation, however. Ideally, we want an algorithm that changes the material IDs automatically. With this in mind, we briefly described improvements that could converge to a solution in Section V-B. Artists could assign importance values to the assets at creation time. This methodology, in conjunction with other heuristics such as object distance, could result in an automatic algorithm for material ID setup. It could also optimize the importance values, starting from the original ones given by the artists, to meet a given fps budget. Ray-VR is flexible enough to support such operations after small changes in the current algorithm.
An intuitive example is the statue at the Sun Temple. It is by far the most important asset in the scene and could have a high importance value. Walls, for example, could receive much less attention, since they are usually part of the background of the scene. A more sophisticated attempt could be to create a neural network that learns how to set the material IDs from scene examples in order to optimize performance and image quality.
- (2008) GpuCV: an open-source GPU-accelerated framework for image processing and computer vision. In Proceedings of the 16th ACM International Conference on Multimedia, pp. 1089–1092. Cited by: §I.
- (2018) Introduction to the Vulkan graphics API. In ACM SIGGRAPH 2018 Courses, SIGGRAPH '18, New York, NY, USA, pp. 3:1–3:146. Cited by: §II-B.
- (2010) Accelerating SQL database operations on a GPU with CUDA. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 94–103. Cited by: §I.
- (2018) The Falcor Rendering Framework. https://github.com/NVIDIAGameWorks/Falcor. Cited by: §II-B, §V.
- (2019) Accelerating the Real-Time Ray Tracing Ecosystem: DXR for GeForce RTX and GeForce GTX. Cited by: §I, §IV-B.
- (2014) Big data deep learning: challenges and perspectives. IEEE Access 2, pp. 514–525. Cited by: §I.
- (2010) Exploring NVIDIA CUDA for video coding. In Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, pp. 13–22. Cited by: §I.
- (2013) CUDA Programming: A Developer's Guide to Parallel Computing with GPUs. 1st edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Cited by: §I.
- (2014) Lazy work stealing for continuous hierarchy traversal on deformable bodies. In 2014 International Conference on Computer Graphics Theory and Applications (GRAPP), pp. 1–8. Cited by: §I.
- (2009) Accelerating Linpack with CUDA on heterogeneous clusters. In Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, pp. 46–51. Cited by: §I.
- (2004) GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. Vol. 590, Addison-Wesley Reading. Cited by: §I.
- (2017) Unreal Engine Sun Temple, Open Research Content Archive (ORCA). http://developer.nvidia.com/orca/epic-games-sun-temple. Cited by: §V.
- E. Haines and T. Akenine-Möller (Eds.) (2019) Ray Tracing Gems. Apress. http://raytracinggems.com. Cited by: §II-A.
- (2010) An MPI-CUDA implementation for massively parallel incompressible flow computations on multi-GPU clusters. In 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, pp. 522. Cited by: §I.
- (2008) Neural network implementation using CUDA and OpenMP. In 2008 Digital Image Computing: Techniques and Applications, pp. 155–161. Cited by: §I.
- (1986) The rendering equation. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '86, New York, NY, USA, pp. 143–150. Cited by: §III-B.
- (2010) GProximity: hierarchical GPU-based operations for collision and distance queries. In Computer Graphics Forum, Vol. 29, pp. 419–428. Cited by: §I.
- (2006) GPGPU: general-purpose computation on graphics hardware. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, New York, NY, USA. Cited by: §I.
- (2017) Amazon Lumberyard Bistro, Open Research Content Archive (ORCA). http://developer.nvidia.com/orca/amazon-lumberyard-bistro. Cited by: §V.
- (2016) Introduction to 3D Game Programming with DirectX 12. Mercury Learning & Information, USA. Cited by: §II-B.
- (2007) CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In 2007 IEEE International Conference on Signal Processing and Communications, pp. 65–68. Cited by: §I.
- (2011) OpenCL Programming Guide. Pearson Education. Cited by: §I.
- (2009) The OpenCL specification. In 2009 IEEE Hot Chips 21 Symposium (HCS), pp. 1–314. Cited by: §I.
- (2007) GPU Gems 3. Addison-Wesley Professional. Cited by: §I.
- (2008) Scalable parallel programming with CUDA. Queue 6 (2), pp. 40–53. Cited by: §I.
-  (2008) Implementation of medical image segmentation in cuda. In 2008 International Conference on Information Technology and Applications in Biomedicine, pp. 82–85. Cited by: §I.
-  (2010) OptiX: A General Purpose Ray Tracing Engine. In ACM SIGGRAPH 2010 Papers, SIGGRAPH ’10, New York, NY, USA, pp. 66:1–66:13. Note: event-place: Los Angeles, California External Links: Cited by: §I, §II-B.
-  (2005) Gpu gems 2: programming techniques for high-performance graphics and general-purpose computation. Addison-Wesley Professional. Cited by: §I.
-  (2016) Physically based rendering: from theory to implementation. Morgan Kaufmann. Cited by: Fig. 3, Fig. 4, §IV-A.
-  (2011) GSLIC: a real-time implementation of slic superpixel segmentation. University of Oxford, Department of Engineering, Technical Report. Cited by: §I.
-  (2009) Implementation of a lattice–boltzmann method for numerical fluid mechanics using the nvidia cuda technology. Computer Science-Research and Development 23 (3-4), pp. 241–247. Cited by: §I.
-  (2010) CUDA by example: an introduction to general-purpose gpu programming, portable documents. Addison-Wesley Professional. Cited by: §I.
-  (2016) Vulkan Programming Guide: The Official Guide to Learning Vulkan. Always learning, Addison-Wesley. External Links: Cited by: §II-B.
-  (2010) OpenCL: a parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12 (3), pp. 66. Cited by: §I.
-  (2013) Bitcoin and the age of bespoke silicon. In 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), pp. 1–10. Cited by: §I.
-  (2009) High-performance signal processing on emerging many-core architectures using cuda. In 2009 IEEE International Conference on Multimedia and Expo, pp. 1825–1828. Cited by: §I.
-  (2012) OpenACC—first experiences with real-world applications. In European Conference on Parallel Processing, pp. 859–870. Cited by: §I.
-  (2018-08) Introduction to DirectX Raytracing. In ACM SIGGRAPH 2018 Courses, Note: event-place: Vancouver, British Columbia External Links: Cited by: Fig. 1, Fig. 2, §II-A, §II-B.
-  (2008) Parallel image processing based on cuda. In 2008 International Conference on Computer Science and Software Engineering, Vol. 3, pp. 198–201. Cited by: §I.